Stanford CS236: Deep Generative Models I 2023 I Lecture 17 - Discrete Latent Variable Models

4,414 views

Stanford Online

1 day ago

Comments: 5
@zeeshanmehdi3994 3 months ago
Really good lectures, please keep them coming!
@CPTSLEARNER 3 months ago
0:40 Close connection between score-based models and DDPMs (denoising diffusion probabilistic models)
1:00 A score-based model goes from noise to data by running Langevin dynamics chains. VAE perspective (DDPM): a fixed encoder (the SDE) adds noise (a Gaussian transition kernel) at each time step; the decoder is a joint distribution over the same RVs, parameterized in the reverse direction; the sequence of decoders is also Gaussian and parameterized by neural networks (the simple DDPM formula); train in the usual way (as in a VAE) by optimizing the evidence lower bound (minimizing KL divergence); the ELBO is equivalent to a sum of denoising score matching objectives (learning the optimal decoder requires estimating the score of the noise-perturbed data density), so optimizing the ELBO corresponds to learning a sequence of denoisers (noise-conditional score-based models); see the training-loss sketch after these notes
4:55 The means of the decoders at optimality correspond to the score functions; the updates performed in DDPM are very similar to annealed Langevin dynamics
5:45 Diffusion version: infinitely many noise levels
9:15 The SDE describes how the RVs in the continuous diffusion model (the infinitely fine discretization of the VAE) are related, and enables sampling
14:25 Reversing the SDE is a change of variables
19:30 Interpolation between two datasets requires gradients wrt the t's
19:45? Fokker-Planck equation: the gradient wrt t is completely determined by these objects
20:25 Discretizing the SDE is equivalent to Langevin dynamics or the DDPM sampling procedure (follow the gradient and add noise at every step; see the Langevin sketch below)
21:40 Get a generative model by learning the score functions (of the reverse SDE), parameterized by neural networks (theta)
21:55? Same as DDPM with 1000 steps
23:15? Equivalence of Langevin dynamics, DDPM, and diffusion-based generative modeling
24:40? DDPM / SDE numerical predictor is a Taylor expansion (see the predictor-step sketch below)
25:40? Score-based MCMC corrector uses Langevin dynamics to generate a sample from the corresponding density
27:15? A score-based model uses the corrector without the predictor; DDPM uses the predictor without the corrector
27:50 The decoder is trying to invert the encoder and is defined as Gaussian (only the continuous-time limit of infinitely many steps yields a tight ELBO under Gaussian decoders)
29:05? The predictor takes one step; the corrector uses Langevin dynamics to generate a sample
34:50 Neural ODE
35:55 Reparameterize the randomness into the initial condition and then transform it deterministically (an equivalent computation graph); in variational inference, backprop through the encoder is a stochastic computation
38:55 ODE formula (an integral) to compute the probability density; converting to an ODE gives access to fast solvers for generating samples
40:10 DDPM as a VAE with a fixed encoder and the same dimension; latent diffusion first learns a VAE mapping data to a lower-dimensional space, then learns a diffusion model over that latent space
44:50? Compounding errors in the denoiser but not the SDE
46:30 Maximum likelihood would require differentiating through the ODE solver, which is very difficult and expensive
49:35 Scores and marginals are equivalent (SDE and ODE models) and always learned by score matching; at inference time, samples are generated differently
58:40 Stable Diffusion uses a pretrained autoencoder, not trained end to end; you only care about reconstruction (disregarding whether the latent distribution is close to Gaussian) and getting a good autoencoder; keep the initial autoencoder fixed and train a diffusion model over the latent space
1:08:35 Score of the prior (unconditional score), the likelihood (forward model/classifier), and the normalization constant
1:09:55 Solve the SDE or ODE while following the gradient of the prior plus the likelihood (controlled sampling); Langevin increases the likelihood of the image wrt the prior and makes sure the classifier predicts the target class for that image (changing the drift to push samples toward specific classifications)
1:12:35 Classifier-free guidance: train two diffusion models for the conditional and unconditional scores and take the difference (see the guidance sketch below)
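A minimal sketch of the simplified DDPM objective the 1:00 note describes, where the ELBO reduces (up to weighting) to a sum of denoising score matching terms. This assumes PyTorch; `eps_model` (the noise-prediction network) and `alphas_cumprod` (cumulative products of the noise schedule) are hypothetical names, not code from the lecture:

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, alphas_cumprod):
    """Simplified DDPM objective: sample a timestep, noise the data with the
    fixed Gaussian encoder q(x_t | x_0), and regress the injected noise."""
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps  # fixed encoder step
    return F.mse_loss(eps_model(x_t, t), eps)  # one reweighted ELBO term
```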
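The 20:25 recipe (follow the gradient and add noise at every step) is annealed Langevin dynamics. A sketch assuming a trained noise-conditional score model; `score_model`, the `sigmas` schedule, and the step-size rule are illustrative choices:

```python
import torch

def annealed_langevin_sample(score_model, sigmas, shape, n_steps=100, eps=2e-5):
    """Annealed Langevin dynamics: at each noise level, repeatedly follow
    the score and inject fresh Gaussian noise."""
    x = torch.randn(shape) * sigmas[0]           # start at the largest noise level
    for sigma in sigmas:                         # sigmas sorted high -> low
        alpha = eps * (sigma / sigmas[-1]) ** 2  # per-level step size
        for _ in range(n_steps):
            z = torch.randn_like(x)
            score = score_model(x, sigma)        # approx. grad_x log p_sigma(x)
            x = x + alpha * score + (2 * alpha) ** 0.5 * z
    return x
```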
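For the 24:40 predictor: one Euler-Maruyama step of the reverse-time SDE, i.e. a first-order Taylor step backwards in time. Shown for a variance-exploding SDE with zero forward drift; `g` (the diffusion coefficient as a function of time) is an assumed input:

```python
import torch

def reverse_sde_euler_step(score_model, x, t, dt, g):
    """One Euler-Maruyama predictor step of the reverse-time SDE from t to
    t - dt (dt > 0). With zero forward drift (variance-exploding SDE):
        x_{t-dt} = x_t + g(t)^2 * score(x_t, t) * dt + g(t) * sqrt(dt) * z."""
    z = torch.randn_like(x)
    score = score_model(x, t)  # approx. grad_x log p_t(x)
    return x + (g(t) ** 2) * score * dt + g(t) * (dt ** 0.5) * z
```

A DDPM-style sampler runs only this predictor; a score-based MCMC corrector would follow each predictor step with a few Langevin updates at the new time, as the 25:40-29:05 notes describe.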
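And the 1:12:35 combination rule for classifier-free guidance, sketched over the same hypothetical `eps_model`, with `cond` the conditioning signal and `w` the guidance weight:

```python
def cfg_eps(eps_model, x_t, t, cond, w):
    """Classifier-free guidance: blend unconditional and conditional noise
    predictions; their difference plays the role of the classifier gradient."""
    eps_uncond = eps_model(x_t, t, None)  # unconditional branch
    eps_cond = eps_model(x_t, t, cond)    # conditional branch
    return eps_uncond + w * (eps_cond - eps_uncond)
```

At w = 0 this reduces to the unconditional prediction, at w = 1 to the conditional one; larger w pushes samples harder toward the conditioning signal.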
@zhuolin730 3 months ago
Do we have a recording of lecture 17, Discrete Latent Variable Models?
@XEQUTE 4 months ago
lol, 2nd. Hope this helps in my Kaggle competition
@ayxxnshxrif 4 months ago
1st