Stanford CS236: Deep Generative Models I 2023 I Lecture 6 - VAEs

4,813 views

Stanford Online

1 day ago

For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerative...
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.ed...
Learn more about the online course and how to enroll: online.stanfor...
To view all online courses and programs offered by Stanford, visit: online.stanfor...

Comments: 6
@CPTSLEARNER 4 months ago
2:30 Summary. Latent variables z are Gaussian with mean 0. The conditional probabilities are also Gaussian, with mean and standard deviation parameters modelled by two neural networks that depend on each z. The resulting marginal p(x) is a complex infinite mixture of Gaussians.
4:15 Different from autoregressive models, where it was trivial to multiply conditionals to get likelihoods.
13:20 ELBO derived from the KL divergence; equality holds when q(z) is the true posterior.
15:45 Too expensive to compute the posterior exactly; possibly model it with a neural network.
17:45 The joint probability is the generative model (simple Gaussian prior, mapped to mean and standard deviation parameters modelled by two neural networks).
28:00 Jointly optimize theta and phi to minimize the KL divergence.
31:15 Approximation of the log-likelihood (see the summary explanation in lecture 5); final equations.
32:10 For generation, the theta parameters are sufficient; phi is discarded.
39:00? EM: hold theta constant and optimize phi? Not joint optimization.
44:40 A single theta and different variational parameters phi for each data sample.
48:00 VAE training steps shown for illustrative purposes; in practice theta and phi are trained in sync.
50:40 Amortized inference: a single q. Keeping different variational parameters for each data point is not scalable (though more accurate, since it imposes fewer constraints).
58:50? Sampling from a distribution that depends on phi, so the samples themselves would change when phi is changed.
1:03:10? Reparameterization trick: sample epsilon from a fixed distribution that does not depend on phi, so the gradient with respect to phi can pass through the deterministic transformation.
1:07:45 The reparameterization trick is possible when the sampling procedure can be written as a deterministic transformation of a basic random variable that can be sampled from. For discrete (e.g. categorical) random variables it is possible to sample by inverting the CDF, but we wouldn't know how to get gradients through; use REINFORCE, or other ways that relax the optimization problem.
1:10:00 A variational parameter for each data point is expensive. Amortization: the encoder of the VAE, denoted by the lambda parameters of a neural network, performs a regression that produces an approximate posterior for each data point without separately optimizing phi variational parameters for every point. The benefit: for a new data point, the optimization problem does not have to be solved again for new variational parameters.
1:17:35 Notation change from q(z; phi^i) to q_phi(z|x).
1:21:40 The encoder is the variational posterior. Encoder and decoder jointly optimize the ELBO, a regularized type of autoencoding objective.
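A minimal PyTorch sketch of the ELBO objective and the reparameterization trick summarized above; the layer sizes, Bernoulli likelihood, and all names are illustrative assumptions rather than the lecture's exact setup.

```python
# Minimal VAE sketch (illustrative): Gaussian q_phi(z|x), Bernoulli p_theta(x|z), N(0, I) prior.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        # Encoder q_phi(z|x): maps x to the mean and log-variance of a Gaussian over z.
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x|z): maps z to the logits of a Bernoulli over x.
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I);
        # the distribution of eps does not depend on phi, so gradients flow through mu and logvar.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        # ELBO = E_q[log p(x|z)] - KL(q_phi(z|x) || p(z)); the KL term has a closed form here.
        recon = -nn.functional.binary_cross_entropy_with_logits(
            self.dec(z), x, reduction='none').sum(dim=1)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)
        return (recon - kl).mean()
```

A training loop would minimize the negative of this returned ELBO with respect to both the encoder (phi) and decoder (theta) parameters at once, matching the joint optimization at 28:00; for generation only the decoder is kept, matching 32:10.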
@420_gunna 2 months ago
41:30 I don't really understand why we would have a different set of variational parameters for every data point x in the first place, yet share the same set of parameters theta for the generative/decoder side of the model. Anyone have an idea?
@leochang3185 2 months ago
Having a different set of variational parameters for every data point x is necessary because the best choice of latent variables depends on the input. Consider the analogy given in slide 11 (41:58), where the lower half of the image is x and the upper half is the latent variable z: it makes sense to have different variational parameters for different x (depending on whether it looks like the lower part of a 9, a 1, etc.). On the other hand, theta is the set of weights of the decoder, the generative part of the model that maps the latent variable z to x; this mapping does not depend on the individual data points.
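To make this concrete, here is a minimal sketch (not from the lecture) of optimizing one set of Gaussian variational parameters per data point alongside a single shared decoder; the toy data, sizes, and names are illustrative assumptions.

```python
# Per-data-point variational parameters phi^i with a single shared decoder theta (illustrative).
import torch
import torch.nn as nn

n, x_dim, z_dim = 100, 784, 20
x = torch.rand(n, x_dim)                            # toy data with values in [0, 1]

decoder = nn.Linear(z_dim, x_dim)                   # shared theta: one z -> x mapping for all points
mu = torch.zeros(n, z_dim, requires_grad=True)      # phi^i: one variational mean per data point
logvar = torch.zeros(n, z_dim, requires_grad=True)  # phi^i: one variational log-variance per data point

opt = torch.optim.Adam([mu, logvar, *decoder.parameters()], lr=1e-2)
for step in range(200):
    eps = torch.randn_like(mu)
    z = mu + torch.exp(0.5 * logvar) * eps          # reparameterized sample for each data point
    recon = -nn.functional.binary_cross_entropy_with_logits(
        decoder(z), x, reduction='none').sum(dim=1)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1)
    loss = -(recon - kl).mean()                     # negative ELBO averaged over the dataset
    opt.zero_grad(); loss.backward(); opt.step()
```

The table of per-point parameters mu, logvar grows with the dataset, while the shared decoder does not; amortized inference (1:10:00 in the summary above) replaces that table with one encoder network that predicts mu and logvar from x.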
@chongsun7872 2 months ago
I don't understand why p(z|x; theta) is called the decoder. Shouldn't it be the encoder, since it encodes the observed samples into the latent variables?
@420_gunna 2 months ago
Haven't watched the lecture yet, but yeah -- that reads to me like it's the encoder (it maps the given data x to latents z).
@inforoundup9826 14 days ago
Yeah, p(z|x) is the encoder.