Stanford CS236: Deep Generative Models I 2023 I Lecture 6 - VAEs

Views: 1,088

Stanford Online

19 days ago

For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

Comments: 1
@CPTSMONSTER · 5 days ago
2:30 Summary. Latent variables z are Gaussian with mean 0. The conditionals p(x|z) are also Gaussian, with mean and standard deviation parameters modelled by two neural networks that depend on each z. The resulting marginal p(x) is a complex infinite mixture of Gaussians.
4:15 Different from autoregressive models, where it was trivial to multiply conditionals to get likelihoods.
13:20 ELBO derived from the KL divergence; explains why equality holds when q(z) is the true posterior.
15:45 Too expensive to compute the posterior exactly; possibly model it with a neural network.
17:45 The joint probability is the generative model (simple Gaussian prior, mapped to mean and standard deviation parameters modelled by two neural networks).
28:00 Jointly optimize theta and phi to minimize the KL divergence.
31:15 Approximation of the log-likelihood (see the summary explanation in lecture 5); final equations.
32:10 For generation, the theta parameters are sufficient; phi is discarded.
39:00? EM: theta held constant while optimizing phi? Not joint optimization.
44:40 A single theta and different variational parameters phi for each data sample.
48:00 VAE training steps shown for illustrative purposes; in practice theta and phi are trained in sync.
50:40 Amortized inference: a single q. Different variational parameters for each data point are not scalable (though more accurate, since less constrained).
58:50? Sampling from a distribution that depends on phi; the samples themselves would change when phi is changed.
1:03:10? Reparameterization trick: sample epsilon; the gradient with respect to phi does not depend on epsilon (see the code sketch after these notes).
1:07:45 The reparameterization trick is possible when the sampling procedure can be written as a deterministic transformation of a basic random variable that can be sampled from. For discrete (e.g. categorical) random variables it is possible to sample by inverting the CDF, but we wouldn't know how to get gradients through. Use REINFORCE, or other methods that relax the optimization problem.
1:10:00 A variational parameter for each data point is expensive. Amortization: the encoder of the VAE, a neural network with parameters lambda, performs a regression that outputs the posterior for each data point instead of solving for per-point variational parameters phi. The benefit: when a new data point arrives, the optimization problem does not have to be solved again for new variational parameters.
1:17:35 Notation changes from q(z; phi^i) to q_phi(z|x).
1:21:40 The encoder is the variational posterior. Encoder and decoder jointly optimize the ELBO, a regularized type of autoencoding objective.
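
A minimal code sketch (not from the lecture) of how these pieces fit together, assuming a PyTorch implementation with a Bernoulli decoder and illustrative layer sizes: an amortized encoder q_phi(z|x), a decoder p_theta(x|z), the reparameterization trick, and the ELBO as the training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        # Encoder q_phi(z|x): maps x to the mean and log-variance of a Gaussian over z
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x|z): maps z to the parameters of a distribution over x
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I): sampling becomes a deterministic
        # transformation of a basic random variable, so gradients flow back to phi.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def elbo(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        logits = self.dec(z)
        # Reconstruction term E_q[log p_theta(x|z)], here with a Bernoulli likelihood
        recon = -F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        # KL(q_phi(z|x) || p(z)) against a standard normal prior, in closed form
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon - kl  # maximize this (i.e. minimize its negative)

# Training step sketch: theta (decoder) and phi (encoder) are updated together.
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)  # placeholder batch; in practice, binarized MNIST or similar
loss = -model.elbo(x) / x.size(0)
opt.zero_grad()
loss.backward()
opt.step()
```

Because z is written as mu + sigma * eps, the gradient of the Monte Carlo ELBO estimate with respect to phi passes straight through the sample, which is the point made at 1:03:10; a categorical z would not admit this trick and would need REINFORCE or a continuous relaxation instead.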