Stanford CS236: Deep Generative Models I 2023 I Lecture 6 - VAEs

Views: 1,088

Stanford Online

19 days ago

For more information about Stanford's Artificial Intelligence programs visit: stanford.io/ai
To follow along with the course, visit the course website:
deepgenerativemodels.github.io/
Stefano Ermon
Associate Professor of Computer Science, Stanford University
cs.stanford.edu/~ermon/
Learn more about the online course and how to enroll: online.stanford.edu/courses/c...
To view all online courses and programs offered by Stanford, visit: online.stanford.edu/

Comments: 1
@CPTSMONSTER · 5 days ago
2:30 Summary. Latent variables z are Gaussian with mean 0. The conditionals p(x|z) are also Gaussian, with mean and standard deviation parameters modelled by two neural networks that depend on each z. The resulting marginal p(x) is a complex infinite mixture of Gaussians.
4:15 Different from autoregressive models, where it was trivial to multiply conditionals to get likelihoods.
13:20 ELBO derived from the KL divergence; explains why equality holds when q(z) is the true posterior.
15:45 Too expensive to compute the posterior exactly; possibly model it with a neural network.
17:45 The joint probability is the generative model (simple Gaussian prior, mapped to mean and standard deviation parameters modelled by two neural networks).
28:00 Jointly optimize theta and phi to minimize the KL divergence.
31:15 Approximation of the log-likelihood (see the summary explanation in lecture 5); final equations.
32:10 For generation, the theta parameters are sufficient; phi is discarded.
39:00? EM: theta held constant while optimizing phi? Not joint optimization.
44:40 A single theta and different variational parameters phi for each data sample.
48:00 VAE training steps shown for illustrative purposes; in practice theta and phi are trained in sync.
50:40 Amortized inference: a single q. Different variational parameters for each data point are not scalable (though more accurate, since less constrained).
58:50? Sampling from a distribution that depends on phi; the samples themselves would change when phi is changed.
1:03:10? Reparameterization trick: sample epsilon; the gradient with respect to phi does not depend on epsilon (see the code sketch after these notes).
1:07:45 The reparameterization trick is possible when the sampling procedure can be written as a deterministic transformation of a basic random variable that can be sampled from. For discrete (e.g. categorical) random variables it is possible to sample by inverting the CDF, but we wouldn't know how to get gradients through. Use REINFORCE, or other methods that relax the optimization problem.
1:10:00 A variational parameter for each data point is expensive. Amortization: the encoder of the VAE, a neural network with parameters lambda, performs a regression that outputs the posterior for each data point instead of solving for per-point variational parameters phi. The benefit: when a new data point arrives, the optimization problem does not have to be solved again for new variational parameters.
1:17:35 Notation changes from q(z; phi^i) to q_phi(z|x).
1:21:40 The encoder is the variational posterior. Encoder and decoder jointly optimize the ELBO, a regularized type of autoencoding objective.
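
A minimal code sketch (not from the lecture) of how these pieces fit together, assuming a PyTorch implementation with a Bernoulli decoder and illustrative layer sizes: an amortized encoder q_phi(z|x), a decoder p_theta(x|z), the reparameterization trick, and the ELBO as the training objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, h_dim=400):
        super().__init__()
        # Encoder q_phi(z|x): maps x to the mean and log-variance of a Gaussian over z
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x|z): maps z to the parameters of a distribution over x
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))

    def encode(self, x):
        h = self.enc(x)
        return self.enc_mu(h), self.enc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps with eps ~ N(0, I): sampling becomes a deterministic
        # transformation of a basic random variable, so gradients flow back to phi.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def elbo(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        logits = self.dec(z)
        # Reconstruction term E_q[log p_theta(x|z)], here with a Bernoulli likelihood
        recon = -F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        # KL(q_phi(z|x) || p(z)) against a standard normal prior, in closed form
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon - kl  # maximize this (i.e. minimize its negative)

# Training step sketch: theta (decoder) and phi (encoder) are updated together.
model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)  # placeholder batch; in practice, binarized MNIST or similar
loss = -model.elbo(x) / x.size(0)
opt.zero_grad()
loss.backward()
opt.step()
```

Because z is written as mu + sigma * eps, the gradient of the Monte Carlo ELBO estimate with respect to phi passes straight through the sample, which is the point made at 1:03:10; a categorical z would not admit this trick and would need REINFORCE or a continuous relaxation instead.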