L17.3 The Log-Var Trick

6,333 views

Sebastian Raschka

3 years ago

Slides: sebastianraschka.com/pdf/lect...
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L17.4 Variational Auto...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka

Comments: 7
@Darkev77
@Darkev77 2 years ago
Hey professor, thanks for these brilliant lectures! One question, if I may: at 4:56, "this allows us for + and - values", which I guess makes sense, since it gives our model more flexibility. However, in the sampling process when generating the "z", the term that represents the standard deviation (the exp term) will always be positive, so doesn't that go against our initial intention? Sorry if I got things confused!
@SebastianRaschka
@SebastianRaschka 2 years ago
Thanks for the feedback! The exp term will always be positive, that's true. But you only use it for sampling the eps term from the random normal distribution that is centered at 0 and can be positive and negative.
@Darkev77
@Darkev77 2 years ago
@@SebastianRaschka Hey professor, thanks a lot for your response, I really do appreciate it. So I guess my question now is: why didn't we just use the variance vector as it is? Why did we have to take its log (to allow for negative values initially) but then exponentiate it, if the outcome is the same? Again, really sorry if I'm mixing things up!
@SebastianRaschka
@SebastianRaschka 2 years ago
@@Darkev77 Good question! The log one is the one that is being "optimized" via backpropagation. But then, to sample from a normal distribution, we need the exponentiated one, because the sampling function expects a standard deviation on the original scale.
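To make the exchange above concrete, here is a minimal sketch of the log-var reparameterization (not taken from the lecture; it assumes PyTorch, and the names `z_mean`, `z_log_var`, and `reparameterize` are illustrative). The network outputs the log-variance, which can be any real number and is what backpropagation optimizes; it is exponentiated only when a standard deviation on the original scale is needed for sampling, and the sign of the sampled z comes from the eps term drawn from N(0, 1).

```python
import torch

def reparameterize(z_mean, z_log_var):
    # z_log_var is unconstrained (can be negative or positive);
    # this is the quantity the network outputs and that gradients flow through.
    std = torch.exp(0.5 * z_log_var)   # exp(0.5 * log var) = sqrt(var) = std > 0
    eps = torch.randn_like(std)        # eps ~ N(0, 1), centered at 0, so +/- values
    return z_mean + eps * std          # z can land on either side of the mean

# Illustrative usage: a 2-dimensional latent code for a batch of 4 inputs
z_mean = torch.zeros(4, 2)
z_log_var = torch.tensor([[-1.0, 0.5]]).repeat(4, 1)  # negative log-variances are fine
z = reparameterize(z_mean, z_log_var)
print(z.shape)  # torch.Size([4, 2]); entries can be negative or positive
```

Optimizing the log-variance rather than the variance itself means the network output never needs to be constrained to non-negative values.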
@shashankshekhar7052
@shashankshekhar7052 2 years ago
From one of the NYU lectures I understood this; please let me know if my understanding is correct. If we have a latent space of 10 dimensions, then we will have 10 means and 10 standard deviations to sample a single point. When we add noise during sampling and call reparametrize 100 times, it will create a cluster of points around that mean and std, forming a single bubble. For different means and std devs we would have different bubbles, and all these bubbles will represent different classes. There will be a big bubble containing all the small bubbles, something like a bubble of bubbles. The big bubble is a Gaussian N(0,1) and the small bubbles are also N(0,1).
@SebastianRaschka
@SebastianRaschka 2 years ago
Yeah, if your latent space is 10-dimensional, we are sampling from a 10-dimensional normal (i.e., Gaussian) distribution. Since you can see a multi-dimensional Gaussian as a multi-dimensional bubble, this sounds about right. The smaller bubbles inside can occur if you have different classes and there is some relationship between the samples from each class that makes cases within a class more similar than cases outside that class; e.g., in MNIST this might be the case. However, VAEs are unsupervised, so the smaller bubbles may not occur in practice. The bubbles might be more related to the dimensions: in each dimension, the points will be around its mean (which we usually choose to be 0). One last thing regarding the N(0, 1): you might be right in most cases, but it depends on the distribution we sample from. If you choose a standard normal distribution, N(0, 1) is correct. (A short sampling sketch follows this thread.)
@shashankshekhar7052
@shashankshekhar7052 2 жыл бұрын
@@SebastianRaschka thanks for clarifying!!
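To make the sampling described in the thread above concrete, here is a minimal, purely illustrative sketch (assuming PyTorch; the specific mean and log-variance values are made up): for a single 10-dimensional mean/log-variance pair, drawing 100 reparameterized samples produces a cluster of points, the "bubble", around that mean.

```python
import torch

torch.manual_seed(0)

latent_dim = 10
# One encoded input: 10 means and 10 log-variances (illustrative values)
z_mean = torch.randn(latent_dim)
z_log_var = torch.full((latent_dim,), -1.0)

# Calling the reparameterization 100 times gives 100 points around z_mean
eps = torch.randn(100, latent_dim)                    # eps ~ N(0, I), centered at 0
samples = z_mean + eps * torch.exp(0.5 * z_log_var)

print(samples.shape)        # torch.Size([100, 10])
print(samples.mean(dim=0))  # roughly z_mean: the center of the "bubble"
print(samples.std(dim=0))   # roughly exp(0.5 * z_log_var): the spread of the "bubble"
```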