Deep Learning 20: (2) Variational AutoEncoder : Explaining KL (Kullback-Leibler) Divergence

36,273 views

Ahlad Kumar

A day ago

It's part of a mini lecture series on Variational Auto-Encoders, which is divided into six lectures. This is the second lecture, which goes through the detailed derivation of the Kullback-Leibler divergence and the insight behind it.
#autoencoder #variational #generative
Implementation by Andrew Spanopoulos
github.com/And...

Comments: 54
@vatsalamolly 3 years ago
Can't thank you enough! Thank you for explaining in such detail. I had been stuck understanding this proof for hours and it was driving me crazy!
@xruan6582 4 years ago
Super detailed explanation. Human beings need more teachers like this. A small question: at 11:47, why did the 1/2 disappear in tr[∑₁(1/2)∑₁⁻¹]? Why do we end up with k rather than k/2?
@yashkhurana9170 4 years ago
I have the same question. Did you find an answer to it?
@xruan6582 4 years ago
@@yashkhurana9170 not yet
@MrSouvikdey 4 years ago
There will be a 1/2 term as well. In the final expression of D_KL (after the proof), he takes the 1/2 out as a common factor, so it has to be k/2.
@moliv8927 2 years ago
It's a typo; he missed it.
@moliv8927 2 years ago
@@yashkhurana9170 It's a typo; he missed it.
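For anyone stuck on the same step, here is a sketch of the calculation the thread is discussing, assuming p = N(μ₁, Σ₁) and q = N(μ₂, Σ₂) are k-dimensional Gaussians and the expectation is taken under p:

```latex
\[
\mathbb{E}_{p}\left[\tfrac{1}{2}(x-\mu_1)^{\top}\Sigma_1^{-1}(x-\mu_1)\right]
  = \tfrac{1}{2}\,\mathbb{E}_{p}\left[\operatorname{tr}\left(\Sigma_1^{-1}(x-\mu_1)(x-\mu_1)^{\top}\right)\right]
  = \tfrac{1}{2}\operatorname{tr}\left(\Sigma_1^{-1}\Sigma_1\right)
  = \tfrac{k}{2}
\]
```

On its own the term is k/2; in the final formula for D_KL the overall factor of 1/2 is pulled out front, which is why a bare k appears inside the bracket, consistent with the replies above.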
@oriabnu1 4 years ago
Truly excellent. I have studied a lot of papers and lectures but could not remove my confusion; now I feel comfortable. Thanks again.
@sabazainab1524 3 years ago
Wow, Amazing tutorial. Thank you for all of your videos on every topic.
@dailyDesi_abhrant 4 years ago
I am so happy I found you, I was struggling to get a clear idea of this concept !
@VarounsVlogs 5 years ago
Damn, this is awesome. THANKS!!!!
@IgorAherne 2 years ago
Thank you so much for this. The way you structure it, the examples, is great
@omniscienceisdead8837 2 years ago
simple and straight to the point , thank you
@BiranchiNarayanNayak 5 years ago
Excellent tutorial.
@yangzhou6314 2 years ago
Best explanation I have seen so far!
@SP-db6sh 3 years ago
Thank you very much for such a simple explanation.
@bekaluderbew19 6 months ago
God bless you brother
@nitinkumarmittal4369 3 years ago
Thank you Sir for the explanation!
@rosyluo7710 4 years ago
very clean explanation! Thanks dr.
@johnbmisquith2198 2 years ago
Great details on WGAN
@navaneethmahadevan2458 4 years ago
Gaussian distributions are supposed to be for continuous random variables, right? How are we using them for a discrete random variable that can take K possible states here? Shouldn't we consider an integral here? Please correct me if I am wrong; I'm totally new to machine learning.
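On the continuous-versus-discrete point: K here is the dimensionality of the continuous Gaussian vector, not a count of discrete states, and the expectations are indeed integrals. A sketch of the definition being used, assuming densities p and q on R^K:

```latex
\[
D_{\mathrm{KL}}(p \,\|\, q)
  = \int_{\mathbb{R}^{K}} p(x)\,\log\frac{p(x)}{q(x)}\,dx
  = \mathbb{E}_{x\sim p}\left[\log p(x) - \log q(x)\right]
\]
```

The summation form of the same quantity is the discrete analogue; writing everything as E_p[·] covers both cases, which is what the lecture does.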
@nikiamini2768 3 years ago
Thank you so, so much! This is super helpful!
@bozhaoliu2050 3 years ago
Excellent tutorial
@adityaprakash256 5 years ago
this is too good. thank You
@bosepukur 5 years ago
one of the very best
@Vighneshbalaji1 4 years ago
E_p(x) = µ is fine, but at 16:05 you said that E_p(x) = µ_1. I think that term was coming from Q(x), so it should be µ_2, right?
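A short note on this point, going only from the definition used in the derivation: every expectation in D_KL(p‖q) is taken under p, even for terms that originate from log Q(x), so the mean that appears is µ₁:

```latex
\[
D_{\mathrm{KL}}(p \,\|\, q) = \mathbb{E}_{x\sim p}\left[\log p(x) - \log q(x)\right],
\qquad
\mathbb{E}_{x\sim p}[x] = \mu_1
\]
```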
@saadanwer2106 5 years ago
Amazing tutorial
@RealMcDudu 4 years ago
Excellent video for showing the KL for 2 Gaussians. A bit too fast, but luckily YouTube has 0.75 speed :-)
@guofangtt 5 years ago
Thanks for your tutorial. I have one question: at 13:49, why does (B^T)A = (A^T)B? Thanks!
@NeerajAithani 4 years ago
Earlier we applied the trace trick because A^T X A is a scalar. Here B^T X A has the same dimensions as A^T X A, so B^T X A is a scalar as well, and we can add the two scalars.
@coolankush100 4 years ago
Yes, we can certainly add them because they have the same dimension, i.e. 1x1, a scalar. It is also worth mentioning that they are indeed equal. You might want to take some matrices and vectors to try to convince yourself.
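To spell out why the two cross terms are not only addable but equal: a 1×1 matrix equals its own transpose, and the inverse covariance is symmetric. A sketch in the thread's notation, with X standing for the symmetric inverse-covariance matrix:

```latex
\[
B^{\top} X A
  = \left(B^{\top} X A\right)^{\top}
  = A^{\top} X^{\top} B
  = A^{\top} X B
\qquad \text{since } X = X^{\top} \text{ and } B^{\top} X A \text{ is } 1 \times 1
\]
```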
@jerrycheung8158 2 years ago
Great video! But I wonder why we can't use the same operation on the second term, (1/2)(x − mean2)^T × (inverse of covariance matrix 2) × ..., to get K, the same as for the first term?
@compilations6358 4 years ago
Great lectures, but I think you should have shown the derivation of the loss through MLE.
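For reference, the MLE route mentioned here is usually sketched as follows (standard VAE notation assumed, with encoder q_φ(z|x), decoder p_θ(x|z), and prior p(z)); maximizing the log-likelihood directly is intractable, so one maximizes the ELBO instead:

```latex
\[
\log p_{\theta}(x)
  = \underbrace{\mathbb{E}_{q_{\phi}(z|x)}\left[\log p_{\theta}(x|z)\right]
    - D_{\mathrm{KL}}\left(q_{\phi}(z|x)\,\|\,p(z)\right)}_{\text{ELBO}}
  + D_{\mathrm{KL}}\left(q_{\phi}(z|x)\,\|\,p_{\theta}(z|x)\right)
  \;\ge\; \text{ELBO}
\]
```

The KL term inside the ELBO is exactly the Gaussian-versus-Gaussian KL derived in this lecture when both q_φ(z|x) and p(z) are Gaussian.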
@apocalypt0723 4 years ago
awesome
@not_a_human_being 4 years ago
great content, but sometimes we can hear your mic scratching against something, 6:48 . Not a big deal!
@satyamdubey4110 7 months ago
💖💖
@codewithyouml8994 2 years ago
I have a question: when you wrote the first part at 8:37, we got a clean answer, K (via the trace), but I don't understand the need for replacing mu_2 with mu_1 in the second part at 12:30. Couldn't we do the same as in the first part, get some other constant K2, and take their difference? What is the need for that? Thank you for the video.
@kwippo 4 years ago
14:31 - the constant expression should be multiplied by 0.5. Doesn't really matter because it was denoted by beta.
@husamalsayed8036 3 years ago
What is the intuition behind the KL divergence of two distributions not being symmetric? To me it seems like it should be symmetric.
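A small numerical example (my own, not from the video) showing why the two directions differ: take Bernoulli distributions p and q with success probabilities 0.5 and 0.9. Because the average is weighted by the first argument, the two directions penalize the disagreement differently:

```latex
\[
D_{\mathrm{KL}}(p\,\|\,q) = 0.5\ln\tfrac{0.5}{0.9} + 0.5\ln\tfrac{0.5}{0.1} \approx 0.51,
\qquad
D_{\mathrm{KL}}(q\,\|\,p) = 0.9\ln\tfrac{0.9}{0.5} + 0.1\ln\tfrac{0.1}{0.5} \approx 0.37
\]
```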
@choungyoungjae8271 5 years ago
so cool!
@songsbyharsha 4 years ago
In the probability refresher it is said that the sum of x·p(x) is called the expectation, but at timestamp 9:08 a sum weighted by p(x) is also treated as an expectation. Please help :)
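On the 9:08 point: both are the same definition. The expectation of any function f under p is the p-weighted sum of f(x); in the KL case that function happens to be the log ratio. A sketch for a discrete variable:

```latex
\[
\mathbb{E}_{p}\left[f(X)\right] = \sum_{x} f(x)\,p(x),
\qquad
D_{\mathrm{KL}}(p\,\|\,q)
  = \sum_{x} p(x)\,\log\frac{p(x)}{q(x)}
  = \mathbb{E}_{p}\left[\log\frac{p(X)}{q(X)}\right]
\]
```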
@zachwolpe9665 5 years ago
Minute 11: what about the 1/2?
@arielbernal 5 years ago
I think the result is K/2, but I might be missing something else
@darkmythos4457 5 years ago
Yeah, we see in the results a factor of 1/2 appears again.
@abubakarali6399 3 years ago
Where can we learn this algebra and probability?
@naklecha 4 years ago
This is the error function for the auto-encoder right?
@jakevikoren 4 years ago
KL divergence is part of the loss function. The other part is the reconstruction loss.
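A minimal PyTorch-style sketch of that combination (my own illustration, not the implementation linked in the description); it assumes an encoder that outputs `mu` and `log_var` for a diagonal Gaussian q(z|x) and uses the closed-form KL against a standard normal prior:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, log_var):
    """Reconstruction term plus KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder."""
    # Reconstruction loss: binary cross-entropy summed over pixels and batch
    # (assumes x and x_recon are in [0, 1]).
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL between N(mu, diag(exp(log_var))) and N(0, I):
    # 0.5 * sum(exp(log_var) + mu^2 - 1 - log_var)
    kl = 0.5 * torch.sum(log_var.exp() + mu.pow(2) - 1.0 - log_var)
    return recon + kl
```

The "- 1" summed over the latent dimensions is where the k from this lecture's formula shows up.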
@Darkev77 2 years ago
12:12 shouldn't the simplified result for the other expression (containing mu2 and sigma2) also be "k" by symmetry?
@rajinish0 2 years ago
I thought the same, but mu2 is q(x)'s mean and the expectation is with respect to p(x); so when you do E[(x-u2)(x-u2)^T] it won't simplify to sigma2.
@Darkev77 2 years ago
@@rajinish0 I love you, thanks a lot!
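To make rajinish0's point concrete, here is the expansion of that second quadratic term, with the expectation still taken under p = N(μ₁, Σ₁); the cross terms vanish and the result involves both covariances and the mean difference rather than collapsing to k:

```latex
\[
\mathbb{E}_{p}\left[(x-\mu_2)^{\top}\Sigma_2^{-1}(x-\mu_2)\right]
  = \operatorname{tr}\left(\Sigma_2^{-1}\Sigma_1\right)
  + (\mu_1-\mu_2)^{\top}\Sigma_2^{-1}(\mu_1-\mu_2)
\]
```

It only reduces to tr(Σ₁⁻¹Σ₁) = k when the mean and covariance inside the quadratic form match the distribution being averaged over, as in the first term.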
@mukundsrinivas8426 3 years ago
Bro. What did u study?
@chaitanyakalagara3045 4 years ago
Great explanation, but too many ads.
@aniketsinha101 4 years ago
ad ad ad
@decrepitCrescendo 5 years ago
Thanks Ahlad
Kullback-Leibler divergence (KL divergence) intuitions
11:23
CabbageCat
2.8K views
KL Divergence - How to tell how different two distributions are
13:48
Serrano.Academy
6K views
Intuitively Understanding the KL Divergence
5:13
Adian Liusie
85K views
A Short Introduction to Entropy, Cross-Entropy and KL-Divergence
10:41
Aurélien Géron
350K views
KL Divergence - CLEARLY EXPLAINED!
11:35
Kapil Sachdeva
28K views
Variational Autoencoder from scratch in PyTorch
39:34
Aladdin Persson
31K views
Explaining the Kullback-Liebler divergence through secret codes
10:08