MADE: Masked Autoencoder for Distribution Estimation

5,410 views

Kapil Sachdeva

Comments: 40
@andrzejreinke 6 months ago
this explanation was amazing! thanks! sub!
@KapilSachdeva 5 months ago
🙏
@vitaliy6479 1 year ago
I like that your neural networks go from left to right
@KapilSachdeva 1 year ago
🙂
@jiahao2709 5 months ago
How did you plot this? Which software are you using?
@nastaranmarzban1419 3 years ago
Hi, hope you're doing well. Thanks for your good explanations; I've learned a lot and I appreciate it. I have a question: I can't understand the difference between an RNN and an autoregressive model in deep learning. Would you please help me?
@KapilSachdeva 3 years ago
Thanks, Nastaran, for the kind words. Hope you are doing well too. Indeed, appreciating the differences between an RNN and an autoregressive model can be confusing and tricky. Here is an attempt to distill the differences:

a) In an RNN (and its variants) you typically capture and "store" information from previous time steps. This (temporary and limited) storage is called a hidden state; we also say that history is being captured in this hidden state. However, there is a limited amount of history you can capture or remember. To make predictions or to generate a new sequence, you rely on this hidden state. Training also uses a small modification of the traditional backpropagation algorithm. An RNN also inherits the property of non-linearity, as it is nothing but a variant of a feed-forward neural network. An RNN "learns" the parameters of your model using backpropagation and gradient descent, i.e. you do not define the relationship between previous observations and the current observation explicitly.

b) (Traditional) autoregressive models are popular in the statistics and time-series domains. There you define the model explicitly: you state which previous inputs you depend on and the associated parameters, very much like linear regression, but in this case the independent variables are the previous observations, hence the word "autoregressive". These are mostly linear models (although there are variants that incorporate non-linearity; I do not know much about them at the moment).

c) (Deep) autoregressive models keep the spirit of relying on previous observations, but you typically use neural networks to build them. Since they do not have a hidden state like an RNN, they get their own category. TCNs (Temporal Convolutional Networks), WaveNet, etc. are examples of such models. The benefits are, of course, the notion of time steps/sequence and the non-linearity, and also the relief of not having to define the model explicitly.

Here is a link to a great article that may be helpful to you: bair.berkeley.edu/blog/2018/08/06/recurrent/
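To make the contrast concrete, here is a tiny, purely illustrative Python sketch (the coefficients and weights are made up and untrained, not from any trained model): the AR(2) model states its dependence on the previous two observations explicitly, while the RNN folds the whole history into a hidden state that it updates step by step.

```python
import numpy as np

# (b) An explicit AR(2) model: the next value is a hand-specified
#     (or separately estimated) linear function of the last two observations.
def ar2_predict(x_prev, x_prev2, phi1=0.6, phi2=0.3, c=0.1):
    return c + phi1 * x_prev + phi2 * x_prev2

# (a) An RNN-style update: history is compressed into a hidden state h,
#     and the parameters (Wx, Wh, Wo) would normally be learned by backpropagation.
def rnn_step(x_t, h, Wx, Wh, Wo):
    h_new = np.tanh(Wx * x_t + Wh * h)   # update the hidden state (the "memory")
    y = Wo * h_new                       # predict from the hidden state
    return y, h_new

xs = [0.5, 0.8, 0.2, 0.9]                # a toy 1-D sequence

# AR(2): the dependence on the two previous observations is written out explicitly
print(ar2_predict(xs[-1], xs[-2]))

# RNN: the whole history is carried implicitly in the hidden state h
h = 0.0
Wx, Wh, Wo = 0.7, 0.4, 1.2               # illustrative, untrained weights
for x in xs:
    y, h = rnn_step(x, h, Wx, Wh, Wo)
print(y)
```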
@nastaranmarzban1419 3 years ago
@@KapilSachdeva Thanks for taking the time, and sorry for my delay. I was just thinking about the things that you've written. I reckon I comprehend it much better than before. Thanks a lot.
@KapilSachdeva 3 years ago
🙏
@shankar2chari 3 years ago
This is one of the best explanations of the density estimation problem. At last I understood the masking logic; it was a nightmare for a while. On the lighter side, I too don't like my networks bottom-up or in any other orientation; they have to be left-to-right. Had I been born Jewish or from the Middle East, I would have preferred right-to-left. If Japanese, probably a top-down orientation... I do not see any logic for bottom-up at all.
@KapilSachdeva 3 years ago
😀😀
@bowenzhang4471 3 years ago
Thank you for the excellent explanation.
@kanalarchis 1 year ago
Fantastic presentation, thank you for doing this for the world.
@KapilSachdeva 1 year ago
🙏
@sparkletwilight4524 2 years ago
Is it true that to generate the 100th pixel we need to first generate the 1st-99th pixels? So if we want to generate a 300-pixel image, we need to go through the forward pass 300 times?
@prithviprakash1110 3 years ago
Thanks for the excellent explanation. Looking forward to learning more from you!
@KapilSachdeva 3 years ago
🙏
@smjain11 1 year ago
Can I use this method to estimate the density of time-series data and use it to calculate the likelihood of new data?
@KapilSachdeva 1 year ago
Yes. But search for more modern variational autoencoders designed for time series. A quick search will reveal many papers in this area.
@smjain11 1 year ago
Wondering if I should use score-based models instead, as, per their claim, they give the exact likelihood.
@sreeharis2846 2 years ago
Well explained. Can you explain the NADE paper?
@KapilSachdeva 2 years ago
🙏 will try!
@rameshnagineni84 3 years ago
Awesome presentation... You are inspiring us with your work... You have explained complex mechanics in a simplified way 😊
@easton1137 2 years ago
Thanks for the perfect presentation. But I have been confused about Figure 2 (impact of the number of masks used with a single hidden...) for a while. I can't figure out how to use so many masks at once. I only know the 1-mask method, but I don't understand multiple masks, such as 2, 4, 8, or 16 masks.
@KapilSachdeva 2 years ago
Hello Easton, I am not 100% sure I understand the question, but assuming you are talking about how one "implements" such masking logic, here is a link to some source code you can look at to get a better understanding: github.com/e-hulten/made/blob/master/models/made.py Also, see my reply to a previous question (by tm) in the comment section; maybe your question is similar. If not, ask again and I will try to clarify.
@easton1137 2 years ago
@@KapilSachdeva Thanks for the reply. Sorry, I didn't convey my problem clearly. I want to ask how to implement 2 masks with a single hidden layer during training.
@KapilSachdeva 2 years ago
Apologies, but I am still guessing at your question. Here is an answer based on my best guess: you may be asking how one can use different masks with the same neural network during training. The workflow is:

1) Generate a set of masks before training starts. To keep it simple, let's say you are only interested in 2 masks.
2) Select mask 1 for minibatch 1.
3) Do the forward and backward passes.
4) Select mask 2 for minibatch 2.
5) Do the forward and backward passes.
6) Go to step 2 and repeat.
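As a concrete illustration of that workflow, here is a minimal, self-contained PyTorch sketch. The mask construction, the tiny two-layer model, and the toy binary data are all illustrative (they are not taken from the paper or from the linked repository); what matters is where the mask set is swapped, once per minibatch.

```python
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
D, H = 4, 8                                                  # toy input dimension and hidden width

def make_masks(seed):
    """Build one (input->hidden, hidden->output) mask pair for the natural ordering."""
    g = torch.Generator().manual_seed(seed)
    deg_in = torch.arange(1, D + 1)                          # input degrees 1..D
    deg_h = torch.randint(1, D, (H,), generator=g)           # hidden degrees in [1, D-1]
    m1 = (deg_h[:, None] >= deg_in[None, :]).float()         # input -> hidden mask
    m2 = (deg_in[:, None] > deg_h[None, :]).float()          # hidden -> output mask
    return m1, m2

# Fully connected layers; the masks "logically" cut connections in the forward pass
fc1, fc2 = nn.Linear(D, H), nn.Linear(H, D)

def forward(x, masks):
    m1, m2 = masks
    h = torch.relu(F.linear(x, fc1.weight * m1, fc1.bias))
    return F.linear(h, fc2.weight * m2, fc2.bias)            # per-pixel Bernoulli logits

# Step 1: generate the mask sets once, before training starts (here, 2 of them)
mask_cycle = itertools.cycle([make_masks(s) for s in range(2)])

opt = torch.optim.Adam(list(fc1.parameters()) + list(fc2.parameters()), lr=1e-2)
data = torch.bernoulli(torch.full((64, D), 0.5))             # toy binary "images"

for step in range(100):                                      # the same toy batch is reused for brevity
    masks = next(mask_cycle)                                 # steps 2 & 4: next mask set per minibatch
    logits = forward(data, masks)                            # steps 3 & 5: forward pass ...
    loss = F.binary_cross_entropy_with_logits(logits, data)  # negative log-likelihood per pixel
    opt.zero_grad()
    loss.backward()                                          # ... and backward pass
    opt.step()                                               # step 6: repeat
```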
@easton1137 2 years ago
@@KapilSachdeva Let me think about it. First, going back to 1 mask with a single hidden layer: in this condition we don't sample an ordering before each mini-batch gradient update, right? Second, in the 2-mask condition with a single hidden layer, we do sample an ordering before each mini-batch gradient update, and these orderings must satisfy the rules of masks 1 and 2. Is that right?
@KapilSachdeva 2 years ago
Correct. Looking at the source code will help you a lot: github.com/e-hulten/made/blob/master/models/made.py
@jianbochen1142 3 years ago
Superb review and presentation!
@tm-jw2sq 2 years ago
Hi, I'm stuck on the concept of arbitrary ordering, since I think we already fix the order of x_i in our input vector. When will the order be rearranged, like in the house example you mentioned? Thank you so much.
@tm-jw2sq 2 years ago
I re-watched your video and found out that it samples an ordering per batch, as it was suggested that training an autoregressive model on all orderings can be beneficial. Now my question is: how is the model implemented such that the connections between neurons change every time the ordering is different? I am thinking there might be a gate to control whether a connection is on or not.
@KapilSachdeva 2 years ago
>> sample an ordering per batch

Correct.

>> it was suggested that training an autoregressive model on all orderings can be beneficial.

Not necessarily all orderings. Training on all orderings can "underfit" the model; think of it as putting too much pressure on the network to learn. This is why they ended up selecting some fixed orderings. For example, if there are 100 possible orderings, you might pick 18 of them and be content with that. How you select these orderings is a different matter... hyperparameter search, etc., could perhaps help.

>> how the model is implemented such that connections between neurons change based on the algorithm every time when the ordering is different

This is where "logical masking" comes in and helps you. "Logical" means that the layers (of neurons) remain fully connected [i.e. when you define your model, you still use fully connected layers]; however, you use the "masks" to turn the outputs of the neurons on/off. I talk about it in the later parts of the tutorial where I explain the masking logic. I have not carefully reviewed the code linked below, but I think it should give you some hints as to how one would do the masking: github.com/e-hulten/made/blob/master/models/made.py

Hope this helps.
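To make the "logical masking" idea concrete, here is a small illustrative PyTorch layer (the name MaskedLinear and the set_mask method are my own naming, not necessarily what the linked repository uses): the weight matrix stays fully connected, and a 0/1 mask simply zeroes out the forbidden connections in the forward pass, so changing the ordering only means swapping the mask, never rebuilding the network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """A fully connected layer whose connections can be 'logically' switched off."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        # Start with an all-ones mask, i.e. an ordinary fully connected layer
        self.register_buffer("mask", torch.ones(out_features, in_features))

    def set_mask(self, mask):
        # mask has shape (out_features, in_features); 1 keeps a connection, 0 cuts it
        self.mask.copy_(torch.as_tensor(mask, dtype=self.mask.dtype))

    def forward(self, x):
        # The weights are never removed; they are only silenced by the mask
        return F.linear(x, self.weight * self.mask, self.bias)

# Usage: swap the mask whenever the ordering (and hence the connectivity) changes
layer = MaskedLinear(4, 8)
layer.set_mask(torch.randint(0, 2, (8, 4)))   # an arbitrary 0/1 mask, just for illustration
out = layer(torch.randn(2, 4))
```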
@tm-jw2sq 2 years ago
@@KapilSachdeva Really appreciate it. It's clear!!
@KapilSachdeva 2 years ago
🙏
@aruntakhur 3 years ago
Well presented and explained!!!
@fhools 1 year ago
You have a gift for explaining concepts that comes from a higher power. These have been the clearest explanations of deep learning. I hope you continue making videos of all kinds.
@KapilSachdeva 1 year ago
🙏 Thanks for the appreciation. I'm glad you found them helpful.