Shouldn't gamma approximate the true variance of the neuron activation, and beta approximate the true mean of the neuron activation? I am just confused...
@CodeEmporium4 жыл бұрын
You're right. Misspoke there. Nice catch!
@ssshukla264 жыл бұрын
@@CodeEmporium Cool
@dhananjaysonawane19964 жыл бұрын
How is this approximation happening? And how do we use beta, gamma at test time? We have only one example at a time during testing.
@FMAdestroyer2 жыл бұрын
@@dhananjaysonawane1996 in most frameworks, when you create a BN layer, the scale and shift (gamma and beta) are learnable parameters, usually represented as the layer's weight and bias. You can deduce that from the Torch BatchNorm2d layer's description below: "The mean and standard-deviation are calculated per-dimension over the mini-batches and γ and β are learnable parameter vectors of size C (where C is the input size)."
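To make that concrete, here is a minimal PyTorch sketch (tensor sizes are made up for illustration). It shows that γ and β live on the layer as weight and bias, while the running mean and variance are tracked separately during training and are what get used in eval mode, which is why a single example works at test time:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=8)

# gamma and beta: learnable scale and shift, exposed as weight and bias
print(bn.weight.shape, bn.bias.shape)               # torch.Size([8]) torch.Size([8])

# running statistics: tracked during training, not learned by gradient descent
print(bn.running_mean.shape, bn.running_var.shape)  # torch.Size([8]) torch.Size([8])

bn.train()
_ = bn(torch.randn(16, 8, 4, 4))  # uses this mini-batch's stats, updates the running stats

bn.eval()
_ = bn(torch.randn(1, 8, 4, 4))   # uses the running stats, so a single test example is fine
```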
@AndyLee-xq8wq2 жыл бұрын
Thanks for clarification!
@sumanthbalaji17684 жыл бұрын
Just found your channel and binged through all your videos, so here's a general review. As a student, I assure you your content is on point and goes in depth, unlike other channels that just skim the surface. Keep it up and don't be afraid to go more in depth on concepts. We love it. Keep it up brother, you have earned a supporter till your channel's end.
@CodeEmporium4 жыл бұрын
Thanks ma guy. I'll keep pushing up content. Good to know my audience loves the details ;)
@sumanthbalaji17684 жыл бұрын
@@CodeEmporium damn did not actually expect you to reply lol. Maybe let me throw a topic suggestion then. More NLP please, take a look at summarisation tasks as a topic. Would be damn interesting.
@efaustmann4 жыл бұрын
Exactly what I was looking for. Very well researched and explained in a simple way with visualizations. Thank you very much!
@EB31033 жыл бұрын
The loss is not a function of the features but a function of the weights
@parthshastri24514 жыл бұрын
Why did you plot the cost against height and age? Isn't it supposed to be a function of the weights in a neural network?
@ryanchen61472 жыл бұрын
at 3:27, I think your axes should be the *weight* for the height feature and the *weight* for the age feature if that is a contour plot of the cost function
@mohameddjilani41092 жыл бұрын
Yes, that was an error that persisted over a long stretch of the video.
@oheldad4 жыл бұрын
Hey there. I'm on my way to becoming a data scientist, and your videos help me a lot! Keep going, I'm sure I am not the only one you've inspired :) thank you!!
@CodeEmporium4 жыл бұрын
Awesome! Glad these videos help! Good luck with your Data science ventures :)
@ccuuttww4 жыл бұрын
Your aim should not be to become a data scientist to fit other people's expectations. You should become a person who can deal with data and estimate any unknown parameter to your own standard.
@oheldad4 жыл бұрын
@@ccuuttww I don't know why you decided that I'm fulfilling others' expectations of me - it's not true. I'm in the last semester of my electrical engineering degree, and decided to change path a little :)
@ccuuttww4 жыл бұрын
Because most people think in the following pattern: finish every exam, graduate with good marks, mass-send CVs, and try to get a job titled "Data Scientist", then try to fit what they learned at university to the job like a trained monkey. But you are not dealing with a real-world situation; you are just trying to deal with your customer or your boss. Since this topic never has a standard answer, you can only define it yourself, and your client only trusts your title. I feel this is really bad.
@taghyeertaghyeer5974 Жыл бұрын
Hello, thank you for your video. I am wondering about batch normalisation speeding up training: you showed at 2:42 the contour plot of the loss as a function of height and age. However, the loss function contours should be plotted against the weights (the optimization is performed in weight space, not input space). In other words, why did you base your argument on a loss function with height and age as the variables (they should be held constant during optimization)? Thank you! Lana
@marcinstrzesak346 Жыл бұрын
For me, it also seemed quite confusing. I'm glad someone else noticed it too.
@atuldivekar Жыл бұрын
The contour plot is being shown as a function of height and age to show the dependence of the loss on the input distribution, not the weights
@seyyedpooyahekmatiathar6244 жыл бұрын
Subtracting the mean and dividing by std is standardization. Normalization is when you change the range of the dataset to be [0,1].
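A tiny NumPy sketch of that distinction (the values are arbitrary, just for illustration):

```python
import numpy as np

x = np.array([4.0, 5.0, 7.0])

# Standardization: subtract the mean, divide by the standard deviation
standardized = (x - x.mean()) / x.std()          # result has mean ~0, std ~1

# Min-max normalization: rescale the values into the range [0, 1]
normalized = (x - x.min()) / (x.max() - x.min())
```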
@angusbarr79524 жыл бұрын
Hey! Just cited you in my undergrad project because your example finally made me understand batch norm. Thanks a lot!
@CodeEmporium4 жыл бұрын
Sweet! Glad it was helpful homie
@aaronk8394 жыл бұрын
Good explanation until 7:17 after which, I think, you miss the point which makes the whole thing very confusing. You say: "Gamma should approximate to the true mean of the neuron activation and beta should approximate to the true variance of the neuron activation." Apart from the fact that this should be the other way around, as you acknowledge in the comments, you don't say what you mean by "true mean" and "true variance". I learned from Andrew Ng's video (kzbin.info/www/bejne/qn-soXiQgduSm8k) that the actual reason for introducing two learnable parameters is that you actually don't necessarily want all batch data to be normalized to mean 0 and variance 1. Instead, shifting and scaling all normalized data at one neuron to obtain a different mean (beta) and variance (gamma) might be advantageous in order to exploit the non-linearity of your activation functions. Please don't skip over important parts like this one with sloppy explanations in future videos. This gives people the impression that they understand what's going on, when they actually don't.
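For reference, the transform from the batch norm paper (Ioffe & Szegedy, 2015) is below. As the paper notes, setting γ to the batch standard deviation and β to the batch mean lets the layer recover the original activation, so the network is free to learn whatever mean and variance of the activation actually helps:

```latex
\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^2 + \epsilon}},
\qquad
y_i = \gamma \hat{x}_i + \beta,
\qquad
\gamma = \sqrt{\sigma_{\mathcal{B}}^2 + \epsilon},\ \beta = \mu_{\mathcal{B}}
\;\Rightarrow\; y_i = x_i
```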
@dragonman1013 жыл бұрын
Thank you very much for this explanation. The link and the correction are very helpful and do provide some clarity to a question I had. That being said, I don't think it's fair to call his explanation sloppy. He broke down complicated material in a fantastic and clear way for the most part. He even linked to research so we could do further reading, which is great because now I have a solid foundation to understand what I read in the papers. He should be encouraged to fix his few mistakes rather than slapped on the wrist.
@sachinkun213 жыл бұрын
Thanks a ton!! I was actually looking for this comment, as I had the same question as to why we even need to approximate!
@ahmedshehata95223 жыл бұрын
You are really, really good, because you reference the paper and introduce the idea.
@ultrasgreen13492 жыл бұрын
That's actually a very, very good and intuitive video. Honestly, thank you.
@Slisus2 жыл бұрын
Awesome video. I really like, how you go into the actual papers behind it.
@CodeEmporium2 жыл бұрын
Glad you liked this!
@yeripark11353 жыл бұрын
I clearly understand the need for batch normalization and its advantages! Thanks!!
@maxb55604 жыл бұрын
Love your videos. They help me a lot in understanding machine learning more and more.
@balthiertsk85963 жыл бұрын
Hey man, thank you. I really appreciate this quality content!
@jo-of-joey4 жыл бұрын
This is good! I think that giving an example as well as the use cases (advantages) before diving into the details always gets the job done.
@erich_l46444 жыл бұрын
This was so well put together - why fewer than 10k views? Oh... it's batch normalization.
@MaralSheikhzadeh2 жыл бұрын
Thanks, this video helped me understand BN better. And I liked your sense of humor; it made watching more fun. :)
@thoughte24323 жыл бұрын
I found this a really good and intuitive explanation, thanks for that. But there was one thing that confused me: isn't the effect of batch normalization the smoothing of the loss function? I found it difficult to relate the loss function directly to the graph shown at 2:50.
@Paivren Жыл бұрын
yes, the graph is a bit weird in the sense that the loss function is not a function of the features but of the model parameters.
@rishidixit793918 күн бұрын
Why does the model need fewer samples to train after normalizing the data?
@МаксимМакаров-р7ь4 ай бұрын
Clear explanation, thank you!
@dragonman1013 жыл бұрын
Quick note: at 6:50 there should be brackets after the 1/3 (see below).
Yours: 1/3 (4 - 5.33)^2 + (5 - 5.33)^2 + (7 - 5.33)^2
Should be: 1/3 [(4 - 5.33)^2 + (5 - 5.33)^2 + (7 - 5.33)^2]
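A quick worked check with the example values:

```latex
\mu = \frac{4 + 5 + 7}{3} \approx 5.33, \qquad
\sigma^2 = \frac{1}{3}\Big[(4 - 5.33)^2 + (5 - 5.33)^2 + (7 - 5.33)^2\Big]
\approx \frac{1.77 + 0.11 + 2.79}{3} \approx 1.56
```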
@pranavjangir83384 жыл бұрын
Isn't Batch Normalization also used to counter the exploding gradient problem? Would have loved some explanation of that too.
@superghettoindian01 Жыл бұрын
I see you are checking all these comments - so I will try to comment on all the videos I watch going forward and how I'm using them. Currently using this video as a supplement to Andrej Karpathy's makemore series pt 3. The other video has a more detailed implementation of batch normalization, but you do a great job of summarizing the key concepts. I hope one day you and Andrej can create a video together 😊.
@CodeEmporium Жыл бұрын
Thanks a ton for the comment. Honestly, any critical feedback is appreciated. So thank you. It would certainly be a privilege to collaborate with Andrej for sure. Maybe in the future :)
@JapiSandhu2 жыл бұрын
can I add a Batch Normalization layer after an LSTM layer in pytorch?
@danieldeychakiwsky19284 жыл бұрын
Thanks for the video. I wanted to add that there's debate in the community over whether to normalize pre vs. post non-linearity within the layers, i.e., for a given neuron in some layer, do you normalize the result of the linear function that gets piped through non-linearity or do you pipe the linear combination through non-linearity and then apply normalization, in both cases, over the mini-batch.
@kennethleung44873 жыл бұрын
Here's what I found from MachineLearningMastery:
- Batch normalization may be used on the inputs to a layer, either before or after the activation function of the previous layer
- It may be more appropriate after the activation function for S-shaped functions like the hyperbolic tangent and logistic function
- It may be appropriate before the activation function for activations that may result in non-Gaussian distributions, like the rectified linear activation, the modern default for most network types
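A minimal PyTorch sketch of the two placements being debated above (layer sizes and activations are arbitrary):

```python
import torch.nn as nn

# Normalize the pre-activation: Linear -> BN -> activation
pre_activation_bn = nn.Sequential(
    nn.Linear(64, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)

# Normalize the activation itself: Linear -> activation -> BN
post_activation_bn = nn.Sequential(
    nn.Linear(64, 32),
    nn.Tanh(),
    nn.BatchNorm1d(32),
)
```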
@mohammadkaramisheykhlan93 жыл бұрын
How can we use batch normalization in the test set?
@mizzonimirko Жыл бұрын
I do not understand properly how this is going to be implemented. At the end of an epoch we actually perform those operations, right? At that point, the layer where I have applied it is normalized, right?
@akremgomri90858 ай бұрын
Very good explanation. However, there is something I didn't understand. Doesn't batch normalisation modify the input data so that the mean is 0 and variance is 1, as explained in the beginning?? So how the heck did we move from normalisation being applied to the inputs to normalisation affecting the activation function? 😅😅
@Inzurrekto14 ай бұрын
Thank you for this video. Very well explained.
@user-nx8ux5ls7q3 жыл бұрын
Also, can someone say how to make gamma and beta learnable? Gamma can be thought of as an additional weight attached to the activation, but what about beta? How do you train that?
@SillyMakesVids4 жыл бұрын
Sorry, but where did gamma and beta come from, and how are they used?
@chandnimaria9748 Жыл бұрын
Just what I was looking for, thanks.
@SaifMohamed-de8uo7 ай бұрын
Great explanation thank you!
@hervebenganga85612 жыл бұрын
This is beautiful. Thank you
@nobelyhacker3 жыл бұрын
Nice video, but I guess there is a little error at 6:57? You have to multiply the whole sum by 1/3, not only the first term.
@lamnguyentrong2754 жыл бұрын
Wow, easy to understand, and a clear accent. Thank you, sir. You did a great job.
@priyankakaswan75283 жыл бұрын
The real magic starts at 6:07; this video was exactly what I needed.
@nyri03 жыл бұрын
Your visualizations are misleading. Normalization doesn't turn the shape on the left into the circle seen on the right. It will be less elongated but still keep a diagonal ellipse shape.
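A quick NumPy check of that point (the feature values are made up): per-feature standardization rescales each axis but does not remove the correlation between features, so the contours stay elliptical rather than becoming circles.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.normal(40, 10, 1000)
height = 100 + 1.5 * age + rng.normal(0, 5, 1000)   # correlated with age
X = np.stack([age, height], axis=1)

# Per-feature standardization (what batch norm does per dimension)
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# The correlation between the two features is unchanged
print(np.corrcoef(X, rowvar=False)[0, 1])   # ~0.95
print(np.corrcoef(Xn, rowvar=False)[0, 1])  # ~0.95, still elongated along a diagonal
```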
@sriharihumbarwadi59814 жыл бұрын
Can you please make a video on how batch normalization and l1/l2 regularization interact with each other ?
@SunnySingh-tp6nt8 ай бұрын
can I get these slides?
@user-nx8ux5ls7q3 жыл бұрын
Do we calculate the mean and SD across a mini-batch for a given neuron, or across all the neurons in a layer? Andrew Ng says it's across each layer. Thanks.
@ayandogra29524 жыл бұрын
Amazing work really liked it
@themightyquinn1002 жыл бұрын
Wasn't there an episode where Peter was playing against Larry Bird?
@enveraaa84143 жыл бұрын
Bro you have made the perfect video
@pupfer2 жыл бұрын
The only difficult part of batch norm, namely the backprop, isn't explained.
@iliasaarab79223 жыл бұрын
Great explanation, thanks!
@ccuuttww4 жыл бұрын
I wonder, is it suitable to use the population estimator? I think nowadays most machine learning learners/students/fans spend very little time on statistics. After several years of study, I find that model selection and statistical theory are the most important parts, especially Bayesian learning, the most underrated topic today.
@99dynasty2 жыл бұрын
BatchNorm reparametrizes the underlying optimization problem to make it more stable (in the sense of loss Lipschitzness) and smooth (in the sense of “effective” β-smoothness of the loss). Not my words
@adosar7261 Жыл бұрын
And why not just normalize the whole training set instead of using batch normalization?
@CodeEmporium Жыл бұрын
Batch normalization will normalize through different steps of the network. If we want to “normalize the whole training set”, we need to pass all training examples at once to the network as a single batch. This is what we see in “batch gradient descent”, but isn’t super common for large datasets because of memory constraints.
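A rough sketch of that difference (assuming PyTorch; shapes are illustrative): normalizing the training set is a one-off preprocessing step on the inputs, while batch norm layers keep re-normalizing the intermediate activations of each mini-batch deep inside the network.

```python
import torch
import torch.nn as nn

X = torch.randn(10_000, 20)

# One-off preprocessing: standardize inputs with whole-training-set statistics
X = (X - X.mean(dim=0)) / X.std(dim=0)

# Batch norm: normalization happens inside the network, per mini-batch, at every BN layer
model = nn.Sequential(
    nn.Linear(20, 64), nn.BatchNorm1d(64), nn.ReLU(),
    nn.Linear(64, 64), nn.BatchNorm1d(64), nn.ReLU(),
    nn.Linear(64, 1),
)
out = model(X[:32])  # each BN layer normalizes this mini-batch's activations
```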
@sanjaykrish87194 жыл бұрын
Fantastic explanation using contour plots.
@CodeEmporium4 жыл бұрын
Thanks! Contour plots are the best!
@חייםזיסמן-ש8ב4 жыл бұрын
Awesome explanation.
@rockzzstartzz23394 жыл бұрын
Why to use beta and gamma?
@sevfx2 жыл бұрын
Great explanation, but missing parentheses at 6:52 :p
@luisfraga32814 жыл бұрын
Hello, I wonder: what if we don't normalize the image input data (RGB 0-255) and then we use batch normalization? Is it going to work smoothly, or is it going to mess up the learning?
@SetoAjiNugroho4 жыл бұрын
what about layer norm ?
@manthanladva65474 жыл бұрын
Thanks for the awesome video. Got many ideas about Batch Norm.
@abhishekp48184 жыл бұрын
@CodeEmporium, could you please tell me why we need to normalize the outputs of the activation function when they are already within a small range (for example, sigmoid ranges from 0 to 1)? And if we do normalize them, then how do we compute the updates of its parameters during backpropagation? Please answer.
@boke61844 жыл бұрын
The activation function should be modifying the predictability of the error, or the learning, too.
@Hard_Online4 жыл бұрын
The best I have seen so far
@shaz-z5064 жыл бұрын
Good video, could you please make a video on capsule networks?
@PavanTripathi-rj7bd Жыл бұрын
great explanation
@CodeEmporium Жыл бұрын
Thank you! Enjoy your stay on the channel :)
@strateeg322 жыл бұрын
Awesome thank you!
@anishjain80964 жыл бұрын
Hey brother, can you please tell me how on-the-fly data augmentation increases the image dataset? In every blog and video they say it increases the data size, but how?
@CodeEmporium4 жыл бұрын
For images, you would need to make minor distortions (rotation, crop, scale, blur) in an image such that the result is a realistic input. This way, you have more training data for your model to generalize
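For instance, here is a small torchvision sketch of on-the-fly augmentation (the transform choices and parameters are arbitrary): the dataset on disk stays the same size, but every epoch the model sees a slightly different random distortion of each image, which acts like extra training data.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224),          # random crop + rescale
    transforms.RandomRotation(degrees=10),      # small random rotation
    transforms.ColorJitter(brightness=0.2),     # mild brightness change
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# Applied inside the Dataset/DataLoader, so a new random variant is generated every epoch
```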
@God-vl5uz8 ай бұрын
Thank you!
@gyanendradas4 жыл бұрын
Can you make a video on all types of pooling layers?
@CodeEmporium4 жыл бұрын
Interesting. I'll look into this. Thanks for the idea
@aminmw52582 жыл бұрын
Thank you bro.
@samratkorupolu4 жыл бұрын
wow, you explained pretty clearly
@JapiSandhu2 жыл бұрын
this is a great video
@sealivezentrum3 жыл бұрын
fuck me, you explained way better than my prof did
@Acampandoconfrikis3 жыл бұрын
Hey 🅱eter, did you make it to the NBA?
@hemaswaroop79704 жыл бұрын
Thanks, Man!
@SAINIVEDH3 жыл бұрын
For RNNs, Batch Normalisation should be avoided; use Layer Normalisation instead.
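A minimal sketch of that suggestion (assuming PyTorch; sizes are illustrative), which also relates to the earlier question about putting normalization after an LSTM:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
norm = nn.LayerNorm(64)      # normalizes over the feature dimension, per time step

x = torch.randn(8, 50, 32)   # (batch, time, features)
out, _ = lstm(x)             # (8, 50, 64)
out = norm(out)              # independent of batch size and sequence length
```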
@rodi48504 жыл бұрын
Sorry to say, but this is a very poor video. The intro was way too long, and explaining more of the math and why BN works was left to 1-2 minutes.
@CodeEmporium4 жыл бұрын
Thanks for watching till the end. I tried going for a layered approach to the explanation - get the big picture. Then the applications. Then details. I wasn't sure how much more math was necessary. This was the main math in the paper, so I thought that was adequate. Always open to suggestions if you have any. If you've looked at my recent videos, you can tell the delivery is not consistent. Trying to see what works
@PhilbertLin4 жыл бұрын
I think the intro with the samples in the first few minutes was a little drawn out but the majority of the video spent on intuition and visuals without math was nice. Didn’t go through the paper so can’t comment on how much more math detail is needed.