Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

26,183 views

Yannic Kilcher · 1 day ago

Comments: 38
@kyuhyoungchoi · 5 years ago
I really love the way you explain. You are using very standard language which the old pre-deeplearning guys are familiar with.
@Vroomerify · 2 years ago
This was great! Your explanation of batch normalization is by far the most intuitive one I've found.
@lakerstv3021 · 6 years ago
Love your content! Some of the best explanations on the internet. Would be amazing if you could go through the Neural ODE or Taskonomy paper next.
@dvirginz4001 · 5 years ago
That's the best video on YouTube about batchnorm; thanks for going over the paper.
@deepakarora987 · 5 years ago
This is a really cool explanation, would love to hear more from you.
@KulkarniPrashant · 7 months ago
Absolutely amazing explanation!
@黃一-h6b · 4 years ago
Thank you so much for your intro. I had a hard time grasping the concepts, this helped a lot. Thank you :)
@madhavimehta6010 · 2 years ago
Thanks for putting effort into explaining it in a simpler manner.
@ross825 · 5 years ago
Why don’t you have more subscribers, so helpful!!!
@manasagarwal9485 · 4 years ago
Hi, thanks a lot for these amazing paper reviews. Can you make a video about layer normalization as well, and why it is more suited for recurrent nets than batch normalization?
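A rough way to see the difference the comment asks about, as a minimal PyTorch sketch (assuming the stock nn.BatchNorm1d and nn.LayerNorm modules, not anything shown in the video): BatchNorm normalizes each feature across the batch, while LayerNorm normalizes each sample across its own features, so it needs no batch statistics at all, which is part of why it is the usual choice for recurrent nets.

# Illustrative sketch: which axis each normalization layer works over.
import torch
import torch.nn as nn

x = torch.randn(8, 16)              # (batch, features)

bn = nn.BatchNorm1d(16)             # normalizes each feature across the batch
ln = nn.LayerNorm(16)               # normalizes each sample across its features

y_bn = bn(x)
y_ln = ln(x)

print(y_bn.mean(dim=0))             # ~0 per feature (depends on batch statistics)
print(y_ln.mean(dim=1))             # ~0 per sample (independent of the batch)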
@payalbhatia5244 · 4 years ago
@Yannic Kilcher Thanks, it has been the best explanation. You simplified the maths as well. Would request you to explain all recent papers in the same way. Thanks!
@tudor6210 · 4 years ago
Thanks for going over the paper!
@Konstantin-qk6hv · 3 years ago
Thank you for the review. I like to watch your videos instead of reading the papers.
@Laszer271 · 4 years ago
Batch Normalization doesn't reduce internal covariate shift, see: How Does Batch Normalization Help Optimization? arXiv:1805.11604
@Engidea · 4 years ago
What is the app you are using to edit and write on the paper?
@MLDawn · 3 years ago
Absolutely beautiful, man. I love how you mentioned the model.train and model.eval coding implication as well. Out of curiosity: 1) What software are you using to show the paper (not Adobe, right?) 2) What kind of drawing pad are you using? I have a Wacom, but since I cannot see what I'm doing on it, it is annoying to teach with.
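For the model.train()/model.eval() point mentioned above, a minimal PyTorch sketch (assuming the stock nn.BatchNorm1d, not code from the video): in training mode the layer normalizes with the current batch statistics and updates its running estimates; in eval mode it normalizes with the stored running estimates instead.

# Illustrative sketch of the train/eval behaviour of BatchNorm.
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(32, 4) * 3 + 5            # data with non-zero mean, non-unit variance

bn.train()                                 # uses batch statistics, updates running stats
_ = bn(x)
print(bn.running_mean, bn.running_var)     # moved toward the batch statistics

bn.eval()                                  # uses the stored running statistics
y = bn(x)                                  # no further updates to running_mean/var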
@abdulhkaimal0352 · 3 years ago
Why do we normalize the data and then multiply by gamma and add beta? I understand it produces the best distribution of the data, but can't we just multiply by gamma and add beta without even doing the normalization part?
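One way to look at this question: without the normalization step, gamma * x + beta is just a per-feature affine map whose output statistics still track whatever x does (and the next linear layer could absorb it anyway); normalizing first pins the statistics at every step, and gamma/beta then let the network move away from mean 0, variance 1 only if that helps. A small sketch under that reading (bn_forward is an illustrative name, not from the paper):

# Illustrative sketch: full BatchNorm transform vs. "gamma * x + beta" alone.
import torch

def bn_forward(x, gamma, beta, eps=1e-5):
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)   # pinned to mean 0, variance 1
    return gamma * x_hat + beta                   # learnable scale and shift on top

x = torch.randn(64, 8) * 10 + 3                   # badly scaled activations
gamma, beta = torch.ones(8), torch.zeros(8)

print(bn_forward(x, gamma, beta).std(dim=0))      # ~1 regardless of x's scale
print((gamma * x + beta).std(dim=0))              # ~10, still tracks x's scale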
@shinbi880928 · 5 years ago
I really like it, thank you! :)
@wolftribe66 · 5 years ago
Could you make a video about group normalization from FAIR?
@BlakeEdwards333 · 5 years ago
Awesome thanks!
@kynguyencao1630 · 1 month ago
thank you so much
@matthewtang1489 · 4 years ago
Wouldn't it be cool for some professors to make students derive the derivatives on the test? =)
@hoangnhatpham8076 · 3 years ago
I had to do that in my DL exams. Just the feedforward pass though, nothing this involved =)
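For anyone who does want to derive them, a sketch of the closed-form backward pass per feature, following the gradient formulas given in the paper and checked against autograd (bn_backward is an illustrative name; the collapsed dx expression is the usual simplification of the paper's per-term gradients):

# Illustrative sketch of BatchNorm's backward pass, verified against autograd.
import torch

def bn_backward(dout, x_hat, inv_std, gamma):
    # Gradients of y = gamma * x_hat + beta, with the chain rule taken
    # through the batch mean and (biased) batch variance.
    N = dout.shape[0]
    dgamma = (dout * x_hat).sum(dim=0)
    dbeta = dout.sum(dim=0)
    dx_hat = dout * gamma
    dx = (inv_std / N) * (N * dx_hat
                          - dx_hat.sum(dim=0)
                          - x_hat * (dx_hat * x_hat).sum(dim=0))
    return dx, dgamma, dbeta

# Forward pass with autograd enabled, then compare gradients.
x = torch.randn(16, 4, requires_grad=True)
gamma = torch.ones(4, requires_grad=True)
beta = torch.zeros(4, requires_grad=True)
eps = 1e-5

mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)
inv_std = 1.0 / torch.sqrt(var + eps)
x_hat = (x - mean) * inv_std
y = gamma * x_hat + beta

dout = torch.randn_like(y)
y.backward(dout)

dx, dgamma, dbeta = bn_backward(dout, x_hat.detach(), inv_std.detach(), gamma.detach())
print(torch.allclose(dx, x.grad, atol=1e-5))         # True
print(torch.allclose(dgamma, gamma.grad, atol=1e-5)) # True
print(torch.allclose(dbeta, beta.grad, atol=1e-5))   # True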
@adhiyamaanpon4168 · 4 years ago
Can someone clear up the following doubt: will the gamma and beta values be different for each input feature in a particular layer?
@YannicKilcher · 4 years ago
yes
@adhiyamaanpon4168 · 4 years ago
@@YannicKilcher Thanks a lot!!
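To make that concrete, a quick check with the standard PyTorch modules (an assumption about the framework, not something shown in the video): the affine parameters gamma (weight) and beta (bias) have one value per feature or channel.

# Illustrative check: gamma and beta are per-feature / per-channel.
import torch.nn as nn

bn1d = nn.BatchNorm1d(128)
bn2d = nn.BatchNorm2d(64)

print(bn1d.weight.shape, bn1d.bias.shape)   # torch.Size([128]) torch.Size([128])
print(bn2d.weight.shape, bn2d.bias.shape)   # torch.Size([64])  torch.Size([64])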
@AvielLivay · 4 years ago
But why do you need the gamma/beta? What's wrong with just shifting to mean 0, variance 1? And also, how do you train them? You mean they are part of the network and so they are trained, but I thought we wanted things not to be shaky, and you are actually adding parameters that add to the 'shakiness'... what's the point?
@rbhambriiit · 1 year ago
The idea is to learn the better representation: identity, normalized, or something in between. Think of it as data preprocessing.
@rbhambriiit · 1 year ago
Agreed, it does question the original hypothesis/definition of normalisation at the input layer as well.
@rbhambriiit · 1 year ago
It's not that shaky. It's another layer trying to learn better data dimensions. With images, identity layers work well, so the batchnorm learning should effectively reverse the mean/variance shift.
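The "reverse the shift" point can be checked directly: choosing gamma = sqrt(var + eps) and beta = mean turns the batch norm transform into the identity, which is exactly the degree of freedom the paper adds so the layer can represent a no-op if that is what training prefers. A minimal sketch:

# Illustrative sketch: gamma = sqrt(var + eps), beta = mean undoes the normalization.
import torch

x = torch.randn(256, 8) * 4 - 2
eps = 1e-5
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)

gamma = torch.sqrt(var + eps)
beta = mean

y = gamma * (x - mean) / torch.sqrt(var + eps) + beta
print(torch.allclose(y, x, atol=1e-5))   # True: the transform reduces to the identity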
@yamiyagami7141 · 6 years ago
Nice video! You might also want to check out "How Does Batch Normalization Help Optimization?" (arxiv.org/abs/1805.11604), presented at NeurIPS18, which casts doubt on the idea that batchnorm improves performance through reduction in internal covariate shift.
@dlisetteb · 3 years ago
I really can't understand it.
@yasseraziz1287 · 4 years ago
YOU DA MAN LONG LIVE YANNIC KILCHER
@michaelcarlon1831 · 6 years ago
All of the cool kids use SELU
@ssshukla26 · 4 years ago
If not more, then an explanation almost as complicated as the one in the paper.
@garrettosborne4364 · 4 years ago
You got lost in the weeds on this one.
@rbhambriiit · 1 year ago
The backprop was a bit unclear and perhaps the hardest bit.
@stefanogrillo6040 · 1 year ago
lol