Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

25,093 views

Yannic Kilcher

arxiv.org/abs/1502.03167
Abstract:
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.
Authors:
Sergey Ioffe, Christian Szegedy
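
To make the transform from the abstract concrete, here is a minimal NumPy sketch of the per-mini-batch normalization followed by the learned scale and shift. The function name and toy shapes are illustrative, not taken from the paper or the video:

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_features) activations of one layer for one mini-batch
    # gamma, beta: (num_features,) learned scale and shift parameters
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to roughly zero mean, unit variance
    return gamma * x_hat + beta             # scale and shift with learned parameters

# Toy usage: a mini-batch whose features are shifted and scaled
x = np.random.randn(32, 4) * 3.0 + 7.0
y = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0), y.std(axis=0))        # close to 0 and 1 per feature
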

Comments: 37
@lakerstv3021 · 5 years ago
Love your content! Some of the best explanations on the internet. Would be amazing if you could go through the Neural ODE or Taskonomy paper next.
@dvirginz4001 · 4 years ago
That's the best video on YouTube about batchnorm, thanks for going over the paper.
@deepakarora987 · 4 years ago
This is a really cool explanation, would love to hear more from you.
@Vroomerify · 2 years ago
This was great! Your explanation of batch normalization is by far the most intuitive one I've found.
@kyuhyoungchoi · 4 years ago
I really love the way you explain. You are using very standard language which the old pre-deeplearning guys are familiar with.
@KulkarniPrashant · 5 days ago
Absolutely amazing explanation!
@user-hf6mx4qu3l · 4 years ago
Thank you so much for your intro. I had a hard time grasping the concepts; this helped a lot. Thank you :)
@tudor6210 · 4 years ago
Thanks for going over the paper!
@madhavimehta6010 · 1 year ago
Thanks for putting in the effort to explain this in a simpler manner.
@ross825 · 4 years ago
Why don’t you have more subscribers, so helpful!!!
@shinbi880928 · 5 years ago
I really like it, thank you! :)
@payalbhatia5244 · 4 years ago
@Yannic Kilcher: Thanks, it has been the best explanation. You simplified the maths as well. Would request you to explain all recent papers in the same way. Thanks
@manasagarwal9485 · 4 years ago
Hi, thanks a lot for these amazing paper reviews. Can you make a video about layer normalization as well, and why is it more suited for recurrent nets than batch normalization?
@BlakeEdwards333 · 4 years ago
Awesome thanks!
@MLDawn · 2 years ago
Absolutely beautiful, man. I love how you mentioned the model.train and model.eval coding implication as well. Out of curiosity: 1) What software are you using to show the paper (not Adobe, right?) 2) What kind of drawing pad are you using? I have a Wacom, but since I cannot see what I'm doing on it, it is really annoying to teach with.
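
For context on the model.train()/model.eval() point raised above: in frameworks such as PyTorch, a batch norm layer normalizes with mini-batch statistics (and updates running estimates) in training mode, but uses the stored running statistics at inference time. A small illustrative sketch, not taken from the video:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)   # keeps running_mean / running_var buffers
x = torch.randn(32, 4) * 3 + 7

bn.train()              # training mode: normalize with mini-batch statistics and
y_train = bn(x)         # update the running estimates as a side effect

bn.eval()               # inference mode: normalize with the stored running statistics,
y_eval = bn(x)          # so an example's output no longer depends on the rest of the batch

print(bn.running_mean)  # running estimate of the per-feature mean
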
@Konstantin-qk6hv · 2 years ago
Thank you for the review. I like to watch your videos instead of reading the paper.
@Engidea · 3 years ago
What app are you using to edit and write on the paper?
@wolftribe66 · 4 years ago
Could you make a video about group normalization from FAIR?
@abdulhkaimal0352 · 3 years ago
Why do we normalize the data and then multiply by gamma and add beta? I understand it produces the best distribution of the data, but couldn't we just multiply by gamma and add beta without doing the normalization part at all?
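
One way to see why the normalization step matters, sketched numerically (the numbers below are made up for illustration): a fixed gamma and beta applied to raw activations still inherit whatever shift the earlier layers produce, whereas normalizing first pins the output statistics to beta and gamma regardless of that shift.

import numpy as np

gamma, beta = 1.5, 0.5
x_a = np.random.randn(1000) * 2 + 3    # activations under one setting of the earlier layers
x_b = np.random.randn(1000) * 5 - 4    # same layer after the earlier parameters changed

# gamma/beta alone: the output statistics still drift with the input distribution
print((gamma * x_a + beta).mean(), (gamma * x_b + beta).mean())   # very different

# normalize first: the output mean is ~beta and the std is ~gamma in both cases
bn = lambda x: gamma * (x - x.mean()) / x.std() + beta
print(bn(x_a).mean(), bn(x_b).mean())                             # both close to 0.5
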
@matthewtang1489 · 4 years ago
Wouldn't it be cool if some professors made their students derive the derivatives on the exam =)
@hoangnhatpham8076 · 3 years ago
I had to do that in my DL exams. Just the feedforward pass though, nothing this involved =)
@Laszer271 · 4 years ago
Batch Normalization doesn't reduce internal covariate shift; see: How Does Batch Normalization Help Optimization? arXiv:1805.11604
@adhiyamaanpon4168 · 3 years ago
Can someone clear up the following doubt: will the gamma and beta values be different for each input feature in a particular layer?
@YannicKilcher · 3 years ago
yes
@adhiyamaanpon4168 · 3 years ago
@@YannicKilcher Thanks a lot!!
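
To make the "yes" concrete: in a typical implementation, gamma and beta are vectors with one entry per feature of the layer (in PyTorch they appear as the batch norm layer's weight and bias). A quick check:

import torch.nn as nn

bn = nn.BatchNorm1d(num_features=8)
print(bn.weight.shape, bn.bias.shape)   # torch.Size([8]) torch.Size([8]): one gamma and one beta per feature
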
@AvielLivay · 4 years ago
But why do you need the gamma/beta? What's wrong with just shifting to mean 0, variance 1? And also, how do you train them? You mean they are part of the network and so they are trained, but I thought we want things not to be shaky, yet you are actually adding parameters that add to the 'shakiness'... what's the point?
@rbhambriiit · 1 year ago
The idea is to learn the better representation: identity, normalized, or something in between. Think of it as data preprocessing.
@rbhambriiit · 1 year ago
Agreed, it does call into question the original hypothesis/definition of normalisation at the input layer as well.
@rbhambriiit · 1 year ago
It's not that shaky. It's another layer trying to learn better data dimensions. With images, identity layers work well, so the batchnorm learning should effectively reverse the mean/variance shift.
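
A small numerical check of the identity-recovery point from the thread above (toy shapes, illustrative only): if the network sets gamma to the mini-batch standard deviation and beta to the mini-batch mean, the batch norm output reproduces the original activations, so the layer can fall back to the identity if that is what works best.

import numpy as np

eps = 1e-5
x = np.random.randn(32, 4) * 3.0 + 7.0
mu, var = x.mean(axis=0), x.var(axis=0)
x_hat = (x - mu) / np.sqrt(var + eps)

# Choosing gamma = sqrt(var + eps) and beta = mu undoes the normalization:
gamma, beta = np.sqrt(var + eps), mu
y = gamma * x_hat + beta
print(np.allclose(y, x))   # True: the transform can represent the identity
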
@yamiyagami7141 · 5 years ago
Nice video! You might also want to check out "How Does Batch Normalization Help Optimization?" (arxiv.org/abs/1805.11604), presented at NeurIPS18, which casts doubt on the idea that batchnorm improves performance through reduction in internal covariate shift.
@yasseraziz1287 · 3 years ago
YOU DA MAN LONG LIVE YANNIC KILCHER
@michaelcarlon1831 · 5 years ago
All of the cool kids use SELU
@dlisetteb · 3 years ago
I really can't understand it.
@garrettosborne4364 · 3 years ago
You got lost in the weeds on this one.
@rbhambriiit · 1 year ago
The backprop was a bit unclear and perhaps the hardest bit.
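
For anyone who found the backprop part hard to follow, here is a sketch of the backward pass through the batch norm transform, following the chain-rule derivation the paper walks through (the function name and shapes are illustrative):

import numpy as np

def batch_norm_backward(dout, x, gamma, eps=1e-5):
    # dout: (N, D) gradient of the loss w.r.t. the layer output y = gamma * x_hat + beta
    N = x.shape[0]
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    std = np.sqrt(var + eps)
    x_hat = (x - mu) / std

    dbeta = dout.sum(axis=0)                    # dL/dbeta
    dgamma = (dout * x_hat).sum(axis=0)         # dL/dgamma
    dx_hat = dout * gamma                       # dL/dx_hat

    # dL/dvar and dL/dmu, then dL/dx, as in the paper's chain-rule steps
    dvar = (dx_hat * (x - mu) * -0.5 * std**-3).sum(axis=0)
    dmu = (-dx_hat / std).sum(axis=0) + dvar * (-2.0 * (x - mu)).mean(axis=0)
    dx = dx_hat / std + dvar * 2.0 * (x - mu) / N + dmu / N
    return dx, dgamma, dbeta
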
@ssshukla26 · 4 years ago
If not more so, this is almost as complicated an explanation as the one in the paper.
@stefanogrillo6040 · 5 months ago
lol