Are Batch Size Powers of 2 Really Necessary?

4,016 views

Aladdin Persson

A day ago

Comments: 14
@Ayayron1998 2 years ago
If it works, runs fast, and fits into VRAM, those are the most important factors for me when choosing a batch size for training. Excellent overview of the article btw. More videos like this please.
@konataizumi5829 1 year ago
From reddit: "There is an entire manual from Nvidia describing why powers of 2 in layer dimensions and batch sizes are a must for maximum performance at the CUDA level. As many people mentioned, your testing is not representative because of bottlenecks and most likely monitoring issues."
@alvaroir 2 years ago
Another explanation I learnt back in the day was related to threads. Since CPU/GPU vendors ship a power-of-2 number of processors (both physical and logical), it seems you can get better utilization if each of them takes one sample (or batch, or block, or whatever). If you use odd numbers, one or more processors will be idle at some point. My guess is that in this article there are no noticeable time differences because memory access is probably the major bottleneck during training (image batches are larger in size). Thanks for the video Aladdin!
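A minimal PyTorch sketch (not from the video or this comment) of how one might check whether batch sizes just above or below a power of 2 actually change step time; the tiny linear model and image size here are placeholders, and `torch.cuda.synchronize()` is needed because CUDA runs asynchronously:

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model; any real network would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def time_batch_size(batch_size, steps=20):
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    y = torch.randint(0, 1000, (batch_size,), device=device)

    # Warm-up step so one-time CUDA setup doesn't skew the measurement.
    criterion(model(x), y).backward()
    optimizer.step()
    optimizer.zero_grad()
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(steps):
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps

for bs in [63, 64, 65, 127, 128, 129]:
    print(f"batch size {bs}: {time_batch_size(bs) * 1000:.2f} ms/step")
```

If memory bandwidth is the bottleneck, as the comment guesses, the numbers printed for 63/64/65 should be nearly identical.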
@nicksanders1438 2 years ago
I had asked colleagues several times why they use powers of two, but was never really convinced. When I worked in industry, I just trained with the highest batch size that would fit within the GPU's VRAM. I agree that in academia it's more of a convention and looks neat.
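A hypothetical helper (my sketch, not something from the comment) for the "largest batch size that fits in VRAM" approach: keep doubling the batch size until a CUDA out-of-memory error is raised, and return the last size that worked.

```python
import torch
import torch.nn as nn

def find_max_batch_size(model, input_shape, device="cuda", limit=4096):
    """Double the batch size until an out-of-memory error is hit."""
    batch_size, best = 1, 1
    model = model.to(device)
    while batch_size <= limit:
        try:
            x = torch.randn(batch_size, *input_shape, device=device)
            model(x).sum().backward()  # forward + backward so activation/gradient memory is counted
            best = batch_size
            batch_size *= 2
        except RuntimeError as e:
            if "out of memory" in str(e):
                break  # the previous batch size was the largest that fit
            raise
        finally:
            model.zero_grad(set_to_none=True)
            torch.cuda.empty_cache()
    return best

# Example usage with a dummy conv net:
# net = nn.Sequential(nn.Conv2d(3, 64, 3), nn.Flatten(), nn.LazyLinear(10))
# print(find_max_batch_size(net, (3, 224, 224)))
```

This only measures memory for a plain forward/backward pass; optimizer state (e.g. Adam's moments) would shrink the real limit a bit.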
@viktortodosijevic3270 2 years ago
Is a bigger batch size worse for training than a medium-sized batch? I want to put more data on my GPU, but it seems to me that it degrades the training. Or is the problem the smaller number of weight updates?
@adamgrygielski1201 2 years ago
Generally it shouldn't matter as long as you keep the BS/LR ratio constant. As you mentioned, with larger batches you get fewer gradient updates, so you should increase the learning rate. E.g. if you use LR=0.01 for bs=32, then when you go to bs=64 you should, as good practice, increase the LR to 0.02. It's a very simplified approach but should work in most cases.
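A minimal sketch of the linear scaling rule the comment describes; the base values (bs=32, lr=0.01) are just the example numbers from the comment.

```python
base_batch_size = 32
base_lr = 0.01

def scaled_lr(batch_size):
    # Keep the LR / batch-size ratio constant: double the batch size, double the LR.
    return base_lr * batch_size / base_batch_size

print(scaled_lr(64))   # 0.02
print(scaled_lr(256))  # 0.08
```

In practice the rule is often combined with a short learning-rate warmup for very large batches, but the proportional scaling above is the core idea.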
@yannickpezeu3419 2 years ago
I always wondered if there was a reason. Thanks
@rabia3746 2 years ago
Hey Aladdin, could you make a video tutorial about TE-GAN? It's for thermal image enhancement. Good luck and thanks.
@Hoxle-87 2 years ago
This is a weak paper review. A grad student can't be reviewing a paper and be like "I don't know what this is", "I don't know why this is so". You need to do your research. That's what grad school is for. Ask classmates, TAs, post docs. Other than bringing the paper to our attention, and you making some profit, there's little utility in this review. Just constructive criticism.
@AladdinPersson 2 years ago
What do you think I could've added that would make it more useful?
@Hoxle-87 2 years ago
@@AladdinPersson You need to project confidence in what you are doing. Saying that you don't know what a figure in the paper is, or that you don't know why Nvidia uses multiples of 8, reduces your stature as a competent reviewer. If you don't know what something is, offer alternative explanations and spend some time doing research for each paper.
@AladdinPersson 2 years ago
@@Hoxle-87 Thanks for the feedback
@FLLCI 2 years ago
Certainly a really good and helpful comment for Aladdin to improve on. Thanks for bringing this up!
@gradientattack 2 years ago
😮