If it works, runs fast, and fits into VRAM, those are the most important factors for me when training in batches. Excellent overview of the article, by the way. More videos like this, please.
@konataizumi5829 A year ago
From Reddit: "There is an entire manual from Nvidia describing why powers of 2 in layer dimensions and batch sizes are a must for maximum performance at the CUDA level. As many people mentioned, your testing is not representative because of bottlenecks and most likely monitoring issues."
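For anyone who wants to sanity-check this claim on their own hardware, here is a minimal timing sketch (assuming PyTorch and a CUDA GPU; the layer width of 1024 and the batch sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

device = "cuda"
layer = nn.Linear(1024, 1024).to(device)

for batch_size in (127, 128, 129):
    x = torch.randn(batch_size, 1024, device=device)
    for _ in range(10):  # warm-up: exclude one-time CUDA setup overhead
        layer(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):
        layer(x)
    end.record()
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    print(f"batch {batch_size}: {start.elapsed_time(end) / 100:.4f} ms/forward")
```

The CUDA-event timing and warm-up matter here: naively wall-clock timing asynchronous GPU launches is exactly the kind of monitoring issue the quote warns about.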
@alvaroir 2 years ago
Another explanation I learned back in the day relates to threads. Since CPU/GPU vendors ship a power-of-two number of processors (both physical and logical), you can get better utilization if each of them takes one sample (or batch, or block, or whatever). If you use odd sizes, one or more processors will sit idle at some point. My guess is that this article shows no noticeable time differences because memory access is likely the major bottleneck during training (image batches are big). Thanks for the video, Aladdin!
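To make the divisibility argument concrete, here is a toy back-of-the-envelope calculation (not a real GPU scheduling model; the unit count of 64 is a made-up number):

```python
import math

UNITS = 64  # hypothetical number of identical execution units

# A batch needs ceil(batch / UNITS) "waves" of work; the last wave
# leaves units idle unless UNITS divides the batch size evenly.
for batch in (96, 127, 128):
    waves = math.ceil(batch / UNITS)
    idle = waves * UNITS - batch
    print(f"batch={batch}: {waves} waves, {idle} units idle in the last wave")
```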
@nicksanders1438 2 years ago
I've asked colleagues several times why they use powers of two, but was never really convinced by the answers. When I worked in industry, I just trained with the highest batch size that would fit in the GPU's VRAM. I agree that in academia it's more of a convention, and it looks neat.
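A common way to find that ceiling automatically is to keep doubling the batch size until CUDA runs out of memory. A minimal sketch, assuming a PyTorch model whose peak memory is dominated by one forward/backward pass (`model` and `input_shape` are placeholders for your own setup):

```python
import torch

def find_max_batch_size(model, input_shape, start=8, device="cuda"):
    """Double the batch size until the GPU runs out of memory."""
    batch_size = start
    while True:
        try:
            x = torch.randn(batch_size, *input_shape, device=device)
            model(x).sum().backward()          # forward + backward at this size
            model.zero_grad(set_to_none=True)  # release gradient buffers
            torch.cuda.empty_cache()
            batch_size *= 2
        except torch.cuda.OutOfMemoryError:    # plain RuntimeError on older PyTorch
            torch.cuda.empty_cache()
            return batch_size // 2             # last size that fit
```

In practice you would back off a bit from the returned value, since optimizer state and data loading add memory on top of a single forward/backward pass.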
@viktortodosijevic3270 2 years ago
Is a bigger batch size worse for training than a medium-sized one? I want to put more data on my GPU, but it seems to me that it degrades training. Or is the problem the smaller number of weight updates?
@adamgrygielski1201 2 years ago
Generally it shouldn't matter as long as you keep the BS/LR ratio constant. As you've mentioned, with larger batches you get fewer gradient updates, so you should increase the learning rate. E.g. if you use LR=0.01 for bs=32, then when you go to bs=64 good practice is to increase LR to 0.02. It's a very simplified approach, but it should work in most cases.
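As a sketch, the linear scaling rule described in this comment could look like the following (the base values are the ones from the example above, not universal constants):

```python
BASE_LR = 0.01        # learning rate tuned at the base batch size
BASE_BATCH_SIZE = 32

def scaled_lr(batch_size: int) -> float:
    """Keep the LR/batch-size ratio constant (linear scaling rule)."""
    return BASE_LR * batch_size / BASE_BATCH_SIZE

print(scaled_lr(64))   # 0.02, matching the example above
print(scaled_lr(128))  # 0.04
```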
@yannickpezeu3419 2 years ago
I always wondered if there was a reason. Thanks!
@rabia3746 2 years ago
Hey Aladdin, could you make a video tutorial about TE-GAN? It's for thermal image enhancement. Good luck and thanks.
@Hoxle-87 2 years ago
This is a weak paper review. A grad student can't review a paper and say "I don't know what this is" or "I don't know why this is so". You need to do your research; that's what grad school is for. Ask classmates, TAs, post-docs. Other than bringing the paper to our attention (and making some profit), there's little utility in this review. Just constructive criticism.
@AladdinPersson 2 years ago
What do you think I could've added that would make it more useful?
@Hoxle-87 2 years ago
@@AladdinPersson You need to show confidence in what you're doing. Saying that you don't know what a figure in the paper is, or that you don't know why Nvidia uses multiples of 8, reduces your stature as a competent reviewer. If you don't know what something is, offer alternative explanations and spend some time doing research for each paper.
@AladdinPersson 2 years ago
@@Hoxle-87 Thanks for the feedback
@FLLCI 2 years ago
Certainly a really good and helpful comment for Aladdin to improve from. Thanks for bringing this up!