If it works, runs fast, and fits into VRAM, those are the most important factors for me when training in batches. Excellent overview of the article, by the way. More videos like this, please.
@konataizumi5829 A year ago
From Reddit: "There is an entire manual from Nvidia describing why powers of 2 in layer dimensions and batch sizes are a must for maximum performance at the CUDA level. As many people mentioned, your testing is not representative because of bottlenecks and most likely monitoring issues."
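For anyone who wants to sanity-check this claim on their own hardware, here is a minimal timing sketch (assuming PyTorch and a CUDA GPU; the layer width of 1024 and the batch sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

device = "cuda"
layer = nn.Linear(1024, 1024).to(device)

for batch_size in (127, 128, 129):
    x = torch.randn(batch_size, 1024, device=device)
    for _ in range(10):  # warm-up: exclude one-time CUDA setup overhead
        layer(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):
        layer(x)
    end.record()
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    print(f"batch {batch_size}: {start.elapsed_time(end) / 100:.4f} ms/forward")
```

The CUDA-event timing and warm-up matter here: naively wall-clock timing asynchronous GPU launches is exactly the kind of monitoring issue the quote warns about.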
@alvaroir 2 years ago
Another explanation I learned back in the day relates to threads. Since CPU/GPU vendors ship a power-of-two number of processors (both physical and logical), you can get better utilization if each of them takes one sample (or batch, or block, or whatever). If you use odd sizes, one or more processors will sit idle at some point. My guess is that this article shows no noticeable time differences because memory access is likely the major bottleneck during training (image batches are big). Thanks for the video, Aladdin!
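To make the divisibility argument concrete, here is a toy back-of-the-envelope calculation (not a real GPU scheduling model; the unit count of 64 is a made-up number):

```python
import math

UNITS = 64  # hypothetical number of identical execution units

# A batch needs ceil(batch / UNITS) "waves" of work; the last wave
# leaves units idle unless UNITS divides the batch size evenly.
for batch in (96, 127, 128):
    waves = math.ceil(batch / UNITS)
    idle = waves * UNITS - batch
    print(f"batch={batch}: {waves} waves, {idle} units idle in the last wave")
```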
@nicksanders1438 2 years ago
I've asked colleagues several times why they use powers of two, but was never really convinced by the answers. When I worked in industry, I just trained with the highest batch size that would fit in the GPU's VRAM. I agree that in academia it's more of a convention, and it looks neat.
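A common way to find that ceiling automatically is to keep doubling the batch size until CUDA runs out of memory. A minimal sketch, assuming a PyTorch model whose peak memory is dominated by one forward/backward pass (`model` and `input_shape` are placeholders for your own setup):

```python
import torch

def find_max_batch_size(model, input_shape, start=8, device="cuda"):
    """Double the batch size until the GPU runs out of memory."""
    batch_size = start
    while True:
        try:
            x = torch.randn(batch_size, *input_shape, device=device)
            model(x).sum().backward()          # forward + backward at this size
            model.zero_grad(set_to_none=True)  # release gradient buffers
            torch.cuda.empty_cache()
            batch_size *= 2
        except torch.cuda.OutOfMemoryError:    # plain RuntimeError on older PyTorch
            torch.cuda.empty_cache()
            return batch_size // 2             # last size that fit
```

In practice you would back off a bit from the returned value, since optimizer state and data loading add memory on top of a single forward/backward pass.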
@viktortodosijevic3270 2 years ago
Is a bigger batch size worse for training than a medium-sized one? I want to put more data on my GPU, but it seems to me that it degrades training. Or is the problem the smaller number of weight updates?
@adamgrygielski1201 2 years ago
Generally it shouldn't matter as long as you keep the BS/LR ratio constant. As you've mentioned, with larger batches you get fewer gradient updates, so you should increase the learning rate. E.g. if you use LR=0.01 for bs=32, then when you go to bs=64 good practice is to increase LR to 0.02. It's a very simplified approach, but it should work in most cases.
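As a sketch, the linear scaling rule described in this comment could look like the following (the base values are the ones from the example above, not universal constants):

```python
BASE_LR = 0.01        # learning rate tuned at the base batch size
BASE_BATCH_SIZE = 32

def scaled_lr(batch_size: int) -> float:
    """Keep the LR/batch-size ratio constant (linear scaling rule)."""
    return BASE_LR * batch_size / BASE_BATCH_SIZE

print(scaled_lr(64))   # 0.02, matching the example above
print(scaled_lr(128))  # 0.04
```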
@yannickpezeu3419 2 years ago
I always wondered if there was a reason. Thanks!
@rabia3746 2 years ago
Hey Aladdin, could you make a video tutorial about TE-GAN? It's for thermal image enhancement. Good luck and thanks.
@Hoxle-87 2 years ago
This is a weak paper review. A grad student can't review a paper and say "I don't know what this is" or "I don't know why this is so". You need to do your research; that's what grad school is for. Ask classmates, TAs, post-docs. Other than bringing the paper to our attention (and making some profit), there's little utility in this review. Just constructive criticism.
@AladdinPersson 2 years ago
What do you think I could've added that would make it more useful?
@Hoxle-87 2 years ago
@@AladdinPersson You need to show confidence in what you're doing. Saying that you don't know what a figure in the paper is, or that you don't know why Nvidia uses multiples of 8, reduces your stature as a competent reviewer. If you don't know what something is, offer alternative explanations and spend some time doing research for each paper.
@AladdinPersson 2 years ago
@@Hoxle-87 Thanks for the feedback
@FLLCI 2 years ago
Certainly a really good and helpful comment for Aladdin to improve from. Thanks for bringing this up!