A small snippet from my course, Walk with fastai Revisited (store.walkwithfastai.com), where I discuss Hugging Face Accelerate, a project I work on.
Documentation: hf.co/docs/accelerate
GitHub: github.com/huggingface/accele...
Comments: 1
@giovannibonetta3949 · 4 months ago
Hi Zach, very nice presentation, you make it seem easy! Since I stumbled on this video by chance (YouTube algorithm, and I thank it for that), I feel I may push my luck even further by puzzling you with a tricky (for me) question: how do you use Accelerate if you want to train more than one model instance, i.e. distributed training? I envision this chat...

[Zach] - Just launch with accelerate launch --num_processes=2 et voilà. This loads 2 instances on 2 GPUs and you can train distributed.

[dumb me] - BUT, what if, unfortunately, even a single batch does not fit in GPU memory?

[Zach] - Make it smaller.

[dumb me] - BUT, unfortunately I cannot, since I need an entire batch to gather the logits of all the samples within it and train a model to choose the best one (like in an NLP multiple-choice setting). If I cannot put the entire set of possibilities in the batch, I am screwed.

[Zach] - Just use the Accelerate and HF Datasets integration via accelerator.prepare(dataset, model) and it will split and dispatch/gather minibatches like a charm. (Not sure about this answer, though.)

[dumb me] - BUT, what if you cannot really use the Accelerate and HF Datasets integration for smart distributed batching, because you are doing RL and you don't really have a usual dataset?

[Zach] - ...not sure what the problem is.

[dumb me] - Yes, maybe I am not explaining the problem well, but I asked for help about it on Discord, and if you are still reading, maybe you would like to take a glance at the whole story here: discord.com/channels/879548962464493619/1201516075389554778/1224650024793935895 which, by the way, is shorter than this post, which I enjoyed writing... (NOTE: taking this conversation and feeding it to LLaMA, Mistral and company did not solve the problem.) :)
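For readers following along, here is a minimal sketch of the kind of loop the "just launch with accelerate launch" answer assumes: a plain PyTorch model, optimizer and DataLoader passed through accelerator.prepare, with accelerator.backward replacing loss.backward. The model, data and hyperparameters below are placeholders; this is the standard single-model training pattern from the Accelerate documentation, not code from the video, and it does not by itself solve the multiple-choice/RL batching question raised above.

```python
# Minimal Accelerate training-loop sketch (placeholders throughout).
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(128, 4)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(256, 128), torch.randint(0, 4, (256,)))
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)

# prepare() adapts everything to however the script was launched
# (CPU, single GPU, or `accelerate launch --num_processes=2` across 2 GPUs)
# and shards the dataloader so each process sees a different slice of the data.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for inputs, labels in dataloader:
    logits = model(inputs)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    accelerator.backward(loss)   # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At evaluation time, accelerator.gather_for_metrics() can collect per-process
# outputs so metrics are computed over the full global batch on every process.
```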
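On the last point (no usual dataset, e.g. RL-style candidate scoring), one Accelerate feature worth checking against the docs is Accelerator.split_between_processes, which splits an arbitrary Python list across processes without needing a DataLoader, with accelerate.utils.gather_object collecting the results afterwards. A rough sketch, in which the prompts and the scoring step are purely illustrative placeholders:

```python
# Rough sketch: splitting an arbitrary list of candidates across processes
# with Accelerate, then gathering the per-process results back together.
from accelerate import Accelerator
from accelerate.utils import gather_object

accelerator = Accelerator()

# Placeholder candidates; no Dataset or DataLoader is involved.
prompts = [f"candidate answer {i}" for i in range(10)]

results = []
# Each process receives a different slice of `prompts`.
with accelerator.split_between_processes(prompts) as my_prompts:
    for p in my_prompts:
        results.append({"prompt": p, "score": len(p)})   # placeholder scoring

# gather_object() collects the Python objects from every process so the full
# candidate set (with scores) is visible on each process.
all_results = gather_object(results)
if accelerator.is_main_process:
    print(len(all_results), "scored candidates in total")
```

Whether this covers the commenter's full RL setup is unclear from the thread alone, but it is the Accelerate primitive closest to "split work across GPUs without a dataset."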