Coding Llama 3 from scratch in PyTorch - Part 2

3,016 views

Prince Canuma

1 day ago

Comments: 17
@kishoretvk 5 months ago
Thanks for committing to open source and educating people on cutting-edge knowledge.
@princecanuma 5 months ago
Most welcome, it’s my pleasure!
@yoanijosias 5 months ago
Very good, can’t wait to see updates to it.
@princecanuma 5 months ago
You and me both!
@liyanan2004 4 months ago
Could you please make a tutorial on VLMs and how they work, from scratch, like this series of videos?
@princecanuma 4 months ago
That’s a great idea! 💡 Will do 👌🏽
@spkgyk 5 months ago
Why do you use a 32-bit paged optimizer when the model is being fine-tuned with QLoRA? Surely QLoRA stores the weights in 8-bit double-quantized form, so using a 32-bit optimizer makes no difference, and the weight updates need to be converted back to 8-bit anyway? Please help me understand this.
@princecanuma 5 months ago
Additionally, 8-bit optimizer states are dequantized to 32-bit for the update anyway. huggingface.co/docs/bitsandbytes/main/en/explanations/optimizers
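A minimal sketch of the two paged optimizer variants being compared, assuming bitsandbytes is installed (illustrative only, not code from the video); the small linear layer stands in for the LoRA adapter parameters that QLoRA actually trains:

```python
import torch.nn as nn
import bitsandbytes as bnb

# Stand-in for the trainable LoRA adapter parameters in a QLoRA run.
adapter = nn.Linear(16, 16)

# Paged 32-bit AdamW: optimizer states kept in fp32, paged between GPU and CPU
# memory under pressure.
opt_fp32 = bnb.optim.PagedAdamW32bit(adapter.parameters(), lr=2e-4)

# Paged 8-bit AdamW: states stored block-wise quantized to 8-bit, then
# dequantized to 32-bit for each update step (see the bitsandbytes docs linked above).
opt_int8 = bnb.optim.PagedAdamW8bit(adapter.parameters(), lr=2e-4)
```

Both variants apply the same update rule; they differ only in how the optimizer state (the first and second moments) is stored between steps.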
@spkgyk 5 months ago
@princecanuma Thank you for the quick response. With 8-bit optimizers, large models can be fine-tuned with 75% less GPU memory without losing any accuracy compared to training with standard 32-bit optimizers. The reduced memory requirements mean 8-bit optimizers are 4x faster than a standard optimizer, and no hyperparameter tuning is required. Surely this means that using 32-bit just wastes compute power? Please correct me if I'm wrong; I'm really trying to understand the benefits. Is it because training with 32-bit means that, despite converting to 8-bit for the weight update, the conversion leads to small accuracy gains?
@princecanuma 5 months ago
There are no accuracy gains, only reduced GPU memory usage and potentially some extra speed. In terms of speed, I personally didn't notice any change: I tested it yesterday, and besides the reduced GPU usage, it took just as long as the 32-bit version to complete training.
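To make the trade-off concrete, here is a hedged sketch of how this choice typically appears in a Hugging Face QLoRA setup (directory and hyperparameters are placeholders, not the values used in the video): the frozen base weights are 4-bit quantized either way, and only the optimizer-state precision changes.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# QLoRA-style quantization of the frozen base weights (4-bit NF4, double-quantized).
# This config would be passed to AutoModelForCausalLM.from_pretrained(quantization_config=...).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Only the optimizer string differs between the two setups discussed above:
# "paged_adamw_32bit" vs "paged_adamw_8bit". The 8-bit variant mainly saves
# optimizer-state memory; accuracy is expected to be unchanged.
training_args = TrainingArguments(
    output_dir="qlora-out",          # placeholder output directory
    per_device_train_batch_size=1,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",       # or "paged_adamw_8bit"
)
```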
@sergey_a 5 months ago
Why are there only 3 likes? I put 4 on HF. :)
@PaoloTshiyole 5 months ago
Your English is nice
@princecanuma 5 months ago
Thank you very much!
@wilfredomartel7781 5 months ago
😊
@wilfredomartel7781 5 months ago
😊🎉
@00osmboy 5 months ago
cool
@princecanuma 5 months ago
Awesome, I’m happy you liked it :)
Coding Llama 3 from scratch in PyTorch - Part 1
23:59
Prince Canuma
4.1K views
Get started with Gemma Google's NEW open-source LLM model
40:19
Prince Canuma
3.2K views
Coding Llama 2 from scratch in PyTorch - Part 3
50:14
Prince Canuma
1.3K views