Coding Llama 3 from scratch in PyTorch - Part 1

1,769 views

Prince Canuma

20 days ago

In this video series, you will learn how to train and fine-tune the Llama 3 model from scratch.
The goal is to code LLaMA 3 from scratch in PyTorch and create models with 3B, 6B, 35B and 45B parameters. In this first video, you'll learn about upcycling, downcycling and infini-attention.
📚Papers:
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints: arxiv.org/abs/2212.05055
- Pre-training Small Base LMs with Fewer Tokens: arxiv.org/abs/2404.08634
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: arxiv.org/abs/2404.07143
💻 To follow along you can use this colab notebook:
- github.com/Blaizzy/Coding-LLM...
🎥 Coding Llama 2 from scratch video series
Part 1: kzbin.infoXHmag4damTg
Part 2: kzbin.infoLSWDpFmbE90
Part 3: • Coding Llama 2 from sc...
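The downcycling idea mentioned above (from the "Pre-training Small Base LMs with Fewer Tokens" paper) boils down to initialising a smaller model from a subset of the transformer blocks of a larger dense checkpoint. A minimal sketch of that selection step, assuming Llama/Hugging Face-style `model.layers.N.` weight names and the simple "keep the first k layers" strategy (one of several options; the function name `downcycle` is hypothetical):

```python
# Hypothetical sketch of downcycling: build a smaller model's state dict
# by keeping only the first `num_target_layers` transformer blocks of a
# larger checkpoint, plus all non-layer weights (embeddings, final norm,
# output head), which are shape-compatible and copied as-is.

def downcycle(state_dict, num_target_layers):
    new_state = {}
    for name, weight in state_dict.items():
        if name.startswith("model.layers."):
            # weight names look like "model.layers.<idx>.<rest>"
            layer_idx = int(name.split(".")[2])
            if layer_idx < num_target_layers:
                new_state[name] = weight
        else:
            # embeddings, final norm, lm_head: keep unchanged
            new_state[name] = weight
    return new_state

# Toy 4-layer checkpoint downcycled to 2 layers
ckpt = {f"model.layers.{i}.self_attn.q_proj.weight": i for i in range(4)}
ckpt["model.embed_tokens.weight"] = "emb"
small = downcycle(ckpt, num_target_layers=2)
```

The smaller model is then pre-trained further on a (much smaller) token budget rather than from random initialisation.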

Comments: 12
@AC-go1tp 19 days ago
This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you PC🙏!
@princecanuma 18 days ago
Most welcome! It's my pleasure :) I lived through this so others don't have to.
@ngamcode2485 8 days ago
This is very impressive and great content. Thank you!
@princecanuma 2 days ago
You're very welcome!
@kishoretvk 18 days ago
Super impressive, great value. One question: how do I further train the model on my custom content instead of using LoRA? Can we do further full training and add new memory?
@princecanuma 12 days ago
Most welcome! You can do that, but it can be very expensive.
@vivekpadman5248 6 days ago
Bro, how did you train Llama 3 without a paper?
@princecanuma 2 days ago
Could you elaborate?
@vivekpadman5248 2 days ago
@princecanuma As far as I know, there hasn't been an official Llama 3 paper released, and no data info either. But I could be wrong... 😅
@princecanuma 2 days ago
@vivekpadman5248 True, they only released a blog post detailing the data, model architecture and performance. Here is how I did it: Llama 3 has the exact same architecture as Llama 2, which we already covered on this channel: kzbin.info/aero/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK&si=0Gyt9mdaA-ydiWOA. Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.
@vivekpadman5248
@vivekpadman5248 2 күн бұрын
@@princecanuma oh understood, thanks I'll check it out and also your video 💙
@princecanuma
@princecanuma 2 күн бұрын
Most welcome :)