Coding Llama 3 from scratch in PyTorch - Part 1

1,769 views

Prince Canuma

20 days ago

In this video series, you will learn how to train and fine-tune the Llama 3 model from scratch.
The goal is to code LLaMA 3 from scratch in PyTorch and create models with 3B, 6B, 35B and 45B parameters. In this first video, you'll learn about upcycling, downcycling and infini-attention.
📚Papers:
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints: arxiv.org/abs/2212.05055
- Pre-training Small Base LMs with Fewer Tokens: arxiv.org/abs/2404.08634
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: arxiv.org/abs/2404.07143
💻 To follow along you can use this colab notebook:
- github.com/Blaizzy/Coding-LLM...
🎥 Coding Llama 2 from scratch video series
Part 1: kzbin.infoXHmag4damTg
Part 2: kzbin.infoLSWDpFmbE90
Part 3: • Coding Llama 2 from sc...
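The downcycling idea mentioned above (from the "Pre-training Small Base LMs with Fewer Tokens" paper) boils down to initialising a smaller model from a subset of the transformer blocks of a larger dense checkpoint. A minimal sketch of that selection step, assuming Llama/Hugging Face-style `model.layers.N.` weight names and the simple "keep the first k layers" strategy (one of several options; the function name `downcycle` is hypothetical):

```python
# Hypothetical sketch of downcycling: build a smaller model's state dict
# by keeping only the first `num_target_layers` transformer blocks of a
# larger checkpoint, plus all non-layer weights (embeddings, final norm,
# output head), which are shape-compatible and copied as-is.

def downcycle(state_dict, num_target_layers):
    new_state = {}
    for name, weight in state_dict.items():
        if name.startswith("model.layers."):
            # weight names look like "model.layers.<idx>.<rest>"
            layer_idx = int(name.split(".")[2])
            if layer_idx < num_target_layers:
                new_state[name] = weight
        else:
            # embeddings, final norm, lm_head: keep unchanged
            new_state[name] = weight
    return new_state

# Toy 4-layer checkpoint downcycled to 2 layers
ckpt = {f"model.layers.{i}.self_attn.q_proj.weight": i for i in range(4)}
ckpt["model.embed_tokens.weight"] = "emb"
small = downcycle(ckpt, num_target_layers=2)
```

The smaller model is then pre-trained further on a (much smaller) token budget rather than from random initialisation.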

Comments: 12
@AC-go1tp 19 days ago
This is a very thoughtful and great initiative! Researchers with enough gray matter but limited means can still be in the game. Thank you PC🙏!
@princecanuma 18 days ago
Most welcome! It's my pleasure :) I lived through this so others don't have to.
@ngamcode2485 8 days ago
This is very impressive and great content. Thank you!
@princecanuma 2 days ago
You're very welcome!
@kishoretvk 18 days ago
Super impressive, great value. One question: how do I further train the model on my custom content instead of using LoRA? Can we do further full training and add new memory?
@princecanuma 12 days ago
Most welcome! You can do that, but it can be very expensive.
@vivekpadman5248 6 days ago
Bro, how did you train Llama 3 without a paper?
@princecanuma 2 days ago
Could you elaborate?
@vivekpadman5248 2 days ago
@princecanuma As far as I know, there hasn't been an official Llama 3 paper released, and no data info either. But I could be wrong... 😅
@princecanuma 2 days ago
@vivekpadman5248 True, they only released a blog post detailing the data, model architecture and performance. Here is how I did it: Llama 3 has the exact same architecture as Llama 2, which we already covered on this channel: kzbin.info/aero/PLDn_JsyofyfQp4td_ub6LfIg5vxyu6YJK&si=0Gyt9mdaA-ydiWOA. Finally, if you understand how these models work, you don't need the paper; the code implementation is more than enough.
@vivekpadman5248
@vivekpadman5248 2 күн бұрын
@@princecanuma oh understood, thanks I'll check it out and also your video 💙
@princecanuma
@princecanuma 2 күн бұрын
Most welcome :)