Let's pretrain a 3B LLM from scratch: on 16+ H100 GPUs, no detail skipped.

william falcon

We learn to pretrain a 3B-parameter LLM across multiple H100 machines from scratch, skipping no details. You'll learn how to handle OOM (out-of-memory) errors and how to develop on cheap GPUs before scaling to multi-GPU. Finally, we run multi-node training with FSDP and explain how to take the model beyond 3B parameters.
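For reference, a minimal sketch of the usual OOM levers, assuming the template is a PyTorch Lightning training script (the argument values below are illustrative, not the template's exact config):

# A hedged sketch, not the template's exact code: lower the micro-batch size,
# switch to bf16 mixed precision, and recover the effective batch size with
# gradient accumulation.
import lightning as L

trainer = L.Trainer(
    accelerator="gpu",
    devices=1,                   # develop on a single cheap GPU first
    precision="bf16-mixed",      # bf16 mixed precision cuts activation memory
    accumulate_grad_batches=8,   # 8 micro-batches per optimizer step, same effective batch size
)
# trainer.fit(model, datamodule=datamodule)  # model/datamodule come from your own script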
This is a full lecture with no edits and no details skipped. By the end, you will have sharpened the skills and intuition needed to pretrain and scale LLMs beyond a simple demo.
We start tuning and developing on cheap A10G GPUs, then run on 8 H100 GPUs, and finally scale to 2 machines for a total of 16 H100 GPUs. This workflow saves a ton in cloud costs.
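The scaling step is mostly a change of trainer arguments; a hedged sketch, again assuming a PyTorch Lightning setup rather than the template's exact flags:

import lightning as L

# 1 x A10G for cheap development and debugging:
dev_trainer = L.Trainer(accelerator="gpu", devices=1, precision="bf16-mixed")

# 8 x H100 on one machine; FSDP shards parameters, gradients, and optimizer states:
single_node = L.Trainer(accelerator="gpu", devices=8, strategy="fsdp", precision="bf16-mixed")

# 2 machines x 8 H100 = 16 GPUs total:
multi_node = L.Trainer(accelerator="gpu", devices=8, num_nodes=2, strategy="fsdp", precision="bf16-mixed")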
I start at 1B parameters and scale the model to 3B. To go beyond 3B, simply use the same process with more machines.
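To see why going from 1B to 3B is mostly a width/depth change, here is a rough parameter-count estimate for a Llama-style decoder (my own approximation, not the template's code; the example sizes are illustrative and ignore grouped-query attention):

def llama_params(vocab, d_model, n_layers, d_ff):
    # Per layer: attention (4 * d^2 for q, k, v, o projections) + SwiGLU MLP (3 * d * d_ff); norms are negligible.
    per_layer = 4 * d_model**2 + 3 * d_model * d_ff
    # Token embeddings; an untied LM head would add another vocab * d_model.
    return vocab * d_model + n_layers * per_layer

print(llama_params(32_000, 2048, 22, 5632))   # ~1.2 billion (TinyLlama-like dims)
print(llama_params(32_000, 3072, 26, 8192))   # ~3.0 billion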
Chapters:
00:00 Introduction
01:40 Run the Llama template
02:19 Llama template overview
05:00 Run the template on 1 GPU (A10G)
06:20 Monitor GPU memory usage
06:40 Code walkthrough
10:30 How to handle OOM (out of memory) errors
13:20 Connect local VSCode (optional)
14:40 Overview of hyperparameters
15:50 Run a hyperparameter sweep to find the context window
24:50 Speed up by 2x on 4 GPUs (A10G)
29:40 VRAM vs power for profiling
33:07 From 1B to 3B parameters
37:00 How to release ghost GPU memory
42:00 Change to machine with 8 x H100 GPUs
42:20 Number of parameters vs data size
45:00 Hyperparameter sweep results
48:00 3B params on the H100 at 4x speed
54:40 Troubleshoot a TensorBoard error
58:40 TensorBoard and artifacts on separate Studio for analysis
1:02:00 Measure cloud costs spent so far
1:05:00 Discuss and view data concerns
1:10:20 Getting to steady state
1:10:50 How to increase speed for the 3B parameter model
1:16:00 How to run DeepSpeed, FSDP and other scaling techniques
1:20:00 Start training with multi-node (multiple machines)
1:28:00 Monitor multi-node training
1:29:00 Summary
