Efficient Large-Scale Language Model Training on GPU Clusters

4,694 views

Databricks

1 day ago

Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these large models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on a single GPU or even on a multi-GPU server; and b) the number of compute operations required to train these models can result in unrealistically long training times. New methods of model parallelism such as tensor and pipeline parallelism have been proposed to address these challenges; unfortunately, naive usage leads to fundamental scaling issues at thousands of GPUs due to various reasons, e.g., expensive cross-node communication or idle periods waiting on other devices.
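To make the distinction concrete: tensor parallelism splits the work within a single layer (e.g., the weight matrix of a linear layer) across GPUs, while pipeline parallelism assigns contiguous groups of layers to different GPUs. The snippet below is a minimal single-process sketch of the tensor-parallel idea, using a column-split linear layer in the Megatron style; the shapes and the two-way split are illustrative assumptions, not the configuration discussed in the talk.

```python
# Minimal single-process sketch of tensor (intra-layer) parallelism, assuming a
# column-split linear layer as in Megatron-style model parallelism. Each "shard"
# stands in for one GPU's slice of the weight matrix.
import torch

torch.manual_seed(0)
batch, d_in, d_out, n_shards = 4, 8, 16, 2

x = torch.randn(batch, d_in)
w = torch.randn(d_in, d_out)           # full weight, kept only as a reference

# Split the weight by output columns across the (hypothetical) devices.
shards = torch.chunk(w, n_shards, dim=1)

# Each shard computes its partial output independently; results only need to be
# gathered (here: concatenated) along the feature dimension at the end.
partial_outputs = [x @ w_shard for w_shard in shards]
y_parallel = torch.cat(partial_outputs, dim=1)

# The sharded computation matches the unsharded reference.
assert torch.allclose(y_parallel, x @ w, atol=1e-5)
print(y_parallel.shape)  # torch.Size([4, 16])
```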
In this work, we show how to compose different types of parallelism methods (tensor, pipeline, and data parallelism) to scale to thousands of GPUs, achieving a two-order-of-magnitude increase in the sizes of models we can efficiently train compared to existing systems. We discuss various implementations of pipeline parallelism and propose a novel schedule that can improve throughput by more than 10% with comparable memory footprint compared to previously-proposed approaches. We quantitatively study the trade-offs between tensor, pipeline, and data parallelism, and provide intuition as to how to configure distributed training of a large model. The composition of these techniques allows us to perform training iterations on a model with 1 trillion parameters at 502 petaFLOP/s on 3072 GPUs with achieved per-GPU throughput of 52% of peak; previous efforts to train similar-sized models achieve much lower throughput (36% of theoretical peak). Our code has been open-sourced at github.com/nvidia/megatron-lm.
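As a rough sanity check on those numbers, the sketch below shows how the three parallelism degrees multiply together to fill a GPU cluster, and what 502 petaFLOP/s across 3072 GPUs implies per GPU. The specific degrees (8 × 64 × 6) and the 312 TFLOP/s per-GPU peak are illustrative assumptions chosen to match the figures quoted above, not values taken from the talk.

```python
# A sketch of how tensor-, pipeline-, and data-parallel degrees compose, plus the
# per-GPU throughput arithmetic implied by the abstract's aggregate numbers.
# The 8 x 64 x 6 factorization and the 312 TFLOP/s peak are assumptions.

def check_parallel_config(world_size: int, tensor: int, pipeline: int, data: int) -> None:
    # Every GPU belongs to exactly one (tensor, pipeline, data) group, so the
    # product of the three degrees must equal the total number of GPUs.
    assert tensor * pipeline * data == world_size, "degrees must factor the world size"
    print(f"{world_size} GPUs = {tensor} tensor x {pipeline} pipeline x {data} data parallel")

check_parallel_config(world_size=3072, tensor=8, pipeline=64, data=6)

# 502 petaFLOP/s aggregate over 3072 GPUs -> ~163 teraFLOP/s per GPU,
# i.e. roughly 52% of an assumed 312 TFLOP/s per-GPU peak.
aggregate_pflops = 502
per_gpu_tflops = aggregate_pflops * 1000 / 3072
print(f"{per_gpu_tflops:.0f} TFLOP/s per GPU "
      f"({per_gpu_tflops / 312:.0%} of the assumed 312 TFLOP/s peak)")
```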
Connect with us:
Website: databricks.com
Facebook: / databricksinc
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc

Comments: 3

@rexi4238, 7 months ago:
Very good video 👍

@wayne8863, 1 year ago:
I guess this is how GPT-3 is made.

@alexilaiho6441, 2 months ago:
Great vid. The sing-songy voice is just too annoying.