Efficient Large-Scale Language Model Training on GPU Clusters

4,694 views

Databricks

1 day ago

Large language models have led to state-of-the-art accuracies across a range of tasks. However, training these large models efficiently is challenging for two reasons: a) GPU memory capacity is limited, making it impossible to fit large models on a single GPU or even on a multi-GPU server; and b) the number of compute operations required to train these models can result in unrealistically long training times. New methods of model parallelism such as tensor and pipeline parallelism have been proposed to address these challenges; unfortunately, naive usage leads to fundamental scaling issues at thousands of GPUs due to various reasons, e.g., expensive cross-node communication or idle periods waiting on other devices.
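To make the distinction concrete: tensor parallelism splits the work within a single layer (e.g., the weight matrix of a linear layer) across GPUs, while pipeline parallelism assigns contiguous groups of layers to different GPUs. The snippet below is a minimal single-process sketch of the tensor-parallel idea, using a column-split linear layer in the Megatron style; the shapes and the two-way split are illustrative assumptions, not the configuration discussed in the talk.

```python
# Minimal single-process sketch of tensor (intra-layer) parallelism, assuming a
# column-split linear layer as in Megatron-style model parallelism. Each "shard"
# stands in for one GPU's slice of the weight matrix.
import torch

torch.manual_seed(0)
batch, d_in, d_out, n_shards = 4, 8, 16, 2

x = torch.randn(batch, d_in)
w = torch.randn(d_in, d_out)           # full weight, kept only as a reference

# Split the weight by output columns across the (hypothetical) devices.
shards = torch.chunk(w, n_shards, dim=1)

# Each shard computes its partial output independently; results only need to be
# gathered (here: concatenated) along the feature dimension at the end.
partial_outputs = [x @ w_shard for w_shard in shards]
y_parallel = torch.cat(partial_outputs, dim=1)

# The sharded computation matches the unsharded reference.
assert torch.allclose(y_parallel, x @ w, atol=1e-5)
print(y_parallel.shape)  # torch.Size([4, 16])
```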
In this work, we show how to compose different types of parallelism methods (tensor, pipeline, and data parallelism) to scale to thousands of GPUs, achieving a two-order-of-magnitude increase in the sizes of models we can efficiently train compared to existing systems. We discuss various implementations of pipeline parallelism and propose a novel schedule that can improve throughput by more than 10% with comparable memory footprint compared to previously-proposed approaches. We quantitatively study the trade-offs between tensor, pipeline, and data parallelism, and provide intuition as to how to configure distributed training of a large model. The composition of these techniques allows us to perform training iterations on a model with 1 trillion parameters at 502 petaFLOP/s on 3072 GPUs with achieved per-GPU throughput of 52% of peak; previous efforts to train similar-sized models achieve much lower throughput (36% of theoretical peak). Our code has been open-sourced at github.com/nvidia/megatron-lm.
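As a rough sanity check on those numbers, the sketch below shows how the three parallelism degrees multiply together to fill a GPU cluster, and what 502 petaFLOP/s across 3072 GPUs implies per GPU. The specific degrees (8 × 64 × 6) and the 312 TFLOP/s per-GPU peak are illustrative assumptions chosen to match the figures quoted above, not values taken from the talk.

```python
# A sketch of how tensor-, pipeline-, and data-parallel degrees compose, plus the
# per-GPU throughput arithmetic implied by the abstract's aggregate numbers.
# The 8 x 64 x 6 factorization and the 312 TFLOP/s peak are assumptions.

def check_parallel_config(world_size: int, tensor: int, pipeline: int, data: int) -> None:
    # Every GPU belongs to exactly one (tensor, pipeline, data) group, so the
    # product of the three degrees must equal the total number of GPUs.
    assert tensor * pipeline * data == world_size, "degrees must factor the world size"
    print(f"{world_size} GPUs = {tensor} tensor x {pipeline} pipeline x {data} data parallel")

check_parallel_config(world_size=3072, tensor=8, pipeline=64, data=6)

# 502 petaFLOP/s aggregate over 3072 GPUs -> ~163 teraFLOP/s per GPU,
# i.e. roughly 52% of an assumed 312 TFLOP/s per-GPU peak.
aggregate_pflops = 502
per_gpu_tflops = aggregate_pflops * 1000 / 3072
print(f"{per_gpu_tflops:.0f} TFLOP/s per GPU "
      f"({per_gpu_tflops / 312:.0%} of the assumed 312 TFLOP/s peak)")
```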
Connect with us:
Website: databricks.com
Facebook: / databricksinc
Twitter: / databricks
LinkedIn: / databricks
Instagram: / databricksinc

Comments: 3

@rexi4238, 7 months ago:
Very good video 👍

@wayne8863, 1 year ago:
I guess this is how GPT-3 is made.

@alexilaiho6441, 2 months ago:
Great vid. The sing-songy voice is just too annoying.