Quantization is an excellent technique to compress Large Language Models (LLMs) and accelerate their inference.
In this video, we discuss model quantization, first introducing what it is and building an intuition for rescaling and the problems it creates. Then we introduce the different types of quantization: dynamic post-training quantization, static post-training quantization, and quantization-aware training. Finally, we start looking at and comparing actual quantization techniques: PyTorch's built-in quantization, ZeroQuant, and bitsandbytes.
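To make the rescaling idea concrete, here is a minimal, self-contained sketch of the affine (zero-point) mapping to 8-bit integers covered in the video. The function names and the [-1, 1] input range are illustrative, not taken from any particular library.

```python
# Minimal sketch of affine 8-bit quantization: real values in [rmin, rmax]
# are rescaled to unsigned integers in [0, 255]. Illustrative only.

def quantize(values, rmin, rmax, bits=8):
    """Map floats in [rmin, rmax] to integers; return (q, scale, zero_point)."""
    qmax = 2 ** bits - 1
    scale = (rmax - rmin) / qmax          # real-valued size of one integer step
    zero_point = round(-rmin / scale)     # integer that represents real 0.0
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximately recover the original floats (rounding error remains)."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.0, -0.2, 0.0, 0.5, 1.0]
q, scale, zp = quantize(weights, rmin=-1.0, rmax=1.0)
recovered = dequantize(q, scale, zp)
```

Note that real 0.0 maps exactly to the zero point, and every recovered value is within one quantization step of the original — the rounding error that the video's discussion of rescaling refers to.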
In part 2 (Deep Dive: Quantizing ...), we look at and compare more advanced quantization techniques: SmoothQuant, GPTQ, AWQ, HQQ, and the Hugging Face Optimum Intel library based on Intel Neural Compressor and Intel OpenVINO.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
02:05 What is quantization?
06:50 Rescaling weights and activations
08:17 The mapping function
12:38 Picking the input range
16:15 Getting rid of outliers
19:50 When can we apply quantization?
26:00 Dynamic post-training quantization with PyTorch
28:42 ZeroQuant
34:50 bitsandbytes
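As a companion to the chapters on picking the input range and getting rid of outliers, here is a small sketch, in pure Python, of why outliers matter: a single extreme activation stretches the min/max range and wastes most integer levels, so static quantization often clips the calibration range to a percentile instead. The function name and percentile choices are illustrative.

```python
# Sketch of percentile-based range selection for quantization calibration.
# One extreme outlier makes the naive min/max range mostly empty space.

def percentile_range(samples, lo_pct=1.0, hi_pct=99.0):
    """Return (rmin, rmax) covering the central lo_pct..hi_pct of samples,
    discarding extreme outliers instead of using the true min/max."""
    s = sorted(samples)
    lo_i = int(len(s) * lo_pct / 100)
    hi_i = min(len(s) - 1, int(len(s) * hi_pct / 100))
    return s[lo_i], s[hi_i]

# 100 activations spread over [-1, 1), plus one extreme outlier.
acts = [-1.0 + 0.02 * i for i in range(100)] + [50.0]

full = (min(acts), max(acts))      # outlier-dominated range
clipped = percentile_range(acts)   # outlier discarded
```

With the full range, an 8-bit grid must span 51 units of real-valued space and almost all of its 256 levels fall where no activation ever lands; clipping to a percentile keeps the grid dense where the values actually are, at the cost of saturating the rare outlier.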