Quantization is an excellent technique for compressing Large Language Models (LLMs) and accelerating their inference.
Following up on part 1, "Deep Dive: Quantizing ...", we examine and compare more advanced quantization techniques: SmoothQuant, GPTQ, AWQ, and HQQ, as well as the Hugging Face Optimum Intel library, which builds on Intel Neural Compressor and Intel OpenVINO.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
00:55 SmoothQuant
07:00 Generative Pre-trained Transformer Quantization (GPTQ)
12:35 Activation-aware Weight Quantization (AWQ)
18:10 Half-Quadratic Quantization (HQQ)
23:15 Optimum Intel
25:45 Accelerating Stable Diffusion with Intel OpenVINO
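All of the techniques covered in the video build on the same basic idea: mapping floating-point weights to low-bit integers with a scaling factor. As a minimal illustration (not the implementation used by any of these libraries), here is a sketch of symmetric absmax int8 quantization in NumPy; the function names are my own:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric absmax quantization: map floats to int8 in [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# The per-element reconstruction error is bounded by half the
# quantization step (scale / 2); the advanced methods in the video
# (SmoothQuant, GPTQ, AWQ, HQQ) differ mainly in how they pick scales
# and compensate for this error.
```

Storing `q` (int8) plus one float scale instead of float32 weights is what yields the roughly 4x memory reduction that makes quantized LLM inference practical.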