Deep Dive: Quantizing Large Language Models, part 2

  1,000 views

Julien Simon

A day ago

Quantization is an excellent technique for compressing Large Language Models (LLMs) and accelerating their inference.
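As a rough illustration of the compression claim, here is a minimal sketch of the simplest scheme, symmetric per-tensor absmax INT8 quantization, written in plain Python with the standard-library array module. It is not one of the techniques compared in this video; the weight values and the function names (quantize_absmax_int8, dequantize) are made up for illustration.

```python
from array import array

def quantize_absmax_int8(weights):
    """Symmetric per-tensor quantization: scale so that max|w| maps to 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

# Hypothetical float32 weights (typecode "f" stores 4 bytes per element).
weights = array("f", [0.12, -0.57, 0.33, 1.02, -0.88, 0.05, -0.21, 0.74])
q, scale = quantize_absmax_int8(weights)
restored = dequantize(q, scale)

print("float32 bytes:", len(weights) * weights.itemsize)  # 32
print("int8 bytes:", len(q) * q.itemsize)                 # 8
print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Storing one byte per weight plus a single scale cuts memory by roughly 4x versus float32, at the cost of a rounding error of at most half a quantization step per weight. The more advanced methods below exist largely to keep that error from hurting model accuracy.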
Following up on part 1 (Deep Dive: Quantizing ...), we look at and compare more advanced quantization techniques: SmoothQuant, GPTQ, AWQ, HQQ, and the Hugging Face Optimum Intel library, which is based on Intel Neural Compressor and Intel OpenVINO.
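GPTQ, AWQ and HQQ typically quantize weights in small groups, each group with its own scale and zero point. The sketch below shows only the plain round-to-nearest (RTN) group-wise baseline that those methods improve on; it is not GPTQ, AWQ or HQQ itself. The group size of 4 is an assumption chosen to keep the example readable (group sizes such as 128 are common in practice), and the function names and weight values are hypothetical.

```python
def quantize_groupwise(weights, group_size=4, bits=4):
    """Asymmetric round-to-nearest (RTN) quantization, one (scale, zero) pair per group."""
    qmax = 2 ** bits - 1  # 15 for 4-bit codes
    codes, params = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = round(-lo / scale)  # integer code that represents 0.0
        # Shift by the zero point and clamp into the unsigned code range.
        codes.extend(min(qmax, max(0, round(w / scale) + zero)) for w in group)
        params.append((scale, zero))
    return codes, params

def dequantize_groupwise(codes, params, group_size=4):
    """Reconstruct approximate weights from codes and per-group parameters."""
    return [(c - params[i // group_size][1]) * params[i // group_size][0]
            for i, c in enumerate(codes)]

weights = [0.12, -0.57, 0.33, 1.02, -0.88, 0.05, -0.21, 0.74]  # made-up values
codes, params = quantize_groupwise(weights)
restored = dequantize_groupwise(codes, params)
print(codes)   # eight 4-bit codes in [0, 15]
print(params)  # two (scale, zero) pairs, one per group of 4 weights
```

Smaller groups track local value ranges more closely and reduce error, but each group adds its own scale and zero point to store.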
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium at / julsimon or Substack at julsimon.substack.com. ⭐️⭐️⭐️
00:55 SmoothQuant
07:00 GPTQ (post-training quantization for Generative Pre-trained Transformers)
12:35 Activation-aware Weight Quantization (AWQ)
18:10 Half-Quadratic Quantization (HQQ)
23:15 Optimum Intel
25:45 Accelerating Stable Diffusion with Intel OpenVINO
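SmoothQuant, the first technique in the chapter list, migrates quantization difficulty from activations to weights with a per-input-channel scale. The sketch below follows the formulation in the SmoothQuant paper, s_j = max|X_j|^α / max|W_j|^(1 - α), with α = 0.5, assuming plain Python lists and made-up values. It only shows the mathematically equivalent rescaling and leaves out the INT8 quantization of X_s and W_s that would follow; the helper names are hypothetical.

```python
def channel_max(M):
    """Max |value| of each column, i.e. of each input channel of X."""
    return [max(abs(row[j]) for row in M) for j in range(len(M[0]))]

def smooth(X, W, alpha=0.5, eps=1e-5):
    """SmoothQuant-style smoothing for Y = X @ W.

    X: tokens x in_channels, W: in_channels x out_channels.
    s_j = max|X_j|**alpha / max|W_j|**(1 - alpha); X is divided by s, W multiplied by s.
    """
    act_max = channel_max(X)
    w_max = [max(abs(v) for v in w_row) for w_row in W]
    # eps guards against division by zero for all-zero channels.
    s = [max(a, eps) ** alpha / max(w, eps) ** (1 - alpha)
         for a, w in zip(act_max, w_max)]
    X_s = [[x / s[j] for j, x in enumerate(row)] for row in X]
    W_s = [[v * s[j] for v in W[j]] for j in range(len(W))]
    return X_s, W_s, s

def matmul(A, B):
    """Plain-Python matrix product; zip(*B) iterates over the columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Made-up activations: input channel 1 carries large outliers.
X = [[0.5, 12.0, -0.3],
     [-0.8, -15.0, 0.4]]
W = [[0.2, -0.1],
     [0.05, 0.03],
     [-0.4, 0.6]]

X_s, W_s, s = smooth(X, W)
Y, Y_s = matmul(X, W), matmul(X_s, W_s)  # equal up to float rounding
print(channel_max(X))    # [0.8, 15.0, 0.4]: one channel dominates the range
print(channel_max(X_s))  # per-channel ranges are much closer after smoothing
```

Because the scales cancel in the product, the layer output is unchanged, while the activations no longer have one channel that forces a huge quantization step on all the others.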

Comments: 7
@cybermanaudiobooks3231 5 months ago
Great info. Well presented. Learned a lot. Thanks Julien!
@juliensimonfr 5 months ago
You're welcome!
@bashintosh107 4 months ago
Thank you so much! Very clear and helpful! Are the slides available somewhere?
@juliensimonfr 4 months ago
Sorry, no. You can find the graphs and tables in the research papers.
@itayatelis2898 2 months ago
Amazing! Julien, do you have any plans to cover distributed parallel algorithms (data parallelism, inter-layer, intra-layer, etc.)?
@juliensimonfr 2 months ago
Thank you. I did cover some techniques in kzbin.info/www/bejne/sIu4Zn2gd6xknKs. Anything else you'd be interested in?
@itayatelis2898 a month ago
@juliensimonfr Yes, maybe touch on the concept of superposition?