Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

25,290 views

Efficient NLP

Comments: 35
@thomasschmitt9669 9 months ago
This was one of the best explanation videos I have ever seen! Well structured and at the right level of complexity to follow without getting a headache. 👌
@muhannadobeidat 8 months ago
Excellent video. Well spoken. Nice visualizations.
@420_gunna 10 months ago
This felt very nicely taught -- I loved that you circled back with a summary/review at the end of the video; great practice. Please continue, thank you!
@vineetkumarmishra2989 9 months ago
wonderfully explained !! Thanks for the video.
@carlpeterson8279 5 months ago
Great summary/outline at 17:16. This video covers a lot of relevant topics for neural networks and edge AI.
@jokmenen_ 9 months ago
Awesome video!
@heteromodal 11 months ago
What a great video! Thank you!
@huiwencheng4585 11 months ago
Fantastic introduction and explanation !
@bonob0123 6 months ago
That was really nicely done. As a non-expert, I feel like I now have a good general idea of what a quantized model is. Thank you!
@ljkeller_yt 8 months ago
Great format, succinctness, and diagrams. Thank you!
@RamBabuB-r9s 1 year ago
Your teaching is excellent. We expect many more videos from you to help us understand the fundamentals of NLP.
@kevon217 1 year ago
^
@unclecode 10 months ago
Great content, well done. Please make a video on ONNX, and another one on Flash Attention. Much appreciated.
@kevon217 1 year ago
Thanks for this!
@xuantungnguyen9719 3 months ago
You are a good teacher
@AmishaHSomaiya 7 months ago
Great summary, thank you.
@jeremyuzan1169 8 months ago
Great video
@xiaoxiandong7382 2 months ago
super clear
@MuhammadAli-dw7mv 7 months ago
nicely done
@tosinadekunle646 3 months ago
Thank you for the video, sir. Is quantization just a feature-engineering task of enforcing data types that take up less space, or is it more than that?
@EfficientNLP 3 months ago
I'm not sure if this is what you're asking, but model quantization is not related to feature engineering or enforcing data types; it is a set of methods for making a model more space- or compute-efficient after training.
@tosinadekunle646 3 months ago
@EfficientNLP But in the examples you showed, the data types of both the input features and the weights are changed to, for example, int8, which I think can be done in a line of code like model.weight.torch.int8. From that, it looks like we must make sure the dataset is stored in a data type that uses less memory, and that this is done before model training. What do you think, sir?
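
To make the distinction in this exchange concrete, below is a minimal sketch of post-training dynamic quantization (my own illustration, assuming PyTorch's quantize_dynamic API; not necessarily the exact workflow from the video). Note that the dataset and the training loop are untouched; the int8 conversion is applied to the already-trained model's weights.

    import torch
    import torch.nn as nn

    # A small model standing in for an already-trained network
    # (in practice the weights would come from normal float32 training).
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    model.eval()

    # Post-training dynamic quantization: Linear weights are stored as int8;
    # activations are quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 128)    # inputs remain float32; the dataset is unchanged
    print(quantized(x).shape)  # inference now uses int8 weight matmuls

So quantization is not about storing the training data in a smaller data type; it is applied to the finished model to shrink its weights and speed up inference.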
@DurgaNagababuMolleti 6 months ago
Superb
@hrsight 8 months ago
nice video
@yunlu4657 11 months ago
Excellent video, learnt a lot! However, the definition of zero-point quantization is off. What you're showing in the video is the abs-max quantization instead.
@EfficientNLP 11 months ago
The example I showed is zero-point quantization because 0 in the original domain is mapped to 0 in the quantized domain (before transforming to unsigned). In abs-max (not covered in this video), the maximum in the original domain would be mapped to 127, and the minimum would be mapped to -128.
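
As a rough illustration of the two families discussed above, here is a small numeric sketch using common textbook formulations (my own example values; naming conventions vary between sources, so treat this as a generic reference rather than a transcription of the video):

    import numpy as np

    x = np.array([-1.2, 0.0, 0.8, 3.0], dtype=np.float32)

    # Abs-max (symmetric): the scale is set by the largest magnitude,
    # so 3.0 maps to 127 and 0.0 maps to 0; no offset is stored.
    scale_am = np.abs(x).max() / 127.0
    q_am = np.clip(np.round(x / scale_am), -128, 127).astype(np.int8)

    # Asymmetric (min-max) quantization with a zero-point offset: the scale
    # covers the full [min, max] range and an integer zero point is chosen
    # so that the real value 0.0 is exactly representable on the grid.
    scale_zp = (x.max() - x.min()) / 255.0
    zero_point = np.round(-x.min() / scale_zp) - 128
    q_zp = np.clip(np.round(x / scale_zp) + zero_point, -128, 127).astype(np.int8)

    print(q_am)  # [-51   0  34 127]
    print(q_zp)  # [-128 -55  -6 127]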
@ricardokullock2535 7 months ago
And if one were to quantize a distilled model, would the outcome be any good?
@EfficientNLP 7 months ago
Yes, these two techniques are often used together to improve efficiency.
@julians7785 16 days ago
I heard that multiply-by-zero operations are faster to process. Are you sure all operations take the same amount of time?
@EfficientNLP 16 days ago
Generally, for most instructions and most hardware, arithmetic takes a fixed amount of time per operation and does not get faster if the inputs are zeros. However, multiplying by 0 could be faster if the software logic checks for zeros to skip some operations, like in sparsity-aware methods.
@julians7785 16 days ago
@@EfficientNLP Thanks for the reply!
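
A toy sketch of the sparsity-aware shortcut mentioned in the reply (illustrative only; it assumes SciPy is available): the dense multiply spends the same time on zeros as on any other value, while a sparse matrix format simply never stores or visits them.

    import numpy as np
    from scipy.sparse import csr_matrix

    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 1024)).astype(np.float32)
    W[rng.random(W.shape) < 0.9] = 0.0   # prune ~90% of the weights to exact zeros
    x = rng.standard_normal(1024).astype(np.float32)

    dense_out = W @ x                    # multiplies every entry, zeros included

    W_sparse = csr_matrix(W)             # stores only the ~10% nonzero entries
    sparse_out = W_sparse @ x            # the stored zeros are skipped entirely

    print(np.allclose(dense_out, sparse_out, atol=1e-4))  # same result

Whether this is actually faster depends on the sparsity level and the kernel; the point is that any speedup comes from software (or specialized hardware) that knows where the zeros are, not from ordinary multiply instructions finishing sooner when an operand happens to be 0.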
@nothingtoseehere5760 20 hours ago
I am extremely grateful for this detailed explanation, but I am left with a lot of questions. Can I PM you?
@EfficientNLP 16 hours ago
Sure, happy to discuss over LinkedIn!
@andrea-mj9ce 9 months ago
The explanation of distillation stays at the surface level; it is not enough to really understand it.
@EfficientNLP 9 months ago
If you have any specific questions I’ll try to answer them!
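
For anyone who wants one level more detail, here is a minimal sketch of the standard soft-target distillation loss (the common Hinton-style formulation; this is my own illustration and may differ in details from what the video describes): the student is trained to match the teacher's temperature-softened output distribution in addition to the usual hard labels.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: KL divergence between temperature-softened distributions.
        # The T*T factor keeps gradient magnitudes comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label term: ordinary cross-entropy against the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Dummy batch: teacher logits come from the large frozen model,
    # student logits from the small model being trained.
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    distillation_loss(student_logits, teacher_logits, labels).backward()

The resulting student can afterwards be quantized as well, which is the combination asked about earlier in the comments.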