Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

25,290 views

Efficient NLP

Comments: 35
@thomasschmitt9669 9 months ago
This was one of the best explanation videos I have ever seen! Well structured and at the right level of complexity to follow without getting a headache. 👌
@muhannadobeidat 8 months ago
Excellent video. Well spoken. Nice visualizations.
@420_gunna 10 months ago
This felt very nicely taught -- I loved that you circled back with a summary/review at the end of the video; great practice. Please continue, thank you!
@vineetkumarmishra2989 9 months ago
wonderfully explained !! Thanks for the video.
@carlpeterson8279 5 months ago
Great summary/outline at 17:16. This video covers a lot of relevant topics for neural networks and edge AI.
@jokmenen_ 9 months ago
Awesome video!
@heteromodal 11 months ago
What a great video! Thank you!
@huiwencheng4585 11 months ago
Fantastic introduction and explanation !
@bonob0123 6 months ago
That was really nicely done. As a non-expert, I feel like I now have a good general idea of what a quantized model is. Thank you!
@ljkeller_yt 8 months ago
Great format, succinctness, and diagrams. Thank you!
@RamBabuB-r9s 1 year ago
Your teaching is excellent. We expect many more videos from you to help us understand the fundamentals of NLP.
@kevon217 1 year ago
^
@unclecode 10 months ago
Great content, well done. Please make a video on ONNX, and another one on Flash Attention. Much appreciated.
@kevon217 1 year ago
Thanks for this!
@xuantungnguyen9719 3 months ago
You are a good teacher
@AmishaHSomaiya 7 months ago
Great summary, thank you.
@jeremyuzan1169 8 months ago
Great video
@xiaoxiandong7382 2 months ago
super clear
@MuhammadAli-dw7mv 7 months ago
nicely done
@tosinadekunle646 3 months ago
Thank you for the video, sir. Is quantization just a feature-engineering task of enforcing data types that take up less space, or is it more than that?
@EfficientNLP 3 months ago
I'm not sure if this is what you're asking, but model quantization is not related to feature engineering or enforcing data types; it is a set of methods for making a model more space- or compute-efficient after training.
@tosinadekunle646 3 months ago
@EfficientNLP But in the examples you showed, the data types of both the input features and the weights are changed to, for example, int8, which I think can be done in a line of code like model.weight.torch.int8. From that, it looks like we must make sure the dataset is stored in a data type that uses less memory, and that this is done before model training. What do you think, sir?
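
To make the distinction in this exchange concrete, below is a minimal sketch of post-training dynamic quantization (my own illustration, assuming PyTorch's quantize_dynamic API; not necessarily the exact workflow from the video). Note that the dataset and the training loop are untouched; the int8 conversion is applied to the already-trained model's weights.

    import torch
    import torch.nn as nn

    # A small model standing in for an already-trained network
    # (in practice the weights would come from normal float32 training).
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    model.eval()

    # Post-training dynamic quantization: Linear weights are stored as int8;
    # activations are quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 128)    # inputs remain float32; the dataset is unchanged
    print(quantized(x).shape)  # inference now uses int8 weight matmuls

So quantization is not about storing the training data in a smaller data type; it is applied to the finished model to shrink its weights and speed up inference.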
@DurgaNagababuMolleti 6 months ago
Superb
@hrsight 8 months ago
nice video
@yunlu4657 11 months ago
Excellent video, learnt a lot! However, the definition of zero-point quantization is off. What you're showing in the video is the abs-max quantization instead.
@EfficientNLP 11 months ago
The example I showed is zero-point quantization because 0 in the original domain is mapped to 0 in the quantized domain (before transforming to unsigned). In abs-max (not covered in this video), the maximum in the original domain would be mapped to 127, and the minimum would be mapped to -128.
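
As a rough illustration of the two families discussed above, here is a small numeric sketch using common textbook formulations (my own example values; naming conventions vary between sources, so treat this as a generic reference rather than a transcription of the video):

    import numpy as np

    x = np.array([-1.2, 0.0, 0.8, 3.0], dtype=np.float32)

    # Abs-max (symmetric): the scale is set by the largest magnitude,
    # so 3.0 maps to 127 and 0.0 maps to 0; no offset is stored.
    scale_am = np.abs(x).max() / 127.0
    q_am = np.clip(np.round(x / scale_am), -128, 127).astype(np.int8)

    # Asymmetric (min-max) quantization with a zero-point offset: the scale
    # covers the full [min, max] range and an integer zero point is chosen
    # so that the real value 0.0 is exactly representable on the grid.
    scale_zp = (x.max() - x.min()) / 255.0
    zero_point = np.round(-x.min() / scale_zp) - 128
    q_zp = np.clip(np.round(x / scale_zp) + zero_point, -128, 127).astype(np.int8)

    print(q_am)  # [-51   0  34 127]
    print(q_zp)  # [-128 -55  -6 127]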
@ricardokullock2535 7 months ago
And if one were to quantize a distilled model, would the outcome be any good?
@EfficientNLP 7 months ago
Yes, these two techniques are often used together to improve efficiency.
@julians7785 16 days ago
I heard that multiply-by-zero operations are faster to process. Are you sure all operations take the same amount of time?
@EfficientNLP 16 days ago
Generally, for most instructions and most hardware, arithmetic takes a fixed amount of time per operation and does not get faster if the inputs are zeros. However, multiplying by 0 could be faster if the software logic checks for zeros to skip some operations, like in sparsity-aware methods.
@julians7785 16 days ago
@@EfficientNLP Thanks for the reply!
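
A toy sketch of the sparsity-aware shortcut mentioned in the reply (illustrative only; it assumes SciPy is available): the dense multiply spends the same time on zeros as on any other value, while a sparse matrix format simply never stores or visits them.

    import numpy as np
    from scipy.sparse import csr_matrix

    rng = np.random.default_rng(0)
    W = rng.standard_normal((1024, 1024)).astype(np.float32)
    W[rng.random(W.shape) < 0.9] = 0.0   # prune ~90% of the weights to exact zeros
    x = rng.standard_normal(1024).astype(np.float32)

    dense_out = W @ x                    # multiplies every entry, zeros included

    W_sparse = csr_matrix(W)             # stores only the ~10% nonzero entries
    sparse_out = W_sparse @ x            # the stored zeros are skipped entirely

    print(np.allclose(dense_out, sparse_out, atol=1e-4))  # same result

Whether this is actually faster depends on the sparsity level and the kernel; the point is that any speedup comes from software (or specialized hardware) that knows where the zeros are, not from ordinary multiply instructions finishing sooner when an operand happens to be 0.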
@nothingtoseehere5760 20 hours ago
I am extremely grateful for this detailed explanation, but I am left with a lot of questions. Can I PM you?
@EfficientNLP 16 hours ago
Sure, happy to discuss over LinkedIn!
@andrea-mj9ce 9 months ago
The explanation of distillation stays at the surface level; it is not enough to really understand it.
@EfficientNLP 9 months ago
If you have any specific questions I’ll try to answer them!
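
For anyone who wants one level more detail, here is a minimal sketch of the standard soft-target distillation loss (the common Hinton-style formulation; this is my own illustration and may differ in details from what the video describes): the student is trained to match the teacher's temperature-softened output distribution in addition to the usual hard labels.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft-target term: KL divergence between temperature-softened distributions.
        # The T*T factor keeps gradient magnitudes comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label term: ordinary cross-entropy against the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

    # Dummy batch: teacher logits come from the large frozen model,
    # student logits from the small model being trained.
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    distillation_loss(student_logits, teacher_logits, labels).backward()

The resulting student can afterwards be quantized as well, which is the combination asked about earlier in the comments.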