Understanding int8 neural network quantization

1,655 views

Oscar Savolainen

1 day ago

Comments: 6
@ShAqSaif 11 months ago
🔥
@none-hr6zh 5 months ago
Fake quantization means quantize and dequantize, but how does that benefit us? We are converting float to int and then int back to float. Can you please elaborate?
@OscarSavolainen 7 days ago
So it depends on quite a lot. For some use cases, fake-quant is only useful for determining good quantization parameters, and then one has to actually convert the model to a "true" quantized model to run on integer arithmetic. However, in certain cases, fake-quant is itself the goal. For example, certain hardware (e.g. Intel) often does the quantized operations in fake-quant space. There are other examples that are a bit of a mix. During LLM inference, we typically use quantization to compress the weight tensors, packing multiple weight elements into a single int32 value, but the actual matrix multiplication happens in floating point. The kernel dequantizes the compressed weight tensors on the fly and does the matrix multiplication in floating-point space; quantization is just used to compress the weight tensors and reduce the amount of data loaded from GPU global memory.
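The weight-packing idea in the reply can be sketched as follows. This is a minimal illustration, not any particular kernel's layout: assuming unsigned 4-bit weights, eight of them fit in one int32, and the matmul kernel would unpack them on the fly before multiplying in floating point. The helper names `pack_int4`/`unpack_int4` are made up for this example.

```python
def pack_int4(ws):
    """Pack eight unsigned 4-bit weights into a single 32-bit integer,
    in the spirit of weight-only LLM quantization kernels."""
    assert len(ws) == 8 and all(0 <= w < 16 for w in ws)
    packed = 0
    for i, w in enumerate(ws):
        packed |= w << (4 * i)  # each weight occupies its own 4-bit lane
    return packed

def unpack_int4(packed):
    """Recover the eight 4-bit weights (a real kernel does this on the
    fly, then dequantizes and multiplies in floating point)."""
    return [(packed >> (4 * i)) & 0xF for i in range(8)]

ws = [3, 15, 0, 7, 1, 9, 4, 12]
assert unpack_int4(pack_int4(ws)) == ws  # round-trips exactly
```

The payoff is purely memory bandwidth: one int32 load brings in eight weights, so far less data moves from GPU global memory even though the arithmetic stays in floating point.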
@none-hr6zh 5 months ago
What is fake quantization? Why is it called fake?
@OscarSavolainen 7 days ago
Because it doesn't actually convert the tensor to a different dtype, it stays as fp32. However, it simulates the effect of quantization via the quantization modules attached to weights and activations.
@Sara-gm6on 11 months ago
"Promo SM" 😏