Fake quantization means quantize and then dequantize, but how does that benefit anything? We are just converting float to int and then int back to float. Can you please elaborate?
@OscarSavolainen · 7 days ago
So it depends on quite a lot. For some use cases, fake-quant is only useful for determining good quantization parameters, and then one has to actually convert the model to a "true" quantized model to run on integer arithmetic. However, in certain cases, fake-quant is itself the goal. For example, certain hardware (e.g. Intel's) often does the quantized operations in fake-quant space. There are other examples too that are a bit of a mix. During LLM inference, we typically use quantization to compress the weight tensors, packing multiple weight elements into a single int32 value, but the actual matrix multiplication happens in floating point. The kernel dequantizes the compressed weight tensors on the fly and does the matrix multiplication in floating-point space; quantization is just used to compress the weight tensors and reduce the amount of data being loaded from GPU global memory.
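Here's a minimal sketch of that last pattern (weight-only quantization with on-the-fly dequantization), just to make it concrete. The int4 format, the helper names, and the packing layout are my own illustrative assumptions, not anything from the video or a real kernel:

```python
# Hypothetical sketch: symmetric int4 weight-only quantization, packing
# 8 int4 values into each int32, then dequantizing on the fly before a
# floating-point matmul. Helper names are made up for illustration.
import torch

def quantize_int4(w: torch.Tensor):
    # Symmetric per-tensor int4: values land in [-8, 7].
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int32)
    return q, scale

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    # Pack 8 signed int4 values into each int32 (assumes numel % 8 == 0).
    q = (q & 0xF).view(-1, 8)  # keep the two's-complement low nibble
    packed = torch.zeros(q.shape[0], dtype=torch.int32)
    for i in range(8):
        packed |= q[:, i] << (4 * i)
    return packed

def unpack_dequant(packed: torch.Tensor, scale, shape):
    # Unpack on the fly and return fp32 weights for the matmul.
    nibbles = [(packed >> (4 * i)) & 0xF for i in range(8)]
    q = torch.stack(nibbles, dim=1).view(shape)
    q = torch.where(q >= 8, q - 16, q)  # restore the sign of each nibble
    return q.float() * scale

w = torch.randn(128, 64)
q, scale = quantize_int4(w)
packed = pack_int4(q)                    # 8x smaller than fp32 in memory
w_deq = unpack_dequant(packed, scale, w.shape)
y = torch.randn(4, 128) @ w_deq          # matmul still runs in fp32
```

The win is bandwidth: the packed tensor is 8x smaller than the fp32 original, so far less data moves from global memory, while the arithmetic itself never leaves floating point.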
@none-hr6zh · 5 months ago
What is fake quantization? Why is it called "fake"?
@OscarSavolainen · 7 days ago
Because it doesn't actually convert the tensor to a different dtype; it stays fp32. However, it simulates the effect of quantization via the quantization modules attached to weights and activations.
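A minimal sketch of the quantize-dequantize round trip, assuming a simple symmetric int8 scheme (this is the generic idea, not the specific modules from the video):

```python
# Hypothetical sketch: "fake" quantization keeps the tensor in fp32 but
# snaps its values onto the int8 grid, so the rounding/clipping error of
# real quantization is simulated while everything stays floating point.
import torch

def fake_quantize(x: torch.Tensor, scale: float, zero_point: int = 0):
    q = torch.clamp(torch.round(x / scale) + zero_point, -128, 127)  # quantize
    return (q - zero_point) * scale                                  # dequantize

x = torch.randn(4, 4)
scale = x.abs().max().item() / 127.0
x_fq = fake_quantize(x, scale)
print(x_fq.dtype)              # torch.float32 -- the dtype never changed
print((x - x_fq).abs().max())  # the simulated quantization error
```

If I recall correctly, PyTorch ships a built-in op for this, torch.fake_quantize_per_tensor_affine, which does essentially the same round trip; the sketch above just spells out what it computes.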