How to Quantize a ResNet from Scratch! Full Coding Tutorial (Eager Mode)

1,083 views

Oscar Savolainen

1 day ago

Comments: 15
@shashankshekhar7052 8 months ago
Amazing piece of work!
@mayaambalapat3671 6 months ago
Great video! And thanks for taking us through the thought process of coding each line.
@OscarSavolainen 6 months ago
No problem, I’m glad it was useful!
@archieloong1118 6 months ago
nice tutorial👍
@OscarSavolainen 6 months ago
Thanks 🙂
@haianhle9984 9 months ago
awesome project
@Alexandra-l6v9o 2 months ago
Thanks for the video! I have 2 questions: 1) when you renamed layers from .relu to .relu1 and .relu_out in the model definition, wouldn't that affect correctly loading weights from the pretrained checkpoint? 2) what if the model has LeakyReLU instead of ReLU, or GroupNorm instead of BatchNorm - does it mean we can't fuse them with conv?
@OscarSavolainen 8 days ago
For 1), fortunately ReLUs are stateless, so there aren't any parameters! As a result, it'll load fine. For 2), fusing of layers in Eager mode is currently limited to ReLU, but if you use the more modern FX Graph mode quantization (I have a 3-part series on that if you want, and the docs are great: pytorch.org/docs/stable/fx.html), that will fuse leaky ReLUs / PReLUs as well! If you really want to do it in Eager mode you can do it manually by writing some complicated wrapper, but I wouldn't recommend it (speaking as someone who did it once). A lot of hardware, e.g. Intel via OpenVINO, does support fusing of "advanced" activations and layers, so it's mainly just an Eager mode PyTorch limitation!
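For reference, here is a minimal sketch of what FX Graph mode post-training quantization looks like, assuming torchvision's resnet18 and the torch.ao.quantization.quantize_fx API; fusion of eligible patterns (e.g. Conv-BN-ReLU) happens automatically during prepare_fx:

```python
# Minimal FX Graph mode PTQ sketch (assumes a recent PyTorch + torchvision).
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torchvision.models import resnet18

model = resnet18(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Default qconfig mapping for the fbgemm (x86) backend.
qconfig_mapping = get_default_qconfig_mapping("fbgemm")

# prepare_fx symbolically traces the model, fuses supported patterns,
# and inserts observers where qparams need to be collected.
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibration: run a few representative batches through the observed model.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(8, 3, 224, 224))

# convert_fx swaps the observed modules for quantized ones.
quantized = convert_fx(prepared)
```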
@YCL89 2 months ago
If I want to quantize a layer, what kind of function can I use instead of the PerChannelMinMax one?
@OscarSavolainen 8 days ago
Really, there are a handful of observers available, the main one being PerChannelMinMax (best for weight tensors, since it gives per-channel resolution). A good one for activations is HistogramObserver: it calculates the MSE between the floating-point and fake-quant activations and uses that to assign qparams, but it is slow. There are observers that allow one to fix qparams at given values (e.g. discuss.pytorch.org/t/fixed-scale-and-zero-point-with-fixedqparamsobserver/200306), but you can also do that via the learnable ones. Reading the observer docs is a good place to start: pytorch.org/docs/stable/quantization-support.html
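As an illustration, here is a minimal sketch of a custom Eager mode QConfig that pairs PerChannelMinMaxObserver (weights) with HistogramObserver (activations); the specific dtypes and qschemes below are assumptions, not prescriptions:

```python
# Custom QConfig sketch for Eager mode quantization (recent PyTorch).
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import (
    HistogramObserver,
    PerChannelMinMaxObserver,
)

custom_qconfig = QConfig(
    # Activations: histogram-based observer (slower, but usually better qparams).
    activation=HistogramObserver.with_args(
        dtype=torch.quint8, qscheme=torch.per_tensor_affine
    ),
    # Weights: per-channel min/max, symmetric int8 as commonly used for conv weights.
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)

# Attach it to the model (or to individual submodules) before preparing:
# model.qconfig = custom_qconfig
# torch.ao.quantization.prepare(model, inplace=True)
```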
@Chendadon 5 months ago
Thanks for that! Please make a tutorial on quantizing for TensorRT, and on deploying to NVIDIA hardware.
@OscarSavolainen 8 days ago
Sorry for the extremely slow reply! I ended up creating a whole open-source project that allows one to deploy one's model to basically any hardware, including TensorRT: github.com/saifhaq/alma. Short answer is, torch.compile with the TensorRT backend is a good way to do it!
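For illustration, a minimal sketch of the torch.compile-plus-TensorRT route, assuming the torch_tensorrt package is installed (importing it registers the backend used below); the exact option names may vary between versions:

```python
# Compile a model for NVIDIA GPUs via the TensorRT backend of torch.compile.
import torch
import torch_tensorrt  # noqa: F401 - registers the "torch_tensorrt" backend
from torchvision.models import resnet18

model = resnet18(weights="DEFAULT").eval().cuda()
example_input = torch.randn(1, 3, 224, 224, device="cuda")

# Backend-specific options (e.g. allowed precisions) are passed via `options`.
trt_model = torch.compile(
    model,
    backend="torch_tensorrt",
    options={"enabled_precisions": {torch.float16}},
)

with torch.no_grad():
    out = trt_model(example_input)  # first call triggers TensorRT compilation
```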
@haianhle9984 9 months ago
Can you introduce a benchmark between resnet and quant_resnet? Thank you so much.
@OscarSavolainen 9 months ago
Sure, I can add it in future videos and onto the GitHub repo! Generally, ResNets are measured on top-1 accuracy (i.e. does the model classify an image into the correct class). So far I've only been dealing with one image, so top-1 accuracy isn't a great metric. I'll get some more validation data for future videos!
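In the meantime, a minimal sketch of the kind of benchmark asked about: a top-1 accuracy loop that can be run on both the float and quantized models. The val_loader, fp32_model and quant_model names are placeholders, not from the video:

```python
# Top-1 accuracy comparison sketch for a float vs. quantized ResNet.
import torch

def top1_accuracy(model, val_loader, device="cpu"):
    """Fraction of validation images whose argmax prediction matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# print(f"fp32 top-1: {top1_accuracy(fp32_model, val_loader):.4f}")
# print(f"int8 top-1: {top1_accuracy(quant_model, val_loader):.4f}")
```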