How to Quantize a ResNet from Scratch! Full Coding Tutorial (Eager Mode)

1,083 views

Oscar Savolainen

1 day ago

Comments: 15
@shashankshekhar7052 8 months ago
Amazing piece of work!
@mayaambalapat3671 6 months ago
Great video! And thanks for taking us through the thought process of coding each line.
@OscarSavolainen 6 months ago
No problem, I’m glad it was useful!
@archieloong1118 6 months ago
nice tutorial👍
@OscarSavolainen 6 months ago
Thanks 🙂
@haianhle9984 9 months ago
awesome project
@Alexandra-l6v9o 2 months ago
Thanks for the video! I have 2 questions: 1) when you renamed layers from .relu to .relu1 and .relu_out in the model definition, wouldn't that affect correctly loading weights from the pretrained checkpoint? 2) what if the model has LeakyReLU instead of ReLU, or GroupNorm instead of BatchNorm - does it mean we can't fuse them with conv?
@OscarSavolainen 8 days ago
For 1), fortunately ReLUs are stateless, so there aren't any parameters! As a result, it'll load fine. For 2), fusing of layers in Eager mode is currently limited to ReLU, but if you use the more modern FX Graph mode quantization (I have a 3-part series on that if you want, and the docs are great: pytorch.org/docs/stable/fx.html), that will fuse leaky ReLUs / PReLUs as well! If you really want to do it in Eager mode you can do it manually by writing some complicated wrapper, but I wouldn't recommend it (speaking as someone who did it once). A lot of hardware, e.g. Intel via OpenVINO, does support fusing of "advanced" activations and layers, so it's mainly just an Eager mode PyTorch limitation!
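For reference, here is a minimal sketch of what FX Graph mode post-training quantization looks like, assuming torchvision's resnet18 and the torch.ao.quantization.quantize_fx API; fusion of eligible patterns (e.g. Conv-BN-ReLU) happens automatically during prepare_fx:

```python
# Minimal FX Graph mode PTQ sketch (assumes a recent PyTorch + torchvision).
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx
from torchvision.models import resnet18

model = resnet18(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 224, 224),)

# Default qconfig mapping for the fbgemm (x86) backend.
qconfig_mapping = get_default_qconfig_mapping("fbgemm")

# prepare_fx symbolically traces the model, fuses supported patterns,
# and inserts observers where qparams need to be collected.
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibration: run a few representative batches through the observed model.
with torch.no_grad():
    for _ in range(10):
        prepared(torch.randn(8, 3, 224, 224))

# convert_fx swaps the observed modules for quantized ones.
quantized = convert_fx(prepared)
```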
@YCL89 2 months ago
If I want to quantize a layer, what kind of function can I use instead of the PerChannelMinMax one?
@OscarSavolainen 8 days ago
Really, there are a handful of observers available, the main one being PerChannelMinMax (best for weight tensors, since it gives per-channel resolution). A good one for activations is HistogramObserver: it calculates the MSE between the floating-point and fake-quant activations and uses that to assign qparams, but it is slow. There are observers that allow one to fix qparams at given values (e.g. discuss.pytorch.org/t/fixed-scale-and-zero-point-with-fixedqparamsobserver/200306), but you can also do that via the learnable ones. Reading the observer docs is a good place to start: pytorch.org/docs/stable/quantization-support.html
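As an illustration, here is a minimal sketch of a custom Eager mode QConfig that pairs PerChannelMinMaxObserver (weights) with HistogramObserver (activations); the specific dtypes and qschemes below are assumptions, not prescriptions:

```python
# Custom QConfig sketch for Eager mode quantization (recent PyTorch).
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import (
    HistogramObserver,
    PerChannelMinMaxObserver,
)

custom_qconfig = QConfig(
    # Activations: histogram-based observer (slower, but usually better qparams).
    activation=HistogramObserver.with_args(
        dtype=torch.quint8, qscheme=torch.per_tensor_affine
    ),
    # Weights: per-channel min/max, symmetric int8 as commonly used for conv weights.
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)

# Attach it to the model (or to individual submodules) before preparing:
# model.qconfig = custom_qconfig
# torch.ao.quantization.prepare(model, inplace=True)
```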
@Chendadon 5 months ago
Thanks for that! Please make a tutorial on quantizing for TensorRT, and on deploying to NVIDIA hardware.
@OscarSavolainen 8 days ago
Sorry for the extremely slow reply! I ended up creating a whole open-source project that allows one to deploy one's model to basically any hardware, including TensorRT: github.com/saifhaq/alma. Short answer is, torch.compile with the TensorRT backend is a good way to do it!
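For illustration, a minimal sketch of the torch.compile-plus-TensorRT route, assuming the torch_tensorrt package is installed (importing it registers the backend used below); the exact option names may vary between versions:

```python
# Compile a model for NVIDIA GPUs via the TensorRT backend of torch.compile.
import torch
import torch_tensorrt  # noqa: F401 - registers the "torch_tensorrt" backend
from torchvision.models import resnet18

model = resnet18(weights="DEFAULT").eval().cuda()
example_input = torch.randn(1, 3, 224, 224, device="cuda")

# Backend-specific options (e.g. allowed precisions) are passed via `options`.
trt_model = torch.compile(
    model,
    backend="torch_tensorrt",
    options={"enabled_precisions": {torch.float16}},
)

with torch.no_grad():
    out = trt_model(example_input)  # first call triggers TensorRT compilation
```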
@haianhle9984 9 months ago
Can you introduce a benchmark between resnet and quant_resnet? Thank you so much.
@OscarSavolainen 9 months ago
Sure, I can add it in future videos and onto the GitHub repo! Generally, ResNets are measured on top-1 accuracy (i.e. does the model classify an image into the correct class). So far I've only been dealing with one image, so top-1 accuracy isn't a great metric. I'll get some more validation data for future videos!
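In the meantime, a minimal sketch of the kind of benchmark asked about: a top-1 accuracy loop that can be run on both the float and quantized models. The val_loader, fp32_model and quant_model names are placeholders, not from the video:

```python
# Top-1 accuracy comparison sketch for a float vs. quantized ResNet.
import torch

def top1_accuracy(model, val_loader, device="cpu"):
    """Fraction of validation images whose argmax prediction matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# print(f"fp32 top-1: {top1_accuracy(fp32_model, val_loader):.4f}")
# print(f"int8 top-1: {top1_accuracy(quant_model, val_loader):.4f}")
```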