Great video! And thanks for taking us through the thought process of coding each line.
@OscarSavolainen 6 months ago
No problem, I’m glad it was useful!
@archieloong1118 6 months ago
nice tutorial👍
@OscarSavolainen 6 months ago
Thanks 🙂
@haianhle9984 9 months ago
awesome project
@Alexandra-l6v9o 2 months ago
Thanks for the video! I have two questions: 1) when you renamed the layers from .relu to .relu1 and .relu_out in the model definition, wouldn't that break loading the weights correctly from the pretrained checkpoint? 2) what if the model has LeakyReLU instead of ReLU, or GroupNorm instead of BatchNorm: does that mean we can't fuse them with the conv?
@OscarSavolainen 8 days ago
For 1), fortunately ReLUs are stateless, so there aren't any parameters! As a result, it'll load fine. For 2), fusing of layers in eager mode is currently limited to ReLU, but if you use the more modern FX Graph mode quantization (I have a 3-part series on that if you want, and the docs are great: pytorch.org/docs/stable/fx.html), that will fuse leaky ReLUs / PReLUs as well! If you really want to do it in eager mode you can do it manually by writing some complicated wrapper, but I wouldn't recommend it (speaking as someone who did it once). A lot of hardware, e.g. Intel via OpenVINO, does support fusing of "advanced" activations and layers, so it's mainly just an eager mode PyTorch limitation!
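To see why ReLU fuses into a quantized conv essentially for free: the requantization step after the int32 accumulator already clamps the output to a range, and setting the lower clamp bound to the zero point is exactly a ReLU. Here is a minimal pure-Python sketch of that idea; the function name and the scalar requantization are illustrative assumptions, not PyTorch's actual implementation:

```python
def requantize_with_fused_relu(acc_int32, out_scale, zero_point, qmax=255):
    """Requantize a conv accumulator and fold ReLU into the clamp.

    In the quantized domain, real value 0.0 maps to `zero_point`, so
    clamping the low end at `zero_point` (rather than qmin=0) is a ReLU.
    """
    q = round(acc_int32 * out_scale) + zero_point
    return max(zero_point, min(qmax, q))

# A negative pre-activation collapses to the zero point (i.e. real 0.0):
print(requantize_with_fused_relu(-100, 0.1, 10))   # 10
# A positive pre-activation just passes through requantization:
print(requantize_with_fused_relu(1000, 0.1, 10))   # 110
```

LeakyReLU has no such single-clamp form (the negative slope changes the effective scale below zero), which is one reason eager mode doesn't fuse it.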
@YCL89 2 months ago
If I want to quantize a layer, what kind of function can I use instead of the PerChannelMinMax one?
@OscarSavolainen 8 days ago
There are really only a handful of observers available, the main one being PerChannelMinMax (best for weight tensors, since it gives per-channel resolution). A good one for activations is HistogramObserver: it calculates the MSE between the floating-point and fake-quantized activations and uses that to assign qparams, but it is slow. There are observers that let you fix qparams at given values (e.g. discuss.pytorch.org/t/fixed-scale-and-zero-point-with-fixedqparamsobserver/200306), but you can also do that via the learnable ones. Reading the observer docs is a good place to start! pytorch.org/docs/stable/quantization-support.html
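For intuition about what PerChannelMinMax actually does: it tracks each output channel's extreme values and derives one scale per channel. A minimal pure-Python sketch of the symmetric variant typically used for weights (the helper name and list-of-rows layout are my own for illustration, not the PyTorch observer API):

```python
def per_channel_minmax_scales(weight_rows, qmax=127):
    """Compute one symmetric quantization scale per output channel.

    weight_rows: list of per-channel weight lists (channel 0 first).
    Symmetric scheme: zero point is 0 and scale = max|w| / qmax.
    """
    scales = []
    for row in weight_rows:
        max_abs = max(abs(w) for w in row)
        scales.append(max_abs / qmax if max_abs else 1.0)
    return scales

weights = [[1.0, -2.0], [0.5, 0.25]]
# Channel 0 gets scale 2.0/127, channel 1 gets 0.5/127:
print(per_channel_minmax_scales(weights))
```

A per-tensor observer would instead use one global max|w| for all channels, so a single large-range channel would waste resolution on the small-range ones; that is why per-channel is preferred for weights.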
@Chendadon 5 months ago
Thanks for that! Please make a tutorial on quantizing for TensorRT and deploying to NVIDIA hardware.
@OscarSavolainen 8 days ago
Sorry for the extremely slow reply! I ended up creating a whole open-source project that lets one deploy one's model to basically any hardware, including TensorRT! github.com/saifhaq/alma Short answer is, torch.compile with the TensorRT backend is a good way to do it!
@haianhle9984 9 months ago
Can you introduce a benchmark between resnet and quant_resnet? Thank you so much!
@OscarSavolainen 9 months ago
Sure, I can add it in future videos and onto the GitHub! Generally, ResNets are measured on top-1 accuracy (i.e. does the model's highest-scoring class match the correct class for each image). So far I've only been dealing with one image, so top-1 accuracy isn't a great metric. I'll get some more validation data for future videos!
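Top-1 accuracy itself is simple to compute once you have per-class scores and labels. A minimal framework-free sketch (the function name and plain-list inputs are illustrative; in practice you'd use model logits over a validation set):

```python
def top1_accuracy(logits_batch, labels):
    """Fraction of samples whose highest-scoring class matches the label.

    logits_batch: list of per-class score lists, one per sample.
    labels: ground-truth class indices, same length as logits_batch.
    """
    correct = 0
    for scores, label in zip(logits_batch, labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == label
    return correct / len(labels)

logits = [[0.1, 0.9], [0.8, 0.2]]
print(top1_accuracy(logits, [1, 1]))  # 0.5
```

For a float-vs-quantized benchmark, you'd run both the original and the quantized model over the same validation images and compare their top-1 scores (and, typically, latency and model size).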