Only pruning the network doesn't actually speed up the network. It actually slows down the inference latency. (cause most AI frameworks we use doesn't support sparse matrix calculation) The most important thing is to install sparsity calculation supported kernel in your system.
@kartikpodugu Жыл бұрын
Can pruning be done after completing the training ? For example, I have a pre-trained model. Using PTQ (Post Training Quantization), we Quantize the model, similarly can we do Post Training Pruning also ?