Pytorch Transfer Learning and Fine Tuning Tutorial

  48,185 views

Aladdin Persson

A day ago

In this tutorial we show how to do transfer learning and fine tuning in Pytorch!
❤️ Support the channel ❤️
/ @aladdinpersson
Paid Courses I recommend for learning (affiliate links, no extra cost for you):
⭐ Machine Learning Specialization bit.ly/3hjTBBt
⭐ Deep Learning Specialization bit.ly/3YcUkoI
📘 MLOps Specialization bit.ly/3wibaWy
📘 GAN Specialization bit.ly/3FmnZDl
📘 NLP Specialization bit.ly/3GXoQuP
✨ Free Resources that are great:
NLP: web.stanford.edu/class/cs224n/
CV: cs231n.stanford.edu/
Deployment: fullstackdeeplearning.com/
FastAI: www.fast.ai/
💻 My Deep Learning Setup and Recording Setup:
www.amazon.com/shop/aladdinpe...
GitHub Repository:
github.com/aladdinpersson/Mac...
✅ One-Time Donations:
Paypal: bit.ly/3buoRYH
▶️ You Can Connect with me on:
Twitter - / aladdinpersson
LinkedIn - / aladdin-persson-a95384153
Github - github.com/aladdinpersson

Comments: 54
@rodrigolopezportillo9225 3 years ago
Thank you! Even after having used PyTorch for some years there is always something new to learn in your videos :)
@speed-stick 3 years ago
Thank you so much man, this helped me understand some of the logic I was having trouble with.
@ruotianzhang3139 3 years ago
Thank you so much! You set me free from the PyTorch documentation! Your videos deserve more thumbs up!
@AladdinPersson 3 years ago
Thank you for the kind words! I believe we will never be free from the documentation, but maybe one day 🙃
@user-nv5vj4ws7y 2 years ago
Thanks, it's such a great video!
@victorsuarezvara5909 3 years ago
I'm from Barcelona and I love you, thanks for your work.
@AladdinPersson 3 years ago
Love you too
@soonapaana24 3 years ago
give this man a medal
@murtazajabalpurwala8124 2 years ago
very nice video
@yurig1756 4 years ago
Thanks!
@AladdinPersson 4 years ago
No problemo :)
@frankrobert9199 1 year ago
great. great. great
@ale7sani 3 years ago
Thank you for a great explanation Aladdin. Just a quick question regarding the model that you referred to in your video, the one with the 89% accuracy 8:15. I couldn't find it in the previous video. Can you tell me which video you were referring to?
@aurkom 2 years ago
Same question.
@elifcerengok8274 2 years ago
I would like to know how the model knows to fine-tune only the classifier part when we only write: for param in model.parameters(): param.requires_grad = False. Do I need to specify optimizer = optim.Adam(model.classifier.parameters(), lr=learning_rate) to make sure only the classifier layer is fine-tuned? Thanks :) 👍
@owenlie 2 years ago
I don't understand the average pooling part. Why did you change it to Identity? I've re-watched that part so many times but still have no idea why you did that.
@canernm 2 years ago
Hi and thank you for the detailed video :) I do have one question: you showcase how we can freeze the entire pretrained VGG model. How can someone freeze only part of the VGG model though? For example, leave the last 5 layers of the 'features' part trainable. Would I have to alter the for-loop you used so that it affects only the parameters corresponding to the layers I am interested in changing? Thanks.
@sdwysc 2 years ago
You can enumerate the parameters to find their indices, e.g. for i1, param in enumerate(net.parameters()): print(i1, param), and then freeze everything below some cutoff with if i1 < cutoff: param.requires_grad = False (cutoff being the index of the first parameter you want to leave trainable).
@donfeto7636 1 year ago
In line 31 we can freeze using list(model.parameters())[:-2] (parameters() returns a generator, so it needs list() before slicing); this leaves the last two parameter tensors trainable.
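A minimal sketch of partial freezing along these lines (assuming torchvision's VGG16; the cutoff index is hypothetical and would be chosen by first printing the enumerated parameters):

    import torchvision.models as models

    model = models.vgg16(pretrained=True)

    cutoff = 20  # hypothetical: index of the first parameter tensor to leave trainable
    for i, param in enumerate(model.parameters()):
        if i < cutoff:
            param.requires_grad = False  # freeze early parameters only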
@anastasiaberlianna1743 1 year ago
Great! Could you make a tutorial on using a pretrained ResNet34 as a UNet encoder?
@yixinwu5797 2 years ago
I have a question: what do the accuracies of 53%, 63%, and 89% correspond to? And when pretrained=False, does that mean VGG is trained from scratch?
@CHANDANSAH-ew9rk 1 year ago
Earlier the fully connected layer took a 25088-dimensional input, but now the fully connected layer is taking a 512-dimensional input. So the code should return an error, right? Since the adaptive avgpool (nn.AdaptiveAvgPool2d) takes 512*7*7 as input and gives 512*7*7 as output.
@jericho1751 3 years ago
Hi, I tried to run your code but I keep getting "RuntimeError: CUDA error: device-side assert triggered" on the line model.to(device). May I know what the problem is? Thanks!
@user-mc9rt9eq5s 3 years ago
Thank you so much! I have a problem with my implementation. I am using a dataset of grayscale images (256*256) with batch_size = 5, and I get a runtime error during the training loop: RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 3, 3], but got 3-dimensional input of size [5, 256, 256] instead. My model is pretty simple: model.avgpool = Identity() and model.classifier = nn.Linear(512, 3). Any help?
@AladdinPersson 3 years ago
You need to .unsqueeze(1) so you have a channel dimension. The model we are using requires 3 channels as input (RGB), so if you're working with grayscale I would probably just duplicate the same channel three times with torch.repeat or something like that, and then you can send it through your preferred model.
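A minimal sketch of that fix (assuming a grayscale batch of shape [5, 256, 256], as in the error above):

    import torch

    x = torch.randn(5, 256, 256)  # batch of grayscale images
    x = x.unsqueeze(1)            # add a channel dimension -> [5, 1, 256, 256]
    x = x.repeat(1, 3, 1, 1)      # duplicate the channel to mimic RGB -> [5, 3, 256, 256]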
@ximingdong503 3 years ago
Hi, for the last part where we train just the classifier, do we need to write optimizer = torch.optim.Adam(model.classifier.parameters()), or is your way fine, or are both of them OK?
@AladdinPersson 3 years ago
You can write model.parameters() if you make sure to disable gradients with requires_grad=False for the layers you don't want to train
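A minimal sketch of both options (assuming torchvision's VGG16 and 10 output classes); once the backbone is frozen, the two optimizer lines behave the same:

    import torch.nn as nn
    import torch.optim as optim
    import torchvision.models as models

    model = models.vgg16(pretrained=True)

    # Freeze every pretrained parameter
    for param in model.parameters():
        param.requires_grad = False

    # A newly created module defaults to requires_grad=True
    model.classifier = nn.Linear(512 * 7 * 7, 10)

    # Both work: frozen parameters never receive gradients, so they are never updated
    optimizer = optim.Adam(model.parameters(), lr=3e-4)
    # optimizer = optim.Adam(model.classifier.parameters(), lr=3e-4)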
@ramchandracheke 1 year ago
Sorry, I am still confused about whether to use model.classifier.parameters() or model.parameters().
@hizircanbayram9898 3 years ago
Can't we replace your Identity class with nn.Identity() from the docs? Thank you.
@AladdinPersson 3 years ago
Yeah you absolutely can, and it would be a bit cleaner I would say! :)
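For reference, the class written in the video is just a pass-through module, so the two are interchangeable (a minimal sketch, assuming torchvision's VGG16):

    import torch.nn as nn
    import torchvision.models as models

    class Identity(nn.Module):
        def forward(self, x):
            return x  # return the input unchanged

    model = models.vgg16(pretrained=True)
    model.avgpool = Identity()     # hand-rolled version from the video
    model.avgpool = nn.Identity()  # built-in equivalent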
@gorkemcanates594 3 years ago
Should one use Spyder or PyCharm? What do you recommend? Thanks in advance.
@prathmeshvishwakarma9027 3 years ago
It really depends on your personal preference IMO. Some people like PyCharm, some like VS Code, some are fans of vim and emacs, and some are happy with Sublime Text. I have seen many TensorFlow devs (including me) use Jupyter notebooks too, especially for PyTorch; notebooks can seem a bit messy, but that's not a big deal if you can understand your code :P PyCharm is a heavy IDE with a lot of features you'll probably never need, so it's overkill for small projects. As for Spyder, it's used by fewer people, but you can give it a shot. Maybe you'll become one of those rare people who use Spyder :O I personally use Sublime though.
@codewithyouml8994 3 years ago
I didn't understand the Identity part. Can you please explain what it does?
@eltonlobo8697 3 years ago
Hi, I used the VGG11 model for the same dataset and trained it using the pretrained weights. I got 24.4% accuracy on the train set and approximately the same on the test set, which is worse than a 5-layer model I created myself, which had >50% accuracy in the first epoch. Am I doing something wrong?
@AladdinPersson 3 years ago
You might just have to train for longer since it's a much larger network. It's kind of hard to say if you've done anything wrong; there are a lot of factors involved.
@eltonlobo8697 3 years ago
@@AladdinPersson I had used the learning rate scheduler inside the batch loop instead of the epoch loop.
@mundeepcool 3 years ago
Is the vgg16 pretrained model 500 MB? Why does it occupy so much space?
@AladdinPersson 3 years ago
It has 138 million parameters :3
@user-qc3vf9uo9g 3 years ago
VGG16 has 5 max-pools and the conv layers do not change the image shape, which reduces the image size to 224/(2**5) = 7. So the input size for the first nn.Linear should be 512*7*7 = 25088 (we can see this in the original VGG16 at classifier[0] 4:15). But after changing the avgpool to Identity, a first linear layer of nn.Linear(512, 10) worked, as in this video, while nn.Linear(512*7*7, 10) failed. I can't figure out the reason. Can you help me with this? ### Further experiment ### If I do not replace the avgpool with Identity (i.e., only modify the classifier layers), then a first linear layer of nn.Linear(512*7*7, 10) works but nn.Linear(512, 10) fails. This looks even stranger to me, as it indicates the Identity layer shrinks the image size from 7*7 to 1*1. I really don't know why this happens.
@speed-stick 3 years ago
Regarding your first question, nn.Linear(512, 10) worked because the previous layer's output is 512x1x1, so the current layer's input must conform to it; the output of the current layer is irrelevant here. Of course nn.Linear(512*7*7, 10) wouldn't work, because its input doesn't conform to the previous output. Regarding your second question, same explanation: nn.Linear(512*7*7, 10) works because the avgpool converts the size from 512x?x? to 512x7x7 ('?' because we don't care what it was; the avgpool guarantees the final output regardless of the input). So if the output of the avgpool is 512x7x7, the next layer's input must be 512*7*7, not 512.
@huseyinsenol1769 1 year ago
PyTorch documentation tensor broadcasting
@CHANDANSAH-ew9rk 1 year ago
@@speed-stick Regarding your comment on the second question, "the avgpool converts the size from 512x?x? to 512x7x7": please correct me if I am wrong, but the avgpool actually converts the size from 512x?x? to 512x1x1. Check this video (kzbin.info/www/bejne/i6bOk3Vsppt2itE) at 9:34; I think that clears up the confusion. The point is that in this model, instead of an average pool (pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html), they used an adaptive average pool (pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html).
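A quick shape check ties this thread together (a sketch assuming 32x32 CIFAR-style inputs, for which VGG16's features block outputs 512x1x1):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 512, 1, 1)        # features output for a 32x32 input

    pool = nn.AdaptiveAvgPool2d((7, 7))
    print(pool(x).shape)                 # [1, 512, 7, 7] -> flattens to 25088, matches nn.Linear(512*7*7, 10)

    identity = nn.Identity()
    print(identity(x).flatten(1).shape)  # [1, 512] -> matches nn.Linear(512, 10)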
@user-dk9td7kl8c 4 years ago
Can you please explain line 30? Why is it like that?
@AladdinPersson 4 years ago
I'm assuming by line 30 you're referring to model.avgpool = Identity(). Essentially, when PyTorch coded VGG they separated it into different blocks, in this case three blocks called features, avgpool, and classifier. We can either change an entire block, or we can modify a specific part of that block. If we do model.avgpool = Identity(), we replace the entire block. If we do model.classifier[1] = Identity(), we change only one line of the classifier block. Since the avgpool block is only one line, we can just as well replace the entire thing. You might be wondering how the blocks are created: it happens when we use nn.Sequential, so an example would be self.my_classifier_block = nn.Sequential(nn.Linear, ...).
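A minimal sketch of both options (assuming torchvision's VGG16):

    import torch.nn as nn
    import torchvision.models as models

    model = models.vgg16(pretrained=True)

    # Replace an entire named block:
    model.avgpool = nn.Identity()

    # Or change just one line inside a block (classifier is an nn.Sequential):
    model.classifier[1] = nn.Identity()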
@merouanechettat6272 2 months ago
How do I do fine-tuning for specific layers, not feature extraction? I mean fine-tuning specific layers in model.features, not the classifier. How do I do that?
@helimehuseynova6631 2 years ago
I got the following error: mat1 and mat2 shapes cannot be multiplied (16x25088 and 512x256). Would anyone help me solve this error?
@emilemagicien 2 years ago
Guess I'm a bit late, but in the model.classifier line you need to input 25088 instead of 512.
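In other words, if the original avgpool is kept, its flattened output has 512*7*7 = 25088 features, so the first classifier layer must match (a sketch, assuming the 256-unit hidden layer implied by the error message and 10 output classes):

    import torch.nn as nn
    import torchvision.models as models

    model = models.vgg16(pretrained=True)
    model.classifier = nn.Sequential(
        nn.Linear(25088, 256),  # 25088 = 512 * 7 * 7 from the avgpool output, not 512
        nn.ReLU(),
        nn.Linear(256, 10),
    )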
@krv76 2 years ago
Why didn't you use nn.Identity instead of your own Identity class?
@SilentSage 3 years ago
I applied this to ResNet18 to train on a dataset with 2 classes of film grain/noise. I have a dataset of 780 samples and trained on 600. The accuracy with pretrained=False is always around 50% (which is what the model would get by just guessing). Could it be that I simply have too few samples, or that there is nothing to learn and it learns nothing that way? Is there anything I could change to check whether it is working correctly but just not learning? What I don't understand: when I set pretrained=True, I always get 80-100%. How is this possible? ImageNet consists only of "real pictures" and I'm training on noisy, grainy images, nothing close to the pretraining input. Is that ≈90% accuracy real? Or, as asked above, how can I verify the accuracy of my model?
@AladdinPersson 3 years ago
A pretrained network will adapt to many different kinds of visual tasks because it has learned to recognize important features in images. That said, since your dataset is so small and seems relatively simple, you should be able to overfit the training dataset and get 100% training accuracy. Have you tried overfitting on a small batch first (and not doing Common Mistake #1)?
@SilentSage 3 years ago
@AladdinPersson I upgraded to around 4000 samples now. With pretrained=False I get results between 55-80%, which suggests the network learns something, but differences of over 25% between runs confuse me a little. So far I've made 20 test runs and varied parameters like learning rate, epochs, and batch size a little, so maybe the fluctuation comes from there. With the bigger dataset and pretrained=False, the training accuracy almost matches the test accuracy, so I guess overfitting is no longer a problem. When I still had around 700 samples and pretrained=True, I got around 10-20% more accuracy on the training set, which seemed to be due to overfitting; it was the same, but a little worse, on small test sets of around 50-100 samples. I never expected to get this far. Thank you for your help! Your videos are the biggest help so far.
@SilentSage 3 years ago
Forgot to answer this, but yes, on smaller sets I got 100 percent (and around 70-80 on the test set).