Honestly, I'd put this video aside for a while because it was 30 minutes long, but it didn't even feel like 30 minutes now that I've watched it. I now understand the architecture really well. Thank you!
@doggydoggy578 1 year ago
Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@codevacaphe3763 4 months ago
@@doggydoggy578 It probably means you have a dimension mismatch in some layer, maybe in the identity mapping?
@alirezasadri4081 4 years ago
Thanks for the video. A minor change that improves your code is to implement the residual mapping inside the block class. If you look at figure 2 from the paper, the definition of a block includes the mapping. Here you have put the mapping as part of the design of the network. This suggests that you are very experienced with networks without skip connections and just changed them in your imagination rather than defining a block. :), still works, I know.
@AladdinPersson 4 years ago
Interesting viewpoint, am I understanding you correctly in that you would rather have the identity_downsample inside the init of the class block?
@alirezasadri4081 4 years ago
Hi, yes. So it moves from _make_layer to inside __init__ of block, but carefully.
@1chimaruGin0_0 4 years ago
Thanks for this tutorial. We need a tutorial for EfficientNet
@AladdinPersson 4 years ago
Noted!
@amegatron07 1 year ago
I love the idea of residual layers. Not taking the math into account, on a higher level it intuitively seems useful, because with usual layers the low-level information gets lost from layer to layer. But with skip-connections we keep track of the lower-level information, sort of. Unfortunately, I can't remember the real-life example that depicts this right now, but in general it is the same: while constructing something high-level, we don't only need to see what we have at this high level, but also need to keep track of some lower-level steps we're performing.
@talha_anwar 3 years ago
This is my favorite series
@sourodipkundu8421 2 years ago
I didn't understand: where did you implement the skip connection?
@terjeoseberg990 6 months ago
“x += identity” is the skip connection. “identity” is set to the input at the top of the function, then added to the output at the end, thus skipping all the calculations in the middle.
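The mechanism described above can be sketched as a tiny self-contained block (layer names and sizes here are illustrative assumptions, not the video's exact code):

```python
import torch
import torch.nn as nn

# Minimal residual block sketch: save the input at the top, run it
# through a couple of layers, add it back at the end.
class TinyResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                           # keep the input
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.bn2(self.conv2(x))
        x = x + identity                       # the skip connection itself
        return self.relu(x)

block = TinyResidualBlock(64)
out = block(torch.randn(2, 64, 56, 56))
print(out.shape)  # torch.Size([2, 64, 56, 56])
```

The addition line is the entire skip connection; everything else is ordinary conv/batchnorm plumbing.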
@neelabhmadan6820 4 years ago
Excellent work!! Super fun
@ashwinjayaprakash7991 3 years ago
I did not understand: is identity_downsample the part where we skip connections ahead?
@nomad1104 3 years ago
"x += identity" is the skip connection part. But downsampling is required if the channels of identity and x do not match. The identity is taken first with "identity = x", but the output of the resnet block, x, will have 4 times the identity's channels. So downsampling just equalizes the number of channels in order for the skip connection to be possible.
@ashwinjayaprakash7991 3 years ago
@@nomad1104 Thank you bro!
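The channel-matching point from this thread can be shown in isolation (shapes assumed from a ResNet50-style first layer, not the video's exact code):

```python
import torch
import torch.nn as nn

# The bottleneck block expands 64 -> 256 channels, so the saved identity
# must be projected with a 1x1 conv before the addition is possible.
identity = torch.randn(2, 64, 56, 56)    # input saved at the top of the block
x = torch.randn(2, 256, 56, 56)          # block output: 4x the channels

projection = nn.Conv2d(64, 256, kernel_size=1, bias=False)
identity = projection(identity)          # now also (2, 256, 56, 56)

out = x + identity                       # shapes match, skip connection works
print(out.shape)  # torch.Size([2, 256, 56, 56])
```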
@kamikamen_official 7 months ago
Will give this a look.
@Rafi-nc3nw 3 years ago
Sir, please suggest how anyone can reach your coding level. The way you have coded the ResNet network is mindblowing!!!
@nuriyeakin1206 3 years ago
Can you make a video on implementing Mask R-CNN from scratch? :)
@adesiph.d.journal461 3 years ago
Hello Aladdin, great videos. To appreciate your efforts and encourage you to make more, I joined your community. I was implementing this video and got stuck on making sense of the identity_downsample. I would really appreciate it if you could spare some information on what exactly the role of identity_downsample is.
@adesiph.d.journal461 3 years ago
My understanding is that in residual networks with skip connections, the output is f(x) + x. We want f(x) and x to be of the same dimensions. So to do that, we use an identity downsample on x to make sure they [f(x) and x] are of the same size?
@AladdinPersson 3 years ago
Appreciate the support 👊 Yeah you're exactly right, when running x through f(x) the shapes might not match in order to do the addition and we might need to modify it which we do through the identity_downsample function. I think coding ResNets could be done in a more clear way and I might revisit this implementation if there's a better way of implementing it
@adesiph.d.journal461 3 years ago
@@AladdinPersson thanks for coming back! You deserve all the support. Looking forward to seeing a newer implementation if you are going for it!
@anthonydavid6578 1 year ago
I'd like a full reimplementation that includes the dataset preprocessing, training code, visualization and so on; are there any such videos?
@Rohit-bs4zv 2 months ago
Exactly
@MrMarcowally 4 years ago
Like the previous comment said, please do an EfficientNet from scratch.
@AladdinPersson 4 years ago
Will look into that!
@abhisekpanigrahi1033 1 year ago
Hello Aladdin, can you please make a video explaining the concept of the _make_layer function? It is really confusing.
@saruaralam2723 3 years ago
@Aladdin Persson Could you also make a hands-on coding video of EfficientNet?
@ywy6810 1 year ago
Thanks sir you are so kind
@jacky2476 3 days ago
Thx.
@hengzhichen8932 3 years ago
Thank you for your tutoring, Aladdin. Since block is not saved in the ResNet class, can we delete the block argument from ResNet's __init__ and _make_layer?
@КириллНикоров 4 years ago
Thank you for the tutorial. I have some questions about it: 1) Why do you use "identity = x" in your code? Isn't it dangerous, as identity and x in fact share the same memory after that? Do any reasons exist for not using "identity = x.clone()"? 2) Did you try using the shortcut "x += identity" after the non-linearity? I've read the article and can't understand exactly when the authors apply it, before or after, but to me it seems more reasonable to put it after the ReLU, following the equation H(x) = F(x) + x in the article. I've also read the PyTorch implementation of the resnet model and understand that the scheme of your implementation is taken from there, so maybe you can explain to me why it is more proper to do it this way? My English is far from fluent, so I want to say that I don't mean to be rude at any point.
@AladdinPersson 4 years ago
Thanks for the comment and questions! I'll try my best to answer them. For your first question, I do think you're correct to be cautious of these operations; dealing with references in general can be quite tricky, and in this case I was uncertain as well. I tried a few examples just to see what happens. If we have
a = torch.zeros(5)
b = a
c = a.clone()
a[0] = 1
print(a)
print(b)
print(c)
then it could cause issues if we believed that b is a copy of a rather than pointing to the same memory. But if we change the shape of a by doing something like
a = torch.zeros(5)
b = a
c = a.clone()
a[0] = 1
a = torch.cat((a, torch.tensor([10.])), 0)
print(a)
print(b)
print(c)
they will no longer point to the same storage, and I guess this is similar to our case because of the conv layers etc. that are changing the shape. When I try to train the network using x.clone(), or simply x, I obtain the same results. I do think you bring up a good point and it is clearer to use .clone(); in PyTorch's own implementation they use two different variables, x and out, to be clear and avoid the issue you bring up, and I will change the implementation on GitHub to use x.clone() instead. For your second question: in the paper they use the equation y = F(x) + x, where F(x) is the residual block and x is the identity. After this they apply the non-linearity on y, which is what we're doing in the code too. This is written in section 3.2 of the paper.
@КириллНикоров 4 years ago
@@AladdinPersson Thank you for your answer. I also checked 3 variants, assignment, clone() and copy_(), after I wrote the comment, and they really do seem to be equivalent in this case regarding memory sharing, but the question of the differences in gradient calculation between these three approaches is not fully clear to me yet. I'm very grateful for your reference to the section. I really don't understand how I missed it. I may have been too focused on the idea that the shortcut is used to help the non-linear function approximate the identity, so I thought we should add the identity after we got the final non-linear function, relu, of the current "block".
@nubi9315 2 years ago
@@КириллНикоров Here, x = self.conv1(x) creates a new tensor instead of changing x in place, so x now points to a new object, and the old value of x stays where identity points. Taking care with reference assignment is a must, though here it is all fine.
@grahamastor4194 3 years ago
Any chance you could create a TensorFlow version of these advanced network implementations? Many thanks, super useful videos.
@saivenkateshchilukoti7057 2 years ago
Hi Aladdin Persson, can you please share a notebook on how to implement ResNet from scratch using the full pre-activation bottleneck block? Or please make a video about that. Thanks in advance.
@jacobusjacobs76 3 years ago
Hi. Thank you for the video. Would you please help me to understand how I would adapt the implementation to create the ResNet34 and ResNet18 models? I tried but had no success.
@lalithaevani5942 10 months ago
For the Stride part for down-sampling in each layer, in the paper, it is written to down-sample at conv3_1, conv4_1 and conv5_1. If I understand your code correctly does it mean that there are conv3_0, conv4_0, and conv5_0 and hence stride of 2 is applied to the second block in each layer?
@kdubovetskyi 2 years ago
Could someone please explain *why the expansion is hardcoded?*
@aneekaazmat6653 2 years ago
Hello, your video was very interesting for me as I am using ResNet for the first time. But I have a question about how we can use it for audio classification; I have to do boundary detection in music files. My mel spectrograms' shapes are not actually the same for all files: they are (80, 1789), (80, 3356), and so on, meaning the 2nd dimension changes with every song. So how can I use these kinds of mel spectrograms with ResNet? Can you please make a video on audio classification using ResNet?
@Alihamza-s1d 1 year ago
NotImplementedError: Module [ResNet] is missing the required "forward" function
I'm getting this error. Can anyone tell me about it? It happens when I use:
def test():
    net = ResNet152()
    x = torch.randn(2, 3, 224, 224)
    y = net(x).to(device)
    print(y.shape)
test()
@陈俊杰-q4u 4 years ago
Hey Aladdin, nice tutorial! I have a question: when I omit this line >> if stride != 1 or self.in_channels != intermediate_channels*4: it also works, so I really don't know why you add >> self.in_channels != intermediate_channels*4. Please kindly reply to me, thanks!
@AladdinPersson 4 years ago
If I remember correctly I think this was needed for the ResNet models that we didn't implement. For ResNet50,101,152, we don't need this line, so it was a bit unnecessary that I included it in the video.
@陈俊杰-q4u 4 years ago
@@AladdinPersson Do you mean ResNet18 or 34 needs this line?
@AladdinPersson 4 years ago
@@陈俊杰-q4u If I remember correctly, for the resnets except those two, the intermediate_channels always expand by 4: 64 -> 256, 128 -> 512, 256 -> 1024, etc. I'd need to reread the paper and check though.
@陈俊杰-q4u 4 years ago
@@AladdinPersson Yeah! I see, but my point is that whether you add this condition or not, the statements inside still work.
@jawher9 4 years ago
@@陈俊杰-q4u You should add the second statement because in the first residual layer your in_channels = 64, and after the first residual block they get expanded by 4, to 256 channels; however, the residual still has 64 channels, therefore the shapes of the output and the residual mismatch. When you add the second statement, it gets corrected because the downsampler expands the residual channels by 4, i.e. 64*4 = 256.
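The two cases from this reply can be written out as a tiny helper (parameter names are illustrative, following the video's naming):

```python
# A 1x1 projection of the identity is needed when the spatial size
# changes (stride != 1) or when the block expands the channels.
def needs_projection(in_channels, intermediate_channels, stride, expansion=4):
    return stride != 1 or in_channels != intermediate_channels * expansion

# First block of layer1: stride 1, but 64 != 64 * 4, so only the second
# condition triggers the projection here.
print(needs_projection(64, 64, stride=1))    # True
# Later blocks in layer1: 256 == 64 * 4 and stride 1, so no projection.
print(needs_projection(256, 64, stride=1))   # False
# First block of layer2: stride 2, so the first condition triggers.
print(needs_projection(256, 128, stride=2))  # True
```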
@random-drops 3 years ago
Thanks for the tutorial. I just started to learn DL, and only recently did I come to learn about ResNet, particularly ResNet9. I wonder how to apply this ResNet50/101/152 in training. Sorry for my dumbness.
@AladdinPersson 3 years ago
There are pretrained ResNet models available through PyTorch torchvision library that I would recommend that you use. You can read more about them here: pytorch.org/docs/stable/torchvision/models.html
@deepexplorationcode9456 4 years ago
Thanks a lot for the great tutorial. I've now understood how to program Resnet. Do you have any program to implement one of these architectures: Resnext, DenseNet, Mask R-CNN, YOLACT++
@AladdinPersson 4 years ago
I've got some plans for the next videos, but I'll take a look at these in the future and can make a video if I find any of them interesting :) Thanks for the suggestion!
@bradduy7329 3 years ago
Can you explain how the forward function can run without being called explicitly? E.g. we only define forward() but never call it ourselves.
@AladdinPersson 3 years ago
The __call__ method inside the parent class nn.Module calls forward()
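A minimal, self-contained demonstration of that dispatch (not the video's code):

```python
import torch
import torch.nn as nn

# Calling a module instance goes through nn.Module.__call__, which
# (among other bookkeeping) dispatches to the forward() you defined.
class Doubler(nn.Module):
    def forward(self, x):
        return x * 2

m = Doubler()
out = m(torch.tensor([1.0, 2.0]))  # forward() runs without being named
print(out)  # tensor([2., 4.])
```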
@LalitPandeyontube 2 years ago
I am trying to prune the residual blocks so that my resnet will have 3 residual blocks, but I keep getting an error about matrix dimensions.
@ArunKumar-sg6jf 2 years ago
I learnt how you applied padding=0 and padding=1
@AbdulQayyum-kd3gf 4 years ago
Great tutorial. How can I use features from different layers of pretrained models in PyTorch for fine-tuning?
@AladdinPersson 4 years ago
I actually think I've made a video to answer this question: kzbin.info/www/bejne/p5KnlmOnhr9od7M. Maybe it helps you out. I think code would explain it for you better than I could in words so the code for the video can be found: github.com/AladdinPerzon/Machine-Learning-Collection/blob/804c45e83b27c59defb12f0ea5117de30fe25289/ML/Pytorch/Basics/pytorch_pretrain_finetune.py#L33-L54
@maomao1591 2 years ago
Thank you for your insightful explanation. But I'm confused by part of this condition: "if stride != 1 or self.in_channels != intermediate_channels * 4". Why is there in_channels != intermediate_channels * 4? Could you help me? Thank you.
@kevinyang6815 3 years ago
I wish I understood a single thing you did in this video
@giaphattram7932 4 years ago
Thank you, very effective tutorial. Please do YOLOv3
@AladdinPersson 4 years ago
Yolo v3 or v4?
@giaphattram7932 4 years ago
I was not aware of v4, actually. Either would be good to learn, and us viewers can practice modifying it to other versions after. But I guess a tutorial video on v4 will stay current longer than v3 :)
@rushirajparmar9602 4 years ago
@@AladdinPersson V4 would be great, I guess!
@minhajuddinansari561 1 year ago
In the condition: if stride != 1 or self.in_channels != out_channels*4 shouldn't it instead be self.out_channels != in_channels*4 EDIT: Oh you clarified that out_channels is out_channels * expansion
@Sekharhimansu 4 years ago
Where you have written the test function, how can I instead train this model on the COCO dataset? Can you help me out?
@AladdinPersson 4 years ago
I'm not entirely sure what you mean by test in this scenario and training the model on the COCO dataset (can be used for object detection, caption etc) will depend on your use case. In the video we built the ResNet model for classification and I didn't want to spend unnecessary time on setting up a training loop etc, I have other videos if you want to learn more about that
@Sekharhimansu 4 years ago
@@AladdinPersson What I mean is: you have written this model from scratch, but how do I train it on the COCO dataset? I am a beginner, so I am asking for the code for it...
@unknown3158 3 years ago
It's actually not hard to follow. I think using PyTorch makes it even easier since you get a better idea of what is going on. Btw, how did you manage to run PyTorch on Spyder? Whenever I do simply 'import torch', Spyder crashes for me, that is why I am using PyCharm with PyTorch.
@shambhaviaggarwal9977 3 years ago
PyTorch worked normally for me in PyCharm but not in other editors. Later I found out there were issues with the installation of PyTorch; I had installed the wrong version of CUDA. I still don't understand how PyCharm worked if the installation was not proper.
@doggydoggy578 1 year ago
@@shambhaviaggarwal9977 Hey sorry for asking but do you get this error "RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1" ?
@unknown3158 1 year ago
@@doggydoggy578 You are trying to add two tensors whose shapes are not compatible: the channel dimension of x (256) does not match that of identity (64), most likely at "x += identity". Check the dimensions; the identity probably needs the downsample applied.
@vijaypatneedi 4 years ago
What are the best resources to learn PyTorch?
@AladdinPersson 4 years ago
It's an interesting question, I'll try to give my answer in two parts. First, I believe the bottleneck in most cases isn't actually PyTorch but rather knowledge about machine learning / deep learning itself. To learn the concepts, I believe excellent resources are Machine Learning (a great introductory ML course on Coursera) by Andrew Ng and the Deep Learning Specialization, also by Andrew Ng. Following CS231n and CS224n from the online lectures and doing the assignments is, I think, a very efficient way to learn. After that, as I am currently doing, reading research papers, implementing those papers and doing projects are ways to develop further. Now, for learning PyTorch specifically, I think reading the PyTorch tutorials (pytorch.org/tutorials/) is great; reading other people's code or watching others code (like I am doing in these videos) can be beneficial, and reading old posts on the PyTorch forums is also beneficial. Most importantly, I think it's about getting started coding in PyTorch. Remember, I'm still learning a lot and don't consider myself to have "learned" PyTorch, but those are my current thoughts on your question. Hope that answers it at least somewhat :)
@vijaypatneedi 4 years ago
@@AladdinPersson Thanks for the detailed response. Do you think it's better for a beginner to stick with PyTorch rather than implementing things in TensorFlow/Keras as well? Which of them gives a good learning curve and strengthens the underlying concepts? How important is it to implement code from scratch vs. transfer learning or using API calls?
@AladdinPersson 4 years ago
I don't think it matters too much. Pick either and just stick with it; I wouldn't implement everything in both. It seems to be the case that PyTorch allows for faster iterations and researchers tend to prefer it, meanwhile TF is used for production. I like the Python way of coding, so PyTorch is a natural choice; it's a very natural extension of normal Python. I think it's useful to read papers, understand what they've done and implement it. This is more about practicing getting into that mindset than the usefulness of implementing the model from scratch, if you understand what I mean.
@vijaypatneedi 4 years ago
@@AladdinPersson If you have time, consider making a video about how you started to learn deep learning architectures and how you do it on a daily basis... And a few tips/suggestions for beginners. Because you explain things so beautifully ❤️
@doggydoggy578 1 year ago
Omg, I don't know what happens, but no matter what I try, the code returns the same error:
in forward(self, x)
33 print(x.shape, identity.shape)
34 print('is identity_downsamples none ?', self.identity_downsamples == None)
---> 35 x += identity
36 x = self.relu(x)
37
RuntimeError: The size of tensor a (256) must match the size of tensor b (64) at non-singleton dimension 1
Help please, I have re-checked my code multiple times to make sure it is exactly like yours, but to no avail; I can't make it work. :( I run on Colab btw
@soumyagupta3924 1 year ago
I am also getting the same error
@activision4170 6 months ago
I had the same error. The shape of the identity is not the same as x. You probably made a typo in the init function of the block class. Make sure all the parameters are the same. In my case, I accidentally put padding=1 instead of padding=0 in conv3, which caused the output size to be different.
@doggydoggy578 6 months ago
@@activision4170 thanks bro
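The typo described in this thread can be reproduced in isolation (a sketch with assumed channel sizes): conv3 of a bottleneck is a 1x1 conv and should use padding=0; with padding=1 a 1x1 kernel grows the spatial size, so "x += identity" then fails with a size mismatch.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)
good = nn.Conv2d(64, 256, kernel_size=1, padding=0)
bad = nn.Conv2d(64, 256, kernel_size=1, padding=1)
print(good(x).shape)  # torch.Size([1, 256, 56, 56])
print(bad(x).shape)   # torch.Size([1, 256, 58, 58]) <- won't match identity
```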
@ZobeirRaisi 4 years ago
Thanks for Tutorial, Do you have any program to implement U-Net?
@AladdinPersson 4 years ago
I had not heard of U-Net before, so I haven't, unfortunately. It seems like an interesting architecture from reading the paper abstract. I'll add it to the list, and if I get time I can do it :)
@glassylove 4 years ago
@@AladdinPersson +1 for an implementation of U-Net.
@danish551 3 years ago
+1 for UNet
@struggler5134 2 years ago
Thanks! From a PyTorch rookie in China.
@erdichen984 3 years ago
The best ResNet tutorial ever, thank you!!! If possible, please also make a tutorial about Siamese networks.
@Idontknow-vh2hl 3 years ago
How do we code ResNet18 and ResNet34?
@doggydoggy578 1 year ago
He answered it at the end of the video. Watch carefully before commenting.
@홍중택 2 years ago
Hi Aladdin! Thanks so much for the great content. I had a quick question at around 3:50 (calculating the padding). I'm looking at the formula [(W−K+2P)/S]+1 that people often use to calculate the output size, and tried letting W = 7, K = 3, S = 2, etc., but I just don't see how P = 3 would get us an output of 112. How can I calculate/estimate padding sizes from input and output sizes (+ kernel sizes, strides, etc.)?
@berkgur868 2 years ago
The padding gets ceiling-ed from 2.5 to 3! This is the case with most odd-numbered kernels :p. I believe Caffe used to round down from 2.5 to 2 back in the day.
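A quick way to check the numbers in this thread (assuming the standard ResNet stem, which is what the video's first conv appears to use): the question's W = 7, K = 3 swaps input size and kernel size; for the 224x224 input it is W = 224, K = 7, S = 2, P = 3.

```python
import torch
import torch.nn as nn

W, K, S, P = 224, 7, 2, 3
out_size = (W - K + 2 * P) // S + 1   # PyTorch floors this division
print(out_size)  # 112

conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
y = conv(torch.randn(1, 3, 224, 224))
print(y.shape)  # torch.Size([1, 64, 112, 112])
```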
@krisnaargadewa4376 1 year ago
Why is bias not True?
@kostiantynbevzuk3807 1 year ago
Sorry if this is a stupid question, but don't we need to pass stride as a parameter in `block.conv1` and set its padding to 1, and set `block.conv2` to stride=1 and padding 0 instead? Or am I missing something from the original paper?
@zakariasenousy4551 4 years ago
Thanks for this, very helpful :) Can you do an implementation of an ensemble model of resnets and densenets for us?
@AladdinPersson 4 years ago
Are you looking for how the training structure would look like when we are training an ensemble of models?
@reemawangkheirakpam8165 3 years ago
Awesome! Can you make a video on ensembling, please?
@Bunny-eh4ji 2 months ago
Thank you for tutorial. You're a real mad lad for this.
@science.20246 1 year ago
Is the final layer a softmax, or do we keep the fc layer and go forward?
@gokulakrishnancandassamy4995 2 years ago
Are there cases where identity_downsample is actually None? Because at the end of every block (in each layer) we end up changing the number of channels. Could someone explain this?
@OlegKorsak 2 years ago
You can do super().__init__()
@rekindle9 3 years ago
Very useful for beginning researchers who don't know how to implement the work from papers!
@mohit723 3 years ago
Please implement ResNeSt... pleaseeee
@mayank1334 4 years ago
Which environment are you using here?
@Dougystyle11 4 years ago
Thank you for the tutorial series, it's been great so far. I gotta say, ResNet implementation is trickier than it looks haha
@AladdinPersson 4 years ago
Thanks dougy I like your style ; ) Yeah ResNet was definitely the hardest one. Initially I thought Inception would be the hardest because I felt it was conceptually more difficult whereas the idea behind ResNet is super simple. But the implementation was totally the opposite
@abhishek-shrm 3 years ago
@@AladdinPersson Haha! Looks like Aladdin was waiting for the moment to open up about how much time he spent figuring out the implementation of ResNet. By the way, great video as always.
@siddhu8224 3 years ago
I have watched it only once, but you explained it really well. Right now I'm working on an assignment and hope this can help me. Thanks man, you lifted my hopes about ResNet. Keep sharing knowledge.
@shivanshutyagi232 4 years ago
Did you use manim??
@AladdinPersson 4 years ago
Yeah for the intro! :)
@shivanshutyagi232 4 years ago
@@AladdinPersson Awesome, well thanks for the tutorial. It's pretty helpful. :)
@AladdinPersson 4 years ago
I really appreciate the kind feedback
@Ssc2969 1 year ago
Hi, thanks a lot for this tutorial. This code is extremely helpful. If I use part of this code in my project and my paper gets published, would it be okay if I cite your GitHub link? Please let me know. Thanks!