PyTorch Autograd Explained - In-depth Tutorial

100,738 views

Elliot Waite

5 years ago

In this PyTorch tutorial, I explain how the PyTorch autograd system works by going through some examples and visualizing the graphs with diagrams. As you perform operations on PyTorch tensors that have requires_grad=True, you build up an autograd backward graph. Then when you call the backward() method on one of the output nodes, the backward graph gets traversed, starting at the node that the output node's grad_fn attribute points to and moving backward from there, accumulating gradients until the leaf nodes are reached. The final leaf-node gradients are stored on the grad attribute of the leaf tensors.
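For example, a minimal sketch of that flow (just an illustration, not code from the video):

import torch

a = torch.tensor(2.0, requires_grad=True)  # leaf tensor
b = torch.tensor(3.0, requires_grad=True)  # leaf tensor
c = a * b                                  # c.grad_fn points to a MulBackward node
c.backward()                               # traverse the backward graph starting from c
print(a.grad)                              # tensor(3.)  (dc/da = b)
print(b.grad)                              # tensor(2.)  (dc/db = a)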
This is my first PyTorch tutorial video. If you'd like to see more PyTorch related videos, let me know in the comments. And if you have anything specific about PyTorch that you would like me to make videos about, let me know.
The diagrams.net flowcharts shown in the video:
🔗 drive.google.com/file/d/1bq3a...
(Note: Click this link to go to the Google Drive file, then click the button in the top center that says "Open with diagrams.net", then once it's loaded, there will be tabs along the bottom of the diagrams.net page for all the different graphs shown in the video.)
Join our Discord community:
💬 / discord
Connect with me:
🐦 Twitter - / elliotwaite
📷 Instagram - / elliotwaite
👱 Facebook - / elliotwaite
💼 LinkedIn - / elliotwaite
🎵 ksolis - Nobody Else ( • ksolis - Nobody Else )

Comments: 285
@weebchina9639
@weebchina9639 5 жыл бұрын
This is simply amazing man. It should be included as part of the official guide.
@pradeepadmingradalpharecord
@pradeepadmingradalpharecord 4 жыл бұрын
Hey, I made an issue on GitHub about this (github.com/pytorch/tutorials/issues/1024#issue-637571028). I'm having some issues merging, so if someone could help, that would be great.
@ankurgupta2806
@ankurgupta2806 4 жыл бұрын
Absolutely
@elliotwaite
@elliotwaite 3 жыл бұрын
@@pradeepadmingradalpharecord This looks like a great idea. I'll check it out on GitHub.
@israrkhan-bj3zh
@israrkhan-bj3zh 3 жыл бұрын
You said this is your first video on PyTorch, but it's amazing. Looking forward to more videos.
@zilianglin7417
@zilianglin7417 5 жыл бұрын
This video is really helpful. It taught me not only the background assumed by the 60 Minute Blitz tutorial but also gave me an overview of the computational/backward graph.
@wanhonglau779
@wanhonglau779 4 жыл бұрын
This explanation is in-depth and original! I can't find any other resource that explains PyTorch autograd in more depth than this.
@chienyao8799
@chienyao8799 5 жыл бұрын
Thank you for this wonderful explanation! Looking forward to your subsequent PyTorch videos.
@ishanmishra3320
@ishanmishra3320 5 жыл бұрын
Been searching the net for a while about autograd, but this is one of the most intuitive explanations, with an awesome walkthrough. Thanks!
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks, Ishan!
@jiawei8319
@jiawei8319 4 жыл бұрын
This is awesome. I've been trying to understand PyTorch's autograd by reading the bare code for a whole week, and I think I learned more in the 10 minutes of your video. Thank you and keep creating.
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks for the encouragement, Jiawei!
@KSK986
@KSK986 4 жыл бұрын
Nice, crisp and clear explanation. Thanks for sharing this knowledge.
@jingbolin8835
@jingbolin8835 5 жыл бұрын
From Pytorch Forums, nice and detailed explanation for Pytorcher!!!
@xzl20212
@xzl20212 3 жыл бұрын
Brilliant. I super like your metaphors of colors for "leaves", dried-up leaves and magics😍
@GoodOldYoucefCef
@GoodOldYoucefCef 5 жыл бұрын
Thanks, your graphs made things clear to me!
@bitdribble
@bitdribble 4 жыл бұрын
I could finally understand a bit how this works. Thank you.
@ravihammond
@ravihammond 5 жыл бұрын
This is the exact explanation I needed to clear up autograd. Many thanks from Australia!
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks for the feedback. Glad you found it helpful.
@egecnargurpnar5732
@egecnargurpnar5732 4 жыл бұрын
Elliot, that was a marvelous video. You explain in such a natural way, and it's impressive how quickly you implemented that function. Thank you :)
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks, Cemal!
@rong5008
@rong5008 5 жыл бұрын
Would be amazing if you make more PyTorch videos!!!!! This is simply the best explanation of PyTorch autograd.
@elliotwaite
@elliotwaite 5 жыл бұрын
Sirong Huang Thanks! Not sure how soon it will be, but I’ll probably make some more eventually.
@MithileshVaidya
@MithileshVaidya 4 жыл бұрын
Awesome introduction! Thanks a lot!
@mohammadhassanvali1942
@mohammadhassanvali1942 2 жыл бұрын
A really fantastic and thorough explanation of autograd. Thanks a lot!
@elliotwaite
@elliotwaite 2 жыл бұрын
Glad you liked it.
@m.y.s4260
@m.y.s4260 4 жыл бұрын
Nicest explanation for autograd on the internet! Thx!
@yassineouali1888
@yassineouali1888 5 жыл бұрын
Great introduction to autograd, thanks
@weiwang6706
@weiwang6706 4 жыл бұрын
That's so great and clear! Thanks Elliot!
@forresthu6204
@forresthu6204 Жыл бұрын
Mr. Waite makes very complex concept easy to understand, big thanks!
@elliotwaite
@elliotwaite Жыл бұрын
Thanks, Forrest!
@muhammadroshan7315
@muhammadroshan7315 5 жыл бұрын
So much information in such short time. I am amazed just simply startled :o
@interlingua2612
@interlingua2612 5 жыл бұрын
Wow, this is exactly what I needed! Thank you so much!
@user-xh5gf2ee2t
@user-xh5gf2ee2t 5 жыл бұрын
Thank you sooo much!! Your first video is super helpful!!
@yusun5722
@yusun5722 2 жыл бұрын
Great video on the internals of PyTorch. Thanks.
@ijyotir
@ijyotir 4 жыл бұрын
Appreciate the tutorial. Please keep doing this.
@AlexeyMatushevsky
@AlexeyMatushevsky Жыл бұрын
That's magical! So much effort to explain a simple concept! As a result, I really feel like I understood it. Thanks! Diagrams in the hands of a master - priceless!
@elliotwaite
@elliotwaite Жыл бұрын
Glad you liked it!
@SachinKumar-js8yd
@SachinKumar-js8yd 4 жыл бұрын
Nice explanation, bro!! Thanks for this video.
@amitozazad1584
@amitozazad1584 3 жыл бұрын
This is splendid, highly impressive.
@parthasarathimukherjee7020
@parthasarathimukherjee7020 5 жыл бұрын
Great video! More pytorch tutorials please!
@elliotwaite
@elliotwaite 5 жыл бұрын
Partha Mukherjee, thanks! I’ll probably make some more soon.
@shantanuagarwal9660
@shantanuagarwal9660 2 жыл бұрын
Thanks a lot Elliot. Excellent work.
@gabrielcbenedito
@gabrielcbenedito 3 жыл бұрын
I'm so glad you decided to make this video... this explanation is just perfect! I just subscribed and hope to see more of those!
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks! I hope to make some more machine learning related videos soon.
@EdedML
@EdedML 2 жыл бұрын
Amazing explanation, thank you so much!
@jackperrin3852
@jackperrin3852 4 жыл бұрын
Awesome explanation. Really helped me understand how PyTorch implements back prop
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks, Jack!
@raufbhat8096
@raufbhat8096 3 жыл бұрын
Thanks for the amazing in-depth explanation. Please do more videos on PyTroch.
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks, Rauf!
@terohannula30
@terohannula30 2 жыл бұрын
Thanks! This is the kind of information I was looking for. I want to write my own tiny autograd system and was searching for how the larger frameworks have implemented it.
@Sneha_Negi
@Sneha_Negi Жыл бұрын
would have never been able to understand such concept in just one video... thanks to you.
@elliotwaite
@elliotwaite Жыл бұрын
I'm glad you found it helpful.
@shashank3165
@shashank3165 4 жыл бұрын
The video is amazing, man. Subscribed and looking forward to other videos.
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks!
@AndPacheco34
@AndPacheco34 5 жыл бұрын
Thank you! It was very useful!
@akashupadhyay4373
@akashupadhyay4373 5 жыл бұрын
Great video , please make more video on pytorch
@scottvasquez9880
@scottvasquez9880 5 жыл бұрын
Great visualizations, dude!
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks!
@TalibHussain-ih1ev
@TalibHussain-ih1ev 2 жыл бұрын
That is extremely helpful for understanding the concepts of autograd and backward().
@elliotwaite
@elliotwaite 2 жыл бұрын
Thanks. I'm glad you found it helpful.
@Flibber2
@Flibber2 5 жыл бұрын
Nice video, more machine learning/deep learning related materials please!
@elliotwaite
@elliotwaite 5 жыл бұрын
Coming right up. Thanks, Flibber2!
@bijonguha2299
@bijonguha2299 4 жыл бұрын
Superb explanation elliot. Thanks for making this video
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks, Bijon Guha!
@QiuzhuangLian
@QiuzhuangLian Жыл бұрын
Awesome torch `Tensor` tutorial, thank you so much.
@elliotwaite
@elliotwaite Жыл бұрын
I'm glad you liked it. Thanks for the feedback.
@user-kx3ig9nf6w
@user-kx3ig9nf6w 2 жыл бұрын
it's very clear. Thanks!
@corentinlingier3549
@corentinlingier3549 3 жыл бұрын
Awesome viz! Thanks a lot
@tungnguyendinh331
@tungnguyendinh331 3 жыл бұрын
Just an awesome explanation. Thank you very much
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks, Tung!
@BOURNE399
@BOURNE399 5 жыл бұрын
One word: AWESOME!!!
@RohitKumarSingh25
@RohitKumarSingh25 5 жыл бұрын
best explanation of autograd available on youtube (Y)
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks, Rohit!
@andreazanetti01
@andreazanetti01 3 жыл бұрын
thanks for this video, really clarified a lot of things!
@elliotwaite
@elliotwaite 3 жыл бұрын
Glad it helped!
@egogo5675
@egogo5675 3 жыл бұрын
Excellent :) thank you so much
@RonItzikovitch
@RonItzikovitch 4 ай бұрын
Great video! loved the details and the examples
@elliotwaite
@elliotwaite 4 ай бұрын
Thanks!
@vigneshpadmanabhan
@vigneshpadmanabhan 3 жыл бұрын
Glad I got your video as a recommendation... it's amazing. ... Have been using tensorflow for a while and youtube wants me to learn pytorch ...
@elliotwaite
@elliotwaite 3 жыл бұрын
Haha. I used to use TensorFlow as well, but I was glad I made the switch to PyTorch. The main thing I like more about PyTorch is that the modularity feels cleaner. For example, the modules feel like legos made of legos, and when I want to try something new, it's easy to switch out any of those legos for my own custom one, regardless of if I'm working at the high level or the low level. And other parts of the framework, not just the modules, work similarly.
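For example, a small sketch of that "swap a lego" idea (my own illustrative model, not one from the video):

import torch
from torch import nn

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)
model[1] = nn.Tanh()                    # swap out one "lego" for a different one
print(model(torch.randn(1, 8)).shape)   # torch.Size([1, 2])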
@Aditya_Kumar_12_pass
@Aditya_Kumar_12_pass 3 жыл бұрын
thank you. this was the best
@reemgody5492
@reemgody5492 3 жыл бұрын
totally amazing explanation. Thanks
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks, Reem!
@bobobopan2354
@bobobopan2354 4 жыл бұрын
excellent video!
@namanchindaliya1012
@namanchindaliya1012 5 жыл бұрын
Great explanation. Keep up good work.
@elliotwaite
@elliotwaite 5 жыл бұрын
naman chindaliya, thanks!
@snehotoshbanerjee1938
@snehotoshbanerjee1938 4 жыл бұрын
Excellent!
@gunjanmimo
@gunjanmimo 2 жыл бұрын
This video is awesome. Thank you for making this video.
@elliotwaite
@elliotwaite 2 жыл бұрын
I'm glad you liked it. Thanks for the comment.
@user-mv4oh8yp1y
@user-mv4oh8yp1y Жыл бұрын
Hi, Elliot Waite thank you so much for creating this, it's really helpful for me!
@elliotwaite
@elliotwaite Жыл бұрын
I'm glad you liked it.
@slavanikulin8069
@slavanikulin8069 3 жыл бұрын
Thanks, dude. Didn't understand the last example, will try again later
@elliotwaite
@elliotwaite 3 жыл бұрын
Sounds good. If you let me know which part of the last example you didn't understand, I could try to elaborate on it.
@BOURNE399
@BOURNE399 5 жыл бұрын
Also the ending bgm is a good picking!
@adhoc3018
@adhoc3018 3 жыл бұрын
Excellent explanation
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks!
@pabloo.o1912
@pabloo.o1912 Жыл бұрын
Great explanation!
@elliotwaite
@elliotwaite Жыл бұрын
Thanks, Pablo!
@atursams6471
@atursams6471 3 жыл бұрын
This is a good video. Thanks for making it.
@starlord7383
@starlord7383 3 жыл бұрын
Very informative!
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks!
@mayankpj
@mayankpj 4 жыл бұрын
Color analogy is wonderful :) A very nicely done video Elliot !! Well done.... would love to see more of your stuff .....
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks for the encouragement! I'm considering making more videos, just trying to figure out how to do so in a way that will best assist my long-term goals. What kind of content would you find most valuable, useful, or helpful?
@mayankpj
@mayankpj 4 жыл бұрын
@@elliotwaite From a presentation perspective, I feel that some of the best ones I have seen out there are from 3blue1brown. I don't mean to say that it's the only way. From a content perspective, I believe that videos explaining implementations of ML algorithms using PyTorch, like SVM and linear/logistic regression, would be good to begin with. Videos explaining the fastai library's DataBlock and data loaders (in contrast with PyTorch's stock versions) would be really helpful and probably more viewed too. This could then also lead to explaining and contrasting more advanced model architectures, like the various vision and NLP architectures, especially as they are the most used.
@elliotwaite
@elliotwaite 4 жыл бұрын
@@mayankpj Nice. Thanks for the suggestions! I really like 3blue1brown too. I might have to just start experimenting with different video styles to see which are most rewarding.
@mayankpj
@mayankpj 4 жыл бұрын
@@elliotwaite Good Luck ..... my best wishes !
@DizzyDadProductions
@DizzyDadProductions 5 жыл бұрын
Subbed! I hope to see more content soon!
@elliotwaite
@elliotwaite 5 жыл бұрын
DizzyDad Reviews, thanks!
@hiiamlawliet480
@hiiamlawliet480 5 жыл бұрын
GREAT!!!
@JP-vo7zh
@JP-vo7zh 4 жыл бұрын
Thanks a lot!
@ShahabShokouhi
@ShahabShokouhi 5 ай бұрын
Dude! you are the best.
@elliotwaite
@elliotwaite 5 ай бұрын
Thanks!
@MrDarkNightmare666
@MrDarkNightmare666 3 жыл бұрын
After a week trying to understand autograd, now it's clear!! Well, maybe "clear" is too much, but it doesn't seem impossible anymore.
@elliotwaite
@elliotwaite 3 жыл бұрын
Glad to hear this video helped.
@eladjohn491
@eladjohn491 3 жыл бұрын
This is the best video in the world.
@ghaliahmed
@ghaliahmed 3 жыл бұрын
SUPER GOOD VIDEO!!!!!!!!!!!!!!!!
@dingusagar
@dingusagar 2 жыл бұрын
Great stuff
@elliotwaite
@elliotwaite 2 жыл бұрын
Thanks!
@user-ys2nd2bg6r
@user-ys2nd2bg6r Жыл бұрын
Great Video
@elliotwaite
@elliotwaite Жыл бұрын
Thanks!
@christopherross9214
@christopherross9214 11 ай бұрын
this is exactly what I needed. been pulling my hair out. thank you.
@alexanderlewzey1102
@alexanderlewzey1102 4 жыл бұрын
v good explanation
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks, Alexander!
@user-gc5bw7zu2i
@user-gc5bw7zu2i 5 жыл бұрын
awesome
@pierreeugenevalassakis8897
@pierreeugenevalassakis8897 5 жыл бұрын
That's a great walk-through of autograd, thanks!! Something that would be nice for the next video would be any tricks you have to make sure the graph is set up correctly and the gradients propagate where you want them to, etc. For instance, how to use Crayon, which I personally haven't really gotten into yet, but I'm hearing is good.
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks for the suggestion, that's a good idea. I haven't tried logging my PyTorch training in TensorBoard yet, but perhaps I'll make a video about how to do it once I learn more about it. Also, here's the link to the draw.io diagrams I made for this video if you're still interested, regarding your earlier comment: drive.google.com/file/d/1bq3akhmA5DGRCiFYJfNPSn7il2wvCkEY/view?usp=sharing
@pierreeugenevalassakis8897
@pierreeugenevalassakis8897 5 жыл бұрын
Nice, thanks! I realised it was something like draw.io upon re-watching. In the beginning I thought it was some sort of interactive front-end to a backend PyTorch model!
@lucanina8221
@lucanina8221 Жыл бұрын
This video saved me daysssss
@elliotwaite
@elliotwaite Жыл бұрын
I'm glad it helped.
@maidy199
@maidy199 4 жыл бұрын
Thank you for the great explanation. I would appreciate if you can create another one for recurrent architectures. I am having a lot of trouble with creating novel recurrent networks due to the retain_grad problems.
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks for the video suggestion. I may not make a video on that subject soon, but for recurrent architectures, you shouldn't need to use retain grad since the weights are leaf nodes and they will accumulate the gradients even if those weights are used in multiple places in the network, which is what you do in recurrent architectures.
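For example, a minimal sketch of that weight reuse (illustrative only, not from the video): reusing the same weight at every step of an unrolled loop still accumulates its gradient on weight.grad after a single backward() call.

import torch

w = torch.tensor(0.5, requires_grad=True)  # recurrent weight (leaf node)
h = torch.tensor(1.0)                      # initial hidden state
for _ in range(3):                         # unroll 3 time steps, reusing w each step
    h = h * w
h.backward()
print(w.grad)  # tensor(0.7500), the summed contribution from all 3 uses (3 * w**2)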
@kalekalekale
@kalekalekale 5 жыл бұрын
Beautiful video and great for a beginner-level understanding. The explanation of your color choice made me giggle.

One conceptual question I'm struggling with: when you wish to not update (freeze) parts of the network, the recommended solution is to set requires_grad to False. I would like to clarify that all this does is avoid unnecessary computation and storage of gradients at those nodes. However, the node will still contain a grad_fn (if it has one) so that in the backward pass, the gradient from that node is still technically passed backward and the chain rule is still maintained?

Another solution that was recommended is to not send the parameters you wish to freeze to the optimizer. However, some recommend setting requires_grad to False as well to save memory.
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks. Great question. I should have shown in the video what happens when you freeze nodes, but I'll try to explain here. You can actually only set requires_grad = False on leaf nodes, and leaf nodes don't have grad_fn values; it's the intermediate branch nodes (non-leaf nodes) that have the grad_fn values. So for example:

x = torch.tensor(1.0)
weight_1 = torch.tensor(2.0, requires_grad=True)
weight_2 = torch.tensor(3.0, requires_grad=True)
branch_node_1 = x * weight_1
branch_node_2 = branch_node_1 * weight_2

Here branch_node_2's grad_fn will be a MulBackward object that passes the gradient along to an AccumulateGrad object for weight_2 and also passes the gradient along to branch_node_1's MulBackward object, which then passes the gradient to the AccumulateGrad object for weight_1.

If we freeze weight_2:

x = torch.tensor(1.0)
weight_1 = torch.tensor(2.0, requires_grad=True)
weight_2 = torch.tensor(3.0, requires_grad=True)
weight_2.requires_grad = False
branch_node_1 = x * weight_1
branch_node_2 = branch_node_1 * weight_2

This will be the same as above, except branch_node_2's MulBackward won't pass the gradient along to the AccumulateGrad for weight_2; it will only pass the gradient along to branch_node_1's MulBackward, which then passes the gradient along to the AccumulateGrad for weight_1.

However, if we freeze weight_1:

x = torch.tensor(1.0)
weight_1 = torch.tensor(2.0, requires_grad=True)
weight_2 = torch.tensor(3.0, requires_grad=True)
weight_1.requires_grad = False
branch_node_1 = x * weight_1
branch_node_2 = branch_node_1 * weight_2

Then branch_node_1 will not have a grad_fn at all because none of the nodes going into it require a gradient. In fact, branch_node_1 will become a leaf node. And branch_node_2's grad_fn will be a MulBackward that only passes the gradient along to the AccumulateGrad for weight_2.

And if we try to freeze one of the branch nodes that has a grad_fn:

x = torch.tensor(1.0)
weight_1 = torch.tensor(2.0, requires_grad=True)
weight_2 = torch.tensor(3.0, requires_grad=True)
branch_node_1 = x * weight_1
branch_node_1.requires_grad = False
branch_node_2 = branch_node_1 * weight_2

We get the following error: "RuntimeError: you can only change requires_grad flags of leaf variables. ..."

So using the tree analogy again, you can only freeze green leaves. And freezing a green leaf is like changing it to a yellow leaf, and any brown branches that lead to only yellow leaves will actually also become yellow leaves. And the backward graph that gets created will only be for the brown branches that lead to green leaves. So when you freeze part of the graph, the intermediate nodes (branches) will update their grad_fn objects to not accumulate gradients for those frozen nodes (which are now yellow leaves), and if all of the inputs to a branch node don't require gradients (are all yellow leaves), that branch node won't have a grad_fn at all, will become a leaf node that doesn't require a gradient (a yellow leaf), and won't be involved in the backward graph.

A new backward graph is created with each forward pass, so when you change the requires_grad values of any of the nodes involved in the forward graph, it will also change the structure of the backward graph that gets created. This change may just be that some of the AccumulateGrad nodes are not created, but in some cases, larger parts of the backward graph may be omitted if they aren't needed, such as when you freeze the early nodes in a graph.
So using the requires_grad = False method to freeze nodes is better than just not passing those nodes to the optimizer, because setting requires_grad = False will be like pruning the backward graph, reducing both memory usage and computation time when the backward gradients are computed. I hope this helps clarify what happens when you freeze nodes. If there is any part that you'd like me to clarify further, let me know.
@kalekalekale
@kalekalekale 5 жыл бұрын
​@@elliotwaite Thank you very, very much for the detailed and thoughtful reply, it was extremely helpful. I think the mental block I had was that I was failing to separate the weights (parameters) from the computation nodes, and I thought that setting requires_grad = False in a network.parameter() loop was occurring on the weights AND computation nodes. But as you demonstrated, that is invalid for branches like Mul, and I was considering the computations as parameters. With the tree/leaf/branch analogy (new-ish to Comp. Sci.), it all seems so simple and easy to visualize now. Do I need to set requires_grad = False in every loop? I've read that you do not and can do it once before training (and should remember to set to True after), but your comment of "A new backward graph is created with each forward pass" leads me to think otherwise.
@elliotwaite
@elliotwaite 5 жыл бұрын
@@kalekalekale You only need to set requires_grad = False once to freeze the nodes. The backward graph that gets recreated each time is the collection of blue nodes, the MulBackward and AccumulateGrad nodes. The frozen leaf nodes are not recreated and will retain their attribute values.
@kalekalekale
@kalekalekale 5 жыл бұрын
@@elliotwaite Thanks again! You have my vote for more PyTorch content! Cheers.
@user-fl3ky1ce1y
@user-fl3ky1ce1y Жыл бұрын
My english skills are bad, but this explanation is ultra-understandable🤝
@elliotwaite
@elliotwaite Жыл бұрын
Good to know, thanks for the feedback.
@AbhishekSinghSambyal
@AbhishekSinghSambyal 5 жыл бұрын
Enjoyed it a lot. Thanks. May I know which tool you used to create those figures? Dia?
@elliotwaite
@elliotwaite 5 жыл бұрын
Abhishek Singh Sambyal, Thanks! I used this: www.draw.io
@user-vm9hl3gl5h
@user-vm9hl3gl5h Жыл бұрын
tensor.detach() is used when we do not need to keep the gradient in that specific tensor. There are also other alternatives, such as .item(), .tolist(), etc. A typical example is the value generated by the target network within the loss computation in the RL literature.
@elliotwaite
@elliotwaite Жыл бұрын
Thanks for sharing this info.
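A minimal sketch of the difference mentioned above (illustrative values only):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = a * 3
c = b.detach()             # same value as b, but cut off from the autograd graph
v = b.item()               # a plain Python float, no graph at all
print(c.requires_grad, v)  # False 6.0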
@Red_Fox_Miro
@Red_Fox_Miro 11 ай бұрын
great!
@prakhars962
@prakhars962 2 жыл бұрын
cheers mate.
@elliotwaite
@elliotwaite 2 жыл бұрын
cheers
@OttoFazzl
@OttoFazzl 5 жыл бұрын
This is a great video! Can I ask what software you used to create schematics? They look good!
@elliotwaite
@elliotwaite 5 жыл бұрын
a a, thanks. For the flowcharts I used this: www.draw.io/
@ridfieldcris4064
@ridfieldcris4064 5 жыл бұрын
Very intuitive and straight forward explanation, I am looking forward to further PyTorch video, any update will pop-up in the future?
@elliotwaite
@elliotwaite 5 жыл бұрын
Ridfield Cris, thanks! I’m not sure when I’ll make another PyTorch video, but if I do I’ll post it here on this channel, so if you’re subscribed, the video should show up in your subscriptions (if that’s what you were asking about).
@ridfieldcris4064
@ridfieldcris4064 5 жыл бұрын
Thank you for replying, I will definitely subscribe.@@elliotwaite
@Jimmy-et1bp
@Jimmy-et1bp 3 жыл бұрын
That was a great video. Have a quick question. Even though I do not use retain_grad, I still can use 'print(i.grad)' to check the i's gradient value. So what is the point of the retain_grad() attribute?
@elliotwaite
@elliotwaite 3 жыл бұрын
Thanks. The retain_grad() method is for intermediate nodes (non-leaf nodes that are part of your computation graph). For example:

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0)
c = a * b
d = torch.tensor(4.0)
e = c * d
e.backward()
print(a.grad)  # Will print: tensor(12.)
print(b.grad)  # Will print: None
print(c.grad)  # Will print: None
print(d.grad)  # Will print: None
print(e.grad)  # Will print: None

In this example, a, b, and d are leaf nodes, and c and e are intermediate nodes. If you call retain_grad() on c and e, they will retain their gradient from the backward pass. For example:

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0)
c = a * b
c.retain_grad()
d = torch.tensor(4.0)
e = c * d
e.retain_grad()
e.backward()
print(a.grad)  # Will print: tensor(12.)
print(b.grad)  # Will print: None
print(c.grad)  # Will print: tensor(4.)
print(d.grad)  # Will print: None
print(e.grad)  # Will print: tensor(1.)

And calling retain_grad() on the a tensor will have no effect since it is a leaf node that already has requires_grad set to True, and since b and d do not require a gradient, if you try to call retain_grad() on them it will raise the following error: "can't retain_grad on Tensor that has requires_grad=False". I hope this clarifies where it might be useful.
@alalaben
@alalaben 5 жыл бұрын
this video is great, do you have plan to make more like this?
@elliotwaite
@elliotwaite 5 жыл бұрын
yang yuan, thanks! Not sure yet what future videos I’ll make.
@intisarchy7059
@intisarchy7059 3 жыл бұрын
Thanks Elliot for amazing explanation. I have a question, recently I froze the classifier part of VGG-16 network (last 6 layers) and trained the network features parts and received no error. Rather the accuracy and training curve kept on improving. Does it mean that the backward graph was there the whole time despite of setting them to false?
@elliotwaite
@elliotwaite 3 жыл бұрын
Freezing network parameters just means the gradients to those tensors will not be computed, but the gradients to any previous layers that require a gradient will still be computed and passed along.

For example, let's say I have W1 as my first layer of weights and I freeze W2, my second layer of weights. Then I multiply my initial input (some image data, for example) by W1 to get my output from layer 1, multiply that output by W2 to get my output from layer 2, and then call .backward() on that output. The backward gradient will start at the output from layer 2 and check which of its inputs require a gradient: it will see that the input that was the output from layer 1 does require a gradient, but the input from W2 does not, so it will only send the backward gradient to the output from layer 1. It will then check which of the inputs to layer 1 require a gradient and see that the initial input (the image data) does not require a gradient but the input value of W1 does, so it will pass the gradient along to W1, where it will get stored on the W1 tensor's .grad property.

So freezing the parameters in a layer just means those parameters are considered constants: the backward gradients won't be passed to those parameters, but the gradient will still get passed along to any previous layers that do require a gradient.

As another example, if you think of your network as an actual tree (where the leaves are the inputs and the trunk is the output), when you freeze some of the output layers, you aren't actually freezing the lower half of the tree, you are only freezing the branches on that lower half, and the water (the backward gradients) will still go up the trunk to the unfrozen branches at the top of the tree. If, however, you freeze the top of the tree, you do actually freeze the entire top of the tree, trunk and all, because there will be no previous inputs that require a gradient, and the backward graph will only go back as far as gradients are required.

I hope this helps.
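A minimal sketch of that idea with two weight tensors (illustrative shapes, not the actual VGG-16 case):

import torch

x = torch.randn(1, 4)                       # input (does not require a gradient)
w1 = torch.randn(4, 4, requires_grad=True)  # earlier layer, trainable
w2 = torch.randn(4, 2, requires_grad=True)  # later layer
w2.requires_grad = False                    # freeze the later layer
out = (x @ w1) @ w2
out.sum().backward()
print(w1.grad is None)  # False: the gradient still flows back to w1
print(w2.grad is None)  # True: the frozen layer gets no gradient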
@intisarchy7059
@intisarchy7059 3 жыл бұрын
@@elliotwaite The explanation clears everything. Really looking forward to learning more from your video. Thanks.
@robinranabhat3125
@robinranabhat3125 2 жыл бұрын
Hi Elliot. Thanks for this !! . Could you be able to explain on how would these computational graphs diagrams would extend for the higher order derivatives. I am hopelessly trying to figure out and cannot find the answer.
@elliotwaite
@elliotwaite 2 жыл бұрын
The first time through the forward graph you get a backward graph that can be used to compute the first-order derivatives. You can then traverse that backward graph to create a second backward graph that can be used to compute the second-order derivatives. And so on, again and again, for higher-order derivatives.

For example, let's say our forward graph is the same as in the example:

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b
d = torch.tensor(4.0, requires_grad=True)
e = c * d

This will create a backward graph, which we can then traverse to calculate the first-order derivatives with respect to `e` by calling:

[a_grad_e, b_grad_e, d_grad_e] = torch.autograd.grad(e, [a, b, d], create_graph=True)

Calling that will be similar to running this code (where `a_grad_e` means the gradient of `a` with respect to `e`):

# Same as above, except we don't need to calculate `e`, so that line is commented out.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b
d = torch.tensor(4.0, requires_grad=True)
# e = c * d

# Plus this part.
e_grad_e = torch.tensor(1.0)
d_grad_e = e_grad_e * c  # Which equals: 1 * (a * b) = 1 * (2 * 3) = 6
c_grad_e = e_grad_e * d  # Which equals: 1 * d = 1 * 4 = 4
b_grad_e = c_grad_e * a  # Which equals: 4 * a = 4 * 2 = 8
a_grad_e = c_grad_e * b  # Which equals: 4 * b = 4 * 3 = 12

You can see that the extra part is generated by just starting at `e` in the original graph and working backward, applying the derivative rules. And if we ran that code it would create another, bigger backward graph, which we could then traverse to calculate the second-order derivatives with respect to `a_grad_e` by calling:

[
    a__grad__a_grad_e,
    b__grad__a_grad_e,
    d__grad__a_grad_e,
] = torch.autograd.grad(a_grad_e, [a, b, d], allow_unused=True)

Calling that will be similar to running this code (here `_grad_` is used for first derivatives and `__grad__` for second derivatives):

# Same as above, except I commented out the lines we don't need.
# a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
# c = a * b
d = torch.tensor(4.0, requires_grad=True)
# e = c * d
e_grad_e = torch.tensor(1.0)
# d_grad_e = e_grad_e * c
c_grad_e = e_grad_e * d
# b_grad_e = c_grad_e * a
# a_grad_e = c_grad_e * b

# Plus this part. Again we start with `a_grad_e` and work backward.
a_grad_e__grad__a_grad_e = torch.tensor(1.0)
b__grad__a_grad_e = a_grad_e__grad__a_grad_e * c_grad_e  # Which equals: 1 * (e_grad_e * d) = 1 * (1 * 4) = 4
c_grad_e__grad__a_grad_e = a_grad_e__grad__a_grad_e * b  # Which equals: 1 * b = 1 * 3 = 3
d__grad__a_grad_e = c_grad_e__grad__a_grad_e * e_grad_e  # Which equals: 3 * 1 = 3

So the second-order gradients we end up with are:

a__grad__a_grad_e = None (it wasn't reached in this backward graph)
b__grad__a_grad_e = 4
d__grad__a_grad_e = 3

You can confirm those values by running this example:

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b
d = torch.tensor(4.0, requires_grad=True)
e = c * d
[a_grad_e] = torch.autograd.grad(e, [a], create_graph=True)
[
    a__grad__a_grad_e,
    b__grad__a_grad_e,
    d__grad__a_grad_e,
] = torch.autograd.grad(a_grad_e, [a, b, d], allow_unused=True)
print(a__grad__a_grad_e)
print(b__grad__a_grad_e)
print(d__grad__a_grad_e)

Hope that helps.
@samyamr
@samyamr 5 жыл бұрын
Thank you for explaining the AutoGrad with such clarity. I am also wondering how you generated such beautiful graphs. Could you share the tool you used?
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks! The graphs were made with www.draw.io.
@samyamr
@samyamr 5 жыл бұрын
@@elliotwaite I am also wondering, is it possible to modify the backward graph after its construction? For instance, would it be possible to modify it in a way to just compute the gradients of the activations in a neural network, while computing the gradients of the parameters at a later phase.
@elliotwaite
@elliotwaite 5 жыл бұрын
@@samyamr I'm not aware of a way to modify the backward graph after it's constructed, but you could split up the graph during the forward pass using the detach method. Something like:

# Keep a copy of the original activations around that is attached to the first graph.
activations_separate_graph = activations

# Create a copy of the activations that is detached from the graph.
activations = activations.detach()

# Set `requires_grad` to True to start a new graph from these activations onward.
activations.requires_grad = True

...

# Then you could get the gradients of the activations in the first backward call.
loss.backward()

# And then the gradients of the variables in a second backward call, passing along the gradients from the first.
activations_separate_graph.backward(activations.grad)

That's the idea at least, but I haven't tested this code so there might be errors in it.
@timharris72
@timharris72 4 жыл бұрын
@Elliot Waite, I noticed that you always use 1 as the gradient when you go backwards from the end of the graph. Does PyTorch always start with a gradient of 1 in all situations, or is this just something you used as an example? I'm guessing this is what PyTorch does, but I'm not sure. This is the best explanation of autograd in PyTorch I have seen. Thanks for posting the video. I feel like I know PyTorch a little better.
@elliotwaite
@elliotwaite 4 жыл бұрын
When you call the .backward() method on a tensor, you can pass in a gradient argument to specify a certain gradient to start with, otherwise, the default starting gradient of 1 is used, which means that you will get gradients with respect to the tensor on which .backward() was called. Glad you liked the vid.
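A small sketch of the two cases (illustrative numbers):

import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x
y.backward()     # uses the default starting gradient of 1
print(x.grad)    # tensor(6.)  (dy/dx = 2x)

x.grad = None    # clear the accumulated gradient
y = x * x
y.backward(torch.tensor(5.0))  # pass in a custom starting gradient
print(x.grad)    # tensor(30.)  (5 * dy/dx)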
@timharris72
@timharris72 4 жыл бұрын
@@elliotwaite That makes sense. Thanks for the explanation. Understanding the details really helps when you are trying to learn the concepts!
@alirezag6603
@alirezag6603 5 жыл бұрын
dude it was so amazing!! can you give a try to explain hook as well?
@elliotwaite
@elliotwaite 5 жыл бұрын
Thanks for the suggestion. I'm not sure when I'll make more PyTorch videos, but that would be a good topic to cover if I do.
@elliotwaite
@elliotwaite 3 жыл бұрын
@isalirezag, I finally made a video explaining PyTorch hooks, here it is: kzbin.info/www/bejne/qaqvd3aMjtqUbLM I also featured your comment at the end of the video, but I'm not sure if I pronounced your username correctly. Anyways, thanks again for the video suggestion.
@yumnafatma9198
@yumnafatma9198 2 жыл бұрын
I have some difficulty understanding how DivBackward works, an explanation would be very helpful. Thanks for the nice video.
@elliotwaite
@elliotwaite 2 жыл бұрын
To explain, I'll use the example in the video (11:18) of: i = g / h (where g = 16 and h = 2).

To understand how DivBackward works, we need to know how to backpropagate gradients through a division. This means figuring out how changing each of the inputs will change the output. To make this equation look more familiar, I'll replace some of the values so that the output is represented by "y" and the value we are trying to find the derivative of is represented by "x".

So, to figure out the gradient for "g", we replace "g" with "x", we replace the output "i" with "y", and then we replace any other variables with their current values, meaning we replace "h" with 2, and it becomes:

y = x / 2

Now we just find the derivative of this equation (dy/dx) using the rules of calculus:

dy/dx = 1 / 2

So the gradient that gets passed back for "g" is half the input gradient to DivBackward.

Then we can do the same for "h", replacing "h" with "x", replacing the output "i" with "y", and the other variable "g" with its current value of 16, and we get:

y = 16 / x

And then we find the derivative of this, which is:

dy/dx = -16 / (x^2)

And now that we've done the derivative part, we can replace "x" with its current value, and since "x" represents "h", we replace it with the current value of "h", which is 2. So it becomes:

dy/dx = -16 / (2^2) = -16 / 4 = -4

So the gradient that gets passed back for "h" is -4 times the input gradient to DivBackward.

Now we can compare these values to what can be seen in the video at 11:18, and we can see that these are the values that DivBackward uses. The input gradient to DivBackward is 18, and the gradient passed back for "g" is 18 * 0.5 (9), and for "h" it's 18 * -4 (-72).

So how DivBackward works is that it saves the values of the numerator and denominator passed into the division operation and then uses them during the backward pass in the following way:

gradient_of_numerator = input_gradient * 1 / denominator
gradient_of_denominator = input_gradient * -numerator / (denominator ** 2)

These equations represent the same steps we followed above. I hope this helps. Sorry if it was confusing. Feel free to let me know if there is anything you want me to clarify.
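A quick way to check those numbers in code (same values as above):

import torch

g = torch.tensor(16.0, requires_grad=True)
h = torch.tensor(2.0, requires_grad=True)
i = g / h
i.backward(torch.tensor(18.0))  # feed in an upstream gradient of 18
print(g.grad)  # tensor(9.)    (18 * 1 / 2)
print(h.grad)  # tensor(-72.)  (18 * -16 / 2**2)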
@yumnafatma9198
@yumnafatma9198 2 жыл бұрын
@@elliotwaite Thanks for writing such a detailed explanation. It is clear to me now.
@elliotwaite
@elliotwaite 2 жыл бұрын
@@yumnafatma9198 glad I could help.
@tg__nano
@tg__nano 3 жыл бұрын
Thanks for the video! Can I make video explaining backpropagation based on your example in the video?
@elliotwaite
@elliotwaite 3 жыл бұрын
Yeah, thanks for asking.
@endjack218
@endjack218 4 жыл бұрын
Amazingly understandable! Could you share the illustrations from the video on GitHub or somewhere? Thanks!!!
@elliotwaite
@elliotwaite 4 жыл бұрын
Thanks! Yeah, here is the link to the draw.io file: drive.google.com/file/d/1bq3akhmA5DGRCiFYJfNPSn7il2wvCkEY/view?usp=sharing When you follow that link, you should see a button at the top of the page to "Open with draw.io Diagrams." Clicking it should open the file in draw.io. You might need to be signed in to your Google account to see that button.