PyTorch Hooks Explained - In-depth Tutorial

36,867 views

Elliot Waite

A day ago

Comments: 143
@altostratous 3 years ago
Most professional video I've ever seen in programming.
@elliotwaite 3 years ago
Thanks, Ali!
@rockapedra1130 9 months ago
These lectures are gold!!
@elliotwaite 9 months ago
Thanks!
@scottmiller2591 3 years ago
Very nice presentation - good pacing, good use of animation, good example - for a complicated subject that isn't explained clearly in one place in the documentation, but is scattered throughout without a unifying set of examples.
@kartheekakella2757 4 years ago
awesome vid! this channel's gonna go viral, take my word for it..
@elliotwaite 4 years ago
Thanks, Sukruth! You're the first to comment this prediction as far as I can remember. I'll try to make it come true.
@xzl20212 3 years ago
@@elliotwaite I really appreciate the quality of your video. Glad you don't sacrifice quality for subscriptions.
@zichenwang8068 1 year ago
Thank you so much for sharing this high-quality tutorial.
@elliotwaite 1 year ago
Woah, this is the first Super Thanks I've ever received. Thanks, Zichen! 😊 I'm glad you found the tutorial helpful.
@hilmandayo 4 years ago
I LOVE the small little details/catch-ups you threw into the video! They can clear a lot of doubts that beginners will probably face. Keep the videos coming! You contribute a lot to the world with this kind of video. By the way, your channel has made a huge transition from producing totally random videos about exercise, etc., to deep learning, haha.
@elliotwaite 4 years ago
Thanks, Hilman! I'll keep the videos coming. Hah, yep, this channel has been through some interesting phases. But I think I've found my niche.
@abhijitdeo2683 3 years ago
Dude this content is gold
@abhijitdeo2683 3 years ago
If you are planning member-only content and stuff too, I'm in, man... this is literally gold.
@elliotwaite 3 years ago
@@abhijitdeo2683, thanks! I don't have any plans for member only content at the moment, but I appreciate your comment.
@SumerbankaNeOldu 4 years ago
Finally, I've nailed the hooks. Thank you :)
@junweizheng1994 2 years ago
My first comment on YouTube. This video is amazing, and I can imagine you have done lots of work to make it. I really appreciate that. Good content, good presentation, good slides. This channel will get popular if you continue making great videos like this!
@elliotwaite 2 years ago
Thank you! I hope to make more videos in the future.
@chaupham1186 2 years ago
@@elliotwaite Looking forward to it! Thanks for great videos
@pizhichil 4 years ago
Thank you so much for this video. As always, very helpful. Without it, it would have taken a lot of effort to understand everything... thanks.
@elliotwaite 4 years ago
Thanks, Amit! I'm glad you found it helpful.
@khushpatelmd 3 years ago
Thank you so much. Please make more videos. You are an incredible teacher!!
@elliotwaite 3 years ago
Thanks for the comment, glad you liked the video. I have been thinking about starting to make videos again, and your encouragement helps.
@abdelmananabdelrahman4099 1 year ago
Wow. Great content! We need more of these videos.
@elliotwaite 1 year ago
Thank you for the encouraging comment. I may make more in the future.
@fernandofariajunior 6 months ago
This video is so helpful, thanks for making it!
@elliotwaite 6 months ago
Thanks. I'm glad you liked it.
@datascience3008 1 year ago
It's amazing how you reply to all the comments that you receive.
@elliotwaite 1 year ago
🙂 I enjoy the subject matter. Also, it's not too many to get overwhelmed by at the moment.
@samanthaqiu3416 4 years ago
Very interesting. I saw your autograd video and it was very cool. Something that gets confusing for me is when you need to retain the graph in order to use the gradients computed in a first backward pass in a second meta-loss calculation.
@elliotwaite 4 years ago
Ah, the reason you have to set retain_graph=True is that the default behavior of the backward method is that after the gradients have been passed through, it will delete the data stored in the backward graph that was needed to calculate those gradients (such as the data for the tensors that were used in the forward pass). This is the default behavior because most of the time people only do one pass through the backward graph, and deleting the graph's data saves memory. So you have to specify that you want to keep the graph's data in memory if you want to do a second pass through it. Let me know if I didn't answer your question, or if there is anything you're still unsure about.
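For example, here's a minimal sketch of that behavior (made-up tensors, assuming a recent PyTorch version):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b

c.backward(retain_graph=True)  # keep the graph's saved data around
print(a.grad, b.grad)          # tensor(3.) tensor(2.)

c.backward()                   # a second pass works because the graph was retained
print(a.grad, b.grad)          # tensor(6.) tensor(4.), gradients accumulate across backward calls

# If the first call had been c.backward() without retain_graph=True,
# this second call would raise a RuntimeError about trying to backward
# through the graph a second time.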
@shubhamthapa7586 3 years ago
Wow, thanks for making this video, finally my doubts are cleared now!
@elliotwaite 3 years ago
Glad it helped.
@shubhamthapa7586 3 years ago
@@elliotwaite Yeah, I was trying to implement Grad-CAM, so I thought of clearing up the concept of hooks first, and this video is just perfect for that.
@ohotpow 3 years ago
Very good video! It should be linked in the PyTorch documentation.
@carlossegura403 3 years ago
Great summary of PyTorch hooks, and great video quality!
@elliotwaite 3 years ago
Thanks, Carlos!
@raunakkbanerjee9016 5 months ago
Excellent video.. crystal clear explanations
@elliotwaite 5 months ago
@@raunakkbanerjee9016 thanks!
@shaozhuowang3403 4 years ago
It's great as always, thank you guys.
@elliotwaite 4 years ago
Thanks, shaozhuo!
@jiangpengli86 4 months ago
Marvelous tutorial. Thank you so much.
@elliotwaite 4 months ago
@@jiangpengli86 I'm glad to hear you liked it.
@aymensekhri2133 2 years ago
Very amazing. Could you please take, for example, the state-of-the-art models in deep learning, break them down, and explain how the flow works, especially for those models that contain very specific PyTorch methods like "register hooks"? I have noticed that on YouTube most YouTubers focus on the big terms in PyTorch and explain the simple concepts, but once we get to the SOTA models we find many new and complex things.
@elliotwaite 2 years ago
That's a good suggestion for potential future videos, thanks. I've noticed that as well, that YouTubers usually don't break down the more complex PyTorch models. I'm currently busy working on another project and have taken a break from making YouTube videos, but I'll add this idea to my list of video ideas in case I get back into making YouTube videos in the future.
@andreborgescavalcante4589 2 years ago
One related thing: for reading the grad properties of intermediate tensors, we only need to first call retain_grad() and then return that tensor as an output of the forward method.
@elliotwaite 2 years ago
Thanks for mentioning this tip.
@phuclai4492 1 year ago
Great video, I love it!!! I hope you make more great videos in the future.
@elliotwaite 1 year ago
Thanks!
@catthrowvandisc5633 4 years ago
Hey Elliot, thank you for this as well! I came to your channel for your autograd video and it really helped me quickly get a clearer picture. This one's just as good too. I really like how you incrementally take the problems deeper in each of your videos; they are very helpful to cement the understanding. Would you be able to do one on PyTorch distributed training? I couldn't find a good video explanation to help with it.
@elliotwaite 4 years ago
Glad to hear you're finding these videos helpful. And thanks for the suggestion! I still don't have a good understanding of how PyTorch distributed training works either yet, but it seems like something I should learn at some point. I'm not sure when I'll get around to this one, since it seems like it might take a bit of research to get a deeper understanding of it, but I'll definitely add it to my list of potential future videos. And if I decide to make it, I'll leave another reply here letting you know when it's posted.
@ArunKumar-bp5lo 2 years ago
Nicely explained visually.
@elliotwaite 2 years ago
Thanks!
@wayc04 A month ago
Great and helpful video! But I have a question: when I execute your program at 12:50 and check the grad of e by using print(e.grad), I find its output is tensor(2.). I'm a little confused about it.
@elliotwaite A month ago
Thanks for pointing this out. It looks like maybe the way `.retain_grad()` works has changed since this video was recorded, so that it will now always retain the output gradient after it has gone through all your other hooks, which is why the gradient of e is now printed as 2.0 instead of 1.0. If you try to run the code on PyTorch 1.6.0 (which was the latest version when this video was made), you'll see the old behavior that's described in the video.
@wayc04 A month ago
@@elliotwaite Thank you, I understand the reason now. Additionally, PyTorch has fixed the bug related to register_backward_hook by replacing it with register_full_backward_hook.
@elliotwaite A month ago
@wayc04 ah, good to know.
@jizhang2407 3 years ago
@11:00, I don't get why `return grad + 2` will update the `grad` variable, if this is not done by `grad += 2` and then `return grad` in the c_hook function... Can anybody enlighten me? Thanks. Anyway, brilliant tutorial, and I learned a lot. Thank you, Elliot.
@elliotwaite 3 years ago
Thanks, glad you liked the tutorial. About your question, I think during the backward pass the PyTorch code does something like this: grad = registered_hook_function(grad). So it updates the grad variable with whatever was returned from the hook function, but not in the same way as an in-place operation would, because the grad variable is now pointing to a different tensor, the tensor returned from the hook function (unless the returned tensor is the same tensor that was passed into the hook function). This new tensor is then passed along as the gradient to the next backward nodes in the graph.
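A small sketch of that replacement behavior (made-up tensors and numbers, not the exact ones from the video):

import torch

a = torch.tensor(2.0, requires_grad=True)
c = a * 3                               # intermediate tensor
c.register_hook(lambda grad: grad + 2)  # returns a new tensor, which replaces the incoming grad
d = c * 4
d.backward()

# The gradient flowing into c is 4.0, the hook returns 4.0 + 2 = 6.0,
# and that returned tensor is what continues backward through MulBackward0,
# so a.grad becomes 6.0 * 3 = 18.0.
print(a.grad)  # tensor(18.)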
@samllanwarne6512 4 years ago
At 4:55, when you multiply by 4, shouldn't the AccumulateGrad for a be 8 and the AccumulateGrad for b be 12, the other way around from in the video?
@elliotwaite 4 years ago
I think the gradients are correct in this case, but it is a bit counterintuitive and this has mixed me up before. The counterintuitive part is that when you backprop through a multiplication, the gradients actually get multiplied by the flip of the input values. For example, when you backprop through A * B, the gradient for A is B times the incoming gradient, and the gradient for B is A times the incoming gradient. This is because each little increase in A will increase the output by that little increase times B, and each little increase in B will increase the output by that little increase times A.
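A worked example of that flip (using a = 2, b = 3, and an incoming gradient of 4, to mirror the numbers discussed above):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b

# Backprop an incoming gradient of 4 through the multiplication.
c.backward(torch.tensor(4.0))

print(a.grad)  # tensor(12.), which is b (3) times the incoming gradient (4)
print(b.grad)  # tensor(8.), which is a (2) times the incoming gradient (4)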
@samllanwarne6512 4 years ago
@@elliotwaite Thank you Elliot!
@rayanzaki9314 4 years ago
Awesome explanation. It was really very helpful. Could you make a video on quantization in PyTorch? Thanks.
@elliotwaite 4 years ago
Thanks! Glad you liked it. And thanks for the suggestion. I'll add that to my list of potential future videos. Quantization is something I also want to learn more about at some point, and I'll probably make a video about it when I do.
@peasant12345 1 year ago
I still don't follow why c.grad will be modified to 100 at 14:32. Is there some enforced integrity, something like the grads of the two sides of an add operation must be equal?
@elliotwaite 1 year ago
Yeah, finding the gradient is basically saying, "if I change the input by a little bit, by how much will that change the output?" And the add operation (or the sum of any number of inputs) will result in the inputs all having the same gradient with respect to the output, because changing any of the inputs by a little bit (dx) will change the output by that exact same amount (dx). So the gradient for all the inputs of a sum with respect to the output is 1, which means that when backpropagating the gradients through the add (or sum) operation, the code can use the optimization of just passing the same gradient tensor to all the inputs. But this optimization is only safe if none of the backward hook functions apply any in-place operations to this shared tensor, because the result of an in-place operation would be visible to all that are using that shared tensor.
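A sketch of how you could see that sharing directly with tensor hooks (made-up tensors; the exact details of this optimization can differ between PyTorch versions, but this matches the behavior shown in the video):

import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = a * b
d = a + b

def c_hook(grad):
    print('c grad id:', id(grad))

def d_hook(grad):
    print('d grad id:', id(grad))  # same object as the one c_hook sees when the shared-tensor optimization is used
    grad.mul_(100)                 # so an in-place change here is also visible through c's gradient

c.register_hook(c_hook)
d.register_hook(d_hook)

e = c + d
e.backward()  # AddBackward0 hands the same gradient tensor to both of its inputs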
@sudhirdeshmukh8445 3 years ago
Hi Elliot, thanks for yet another wonderful PyTorch video. I was just wondering why there is "@staticmethod" mentioned before the forward function of a module. Why use "@staticmethod", and when and where? REF: 15:21 in the video. Thank you.
@elliotwaite 3 years ago
"@staticmethod" is just a function decorator that makes it so that when the method is called, its first argument isn't auto-filled with the value of the instance that is calling the method, or in other words, it just makes it so that the first parameter of the method doesn't have to be "self". You can use it on any methods where the "self" parameter is not used, in which case you can add the "@staticmethod" decorator and remove the "self" parameter. Some tests show that using it when appropriate provides a tiny performance boost, but I think it mostly just makes it cleaner in the sense that you don't list any unused parameters. You can find more info about it here: docs.python.org/3/library/functions.html#staticmethod
@jonatan01i 4 years ago
Thank you for this video, very informative, helped me a lot! Thanks!
@elliotwaite 4 years ago
Glad you liked it!
@programmer8064 1 year ago
Thank you so much!!!!!!!!!!!!!!!!!!!!! I love this video!!!
@elliotwaite 1 year ago
Thanks!
@jizhang2407 3 years ago
@14:21, I don't understand why the in-place operation inside d_hook also changes the gradient passed to MulBackward0. Isn't the gradient of e, i.e. 1.0, passed to both c and d as two "1.0"s, i.e. the same value but as two independent tensors? Can anybody enlighten me? Thanks.
@elliotwaite 3 years ago
Since the AddBackward0 node doesn't change the gradient, it saves memory by not duplicating the data and just passing along the same "1.0" tensor object to both c and d (or in other words, it passes along pointers that point to the same underlying data). This is why when that data is changed in the d_hook by an in-place operation, it also affects the data that is seen in the c node. I hope that helps clarify. P.S. - Sorry for the late reply. I'm not sure how I missed your comment earlier. Thanks for the question.
@markomitrovic4925 1 year ago
Thank you for the explanation :)
@elliotwaite 1 year ago
I'm glad you liked it.
@oheldad 2 years ago
Great tutorial well done !
@elliotwaite 2 years ago
Thanks!
@pouyaparsa5851 4 years ago
Nice job, thank you. I wonder, is there any way to see these nodes in code and print their properties, like where they point to and so on? In other words, could we go any further than knowing grad_fn is actually there?
@elliotwaite 4 years ago
Yeah, you can inspect some things, like the nodes in the backward graph. I advise using a debugger. I use the one in PyCharm. The grad_fn property will point to a node in the backward graph, and then that node will have a next_functions property, which will be a tuple of tuples that contain other nodes in the backward graph, and so on. For example:

a = torch.tensor(2.0, requires_grad=True)
b = (a * 3) * 4

print(b.grad_fn.next_functions[0][0].next_functions[0][0])
# Will print out something like: <AccumulateGrad object at 0x...>

print(b.grad_fn.next_functions[0][0].next_functions[0][0].variable)
# Will print out: tensor(2., requires_grad=True)

The first print function will work its way back through the backward graph until it gets to the AccumulateGrad node for the `a` tensor, so it will print out that AccumulateGrad node object. And the second print statement will print out the variable associated with that AccumulateGrad node, which is the `a` tensor, so it will print out the `a` tensor. However, the things you can access are limited. For example, I don't think you can access which tensors are associated with the intermediate nodes, like the MulBackward0 nodes, since I think that information is stored on the C++ side of things. Good question. Thanks, Pouya!
@pouyaparsa5851 4 years ago
@@elliotwaite thanks for this perfect answer !
@andreborgescavalcante4589 2 years ago
Amazing video.
@elliotwaite 2 years ago
Thanks!
@nezgi8220 3 years ago
What about stacking a and b into another tensor? How are grads calculated if many of these grad-requiring mini tensors are stacked into a big tensor?
@elliotwaite 3 years ago
In the backward pass, the gradient will get distributed to each of the tensors that were stacked together, only passing along to each tensor the part of the gradient that corresponds with that tensor.
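A small sketch of that distribution, assuming torch.stack and made-up values:

import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0], requires_grad=True)
s = torch.stack([a, b])  # shape (2, 2)

# Per-element weights make it easy to tell the pieces of the gradient apart.
weights = torch.tensor([[10.0, 20.0], [30.0, 40.0]])
(s * weights).sum().backward()

print(a.grad)  # tensor([10., 20.]), only the slice of the gradient that came from a
print(b.grad)  # tensor([30., 40.]), only the slice that came from b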
@nezgi8220 3 years ago
@@elliotwaite Indeed, it is, I tested empirically. What a miracle!
@케이케이-u8y 3 years ago
Hi Elliot, very good video. I have a question about 14:39 in your video. I define def d_hook(grad): grad *= 100. Replacing e = c + d with e = c * d, and writing d.register_hook(d_hook), it affects c: c's gradient = 100. But if I do e = c + 1*d and use the hook func grad *= 100, registered with d.register_hook(d_hook), it does not affect c: c's gradient = 1. I don't know why there is such a difference.
@elliotwaite 3 years ago
When going backwards through c + d, the same gradient gets passed to both c and d, so multiplying the d gradient in-place by 100 will also affect the c gradient. When going backwards through c + 1 * d, the same gradient is passed to both c and the 1 * d term, and then the gradient passed to the 1 * d term is multiplied by 1 to get the gradient for d, and this multiplication by one ends up creating a new tensor, so now when you update the d gradient in-place it is no longer the same tensor used for the c gradient, so it doesn't affect the c gradient. I hope that makes sense. Let me know if that doesn't answer your question.
@케이케이-u8y 3 years ago
@@elliotwaite Thanks your answer is really great.
@jonatan01i 4 years ago
Besides the in-place operation on the grads, is there any difference between
- using hooks, and
- going through the model.parameters()'s grads in order to modify them before a call to optimizer.step()?
@elliotwaite 4 years ago
As far as I'm aware, there won't be any difference. When you call optimizer.step(), the optimizer will only be concerned with what the current grad values of the parameters are, and it won't matter how those grad values were assigned.
@jonatan01i 4 years ago
@@elliotwaite Makes sense, thank you!
@mariogalindoq 3 years ago
Elliot: very interesting, but could you give examples of using hooks? That is, why could it be useful to use hooks? For which problems do you need hooks? What can be done with hooks that can't be done another way?
@elliotwaite 3 years ago
Good question. I probably should have included more specific use cases in the video. The best use case I can think of for hooks is when you want to change how data flows through an existing module (so these would be using the module style hooks, `register_forward_hook` and `register_forward_pre_hook`). For example, these hooks were used in PyTorch's library to implement these tools:
Quantization: pytorch.org/docs/stable/quantization.html
Pruning: pytorch.org/tutorials/intermediate/pruning_tutorial.html
Spectral Norm: pytorch.org/docs/stable/generated/torch.nn.utils.spectral_norm.html
Weight Norm: pytorch.org/docs/stable/generated/torch.nn.utils.weight_norm.html
You can download the PyTorch library and search the source code for "register_forward_" to see how they were used. Good use cases for the Tensor hooks are harder to think of, but they could be used to experiment with novel ways of managing the gradients as they flow through the backward graph. For example, to do gradient clipping, most people just clip the gradients at the end of the backward pass, but maybe it would be better in certain cases to clip the gradients as they are flowing through the backward graph instead, which could be done with Tensor hooks (that might actually be a good experiment to try 🤔). I personally haven't ever run into a situation where I needed to use them in the projects I've worked on, but it's good to know they're there if needed, and I wanted to explain them since my viewers had asked about them.
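For example, a minimal sketch of that clip-as-it-flows idea with a Tensor hook (a toy graph with made-up values, not a real training loop):

import torch

a = torch.tensor(3.0, requires_grad=True)
h = a * 10                                     # some intermediate tensor
h.register_hook(lambda g: g.clamp(-1.0, 1.0))  # clip the gradient as it flows backward through h
loss = h * 7
loss.backward()

# The gradient arriving at h is 7.0, the hook clips it to 1.0,
# and that clipped value continues backward: a.grad = 1.0 * 10 = 10.0.
print(a.grad)  # tensor(10.)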
@PankajGupta-ki9gx 3 years ago
Can you make a full-fledged series, in the form of a playlist, covering various PyTorch functionalities and built-in classes, since the documentation is quite tough to interpret for custom datasets? Thank you!
@elliotwaite 3 years ago
Thanks for the suggestion. I haven't been interested in making PyTorch videos lately, but I'll add your recommendation to my list of potential future video ideas.
@anas.2k866 2 years ago
Thanks. So we can't track the gradient in the backward pass when it is in a module? Is there any other way?
@elliotwaite 2 years ago
After I made this video, PyTorch added a new hook called register_full_backward_hook() that works on modules. It is called whenever the gradient with respect to the inputs is computed. The docs for it are here: pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_full_backward_hook However, if you are asking about tracking the gradients inside the module, and not just the gradients of the inputs, I think you would have to update the actual module code by adding calls to register hooks on the intermediate tensors.
@anas.2k866 2 years ago
@@elliotwaite Ah, thank you. So if I put this hook on layer 5 of my multilayer perceptron and I launch loss.backward(), the grad_input is the gradient of the loss with respect to the weights and bias of layer 5, isn't it? And what is the grad_output? Thanks again for your huge effort!!!!
@elliotwaite 2 years ago
@@anas.2k866 Hooks are usually used to intercept gradients in places where you wouldn't otherwise have access to them. The gradient of the loss with respect to the weights and bias of layer 5 will already be accessible through the `.grad` attribute of the weights/bias tensors of that module. Registering a hook on a module would be used for something else. It would be used to access the gradients just before they enter the module in the backward pass and just after they leave the module in the backward pass. The gradients just before entering the module in the backward pass will be the `grad_output` value of the hook function, because those will be the gradients with respect to the output of the module. And after those gradients flow backward through the module, you'll get the `grad_input` values, the gradients of the loss with respect to the inputs of the module. The variable names `grad_input` and `grad_output` are using input/output to refer to the input/output of the forward pass, which is why they are the reverse of what they are when flowing backward through the graph (`grad_output` is the incoming gradient in the backward pass and `grad_input` is the outgoing gradient in the backward pass).
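A sketch of that, assuming a PyTorch version that has register_full_backward_hook and using a made-up Linear layer:

import torch
import torch.nn as nn

def backward_hook(module, grad_input, grad_output):
    # grad_output: gradients w.r.t. the module's forward outputs (what flows in during backward)
    # grad_input:  gradients w.r.t. the module's forward inputs (what flows out during backward)
    print('grad_output shapes:', [g.shape for g in grad_output if g is not None])
    print('grad_input shapes: ', [g.shape for g in grad_input if g is not None])

layer = nn.Linear(4, 2)
layer.register_full_backward_hook(backward_hook)

x = torch.randn(3, 4, requires_grad=True)
layer(x).sum().backward()  # prints (3, 2) for grad_output and (3, 4) for grad_input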
@anas.2k866 2 years ago
@@elliotwaite Ah, OK, so grad_output is the gradient of the loss with respect to the output of the module, which in my case is the gradient of the loss with respect to the activations of the neurons in layer 5. And grad_input is the gradient of the loss with respect to the activations of the neurons in layer 4?
@elliotwaite 2 years ago
@@anas.2k866 Yep
@az8134 3 years ago
damn, I never looked into those details before.
@AmitYadav-zs4ft 3 years ago
Hi, what are you using to display those CPU/memory specs in the upper right corner? In Linux we have System Monitor, but I am looking for an alternative on Mac. Thanks.
@elliotwaite 3 years ago
iStat Menus is the one I use: bjango.com/mac/istatmenus/
@TheMazyProduction 4 years ago
What do you use to make these diagrams?
@elliotwaite 4 years ago
For this one I used Figma. I learned from this tutorial: kzbin.info/www/bejne/hX6QnYewe9JsgLM
@TheMazyProduction 4 years ago
@@elliotwaite Perfect I needed something like this to make flowcharts!
@thevikinglord9209 2 years ago
Nice video. So how do you check out the graph?
@elliotwaite 2 years ago
Do you mean how do you check out the backward graph of the code you wrote? If so, I explored the backward graph by using a debugger and looking at the `grad_fn` property on the tensors.
@ハェフィシェフ 2 years ago
Do you make the diagrams yourself or did you write code for it?
@elliotwaite 2 years ago
I made them myself using Figma.
@Aditya-ne4lk 4 years ago
a.grad will have the same shape as a, correct?
@elliotwaite 4 years ago
Yep. The gradient tensor will have a gradient value corresponding to each of the values in the A tensor, so it will have the same shape as A.
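For example (made-up shapes):

import torch

a = torch.randn(3, 4, requires_grad=True)
a.sum().backward()
print(a.shape)       # torch.Size([3, 4])
print(a.grad.shape)  # torch.Size([3, 4]), one gradient value per element of a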
@liweidai4474 3 years ago
There is a register_full_backward_hook() method now, which is recommended over register_backward_hook(). You can check it out. One thing that bothers me is that the documentation clearly says that modifying inputs or outputs in-place is not allowed when using backward hooks and will raise an error. But what if I have a predefined model which has some in-place operators? I just want to know the grads w.r.t. the tensors before and after the in-place op. Is there any way to accomplish this other than having to modify my code to not use in-place ops? How about the register-hooks-on-tensors method?
@elliotwaite 3 years ago
Thanks for letting me know about the new register_full_backward_hook() method. I've added a note about this to the video description. And about your question, I don't know the answer to this one.
@ravivaishnav20 4 years ago
Awesome explanation. It was really very helpful. Could you please give some intuition on data parallelism in PyTorch? And is there any way we can use a Colab GPU with our laptop GPU?
@elliotwaite 4 years ago
Thanks! I'll add data parallelism in PyTorch to my list of potential future videos (but I'm not sure when I might get around to making it). For RL tasks, I've just been using MPI for Python (mpi4py.readthedocs.io/en/stable/). An example of it being used can be seen in OpenAI's Spinning Up in Deep RL code:
github.com/openai/spinningup/blob/master/spinup/utils/mpi_pytorch.py
github.com/openai/spinningup/blob/master/spinup/utils/mpi_tools.py
github.com/openai/spinningup/blob/master/spinup/algos/pytorch/vpg/vpg.py
And there are also PyTorch's built-in distributed training tools, but I haven't dived into those much yet. Using a Colab GPU and your laptop's GPU in parallel should be possible, but I'm not sure of the details of how you would get it to work. I would imagine you'd establish a way to communicate between the two processes running on the separate machines, then synchronize the models at the start of training, then each model computes the gradients for a separate batch of data, and then those gradients would get averaged using the communication method before using them to update the models.
@michpo1445 1 year ago
What is the program you're using to graphically design PyTorch code?
@elliotwaite 1 year ago
I make the designs with Figma.
@michpo1445 1 year ago
@@elliotwaite Thanks, but to clarify, does this tool create the PyTorch code for you, or do you just use it to graphically represent what you are coding?
@elliotwaite 1 year ago
@@michpo1445 it wasn't auto-generated, I just designed the slides by hand to match the info I was seeing in the Python debugger.
@Jimmy-et1bp 3 years ago
How do the forward_pre_hook and forward_hook affect the a, b, c gradients?
@elliotwaite 3 years ago
Any operations performed within the forward_pre_hook or forward_hook functions will affect the gradients the same as any computations performed in the module's forward method. It's almost as if you are just inserting the forward_pre_hook function's code into the beginning of the forward method, and inserting the forward_hook function's code into the end of the forward method (forward_hook probably should have been named forward_post_hook, I'm not sure why it wasn't).
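A sketch of that behavior with made-up hooks on a Linear layer (just to show the mechanics, not code from the video):

import torch
import torch.nn as nn

def pre_hook(module, inputs):
    # Runs before forward; returning a tuple replaces the forward inputs.
    return (inputs[0] * 2,)

def post_hook(module, inputs, output):
    # Runs after forward; returning a value replaces the forward output.
    return output + 1

layer = nn.Linear(4, 2)
layer.register_forward_pre_hook(pre_hook)
layer.register_forward_hook(post_hook)

x = torch.randn(3, 4, requires_grad=True)
out = layer(x)
out.sum().backward()
# The *2 and +1 are autograd-tracked operations, so they show up in the backward
# graph and affect x.grad and the layer's parameter grads exactly as if they had
# been written inside the forward method.
print(x.grad.shape)  # torch.Size([3, 4])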
@rachelliu7253 2 years ago
Thanks so much
@elliotwaite 2 years ago
You're welcome. Thanks for the comment.
@BlackHermit 3 years ago
So, still no answer as to why the 0 is there in "MulBackward0"?
@elliotwaite 3 years ago
I think I just figured it out. It allows for the same operation to be called in multiple ways (function overloading), and each different overloaded way of calling that operation gets a different index number for its backward version. The part of the PyTorch library that generates these backward operation names can be found here (and the comment above the code also describes that this is done to de-duplicate overloaded operation names): github.com/pytorch/pytorch/blob/master/tools/autograd/load_derivatives.py#L355 For example, the `min` operation can be called in multiple ways. In the code below, I call the `min` operation in two of these different ways, and the resulting backward operations associated with the different output tensors end up having different index numbers. "torch.min(a)" generates a MinBackward1 operation, and "torch.min(a, dim=0, keepdim=False)" generates a MinBackward0 operation. Code example:

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.min(a)
(c, c_indices) = torch.min(a, dim=0, keepdim=False)

print(b)  # Prints: tensor(2., grad_fn=<MinBackward1>)
print(c)  # Prints: tensor(2., grad_fn=<MinBackward0>)
@BlackHermit 3 years ago
@@elliotwaite Oh, interesting. Thanks!
@randomforrest9251 3 years ago
So why is it called MulBackward0?
@elliotwaite 3 years ago
I recently figured it out. It allows for the same operation to be called in multiple ways (function overloading), and each different overloaded way of calling that operation gets a different index number for its backward version. The part of the PyTorch library that generates these backward operation names can be found here (and the comment above the code also describes that this is done to de-duplicate overloaded operation names): github.com/pytorch/pytorch/blob/master/tools/autograd/load_derivatives.py#L565 For example, the `min` operation can be called in multiple ways. In the code below, I call the `min` operation in two of these different ways, and the resulting backward operations associated with the different output tensors end up having different index numbers. "torch.min(a)" generates a MinBackward1 operation, and "torch.min(a, dim=0, keepdim=False)" generates a MinBackward0 operation. Code example:

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.min(a)
(c, c_indices) = torch.min(a, dim=0, keepdim=False)

print(b)  # Prints: tensor(2., grad_fn=<MinBackward1>)
print(c)  # Prints: tensor(2., grad_fn=<MinBackward0>)
@randomforrest9251 3 years ago
@@elliotwaite thank you a lot!
@shvprkatta 4 years ago
Thanks a ton, Elliot! It would have taken a lot of time to understand these concepts otherwise...
@elliotwaite 4 years ago
Thanks! Glad you found it helpful.
@MaximYudayev 3 years ago
Hi Elliot. Great videos. Sub’d :) It would be outstanding if we could dive into customization of the quantization workflow! Things like making custom modules compatible with the fusing and quantization workflows, as well as expanding the data type formats. Thank you!
@elliotwaite 3 years ago
Thanks. Glad you liked the videos. So far I've only briefly looked into PyTorch's quantization capabilities, but it looks interesting. I'm not sure if I'll ever get around to making a video about it since I've been more focused on learning JAX these days, but I'll add the idea to my list of potential future YouTube videos. Thanks for the recommendation.