This is the best ML tutorial series I've seen so far. You're explaining the underlying concepts so well. This is the first time that I have such a concrete understanding of what goes on during backpropagation and gradient descent while also learning about the relevant programming framework. It is awesome how you implemented everything manually and are then switching step by step to PyTorch. Now everything from the previous videos starts to really come together. Just wanted to say that I'm very thankful and greatly appreciate your efforts! You gained a subscriber. Greetings from a fellow German!
@patloeber3 жыл бұрын
thank you so much!
@harissajwani25834 жыл бұрын
First deep learning tutorial where I don't feel sleepy. Good work, brother.
@patloeber4 жыл бұрын
Thanks!
@mawkuri54963 жыл бұрын
me too
@MohamedShatarah-l7t Жыл бұрын
Dude, oh my god, you are so amazing. All this complex stuff I now understand because of your brilliant teaching. You use examples that are easy to understand and explain them in detail. Oh god, I can only imagine the effort you put into this. Thank you so much.
@lucyfrye6723 Жыл бұрын
Just in case people are a little confused: the two factors he is really dotting for dloss/dw are x (which is dy_predicted/dw) and dloss/dy_predicted, which is 2 * (y_predicted - y) / N, where N = x.size == y_predicted.size. Also, by convention y_predicted is often simply called 'a' (of the top layer; here we only have one layer). I hope this helps someone. Nice video, it's a good idea to do the gradient manually so people have at least a rough idea of what goes on under the hood of a neural network. Things get a lot trickier with more (hidden) layers, though!
@MrThezyga Жыл бұрын
thanks, I actually didn't catch that while watching
@kurtulusbulus67283 жыл бұрын
One of the best PyTorch tutorials on the internet. Great work my friend! Keep sharing please!
@patloeber3 жыл бұрын
glad to hear that :)
@theusualcouple10 ай бұрын
I have built many linear regression models, but this is the first time I have actually coded linear regression myself. I am so happy today. Thanks @patloeber
@hector15024 жыл бұрын
Thank you for your videos. I think there is a mistake in the gradient calculation in the numpy example. It is pointless to compute the mean of a scalar (the output of np.dot(2*x, y_predicted - y) is a scalar); you should simply divide the result by x.size. So the function should be: return np.dot(2*x, y_pred - y) / x.size. I think that is the reason why your results are different from PyTorch's.
@patloeber4 жыл бұрын
You are correct! Great catch, and thank you for detecting this :) I will fix this in my GitHub repo. In this case the error was not too bad; it only resulted in a different scaling of the gradient, so gradient descent still worked.
@patloeber4 жыл бұрын
I gave this some more thought. It's also possible to avoid the dot product and write: return (2*x * (y_pred - y)).mean(). The result should be the same as your solution.
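A quick numeric check of the forms discussed in this thread (a minimal sketch; it assumes the example data from the video, X = [1, 2, 3, 4], Y = [2, 4, 6, 8], and w = 1.0 is just an arbitrary test value):

    import numpy as np

    X = np.array([1, 2, 3, 4], dtype=np.float32)
    Y = np.array([2, 4, 6, 8], dtype=np.float32)
    w = 1.0
    y_pred = w * X

    # original version: .mean() of a scalar, i.e. no division by N
    g_dot   = np.dot(2 * X, y_pred - Y).mean()
    # corrected versions: divide by the number of samples
    g_fixed = np.dot(2 * X, y_pred - Y) / X.size
    g_mean  = (2 * X * (y_pred - Y)).mean()

    print(g_dot, g_fixed, g_mean)  # g_fixed == g_mean, while g_dot is X.size times larger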
@xinqiaozhao51544 жыл бұрын
@@patloeber I have a question about your reply. In this case x and y are both vectors; if I do an element-wise chain rule calculation, I get that the gradient equals 2/N * (w*np.dot(x, x) - np.dot(x, y)). You can try this and compare the result with torch.nn.MSELoss; the results are the same.
@patloeber4 жыл бұрын
The calculation I have in the code comments is:

# J = MSE = 1/N * (w*x - y)**2
# dJ/dw = 1/N * 2x * (w*x - y)

If we move x inside the parentheses, it becomes 1/N * 2 * (w*x*x - y*x), which is your formula. So yes, it should be the same.
@ericohana96502 жыл бұрын
@@patloeber It is the clearest solution imo
@taishi5732 жыл бұрын
Just saw your videos and I'm now binge watching them. Explanations are so easy to understand. You are the best!
@jyotipch2 жыл бұрын
For those who are struggling with line 21 like me: expand 1/N*(wx-y)**2, which becomes 1/N*((wx)**2 + y**2 - 2wxy). The derivative of that w.r.t. w is 1/N*(2wx**2 - 2xy) => 1/N * 2x(wx-y).
@Coconut7403 Жыл бұрын
He uses the chain rule: F'(x)=f'(g(x))*g'(x)
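For anyone who prefers to verify the derivative symbolically rather than by hand, a small SymPy sketch of the per-sample term (before the 1/N averaging):

    import sympy as sp

    w, x, y = sp.symbols('w x y')
    J = (w * x - y) ** 2            # per-sample squared error
    dJ_dw = sp.diff(J, w)           # chain rule: 2 * (w*x - y) * x
    print(sp.factor(dJ_dw))         # -> 2*x*(w*x - y)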
@andersondacostaferreira31193 жыл бұрын
I have to study AI for my actual astrophysics project and you are helping me a lot. Thank you so much!
@TheOraware Жыл бұрын
For the gradient calculation you are using a dot product, which yields a single scalar gradient for all samples combined; taking the mean of that scalar returns the same value. The scalar should instead be divided by len(x):

def gradient(x, y, y_predicted):
    N = len(x)
    return np.dot(2 * x, y_predicted - y) / N
@tiagoleviski3 жыл бұрын
The way you first explained it in numpy and then changed to torch was fantastic. I finally understood the training loop in torch thanks to you!
@patloeber3 жыл бұрын
great to hear :)
@dingusagar2 жыл бұрын
wow, now i finally get a picture of what is happening behind the scenes of autograd. great stuff
@abdelrahmanhammad10203 жыл бұрын
Thanks for the great tutorial. I believe both solutions should converge similarly; the difference in the number of iterations was due to the bug in the gradient calculation discussed earlier in the comments. I tried both solutions, and they converge in 411 steps using the same initialization and learning rate. Code below:

_____ Common Code: _____

import torch

x = torch.tensor(1)
y = torch.tensor(2)
lr = .01

def forward(x):
    return w * x

_____ Manual Gradient Code: _____

def gradient(x, y, y_hat):
    return 2 * x * (y_hat - y)

w = torch.tensor(0., requires_grad=True)
n_iter = 500
for epoch in range(n_iter):
    y_hat = forward(x)
    l = (y_hat - y) ** 2   # loss, needed for the print below
    grad = gradient(x, y, y_hat)
    print(f'Iteration = {epoch}, weight = {w:0.3f}, loss = {l:0.3f}, gradient = {grad:0.3f}, f(5) = {forward(5)}')
    with torch.no_grad():
        w -= lr * grad

_____ Autograd Code: _____

def loss(y, y_hat):
    return (y_hat - y) ** 2

w = torch.tensor(0., requires_grad=True)
n_iter = 500
for epoch in range(n_iter):
    y_hat = forward(x)
    l = loss(y, y_hat)
    l.backward()
    print(f'Iteration = {epoch}, weight = {w:0.3f}, loss = {l:0.3f}, gradient = {w.grad:0.3f}, f(5) = {forward(5)}')
    with torch.no_grad():
        w -= lr * w.grad
    w.grad.zero_()
@gonzalopolo2612 Жыл бұрын
Really great tutorial @Patrick Loeber. I have a question that I cannot find an answer to anywhere. In your loop, inside the torch.no_grad() context, why does writing `w = w - learning_rate * w.grad` instead of what you actually wrote, `w -= learning_rate * w.grad`, make the loop fail because the w.grad attribute is None afterwards? And why does `w.data = w.data - learning_rate * w.grad`, on the contrary, work? Thank you very much
@joymaurya36589 ай бұрын
I have same doubt
@gonzalopolo26129 ай бұрын
@@joymaurya3658 I understood later why this happens. Simply: `w -= learning_rate * w.grad` is doing an in-place modification of the tensor w, while `w = w - learning_rate * w.grad` is actually **creating a new tensor** (not using the in-place version of the operation) and assigning it to the name `w`, **breaking the link to the original tensor**. This means the `grad` attribute of the original `w` is not automatically transferred to the new tensor, so the new tensor no longer has a valid gradient and the loop fails after the first iteration. More details: the new `w` is no longer a **leaf tensor**, so PyTorch does not populate its `w.grad` attribute by default. Hope this helps.
@joymaurya36589 ай бұрын
@@gonzalopolo2612 thanks
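A minimal sketch of the difference described above (the numbers are arbitrary; the point is only how the two update styles behave):

    import torch

    w = torch.tensor(1.0, requires_grad=True)
    loss = (w * 2 - 4) ** 2
    loss.backward()                # populates w.grad

    with torch.no_grad():
        w -= 0.01 * w.grad         # in-place: w stays the same leaf tensor, w.grad survives
        # w = w - 0.01 * w.grad    # would rebind the name to a NEW non-leaf tensor,
        #                          # whose .grad is None, so the next iteration breaks

    w.grad.zero_()                 # reset the accumulated gradient for the next step
    print(w, w.grad)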
@gheorghemihai-mircea65003 ай бұрын
You are a very, very good teacher! Thanks!
@aboubekrlawan15014 жыл бұрын
A wonderful tutorial! Please keep sharing with us.
@patloeber4 жыл бұрын
Thank you! I will
@schmijo3 жыл бұрын
This series has helped me a lot. Thank you
@patloeber3 жыл бұрын
glad to hear this
@aishwaryalakshmisrinivasan45853 жыл бұрын
One of the best explanations ever!!
@patloeber3 жыл бұрын
thanks a lot!
@BARaaz04 Жыл бұрын
Thanks for your effort. One of the best lectures on pytorch.
@abdulkafiyahia4023 жыл бұрын
You are really the best at explaining ML. Thanks a lot!
@patloeber3 жыл бұрын
Thanks! Glad you like it!
@vicentino_twelve3 жыл бұрын
This tutorial helps me a lot! I'm working on a scientific project as an undergraduate. Greetings from Brazil!
@patloeber3 жыл бұрын
glad it's helpful :)
@davidaliaga47082 жыл бұрын
Why the mean in the gradient (numpy)? np.dot already gives a scalar, so the mean of a scalar is the same scalar...
@squirrelpatrick36703 жыл бұрын
yes Python Engineer. Watching your tutorials, I'm like, this is easy, I knew it all along. Except that wasn't the case yesterday. I was honestly terrified. Looking forward to finishing the course, thank you
@patloeber3 жыл бұрын
Haha, hope you will enjoy the course!
@ericohana96502 жыл бұрын
Thanks for the video! Just a question: Why did you take the mean() of the dot product for the gradient (step 1)? The result is already a scalar so the mean() is the same value isn't it? Apologies ... just seen the comment below
@ahmedidris3054 жыл бұрын
Thank you for these awesome tutorials, they are practical and crystal clear
@patloeber4 жыл бұрын
thanks!
@daliborgrudenic9966 Жыл бұрын
This series is a pleasure to watch.
@BrianPondiGeoGeek2 ай бұрын
Great explanation
@varungupta01 Жыл бұрын
With the NumPy implementation, changing X and Y to other numbers like X = 2, 8, 16, 48 and Y = 4, 16, 32, 96, the loss won't converge at all; in fact it's increasing. Any insights on this? (The code is correct, because using 1, 2, 3, 4 works as expected.)
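One possible explanation, sketched under the assumption that y = 2x exactly and that the gradient is the mean-normalized 2 * mean(x * (w*x - y)): after each update w -= lr * grad, the error (w - 2) is multiplied by 1 - 2 * lr * mean(x**2), and with inputs as large as 48 that factor is far outside (-1, 1) for lr = 0.01, so the loss grows every step. Lowering the learning rate should restore convergence:

    import numpy as np

    X = np.array([2, 8, 16, 48], dtype=np.float32)
    # the update contracts the error only if this factor is below 1 in absolute value
    for lr in (0.01, 0.001):
        factor = abs(1 - 2 * lr * np.mean(X ** 2))
        print(lr, factor, 'diverges' if factor >= 1 else 'converges')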
@musicalworld22074 жыл бұрын
Hi! Thank you for the tutorials. They are extremely good and clear. I have a doubt regarding this step:

with torch.no_grad():
    w = w - learning_rate * w.grad

You mention that we do not want it to be part of the computational graph, but we do need the gradient of w, right? Why is it necessary to disable autograd? If we don't disable it, does that mean it will change the value of w in addition to our change, i.e. a change we have no control over? Thank you once again.
@patloeber4 жыл бұрын
We already calculated the gradient by calling backward() in the training loop, but the actual weight update does not have to be part of the computational graph... If we don't disable it, then yes, this update operation might be tracked and then affect the next backward() call.
@musicalworld22074 жыл бұрын
@@patloeber Thank you so much for your reply!
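For reference, a compact sketch of the kind of autograd training loop being discussed here (roughly the pattern from the video; the data, epoch count and learning rate are just example values):

    import torch

    X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
    w = torch.tensor(0.0, requires_grad=True)
    lr = 0.01

    for epoch in range(100):
        y_pred = w * X                       # forward pass (tracked by autograd)
        loss = ((y_pred - Y) ** 2).mean()    # MSE
        loss.backward()                      # populates w.grad = dloss/dw

        with torch.no_grad():                # the weight update itself must not be tracked
            w -= lr * w.grad

        w.grad.zero_()                       # clear the accumulated gradient

    print(w.item())  # should approach 2.0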
@ЭнесЭсветКузуджу3 жыл бұрын
man this was awesome thank you
@zainthemaynnn3 жыл бұрын
thanks for showing it without torch. it was a little bit difficult to understand what torch did behind the scenes, this video made it click.
@ShahabShokouhi9 ай бұрын
Amazing job Patrick, thank you so much
@santoshumeshshet10834 жыл бұрын
DAM! after 2 years the concept is now clear.
@patloeber4 жыл бұрын
that's nice to hear!
@430matthew64 жыл бұрын
Almost all Chinese-language PyTorch course content is stuck at PyTorch v0.4. When I found this channel, I realized your course is the best PyTorch course. Even though my English is bad, it doesn't affect my learning because your visual teaching is so vivid. Could you post a tutorial about the attention mechanism? Thanks a lot!!!!
@InfoArduino3 жыл бұрын
Hello, great work! In the linear regression example with numpy I think there is something wrong:

def gradient(x, y, y_pred):
    return np.dot(2 * x, y_pred - y).mean()

I think the mean() call has no effect, because these print statements give the same values:

def gradient(x, y, y_pred):
    print("Gradient")
    print(np.dot(2 * x, y_pred - y))
    print(np.dot(2 * x, y_pred - y).mean())
    return np.dot(2 * x, y_pred - y).mean()

(Python 3.9, numpy 1.19.4)
@tesfalemhaile84273 жыл бұрын
Yes, in this case the mean function has no effect.
@camus65254 жыл бұрын
Thanks a lot!!! A video showing the gradient ASCENT technique would be great, especially with policy gradients in deep reinforcement learning!!!
@patloeber4 жыл бұрын
could be something for the future..one of my upcoming videos is about Reinforcement learning :)
@hanzack9184 жыл бұрын
This is the best tutorial I've ever seen!
@patloeber4 жыл бұрын
Thank you !
@akashverma42804 жыл бұрын
We mean it, bro! you are the best!!!
@lorryzou9367 Жыл бұрын
Why do we update the weight in 'with torch.no_grad()' ?
@UserLS964 жыл бұрын
Excellent video, I really enjoyed. Greetings from Mexico
@patloeber4 жыл бұрын
Thanks! Greetings back from Germany
@DgibrillyMutabazi2 ай бұрын
If the manual computation of the weight update converges in fewer iterations, why should PyTorch backpropagation be used at all? It went from 20 to more than 61 iterations before the correct weight was found.
@taker92463 жыл бұрын
Bro you're the best! Deserved sub !
@patloeber3 жыл бұрын
thanks so much!
@NikhilCherianKurian3 жыл бұрын
Hi, liked your videos. Btw, which editor/IDE are you using?
@patloeber3 жыл бұрын
Thanks! VS Code. I have a tutorial about my editor setup on this channel
@scottmorrison78284 жыл бұрын
Outstanding!
@patloeber4 жыл бұрын
Thanks!
@shrawansahu95002 жыл бұрын
Great Work 😊
@amankushwaha89272 жыл бұрын
Thanks a lot. It helps
@xiquandong11834 жыл бұрын
Thanks a lot. I wonder why it has so few views. Keep it up!!! Subscribed.
@patloeber4 жыл бұрын
Thank you :)
@pathikghugare99183 жыл бұрын
Hey, instead of w -= ... I tried w = w - ..., but then I got this error:

w.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'

When I use w -= ... it works fine. Why is that?
@TuanTran-mk9jf9 ай бұрын
Thank you so much
@chungweinlee Жыл бұрын
Hi, thank you for the tutorials. The tutorial 5 video is out of focus.
@tanmaykulkarni60464 ай бұрын
You are the best 💯💯💯
@Ali-dbds4 жыл бұрын
Superb tutorial! Thank you very much. I have a question: when I compute the gradients manually (with NumPy), the calculation is much faster than with the Torch functions. I want to know why that is. It should be noted that on my laptop torch runs on the CPU.
@patloeber4 жыл бұрын
You mean the manual computation is faster? That can easily be true because here you only have to calculate the formula in one operation. But the backward pass is far more computationally expensive...
@BhanudaySharma5063 жыл бұрын
I tried defining the gradient in two ways:

def gradient(y, y_predicted, x):
    # return (2*(y_predicted - y)*x).mean()
    return torch.dot(2*x, y_predicted - y).mean()

Both converge to the correct solution, but the second (uncommented) line converges much, much faster than the first (commented) one. Could you explain why?
@patloeber3 жыл бұрын
I think the first equation is only correct for 1-d tensors, so it will produce incorrect values otherwise
@Forka1373 жыл бұрын
It's because the second one is not actually calculating the mean: .dot() returns only one value (a scalar), so .mean() gives the mean of a single value (it divides it by one). By doing this you are using a gradient 4 times bigger than in the first one, which is the same as using a bigger learning rate.
@helenadati63633 жыл бұрын
Perfect ! Thank you so much !
@patloeber3 жыл бұрын
glad you like it!
@nibelungueros Жыл бұрын
I just don't quite understand why, in the gradient function we create at first, w does not appear anywhere in the np.dot call. If the derivative of J is 1/N * 2x * (wx - y), then w is still in the formula.
@paulaperdomo79213 жыл бұрын
Hello, great video! I implemented it on my own and it works! But I was just wondering: if we want to fit something like y = x^2, then we would no longer be able to use linear regression algorithms, right?
@8462anto3 жыл бұрын
Yes you could still use linear regression, but it will not be a perfect fit, no matter how long you train. It would be similar to approximating y=x^2 with a first order Taylor polynomial
@vl44162 жыл бұрын
@@8462anto Hello! And how do you solve such a task (x^2) using the neural network? Adding a hidden layer with an arbitrary number of neurons?
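A minimal sketch of that idea (an assumption for illustration, not code from the video): one hidden layer with a nonlinearity can fit y = x^2 on a bounded range, which a purely linear model cannot:

    import torch
    import torch.nn as nn

    x = torch.linspace(-2, 2, 200).unsqueeze(1)   # toy data: y = x^2 on [-2, 2]
    y = x ** 2

    model = nn.Sequential(
        nn.Linear(1, 32),   # hidden layer
        nn.Tanh(),          # the nonlinearity is what makes x^2 learnable
        nn.Linear(32, 1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2000):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    print(loss.item())  # should end up small, e.g. around 1e-3 or less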
@xflory26x Жыл бұрын
Not sure if this is a really daft question, but when implementing autograd you replaced the gradient computation dw = gradient(X, Y, y_pred) with l.backward(), yet in the update you replaced the 'dw' in w -= learning_rate * dw with w.grad. Why is that?
@Cantordust02726 күн бұрын
Hey mate! l.backward() computes all the gradients involved in the model, whereas earlier dw = gradient(X, Y, y_pred) was calculating dl/dw directly. That's why, for the update, we changed w -= lr*dw to w -= lr*w.grad: l.backward() only computes the gradient, and w.grad is where dl/dw is stored. Hope it helps!
@gabrieleliuzzo7859Ай бұрын
thankyou😀
@omerfeyyazselcuk73253 жыл бұрын
awesome, thank you master.
@patloeber3 жыл бұрын
You are very welcome
@filosofiadetalhista Жыл бұрын
It would have been great if you had explained somewhere where this "computational graph" is stored. It takes viewers the longest time to realize that operations such as loss.backward() have effects hidden from view, namely on the computational graph (which is why it can change the value of w.grad).
@breakdancerQ4 жыл бұрын
That is one damn good video!
@patloeber4 жыл бұрын
Thank you!
@peter_.3 жыл бұрын
def gradient(x, y, y_predicted):
    return np.dot(2*x, y_predicted - y).mean()

In the gradient function above, the mean() call is unnecessary (it is the mean of a scalar). I think the result should instead be divided by the number of elements in the array.
@Forka1373 жыл бұрын
I had the same doubt. For this case you need to use return (2*x*(y_predicted - y)).mean(); then it does what it's supposed to do :)
@genetixx013 жыл бұрын
Freakin good tutorials. You should teach university professors how to teach properly.
@patloeber3 жыл бұрын
thanks!
@lakeguy656163 жыл бұрын
great video series!
@patloeber3 жыл бұрын
Thanks :)
@lakeguy656163 жыл бұрын
I'm getting this error, any ideas why? Thank you for helping me understand the error of my ways...

print(f'prediction before training: f(5) = {forward(5):.3f}')

TypeError: unsupported format string passed to numpy.ndarray.__format__
@peregudovoleg3 жыл бұрын
make sure you write everything in f" your code here ". You need an f before quotation marks.
@juleswombat53093 жыл бұрын
This is really great stuff. My problem is that I cannot get VS Code IntelliSense to respond the way it does for you. Python IntelliSense in VS Code is very slow, with suggestions appearing some 4 seconds after typing np.dot, and I do not get any suggestions for what goes inside the [ ] of a numpy array. So I cannot be as productive in VS Code as with compiled code like C#. PyCharm's IntelliSense is also very slow.
@patloeber3 жыл бұрын
hmm that's bad :( for me sometimes it's also slow in VS Code...
@ilkayand44 жыл бұрын
You are awesome!
@okaynext86923 жыл бұрын
Great stuff. Keep it up !!!
@patloeber3 жыл бұрын
Appreciate it!
@ravivarma57034 жыл бұрын
Hi, when I do the part below:

with torch.no_grad():
    w = w - learning_rate * w.grad  # after doing this, my w.grad becomes None. Why is that?

print(w.grad)  # ----> None

# so if w.grad is None we can't zero the gradients, right?
w.grad.zero_()

Can you elaborate on this please?
@patloeber4 жыл бұрын
Be careful: there is a difference between w = w - x and w -= x... In your case you assign w to a new variable and therefore it loses the gradient. In my calculation I used:

with torch.no_grad():
    w -= learning_rate * w.grad

As an alternative you can also use:

w.data = w.data - learning_rate * w.grad
@ravivarma57034 жыл бұрын
got it 🙂 Thanks a lot
@xtian_neuralx4 жыл бұрын
Hi, another alternative is to use w.sub_(w.grad * learning_rate) instead of w = w - learning_rate * w.grad.
@peregudovoleg3 жыл бұрын
@@patloeber thanks for this insight: "you assign w to a new variable and therefore it loses the gradient". I would never have guessed that could be the case. The two have always seemed interchangeable to me before.
@delphinemico42833 жыл бұрын
Great video with very nice explanations! One thing though: in your 05_gradients.numpy.py file, in your for loop, I am assuming you are looping over epochs, right? Perhaps you should change the variable name 'n_iters' to 'n_epochs' (or something along those lines); otherwise it is confusing when you say 'number of iterations equals 10', which can be confused with iterations in the machine learning sense (the maximum batch size for your example is 4, so technically you can't have more than 4 iterations per epoch).
@patloeber3 жыл бұрын
Yes I think you are right. I use n_epochs in later tutorials
@scharupa4 жыл бұрын
Best tutorial
@patloeber4 жыл бұрын
Thanks!
@Frostbyte-Game-Studio3 жыл бұрын
Thanks man, you explain this very well. Appreciate the GitHub code as well.
@patloeber3 жыл бұрын
Glad it helped!
@m159954 жыл бұрын
That is a neat explanation of the framework, great content and useful tips! However, I cannot seem to make it converge regardless of the learning rate; even if I make it < 1e-4 and run 100-1000 gradient update iterations it still oscillates. Doing it manually converges asymptotically as expected. Why would implementing it in the torch framework cause this?
@patloeber4 жыл бұрын
Maybe you need to play around with the learning_rate. Using backpropagation is not as precise as calculating the gradient manually, so there can be differences. Also make sure to empty your gradients after the update step; a lot of beginners forget to call w.grad.zero_().
@abinashsahu46613 жыл бұрын
First of all, thank you for your effort in putting together a great tutorial. When I use a set of numbers larger than, say, 7, the numpy implementation does not work. Try this:

X = np.array([1,2,3,4,5,6,7], dtype=np.float32)
Y = np.array([2,4,6,8,10,12,14], dtype=np.float32)

and the training loop throws an error. Any suggestions?
@patloeber3 жыл бұрын
what error do you get? it works for me with 7 numbers...try to compare with my code on github...
@abinashsahu46613 жыл бұрын
@@patloeber Hello. I used your code and changed n_iters to 100, with

X = np.array([1,2,3,4,5,6,7,8], dtype=np.float32)
Y = np.array([2,4,6,8,10,12,14,16], dtype=np.float32)

Everything else is the same as your code. I get a runtime warning with the loss increasing:

RuntimeWarning: overflow encountered in square
  return ((y_pred - y) ** 2).mean()
@PriyankaJain-dg8rm Жыл бұрын
Getting this error: TypeError: 'numpy.float32' object is not callable. How can I fix it?
@DanielWeikert4 жыл бұрын
From scratch - very nice. Thank you. What's in your pipeline for the next videos? Best regards
@patloeber4 жыл бұрын
Hi, the following is just a rough schedule of what I want to do. If you have any suggestions let me know!
- Training Pipeline
- Linear Regression
- Logistic Regression
- Custom Neural Net
- DataLoader
- CNN
- Tensorboard
@nicolasgabrielsantanaramos2914 жыл бұрын
Very good class!!! (edit)
@adsgfsgd4 жыл бұрын
didactic is not a positive word btw
@thatchipmunksings4 жыл бұрын
You are AWESOME!
@patloeber4 жыл бұрын
Thanks!
@mishaalnaeem61354 жыл бұрын
What editor are you using?
@patloeber4 жыл бұрын
VS Code
@하민박사4 жыл бұрын
How do you run your code in the 'Output' panel instead of the 'Terminal' panel?
@sudarshankoirala20724 жыл бұрын
You can do that by installing the Code Runner extension in VS Code.
@patloeber4 жыл бұрын
Exactly! I’m using this Extension
@prajganesh4 жыл бұрын
My calculus is rusty, but in line 23, don't we have to take the inner derivative as well, so that wx - y becomes x?
@patloeber4 жыл бұрын
You mean applying the chain rule? Yes, d(wx-y)/dw becomes x. That's what we are doing, and that's why we have 2*x in the first term now.
@prajganesh4 жыл бұрын
@@patloeber oh thanks.
@VeyselDeste-p4l9 ай бұрын
Why are the results of manual differentiation after 10 epochs better than those after 20 epochs with autograd? Does that make using autograd pointless?
@Cantordust02726 күн бұрын
The gradients autograd computes are exact too; the difference comes from the manual gradient being scaled differently (the missing division by N discussed in the comments above), which acts like a larger learning rate. If you increase the number of iterations, the autograd version reaches the same result as the manual one.
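For what it's worth, the gradient autograd produces matches the hand-derived formula exactly; a quick check (a sketch, assuming the example data from the video and an arbitrary w):

    import torch

    X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
    w = torch.tensor(0.5, requires_grad=True)

    loss = ((w * X - Y) ** 2).mean()
    loss.backward()

    analytic = (2 * X * (w.detach() * X - Y)).mean()   # dJ/dw derived by hand
    print(w.grad, analytic)                            # same value (up to float precision)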
@mahery_ranaivoson3 жыл бұрын
Just discovered that the syntax "w -=" and "w = w -" behave differently from the computation graph's point of view. Can anyone explain why?
@nitinkapoor44723 жыл бұрын
Actually, operations like "w -=" are called "in-place" operations: they operate without making a copy, and as per the PyTorch documentation such in-place operations are not allowed on tensors that have a "grad_fn". I hope that resolves the question.
@peregudovoleg2 жыл бұрын
By the way, there is going to be an error if we update with w = w - lr * w.grad (talking about the torch part here), while it's fine with numpy. Must be what Nitin Kapoor said.
@МихаилПоликарпов-ф4м4 жыл бұрын
I typed the code along with you up to 16:23 in your video. Spyder gives an error at w.grad.zero_(): AttributeError: 'NoneType' object has no attribute 'zero_'. Why?
@patloeber4 жыл бұрын
Be careful whether you use w -= ... or w = w - ... when updating the weights. You can compare with the code on GitHub.
@tobi96683 жыл бұрын
Why do you use the dot product?
@tobi96683 жыл бұрын
My first thought was to use matmul
@我想學英文2 жыл бұрын
5:25 dJ jet function
@ahmadsystems35603 жыл бұрын
Dear, when I use this line:

w -= learning_rate * w.grad

it shows the error: TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'. Please give a solution.
@skymanaditya3 жыл бұрын
did you set requires_grad=True?
@patloeber3 жыл бұрын
yep that might be the issue
@MpTSprocket4 жыл бұрын
Your videos are top!!!! If I use:

with torch.no_grad():
    w = w - w.grad * learning_rate

instead of:

with torch.no_grad():
    w -= w.grad * learning_rate

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
     45
     46 # zero gradients, set all gradients back to zero, otherwise they would accumulate after each iteration
---> 47 w.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'
---------------------------------------------------------------------------

What is the reason for this?