This is the best ML tutorial series I've seen so far. You're explaining the underlying concepts so well. This is the first time that I have such a concrete understanding of what goes on during backpropagation and gradient descent while also learning about the relevant programming framework. It is awesome how you implemented everything manually and are then switching step by step to PyTorch. Now everything from the previous videos starts to really come together. Just wanted to say that I'm very thankful and greatly appreciate your efforts! You gained a subscriber. Greetings from a fellow German!
@patloeber3 жыл бұрын
thank you so much!
@harissajwani25834 жыл бұрын
First deep learning tutorial where I don't feel sleepy. Good work, brother.
@patloeber4 жыл бұрын
Thanks!
@mawkuri54963 жыл бұрын
me too
@MohamedShatarah-l7t Жыл бұрын
Dude, oh my god, you are so amazing. All this complex stuff I now understand because of your brilliant teaching. You use examples that are easy to understand and explain them in detail. Oh god, I can only imagine the effort you put into this. Thank you so much.
@lucyfrye6723 Жыл бұрын
Just in case people are a little confused: the two factors he is really dotting for dloss/dw are x (which is dy_predicted/dw) and dloss/dy_predicted, which is 2 * (y_predicted - y) / N, where N = x.size == y_predicted.size. Also, by convention y_predicted is often simply called 'a' (of the top layer; here we only have one layer). I hope this helps someone. Nice video, it's a good idea to do the gradient manually so people have at least a rough idea of what goes on under the hood of a neural network. Things get a lot trickier with more (hidden) layers, though!
@MrThezyga Жыл бұрын
thanks, I actually didn't catch that while watching
@kurtulusbulus67283 жыл бұрын
One of the best PyTorch tutorials on the internet. Great work my friend! Keep sharing please!
@patloeber3 жыл бұрын
glad to hear that :)
@theusualcouple10 ай бұрын
I have built many linear regression models, but this is the first time I have actually coded linear regression myself. I am so happy today. Thanks @patloeber
@hector15024 жыл бұрын
Thank you for your videos. I think there is a mistake in the gradient calculation in the numpy example. It is pointless to compute the mean of a scalar (the output of np.dot(2*x, y_predicted - y) is a scalar); you should simply divide the result by x.size. So the function should be: return np.dot(2*x, y_pred - y) / x.size. I think that is the reason why your results are different from PyTorch's.
@patloeber4 жыл бұрын
You are correct! Great catch, and thank you for detecting this :) I will fix this in my GitHub repo. In this case the error was not too bad; it only resulted in a different scaling of the gradient, so gradient descent still worked.
@patloeber4 жыл бұрын
I gave this some more thought. It's also possible to avoid the dot product and write: return (2*x * (y_pred - y)).mean(). The result should be the same as your solution.
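A quick numeric check of the forms discussed in this thread (a minimal sketch; it assumes the example data from the video, X = [1, 2, 3, 4], Y = [2, 4, 6, 8], and w = 1.0 is just an arbitrary test value):

    import numpy as np

    X = np.array([1, 2, 3, 4], dtype=np.float32)
    Y = np.array([2, 4, 6, 8], dtype=np.float32)
    w = 1.0
    y_pred = w * X

    # original version: .mean() of a scalar, i.e. no division by N
    g_dot   = np.dot(2 * X, y_pred - Y).mean()
    # corrected versions: divide by the number of samples
    g_fixed = np.dot(2 * X, y_pred - Y) / X.size
    g_mean  = (2 * X * (y_pred - Y)).mean()

    print(g_dot, g_fixed, g_mean)  # g_fixed == g_mean, while g_dot is X.size times larger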
@xinqiaozhao51544 жыл бұрын
@@patloeber I have a question about your reply. In this case x and y are both vectors; if I do an element-wise chain rule calculation, I get that the gradient equals 2/N * (w*np.dot(x, x) - np.dot(x, y)). You can try this and compare the result with torch.nn.MSELoss; the results are the same.
@patloeber4 жыл бұрын
The calculation I have in the code comments is:

# J = MSE = 1/N * (w*x - y)**2
# dJ/dw = 1/N * 2x * (w*x - y)

If we move x inside the parentheses, it becomes 1/N * 2 * (w*x*x - y*x), which is your formula. So yes, it should be the same.
@ericohana96502 жыл бұрын
@@patloeber It is the clearest solution imo
@taishi5732 жыл бұрын
Just saw your videos and I'm now binge watching them. Explanations are so easy to understand. You are the best!
@jyotipch2 жыл бұрын
For those who are struggling with line 21 like me: expand 1/N*(wx-y)**2, which becomes 1/N*((wx)**2 + y**2 - 2wxy). The derivative of that w.r.t. w is 1/N*(2wx**2 - 2xy) => 1/N * 2x(wx-y).
@Coconut7403 Жыл бұрын
He uses the chain rule: F'(x)=f'(g(x))*g'(x)
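For anyone who prefers to verify the derivative symbolically rather than by hand, a small SymPy sketch of the per-sample term (before the 1/N averaging):

    import sympy as sp

    w, x, y = sp.symbols('w x y')
    J = (w * x - y) ** 2            # per-sample squared error
    dJ_dw = sp.diff(J, w)           # chain rule: 2 * (w*x - y) * x
    print(sp.factor(dJ_dw))         # -> 2*x*(w*x - y)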
@andersondacostaferreira31193 жыл бұрын
I have to study AI for my actual astrophysics project and you are helping me a lot. Thank you so much!
@TheOraware Жыл бұрын
For the gradient calculation you are using a dot product, which yields a single scalar gradient for all samples combined; taking the mean of that scalar returns the same value. The scalar should instead be divided by len(x):

def gradient(x, y, y_predicted):
    N = len(x)
    return np.dot(2 * x, y_predicted - y) / N
@tiagoleviski3 жыл бұрын
The way you first explained it in numpy and then changed to torch was fantastic. I finally understood the training loop in torch thanks to you!
@patloeber3 жыл бұрын
great to hear :)
@dingusagar2 жыл бұрын
wow, now i finally get a picture of what is happening behind the scenes of autograd. great stuff
@abdelrahmanhammad10203 жыл бұрын
Thanks for the great tutorial. I believe both solutions should converge similarly; the difference in the number of iterations was due to the bug in the gradient calculation discussed earlier in the comments. I tried both solutions, and they converge in 411 steps using the same initialization and learning rate. Code below:

_____ Common Code: _____

import torch

x = torch.tensor(1)
y = torch.tensor(2)
lr = .01

def forward(x):
    return w * x

_____ Manual Gradient Code: _____

def gradient(x, y, y_hat):
    return 2 * x * (y_hat - y)

w = torch.tensor(0., requires_grad=True)
n_iter = 500
for epoch in range(n_iter):
    y_hat = forward(x)
    l = (y_hat - y) ** 2   # loss, needed for the print below
    grad = gradient(x, y, y_hat)
    print(f'Iteration = {epoch}, weight = {w:0.3f}, loss = {l:0.3f}, gradient = {grad:0.3f}, f(5) = {forward(5)}')
    with torch.no_grad():
        w -= lr * grad

_____ Autograd Code: _____

def loss(y, y_hat):
    return (y_hat - y) ** 2

w = torch.tensor(0., requires_grad=True)
n_iter = 500
for epoch in range(n_iter):
    y_hat = forward(x)
    l = loss(y, y_hat)
    l.backward()
    print(f'Iteration = {epoch}, weight = {w:0.3f}, loss = {l:0.3f}, gradient = {w.grad:0.3f}, f(5) = {forward(5)}')
    with torch.no_grad():
        w -= lr * w.grad
    w.grad.zero_()
@gonzalopolo2612 Жыл бұрын
Really great tutorial @Patrick Loeber. I have a question that I cannot find an answer to anywhere. In your loop, inside the torch.no_grad() context, why does writing `w = w - learning_rate * w.grad` instead of what you actually wrote, `w -= learning_rate * w.grad`, make the loop fail because the w.grad attribute is None afterwards? And why does `w.data = w.data - learning_rate * w.grad`, on the contrary, work? Thank you very much
@joymaurya36589 ай бұрын
I have same doubt
@gonzalopolo26129 ай бұрын
@@joymaurya3658 I understood later why this happens. Simply: `w -= learning_rate * w.grad` is doing an in-place modification of the tensor w, while `w = w - learning_rate * w.grad` is actually **creating a new tensor** (not using the in-place version of the operation) and assigning it to the name `w`, **breaking the link to the original tensor**. This means the `grad` attribute of the original `w` is not automatically transferred to the new tensor, so the new tensor no longer has a valid gradient and the loop fails after the first iteration. More details: the new `w` is no longer a **leaf tensor**, so PyTorch does not populate its `w.grad` attribute by default. Hope this helps.
@joymaurya36589 ай бұрын
@@gonzalopolo2612 thanks
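A minimal sketch of the difference described above (the numbers are arbitrary; the point is only how the two update styles behave):

    import torch

    w = torch.tensor(1.0, requires_grad=True)
    loss = (w * 2 - 4) ** 2
    loss.backward()                # populates w.grad

    with torch.no_grad():
        w -= 0.01 * w.grad         # in-place: w stays the same leaf tensor, w.grad survives
        # w = w - 0.01 * w.grad    # would rebind the name to a NEW non-leaf tensor,
        #                          # whose .grad is None, so the next iteration breaks

    w.grad.zero_()                 # reset the accumulated gradient for the next step
    print(w, w.grad)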
@gheorghemihai-mircea65003 ай бұрын
You are a very, very good teacher! Thanks!
@aboubekrlawan15014 жыл бұрын
A wonderful tutorial! Please keep sharing with us.
@patloeber4 жыл бұрын
Thank you! I will
@schmijo3 жыл бұрын
This series has helped me a lot. Thank you
@patloeber3 жыл бұрын
glad to hear this
@aishwaryalakshmisrinivasan45853 жыл бұрын
One of the best explanations ever!!
@patloeber3 жыл бұрын
thanks a lot!
@BARaaz04 Жыл бұрын
Thanks for your effort. One of the best lectures on pytorch.
@abdulkafiyahia4023 жыл бұрын
You are really the best at explaining ML. Thanks a lot!
@patloeber3 жыл бұрын
Thanks! Glad you like it!
@vicentino_twelve3 жыл бұрын
This tutorial helps me a lot! I'm working on a scientific project as an undergraduate. Greetings from Brazil!
@patloeber3 жыл бұрын
glad it's helpful :)
@davidaliaga47082 жыл бұрын
Why the mean in the gradient (numpy)? np.dot already gives a scalar, so the mean of a scalar is the same scalar...
@squirrelpatrick36703 жыл бұрын
yes Python Engineer. Watching your tutorials, I'm like, this is easy, I knew it all along. Except that wasn't the case yesterday. I was honestly terrified. Looking forward to finishing the course, thank you
@patloeber3 жыл бұрын
Haha, hope you will enjoy the course!
@ericohana96502 жыл бұрын
Thanks for the video! Just a question: Why did you take the mean() of the dot product for the gradient (step 1)? The result is already a scalar so the mean() is the same value isn't it? Apologies ... just seen the comment below
@ahmedidris3054 жыл бұрын
Thank you for these awesome tutorials, they are practical and crystal clear
@patloeber4 жыл бұрын
thanks!
@daliborgrudenic9966 Жыл бұрын
This series is a pleasure to watch.
@BrianPondiGeoGeek2 ай бұрын
Great explanation
@varungupta01 Жыл бұрын
With the NumPy implementation, changing X and Y to other numbers like X = 2, 8, 16, 48 and Y = 4, 16, 32, 96, the loss won't converge at all; in fact it's increasing. Any insights on this? (The code is correct, because using 1, 2, 3, 4 works as expected.)
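One possible explanation, sketched under the assumption that y = 2x exactly and that the gradient is the mean-normalized 2 * mean(x * (w*x - y)): after each update w -= lr * grad, the error (w - 2) is multiplied by 1 - 2 * lr * mean(x**2), and with inputs as large as 48 that factor is far outside (-1, 1) for lr = 0.01, so the loss grows every step. Lowering the learning rate should restore convergence:

    import numpy as np

    X = np.array([2, 8, 16, 48], dtype=np.float32)
    # the update contracts the error only if this factor is below 1 in absolute value
    for lr in (0.01, 0.001):
        factor = abs(1 - 2 * lr * np.mean(X ** 2))
        print(lr, factor, 'diverges' if factor >= 1 else 'converges')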
@musicalworld22074 жыл бұрын
Hi! Thank you for the tutorials. They are extremely good and clear. I have a doubt regarding this step:

with torch.no_grad():
    w = w - learning_rate * w.grad

You mention that we do not want it to be part of the computational graph, but we do need the gradient of w, right? Why is it necessary to disable autograd? If we don't disable it, does that mean it will change the value of w in addition to our change, i.e. a change we have no control over? Thank you once again.
@patloeber4 жыл бұрын
We already calculated the gradient by calling backward() in the training loop, but the actual weight update does not have to be part of the computational graph... If we don't disable it, then yes, this update operation might be tracked and then affect the next backward() call.
@musicalworld22074 жыл бұрын
@@patloeber Thank you so much for your reply!
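For reference, a compact sketch of the kind of autograd training loop being discussed here (roughly the pattern from the video; the data, epoch count and learning rate are just example values):

    import torch

    X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
    w = torch.tensor(0.0, requires_grad=True)
    lr = 0.01

    for epoch in range(100):
        y_pred = w * X                       # forward pass (tracked by autograd)
        loss = ((y_pred - Y) ** 2).mean()    # MSE
        loss.backward()                      # populates w.grad = dloss/dw

        with torch.no_grad():                # the weight update itself must not be tracked
            w -= lr * w.grad

        w.grad.zero_()                       # clear the accumulated gradient

    print(w.item())  # should approach 2.0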
@ЭнесЭсветКузуджу3 жыл бұрын
man this was awesome thank you
@zainthemaynnn3 жыл бұрын
thanks for showing it without torch. it was a little bit difficult to understand what torch did behind the scenes, this video made it click.
@ShahabShokouhi9 ай бұрын
Amazing job Patrick, thank you so much
@santoshumeshshet10834 жыл бұрын
DAM! after 2 years the concept is now clear.
@patloeber4 жыл бұрын
that's nice to hear!
@430matthew64 жыл бұрын
Almost all Chinese-language PyTorch course content is stuck at PyTorch v0.4. When I found this channel, I realized your course is the best PyTorch course. Even though my English is bad, it doesn't affect my learning because your visual teaching is so vivid. Could you post a tutorial about the attention mechanism? Thanks a lot!!!!
@InfoArduino3 жыл бұрын
Hello, great work! In the linear regression example with numpy I think there is something wrong:

def gradient(x, y, y_pred):
    return np.dot(2 * x, y_pred - y).mean()

I think the mean() call has no effect, because these print statements give the same values:

def gradient(x, y, y_pred):
    print("Gradient")
    print(np.dot(2 * x, y_pred - y))
    print(np.dot(2 * x, y_pred - y).mean())
    return np.dot(2 * x, y_pred - y).mean()

(Python 3.9, numpy 1.19.4)
@tesfalemhaile84273 жыл бұрын
Yes, in this case the mean function has no effect.
@camus65254 жыл бұрын
Thanks a lot!!! A video showing the gradient ASCENT technique would be great, especially with policy gradients in deep reinforcement learning!!!
@patloeber4 жыл бұрын
could be something for the future..one of my upcoming videos is about Reinforcement learning :)
@hanzack9184 жыл бұрын
This is the best tutorial I've ever seen!
@patloeber4 жыл бұрын
Thank you !
@akashverma42804 жыл бұрын
We mean it, bro! you are the best!!!
@lorryzou9367 Жыл бұрын
Why do we update the weight in 'with torch.no_grad()' ?
@UserLS964 жыл бұрын
Excellent video, I really enjoyed. Greetings from Mexico
@patloeber4 жыл бұрын
Thanks! Greetings back from Germany
@DgibrillyMutabazi2 ай бұрын
If the manual computation of the weight update converges in fewer iterations, why should PyTorch backpropagation be used at all? It went from 20 to more than 61 iterations before the correct weight was found.
@taker92463 жыл бұрын
Bro you're the best! Deserved sub !
@patloeber3 жыл бұрын
thanks so much!
@NikhilCherianKurian3 жыл бұрын
Hi, liked your videos. Btw, which editor/IDE are you using?
@patloeber3 жыл бұрын
Thanks! VS Code. I have a tutorial about my editor setup on this channel
@scottmorrison78284 жыл бұрын
Outstanding!
@patloeber4 жыл бұрын
Thanks!
@shrawansahu95002 жыл бұрын
Great Work 😊
@amankushwaha89272 жыл бұрын
Thanks a lot. It helps
@xiquandong11834 жыл бұрын
Thanks a lot. I wonder why it has so few views. Keep it up!!! Subscribed.
@patloeber4 жыл бұрын
Thank you :)
@pathikghugare99183 жыл бұрын
Hey, instead of w -= ... I tried w = w - ..., but then I got this error:

w.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'

When I use w -= ... it works fine. Why is that?
@TuanTran-mk9jf9 ай бұрын
Thank you so much
@chungweinlee Жыл бұрын
Hi, thank you for the tutorials. The tutorial 5 video is out of focus.
@tanmaykulkarni60464 ай бұрын
You are the best 💯💯💯
@Ali-dbds4 жыл бұрын
Superb tutorial! Thank you very much. I have a question: when I compute the gradients manually (with NumPy), the calculation is much faster than with the Torch functions. I want to know why that is. It should be noted that on my laptop torch runs on the CPU.
@patloeber4 жыл бұрын
You mean the manual computation is faster? That can easily be true because here you only have to calculate the formula in one operation. But the backward pass is far more computationally expensive...
@BhanudaySharma5063 жыл бұрын
I tried defining the gradient in two ways:

def gradient(y, y_predicted, x):
    # return (2*(y_predicted - y)*x).mean()
    return torch.dot(2*x, y_predicted - y).mean()

Both converge to the correct solution, but the second (uncommented) line converges much, much faster than the first (commented) one. Could you explain why?
@patloeber3 жыл бұрын
I think the first equation is only correct for 1-d tensors, so it will produce incorrect values otherwise
@Forka1373 жыл бұрын
It's because the second one is not actually calculating the mean: .dot() returns only one value (a scalar), so .mean() gives the mean of a single value (it divides it by one). By doing this you are using a gradient 4 times bigger than in the first one, which is the same as using a bigger learning rate.
@helenadati63633 жыл бұрын
Perfect ! Thank you so much !
@patloeber3 жыл бұрын
glad you like it!
@nibelungueros Жыл бұрын
I just don't quite understand why, in the gradient function we create at first, w does not appear anywhere in the np.dot call. If the derivative of J is 1/N * 2x * (wx - y), then w is still in the formula.
@paulaperdomo79213 жыл бұрын
Hello, great video! I implemented it on my own and it works! But I was just wondering: if we want to fit something like y = x^2, then we would no longer be able to use linear regression algorithms, right?
@8462anto3 жыл бұрын
Yes you could still use linear regression, but it will not be a perfect fit, no matter how long you train. It would be similar to approximating y=x^2 with a first order Taylor polynomial
@vl44162 жыл бұрын
@@8462anto Hello! And how do you solve such a task (x^2) using the neural network? Adding a hidden layer with an arbitrary number of neurons?
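A minimal sketch of that idea (an assumption for illustration, not code from the video): one hidden layer with a nonlinearity can fit y = x^2 on a bounded range, which a purely linear model cannot:

    import torch
    import torch.nn as nn

    x = torch.linspace(-2, 2, 200).unsqueeze(1)   # toy data: y = x^2 on [-2, 2]
    y = x ** 2

    model = nn.Sequential(
        nn.Linear(1, 32),   # hidden layer
        nn.Tanh(),          # the nonlinearity is what makes x^2 learnable
        nn.Linear(32, 1),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2000):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

    print(loss.item())  # should end up small, e.g. around 1e-3 or less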
@xflory26x Жыл бұрын
Not sure if this is a really daft question, but when implementing autograd you replaced the gradient computation dw = gradient(X, Y, y_pred) with l.backward(), yet in the update you replaced the 'dw' in w -= learning_rate * dw with w.grad. Why is that?
@Cantordust02726 күн бұрын
Hey mate! l.backward() computes all the gradients involved in the model, whereas earlier dw = gradient(X, Y, y_pred) was calculating dl/dw directly. That's why, for the update, we changed w -= lr*dw to w -= lr*w.grad: l.backward() only computes the gradient, and w.grad is where dl/dw is stored. Hope it helps!
@gabrieleliuzzo7859Ай бұрын
thankyou😀
@omerfeyyazselcuk73253 жыл бұрын
awesome, thank you master.
@patloeber3 жыл бұрын
You are very welcome
@filosofiadetalhista Жыл бұрын
It would have been great if you had explained somewhere where this "computational graph" is stored. It takes viewers the longest time to realize that operations such as loss.backward() have effects hidden from view, namely on the computational graph (which is why it can change the value of w.grad).
@breakdancerQ4 жыл бұрын
That is one damn good video!
@patloeber4 жыл бұрын
Thank you!
@peter_.3 жыл бұрын
def gradient(x, y, y_predicted):
    return np.dot(2*x, y_predicted - y).mean()

In the gradient function above, the mean() call is unnecessary (it is the mean of a scalar). I think the result should instead be divided by the number of elements in the array.
@Forka1373 жыл бұрын
I had the same doubt. For this case you need to use return (2*x*(y_predicted - y)).mean(); then it does what it's supposed to do :)
@genetixx013 жыл бұрын
Freakin good tutorials. You should teach university professors how to teach properly.
@patloeber3 жыл бұрын
thanks!
@lakeguy656163 жыл бұрын
great video series!
@patloeber3 жыл бұрын
Thanks :)
@lakeguy656163 жыл бұрын
I'm getting this error, any ideas why? Thank you for helping me understand the error of my ways...

print(f'prediction before training: f(5) = {forward(5):.3f}')

TypeError: unsupported format string passed to numpy.ndarray.__format__
@peregudovoleg3 жыл бұрын
make sure you write everything in f" your code here ". You need an f before quotation marks.
@juleswombat53093 жыл бұрын
This is really great stuff. My problem is that I cannot get VS Code IntelliSense to respond the way it does for you. Python IntelliSense in VS Code is very slow, with suggestions appearing some 4 seconds after typing np.dot, and I do not get any suggestions for what goes inside the [ ] of a numpy array. So I cannot be as productive in VS Code as with compiled code like C#. PyCharm's IntelliSense is also very slow.
@patloeber3 жыл бұрын
hmm that's bad :( for me sometimes it's also slow in VS Code...
@ilkayand44 жыл бұрын
You are awesome!
@okaynext86923 жыл бұрын
Great stuff. Keep it up !!!
@patloeber3 жыл бұрын
Appreciate it!
@ravivarma57034 жыл бұрын
Hi, when I do the part below:

with torch.no_grad():
    w = w - learning_rate * w.grad  # after doing this, my w.grad becomes None. Why is that?

print(w.grad)  # ----> None

# so if w.grad is None we can't zero the gradients, right?
w.grad.zero_()

Can you elaborate on this please?
@patloeber4 жыл бұрын
Be careful: there is a difference between w = w - x and w -= x... In your case you assign w to a new variable and therefore it loses the gradient. In my calculation I used:

with torch.no_grad():
    w -= learning_rate * w.grad

As an alternative you can also use:

w.data = w.data - learning_rate * w.grad
@ravivarma57034 жыл бұрын
got it 🙂 Thanks a lot
@xtian_neuralx4 жыл бұрын
Hi, another alternative is to use w.sub_(w.grad * learning_rate) instead of w = w - learning_rate * w.grad.
@peregudovoleg3 жыл бұрын
@@patloeber thanks for this insight: "you assign w to a new variable and therefore it loses the gradient". I would never have guessed that could be the case. The two have always seemed interchangeable to me before.
@delphinemico42833 жыл бұрын
Great video with very nice explanations! One thing though: in your 05_gradients.numpy.py file, in your for loop, I am assuming you are looping over epochs, right? Perhaps you should change the variable name 'n_iters' to 'n_epochs' (or something along those lines); otherwise it is confusing when you say 'number of iterations equals 10', which can be confused with iterations in the machine learning sense (the maximum batch size for your example is 4, so technically you can't have more than 4 iterations per epoch).
@patloeber3 жыл бұрын
Yes I think you are right. I use n_epochs in later tutorials
@scharupa4 жыл бұрын
Best tutorial
@patloeber4 жыл бұрын
Thanks!
@Frostbyte-Game-Studio3 жыл бұрын
Thanks man, you explain this very well. Appreciate the GitHub code as well.
@patloeber3 жыл бұрын
Glad it helped!
@m159954 жыл бұрын
That is a neat explanation of the framework, great content and useful tips! However, I cannot seem to make it converge regardless of the learning rate; even if I make it < 1e-4 and run 100-1000 gradient update iterations it still oscillates. Doing it manually converges asymptotically as expected. Why would implementing it in the torch framework cause this?
@patloeber4 жыл бұрын
Maybe you need to play around with the learning_rate. Using backpropagation is not as precise as calculating the gradient manually, so there can be differences. Also make sure to empty your gradients after the update step; a lot of beginners forget to call w.grad.zero_().
@abinashsahu46613 жыл бұрын
First of all, thank you for your effort in putting together a great tutorial. When I use a set of numbers larger than, say, 7, the numpy implementation does not work. Try this:

X = np.array([1,2,3,4,5,6,7], dtype=np.float32)
Y = np.array([2,4,6,8,10,12,14], dtype=np.float32)

and the training loop throws an error. Any suggestions?
@patloeber3 жыл бұрын
what error do you get? it works for me with 7 numbers...try to compare with my code on github...
@abinashsahu46613 жыл бұрын
@@patloeber Hello. I used your code and changed n_iters to 100, with

X = np.array([1,2,3,4,5,6,7,8], dtype=np.float32)
Y = np.array([2,4,6,8,10,12,14,16], dtype=np.float32)

Everything else is the same as your code. I get a runtime warning with the loss increasing:

RuntimeWarning: overflow encountered in square
  return ((y_pred - y) ** 2).mean()
@PriyankaJain-dg8rm Жыл бұрын
Getting this error: TypeError: 'numpy.float32' object is not callable. How can I fix it?
@DanielWeikert4 жыл бұрын
From scratch - very nice. Thank you. What's in your pipeline for the next videos? Best regards
@patloeber4 жыл бұрын
Hi, the following is just a rough schedule of what I want to do. If you have any suggestions let me know!
- Training Pipeline
- Linear Regression
- Logistic Regression
- Custom Neural Net
- DataLoader
- CNN
- Tensorboard
@nicolasgabrielsantanaramos2914 жыл бұрын
Very good class!!! (edit)
@adsgfsgd4 жыл бұрын
didactic is not a positive word btw
@thatchipmunksings4 жыл бұрын
You are AWESOME!
@patloeber4 жыл бұрын
Thanks!
@mishaalnaeem61354 жыл бұрын
What editor are you using?
@patloeber4 жыл бұрын
VS Code
@하민박사4 жыл бұрын
How do you run your code in the 'Output' panel instead of the 'Terminal' panel?
@sudarshankoirala20724 жыл бұрын
You can do that by installing the Code Runner extension in VS Code.
@patloeber4 жыл бұрын
Exactly! I’m using this Extension
@prajganesh4 жыл бұрын
My calculus is rusty, but in line 23, don't we have to take the inner derivative as well, so that wx - y becomes x?
@patloeber4 жыл бұрын
You mean applying the chain rule? Yes, d(wx-y)/dw becomes x. That's what we are doing, and that's why we have 2*x in the first term now.
@prajganesh4 жыл бұрын
@@patloeber oh thanks.
@VeyselDeste-p4l9 ай бұрын
Why are the results of manual differentiation after 10 epochs better than those after 20 epochs with autograd? Does that make using autograd pointless?
@Cantordust02726 күн бұрын
The gradients autograd computes are exact too; the difference comes from the manual gradient being scaled differently (the missing division by N discussed in the comments above), which acts like a larger learning rate. If you increase the number of iterations, the autograd version reaches the same result as the manual one.
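For what it's worth, the gradient autograd produces matches the hand-derived formula exactly; a quick check (a sketch, assuming the example data from the video and an arbitrary w):

    import torch

    X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
    Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
    w = torch.tensor(0.5, requires_grad=True)

    loss = ((w * X - Y) ** 2).mean()
    loss.backward()

    analytic = (2 * X * (w.detach() * X - Y)).mean()   # dJ/dw derived by hand
    print(w.grad, analytic)                            # same value (up to float precision)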
@mahery_ranaivoson3 жыл бұрын
Just discovered that the syntax "w -=" and "w = w -" behave differently from the computation graph's point of view. Can anyone explain why?
@nitinkapoor44723 жыл бұрын
Actually, operations like "w -=" are called "in-place" operations: they operate without making a copy, and as per the PyTorch documentation such in-place operations are not allowed on tensors that have a "grad_fn". I hope that resolves the question.
@peregudovoleg2 жыл бұрын
By the way, there is going to be an error if we update with w = w - lr * w.grad (talking about the torch part here), while it's fine with numpy. Must be what Nitin Kapoor said.
@МихаилПоликарпов-ф4м4 жыл бұрын
I typed the code along with you up to 16:23 in your video. Spyder gives an error at w.grad.zero_(): AttributeError: 'NoneType' object has no attribute 'zero_'. Why?
@patloeber4 жыл бұрын
Be careful whether you use w -= ... or w = w - ... when updating the weights. You can compare with the code on GitHub.
@tobi96683 жыл бұрын
Why do you use the dot product?
@tobi96683 жыл бұрын
My first thought was to use matmul
@我想學英文2 жыл бұрын
5:25 dJ jet function
@ahmadsystems35603 жыл бұрын
Dear, when I use this line:

w -= learning_rate * w.grad

it shows the error: TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'. Please give a solution.
@skymanaditya3 жыл бұрын
did you set requires_grad=True?
@patloeber3 жыл бұрын
yep that might be the issue
@MpTSprocket4 жыл бұрын
Your videos are top!!!! If I use:

with torch.no_grad():
    w = w - w.grad * learning_rate

instead of:

with torch.no_grad():
    w -= w.grad * learning_rate

I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
     45
     46 # zero gradients, set all gradients back to zero, otherwise they would accumulate after each iteration
---> 47 w.grad.zero_()
AttributeError: 'NoneType' object has no attribute 'zero_'
---------------------------------------------------------------------------

What is the reason for this?