Backpropagation explained | Part 1 - The intuition: kzbin.info/www/bejne/jnaWnKWcaKiEotU
Backpropagation explained | Part 2 - The mathematical notation: kzbin.info/www/bejne/aJ62qqaIrZJkmZI
Backpropagation explained | Part 3 - Mathematical observations: kzbin.info/www/bejne/fWbFZZ2Id7CBrtk
Backpropagation explained | Part 4 - Calculating the gradient: kzbin.info/www/bejne/kKOYp5x3j6yhmqc
Backpropagation explained | Part 5 - What puts the "back" in backprop?: kzbin.info/www/bejne/rnTPfJKVeNaNpLM
Machine Learning / Deep Learning Fundamentals playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
Keras Machine Learning / Deep Learning Tutorial playlist: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@you_dont_know_me_sir · 5 years ago
deeplizard That scientific notebook you explain from in multiple videos seems like a great collection of notes to review when needed. Have you put it somewhere? GitHub?
@sampro454 · 4 years ago
5:35 Shouldn't there be a da/dg in there?
@sampro454 · 4 years ago
Never mind, you involve g when differentiating a with respect to z later. Very nice.
@abhishek-shrm · 3 years ago
So much professionalism in a YouTube video course that is free. Thank you for making these videos.
@Sikuq · 3 years ago
Although this is a complex topic, this presentation makes it much easier to understand. Thanks deeplizard.
@MalTimeTV · 1 year ago
These videos of yours on the mathematics of back-propagation are just incredible. Thank you very much.
@bildadatsegha6923 · 2 years ago
Awesome. I am learning deep learning as a complete novice, and you have been truly helpful. Thanks. I really love the simplicity of your lectures.
@edbshllanss · 3 years ago
A year ago, I started learning deep learning through your content. I acquired some intuition, but this series, as well as the Keras series, went over my head because I was totally unfamiliar with programming and mathematics. A year later, with basic knowledge of Python and mathematics like calculus, it has become much easier to follow the thread of your videos, and I feel I am finally standing at the starting point of machine learning. Thank you so much for your straightforward explanations!!!
@vaibhavkhobragade9773 · 2 years ago
Clear, concise, and perfect understanding. Thank you, Mandy!
@pawarranger · 5 years ago
This is now my favourite ANN playlist, thanks a ton!
@BabisPlaysGuitar · 5 years ago
Awesome! All the advanced calculus and linear algebra classes that I took back in engineering school make sense now. Thank you very much!
@naughtrussel5787 · 4 years ago
I've been stuck on backpropagation for several days. I've tried a bunch of resources, but you are the only one whose way of explaining this is exactly what I needed. Thanks a lot for doing these "unfolded" calculations and emphasizing _the purpose_ of doing this or that thing. This is what's often missing in ANN courses. Great job!
@CreeperSlenderman · 3 years ago
Imagine me, I've been stuck for months.
@fatihandyildiz · 3 years ago
Just wow! Normally it's really hard for me to fully understand these derivations (even after watching multiple times), but you just made it happen at 1.5x speed. Thank you for offering this high-quality tutorial for free. Blessings to you.
@shakyasarkar7143 · 4 years ago
You are a legend, Ma'am!!! Truly!! I had been searching all over YouTube for this full backprop calculus derivation until I came upon your videos... I even looked at some Udemy courses. Nobody, I REPEAT, NOBODY has explained, or even dared to explain, this full derivation. Thank you, Ma'am! I owe you a lot!
@MJ2403 · 4 years ago
You are a gem... I'm finally able to understand backpropagation, which I had been struggling with like anything.
@SafeBuster80 · 4 years ago
Thank you for your videos on backpropagation. I now understand this subject, as you explained it nicely and clearly (unlike my uni professor).
@moizahmed8987 · 4 years ago
Terrific video, thank you very much. This is the first video that goes through backprop step by step.
@weactweimpactcharityassoci3964 · 3 years ago
This is now my favorite ANN playlist, thanks a ton! Thanks!
@durgamanoja8179 · 6 years ago
I have gone through your series, and I must say you are AWESOME!! I could not understand the mathematics behind backpropagation on any website or video; you made it very clear. Thanks a lot. Please do make more such videos.
@deeplizard · 6 years ago
Thank you, durga! I'm so happy to hear this 😄 If you're also interested in implementing the neural network concepts from this Deep Learning Fundamentals series in code, check out both our Keras and TensorFlow.js series! Keras: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL TensorFlow.js: kzbin.info/aero/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ-
@AnandaKevin28 · 3 years ago
Just. Great. Explanation. Words are not enough to express it. Thanks a lot for the explanation! 😁
@lancelotdsouza4705 · 2 years ago
Thanks so much, you made backpropagation a cakewalk.
@joshuayudice8245 · 4 years ago
Seriously, you are a godsend. Thank you for creating these clear and methodical videos.
@wilmobaggins · 4 years ago
It might have been easier to follow the notation if you had shown the connection from, say, node 2 to node 3. The 1 looks a lot like an l to my aging eyes :) Thank you for the video, very helpful.
@williamdowling · 5 years ago
Awesome videos, thanks! I am curious though: what happens if a g function is not differentiable? I guess that is common too, for example g(x) = {0 if x
@justchill99902 · 5 years ago
Hey there! The daunting backprop math proof went through as smooth as butter after watching these 5 videos. Thank you so much. Sometimes I think the book itself is speaking lol. I think that one dislike is by mistake. Question - At 8:36, you talk about why we get the derivative as g prime. Could you please explain what that means, and its relation to a sub 1? PS: You are my most favourite YouTube channel. You earned it. I think your content is definitely making a difference in the world. :) Please carry on!
@deeplizard · 5 years ago
Hey Nirbhay - Apologies for the delayed response! Somehow this comment was tucked away, and I just came across it. Thank you so much for your kind remarks! Really glad to hear you're enjoying and learning from the content. For your question from 8:36 (I'll omit the superscripts and subscripts from my explanation below): Our objective is to differentiate a with respect to z. Recall that a = g(z) by definition. Taking the derivative of a with respect to z means that we need to take the derivative of g(z) with respect to z. Since g is a function of z, this gives us g'(z) as the derivative. Let me know if this helps.
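The chain-rule step described above can be checked symbolically. A minimal sketch, assuming the sigmoid as an example activation (the video's g could be any differentiable function):

```python
import sympy as sp

z = sp.symbols('z')
g = 1 / (1 + sp.exp(-z))   # example activation: sigmoid (any differentiable g works)
a = g                      # a = g(z) by definition

da_dz = sp.diff(a, z)      # d a / d z
g_prime = g * (1 - g)      # known closed form of the sigmoid's derivative

# Differentiating a = g(z) with respect to z yields exactly g'(z), nothing more.
assert sp.simplify(da_dz - g_prime) == 0
```

Swapping in a different differentiable g changes only the closed form of g_prime, not the chain-rule step itself.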
@chickensalad1369 · 4 years ago
High schooler here, armed only with Add Maths-level calculus, entering the boss room. Had to spend almost 3 hours just on the entire backpropagation process, filling holes in my mathematics along the way, such as partial derivatives, with other online math tutorials. It was hard but worth it. These will be my final words as my brain liquifies and escapes from my ears, bye.....
@georgezhou1287 · 4 years ago
Your work is a godsend. Thank you.
@haadialiaqat4590 · 2 years ago
Thank you so much for such a nice explanation. Please make more videos.
@aruchan9890 · 10 months ago
Thank you so much for this, really helpful!
@rik43 · 3 years ago
Finally I got it, thank you!
@Luis-fh8cv · 6 years ago
Thank you deeplizard, this is very helpful. I can code backpropagation just fine for ANNs that use the sigmoid function and MSE, but I've always struggled to follow the gradient descent and backprop math.
@luisluiscunha · 1 year ago
Very much appreciated. Very nice explanation.
@minhdang5132 · 4 years ago
Brilliant explanation. Thanks a lot!
@krishnaik06 · 6 years ago
Nice video. Can you please explain, using Python code instead of Keras, only the backpropagation part? I have written the feedforward propagation part but was not able to write the code for backpropagation. Please help.
@deeplizard · 6 years ago
Thanks, Krish! I don't have any code I've written myself that implements the backprop math that I've illustrated in the set of backprop videos. When I searched online for backpropagation in Python though, I saw some open sourced resources that you might be able to check out to assist with your implementation. This is one of the top results that came back from my query: machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
@samaryadav7208 · 6 years ago
Great video. Waiting for the next part.
@deeplizard · 6 years ago
Thanks, Samar!
@danielrodriguezgonzalez2982 · 4 years ago
Can't say it enough, the best!
@richarda1630 · 3 years ago
Everyone else has said it all :) Thanks so much!
@richarda1630 · 3 years ago
To bolster my newbie mind :P I also watched 3B1B to help me understand what you discussed here :) kzbin.info/www/bejne/f53KZJp9mtyEa7c
@andonglin8900 · 6 years ago
Easy to follow. Thanks a lot!
@deeplizard · 6 years ago
I'm glad you think so, Andong! And you're welcome!
@John-wx3zn · 5 months ago
Hi Mandy, since the output of a single node comes from the ReLU function, why isn't this output, a, written on the side of the arrow instead of the weight, w, when going from L-1 to L?
@WahranRai · 3 years ago
12:44 To avoid the 2 in the equation of the gradient: minimizing 0.5*C_0 is the same as minimizing C_0. Take 0.5*C_0 as the loss function, and when you take the derivative, the 2 disappears.
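This trick is easy to verify symbolically. A quick sketch (sympy assumed; a and y stand for the activation output and target from the video):

```python
import sympy as sp

a, y = sp.symbols('a y')

loss = (a - y) ** 2                    # per-sample squared-error loss
half_loss = sp.Rational(1, 2) * loss   # the 1/2 convention

# The plain loss carries a factor of 2 in its gradient...
assert sp.simplify(sp.diff(loss, a) - 2 * (a - y)) == 0
# ...which the 1/2 factor cancels, leaving the same minimizer.
assert sp.simplify(sp.diff(half_loss, a) - (a - y)) == 0
```

Since 0.5*C_0 and C_0 differ only by a positive constant factor, they share the same argmin, so gradient descent finds the same weights either way.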
@John-wx3zn · 5 months ago
Hi Mandy, how does the weight in L-1 connect to layer L? Doesn't L have its own weight?
@tymothylim6550 · 3 years ago
Thank you very much for this video! May I ask if the training sample refers to the "batch" in a given epoch? Thus, would the average gradients calculated across all batches be used for SGD? Thank you also for going through the mathematics step by step! It really helps to have someone go through the math, instead of just reading it on my own!
@deeplizard · 3 years ago
You're welcome Tymothy! Happy that you're enjoying the course. In this explanation, a sample refers to a single sample. However, most of the time, neural network APIs will calculate the gradients and do a weight update per batch. The per-batch update is referred to as "mini-batch gradient descent." I give a little note about it in the section of the blog below titled "Mini-Batch Gradient Descent": deeplizard.com/learn/video/U4WB9p6ODjM
@s25412 · 3 years ago
11:46 Given a single weight, wouldn't there be multiple gradient values for that weight, one corresponding to each training sample? If so, on the right-hand side where you sum over i, shouldn't w_12 be indexed with 'i' just like you did for C?
@yeahorightbro · 6 years ago
Great video! How did you learn this stuff yourself?
@deeplizard · 6 years ago
Thanks, Daniel! For deep learning in general, I took a self-led study approach through a combination of online resources and exploring/building networks for my own use in personal projects. The online resources that I took the most away from were Jeremy Howard's and Rachel Thomas' fast.ai course, parts of Andrew Ng's Deep Learning and Machine Learning courses on Coursera, and Michael Nielsen's Neural Networks and Deep Learning book. In regards to the math - one of my degrees is in math, so that's where that came from. :) Learning the math specific to neural networks was just a matter of applying the math that I already had experience with.
@zeus1082 · 6 years ago
deeplizard I am doing the same as you did, but I was not a math graduate; I just had a strong interest in maths. So it's easy to learn these concepts.
@deeplizard · 6 years ago
Hey aneesh - That's cool! Thanks for sharing. Are you also following the same online resources I mentioned?
@zeus1082 · 6 years ago
deeplizard No, except fast.ai. I am following Andrew Ng's videos, some online resources, and Udemy. Your tutorials are useful too. Like you said, my interest in math made these concepts easy. Keep posting videos like this.
@ericsonbitoon · 4 years ago
Hi Mandy, do you have a good reference book to recommend?
@freedmoresidume · 3 years ago
You are the best ❤️
@hairuiliu3446 · 5 years ago
Very well explained, thanks.
@Nissearne12 · 4 years ago
Ahh... 7:07 explains what I had wondered about: how it's ever possible to use the total loss value for backpropagation. The total loss has only an absolute value (because of the square operation), so I wondered how it could ever tell us which direction each weight knob should be turned (+ or -); it couldn't, since the sign information is lost in the total loss calculation! But it turns out that the sign of the error comes back into the equations when looking at the individual losses: d/da1(L) = 2(Actual Value - Target).
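The observation above can be made concrete with a couple of invented numbers: the squared loss itself hides the sign, but its derivative recovers it.

```python
def dloss_da(a, y):
    """Derivative of the squared error (a - y)**2 with respect to a."""
    return 2 * (a - y)

# The loss is always non-negative, but its derivative carries the direction:
# positive on overshoot, negative on undershoot.
assert dloss_da(0.9, 0.5) > 0   # prediction too high -> push a down
assert dloss_da(0.1, 0.5) < 0   # prediction too low  -> push a up
assert (0.9 - 0.5) ** 2 == (0.1 - 0.5) ** 2   # the loss alone can't tell these apart
```

The example targets and activations (0.5, 0.9, 0.1) are made up for illustration, not taken from the video.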
@umair369 · 4 years ago
Thanks, this was very elaborate and thoroughly explained. However, I was wondering how you would average the derivative with respect to one particular weight across ALL training examples; 11:40 is when you mention that. I am assuming the change in w_1_2 doesn't affect the loss for any other training example, is that true? Please let me know.
@khawarshahzad5721 · 6 years ago
Hello deeplizard, great video! Can you please explain how the partial derivative of the loss would be calculated for a batch size greater than 1? Thanks.
@deeplizard · 6 years ago
Hey Khawar - Thanks! To summarize, you take the gradient of the loss with respect to a particular weight for _each_ input. You then average the resulting gradients and update the given weight with that average. This would be the case if you passed all the data to your network at once. If instead you were doing mini-batch gradient descent, where you pass mini-batches of data to your network at a time, then you would apply this same method to _each batch_ of data, rather than to all the data at once.
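The averaging step described in this reply can be sketched in a few lines of numpy. The per-sample gradient values, learning rate, and starting weight below are invented for illustration:

```python
import numpy as np

# Hypothetical per-sample gradients of the loss w.r.t. one particular
# weight, one entry per input in the batch (values made up).
per_sample_grads = np.array([0.30, -0.10, 0.20, 0.40])

avg_grad = per_sample_grads.mean()   # average the gradients over the inputs

lr = 0.1                             # learning rate (assumed)
w = 0.5                              # current value of the weight (assumed)
w_new = w - lr * avg_grad            # one update using the averaged gradient
```

For mini-batch gradient descent, the same averaging is simply applied per batch instead of over the whole training set.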
@matharbarghi · 4 years ago
The partial derivative of the loss function should be taken w.r.t. the weights of the last layer in the network. But you mentioned that we should take the derivative of the loss function with respect to all weights of the network. Please correct me if I am wrong; otherwise, correct it in your course. Thanks.
@deeplizard · 4 years ago
You take the derivative of the loss function with respect to each weight. You then use each respective gradient to update each individual weight. For example, take the derivative of the loss with respect to weight w1. With the resulting gradient, update w1 to a new value. Do the same for w2, w3, etc...
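The per-weight update described in this reply can be sketched directly. The gradient values, weights, and learning rate below are hypothetical:

```python
# Hypothetical gradients dC/dw for three weights (values invented).
grads = {"w1": 0.25, "w2": -0.40, "w3": 0.10}
weights = {"w1": 0.1, "w2": 0.2, "w3": 0.3}

lr = 0.01
for name in weights:
    # Each weight is moved by its own gradient, independently of the others.
    weights[name] -= lr * grads[name]
```

Note that w2's negative gradient moves it up while w1 and w3 move down, which is exactly the "each weight gets its own gradient" point made above.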
@AndreaCatania · 5 years ago
Sorry if this question is stupid, but I don't understand exactly what the loss means. Knowing the loss for weight W12, how can I update the related weight? W12 += LOSS12 doesn't seem correct to me.
@RohitPrasad2512 · 5 years ago
Can you add what happens if the weight is between two hidden layers? And also how to calculate the loss for that.
@FelidInPetasus · 4 years ago
Here's a thing that's unclear to me: You say that this process (which you do describe very neatly) can be applied to any weight in the network. However, shortly after 7:00, you conclude that the first term contains y_1. In video #2, you define this y_j as "the desired value of node j in the output layer L for a single training sample", i.e., the value a specific output neuron "ought to be". This works fine if you're looking at the weights connecting L-1 to L (the output layer), but doesn't make sense for the weights connecting, say, L-2 to L-1. What value would I use for y_j in a case like that? Edit: Thinking about it now: Am I correct in assuming that the first and second terms for the example you provided stay the same (even when looking at previous layers) and it's only the third term (specifically its weighted sum) that is "split up" into even more terms? This would remove the need to use a different y_j for other layers. Other than that: thank you for your videos
@aryanrahman3212 · 2 years ago
When she says g prime, she means the derivative of the activation function g. This function can literally be anything.
@abubakarali6399 · 3 years ago
What degree do you have, and from which university?
@r.balamurali8246 · 4 years ago
Thank you very much. @12:22 does 'n' represent the number of training samples or the number of nodes in layer L? Could you please explain this?
@deeplizard · 4 years ago
The number of training samples.
@MaahirGupta · 3 years ago
You win.
@ajaymalik9147 · 5 years ago
Nice.
@ashutoshshah864 · 3 years ago
🙏🏽💪🏽🤙🏽
@mechhyena6957 · 4 years ago
I have no clue what is going on in this video...
@ПроскуринГлеб · 5 years ago
I get sentdex vibes.
@deeplizard · 5 years ago
sentdex is cool 😎
@patrickryckman3867 · 4 years ago
8:22 you lost me. You said we just put this into the right side of the equation, but that's not the only thing you put into the right side of the equation.
@kiarash7604 · 4 years ago
Most of these videos are explaining the obvious.
@rabirajbanerjee3872 · 4 years ago
After watching your video I could actually do the derivation all by myself, thanks for the intuition :)
@antoinetoussaint483 · 4 years ago
Clear, precise, consistent. What a channel, thx.
@chintanmathia · 4 years ago
This is the epitome of explaining such a difficult topic with such simplicity. Thanks a lot... I could not stop going through all 36 videos in one go... Amazing job, ma'am.
@nikosips · 4 years ago
Thank you for these videos! Your explanations are crisp and clear, and very helpful! You deserve many more subs!
@harishh.s4701 · 2 years ago
Hello, thank you for sharing your knowledge with us. I really appreciate the effort put into these videos. This series on backpropagation clarified a lot of confusion and helped me understand it more clearly. The explanation was clear and easy to follow. However, I have one small suggestion. In this video at the timestamp 11:53, the term 'n' is used to represent the number of training samples, whereas in all the previous equations 'n' represents the number of neurons in a particular layer (please correct me if I am wrong). Perhaps it would be better to use a different notation (like N) for the number of training samples to avoid confusion. Maybe it has already been updated; I apologize if this is a repetition. Otherwise, great work, keep it up, and thanks a lot :)
@trankhanhang8151 · 3 years ago
So simple and elegant, I wish I had found you sooner.
@EliorBY · 3 years ago
Wow, what an amazing, illustrative mathematical explanation. Just what I was looking for. Thank you very much, deeplizard!
@gbyu7638 · 3 years ago
Such a clear explanation and calculation!
@PritishMishra · 3 years ago
0:00 - Introduction
1:01 - Preview of the video
1:24 - Derivative of the loss with respect to the weights (calculations)
11:56 - Conclusion
@ramiro6322 · 3 years ago
I would also add:
5:45 First term (loss with respect to activation output)
7:36 Second term (activation output with respect to input)
8:52 Third term (input with respect to weight)
11:30 Putting it all together
@deeplizard · 3 years ago
Thank you both! Your timestamps have been added to the video description :D
@PritishMishra · 3 years ago
@@deeplizard Thanks
@todianmishtaku6249 · 4 years ago
Excellent!!!
@databridgeconsultants9163 · 4 years ago
Thank you so much, guys. This series is just the BEST ever made. It's legendary work done by you guys. I have read so many books; even my prof was not able to make us understand how these things actually work step by step. All I understood in the past was to ditch this portion of neural networks. But now I can confidently explain what goes on inside a neural network. I have subscribed to the paid version of yours.
@deeplizard · 4 years ago
That's great to hear! Really happy that you gained new knowledge. Thank you for letting us know :) By the "paid version," are you referring to becoming a member of the deeplizard hivemind via Patreon?
@JordanMetroidManiac · 4 years ago
Thicc
@satnav1377 · 5 years ago
Incredibly clear explanation, great vid once again!
@thomasvinet6160 · 6 years ago
Great video, I just have a question: if we want to calculate the derivative for a weight of layer (L-2), will it be the same as for layer L-1, but replacing a with g(z), and so on? Thanks. EDIT: didn't see the next video ;) These tutorials are very understandable, keep making them!!
@deeplizard · 6 years ago
Thanks, Thomas! So were you able to answer your question after watching the next video?
@money_wins_controls · 5 years ago
Guys, please help. @8:44 I am curious why the g function is not added: g' * z + g (according to the product rule). Why did they neglect g after differentiation?
@TheMaidenReturns · 4 years ago
Wow... just wow. I have been really struggling this year in uni with my A.I. module, as the teacher doesn't really explain things well. I really can't believe how simple and easy to understand you can make this topic. This series has saved me from failing a module this year, and it helped me learn so much about deep learning. Amazing content, well explained. Big up for this.
@Loev06 · 4 years ago
Amazing video! I know you don't use biases in this series, but do you know the derivative of the cost function w.r.t. the biases? Edit: I think I found it, is it (dC0/da1(L)) * (da1(L)/dz1(L)) = 2(a1(L) - y1) * g'(z1(L))? (Basically the first two terms, because the third term is always equal to 1.)
@bobhut8613 · 4 years ago
Thank you so much for this! I had been stuck trying to wrap my head around the maths for days, and your videos really helped.
@hussainbhavnagarwala2596 · 1 year ago
Can you show the same example for a weight that is a few layers behind the output layer? I am not able to understand how we will sum the activations of each layer.
@jorgecelis8459 · 3 years ago
The only detail is that the number of nodes should be indexed for the general case, and then maybe use another letter for the number of examples =)
@Arjun-kt4by · 4 years ago
Hello, at 11:02 how did the derivative come out to be a2(L-1)? Are you considering a as a constant?
@EinsiJo · 4 years ago
Extremely useful! Thank you!
@timxu1766 · 4 years ago
Thank you, deeplizard!!!
@evertonsantosdeandradejuni3787 · 3 years ago
I feel like I can implement this myself in C++. Is this normal?
@Jxordan · 6 years ago
Thank you! Dedicating my midterm today to you. Also, just a random tip: if you don't use Cortana, you can right-click the "type here to search" bar and hide it.
@deeplizard · 6 years ago
Thanks for the tip! How did your midterm go?
@aamir122a · 6 years ago
In the future, you might look at doing videos on neural networks for reinforcement learning, approximating the value function and policy function.
@deeplizard · 6 years ago
Thanks for the suggestion, Aamir!
@jeetenzhurlollz8387 · 4 years ago
Far better than deeplearning.ai.
@jdmik · 6 years ago
Thanks for the great videos! Just wondering if you were planning on doing a video on how backprop is applied in convnets?
@deeplizard · 6 years ago
Hey Johan - Glad you're liking the videos! I currently don't have this as an immediate topic to cover, but I will add it to my list to explore further as a possible future video.
@sinaasadi3800 · 5 years ago
Hi. Would you please answer my other comment? I posted it yesterday under another video from this playlist. And also, thanks a lot for your videos.
@JordanMetroidManiac · 4 years ago
How does bias fit into all of this?
@deeplizard · 4 years ago
Bias terms are updated in the same way as the weights. I elaborate more on this on the upcoming episode dedicated to bias: deeplizard.com/learn/video/HetFihsXSys
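The "same way as the weights" point can be sketched symbolically. A minimal example assuming a sigmoid activation (the series itself does not use this exact setup): since z = w*a_prev + b and dz/db = 1, the bias gradient is the weight gradient's chain with the final a_prev factor replaced by 1.

```python
import sympy as sp

w, a_prev, b, y = sp.symbols('w a_prev b y')

z = w * a_prev + b          # node input with a bias term
a = 1 / (1 + sp.exp(-z))    # example activation (sigmoid assumed)
loss = (a - y) ** 2

# dz/db = 1, so the bias gradient equals the weight gradient
# divided by a_prev (the chain's last factor becomes 1).
assert sp.diff(z, b) == 1
assert sp.simplify(sp.diff(loss, w) - a_prev * sp.diff(loss, b)) == 0
```

With the gradient in hand, the bias update is then b -= lr * dC/db, exactly mirroring the weight update.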
@nourelislam8565 · 5 years ago
Amazing explanation... but I just want to know: what is the purpose of taking the average of the loss function for a certain weight over n training examples? I guess all we need to know is the change of the loss function throughout the training examples?
@deeplizard · 5 years ago
Hey Nour - It's because we want to know the average loss across all samples. This will tell us how our model performs on average across the entire data set.
@money_wins_controls · 5 years ago
Guys, please help. @8:44 I am curious why the g function is not added: g' * z + g (according to the product rule). Why did they neglect g after differentiation?
@ssffyy · 3 years ago
Hi sid, this response is a bit late, as I just read your comment... I guess your confusion comes from the fact that you are considering g multiplied by z, when in fact it's not a multiplication; rather, g is a function of z --> g(z), like f(x). So when you take the derivative of g(z) with respect to z, you end up getting g'(z). Hope this cleared up any doubts.
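A quick numeric check of this reply, using the sigmoid as an assumed example of g: a finite-difference derivative of a = g(z) matches g'(z) alone, with no extra "+ g" term that a product-rule reading would predict.

```python
import math

def g(z):
    """Example activation (sigmoid); g is applied to z, not multiplied by it."""
    return 1 / (1 + math.exp(-z))

def g_prime(z):
    return g(z) * (1 - g(z))   # known derivative of the sigmoid

# Centered finite difference of a = g(z): composition follows the chain
# rule, not the product rule, so the result is g'(z) and nothing else.
z0, h = 0.7, 1e-6
numeric = (g(z0 + h) - g(z0 - h)) / (2 * h)
assert abs(numeric - g_prime(z0)) < 1e-6
```

The evaluation point z0 = 0.7 is arbitrary; any z gives the same agreement.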
@Tntpker · 5 years ago
After thinking about it a bit, why is the expression @ 12:18 used where you sum all the partials of the cost function w.r.t. w12 _for all training examples_ and calculate an average partial derivative? I thought one would do this for batch gradient descent but not with stochastic gradient descent? Or am I seeing something completely wrong here?
@deeplizard · 5 years ago
Hey Tntpker - Yes, when _n_ is the number of samples in the entire training set, this is the case for _batch_ gradient descent. Also, if using _mini-batch_ gradient descent, which is normally what is done with most neural network APIs by default, then you could look at _n_ as being the number of training examples within a single batch, rather than the entire training set. With this, the gradient update would occur on a per-batch basis.
@Tntpker · 5 years ago
@@deeplizard Cheers!
@srijalshrestha7380 · 6 years ago
Thanks a lot. I don't know when or how I will use this in the future, but I understood it very well. Thank you.
@deeplizard · 6 years ago
You're welcome, Srijal! I'm glad you were able to gain an understanding!