Backpropagation explained | Part 1 - The intuition: kzbin.info/www/bejne/jnaWnKWcaKiEotU
Backpropagation explained | Part 2 - The mathematical notation: kzbin.info/www/bejne/aJ62qqaIrZJkmZI
Backpropagation explained | Part 3 - Mathematical observations: kzbin.info/www/bejne/fWbFZZ2Id7CBrtk
Backpropagation explained | Part 4 - Calculating the gradient: kzbin.info/www/bejne/kKOYp5x3j6yhmqc
Backpropagation explained | Part 5 - What puts the "back" in backprop?: kzbin.info/www/bejne/rnTPfJKVeNaNpLM
Machine Learning / Deep Learning Fundamentals playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
Keras Machine Learning / Deep Learning Tutorial playlist: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@you_dont_know_me_sir · 5 years ago
deeplizard That scientific notebook you explain from in multiple videos seems like a great collection of notes to review when needed. Have you put it somewhere? GitHub?
@sampro454 · 4 years ago
5:35 Shouldn't there be a da/dg in there?
@sampro454 · 4 years ago
Never mind, you involve g when differentiating a with respect to z later. Very nice.
@abhishek-shrm · 3 years ago
So much professionalism in a YouTube video course that is free. Thank you for making these videos.
@Sikuq · 3 years ago
Although this is a complex topic, this presentation makes it much easier to understand. Thanks deeplizard.
@MalTimeTV · 1 year ago
These videos of yours on the mathematics of back-propagation are just incredible. Thank you very much.
@bildadatsegha6923 · 2 years ago
Awesome. I am learning deep learning as a complete novice, and you have been truly helpful. Thanks. I really love the simplicity of your lectures.
@edbshllanss · 3 years ago
A year ago, I started learning deep learning through your content. I acquired some intuition, but this series, as well as the Keras series, went over my head because I was totally unfamiliar with programming and mathematics. A year later, with basic knowledge of Python and mathematics like calculus, it has become much easier to follow the thread of your videos, and I feel I am finally standing at the starting point of machine learning. Thank you so much for your straightforward explanations!!!
@vaibhavkhobragade9773 · 2 years ago
Clear, concise, and perfect understanding. Thank you, Mandy!
@pawarranger · 5 years ago
This is now my favourite ANN playlist, thanks a ton!
@BabisPlaysGuitar · 5 years ago
Awesome! All the advanced calculus and linear algebra classes that I took back in engineering school make sense now. Thank you very much!
@naughtrussel5787 · 4 years ago
I've been stuck on backpropagation for several days. I've tried a bunch of resources, but you are the only one whose way of explaining this is exactly what I needed. Thanks a lot for doing these "unfolded" calculations and emphasizing _the purpose_ of doing this or that thing. This is what's often missing in ANN courses. Great job!
@CreeperSlenderman · 3 years ago
Imagine me, I've been stuck for months.
@fatihandyildiz · 3 years ago
Just wow! Normally it's really hard for me to fully understand these derivations (even after watching multiple times), but you just made it happen at 1.5x speed. Thank you for offering this high-quality tutorial for free. Blessings to you.
@shakyasarkar7143 · 4 years ago
You are a legend, Ma'am!!! Truly!! I had been searching all over YouTube for this full backprop calculus derivation until I came upon your videos... I even looked at some Udemy courses. Nobody, I REPEAT, NOBODY has explained, or even dared to explain, this full derivation. Thank you, Ma'am! I owe you a lot!
@MJ2403 · 4 years ago
You are a gem... I'm finally able to understand backpropagation, which I had been struggling with like anything.
@SafeBuster80 · 4 years ago
Thank you for your videos on backpropagation. I now understand this subject, as you explained it nicely and clearly (unlike my uni professor).
@moizahmed8987 · 4 years ago
Terrific video, thank you very much. This is the first video that goes through backprop step by step.
@weactweimpactcharityassoci3964 · 3 years ago
This is now my favorite ANN playlist, thanks a ton! Thanks!
@durgamanoja8179 · 6 years ago
I have gone through your series, and I must say you are AWESOME!! I could not understand the mathematics behind backpropagation on any website or video; you made it very clear. Thanks a lot. Please do make more such videos.
@deeplizard · 6 years ago
Thank you, durga! I'm so happy to hear this 😄 If you're also interested in implementing the neural network concepts from this Deep Learning Fundamentals series in code, check out both our Keras and TensorFlow.js series! Keras: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL TensorFlow.js: kzbin.info/aero/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ-
@AnandaKevin28 · 3 years ago
Just. Great. Explanation. Words are not enough to express it. Thanks a lot for the explanation! 😁
@lancelotdsouza4705 · 2 years ago
Thanks so much, you made backpropagation a cakewalk.
@joshuayudice8245 · 4 years ago
Seriously, you are a godsend. Thank you for creating these clear and methodical videos.
@wilmobaggins · 4 years ago
It might have been easier to follow the notation if you had shown the connection from, say, node 2 to node 3. The 1 looks a lot like an l to my aging eyes :) Thank you for the video, very helpful.
@williamdowling · 5 years ago
Awesome videos, thanks! I am curious though: what happens if a g function is not differentiable? I guess that is common too, for example g(x) = {0 if x
@justchill99902 · 5 years ago
Hey there! The daunting backprop math proof went through as smooth as butter after watching these 5 videos. Thank you so much. Sometimes I think the book itself is speaking lol. I think that one dislike is by mistake. Question - At 8:36, you talk about why we get the derivative as g prime. Could you please explain what that means, and its relation to a sub 1? PS: You are my most favourite YouTube channel. You earned it. I think your content is definitely making a difference in the world. :) Please carry on!
@deeplizard · 5 years ago
Hey Nirbhay - Apologies for the delayed response! Somehow this comment was tucked away, and I just came across it. Thank you so much for your kind remarks! Really glad to hear you're enjoying and learning from the content. For your question from 8:36 (I'll omit the superscripts and subscripts from my explanation below): Our objective is to differentiate a with respect to z. Recall that a = g(z) by definition. Taking the derivative of a with respect to z means that we need to take the derivative of g(z) with respect to z. Since g is a function of z, this gives us g'(z) as the derivative. Let me know if this helps.
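The chain-rule step described above can be checked symbolically. A minimal sketch, assuming the sigmoid as an example activation (the video's g could be any differentiable function):

```python
import sympy as sp

z = sp.symbols('z')
g = 1 / (1 + sp.exp(-z))   # example activation: sigmoid (any differentiable g works)
a = g                      # a = g(z) by definition

da_dz = sp.diff(a, z)      # d a / d z
g_prime = g * (1 - g)      # known closed form of the sigmoid's derivative

# Differentiating a = g(z) with respect to z yields exactly g'(z), nothing more.
assert sp.simplify(da_dz - g_prime) == 0
```

Swapping in a different differentiable g changes only the closed form of g_prime, not the chain-rule step itself.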
@chickensalad1369 · 4 years ago
High schooler here, armed only with Add Maths-level calculus, entering the boss room. Had to spend almost 3 hours just on the entire backpropagation process, filling holes in my mathematics along the way, such as partial derivatives, with other online math tutorials. It was hard but worth it. These will be my final words as my brain liquifies and escapes from my ears, bye.....
@georgezhou1287 · 4 years ago
Your work is a godsend. Thank you.
@haadialiaqat4590 · 2 years ago
Thank you so much for such a nice explanation. Please make more videos.
@aruchan9890 · 10 months ago
Thank you so much for this, really helpful!
@rik43 · 3 years ago
Finally I got it, thank you!
@Luis-fh8cv · 6 years ago
Thank you deeplizard, this is very helpful. I can code backpropagation just fine for ANNs that use the sigmoid function and MSE, but I've always struggled to follow the gradient descent and backprop math.
@luisluiscunha · 1 year ago
Very much appreciated. Very nice explanation.
@minhdang5132 · 4 years ago
Brilliant explanation. Thanks a lot!
@krishnaik06 · 6 years ago
Nice video. Can you please explain, using Python code instead of Keras, only the backpropagation part? I have written the feedforward propagation part but was not able to write the code for backpropagation. Please help.
@deeplizard · 6 years ago
Thanks, Krish! I don't have any code I've written myself that implements the backprop math that I've illustrated in the set of backprop videos. When I searched online for backpropagation in Python though, I saw some open sourced resources that you might be able to check out to assist with your implementation. This is one of the top results that came back from my query: machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
@samaryadav7208 · 6 years ago
Great video. Waiting for the next part.
@deeplizard · 6 years ago
Thanks, Samar!
@danielrodriguezgonzalez2982 · 4 years ago
Can't say it enough, the best!
@richarda1630 · 3 years ago
Everyone else has said it all :) Thanks so much!
@richarda1630 · 3 years ago
To bolster my newbie mind :P I also watched 3B1B to help me understand what you discussed here :) kzbin.info/www/bejne/f53KZJp9mtyEa7c
@andonglin8900 · 6 years ago
Easy to follow. Thanks a lot!
@deeplizard · 6 years ago
I'm glad you think so, Andong! And you're welcome!
@John-wx3zn · 5 months ago
Hi Mandy, since the output of a single node comes from the ReLU function, why isn't this output, a, written on the side of the arrow instead of the weight, w, when going from L-1 to L?
@WahranRai · 3 years ago
12:44 To avoid the 2 in the equation of the gradient: minimizing 0.5*C_0 is the same as minimizing C_0. Take 0.5*C_0 as the loss function, and when you take the derivative, the 2 disappears.
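This trick is easy to verify symbolically. A quick sketch (sympy assumed; a and y stand for the activation output and target from the video):

```python
import sympy as sp

a, y = sp.symbols('a y')

loss = (a - y) ** 2                    # per-sample squared-error loss
half_loss = sp.Rational(1, 2) * loss   # the 1/2 convention

# The plain loss carries a factor of 2 in its gradient...
assert sp.simplify(sp.diff(loss, a) - 2 * (a - y)) == 0
# ...which the 1/2 factor cancels, leaving the same minimizer.
assert sp.simplify(sp.diff(half_loss, a) - (a - y)) == 0
```

Since 0.5*C_0 and C_0 differ only by a positive constant factor, they share the same argmin, so gradient descent finds the same weights either way.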
@John-wx3zn · 5 months ago
Hi Mandy, how does the weight in L-1 connect to layer L? Doesn't L have its own weight?
@tymothylim6550 · 3 years ago
Thank you very much for this video! May I ask if the training sample refers to the "batch" in a given epoch? Thus, would the average gradients calculated across all batches be used for SGD? Thank you also for going through the mathematics step by step! It really helps to have someone go through the math, instead of just reading it on my own!
@deeplizard · 3 years ago
You're welcome Tymothy! Happy that you're enjoying the course. In this explanation, a sample refers to a single sample. However, most of the time, neural network APIs will calculate the gradients and do a weight update per batch. The per-batch update is referred to as "mini-batch gradient descent." I give a little note about it in the section of the blog below titled "Mini-Batch Gradient Descent": deeplizard.com/learn/video/U4WB9p6ODjM
@s25412 · 3 years ago
11:46 Given a single weight, wouldn't there be multiple gradient values for that weight, one corresponding to each training sample? If so, on the right-hand side where you sum over i, shouldn't w_12 be indexed with 'i' just like you did for C?
@yeahorightbro · 6 years ago
Great video! How did you learn this stuff yourself?
@deeplizard · 6 years ago
Thanks, Daniel! For deep learning in general, I took a self-led study approach through a combination of online resources and exploring/building networks for my own use in personal projects. The online resources that I took the most away from were Jeremy Howard's and Rachel Thomas' fast.ai course, parts of Andrew Ng's Deep Learning and Machine Learning courses on Coursera, and Michael Nielsen's Neural Networks and Deep Learning book. In regards to the math - one of my degrees is in math, so that's where that came from. :) Learning the math specific to neural networks was just a matter of applying the math that I already had experience with.
@zeus1082 · 6 years ago
deeplizard I am doing the same as you did, but I was not a math graduate; I just had a strong interest in maths. So it's easy to learn these concepts.
@deeplizard · 6 years ago
Hey aneesh - That's cool! Thanks for sharing. Are you also following the same online resources I mentioned?
@zeus1082 · 6 years ago
deeplizard No, except fast.ai. I am following Andrew Ng's videos, some online resources, and Udemy. Your tutorials are useful too. Like you said, my interest in math made these concepts easy. Keep posting videos like this.
@ericsonbitoon · 4 years ago
Hi Mandy, do you have a good reference book to recommend?
@freedmoresidume · 3 years ago
You are the best ❤️
@hairuiliu3446 · 5 years ago
Very well explained, thanks.
@Nissearne12 · 4 years ago
Ahh... 7:07 explains what I had wondered about: how it's ever possible to use the total loss value for backpropagation. The total loss has only an absolute value (because of the square operation), so I wondered how it could ever tell us which direction each weight knob should be turned (+ or -); it couldn't, since the sign information is lost in the total loss calculation! But it turns out that the sign of the error comes back into the equations when looking at the individual losses: d/da1(L) = 2(Actual Value - Target).
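The observation above can be made concrete with a couple of invented numbers: the squared loss itself hides the sign, but its derivative recovers it.

```python
def dloss_da(a, y):
    """Derivative of the squared error (a - y)**2 with respect to a."""
    return 2 * (a - y)

# The loss is always non-negative, but its derivative carries the direction:
# positive on overshoot, negative on undershoot.
assert dloss_da(0.9, 0.5) > 0   # prediction too high -> push a down
assert dloss_da(0.1, 0.5) < 0   # prediction too low  -> push a up
assert (0.9 - 0.5) ** 2 == (0.1 - 0.5) ** 2   # the loss alone can't tell these apart
```

The example targets and activations (0.5, 0.9, 0.1) are made up for illustration, not taken from the video.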
@umair369 · 4 years ago
Thanks, this was very elaborate and thoroughly explained. However, I was wondering how you would average the derivative with respect to one particular weight across ALL training examples; 11:40 is when you mention that. I am assuming the change in w_1_2 doesn't affect the loss for any other training example, is that true? Please let me know.
@khawarshahzad5721 · 6 years ago
Hello deeplizard, great video! Can you please explain how the partial derivative of the loss would be calculated for a batch size greater than 1? Thanks.
@deeplizard · 6 years ago
Hey Khawar - Thanks! To summarize, you take the gradient of the loss with respect to a particular weight for _each_ input. You then average the resulting gradients and update the given weight with that average. This would be the case if you passed all the data to your network at once. If instead you were doing mini-batch gradient descent, where you pass mini-batches of data to your network at a time, then you would apply this same method to _each batch_ of data, rather than to all the data at once.
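The averaging step described in this reply can be sketched in a few lines of numpy. The per-sample gradient values, learning rate, and starting weight below are invented for illustration:

```python
import numpy as np

# Hypothetical per-sample gradients of the loss w.r.t. one particular
# weight, one entry per input in the batch (values made up).
per_sample_grads = np.array([0.30, -0.10, 0.20, 0.40])

avg_grad = per_sample_grads.mean()   # average the gradients over the inputs

lr = 0.1                             # learning rate (assumed)
w = 0.5                              # current value of the weight (assumed)
w_new = w - lr * avg_grad            # one update using the averaged gradient
```

For mini-batch gradient descent, the same averaging is simply applied per batch instead of over the whole training set.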
@matharbarghi · 4 years ago
The partial derivative of the loss function should be taken w.r.t. the weights of the last layer in the network. But you mentioned that we should take the derivative of the loss function with respect to all weights of the network. Please correct me if I am wrong; otherwise, correct it in your course. Thanks.
@deeplizard · 4 years ago
You take the derivative of the loss function with respect to each weight. You then use each respective gradient to update each individual weight. For example, take the derivative of the loss with respect to weight w1. With the resulting gradient, update w1 to a new value. Do the same for w2, w3, etc...
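The per-weight update described in this reply can be sketched directly. The gradient values, weights, and learning rate below are hypothetical:

```python
# Hypothetical gradients dC/dw for three weights (values invented).
grads = {"w1": 0.25, "w2": -0.40, "w3": 0.10}
weights = {"w1": 0.1, "w2": 0.2, "w3": 0.3}

lr = 0.01
for name in weights:
    # Each weight is moved by its own gradient, independently of the others.
    weights[name] -= lr * grads[name]
```

Note that w2's negative gradient moves it up while w1 and w3 move down, which is exactly the "each weight gets its own gradient" point made above.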
@AndreaCatania · 5 years ago
Sorry if this question is stupid, but I don't understand exactly what the loss means. Knowing the loss for weight W12, how can I update the related weight? W12 += LOSS12 doesn't seem correct to me.
@RohitPrasad2512 · 5 years ago
Can you add what happens if the weight is between two hidden layers? And also how to calculate the loss for that.
@FelidInPetasus · 4 years ago
Here's a thing that's unclear to me: You say that this process (which you do describe very neatly) can be applied to any weight in the network. However, shortly after 7:00, you conclude that the first term contains y_1. In video #2, you define this y_j as "the desired value of node j in the output layer L for a single training sample", i.e., the value a specific output neuron "ought to be". This works fine if you're looking at the weights connecting L-1 to L (the output layer), but doesn't make sense for the weights connecting, say, L-2 to L-1. What value would I use for y_j in a case like that? Edit: Thinking about it now: Am I correct in assuming that the first and second terms for the example you provided stay the same (even when looking at previous layers) and it's only the third term (specifically its weighted sum) that is "split up" into even more terms? This would remove the need to use a different y_j for other layers. Other than that: thank you for your videos
@aryanrahman3212 · 2 years ago
When she says g prime, she means the derivative of the activation function g. This function can literally be anything.
@abubakarali6399 · 3 years ago
What degree do you have, and from which university?
@r.balamurali8246 · 4 years ago
Thank you very much. @12:22 does 'n' represent the number of training samples or the number of nodes in layer L? Could you please explain this?
@deeplizard · 4 years ago
The number of training samples.
@MaahirGupta · 3 years ago
You win.
@ajaymalik9147 · 5 years ago
Nice.
@ashutoshshah864 · 3 years ago
🙏🏽💪🏽🤙🏽
@mechhyena6957 · 4 years ago
I have no clue what is going on in this video...
@ПроскуринГлеб · 5 years ago
I get sentdex vibes.
@deeplizard · 5 years ago
sentdex is cool 😎
@patrickryckman3867 · 4 years ago
8:22 you lost me. You said we just put this into the right side of the equation, but that's not the only thing you put into the right side of the equation.
@kiarash7604 · 4 years ago
Most of these videos are explaining the obvious.
@rabirajbanerjee3872 · 4 years ago
After watching your video I could actually do the derivation all by myself, thanks for the intuition :)
@antoinetoussaint483 · 4 years ago
Clear, precise, consistent. What a channel, thx.
@chintanmathia · 4 years ago
This is the epitome of explaining such a difficult topic with such simplicity. Thanks a lot... I could not stop going through all 36 videos in one go... Amazing job, ma'am.
@nikosips · 4 years ago
Thank you for these videos! Your explanations are crisp and clear, and very helpful! You deserve many more subs!
@harishh.s4701 · 2 years ago
Hello, thank you for sharing your knowledge with us. I really appreciate the effort put into these videos. This series on backpropagation clarified a lot of confusion and helped me understand it more clearly. The explanation was clear and easy to follow. However, I have one small suggestion. In this video at the timestamp 11:53, the term 'n' is used to represent the number of training samples, whereas in all the previous equations 'n' represents the number of neurons in a particular layer (please correct me if I am wrong). Perhaps it would be better to use a different notation (like N) for the number of training samples to avoid confusion. Maybe it has already been updated; I apologize if this is a repetition. Otherwise, great work, keep it up, and thanks a lot :)
@trankhanhang8151 · 3 years ago
So simple and elegant, I wish I had found you sooner.
@EliorBY · 3 years ago
Wow, what an amazing, illustrative mathematical explanation. Just what I was looking for. Thank you very much, deeplizard!
@gbyu7638 · 3 years ago
Such a clear explanation and calculation!
@PritishMishra · 3 years ago
0:00 - Introduction
1:01 - Preview of the video
1:24 - Derivative of the loss with respect to the weights (calculations)
11:56 - Conclusion
@ramiro6322 · 3 years ago
I would also add:
5:45 First term (loss with respect to activation output)
7:36 Second term (activation output with respect to input)
8:52 Third term (input with respect to weight)
11:30 Putting it all together
@deeplizard · 3 years ago
Thank you both! Your timestamps have been added to the video description :D
@PritishMishra · 3 years ago
@@deeplizard Thanks
@todianmishtaku6249 · 4 years ago
Excellent!!!
@databridgeconsultants9163 · 4 years ago
Thank you so much, guys. This series is just the BEST ever made. It's legendary work done by you guys. I have read so many books; even my prof was not able to make us understand how these things actually work step by step. All I understood in the past was to ditch this portion of neural networks. But now I can confidently explain what goes on inside a neural network. I have subscribed to the paid version of yours.
@deeplizard · 4 years ago
That's great to hear! Really happy that you gained new knowledge. Thank you for letting us know :) By the "paid version," are you referring to becoming a member of the deeplizard hivemind via Patreon?
@JordanMetroidManiac · 4 years ago
Thicc
@satnav1377 · 5 years ago
Incredibly clear explanation, great vid once again!
@thomasvinet6160 · 6 years ago
Great video, I just have a question: if we want to calculate the derivative for a weight of layer (L-2), will it be the same as for layer L-1, but replacing a with g(z), and so on? Thanks. EDIT: didn't see the next video ;) These tutorials are very understandable, keep making them!!
@deeplizard · 6 years ago
Thanks, Thomas! So were you able to answer your question after watching the next video?
@money_wins_controls · 5 years ago
Guys, please help. @8:44 I am curious why the g function is not added: g' * z + g (according to the product rule). Why did they neglect g after differentiation?
@TheMaidenReturns · 4 years ago
Wow... just wow. I have been really struggling this year in uni with my A.I. module, as the teacher doesn't really explain things well. I really can't believe how simple and easy to understand you can make this topic. This series has saved me from failing a module this year, and it helped me learn so much about deep learning. Amazing content, well explained. Big up for this.
@Loev06 · 4 years ago
Amazing video! I know you don't use biases in this series, but do you know the derivative of the cost function w.r.t. the biases? Edit: I think I found it, is it (dC0/da1(L)) * (da1(L)/dz1(L)) = 2(a1(L) - y1) * g'(z1(L))? (Basically the first two terms, because the third term is always equal to 1.)
@bobhut8613 · 4 years ago
Thank you so much for this! I had been stuck trying to wrap my head around the maths for days, and your videos really helped.
@hussainbhavnagarwala2596 · 1 year ago
Can you show the same example for a weight that is a few layers behind the output layer? I am not able to understand how we will sum the activations of each layer.
@jorgecelis8459 · 3 years ago
The only detail is that the number of nodes should be indexed for the general case, and then maybe use another letter for the number of examples =)
@Arjun-kt4by · 4 years ago
Hello, at 11:02 how did the derivative come out to be a2(L-1)? Are you considering a as a constant?
@EinsiJo · 4 years ago
Extremely useful! Thank you!
@timxu1766 · 4 years ago
Thank you, deeplizard!!!
@evertonsantosdeandradejuni3787 · 3 years ago
I feel like I can implement this myself in C++. Is this normal?
@Jxordan · 6 years ago
Thank you! Dedicating my midterm today to you. Also, just a random tip: if you don't use Cortana, you can right-click the "type here to search" bar and hide it.
@deeplizard · 6 years ago
Thanks for the tip! How did your midterm go?
@aamir122a · 6 years ago
In the future, you might look at doing videos on neural networks for reinforcement learning, approximating the value function and policy function.
@deeplizard · 6 years ago
Thanks for the suggestion, Aamir!
@jeetenzhurlollz8387 · 4 years ago
Far better than deeplearning.ai.
@jdmik · 6 years ago
Thanks for the great videos! Just wondering if you were planning on doing a video on how backprop is applied in convnets?
@deeplizard · 6 years ago
Hey Johan - Glad you're liking the videos! I currently don't have this as an immediate topic to cover, but I will add it to my list to explore further as a possible future video.
@sinaasadi3800 · 5 years ago
Hi. Would you please answer my other comment? I posted it yesterday under another video from this playlist. And also, thanks a lot for your videos.
@JordanMetroidManiac · 4 years ago
How does bias fit into all of this?
@deeplizard · 4 years ago
Bias terms are updated in the same way as the weights. I elaborate more on this on the upcoming episode dedicated to bias: deeplizard.com/learn/video/HetFihsXSys
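The "same way as the weights" point can be sketched symbolically. A minimal example assuming a sigmoid activation (the series itself does not use this exact setup): since z = w*a_prev + b and dz/db = 1, the bias gradient is the weight gradient's chain with the final a_prev factor replaced by 1.

```python
import sympy as sp

w, a_prev, b, y = sp.symbols('w a_prev b y')

z = w * a_prev + b          # node input with a bias term
a = 1 / (1 + sp.exp(-z))    # example activation (sigmoid assumed)
loss = (a - y) ** 2

# dz/db = 1, so the bias gradient equals the weight gradient
# divided by a_prev (the chain's last factor becomes 1).
assert sp.diff(z, b) == 1
assert sp.simplify(sp.diff(loss, w) - a_prev * sp.diff(loss, b)) == 0
```

With the gradient in hand, the bias update is then b -= lr * dC/db, exactly mirroring the weight update.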
@nourelislam8565 · 5 years ago
Amazing explanation... but I just want to know: what is the purpose of taking the average of the loss function for a certain weight over n training examples? I guess all we need to know is the change of the loss function throughout the training examples?
@deeplizard · 5 years ago
Hey Nour - It's because we want to know the average loss across all samples. This will tell us how our model performs on average across the entire data set.
@money_wins_controls · 5 years ago
Guys, please help. @8:44 I am curious why the g function is not added: g' * z + g (according to the product rule). Why did they neglect g after differentiation?
@ssffyy · 3 years ago
Hi sid, this response is a bit late, as I just read your comment... I guess your confusion comes from the fact that you are considering g multiplied by z, when in fact it's not a multiplication; rather, g is a function of z --> g(z), like f(x). So when you take the derivative of g(z) with respect to z, you end up getting g'(z). Hope this cleared up any doubts.
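A quick numeric check of this reply, using the sigmoid as an assumed example of g: a finite-difference derivative of a = g(z) matches g'(z) alone, with no extra "+ g" term that a product-rule reading would predict.

```python
import math

def g(z):
    """Example activation (sigmoid); g is applied to z, not multiplied by it."""
    return 1 / (1 + math.exp(-z))

def g_prime(z):
    return g(z) * (1 - g(z))   # known derivative of the sigmoid

# Centered finite difference of a = g(z): composition follows the chain
# rule, not the product rule, so the result is g'(z) and nothing else.
z0, h = 0.7, 1e-6
numeric = (g(z0 + h) - g(z0 - h)) / (2 * h)
assert abs(numeric - g_prime(z0)) < 1e-6
```

The evaluation point z0 = 0.7 is arbitrary; any z gives the same agreement.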
@Tntpker · 5 years ago
After thinking about it a bit, why is the expression @ 12:18 used where you sum all the partials of the cost function w.r.t. w12 _for all training examples_ and calculate an average partial derivative? I thought one would do this for batch gradient descent but not with stochastic gradient descent? Or am I seeing something completely wrong here?
@deeplizard · 5 years ago
Hey Tntpker - Yes, when _n_ is the number of samples in the entire training set, this is the case for _batch_ gradient descent. Also, if using _mini-batch_ gradient descent, which is normally what is done with most neural network APIs by default, then you could look at _n_ as being the number of training examples within a single batch, rather than the entire training set. With this, the gradient update would occur on a per-batch basis.
@Tntpker · 5 years ago
@@deeplizard Cheers!
@srijalshrestha7380 · 6 years ago
Thanks a lot. I don't know when or how I will use this in the future, but I understood it very well. Thank you.
@deeplizard · 6 years ago
You're welcome, Srijal! I'm glad you were able to gain an understanding!