You are amazing! Thank you! 7 years later still the best explanation!🎉
@pharmacist66 4 years ago
Whenever I don't understand something I immediately come to your channel because I *know* you will make me understand it
@isaacmares5590 5 years ago
You are the master of explaining complicated concepts effectively... my dog sitting next to me now understands backpropagation of neural networks better than roll over.
@phumlanigumedze9762 3 years ago
Amazing
@TheRainHarvester 2 years ago
I just made a video without subscripts to explain multi-hidden-layer backpropagation. It's easy to understand without so many sub/superscripts.
@vigneshreddy1213 4 months ago
It has been 7 years since this video came out, and he still rocks
@abhinavarg 4 years ago
Sir, I don't even know how to express my joy after hearing this from you. Nicely done!!!
@programmingsoftwaredesign7887 3 years ago
You are very good at breaking things down. I've been through a few videos trying to understand how to code my backpropagation. You are the first one to really give a visual of what it's doing at each level for my little math mind.
@vaibhavsingh1049 5 years ago
I'm on day 3 of trying to understand backpropagation; you made me cry "Finally"
@kae4881 4 years ago
Dude. Best. Explanation. Ever. Straight Facts. EXCELLENT DAN. You, sir, are a legend.
@SigSelect 5 years ago
I read quite a few comprehensive tutorials on backprop with the full derivation of all the calculus, yet this was the first source I found which explicitly highlighted the method for finding the error term in layers preceding the output layer, which is a huge component of the overall algorithm! Good job for sniffing that out as something worth making clear!
@TheCodingTrain 5 years ago
Thanks for the nice feedback!
@gnorts_mr_alien 2 years ago
Exactly. Watched at least 20 videos on backprop but this one made sense finally.
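For anyone coding along, a minimal sketch of that hidden-layer error step in JavaScript (assuming a small Matrix helper with subtract/transpose/multiply static methods, similar to the toy library built earlier in the series; the names here are placeholders, not necessarily the video's exact code):

  // output error = target - guess
  let outputErrors = Matrix.subtract(targets, outputs);
  // push the error backwards: each hidden node collects a share of the
  // output error in proportion to the weights leaving it
  let weightsHoT = Matrix.transpose(weightsHo);
  let hiddenErrors = Matrix.multiply(weightsHoT, outputErrors);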
@Twitchi 7 years ago
The whole series has been amazing, but I particularly enjoy these theory breakdowns :D
@manavverma4836 7 years ago
Man you are awesome. Someday I'll be able to understand this.
@drdeath2667 4 years ago
do u now? :D
@prashantdwivedi9073 4 years ago
@@drdeath2667 🤣🤣😂
@billykotsos4642 4 years ago
The day for me to understand is today.... 2 years later !!!!!
@jxl721 3 years ago
do you understand it now :)
@kaishang6406 2 years ago
has the day come yet?
@TopchetoEU 4 years ago
I'm quite honestly impressed by the simplicity of the explanation you gave. Right now I'm trying to get started with AI but could not find a good explanation of backpropagation. That is, until I found your tutorial. The only thing I didn't like is that this tutorial doesn't include any bias-related information. Regardless, this tutorial is simply great.
@sidanthdayal8620 6 years ago
When I start working I am going to support this channel on Patreon. Helped me so much.
@artyomchernyaev730 3 years ago
Did u start working?
@Meuporgman 7 years ago
Thanks Daniel for your involvement in this series and in all the others! You're probably the best programming teacher on YouTube; we can see that you put a lot of effort into making us understand all the concepts you go through! Much love
@Ezio-Auditore94 6 years ago
I love YouTube University
@giocaste619 4 years ago
Nicolas Licastro io o ok oo Olivetti
@matig 5 years ago
Even though we speak different languages you are a thousand times clearer than my teacher. Thanks a lot for this, you are the best
@narenderrawal 2 years ago
Thanks for all the effort you put into helping us understand. Best I've come across so far.
@drugziro2275 5 years ago
I am studying these things in Korea. Before I found this lecture I couldn't follow my classes, but now I can show my professor a smile instead of a frustrated face. So..... thanks for being my light.
@sanchitverma2892 5 years ago
Wow, I'm actually impressed you managed to make me understand all of that
@onionike4198 5 years ago
That was an excellent stroll through the topic. I feel like I can implement it in code now, it was one of the few hang ups for me. Thank you very much 😁
@phishhammer9733 6 years ago
These videos are immensely helpful and informative. You have a very clear way of explaining concepts, and think through problems in an intuitive manner. Also, I like your shirt in this video.
@dayanrodriguez1392 2 years ago
I always love your honesty and sense of humor
@santiagocalvo 3 years ago
You should stop selling yourself short; I've seen dozens of videos on this exact subject because I've struggled a lot trying to understand backprop, and I have to tell you this might be the best one I've seen so far. Great work!! Keep it up!!
@nemis123 1 year ago
After watching the whole of YouTube I had no idea what backprop is; thankfully I found yours.
@MircoHeinzel 6 years ago
You're such a good teacher! It's fun to watch your high quality videos!
@mrshurukan 7 years ago
Incredible, as always! Thank you so much for this Neural Network series, they are very interesting and helpful
@lucrezian2024 6 years ago
I swear this is the one video which made me understand the delta of weights!!! THANK YOU!!!!!
@robv3872 2 years ago
I commend you for great videos and such an honest video! You are a great person! Thank you for the effort you put into this content; you are helping people and playing a big part in us solving important problems in the future. I just commend you for being a great person, which comes out in this video!
@CrystalMusicProductions 5 years ago
I have used backpropagation in my NNs in the past but I never knew how the math works. Thank you so much ❤❤ I finally understand this weird stuff
@magnuswootton6181 2 years ago
really awesome doing this lesson, everywhere else is cryptic as hell on this subject!!!
@codemaster1768 3 years ago
This concept is taught way better here than by my university professors.
@aakash10975 5 years ago
The best explanation of backpropagation I have ever seen
@roshanpawara8717 7 years ago
I'm glad that you came up with this series of videos on neural networks. It has inspired me to choose this as a domain to work on as a mini project for this semester. Love you. Big fan. God bless!! :-)
@lehw916 3 years ago
Man, watching your video after 3Blue1Brown series on back-propagation is a breeze. Thanks for sharing!
@SistOPe 6 years ago
Bro, I admire you so much! Someday I wanna teach algorithms the way you do! Thanks a lot, greetings from Ecuador :)
@moganesanm973 1 year ago
The best teacher I have ever seen ☺
@artania06 7 years ago
Awesome video ! Keep it up :) I love your way of teaching to code with happiness
@qkloh6804 4 years ago
3Blue1Brown + this video is all we need. Great content as always.
@c1231166 6 years ago
Would you mind making a video about how you learn things? because it seems to me you can learn basically everything and be thorough about it. This is a skill I would like to own.
@dolevgo8535 6 years ago
When you try to study something, just practice it. Learning how to create a neural network? Sweet, try to create one yourself while doing so. There's actually no way you'd do it perfectly, and you WILL come back to where you study from, or google things that popped into your head that you started wondering about, and that is how you become thorough about these things. It's mostly about curiosity and practicing :)
@yolomein415 5 years ago
Find a book for beginners, look at the reviews, buy the book, read it, try it out, watch YouTube videos, google your questions (if not answered, ask on Stack Overflow)
@mzsdfd 4 years ago
That was amazing!! You explained it very well.
@mikaelrindmyr 5 years ago
What if one of the weights is negative? Is it the same formula when you calculate the magnitude of error-Hidden-1, or should I use Math.abs on all the terms in the denominator? Like Weight-1 = -5, Weight-2 = 5, error = 0.5; then it should look like this, right? Error-Hidden-1 = error * ( W1 / ( Math.abs(W1) + Math.abs(W2) ) ) ty
@mikaelrindmyr 5 years ago
//mike from Sweden
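For anyone wondering the same thing, a rough JavaScript sketch of that magnitude-based split (a hypothetical helper; whether the video's formula intends absolute values is exactly the open question above):

  function hiddenErrorShare(error, w1, w2) {
    // use magnitudes so a negative weight still gets a positive share
    let total = Math.abs(w1) + Math.abs(w2);
    if (total === 0) return 0; // avoid dividing by zero
    return error * (Math.abs(w1) / total);
  }
  // hiddenErrorShare(0.5, -5, 5) gives 0.25, i.e. an even split for this example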
@minipy3164 3 years ago
When you are about to give up on Neural network and you see this awesome video😍😍😍😍😍
@luvtv7433 3 years ago
You know what would be nice: if you could teach algorithms on graphs using matrices. I feel that helped me a lot to practice and understand the importance of matrices in other topics, including neural networks. Some exercises are to find whether two graphs are isomorphic, find cycles through a vertex, check whether a graph is complete, planar, or bipartite, and find trees and paths using matrices. I am not sure, but that might be called spectral graph theory.
@kalebbruwer 5 years ago
Thanks, man! This makes it easier to debug code I wrote months ago that still doesn't work, because this is NOT what I did
@sirdondaniel 6 years ago
Hi, Daniel. At 8:44 I have to point something out: you said that h1 is 67% responsible for the error because it has W1 = 0.2, which is 2 times bigger than W2. Well, I think that is false. If in this particular case the value stored by h1 is 0, then nothing is coming from it, and h2 is responsible for the entire 0.7 output with its W2 = 0.1. Check 5:14 of "What is backpropagation really doing? | Chapter 3, deep learning" from 3Blue1Brown. I'm not 100% sure I'm right. Anyway, you are doing a really good job with this series. I've watched some videos about this topic on Pluralsight, but the way you explain it makes way more sense than over there. I really look forward to seeing you implement the digit-recognition thing. If you need some assistance please don't hesitate to message me.
@TheCodingTrain 6 years ago
Thank you for this feedback, I will rewatch the 3blue1brown video now!
@sirdondaniel 6 years ago
I've watched the entire playlist and I saw that you do actually take the node's value (x) into account in the ΔW equation. These error equations that you present in this video are just a technique for spreading the error inside the NN. They also appear in "Make Your Own Neural Network" by Tariq Rashid, so they should be right :)
@quickdudley 6 years ago
I actually made the same mistake when I implemented a neural network the first time. Surprisingly: it actually worked, but needed a lot more hidden nodes than it would have if I'd done it right.
@sirdondaniel 6 years ago
Wait... Which mistake do you mean Jeremy?
@landsgevaer 6 years ago
Yeah, I noticed this too! Although understandable, the explanation is wrong. Also, what if w1 and w2 cancel, that is, w1+w2=0? Then the suggested formulas lead to division by zero, so infinite adjustments. I find it more intuitive to treat every weight and every bias simply as a parameter. Then you look at what happens to the NN's final output when you change any such parameter by a small (infinitesimal) amount, keeping all others constant. If you know delta_parameter and the corresponding delta_output, you know the derivative of the output with respect to the parameter, equal to delta_output/delta_parameter. Gradient descent then dictates that you nudge the parameter in proportion to that derivative (times error times learning rate). Finally, the derivative can be expanded using the chain rule to include the effects of all the intermediate weights and sigmoids separately. Backpropagation is "merely" a clever trick to keep track of these products of derivatives. Apart from that, kudos for this great video series!
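A tiny numerical illustration of that nudge-one-parameter idea in JavaScript (finite differences rather than real backpropagation; lossFn and params are hypothetical placeholders):

  function numericalDerivative(lossFn, params, i, eps = 1e-6) {
    // loss with parameter i nudged up, minus loss with it nudged down
    const up = params.slice();   up[i] += eps;
    const down = params.slice(); down[i] -= eps;
    return (lossFn(up) - lossFn(down)) / (2 * eps);
  }
  // gradient descent step for one parameter:
  // params[i] -= learningRate * numericalDerivative(lossFn, params, i);

Backpropagation arrives at the same derivatives analytically via the chain rule, which is far cheaper than nudging every parameter separately.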
@jiwon5315 5 years ago
You probably know already but you are amazing 💕
@merlinak1878 5 years ago
Question: if w2 is 0.1 and it gets tuned by 1/3 of 0.3, the new value of w2 is 0.2. And now the error of that is new w2 - old w2. So the error of hidden2 is 0.1? Is that correct? And do I need a learning rate for that?
@amanmahendroo1784 5 years ago
That seems correct. And you do need a learning rate because the formula dy/dx = ∆y/∆x is only accurate for small changes (i.e. small ∆x). Good luck!
@merlinak1878 5 years ago
Aman Mahendroo ok thank you
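A sketch of that update with a learning rate, in JavaScript (simplified: the full rule in the later videos also multiplies in the gradient of the activation function and the input feeding the weight, so these names are placeholders rather than the video's exact code):

  const learningRate = 0.1;
  // the error assigned to this connection suggests a correction...
  let deltaW2 = hidden2Error;        // simplified stand-in for the real gradient
  // ...but we only move a fraction of the way there
  w2 += learningRate * deltaW2;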
@carlosromerogarcia1769 2 years ago
Daniel, I have a little doubt. When I see the weights here I think of the Markowitz portfolio model, and I wonder if the sum of the weights in neural networks should be one: w1 + w2 + w3 + ... + wn = 1. Do you know if it's possible to impose this type of constraint in Keras... just to experiment. Thank you, I love your videos
@lirongsun5848 4 years ago
Best teacher ever
@ulrichwake1656 6 years ago
good video man. it really helps a lot. u explain it clearly. thank u very much
@snackbob100 4 years ago
Are all errors across all layers calculated first, and then gradient descent is done? Or are they done in concert with each other?
@YashPatel-fo2ec 5 years ago
what a detailed and simple explanation. thank you so much.
@michaelelkin9542 4 years ago
I think you answered my question, but to be sure: is backward propagation only 1 layer at a time? As in, you calculate the errors in the weights to the final layer, then treat it as if the last layer went away, then use the new expected values that were just computed to adjust the previous layer's weights, and so on. The key is that you do not simultaneously adjust all weights in all layers, just one layer at a time. Seems like a very simple question but I have never found a clear answer. Thank you.
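One common ordering, sketched in JavaScript (a hypothetical structure, not the video's exact code: weights[l] connects layer l to layer l+1, and adjustWeights stands in for the update rule):

  // walk backwards and compute every layer's error first
  let errors = new Array(numLayers);
  errors[numLayers - 1] = outputError;
  for (let l = numLayers - 2; l > 0; l--) {
    // a hidden layer's error is the next layer's error pushed back through the weights
    errors[l] = Matrix.multiply(Matrix.transpose(weights[l]), errors[l + 1]);
  }
  // only then nudge the weights, one layer at a time
  for (let l = 0; l < numLayers - 1; l++) {
    adjustWeights(weights[l], errors[l + 1], outputs[l]);
  }

With this ordering all the errors are propagated back before any weights change, so old and new weights never get mixed in the same pass.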
@jyothishmohan5613 4 years ago
Why do we need to backpropagate through all the hidden layers, rather than only to the layer just before the output?
@ganeshhananda 6 years ago
A really awesome explanation which can be understood by a mere human being like me ;)
@lornebarnaby7476 6 years ago
Brilliant series, have been following the whole thing, but I am writing it in Go
@atharvapagare7188 6 years ago
Thank you, I am finally able to grasp this concept slowly
@sanchitverma2892 5 years ago
hello
@justassaltyasthesea5533 6 years ago
Does The Coding Train have a coding challenge about missiles slowly turning towards their target, trying to intercept it? And maybe instead of flying to where the target is, the missile uses some advanced navigation? On Wikipedia there is Proportional Navigation, where they talk about a LOS rate. I think this would be a nice coding challenge, but where do I suggest it?
@roger109z 5 years ago
Thank you so much. I watched the 3Blue1Brown videos and read a few books but this never clicked for me; watching you made it click.
@Vedantavani3100BCE 4 years ago
HELP!!!!! In an RNN we have only 3 unique weight parameters, so during backprop there will be only 3 parameters to update. Then why does backprop in an RNN go all the way back to the 1st input, creating long-term dependencies and thereby the vanishing gradient problem????
@zareenosamakhan9780 3 years ago
Hi, can you please explain backpropagation with the cross-entropy loss?
@SetTheCurve 5 years ago
I would love it if you told us how to include activation in these calculations, because in this example you're only including weights. A high activation and low weight can have the same impact on error as a low activation and high weight.
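A sketch of how the activations usually enter the update, in JavaScript (the standard delta rule for a sigmoid unit, written for a single scalar connection; placeholder names, not the video's exact code):

  // for a sigmoid output y, the derivative is y * (1 - y)
  function dsigmoid(y) { return y * (1 - y); }

  let gradient = outputError * dsigmoid(output) * learningRate;
  // the change is also scaled by how active the hidden node was,
  // so a low activation with a high weight gets a small update
  let deltaWeightHO = gradient * hiddenActivation;
  weightHO += deltaWeightHO;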
@jchhjchh 6 years ago
Hi, I am confused. ReLU will kill the neuron only during the forward pass? Or also during the backward pass?
@hamitdes7865 5 years ago
Sir thank you for teaching backpropagation😊😊
@ksideth 2 years ago
Many thanks for simplifying.
@priyasingh9984 4 years ago
Awesome person, you taught this so well and kept a tough topic interesting.
@johnniemeredith9141 5 years ago
At timestamp 15:15 why was error 2 changed from .3 to .4... it seems like he saw something wrong there that he didn't explain.
@iagosoriano3734 5 years ago
He just didn't want two equal values of error
@12mkamran 5 years ago
How would you deal with the fact that in some cases the error of h1 and h2 may be 0? Do you just not adjust them, or is there a bias associated with them as well? Thanks
@marcosvolpato8135 6 years ago
Do we have to update all the weights before we calculate all the errors, or first calculate all the errors and then update all the weights?
@shimronalakkal523 3 years ago
Oh yeah. Thank you so much. This one helped a lot.
@BrettClimb 5 years ago
I feel like the derivative of the activation function is also a part of the equation for calculating the error of the hidden nodes, but maybe it's unnecessary if you aren't using an activation function?
@MRarvai 5 years ago
Is this the same thing as in the 3Blue1Brown video about the calculus of backpropagation, just way less mathematical?
@TheCodingTrain 5 years ago
This is the same! Only 3Blue1Brown's explanation is much better!
@surtmcgert5087 4 years ago
I understand everything here, I am just confused. I watched a video by 3Blue1Brown on backpropagation and their explanation was nothing like this: they had a cost function and derivatives, and they were averaging their formulae over every single training example. I'm confused because I've got two completely different takes on the exact same topic.
@arindambharati2312 4 years ago
same here..
@wakeatmethree4023 7 years ago
Hey Dan! You might want to check out computational graphs as a way of explaining backpropagation (Colah's blog post on computational graphs and Andrew Ng's video on computational graph/derivatives as well. )
@TheCodingTrain 7 years ago
Thank you for this, I will take a look!
@phumlanigumedze9762 3 years ago
@@TheCodingTrain humility appreciated, thank God
@borin2882 3 years ago
Hi guys, why don't we calculate the output at every hidden layer to get the value of the error?
@FilippoMomesso 7 years ago
In "Make Your Own Neural Network" by Tariq Rashid the weight notation is reversed. For example, in the book the weight of the connection between input node x1 and hidden node h2 is written as w1,2, but in your videos it is w2,1. Which one is more correct? Or is it only a convention?
@TheCodingTrain 7 years ago
I would also like to know the answer to this question. But I am specifically using w(2,1) since it shows up in row 2 and column 1 in the matrix. And I believe rows x columns is the convention for linear algebra stuff?
@volfegan 7 years ago
The notation is a convention. As long as you keep using the same notation system, it should not be a problem. Let mathematicians argue about the details. Engineers just have to make the stuff work.
@FilippoMomesso 6 years ago
The Coding Train ok, just asked my math professor. He said the right convention is (row, column). I read the section of the book where it talks about matrices again. The author contradicts himself. On page 52 he says "it is convention to use rows first, then columns", but then, when he applies matrices to weight notation, he does the opposite. His notation is W(i,h) (i for the input node number and h for the hidden node number), but it is column, row. Your notation is W(h,i), with the right convention for matrices, row, column. So in the end, using one notation or the other is the exact same thing, because weight w(1,2) in the book is w(2,1) in your videos. Hope I've been clear enough :-P
@jonastjepkema 6 years ago
Tariq Rashid actually doesn't use the conventional matrix notation because he looks at it layer per layer, not as a matrix with rows and columns: he writes w11 meaning "weight from the first neuron of the first layer to the first neuron of the second layer". And he goes on so that the weights leaving one neuron have the same first number, which is his own way of representing this. Both work though, as someone said before me, it's just notation; he just doesn't look at it as a matrix (which unfortunately makes the matrix notation for calculating the outputs less readable). Hope I managed to make myself clear hahaha
@TheRainHarvester 2 years ago
The reason for the swapped positions is so that when multiplying matrices the dimensions line up: M(2x3) x M(3x2), where the 3s need to be in the inner positions next to the multiplication symbol.
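To make that concrete, a small JavaScript sketch of the matrix-vector product with plain 2D arrays (hypothetical names; row index = destination node, column index = source node):

  // weightsHo has outputCount rows and hiddenCount columns,
  // so weightsHo[i][j] is the weight from hidden node j to output node i (w_ij)
  let output = new Array(outputCount).fill(0);
  for (let i = 0; i < outputCount; i++) {
    for (let j = 0; j < hiddenCount; j++) {
      output[i] += weightsHo[i][j] * hidden[j];
    }
  }

With that layout, the (outputCount x hiddenCount) matrix times the (hiddenCount x 1) vector has matching inner dimensions, just like the M(2x3) x M(3x2) example above.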
@pythondoesstuff2969 4 years ago
What if there are more neurons in the hidden layer? Then how do you calculate the error?
@alexanderfreyschuss1307 7 years ago
So after each guess you're getting like a new generation of weights, right? Could you figure out, like after a couple of generations, whether any specific tuning was correct, or whether it would have been correct to adjust just a single weight instead of all of them? Maybe it is sometimes right to "blame" just one weight out of a huge number of them in order to get perfect guesses in the future, even though it might be unlikely. So my question comes down to: how do you check whether the tunings you (or the network) made were right, or whether there were tunings which would have done even better?
@FilippoMomesso 7 years ago
You do it with gradient descent; it will be covered in the next videos. Try to watch the 3Blue1Brown playlist on neural networks if you don't want to wait (I think the third video is about gradient descent). It is very well explained.
@skywalkerdk01 6 years ago
Awesome video. Thank you for this! First time I understand backpropagation. +1
@masterstroggo 6 years ago
Daniel Shiffman, I'm following along with these tutorials and re-creating the neural network in Processing 3. I know you're a bit of a wiz on that topic so I thought I'd ask you about it. I got a working prototype of the neural network up and running in Processing, even though I had to do some workarounds and compromises. One issue I've run into, however, is that I cannot seem to figure out how to use static variables or methods in Processing. Is this not implemented? None of the standard Java ways of doing it work in the Processing environment. I've tried the same code snippets in other, more standard Java environments and they work there.
@masterstroggo 6 years ago
It's kind of solved. I found out that Processing wraps all code in a class, which means that all user-created classes are treated like inner classes, and from what I understand Java does not support static inner classes unless the parent class (the whole Processing sketch in this case) is static. I've found workarounds for that, but I thought I'd share my findings.
@TheCodingTrain 6 years ago
Yes, this is indeed the case! Let me know how I can help otherwise.
@OskarNendes 2 years ago
If backpropagation really works, why do we need to apply it more than once, and why do we need to test the neural network again if the error can be calculated? How do we know we are not just doing a random search in the vicinity?
@OneShot_cest_mieux 7 years ago
Thank you so much ^^ I have a question: we divide by the sum of weights, but what if the sum of weights is equal to zero? And are your weights between 0 and 1, or between -1 and 1?
@oooBASTIooo 6 years ago
gabriel dhimoila a weight of 0 would mean that there is no edge between the vertices. So if the sum were 0, the output vertex wouldn't be connected to the graph at all and you couldn't measure anything there... What he does is assign every edge its portion of the area by using the arithmetic mean.
@OneShot_cest_mieux 6 years ago
I don't understand what you mean by "graph", "edge" and "area", but if the weights are between -1 and 1, or if the weights are initialized to 0, it's probable that the program has to do a division by 0. Sorry for my bad English, I am French
@nagesh007 1 year ago
Awesome tutorial
@unnikked 7 years ago
Let me tell you that you are an amazing teacher! ;)
@jagdeepakrawat6028 5 years ago
In your single-output-neuron situation, why do you assume that 1/3 of the error is from h2 and 2/3 from h1? Since you really don't know their inputs, or the activations they produce, how can you just compare weights and decide that one is more responsible? It's not always true, is it?
@TheRainHarvester 2 years ago
Yeah you need to consider input.
@samsricatjaidee405 2 years ago
Thank you. This is very clear.
@teamsalvation 6 years ago
Well done, a bit of rambling here and there, but overall well done. Now that we have the error, what do we do with it? How do we tweak the weights? -- How do we know to apply 33% of the error to W2 and 67% of the error to W1?
@TheCodingTrain 6 years ago
Only a bit of rambling? That's kind of you to say. Hah. I think I address this if you keep watching parts 2 etc.
@teamsalvation 6 years ago
The Coding Train You did, and it still made no sense ;-) - hahaha - at first. I spent the last 3 days jumping around web sites and other YouTube videos and... after breaking down and using multi-color pen and paper (old school), things finally started to come together, dots connected, and I have a working neural network in C++ :-) As I believe you mentioned, it was the backward propagation that was tricky, and even then, it's the math that threw me off. If I ignored the math for a moment and only focused on the formula I had to apply, as-is, it all made sense. I then went back and reviewed all the math to understand why it works. So, for all you intro peeps, or just plain thick-headed peeps like me: ignore the math (for a moment). Simply take the formula and code it. At the very least you'll have working code, with hopes that it keeps the momentum going so you then go back and understand the math.
@GurpreetSingh-th1di 6 years ago
The kind of video I want, thanks
@lucaslopesf 7 years ago
I finally understand! It's so simple
@andrefurlan 3 years ago
More than 3 years since this video was uploaded and I still wonder what happened at 3:04
@sz7063 6 years ago
It is amazing! When will you teach us about recurrent neural networks and LSTMs?? Looking forward to that!!
@IgorSantarek 6 years ago
You're doing great job! Keep it up!
@iagosoriano3734 5 years ago
Why not add w0 in the sum that goes in the denominator?
@raonioliveira8758 6 years ago
So you are saying that eh1 should be its part of e1 + its part of e2. It seems counter-intuitive to me, since I feel like sometimes e1 can contain a bit of e2 in itself. So summing those different instances of error measurements shouldn't be an effective way of computing eh1 or eh2. I hope I made myself clear, sorry if my English sucks.
@esu7116 5 years ago
Are the cost function and the error the same thing?
@kosmic000 6 years ago
amazing vid as always dan very informative
@annidamf 2 years ago
thank you very much for your videos. incredibly helpful!! :D
@sumitzanje5207 5 years ago
While watching this video I realised I have to stop and first write a comment : "You are awesome!!" Thank you.
@affandibenardi548 4 years ago
DATA = model + error
model = data - error
supervised model = data - (tuning error)
@centuryfreud 5 years ago
“y_hat” would be a better term to differentiate it from ground truth “y”.