The Absolutely Simplest Neural Network Backpropagation Example

147,073 views

Mikael Laine

A day ago

I'm (finally after all this time) thinking of new videos. If I get attention in the donate button area, I will proceed:
www.paypal.com/donate/?busine...
sorry there is a typo: @3.33 dC/dw should be 4.5w - 2.4, not 4.5w-1.5
NEW IMPROVED VERSION AVAILABLE: The Absolut...
The absolutely simplest gradient descent example with only two layers and single weight. Comment below and click like!
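For readers who want to code along, here is a minimal sketch of the single-weight example in Python. The input i = 1.5 and target y = 0.8 follow the video and the typo note above; the starting weight and learning rate are illustrative assumptions, not values from the video.

```python
# Minimal sketch of the single-weight example: a = i * w, C = (a - y)^2.
i = 1.5      # input (from the video)
y = 0.8      # target output (per the typo correction above)
w = 0.3      # starting weight (assumed value, just to have somewhere to start)
lr = 0.1     # learning rate (illustrative value, not taken from the video)

for step in range(20):
    a = i * w                 # forward pass
    C = (a - y) ** 2          # cost (squared error)
    dC_da = 2 * (a - y)       # dC/da
    da_dw = i                 # da/dw
    dC_dw = dC_da * da_dw     # chain rule: dC/dw = 2(a - y) * i = 4.5w - 2.4
    w -= lr * dC_dw           # gradient descent step
    print(f"step {step:2d}  w = {w:.4f}  C = {C:.6f}")
```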

Comments: 183
@Vicente75480 5 years ago
Dude, this was just what I needed to finally understand the basics of Back Propagation
@webgpu 13 days ago
if you _Really_ liked his video, just click the first link he put on the description 👍
@justinwhite2725 3 years ago
@8:06 this was super useful. That's a fantastic shorthand. That's exactly the kind of thing I was looking for, something quick I can iterate over all the weights and find the most significant one for each step.
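A minimal sketch of that idea (the gradient values here are made up, not from the video): compute dC/dw for every weight and rank them by magnitude to find the most significant one for the current step.

```python
# Hypothetical per-weight gradients (made-up numbers, not from the video).
grads = {"w1": 1.2, "w2": -0.05, "w3": 0.7}

# The "most significant" weight for this step is the one with the largest |dC/dw|.
most_significant = max(grads, key=lambda name: abs(grads[name]))
print(most_significant, grads[most_significant])   # -> w1 1.2
```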
@SamuelBachorik-Mrtapo8-ApeX 2 years ago
Hi, I have a question for you: at 3:42 you have 1.5*2(a-y) = 4.5*w - 1.5. How did you get this result?
@nickpelov 1 year ago
... in case someone missed it like me - it's in the description (it's a typo). y = 0.8; a = i*w = 1.5*w, so 1.5*2(a-y) = 3*(1.5*w - 0.8) = 4.5*w - 3*0.8 = 4.5*w - 2.4 is the correct formula.
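A quick numeric check of that corrected formula (a sketch assuming i = 1.5 and y = 0.8, as above):

```python
# Check that the chain-rule gradient 2(a - y) * i matches 4.5w - 2.4 for i = 1.5, y = 0.8.
i, y = 1.5, 0.8
for w in (0.0, 0.5, 0.8, 1.0):
    a = i * w
    chain_rule = 2 * (a - y) * i
    closed_form = 4.5 * w - 2.4
    print(w, chain_rule, closed_form)   # the two values agree for every w
```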
@mateoacostarojas6031 5 years ago
Just perfect, simple, and with this we can extrapolate more easily when each layer has more than one neuron! Thaaaaankksss!!
@bedeamadi9317 3 years ago
My long search ends here, you simplified this a great deal. Thanks!
@adoughnut12345 3 years ago
This was great. Removing the non-linearity and using basic numbers as context helped drive this material home.
@gerrypaolone6786 2 years ago
If you use ReLU there is nothing more than that.
@Freethinker33 2 years ago
I was just looking for this explanation to align derivatives with gradient descent. Now it is crystal clear. Thanks Mikael
@saral123 3 years ago
Fantastic. This is the most simple and lucid way to explain backprop. Hats off
@gautamdawar5067 3 years ago
After a long frantic search, I stumbled upon this gold. Thank you so much!
@ronaldmercado4768 7 months ago
Absolutly simple. Very useful illustration not only to understand Backpropagation but also to show gradient descent optimization. Thanks a lot.
@praneethaluru2601 3 years ago
The best short video explanation of the concept on KZbin till now...
@arashnozarinejad9915 4 years ago
I had to write a comment and thank you for your very precise yet simple explanation, just what I needed. Thank you sir.
@markneumann381 12 days ago
Really nice work. Thank you so much for your help.
@SureshBabu-tb7vh 5 years ago
You made this concept very simple. Thank you
@EthanHofton 3 years ago
Very clearly explained and easy to understand. Thank you!
@user-mc9rt9eq5s 3 years ago
Thanks! This is awesome. I have a question: if we make the NN a little bit more complicated (adding an activation function for each layer), what will be the difference?
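The difference is one extra factor in the chain rule per activation: the activation's derivative evaluated at its input. A hedged sketch with a ReLU (the values are assumed, not from the video):

```python
# Same single-weight network, but with a ReLU after the multiplication.
# Forward:  z = i * w,  a = relu(z),  C = (a - y)^2
# Backward: dC/dw = dC/da * da/dz * dz/dw = 2(a - y) * relu'(z) * i
i, y, w = 1.5, 0.8, 0.3   # assumed values

z = i * w
a = max(0.0, z)                       # ReLU
relu_grad = 1.0 if z > 0 else 0.0     # derivative of ReLU (taken as 0 at z = 0)
dC_dw = 2 * (a - y) * relu_grad * i
print(dC_dw)
```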
@GustavoMeschino 14 days ago
GREAT, it was a perfect inspiration for me to explain this critical subject in a class. Thank you!
@rdprojects2954 3 years ago
Excellent , please continue we need this kind of simplicity in NN
@ilya5782 5 months ago
To understand mathematics, I need to see an example. And this video, from start to end, is awesome, with a quality presentation. Thank you so much.
@sparkartsdistinctions1257 3 years ago
I watched almost every video on back propagation, even Stanford's, but never got such a clear idea until I saw this one ☝️. Best and cleanest explanation. My first 👍🏼, which I rarely give.
@webgpu 13 days ago
a 👍 is very good, but if you click on the first link in the description, it would be even better 👍
@sparkartsdistinctions1257 13 days ago
@@webgpu 🆗
@xflory26x 21 days ago
Not kidding. This is the best explanation of backpropagation on the internet. The way you're able to simplify this "complex" concept is *chef's kiss* 👌
@bettercalldelta 2 years ago
I'm currently programming a neural network from scratch, and I am trying to understand how to train it, and your video somewhat helped (didn't fully help cuz I'm dumb)
@popionlyone 5 years ago
You made it easy to understand. Really appreciated it. You also earned my first KZbin comment.
@santysayantan 2 years ago
This makes more sense than anything I ever heard in the past! Thank you! 🥂
@brendawilliams8062 9 months ago
It beats the 1002165794 thing and 1001600474 jumping and calculating with 1000325836 and 1000564416. Much easier 😊
@jameshopkins3541 8 months ago
You are wrong: tell me, what is deltaW?
@RohitKumar-fg1qv 5 years ago
Exactly what i needed
@outroutono4937 1 year ago
Thank you bro! It's so much easier to visualize it when it's presented like that.
@mysteriousaussie3900 3 years ago
are you able to briefly describe how the calculation at 8:20 works for a network with multiple neurons per layer?
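The exact calculation at 8:20 isn't reproduced here, but a minimal sketch of how the same chain rule extends to two hidden neurons (made-up weights, no activation, as in the video) could look like this:

```python
# Two hidden neurons, one output, no activation (made-up weights).
# Forward:  h[j] = i * w1[j],   a = sum_j h[j] * w2[j],   C = (a - y)^2
# Backward: dC/dw2[j] = 2(a - y) * h[j]
#           dC/dw1[j] = 2(a - y) * w2[j] * i
i, y = 1.5, 0.8
w1 = [0.3, -0.2]   # input -> hidden weights (assumed)
w2 = [0.5, 0.4]    # hidden -> output weights (assumed)

h = [i * w for w in w1]
a = sum(hj * wj for hj, wj in zip(h, w2))
dC_da = 2 * (a - y)
grad_w2 = [dC_da * hj for hj in h]
grad_w1 = [dC_da * wj * i for wj in w2]
print(grad_w1, grad_w2)
```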
@adriannyamanga1580 4 years ago
dude please make more videos. this is amazing
@riccardo700 2 months ago
I have to say it. You have done the best video about backpropagation because you chose to explain the easiest example, no one did that out there!! Congrats prof 😊
@webgpu 13 days ago
did you _really_ like his video? Then, i'd suggest you click the first link he put on the description 👍
@st0a 8 months ago
Great video! One thing to mention is that the cost function is not always convex; for real multi-layer networks it generally isn't, even though it is for this single-weight example. However, as an example this is really well explained.
@TrungNguyen-ib9mz 3 years ago
Thank you for your video. But I'm a bit confused about 1.5*2(a-y) = 4.5*w - 1.5. Might you please explain that? Thank you so much!
@user-gq7sv9tf1m 3 years ago
I think this is how he got there : 1.5 * 2(a - y) = 1.5 * 2 (iw - 0.5) = 1.5 * 2 (1.5w - 0.5) = 1.5 * (3w - 1) = 4.5w - 1.5
@christiannicoletti9762 3 years ago
@@user-gq7sv9tf1m dude thanks for that, I was really scratching my head over how he got there too
@Fantastics_Beats 1 year ago
I am also confused by this error
@morpheus1586 1 year ago
@@user-gq7sv9tf1m y is 0.8 not 0.5
@Controlvers 3 years ago
Thank you for sharing this video!
@shilpatel5836 3 years ago
Bro i just worked it through and it makes so much sense once you do the partial derivatives and do it step by step and show all the working
@shirish3008 3 years ago
This is the best tutorial on back prop👏
@satishsolanki9766 3 years ago
Awesome dude. Much appreciate your effort.
@smartdev1636 6 months ago
Thank you so much! I'm 14 years old and I'm now trying to build a neural network with python without using any kind of libraries, and this video made me understand everything much better.
@Banana-anim8ions 3 months ago
No way me too
@smartdev1636 3 months ago
Brooo WW, I ended up coding something which looked good to me, but for some reason it didn't work, so I just gave up on it. I wish you good luck man @@Banana-anim8ions
@aorusaki 4 years ago
Very helpful tutorial. Thanks!
@jakubpiekut1446 2 years ago
Absolutely amazing 🏆
@giuliadipalma5042 2 years ago
thank you, this is exactly what I was looking for, very useful!
@mahfuzurrahman4517 6 months ago
Bro this is awesome, I was struggling to understand chain rule, now it is clear
@mixhybrid 4 years ago
Thanks for the video! Awesome explanation
@DaSticks 4 months ago
Great video, going to spend some time working out how it looks for multiple neurons, but a demonstration of that would be awesome
@lazarus8011 16 days ago
Unreal explanation
@ApplepieFTW 1 year ago
It clicked after just 3 minutes. Thanks a lot!!
@paurodriguez5364 1 year ago
best explanation i had ever seen, thanks.
@sunilchoudhary8281 1 month ago
I am so happy that I can't even express myself right now
@webgpu 13 days ago
there's a way you can express your happiness AND express your gratitude: by clicking on the first link in the description 🙂
@hamedmajidian4451 3 years ago
Great illustrated, thanks
@anirudhputrevu3878 2 years ago
Thanks for making this
@AjitSingh147 1 year ago
GOD BLESS YOU DUDE! SUBSCRIBED!!!!
@zeljkotodor 2 years ago
Nice and clean. Helped me a lot!
@samiswilf 3 years ago
This video is gold.
@bhlooli 1 year ago
Thanks very helpful.
@lhyd7hak 2 years ago
Thanks for a very explanatory video.
@zh4842 4 years ago
excellent video, simple & clear many thanks
@meanderthalensis 2 years ago
Helped me so much!
@whywhatwherein 18 days ago
finally, a proper explanation.
@elgs1980 3 years ago
Thank you so much!
@user-tt1hl6sk8y 4 years ago
Thanks bro, I finally figured out what happens after the last layer :)
@caifang324 3 years ago
I thought it is just similar to LMS, widely used in communications, right? LMS was developed by Bernard Widrow back in the 60s.
@javiersanchezgrinan919 17 days ago
Great video. Just one question: this is for a 1 x 1 input and a batch size of 1, right? If we have, let's say, a batch size of 2, we just add (b-y)^2 to the loss function (C = (a-y)^2 + (b-y)^2), don't we? With b = w * j and j the input of the second sample. Then you just perform the backpropagation with partial derivatives. Is that correct?
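Essentially yes. A hedged sketch (the second input and both targets below are made up): the batch loss is the sum of per-sample squared errors, so the gradient is the sum of per-sample gradients.

```python
# Batch of 2 for the single-weight network a_k = w * i_k (inputs/targets are made up).
# Batch loss: C = sum_k (a_k - y_k)^2   ->   dC/dw = sum_k 2 * (a_k - y_k) * i_k
w = 0.3
inputs  = [1.5, 2.0]   # i and j in the comment's notation
targets = [0.8, 0.8]

C = 0.0
dC_dw = 0.0
for i_k, y_k in zip(inputs, targets):
    a_k = w * i_k
    C += (a_k - y_k) ** 2
    dC_dw += 2 * (a_k - y_k) * i_k
print(C, dC_dw)
```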
@jks234 3 months ago
I see. As previously mentioned, there are a few typos. For anyone watching, please note there are a few places where 0.8 and 0.5 are swapped for each other. That being said, this explanation has opened my eyes to the fully intuitive explanation of what is going on... Put simply, we can view each weight as an "input knob" and we want to know how each one creates the overall Cost/Loss. In order to do this, we link (chain) each component's local influence together until we have created a function that describes weight to overall cost. Once we have found that, we can adjust that knob with the aim of lowering total loss a small amount based on what we call "learning rate". Put even more succinctly, we are converting each weight's "local frame of reference" to the "global loss" frame of reference and then adjusting each weight with that knowledge. We would only need to find these functions once for a network. Once we know how every knob influences the cost, we can tweak them based on the next training input using this knowledge. The only difference between each training set will just be the model's actual output, which is then used to adjust the weights and lower the total loss.
@ozziew007 2 years ago
Am I right in thinking this gets more complicated when we add in a sigmoid function to the hidden layers and also biases?
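It does change things, but only by one extra chain-rule factor per sigmoid and one extra partial derivative per bias. A minimal sketch under assumed values (not from the video):

```python
import math

# Sketch: z = w * i + b,  a = sigmoid(z),  C = (a - y)^2   (assumed values).
# dC/dw = 2(a - y) * a * (1 - a) * i      since sigmoid'(z) = a * (1 - a)
# dC/db = 2(a - y) * a * (1 - a)
i, y, w, b = 1.5, 0.8, 0.3, 0.1

z = w * i + b
a = 1.0 / (1.0 + math.exp(-z))
common = 2 * (a - y) * a * (1 - a)
dC_dw = common * i
dC_db = common
print(dC_dw, dC_db)
```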
@ahmetpala7945 4 years ago
Thank you for the easiest expression for bacpropagation dude
@Dan-uf2vh 4 years ago
I have a problem and can't find a solution: how do you express this in matrices? How do you backpropagate an error vector along the weights and preceding vectors?
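One common convention, shown as a NumPy sketch with made-up values (not necessarily the notation the video's author would use): for a linear layer the weight gradient is an outer product, and the error vector is pushed back through the transposed weight matrix.

```python
import numpy as np

# One linear layer a = W @ x with squared-error cost C = ||a - y||^2 (made-up values).
# delta = dC/da = 2 * (a - y)
# dC/dW = delta @ x.T          (outer product, same shape as W)
# dC/dx = W.T @ delta          (the error vector passed back to the previous layer)
x = np.array([[1.5], [0.5]])
y = np.array([[0.8], [0.2]])
W = np.array([[0.3, -0.1],
              [0.2,  0.4]])

a = W @ x
delta = 2 * (a - y)
grad_W = delta @ x.T
grad_x = W.T @ delta
print(grad_W)
print(grad_x)
```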
@evanparshall1323 3 years ago
This video is very well done. Just need to understand implementation when there is more than one node per layer
@mikaellaine9490 3 years ago
Have you looked at my other videos? I have a two-dimensional case in this video: kzbin.info/www/bejne/eJXVnmCYhKhoe80
@ExplorerSpace 1 year ago
@Mikael Laine even though you say that @3:33 has a typo, I can't see the typo. 1.5 is correct because y is the actual desired output and it is 0.5, so 3.0 * 0.5 = 1.5
@user-og9zn9vf4k 4 years ago
thanks a lot for that explanation :)
@dcrespin 1 year ago
The video shows what is perhaps the simplest case of a feedforward network, with all the advantages and limitations that extreme simplicity can have. From here to full generalization several steps are involved.
1. More general processing units. Any continuously differentiable function of inputs and weights will do; these inputs and weights can belong not only to Euclidean spaces but to any Hilbert spaces as well. Derivatives are linear transformations, and the derivative of a unit is the direct sum of the partial derivatives with respect to the inputs and with respect to the weights.
2. Layers with any number of units. Single-unit layers can create a bottleneck that renders the whole network useless. Putting together several units in a layer is equivalent to taking their product (as functions, in the set-theoretical sense). Layers are functions of the totality of inputs and weights of the various units. The derivative of a layer is then the product of the derivatives of the units. This is a product of linear transformations.
3. Networks with any number of layers. A network is the composition (as functions, and in the set-theoretical sense) of its layers. By the chain rule, the derivative of the network is the composition of the derivatives of the layers. Here we have a composition of linear transformations.
4. Quadratic error of a function.
This comment is becoming too long. But a general viewpoint clarifies many aspects of BPP. If you are interested in the full story and have some familiarity with Hilbert spaces, please Google for papers dealing with backpropagation in Hilbert spaces. Daniel Crespin
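A tiny numeric illustration of point 3 above, in the finite-dimensional case (the Jacobian matrices are made up): the derivative of the composed network is the product of the layers' Jacobians.

```python
import numpy as np

# If the network is f = L3 ∘ L2 ∘ L1, the chain rule says its derivative at a point
# is the composition (matrix product) of the layers' Jacobians (made-up matrices).
J1 = np.array([[0.3, 0.1], [0.0, 0.5]])    # Jacobian of layer 1
J2 = np.array([[1.0, -0.2], [0.3, 0.7]])   # Jacobian of layer 2
J3 = np.array([[0.4, 0.9]])                # Jacobian of layer 3 (scalar output)

J_network = J3 @ J2 @ J1
print(J_network)
```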
@kitersrefuge7353 5 months ago
Brilliant. What would be awesome is to then further expand, if you would, and explain multiple rows of nodes... in order to try to visualise, if possible, multiple routes to a node and so on... I stress "if possible...".
@muthukumars7730 3 years ago
Hi, the equation a = w*x usually passes through 0, right? But in the picture it is drawn away from the center. Can I take it that this is just for illustration?
@malinyamato2291 1 year ago
thanks a lot... a great start for me to learn NNs :)
@tellmebaby183 1 year ago
Perfect
@glaswasser 3 years ago
Okay I am better with language than with maths, so I'll try to sum it up: We basically look for a desired weight in order to get a certain output unit. And we get this desired weight by setting the weight equal to C, which again is the x-value of the minimum of some function that we get by deriving the function containing the original (faulty) output continuously (by steps determined by a "learning rate") until it is very close to zero. That correct?
@srnetdamon 2 months ago
Man, at 4:08 I don't understand how you find the value 4.5 in the expression 4.5*w - 1.5.
@thiagocrepaldi6071 5 years ago
Great video. I believe there is a typo at 1:10. y should be 0.5 and not 0.8. That might cause some confusion, especially at 3:34, when we use numerical values to calculate the slope (C) / slope (w)
@mikaellaine9490 5 years ago
Thanks for pointing that out; perhaps time to make a new video!
@mikaellaine9490 5 years ago
yes, that should say a=1.2
@Vicente75480 5 years ago
+Mikael Laine I would be so glad if you could make more videos explaining these kinds of concepts and how they actually work at a code level.
@mikaellaine9490 5 years ago
Did you have any particular topic in mind? I'm planning to make a quick video about the mathematical basics of backpropagation: automatic differentiation. Also I can make a video about how to implement the absolutely simplest neural network in Tensorflow/Python. Let me know if you have a specific question. I do have quite a bit of experience in TF.
@mychevysparkevdidntcatchfi1489 5 years ago
@@mikaellaine9490 How about adding that to the description? Someone else asked that question.
@animatedzombie64 22 days ago
Best video ever about back propagation on the internet 🛜
@formulaetor8686 1 year ago
Thats sick bro I just implemented it
@TruthOfZ0 15 hours ago
If we take the derivative dC/dw directly from C = (a-y)^2, it is the same thing, right? Do we really have to split it into da/dw and dC/da individually???
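For what it's worth, substituting a = i*w into C and differentiating directly does give the same result; splitting into dC/da and da/dw is just the chain rule written out, which is what scales to deeper networks. A quick check (assuming i = 1.5, y = 0.8):

```python
# Direct differentiation: C(w) = (i*w - y)^2  ->  dC/dw = 2 * i * (i*w - y)
# Chain rule:             dC/dw = dC/da * da/dw = 2 * (a - y) * i
i, y = 1.5, 0.8
for w in (0.2, 0.8, 1.3):
    a = i * w
    direct = 2 * i * (i * w - y)
    chained = 2 * (a - y) * i
    print(w, direct, chained)   # identical values
```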
@bofloa 4 years ago
Great video. However, the software implementation of this is not as easy... how do you take the derivative of a matrix of weights without iterating through the individual weights? And I want to ask: as you mentioned in the video, can we just ignore all the hidden-layer weights, take the aggregate derivative of the first weight (which in your example is w'3), and leave the rest unchanged? Would this still work? Thanks
@Maniclout 2 years ago
To derive that you will need some linear algebra
@Janeilliams 2 years ago
Okay!! It was simple and clear, BUT things are getting complex when I add two inputs or hidden layers: how do I do the partial derivatives? If anyone has an appropriate and simple video covering more than one input or hidden layer, please throw it in the reply box, thanks!
@capilache 4 years ago
When calculating the new value for w3 (at the end of the video), do you use the other original weights or the updated weights?
@mikaellaine9490 4 years ago
For a backward pass, you use the old weights.
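A small illustrative sketch of that rule (hypothetical weights and a placeholder gradient function, not the video's network): all gradients are computed from the old weights, and only then are the weights updated.

```python
# Two invented weights and a placeholder gradient function, for illustration only.
weights = {"w1": 0.3, "w2": 0.5}
lr = 0.1

def gradient(name, weights):
    return 2 * weights[name] - 1.0   # stand-in for dC/dw, not a real network

# Compute every gradient from the old weights first...
grads = {name: gradient(name, weights) for name in weights}
# ...and only then apply all the updates.
for name in weights:
    weights[name] -= lr * grads[name]
print(weights)
```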
@rachidbenabdelmalek3098 1 year ago
Thank you
@hegerwalter 1 month ago
Where and how did you get the learning rate?
4 years ago
Very helpful
@marc6775 5 years ago
At 3:42, you say that 1.5 * 2(a - y) = 4.5 * w - 1.5. How does that work? We know that a = 1.2, y = 0.8, so the expression should evaluate to 1.5 * 2(1.2 - 0.8) = 1.2
@mychevysparkevdidntcatchfi1489 5 years ago
1.2 yes. See reply to comment by Thiago Crepaldi
@mikaellaine9490 5 years ago
@@mychevysparkevdidntcatchfi1489 I have corrected that here: kzbin.info/www/bejne/bpWZm5ltqJuSh9U
@RamonChiNangWong078 3 years ago
Thank the Gods, I read the comments
@riccardo700 2 months ago
My maaaaaaaannnnn TYYYY
@fredfred9847 2 years ago
Great video
@rafaelscarpe2928 3 years ago
Thank you
@RaselAhmed-ix5ee 3 years ago
what is a
@accelworld6.031 2 years ago
Why not just solve 4.5w - 1.5 = 0 to get the minimum of the cost function?
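For this one-weight quadratic you can indeed set the (corrected) gradient to zero and solve directly; gradient descent matters because that closed-form step is not available for general multi-layer networks. A quick sketch using the corrected gradient 4.5w - 2.4:

```python
# Closed form: set dC/dw = 4.5*w - 2.4 = 0  ->  w = 2.4 / 4.5
w_closed_form = 2.4 / 4.5
print(w_closed_form)            # 0.5333..., where a = 1.5 * w = 0.8 = y and C = 0

# Gradient descent walks toward the same value one step at a time.
w, lr = 0.0, 0.1
for _ in range(50):
    w -= lr * (4.5 * w - 2.4)
print(w)                        # approaches 0.5333...
```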
@chris--tech 5 years ago
Awesome, exactly what I needed!!! 👍 from a Chinese guy
@sameersahu3987 1 year ago
Thanks
@starkarabil9260 2 years ago
1:11 where does y = 0.8 come from?
@polybender 5 days ago
best on internet.
@AAxRy 3 months ago
THIS IS SOO FKING GOOD!!!!
@alexandrmelnikov5126 7 months ago
man, thanks!
@Blue-tv6dr 3 years ago
Amazing
@btmg4828 19 days ago
I don't get it. You write 1.5*2(a-y) = 4.5w - 1.5. But why? It should be 4.5w - 2.4, because 2*0.8*(-1.5) = -2.4. Where am I wrong?
@Nova-Rift 3 years ago
Hmm, if y = 0.8 then dC/dw should be 4.5w - 2.4, because 0.8 * 3 = 2.4, not 1.5. What am I missing?
@LunaMarlowe327 2 years ago
very clear