Backpropagation explained | Part 5 - What puts the "back" in backprop?

35,314 views

deeplizard

Comments: 203
@deeplizard 6 years ago
Backpropagation explained | Part 1 - The intuition kzbin.info/www/bejne/jnaWnKWcaKiEotU
Backpropagation explained | Part 2 - The mathematical notation kzbin.info/www/bejne/aJ62qqaIrZJkmZI
Backpropagation explained | Part 3 - Mathematical observations kzbin.info/www/bejne/fWbFZZ2Id7CBrtk
Backpropagation explained | Part 4 - Calculating the gradient kzbin.info/www/bejne/kKOYp5x3j6yhmqc
Backpropagation explained | Part 5 - What puts the "back" in backprop? kzbin.info/www/bejne/rnTPfJKVeNaNpLM
Machine Learning / Deep Learning Fundamentals playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
Keras Machine Learning / Deep Learning Tutorial playlist: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@FerMJy 5 years ago
Awesome... finally someone explained it. There are books, expensive ones, that don't. One piece of advice: it would be "easier" if you used 2 neurons per layer instead of 6 or more; it's simpler to follow on paper and visually (which is more important).
@hasantas7751 5 years ago
Ah, now I get it better, thanks. I applied it with cross-entropy and it worked pretty well.
@envy5944 4 months ago
I swear to God this level of explanation surpasses ANY university teaching standard. Even after 6 years, new people like me are still amazed by the level of detail and the breakdowns throughout all this heavy math.
@deeplizard 4 months ago
🙏🙏
@rohitjagannath5331 6 years ago
My God!!! Hey dear tutor, you are so very amazing with your explanation. I went through all your backprop math and was really amazed by your presentation. You broke down the complexity into such little chunks and explained it so very well. The final touch with that image of a man with his mind blown by the concept of backpropagation was amazing!!! Thank you very much. I need to practice this many times to get a grip on it.
@deeplizard 6 years ago
You're very welcome, rohit! I'm glad you liked it and found the explanations helpful! Keep me posted once you get a full grip on all the math. There is nothing better than the _aha!_ moment you experience once the math fully clicks in your brain!
@paulinevandevorst1514 2 years ago
THANK YOU! I'm writing a bachelor's thesis on deep learning for noise reduction in MRI images. This has helped me very much in understanding backpropagation. The math I found in papers seemed so difficult that it was hard to stay motivated. However, through this series of yours, I have discovered that the math really isn't that difficult, and that it is also really intuitive once you grasp the notation. Good work! :)
@deeplizard 2 years ago
Glad to hear it!
@kushagrachaturvedy2821 4 years ago
These 5 videos on backpropagation are some of the best on YouTube.
@brendanmcgivern3540 6 years ago
This series on backpropagation is simply amazing! Arguably the best I have come across. You cover low-level topics as well as abstract away the details at times to communicate a high-level understanding using your visual models. Great work! One thing you could also add is a video covering some basic calculus. Most online courses generally provide the derivatives for you and don't explain how functions can be differentiated using (f(x+dx)-f(x))/dx, or how the power rule, product rule, sum rule, chain rule, etc. can be used or why they work.
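A tiny illustration of that difference quotient (a sketch, not from the video; the function and numbers are made up): estimate the derivative numerically and compare it with what the power rule gives.

def f(x):
    return x ** 3  # example function

def numerical_derivative(f, x, dx=1e-6):
    # the difference quotient mentioned above: (f(x+dx) - f(x)) / dx
    return (f(x + dx) - f(x)) / dx

x = 2.0
print(numerical_derivative(f, x))  # roughly 12.000006
print(3 * x ** 2)                  # power rule: 3x^2 = 12.0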
@deeplizard 6 years ago
Thank you, Brendan! I'm really glad you enjoyed the series! Also, thanks for the calculus suggestion. Math is a huge passion of ours, and we intend to add math videos in the future. 😊
@wgc9794 3 years ago
This was great. Part 1 set it up nicely. In Part 2 you made it so clear what each ingredient was. That was dry but completely essential as you drew us through all the required calculations. You were able to patiently step through them for us as straightforwardly as they possibly could be. My hat is off to you, and I look forward to viewing more of your videos to learn more.
@DEEPAKSV99 4 years ago
Got the exact same feeling at 12:56! No words to thank you for the amount of social service you are doing for the tech community. I feel your work will have a major contribution to AI development across the world, since a lot of young beginners like me are motivated into this field even more by your style of teaching. The amount of effort you put into keeping your content short, precise and yet interesting seems incredible.
@deeplizard 4 years ago
Thank you, Deepak!
@xariskaragiannis1190 1 year ago
Best explanation for backpropagation out there! All I have to say is THANK YOU.
@caesarsalad44 5 years ago
Cheers mate indeed. This 5-video series was very helpful in terms of explaining things slowly and concisely. Thank you deeplizard
@Hatesun-lz6fi 1 year ago
Outstanding, clear and concise explanation. The idea of splitting this topic in 5 videos was very helpful to me. Thank you!
@yoyomemory6825 3 years ago
Best explanation of BP ever!!!! Thank you soooooooo much!
@bishwasapkota9621 3 years ago
This is by far the best one for Backprop!!!! Congrats and thanks!
@vinaychitrakathi9237 3 years ago
Chris and Mandy, hats off to your hard work. 🙏
@JarosawRewers 5 years ago
Thank you very much for your series about backpropagation. You have a talent for explaining complicated things.
@ayodejioseni2357 3 years ago
Fantastic explanation of backpropagation. Thanks so much
@nurali2525 3 years ago
In short - brilliant! So much work went into making such a series of videos; thank you very much!
@Jackson_Leung 5 years ago
I am so grateful that I ran across this video. I am working on an NN project, and this video explains backpropagation so well!
@Tobbse91 5 years ago
I watched the whole deep learning playlist now because I have to learn this stuff for my exams. The first thing I have to say: I do not write a lot of comments on videos. But I am really impressed by your explanations of such complex topics in deep learning. I don't know anyone who can speak that clearly! You have real talent! You helped me through this hard stuff. I will recommend your channel to my course. Best regards from Germany! And thank you very much!
@deeplizard 5 years ago
Thank you, Tobias! Glad you commented, and I'm happy to hear you're learning from the videos!
@GauravSingh-ku5xy 3 years ago
Thank you for existing.
@svssukesh1170 3 years ago
This series is really good. The way you explained it is amazing; I got a clear understanding of how the math behind backpropagation works. Thank you.
@circuithead94 3 years ago
Good series of explanations. I still need to rewatch them and get my hands dirty with a simple example to hammer it down. I know all the information is there in the videos, so it's not like one of those lecture videos where you rewatch it in hopes of getting it lol. I usually like looking at the whole math at once, instead of small chunks at a time, but I definitely like what you did. I did keep the notation video open on a separate screen. Again, thanks a lot for these.
@gilbertnordhammar3677 5 years ago
I don't think there's anyone who could have explained this more clearly than you just did! But still... Maaaan, this was INTENSE! I'm glad I made it through, though :D
@deeplizard 5 years ago
I'm glad you did too! 👏 And thank you!
@justing912 3 years ago
Really, really clear and consistent explanation. You rule!
@bhaveshshah9487 5 years ago
Hi! This was amazing and left me speechless! This is the best explanation of backprop I have ever come across. Thank you so much for sharing your knowledge. I look forward to more content on deep learning, AI, CNNs and so on. God bless!
@timrault 6 years ago
Thank you so much for this series of videos, it really helped me get a better understanding of the concept of backpropagation, especially the 'back' part of it! Your explanations of the maths are really clear, and I like that you are very precise about all the notation.
@deeplizard 6 years ago
You're very welcome, Tim! Glad you found everything clear and helpful!
@dinos3741 4 years ago
Excellent approach and explanation! Especially part 4, where you analyze the backpropagation into the previous layers, is very clear. Thanks!
@EliorBY 3 years ago
wow. what an amazing illustrative mathematical explanation. just what I was looking for. thank you very much deep lizard!
@richarda1630 3 years ago
Quite a good effort on such a herculean problem. I think that to fully appreciate this concept, people would need a good understanding of derivatives and basic calculus, imho. Love the Mind Blown ending :) Will watch this again =)
@stackologycentral 4 years ago
Amazing explanation. Finally I fully understand backprop calculation. Keep these coming. :)
@thutoaphiri6989 4 years ago
This series was really great - thank you very much for the clear explanation of the math terms and how they all fit into each other. The series does not touch on the maths that would be used for updating the weights in L-2, and that is actually what I wanted to see. I have an idea based on what you covered, but I would like to be sure - can you kindly point me to another video/article you know of which builds to that level of depth? Or can you pleeeeeeaaaaaase make another video where you do that?
@scrap8660 1 year ago
You. Are. Awesome. Thank you SO MUCH for this!!!
@joaoramalho4107 3 years ago
Marvelous explanation. Wow! I might even consider mentioning, in my master's thesis, who made me really understand deep learning.
@PritishMishra 3 years ago
Amazing!! All parts of the backprop series are just mind-blowing, and I understood all the mathematics.
@Tschibibo 4 years ago
Thank you! Never seen such a great and clear explanation of backpropagation!
@prabaa123 2 months ago
Great explanation, thank you so much!!
@panwong9624 6 years ago
This is a very informative video. Because of your amazing and effective explanation on back-propagation, I now understand the math behind the calculation. Thanks! :)
@deeplizard 6 years ago
Thank you, Pan! And you're welcome :) I'm really glad to hear that you now understand the math behind it!
@arjunbakshi810 4 years ago
best explanation of back prop out there
@EnvoyOfFabulousness 5 years ago
Hey deeplizard! I posted 6 days ago, and since then I've managed to successfully implement a neural network in Java capable of learning with backpropagation, without using any pre-existing libraries. I'm pretty pleased I was able to do so, and feel I have a good understanding of these ideas and algorithms. So first, thank you for this excellent series.

I wanted to ask a question to clarify something, though. The network I created for testing had 3 layers: the input, one hidden layer, and the output layer. As you explained, when calculating the derivative of the error with respect to any given weight in the L'th layer, we have those three main terms we need to solve for and multiply together (I just labeled them Term 1, 2 and 3). Similarly, if we were taking a given weight in the other layer (L-1), we also have three terms. Two of those terms for the node in the L-1'th layer will be solved the same way as for the L'th layer. Also, any change in the activation of a node in layer L-1 affects the nodes in L, so we sum up these effects.

So here's my question: suppose I introduced a second hidden layer, L-2. Any changes to the activation of a node in L-2 should affect L-1, so I should again sum up these effects, right? Also, changes in L-2 affect L-1, which then affects L, correct? So when changing a layer beyond L-1, does the change have a sort of recursive, exponential effect for each additional layer deep you go?

Putting it in some pseudo-code:

When updating L:
  For every node in L:
    Calculate Term 1
    Calculate Term 2
    Calculate Term 3
    derivative_rate = Term 1 * 2 * 3
    Update weight in L based on derivative_rate

When updating L-1:
  For every node in L-1:
    Create Term 1
    Calculate Term 2
    Calculate Term 3
    For every node in L: (Sigma) {
      Calculate Term 1a
      Calculate Term 2a
      Calculate Term 3a
      derivative_rate_a = Term 1a * 2a * 3a
      Add derivative_rate_a to Term 1
    }
    derivative_rate = Term 1 * 2 * 3
    Update weight in L-1 based on derivative_rate

Now, if I were to update L-2:
  For every node in L-2:
    Create Term 1
    Calculate Term 2
    Calculate Term 3
    // from here is where I start to be unsure
    For every node in L-1: (Sigma) {
      Create Term 1b
      Calculate Term 2b
      Calculate Term 3b
      For every node in L: (Sigma) {
        Calculate Term 1a
        Calculate Term 2a
        Calculate Term 3a
        derivative_rate_a = Term 1a * 2a * 3a
        Add derivative_rate_a to Term 1b
      }
      derivative_rate_b = Term 1b * 2b * 3b
      Add derivative_rate_b to Term 1
    }
    derivative_rate = Term 1 * 2 * 3
    Update weight in L-2 based on derivative_rate

I would write that out with mathematical terms and sigmas, but I don't know how well that would translate in text. Mainly I'm wondering if I'm on the right track with my thinking that this becomes recursive, or exponential in nature, as you attempt to update deeper and deeper into the network, or if I am over-thinking this. Many thanks!
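A minimal NumPy sketch of this idea (not from the video; the names and the assumption of sigmoid activations with a squared-error loss are made up for illustration): the summed terms for a layer can be stored as a per-layer "delta" and reused when stepping one layer further back.

import numpy as np

def backprop(weights, activations, y):
    # weights[l] has shape (nodes in layer l+1, nodes in layer l)
    # activations[0] is the input; activations[-1] is the network output
    grads = [None] * len(weights)
    # delta at the output layer: dC/da * da/dz for C = (a - y)^2 and sigmoid a
    delta = 2 * (activations[-1] - y) * activations[-1] * (1 - activations[-1])
    for l in range(len(weights) - 1, -1, -1):
        # gradient for every weight feeding layer l+1
        grads[l] = np.outer(delta, activations[l])
        if l > 0:
            # push the summed error one layer back and reuse it,
            # instead of recomputing the nested sums from scratch
            delta = (weights[l].T @ delta) * activations[l] * (1 - activations[l])
    return grads

In this sketch the work per layer stays proportional to the number of weights, because each layer's delta already carries the accumulated sums from the layers in front of it - the effect is recursive, but not exponential.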
@GoredGored 3 years ago
You put a lot of effort into creating this material. Thank you for that. I was hoping you would convert the intuition into Python. Anyway, I am convinced you should be one of the companions on my long and frustrating ML journey. Subscribed.
@vikrambharadwaj6349 5 years ago
Why aren't you at a million subs already? :D Btw, thanks for the explanation!
@abubakarali6399 3 years ago
Why don't these YouTubers replace university professors? It would save so much of our time.
@crazy-boy6143 2 years ago
Didn't quite get it yet, since I jumped right into this video to see if it had a solution to the problem on which I'm working. The math didn't seem so sophisticated, however, I'll reevaluate after seeing the videos about backpropagation. Thanks for the videos btw
@davidtemael1307 6 years ago
Deeplizard you are such an awesome Wizard!
@transolve9726 6 years ago
Agreed, she is pretty good at explaining things with the voice for it as well.
@John-wx3zn 5 months ago
Hello Mandy. Thank you. I learned. The weight 1,2 comes from the 3rd position of the weights vector in layer L, and this is the number on top of the arrow that points from node 2 to node 1. The arrow does not mean that weight 1,2 is flowing from node 2 in L-1 to node 1 in L. The flow of the activation function outputs is not being shown.
@javiercmh 2 years ago
this is great!! I am studying for my exam tomorrow. thank you!
@javiercmh 2 years ago
I had an A+!! Thanks again
@deeplizard 2 years ago
Congrats! 🥳
@moizahmed8987 4 years ago
great explanation, thanks a lot
@tymothylim6550 3 years ago
Thank you very much for this 5-part series! It was a fantastic explanation and I learnt a lot! I like the meme too xD
@hussainbhavnagarwala2596 1 year ago
The video series was super helpful. Could you do a solved problem with fewer nodes in each layer and code it step by step? That would be super helpful.
@paraklesis2253 5 years ago
You deserve more likes
@DavidCH12345 5 years ago
Your explanations are awesome! And they are at a depth where I can actually use them for my studies. Great work! Keep it up :-)
@KatarDan 6 years ago
Hey, that was amazing. I don't even know the chain rule and partial derivatives, but it was quite intuitive thanks to your explanations.
@deeplizard 6 years ago
Thanks for letting me know, Dmitry! Glad it was easy to follow intuitively even without the calculus background.
@paulbloemen7256 5 years ago
During the backpropagation process, for any neuron in the hidden layers and the output layer one has the following information available:
- The notion of the loss, expressed as a certain number, say -0.45;
- The activation value of that neuron, say 0.65;
- The activation value of a neuron in the previous layer, say 0.35;
- The weight on the activation value of that same neuron in the previous layer, say 0.55;
- The bias, say 0.75.
Knowing these five values, and maybe using some or all of them:
- How is the value of the weight, which is now 0.55, modified?
- How is the value of the bias, which is now 0.75, modified?
- Is the value of the learning rate, or step size, that is used in both modifications influenced by any of the values mentioned here, for example when these values become very big or very small?
I truly would appreciate an answer to these questions, thank you very much!
@deeplizard 5 years ago
Hey Paul - Check out the earlier video/blog on the learning rate in this series. There, I explain how the weights are updated: deeplizard.com/learn/video/jWT-AX9677k
Bias terms are updated in the same fashion. More on bias here: deeplizard.com/learn/video/HetFihsXSys
In regards to the learning rate, the next video explains the vanishing/exploding gradient problem that can happen when the weights become very small/big, and the learning rate plays a role in this problem. In general, the lr alone is not necessarily influenced by the weights or biases, but there are techniques you can use (in addition to weight initialization), like steadily modifying your lr during training: deeplizard.com/learn/video/qO_NLVjD6zE
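A minimal sketch of the kind of update rule described there (not taken from the video; the names and numbers are made up): each parameter moves by the learning rate times the gradient of the loss with respect to that parameter, not by the loss value itself, and biases are updated the same way with their own gradient.

learning_rate = 0.004

def sgd_step(param, grad, lr=learning_rate):
    # gradient descent: nudge the parameter against its gradient
    return param - lr * grad

# e.g. if backprop gave dC/dw = 0.12 for a weight currently at 0.55:
new_weight = sgd_step(0.55, 0.12)   # 0.55 - 0.004 * 0.12 = 0.54952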
@paulbloemen7256 5 years ago
deeplizard Thank you for your prompt answer! I watched all the videos over the past few days to work myself into the subject; I somehow missed the point of the video on the learning rate. Just to be sure, assuming a learning rate of 0.004: the new weight would be 0.55 - (0.45 / 0.55) * 0.004 = 0.5467, and the new bias would be 0.75 - (0.45 / 0.75) * 0.004 = 0.7476. Are these numbers realistic? The modifications are really small, which would mean quite a few steps are needed, even if you happen to have to reach 0.5 and 0.7 for those two parameters in some straight way. Thus, I just wonder: if so many steps are needed anyway, why trust one simple formula to the hilt that doesn't seem to perform that well? I guess I am touching on the problem of vanishing and exploding gradients here. I would say, whatever the formula yields, make sure there is a minimum modification, against the vanishing gradient problem, and a maximum modification, against the exploding gradient problem. During the whole training process one would expect the weight to near its definitive value as the loss nears its minimum value. Here one would expect less risk of an exploding gradient, and a need for smaller modifications to reach the optimal value of the weight. The maximum value of the modification will not hurt anyway, and the minimum modification can be lowered steadily: this minimum modification could be a function of the decrease of the loss compared to the previous test, or... of the number of tests that are performed. Well, the last paragraph may be a bit of nonsense, or?
@timothyjulian6817 3 years ago
This is amazing!
@simonty1811 3 years ago
This was better than my master's course.
@furkanfiratli7908 2 years ago
It was amazing, very helpful! Thank you so much!
@im-Anarchy 1 year ago
Just the length of the positive comments shows how satisfied the audience is. Also, arigato for teaching this dumb brain this complex topic in simple terms.
@capeandcode 6 years ago
This is exactly what I needed to see. I can die in peace now. Though a final question: what do people mean by DELTA (Greek symbol)? They multiply it by values to get new weights, I guess, or something. You have not used it. I don't understand the intuition behind it. Is it something different or what? Please explain.
@joshlyn9041 5 years ago
it is the indicator function
@dinos3741 4 years ago
@Ian Song Delta is the symbol denoting "difference" or small changes (Δx means a small change in x)
@s25412 3 years ago
10:59 - it seems like 'n' represents both the number of nodes and the number of training samples. I recommend using 'm' for the latter to avoid confusion, as it may cause problems when implementing.
@apilny2 5 years ago
These are really great! Thanks so much for uploading these. Cheers :-)
@pratiklohia1 4 years ago
These are superb. Just one request: can you help with a nesting diagram of the math equations, to be able to wrap my head around the whole idea?
@acyutanand 3 months ago
Well, I got that part without having to go through these videos, since I have a good math background. But thanks for the revision.
@quanchenghuang7864 5 years ago
This series of backpropagation intros is just amazing. I really like your way of presenting the math, which is using an example to guide us through it. I only have one question: during the training process, do we update the weights in the shallow layers or the deep layers first?
@priyadarsanpriyadarsan4726 3 years ago
I think that to calculate derivatives for weights in layer L-2, the final formula will again change as the number of terms / distance from layer L increases. I feel that as the distance from L increases, the gradient formula will also grow and become more complex.
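As a sketch in the notation used earlier in the series (my own transcription, not from the video): stepping one more layer back wraps one more sum around the quantities already computed for the layer in front,

\[
\frac{\partial C_0}{\partial a_k^{(L-1)}}
= \sum_{j} \frac{\partial C_0}{\partial a_j^{(L)}}
  \frac{\partial a_j^{(L)}}{\partial z_j^{(L)}}
  \frac{\partial z_j^{(L)}}{\partial a_k^{(L-1)}},
\qquad
\frac{\partial C_0}{\partial a_m^{(L-2)}}
= \sum_{k} \frac{\partial C_0}{\partial a_k^{(L-1)}}
  \frac{\partial a_k^{(L-1)}}{\partial z_k^{(L-1)}}
  \frac{\partial z_k^{(L-1)}}{\partial a_m^{(L-2)}}
\]

so the expression does grow with depth, but each new sum reuses the terms from the layer in front of it rather than expanding everything from scratch.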
@ramakrishnandurairaj9386 3 years ago
Awesome, from Tamil Nadu, India.
@paulbloemen7256 5 years ago
As I understand it, because of the huge number of different possible values for weights and biases, the values of the loss can be seen as a huge multidimensional landscape with many peaks and valleys. The valley in which backpropagation is looking for a minimum loss depends on the random choice of a set of start values for the weights and biases: another set of these values could lead to a different place of the loss in this landscape, with perhaps a lower minimum loss. Is it common practice to sometimes change some values of the weights and biases, in the hope that the loss "gets into a different valley", looking for a lower minimum loss? I guess this leads to several neural networks kind of competing against each other, where eventually the neural network with the lowest minimum loss is chosen. I truly would appreciate an answer to this question, thank you very much!
@edwardj.warden5072 2 years ago
Hi Mandy, I like your tutorial as well as your voice, frankly. About your backpropagation tutorial: there is a hell of a lot of math you have covered there. Is it necessary to have a full understanding of how the math works, or just some sort of basic understanding? Thank you.
@roger_is_red 4 years ago
Well I think it was very well done!
@yaribnevarez2541 4 years ago
Just amazing, Thanks!
@Enigmo1 4 years ago
How are you so good at explaining?
@evolvingit 5 years ago
awesome explanation!!!!
@gaugustop 2 years ago
Amazing!!!!
@gustasmikutavicius8494 3 years ago
Hi, I have a question regarding the formula in the blue rectangle at 6:57. I think we need to divide the right side of the equation by n, because the right side is currently n times the left side of the equation. Am I right?
@mitulagr 2 years ago
same question
@silicondecay 4 years ago
Thank you very much for these videos, but isn't the animation at 10:08 that connects node 2 in layer L-1 to node j in layer L supposed to be 1 line? Because having 4 lines connecting to the whole of layer L isn't connecting to a node 'j'. It is the same definition given in the first video on backpropagation, where $w_{jk}^{L}$ is 'the weight of the connection that connects node k in layer L-1 to node j in layer L'.
@trentonpaul6376 5 years ago
So in backpropagation, do you use the updated weight values connecting L-1 and L to update those connecting L-2 and L-1, or do you maintain all the weights' original values until all calculations are finished?
@ajinkya933 4 years ago
Hey, I had a small confusion regarding the loss. Shouldn't SGD minimize the loss for the node whose yield is maximum and maximize the loss for the other nodes? In some of your videos you said this, but in other videos you said that SGD tries to minimize the total loss.
@pile_of_Logs 6 years ago
Great video! Your explanations are super helpful! One thing I noticed was at 9:52 you said you use the power rule to differentiate that one single term after all of the others are canceled out... did you mean the product rule? I could be wrong but just wanted to point it out :)
@deeplizard 6 years ago
Thanks, Julia! Yes, you're right - Whoops! We do indeed use the product rule _first_ to differentiate that term, which ends up leading us to use the power rule. Further explanation - Using the product rule there, we take (the derivative of the first term) times (the second term) plus (the first term) times (the derivative of the second term). It is when we're taking the derivative of the second term that we use the power rule. Nice catch!
@timxu1766 4 years ago
So basically, the last set of activation outputs are composite functions that contain all the weights in the neural network. So we can apply the chain rule to calculate the gradients for every weight.
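As a rough sketch of that composition (biases omitted, $g$ the activation function, $W^{(l)}$ the weight matrices - notation assumed, not taken from the video):

\[
a^{(L)} = g\!\left(W^{(L)}\, g\!\left(W^{(L-1)} \cdots\, g\!\left(W^{(1)} x\right)\right)\right)
\]

so the loss is a function of every weight matrix, and the chain rule peels the layers off one at a time, starting from L and working backwards.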
@fabricendui7902 3 years ago
super useful
@abubakarali6399 3 years ago
Thinking about the derivative of the loss at the (100)th layer with respect to w(L-98)?
@Arjun-kt4by 4 years ago
At 10:03 you say we calculate the derivative according to the power rule. To my knowledge, the power rule says that if y = x^n, then dy/dx = nx^(n-1), and there is no power in that term, so how are you applying the power rule, ma'am? I could not understand how the derivative is calculated in this and the previous video.
@Arjun-kt4by 4 years ago
@popcat2309 Exactly what I said. Read the comment again and then think about my question.
@DemianPresser 4 years ago
Love it
@2007chandanashish 5 years ago
I have one question: when we are calculating the loss at the output layer, it is simply a - y, where a is the output at the final layer and y is the expected output. However, while calculating the loss at any node which is part of a hidden layer, what will be the expected output?
@deeplizard 5 years ago
The loss is only calculated at the end of the network for the final output nodes. The values of these output nodes, however, are influenced by the values of previous nodes and weights in the network, which are learned and updated during training to minimize the loss.
@meins2966 5 years ago
Thanks for your great tutorials!! Could you recommend a book from which I can cite these formulas, please?
@deeplizard 5 years ago
This book by Michael Nielsen is great. Chapter 2 is where backprop is explained: neuralnetworksanddeeplearning.com/chap2.html
@MrStudent1978 6 years ago
Thanks, miss! Your videos are amazing as ever... I have a request: can you explain vectorization of the backpropagation procedure for a case with a large number of hidden layers, maybe 5, say? I want to see the operations happening at the matrix level. Please do help.
@deeplizard 6 years ago
Thanks, Gurpreet! We'll consider making a future video with this type of example.
@MrStudent1978 6 years ago
Good morning!
@justchill99902 5 years ago
Ahh! "back" keeps coming "back". 4:45 and 12:56 are so funny haha.
@Furioso_Fourier 6 years ago
Thank you, it was amazing. But I have this question that may seem stupid, but still I have it. Imagine that I have just a single hidden layer with 5 nodes (whatever) and that I want to initialize all the weights to be the same (1/n for example). So my question is: in this case, all the nodes of the NN have the same parameters, the same inputs, the same weights, etc. Then how would the weights evolve to be different from one node to another?
@deeplizard 6 years ago
Hey olias - Suppose our input is an image. Each pixel from the image is going to represent a node in the input layer. Even if we initialize all weights with the same value, when we forward propagate the input, each node (being the activation output of weighted sum of the output of the previous layer) will be different from one another. Therefore, when we backpropagate, the weights will be updated differently from one another as well. Make sense?
@sharkk2979 2 years ago
got 100%
@kelkka7 5 years ago
Any chance of making a video about how all this relates to matrices in mathematical terms? #struggleisreal
@transolve9726 6 years ago
Do you reckon the "back propagation" or learning can be optimized (sped up) using another type of algorithm?
@deeplizard 6 years ago
Stochastic gradient descent (SGD) is the learning algorithm, but it uses backprop as a tool to calculate the required gradients. There are many variations of SGD that may have benefits during training over standard SGD. Adam is the most popular one at the moment, but I'm not sure if it's actually any faster than standard SGD.
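A small illustration of swapping the optimizer in Keras, in the spirit of the channel's Keras playlist (the model and layer sizes here are arbitrary placeholders, not from any video):

from tensorflow import keras

# toy model; the layer sizes are arbitrary
model = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(2, activation='softmax'),
])

# plain SGD: each weight takes a fixed-size step against its gradient
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='sparse_categorical_crossentropy')

# Adam: an SGD variant that adapts the step size per parameter
# (re-compiling just shows the swap; only one optimizer is used per training run)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy')

Either way, backprop is still what supplies the gradients; the optimizer only decides how to use them.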
@bipulkumardas3853 6 years ago
Thanks, ma'am... these videos are very useful. Can you please give a worked example with a small set of inputs?
@deeplizard 6 years ago
You're welcome, BIPUL! By a working example, do you mean you want to see the working out of the backprop math that was shown in the videos but with actual numbers?
@bipulkumardas3853 6 years ago
@deeplizard Yes, ma'am... that would make it clearer how to apply it.
@user-yj7rc2qs8d 6 years ago
Awesome
@golangshorts 5 years ago
4:53 - I am seeing myself there.
@deeplizard 5 years ago
Lol! Hopefully you will transition from that state to the state at 12:55 🤯
@math__man 5 years ago
Hello! First of all, I would like to say that your videos are amazing; they taught me very much, they are high quality, and they are very informative! I think that in this video the math could've been simpler if you had explained the backpropagation algorithm using the delta (error per weighted input) notation, as explained in the second chapter of this book: static.latexstudio.net/article/2018/0912/neuralnetworksanddeeplearning.pdf
This packs all the math for backpropagation neatly into 4 equations (3 if you are not including biases in your network).
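For reference, those four equations in Nielsen's notation (my transcription from that chapter; $\sigma$ is the activation function, $\odot$ the elementwise product, and $\delta^l_j = \partial C / \partial z^l_j$ is the error of node $j$ in layer $l$):

\[
\begin{aligned}
\delta^L &= \nabla_a C \odot \sigma'(z^L) \\
\delta^l &= \big((w^{l+1})^T \delta^{l+1}\big) \odot \sigma'(z^l) \\
\frac{\partial C}{\partial b_j^l} &= \delta_j^l \\
\frac{\partial C}{\partial w_{jk}^l} &= a_k^{l-1}\, \delta_j^l
\end{aligned}
\]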
@canerc6668 4 years ago
you are perfect, honey. thanks a lot.
@Tntpker 5 years ago
Glad you won't ever have to do this by hand...
@deeplizard 5 years ago
Same!
@petlover3990 4 months ago
Mona pakkd utto me 💀
@pawebrysch151 6 years ago
8:02 - here is the problem
@deeplizard 6 years ago
Hey Paweł - What is the problem?
@cyberguygame9096 3 years ago
There is not really a difference from the last video. Nothing new in this lesson. We are using the exact same formula, aren't we?