Neural Networks Part 7: Cross Entropy Derivatives and Backpropagation

  136,447 views

StatQuest with Josh Starmer

Comments: 327
@statquest
@statquest 3 жыл бұрын
The full Neural Networks playlist, from the basics to deep learning, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@bigbangdata
@bigbangdata 3 жыл бұрын
Your talent for explaining these difficult concepts and organizing the topics in didactic, bite-sized, and visually compelling videos is astounding. Your channel is a great resource for beginners and advanced practitioners who need a refresher on a particular concept. Thank you for all that you do!
@statquest
@statquest 3 жыл бұрын
Wow, thank you!
@Rationalist-Forever
@Rationalist-Forever 2 жыл бұрын
Right now I am reading the ML book "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani. Many times I got stuck on the mathematical details, could not comprehend them, and stopped reading. I love that book a lot, but I felt frustrated. Now I use your videos and read the book side by side, and everything in the book is starting to make sense. You are such a great storyteller. The way you explain things in the videos with examples, it feels like I am listening to a story: "There was a king..." It is so soothing, and complex topics become easy. I feel you are my friend and teacher on my ML journey who understands my pain and explains the hard things with ease. BTW, I have a Master's in Data Science from Northwestern University and got a good ML foundation from that course, but I can tell you I only feel complete now, after going through most of your videos. Mr. Starmer, we are lucky to have you as such a great teacher and mentor. You are gifted at teaching people. I pledge to support your channel from my heart. Thank you.
@statquest
@statquest 2 жыл бұрын
Wow! Thank you very much!!! :)
@wennie2939
@wennie2939 3 жыл бұрын
Josh Starmer is THE BEST! I really appreciate your patience in explaining the concepts step-by-step!
@statquest
@statquest 3 жыл бұрын
Thank you very much! :)
@naf7540
@naf7540 Жыл бұрын
Dear Josh, how is it at all possible to deconstruct so clearly all these concepts, just incredible, thank you very much, your videos are addictive!!
@statquest
@statquest Жыл бұрын
Thank you very much! :)
@RubenMartinezCuella
@RubenMartinezCuella 3 жыл бұрын
Even though there are many other youtube channels that also explain NN, your videos are unique in the sense that you break down every single process into small operations easy to understand by anyone. Keep up the great work Josh, everyone here appreciates so much your effort!! :D
@statquest
@statquest 3 жыл бұрын
Thank you very much! :)
@simhasankar3311
@simhasankar3311 Жыл бұрын
Imagine the leaps and bounds we could achieve in global education if this teaching method was implemented universally. We would have a plethora of students equipped with the analytical skills to tackle complex issues. Your contributions are invaluable. Thank you!
@statquest
@statquest Жыл бұрын
Thank you so much!
@iZapz98
@iZapz98 3 жыл бұрын
All your videos have helped me tremendously in studying for my ML exam, thank you
@statquest
@statquest 3 жыл бұрын
Great to hear!
@positive_freedom
@positive_freedom 2 жыл бұрын
Your videos are truly astounding. I've gone through so many youtube playlists looking to understand Neural Networks, and none of them can come close to yours in terms of simplicity & content! Please keep up this amazing work for beginners like me :)
@statquest
@statquest 2 жыл бұрын
Glad you like them!
@YLprime
@YLprime 10 ай бұрын
This channel is awesome, my deep learning knowledge is skyrocketing every day.
@statquest
@statquest 10 ай бұрын
bam!
@Lucas-Camargos
@Lucas-Camargos Жыл бұрын
This is the best Neural Networks example video I've ever seen.
@statquest
@statquest Жыл бұрын
Thank you very much! :)
@AbdulWahab-mp4vn
@AbdulWahab-mp4vn Жыл бұрын
WOW! I have never seen anyone explain topics in such minute detail. You are an angel to us data science students! Love from Pakistan
@statquest
@statquest Жыл бұрын
Thank you very much!
@abhishekjadia1703
@abhishekjadia1703 2 жыл бұрын
Incredible !! ...You are not teaching, You are revealing !!
@statquest
@statquest 2 жыл бұрын
Wow, thank you!
@salahaldeen1751
@salahaldeen1751 2 жыл бұрын
I don't know where else I could understand that like this. Thanks, you're talented!!!
@statquest
@statquest 2 жыл бұрын
Thanks!
@vishnukumar4531
@vishnukumar4531 2 жыл бұрын
0 comments left unreplied! Josh, you are truly one of a kind! ❣❣❣
@statquest
@statquest 2 жыл бұрын
Thanks!
@vishnukumar4531
@vishnukumar4531 2 жыл бұрын
@@statquest TRIPLE BAAAM!❤❤
@anisrabahbekhoukhe3652
@anisrabahbekhoukhe3652 Жыл бұрын
I literally can't stop watching these vids, help me
@statquest
@statquest Жыл бұрын
bam! :)
@farrukhzamir
@farrukhzamir 8 ай бұрын
Brilliantly explained. You explain the concept in such a manner that it becomes very easy to understand. God bless you. I don't know how to thank you really. Nobody explains like you.❤
@statquest
@statquest 8 ай бұрын
Thank you!
@saurabhdeshmane8714
@saurabhdeshmane8714 Жыл бұрын
Incredibly done... it doesn't even feel like we are learning such complex topics. It keeps me engaged through the entire playlist. Thank you for such content!!
@statquest
@statquest Жыл бұрын
Glad you liked it!
@yourfavouritebubbletea5683
@yourfavouritebubbletea5683 Жыл бұрын
Incredibly well done. I'm astonished and thank you for letting me not have a traumatic start with ML
@statquest
@statquest Жыл бұрын
Thank you! :)
@ligezhang4735
@ligezhang4735 Жыл бұрын
This is so impressive! Especially for the visualization of the whole process. It really makes things very easy and clear!
@statquest
@statquest Жыл бұрын
Thank you!
@susmitvengurlekar
@susmitvengurlekar 2 жыл бұрын
"I want to remind you" helped me understand why in the world is P(setosa) involved in output of versicolor and virginica. Great explanation!
@statquest
@statquest 2 жыл бұрын
Hooray!!! I'm glad the video was helpful.
@samerrkhann
@samerrkhann 3 жыл бұрын
A huge appreciation for all the efforts you put. Thank you josh!
@statquest
@statquest 3 жыл бұрын
Thank you! :)
@tejaspatil3978
@tejaspatil3978 2 жыл бұрын
Your way of teaching is on the next level. Thanks for giving us these excellent sessions.
@statquest
@statquest 2 жыл бұрын
Thank you!
@johannesweber9410
@johannesweber9410 6 ай бұрын
Nice video! At first I was a little confused (like always), but then I plugged your values and the exact structure of your neural network into my own small framework and compared the results. After I did this, I followed your instructions and implemented the backpropagation step by step. Thanks for the nice video!
@statquest
@statquest 6 ай бұрын
BAM!
@nabeelhasan6593
@nabeelhasan6593 3 жыл бұрын
Finally, I am really thankful for all the hard effort you put into these videos; they immensely helped me build a strong foundation in deep learning.
@statquest
@statquest 3 жыл бұрын
Thank you very much! :)
@pietrucc1
@pietrucc1 3 жыл бұрын
I started using machine learning techniques a little less than a month ago. I found this site and it has helped me a lot, thank you very much!!
@statquest
@statquest 3 жыл бұрын
Thank you!
@rajpulapakura001
@rajpulapakura001 Жыл бұрын
Clearly and concisely explained! Thanks Josh! P.S. If you know your calculus, I would highly recommend trying to compute the derivatives yourself before seeing the solution - it helps a lot!
@statquest
@statquest Жыл бұрын
bam! :)
@Meditator80
@Meditator80 3 жыл бұрын
Thank you so much! This is such a clear explanation of how to calculate the cross entropy derivative and how to use it in backpropagation.
@statquest
@statquest 3 жыл бұрын
Thank you very much! :)
@donfeto7636
@donfeto7636 Жыл бұрын
You are a national treasure, BAAAM. Keep making these videos, they are great.
@statquest
@statquest Жыл бұрын
Thank you!
@sergeyryabov2200
@sergeyryabov2200 11 ай бұрын
Thanks!
@statquest
@statquest 11 ай бұрын
TRIPLE BAM!!! Thank you so much for supporting StatQuest!!! :)
@chethanjjj
@chethanjjj 3 жыл бұрын
@18:20 is what I've been looking for for a while. Thank you!
@statquest
@statquest 3 жыл бұрын
Bam! :)
@GamTinjintJiang
@GamTinjintJiang 2 жыл бұрын
Wow, your videos are so intuitive to me. What a precious resource!
@statquest
@statquest 2 жыл бұрын
Thanks!
@Recordingization
@Recordingization 3 жыл бұрын
Thanks for the nice lecture! I finally understand the derivative of cross entropy and the optimization of the bias.
@statquest
@statquest 3 жыл бұрын
bam!
@RC4boumboum
@RC4boumboum 2 жыл бұрын
Your courses are so good! Thanks a lot for your time :)
@statquest
@statquest 2 жыл бұрын
You're very welcome!
@KayYesYouTuber
@KayYesYouTuber Жыл бұрын
So beautiful. Never seen anything like this!!!
@statquest
@statquest Жыл бұрын
Thank you!
@arielcohen2280
@arielcohen2280 Жыл бұрын
I hate all the songs and the meaningless sound effects, but damn, I have been trying to understand this concept for a hell of a long time and you made it clear.
@statquest
@statquest Жыл бұрын
Noted!
@susmitvengurlekar
@susmitvengurlekar 2 жыл бұрын
There is nothing wrong with self-promotion and, frankly, you don't need promotion. Anyone who watches any one of your videos will prefer your videos over any others from then on.
@statquest
@statquest 2 жыл бұрын
Wow! Thank you!
@bonadio60
@bonadio60 3 жыл бұрын
Your explanation is fantastic!! Thanks
@statquest
@statquest 3 жыл бұрын
Thank you! :)
@GLORYWAVE.
@GLORYWAVE. 10 ай бұрын
Thanks Josh for an incredibly well put together video. I have two quick questions: 1) When you initially get that new b3 value of -1.23, and then say to repeat the process, I am assuming the process is repeated with a new 'batch' of 3 training samples, correct? i.e. you wouldn't use the same 3 that were just used? 2) Are these multi-classification models always structured in such a way that each 'batch' or 'iteration' includes 1 actual observed sample from each class like in this example? It appears that the Total Cross Entropy calculation and derivatives would not make sense otherwise. Thanks again!
@statquest
@statquest 10 ай бұрын
1) In this case, the 3 samples are all the data we have, so we reuse them for every iteration. If we had more data, we might have different samples in different batches, but we would eventually reuse these samples at some later iteration. 2) No. You just add up the cross entropy, regardless of how the samples are distributed, to get the total.
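For anyone who wants to see the idea in this reply in code, here is a minimal sketch (not the video's actual code) that reuses the same 3 rows every iteration and sums the cross entropy for the whole batch before a single update; all of the raw output values below are made up for illustration.

```python
import numpy as np

def softmax(raw):
    e = np.exp(raw - raw.max())   # subtract the max for numerical stability
    return e / e.sum()

# hypothetical raw output values (setosa, virginica, versicolor) for the 3 rows
raw_outputs = np.array([
    [-1.43, 0.45, 0.25],   # row 1, known species: setosa
    [ 0.30, 1.50, 0.10],   # row 2, known species: virginica
    [ 0.20, 0.40, 1.10],   # row 3, known species: versicolor
])
known = [0, 1, 2]          # column index of the observed species for each row

total_ce = 0.0
for raw, k in zip(raw_outputs, known):
    p = softmax(raw)
    total_ce += -np.log(p[k])   # cross entropy for this row

print("total cross entropy for the batch:", total_ce)
```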
@jamasica5839
@jamasica5839 3 жыл бұрын
This is even more bonkers than Backpropagation Details Pt. 2 :O
@statquest
@statquest 3 жыл бұрын
double bam! :)
@charliemcgowan598
@charliemcgowan598 3 жыл бұрын
Thank you so much for all your videos, they're actually amazing!
@statquest
@statquest 3 жыл бұрын
Glad you like them!
@samore11
@samore11 Жыл бұрын
These videos are so good - the explanations and quality of production are elite. My only nitpick was it is hard for me to see "x" and not think the letter "x" as opposed to a multiplication sign - but that's a small nitpick.
@statquest
@statquest Жыл бұрын
After years of using confusing 'x's in my videos, I've finally figured out how to get a proper multiplication sign.
@shreeshdhavle25
@shreeshdhavle25 3 жыл бұрын
Finally! I was waiting for a new video for so long...
@statquest
@statquest 3 жыл бұрын
Thanks!
@shreeshdhavle25
@shreeshdhavle25 3 жыл бұрын
@@statquest Thanks to you Josh..... Best content in the whole world.... Also thanks to you and your content I am working in Deloitte now.
@statquest
@statquest 3 жыл бұрын
@@shreeshdhavle25 Wow! That is awesome news! Congratulations!!!
@gabrielsantos19
@gabrielsantos19 3 ай бұрын
Thank you, Josh! 👍
@statquest
@statquest 3 ай бұрын
My pleasure!
@r0cketRacoon
@r0cketRacoon 9 ай бұрын
Thank you very much for the video. Backpropagation with multiple outputs isn't conceptually that hard for me, but it's really a mess when doing the computations.
@statquest
@statquest 9 ай бұрын
Yep. The good news is that PyTorch will do all that for us.
@rahulkumarjha2404
@rahulkumarjha2404 2 жыл бұрын
Thank you for such an awesome video!!! I just have one doubt, at 18:12 of the video. The summation has 3 values because there are 3 items in the dataset. Let's say we have 4 items in the dataset, i.e., 2 items for setosa, 1 for virginica, and 1 for versicolor. Then our summation would look like {(psetosa - 1) + (psetosa - 1) + psetosa + psetosa}, i.e., the summation covers setosa (row 1), setosa (row 2), versicolor (row 3), and virginica (row 4). Am I right?
@statquest
@statquest 2 жыл бұрын
yep
@rahulkumarjha2404
@rahulkumarjha2404 2 жыл бұрын
@@statquest Thank You!! Your entire neural network playlist is awesome.
@statquest
@statquest 2 жыл бұрын
@@rahulkumarjha2404 Hooray! Thank you!
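To make the pattern in the exchange above concrete, here is a small sketch (the raw output values are invented) that accumulates dCE/db3 over 4 rows, using (p_setosa - 1) for the two setosa rows and p_setosa for the other two.

```python
import numpy as np

def softmax(raw):
    e = np.exp(raw - raw.max())
    return e / e.sum()

# hypothetical raw outputs (setosa, virginica, versicolor) and known species for 4 rows
rows = [
    (np.array([-1.4, 0.5, 0.3]), "setosa"),
    (np.array([-0.9, 0.2, 0.8]), "setosa"),
    (np.array([ 0.1, 1.2, 0.4]), "versicolor"),
    (np.array([ 0.3, 0.2, 1.5]), "virginica"),
]

d_total_db3 = 0.0
for raw, species in rows:
    p_setosa = softmax(raw)[0]            # b3 only feeds the setosa raw output
    if species == "setosa":
        d_total_db3 += p_setosa - 1.0     # (p_setosa - 1) for the setosa rows
    else:
        d_total_db3 += p_setosa           # p_setosa for the other rows

print("dCE_total/db3 =", d_total_db3)
```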
@dianaayt
@dianaayt 9 ай бұрын
20:14 If we have a lot more training data, would we just add all of it into this sum when doing the backpropagation?
@statquest
@statquest 9 ай бұрын
Yes, or we can put the data into smaller "batches" and process the data batch by batch (so, if we had 10 batches, each with 50 samples, we would only add up the 50 values in a batch before updating the parameters).
@r0cketRacoon
@r0cketRacoon 9 ай бұрын
There are methods like mini-batch gradient descent and stochastic gradient descent; you should do some digging on them.
@Waffano
@Waffano 2 жыл бұрын
Watching these videos makes me wonder how in the world someone came up with this in the first place. I guess it slowly evolved from something more simple, but still, would be cool to learn more about the history of neural networks :O If anyone knows of any documentaries or books please do share ;)
@statquest
@statquest 2 жыл бұрын
A history would be nice.
@osamahabdullah3715
@osamahabdullah3715 3 жыл бұрын
I really can't get enough of your videos, what an amazing way of explaining things. Thanks for sharing your knowledge with us. When is your next video coming out, please?
@statquest
@statquest 3 жыл бұрын
My next video should come out in about 24 hours.
@osamahabdullah3715
@osamahabdullah3715 3 жыл бұрын
@@statquest What wonderful news, thank you sir
@madankhatri7727
@madankhatri7727 11 ай бұрын
Your explanations of hard concepts are pretty amazing. I have been stuck on a very difficult concept called the Adam optimizer. Please explain it. You are my last hope.
@epiccabbage6530
@epiccabbage6530 Жыл бұрын
This has been extremely helpful, this series is great. I am a little confused, though, as to why we repeat the calculations for p.setosa, i.e., why we can't simply run through the calculations once and use the same p.setosa value 3 times (so, x-1 + x + x) and use that for the bias recalculation. But either way this has cleared up a lot for me.
@statquest
@statquest Жыл бұрын
What time point (minutes and seconds) are you asking about? (Unfortunately, I can't remember all of the details in all of my videos.)
@epiccabbage6530
@epiccabbage6530 Жыл бұрын
@@statquest Starting at 18:50, you go through three different observations and solve for the cross entropy. I'm curious as to why you need to look at three different observations, i.e., why you need to plug in values 3 times instead of just doing it once. If we want to solve for psetosa twice and psetosa-1 once, why do we need to do the equation three times instead of just doing it once? Why can't we just do 0.15-1 + 0.15 + 0.15?
@statquest
@statquest Жыл бұрын
@@epiccabbage6530 Because each time the predictions are made using different values for the petal and sepal widths. So we take that into account for each prediction and each derivative relative to that prediction.
@epiccabbage6530
@epiccabbage6530 Жыл бұрын
@@statquest Right, but why do we look at multiple predictions in the context of changing the bias once? Is it just a matter of batch size?
@statquest
@statquest Жыл бұрын
@@epiccabbage6530 Yes, in this example, we use the entire dataset (3 rows) as a "batch". You can either look at them all at once, or you can look at them one at a time, but either way, you end up looking at all of them.
@stan-15
@stan-15 2 жыл бұрын
Since you used 3 sample data points to get the values of the three cross-entropy derivatives, does this mean we must use multiple inputs for one gradient descent step when using cross-entropy? (More precisely, does this mean we have to use n input samples, which together light up all n of the output features, in order to be able to compute the appropriate derivative of the bias, and thus to perform one single gradient descent step?)
@statquest
@statquest 2 жыл бұрын
No. You can use 1 input if you want. I just wanted to illustrate all 3 cases.
@shubhamtalks9718
@shubhamtalks9718 3 жыл бұрын
BAM! Clearly explained.
@statquest
@statquest 3 жыл бұрын
Thanks!
@hangchen
@hangchen 10 ай бұрын
Awesome explanation! Now I understand neural networks in more depth! Just one question - shouldn't the output of the softmax values sum to 1? @18:57
@statquest
@statquest 10 ай бұрын
Thanks! And yes, the output of the softmax should sum to 1. However, I rounded the numbers to the nearest 100th and, as a result, it appears like they don't sum to 1. This is just a rounding issue.
@hangchen
@hangchen 10 ай бұрын
Oh got it! Right if I add them up they are 1.01, which is basically 1. I just eyeballed it. Should have done a quick mind calc haha! By the way, I am so honored to have your reply!! Thanks for making my day (again, BAM!)!@@statquest
@statquest
@statquest 10 ай бұрын
@@hangchen :)
@Pedritox0953
@Pedritox0953 3 жыл бұрын
Great explanation
@statquest
@statquest 3 жыл бұрын
Bam! :)
@pedrojulianmirsky1153
@pedrojulianmirsky1153 2 жыл бұрын
Thank you for all your videos, you are the best! I have one question though. Let's suppose you have the worst possible fit for your model, where it predicts pSetosa = 0 for instances labeled Setosa, and pSetosa = 1 for those labeled either Virginica or Versicolor. Then, for each Setosa-labeled instance, you would get dCESetosa/db3 = pSetosa - 1 = -1, and for each non-Setosa-labeled instance dCEVersiOrVirg/db3 = pSetosa = +1. In summary, the total dCE/db3 would accumulate -1 for each Setosa instance and +1 for each non-Setosa instance. So, if you have for example a dataset with 5 Setosa, 2 Versicolor and 3 Virginica: dCE(total)/db3 = (-1 -1 -1 -1 -1) + (1 + 1) + (1 + 1 + 1) = -5 + 2 + 3 = 0. The total dCE/db3 would be 0, as if the model had the best fit for b3. Because of this cancellation between the opposite signs (+) and (-), the bias (b3) wouldn't be adjusted by gradient descent, even though the model classifies badly. Or maybe I misunderstood something haha. Anyways, I got into ML and DL mainly because of your videos, can't thank you enough!!!!!!!
@statquest
@statquest 2 жыл бұрын
To be honest, I don't think that is possible because of how the softmax function works. For example, if it was known that the sample was setosa, but the raw output value for setosa was 0, then we would have e^0 / (e^0 + e^versi + e^virg) = 1 / (1 + e^versi + e^virg) > 0.
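A quick sketch of the point in this reply: with finite raw output values, softmax can never return exactly 0 (or exactly 1), so the "worst possible fit" described above cannot actually be reached. The raw values below are arbitrary.

```python
import numpy as np

def softmax(raw):
    e = np.exp(raw - raw.max())
    return e / e.sum()

raw = np.array([0.0, 5.0, 5.0])   # setosa raw output is 0, the others are much larger
p = softmax(raw)
print(p)          # the setosa probability is tiny, but still greater than 0
print(p.sum())    # the three probabilities still sum to 1
```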
@ariq_baze4725
@ariq_baze4725 2 жыл бұрын
Thank you, you are the best
@statquest
@statquest 2 жыл бұрын
Thanks!
@yuewang3962
@yuewang3962 3 жыл бұрын
Caught a fresh one
@statquest
@statquest 3 жыл бұрын
:)
@АлександраРыбинская-п3л
@АлександраРыбинская-п3л Жыл бұрын
Dear Josh, I adore your lessons! They make everything so clear! I have a small question regarding this video. Why do you say that the predicted species is setosa when the predicted probability for setosa is only 0.15 (17:13 - 17:20)? There is a larger value (0.46) for virginica in this case (17:14). Why don't we say it's virginica?
@statquest
@statquest Жыл бұрын
You are correct that virginica has the largest output value - however, because we know that the first row of data is for setosa, for that row we are only interested in the predicted probability for setosa. This gives us the "loss" for that first row (the difference between the known value for setosa, 1, and the predicted value for setosa, 0.1, except that in this case we're using logs). For the second row, the known value is virginica, so, for that row, we are only interested in the predicted probability for virginica.
@АлександраРыбинская-п3л
@АлександраРыбинская-п3л Жыл бұрын
Thanks@@statquest
@marahakermi-nt7lc
@marahakermi-nt7lc 3 ай бұрын
Hey Josh, I think there is a mistake in the video at 18:54. If the predicted value is setosa, shouldn't the corresponding raw output for setosa, and also its probability, be the biggest?
@statquest
@statquest 3 ай бұрын
The video is correct. At that time point the weights in the model are not yet fully trained - so the predictions are not great, as you can see. The goal of this example is to use backpropagation to improve the predictions.
@marahakermi-nt7lc
@marahakermi-nt7lc 3 ай бұрын
@@statquest I'm sorry Josh, my bad. You are brilliant, man. BAAAAAM
@wuzecorporation6441
@wuzecorporation6441 Жыл бұрын
18:04 Why are we taking the sum of the gradient of the cross entropy across different data points? Wouldn't it be better to take the gradient for one data point, do backpropagation, and then take the gradient of another data point and do backpropagation again?
@statquest
@statquest Жыл бұрын
You can certainly do backpropagation using one data point at a time. However, in practice, it's usually much more efficient to do it in batches, which is what we do here.
@sanjanamishra3684
@sanjanamishra3684 11 ай бұрын
@@statquest Thanks for the great series! I had a similar doubt about this. I understand the point of processing in batches and taking a batch-wise loss, but what I can't wrap my head around is why we need data points covering all three categories, i.e., setosa, virginica, and versicolor. Does this mean that in practice we have to ensure that each batch covers all the classes, like a classic class-imbalance problem? I had thought that handling class imbalance in the overall dataset was enough. Please clarify this, thanks!
@statquest
@statquest 11 ай бұрын
@@sanjanamishra3684 Who said you needed data points that predict all 3 species?
@nonalcoho
@nonalcoho 3 жыл бұрын
It is really easy to understand even though I am not good at calculus. And I got the answer to the question I asked you in the last video about the meaning of the derivative of softmax. I am really happy! By the way, will you make more programming lessons like the ones you made before? Thank you very much!
@statquest
@statquest 3 жыл бұрын
I hope to do a "hands on" webinar for neural networks soon.
@nonalcoho
@nonalcoho 3 жыл бұрын
@@statquest looking forward to it!
@andredahlinger6943
@andredahlinger6943 2 жыл бұрын
Hey Josh, awesome videos
@statquest
@statquest 2 жыл бұрын
I think the idea is to optimize for whatever your output ultimately ends up being.
@zahari_s_stoyanov
@zahari_s_stoyanov 2 жыл бұрын
I think he said that this optimization is done instead of, not after, SSR. Rather than calculating SSR and dSSR, we go a step further by using softmax, then calculate CE and dCE, which puts the final answers between 0.0 and 1.0 and also provides simpler calculations for backprop :)
@ecotrix132
@ecotrix132 11 ай бұрын
Thanks so much for posting these videos! I am curious about this: while using gradient descent with SSR one could get stuck at a local minimum. One shouldn't face this problem with cross entropy, right?
@statquest
@statquest 11 ай бұрын
No, you can always get stuck in a local minimum.
@Tapsthequant
@Tapsthequant 3 жыл бұрын
So much gold in this one video. How did you select the learning rate of 1? In general, how do you select learning rates? Do you have ways to dynamically alter the learning rate in gradient descent? Taking recommendations.
@statquest
@statquest 3 жыл бұрын
For this video I coded everything by hand and setting the learning rate to 1 worked fine and was super easy. However, in general, most implementations of gradient descent will dynamically change the learning rate for you - so it should not be something you have to worry about in practice.
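As a tiny illustration of the update rule being discussed (the step size is the derivative times the learning rate), here is a sketch using a learning rate of 1; the starting bias and derivative values are made up, chosen so the result matches the kind of b3 = -1.23 update mentioned in an earlier comment.

```python
learning_rate = 1.0    # the value used in the video
b3 = 0.0               # hypothetical current value of the bias
d_ce_db3 = 1.23        # hypothetical summed derivative for the whole batch

step_size = d_ce_db3 * learning_rate
b3 = b3 - step_size    # gradient descent steps against the derivative
print("new b3:", b3)   # -1.23
```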
@Tapsthequant
@Tapsthequant 3 жыл бұрын
Thank you 😊, you know I have been following this series and taking notes. I literally have a notebook. I also have Excel workbooks with implementations of the examples. I'm now at this video on CE, taking notes again. This is the softest landing I have ever had into a subject. Thank you 😊. Now how do I take this subject of neural networks further after this series? I am learning informally. Thank you, Josh Starmer.
@statquest
@statquest 3 жыл бұрын
@@Tapsthequant I think the next step is to learn about RNNs and LSTMs (types of neural networks). I'll have videos on those soon.
@ferdinandwehle2165
@ferdinandwehle2165 2 жыл бұрын
Hello Josh, your videos inspired me so much that I am trying to replicate the classification of the iris dataset. For my understanding, are the following statements true: 1) The weights between the blue/orange nodes and the three categorization outputs are calculated in the same fashion as the biases (B3, B4, B5) in the video, as there is only one chain rule “path”. 2) For weights and biases before the nodes there are multiple chain rule differentiation “paths” to the output: e.g. W1 can be linked to the output Setosa via the blue node, but could also be linked to the output Versicolour via the orange node; the path is irrelevant as long as the correct derivatives are used (especially concerning the SoftMax function). 3) Hence, this chain rule path is correct given a Setosa input: dCEsetosa/dW1 = (dCEsetosa/d”Psetosa”) x (d”Psetosa”/dRAWsetosa) x (dRAWsetosa/dY1) x (dY1/dX1) x (dX1/dW1) Thank you very much for your assistance and the more than helpful video. Ferdinand
@statquest
@statquest 2 жыл бұрын
I wish I had time to think about your question - but today is crazy busy so, unfortunately I can't help you. :(
@ferdinandwehle2165
@ferdinandwehle2165 2 жыл бұрын
@@statquest No worries. The essence of the question is: how to optimize W1? Maybe you could have a think about it on a calmer day (:
@statquest
@statquest 2 жыл бұрын
@@ferdinandwehle2165 Regardless of the details, I think you are on the right track. The w1 can be influenced by a lot more than b3 is.
@michaelyang3414
@michaelyang3414 5 ай бұрын
Excellent work!!! Could you make one more video showing how to optimize all of the parameters at the same time?
@statquest
@statquest 5 ай бұрын
I show that for a simple neural network in this video: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@michaelyang3414
@michaelyang3414 5 ай бұрын
@@statquest Yes, I watched that video several times. Actually, I watched all 28 videos in your neural network/deep learning series several times. I am also a member and have bought your books. Thank you for your excellent work! But that video is just for one input and one output. Would you make another video to show how to handle multiple inputs and outputs, similar to the video you recommended?
@statquest
@statquest 5 ай бұрын
@@michaelyang3414 Thank you very much for your support! I really appreciate it. I'll keep that topic in mind.
@praveerparmar8157
@praveerparmar8157 3 жыл бұрын
Waiting for "Neural Networks in Python: from Start to Finish" :)
@statquest
@statquest 3 жыл бұрын
I'll start working on that soon.
@xian2708
@xian2708 3 жыл бұрын
Legend!
@مهیارجهانینسب
@مهیارجهانینسب 2 жыл бұрын
Awesome video. I really appreciate how you explain all these concepts in a fun way. I have a question: in the previous video, for softmax, you said the predicted probabilities for the classes are not reliable, even though they correctly classify the input data, because of our random initial values for the weights and biases. Now, by using cross entropy, we basically multiply the observed probability in the dataset by log p and then optimize it. So is the value of the predicted probabilities for the different classes of an input reliable now?
@statquest
@statquest 2 жыл бұрын
To be clear, I didn't say that the output from softmax was not reliable, I just said that it should not be treated as a "probability" when interpreting the output.
@MADaniel717
@MADaniel717 3 жыл бұрын
If I want to find biases of other nodes, I just do the derivative with respect to them? What about the weights? Just became a member, you convinced me with these videos lol, congrats and thanks
@statquest
@statquest 3 жыл бұрын
Wow! Thank you for your support. For a demo of backpropagation, we start with one bias: kzbin.info/www/bejne/f3-ViaB4na5_qpY then we extend that to one bias and 2 weights: kzbin.info/www/bejne/n6rRY62adrGcn5o then we extend that to all biases and weights: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@MADaniel717
@MADaniel717 3 жыл бұрын
@@statquest Thanks Josh! Maybe I missed it. I meant the hidden layers' weights and biases.
@statquest
@statquest 3 жыл бұрын
@@MADaniel717 Yes, those are covered in the links I provided in the last comment.
@aritahalder9397
@aritahalder9397 2 жыл бұрын
Hi, do we have to treat the inputs as batches of setosa, versicolor, and virginica? What if, while calculating the derivative of the total CE, we had setosa in both the 1st and the 2nd rows? What would the value of dCE(pred2)/db3 be?
@statquest
@statquest 2 жыл бұрын
We don't have to consider batches - we should be able to add up the losses from each sample for setosa.
@evilone1351
@evilone1351 2 жыл бұрын
Excellent series! Enjoyed every one of them so far, but that's the one where I lost it :) Too many subscripts and quotes in formulas.. Math has been abstracted too much here I guess, sometimes just a formula makes it easier to comprehend :D
@statquest
@statquest 2 жыл бұрын
noted
@minerodo
@minerodo Жыл бұрын
Thank you!! I understood everything, but just one question: here you explain how to modify a single bias, and now I understand how to do it for each of the biases. My question is how do you backpropagate to the biases that are in the hidden layer? At what point? After you finish with b3, b4 and b5? Thanks!!
@statquest
@statquest Жыл бұрын
I show how to backpropagate through the hidden layer in this video: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@sonoVR
@sonoVR Жыл бұрын
This is really helpful! So am I right to assume that, in the end, when using one-hot encoding we can simplify it to d/dBn = Pn - Tn and d/dWni = (Pn - Tn)Xi? Here n indexes the outputs, P is the prediction, T is the one-hot encoded target, i indexes the inputs, Wni is the weight from input i to output n, and X is the input. Then, when backpropagating, we can transpose the weights, multiply them by the respective error Pn - Tn in the output layer, and sum to get an error for each hidden node, if I'm correct.
@statquest
@statquest Жыл бұрын
For the Weight, things are a little more complicated because the input is modified by previous weights and biases and the activation function. For more details, see: kzbin.info/www/bejne/n6rRY62adrGcn5o
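For readers following this exchange, here is a hedged sketch of the one-hot simplification for the output layer only (the hidden-node outputs and probabilities are invented): with softmax plus cross entropy, the derivative with respect to each output bias is p - t, and with respect to each hidden-to-output weight it is the hidden-node output times (p - t). As the reply notes, weights deeper in the network pick up extra chain-rule terms that this sketch does not cover.

```python
import numpy as np

# hypothetical values for one training row
p = np.array([0.15, 0.46, 0.39])   # softmax output for setosa, virginica, versicolor
t = np.array([1.0, 0.0, 0.0])      # one-hot target: this row is setosa
y = np.array([0.25, 0.80])         # outputs of the two hidden nodes

grad_b = p - t                     # dCE/d(bias) for each of the 3 output biases
grad_W = np.outer(y, p - t)        # dCE/d(weight) for each hidden-to-output weight

print(grad_b)
print(grad_W)   # a 2x3 matrix: one derivative per hidden-to-output weight
```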
@a909ym0u7
@a909ym0u7 Ай бұрын
I have a question: when we optimized b3 we kept b4 and b5 constant, so when we then try to optimize b4 and b5, does that affect b3 in turn, since their values are now changing?
@statquest
@statquest Ай бұрын
Regardless of the number of parameters you are estimating, you evaluate the derivatives with the current state of the neural network before updating all of the parameters. For more details, see: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@콘충이
@콘충이 3 жыл бұрын
Appreciated it so much!
@statquest
@statquest 3 жыл бұрын
bam! :)
@Xayuap
@Xayuap Жыл бұрын
Yo, Josh, in my example with two outputs, if I repeatedly adjust one b, then the other b needs almost no adjustment. Should I adjust both in parallel?
@statquest
@statquest Жыл бұрын
Yes
@user-rt6wc9vt1p
@user-rt6wc9vt1p 3 жыл бұрын
Is the process for calculating derivatives with respect to the weights and biases the same for each layer we backpropagate through? Or would the derivative chain be made up of more parts for certain layers?
@statquest
@statquest 3 жыл бұрын
If each layer is the same, then the process is the same.
@user-rt6wc9vt1p
@user-rt6wc9vt1p 3 жыл бұрын
great, thanks!
@muntedme203
@muntedme203 2 жыл бұрын
Awesome vid.
@statquest
@statquest 2 жыл бұрын
Thank you!
@hisyamzayd
@hisyamzayd 3 жыл бұрын
Thank you so much, Mr. Josh. I wish I had had this back when I first learned neural networks. Let me ask a question: does cross entropy require batch processing, i.e., multiple rows of data for each training step? Thank you
@statquest
@statquest 3 жыл бұрын
I don't think it requires batch processing.
@Waffano
@Waffano 2 жыл бұрын
Thanks for all these great videos Josh. They are a great resource for my thesis writing! I have a question about the intuition behind all this: intuitively it really doesn't make sense to me why we need to include the error for virginica and versicolor when we are trying to optimize a value that only affects setosa. Would a correct intuition be: it is because they "indirectly" indicate how good the setosa predictions are? In other words, because of SoftMax, we will always get a probability for setosa no matter what input we use? And then we might as well use all the data, since more data = better models? Hope I didn't miss anything in the video that explains this!
@statquest
@statquest 2 жыл бұрын
To be honest, I'm not exactly sure what time point (minutes and seconds) in the video you are asking about. However, in the examples we are solving for the derivatives with respect to the bias b3, which only affects the output value for setosa. We want that output value to be very high when we are classifying samples that are known to be setosa, and we want it to be very low when we are classifying samples that are known to be some other species. And, because we want it high in one case and low in all the others, we need to take all cases into account.
@Waffano
@Waffano 2 жыл бұрын
@@statquest Thank you very much!
@zedchouZ2ed
@zedchouZ2ed 2 ай бұрын
At the end of this video, the backpropagation algorithm uses batch gradient descent to update b3, which means using the whole dataset to update one weight or bias. If we only used one sample it would be SGD, and if we had more data and split it into mini-batches that we fed in one by one, it would be mini-batch gradient descent. Am I right about this training strategy?
@statquest
@statquest 2 ай бұрын
yep
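A small sketch of the three strategies compared in this exchange; the data and the gradient function are stand-ins, since the point is only when the parameter update happens.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4))      # 500 hypothetical rows, 4 features each

def gradient_of_loss(batch):
    # stand-in for "sum the derivatives over every row in this batch"
    return batch.sum(axis=0)

def update(params, grad, lr=0.1):
    return params - lr * grad         # one gradient descent step

params = np.zeros(4)

# batch gradient descent: one update per pass, using all 500 rows
params = update(params, gradient_of_loss(data))

# mini-batch gradient descent: e.g. 10 batches of 50 rows, one update per batch
for batch in np.array_split(data, 10):
    params = update(params, gradient_of_loss(batch))

# stochastic gradient descent: one update per individual row
for row in data:
    params = update(params, gradient_of_loss(row[np.newaxis, :]))
```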
@beshosamir8978
@beshosamir8978 2 жыл бұрын
Hi Josh, I have a quick question. I saw a video on KZbin where the presenter said they use a sigmoid function in the output layer for binary classification and ReLU for the hidden layers. So I think we run into the same problem here, which is that the gradient of the sigmoid function is too small, which makes us end up taking small steps. So I thought we could also use cross entropy in this situation, right?
@statquest
@statquest 2 жыл бұрын
I'm not sure I fully understand your question; any time you have more than one category, you can use cross entropy.
@beshosamir8978
@beshosamir8978 2 жыл бұрын
@@statquest I mean, can I use cross entropy for binary classification?
@statquest
@statquest 2 жыл бұрын
@@beshosamir8978 Yes.
@beshosamir8978
@beshosamir8978 2 жыл бұрын
@@statquest So, is it smart to use it in a binary classification problem? Or is it better to just use a sigmoid function in the output layer?
@lokeshbansal2726
@lokeshbansal2726 3 жыл бұрын
Thank you so much! You are making some amazing content. Can you please suggest a good book on neural networks in which the mathematics of the algorithms is explained, or can you please tell us where you learned about machine learning and neural networks? Again, thank you for these precious videos.
@statquest
@statquest 3 жыл бұрын
Here's where I learned about the math behind cross entropy: www.mldawn.com/back-propagation-with-cross-entropy-and-softmax/ (by the way, I didn't watch the video - I just read the web page).
@sachinK-k5q
@sachinK-k5q 9 ай бұрын
Please create a similar series for the single-layer perceptron as well, and show the derivatives too.
@statquest
@statquest 9 ай бұрын
I'll keep that in mind.
@environmentalchemist1812
@environmentalchemist1812 3 жыл бұрын
Some topic suggestions: Could you go over the distinction between PCA and Factor Analysis, and describe the different factor rotations (orthogonal vs oblique, varimax, quartimax, equimax, oblimin, etc)?
@statquest
@statquest 3 жыл бұрын
I'll keep that in mind.
@grankoczsk
@grankoczsk 3 жыл бұрын
Thank you so much
@statquest
@statquest 3 жыл бұрын
Thanks!
@user-rt6wc9vt1p
@user-rt6wc9vt1p 3 жыл бұрын
Are we calculating the derivative of the total cost function (e.g., -log(a) - log(b) - log(c)), or just the loss for that respective weight's output?
@statquest
@statquest 3 жыл бұрын
We are calculating the derivative of the total cross entropy with respect to the bias, b3.
@shark-p4o
@shark-p4o 11 ай бұрын
what's the difference between Softplus and Softmax ? Is it only about the softness of the toilet paper ? 🤣🤣🤣 just kidding, you do an awesome job, your videos are way above everybody else in ML / DL
@statquest
@statquest 11 ай бұрын
Thank you very much!
@lancelofjohn6995
@lancelofjohn6995 3 жыл бұрын
Bam, this is a nice video.
@statquest
@statquest 3 жыл бұрын
Thank you! :)
@neelkamal3357
@neelkamal3357 Ай бұрын
thanks a lot sir
@statquest
@statquest Ай бұрын
:)
@danielsimion3021
@danielsimion3021 4 ай бұрын
What about the derivatives for the inner weights like w1 or w2, before the ReLU function? Because, for example, w1 affects all 3 raw output values, unlike b3, which affects only the first raw output.
@statquest
@statquest 4 ай бұрын
See: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@danielsimion3021
@danielsimion3021 4 ай бұрын
@@statquest Thanks for your answer, I've already seen that video; my problem is that w1 affects all 3 raw outputs, so when you take the derivative of a predicted probability with respect to a raw output, which raw output should you use: setosa, virginica, or versicolor? Whichever you choose, you get back to w1, because the setosa, virginica, and versicolor raw outputs all have w1 in their expressions.
@statquest
@statquest 4 ай бұрын
@@danielsimion3021 You use them all.
@danielsimion3021
@danielsimion3021 4 ай бұрын
@@statquest OK; I did it with pen and paper and finally understood it. Thank you very much.
@statquest
@statquest 4 ай бұрын
@@danielsimion3021 bam! :)
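To spell out the "use them all" answer from this exchange, here is a hedged sketch for a simplified network with a single hidden node using a softplus activation (the example in the video has two hidden nodes); every parameter and input value below is made up. The key line is the sum over all three raw outputs before chaining back through the activation to w1.

```python
import numpy as np

def softplus(x):
    return np.log(1.0 + np.exp(x))

def softplus_deriv(x):
    return 1.0 / (1.0 + np.exp(-x))   # the derivative of softplus is the sigmoid

def softmax(raw):
    e = np.exp(raw - raw.max())
    return e / e.sum()

# hypothetical parameters: 2 inputs, 1 hidden node, 3 raw outputs
w_in  = np.array([0.5, -0.3])          # weights from the 2 inputs to the hidden node (w1, w2)
b_in  = 0.1                            # hidden-node bias
w_out = np.array([1.2, -0.8, 0.4])     # weights from the hidden node to the 3 raw outputs
b_out = np.array([0.0, 0.0, 0.0])      # output biases (b3, b4, b5)

inputs = np.array([0.04, 0.42])        # hypothetical petal and sepal widths
target = np.array([1.0, 0.0, 0.0])     # one-hot: this row is setosa

# forward pass
x = w_in @ inputs + b_in
y = softplus(x)                        # hidden-node output
raw = y * w_out + b_out                # the 3 raw output values
p = softmax(raw)

# backward pass: every raw output depends on w1, so we sum over all of them
d_raw = p - target                     # dCE/d(raw_j) for softmax + cross entropy
d_y   = np.sum(d_raw * w_out)          # dCE/dy, summed over the 3 raw outputs
d_w1  = d_y * softplus_deriv(x) * inputs[0]   # chain rule back to w1
print("dCE/dw1 =", d_w1)
```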
@jaheimwoo866
@jaheimwoo866 Жыл бұрын
Save my university life!
@statquest
@statquest Жыл бұрын
bam!
@_epe2590
@_epe2590 3 жыл бұрын
Please could you do videos on classification, specifically gradient descent for classification?
@statquest
@statquest 3 жыл бұрын
Can you explain how that would be different from what is in this video? In this video, we use gradient descent to optimize the bias term. In neural network circles, they call this "backpropagation" because of how the derivatives are calculated, but it is still just gradient descent.
@_epe2590
@_epe2590 3 жыл бұрын
@@statquest Well, when I see others explaining it, it's usually with a 3-dimensional non-linear graph. When you demo it, the graph always looks like a parabola. Am I missing something important?
@statquest
@statquest 3 жыл бұрын
@@_epe2590 When I demo it, I try to make it as simple as possible by focusing on just one variable at a time. When you do that, you can often draw the loss function as a parabola. However, when you focus on more than one variable, the graphs get much more complicated.
@_epe2590
@_epe2590 3 жыл бұрын
@@statquest Ok. And I love your videos, by the way. They are easy to understand and absorb. BAM!
@dr.osamahabdullah1390
@dr.osamahabdullah1390 3 жыл бұрын
Is there any chance you could talk about deep learning or compressive sensing, please? Your videos are so awesome.
@statquest
@statquest 3 жыл бұрын
Deep learning is a pretty vague term. For some, deep learning just means a neural network with 3 or more hidden layers. For others, deep learning refers to a convolutional neural network. I explain CNNs in this video: kzbin.info/www/bejne/fnjac4t6gKueb6s
@kamshwuchin6907
@kamshwuchin6907 3 жыл бұрын
Thank you for the effort you put into making these amazing videos!! They help me a lot in visualising the concepts. Can you make a video about information gain too? Thank you!!
@statquest
@statquest 3 жыл бұрын
I'll keep that in mind.
@raminmdn
@raminmdn 3 жыл бұрын
@@statquest I think videos on the general concepts of information theory (such as information gain) would be greatly beneficial for many, many people out there, and a very nice addition to the machine learning series. I have not been able to find videos as comprehensive (and at the same time clearly explained) as yours anywhere on KZbin or in online courses, especially when it comes to concepts that usually seem so complicated.
@ΓάκηςΓεώργιος
@ΓάκηςΓεώργιος 3 жыл бұрын
Nice video! I only have one question: how do I do it when there are more than 3 data points (for example, n for setosa, m for virginica, k for versicolor)?
@statquest
@statquest 3 жыл бұрын
You just run all the data through the neural network, as shown at 17:04, to calculate the cross entropy etc.
@ΓάκηςΓεώργιος
@ΓάκηςΓεώργιος 3 жыл бұрын
Thank you a lot for your help Josh
@Xayuap
@Xayuap Жыл бұрын
Hi, serious question: can I do the same with the final w weights? Something is not converging in my tests.
@statquest
@statquest Жыл бұрын
What time point, minutes and seconds, are you asking about?
@Xayuap
@Xayuap Жыл бұрын
@@statquest I mean the cross entropy adjustment for the b bias. Can I do the same for the w weights? I understand the cross entropy derivatives with respect to the final weights to be dCE/dWyi = (Psetosa - 1) × Yi when the observed species is setosa, and dCE/dWyi = Psetosa × Yi otherwise, where Yi is the Y output of the previous node.
@statquest
@statquest Жыл бұрын
@@Xayuap I believe that is correct.
@Xayuap
@Xayuap Жыл бұрын
Thanks. Well, if that is correct then maybe my weights are off; when I try to adjust both Ws, the derivatives converge to integer numbers other than 0. I'm not adjusting the b biases, only the final Ws.
@rhn122
@rhn122 3 жыл бұрын
Hey cool video, though I actually haven't fully watched your neural network playlists, just want to keep things simple with traditional statistics for now hehe! But I want to ask you about all these steps and formulas, do you actually always have in mind all of these methods and calculations, or only keep the essential parts and their ups & downs when actually solving practical problems? Because I love statistics, but can never fully commit myself to be in one with the calculation steps. I watched your videos to understand the under the hood process, but only keep the essential parts like why it works and its pitfalls, and leaving behind all the calculation tricks.
@rhn122
@rhn122 3 жыл бұрын
As a note, I think understanding the process is crucial to fully understand its strengths and weaknesses, but for the actual formula most of the time if it's too complicated I'll just delegate it to the computer to be processed
@statquest
@statquest 3 жыл бұрын
It's perfectly fine to ignore the details and just focus on the main ideas.
@tulikashrivastava2905
@tulikashrivastava2905 3 жыл бұрын
Thanks for posting the NN video series. It was just in time, when I needed it 😊 You have a knack for splitting complex topics into logical parts and explaining them like a breeze 😀😀 Can I request that you share some videos on gradient descent optimization and regularization?
@statquest
@statquest 3 жыл бұрын
I have two videos on Gradient Descent and five on Regularization. You can find all of my videos here: statquest.org/video-index/
@tulikashrivastava2905
@tulikashrivastava2905 3 жыл бұрын
@@statquest Thanks for your quick reply! I have seen those videos and they are great as usual 👍👍 I was asking about gradient descent optimization for deep networks, like Momentum, NAG, Adagrad, Adadelta, RMSProp, and Adam, and regularization techniques for deep networks like weight decay, dropout, early stopping, data augmentation, and batch normalization.
@statquest
@statquest 3 жыл бұрын
@@tulikashrivastava2905 Noted.