The full Neural Networks playlist, from the basics to deep learning, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@bigbangdata3 жыл бұрын
Your talent for explaining these difficult concepts and organizing the topics in didactic, bite-sized, and visually compelling videos is astounding. Your channel is a great resource for beginners and advanced practitioners who need a refresher on a particular concept. Thank you for all that you do!
@statquest3 жыл бұрын
Wow, thank you!
@Rationalist-Forever2 жыл бұрын
Right now I am reading the ML book "An Introduction to Statistical Learning" by James, Witten, Hastie and Tibshirani. Many times I got stuck on the mathematical details, couldn't follow them, and stopped reading. I love that book a lot, but I felt frustrated. Now I use your videos and read the book side by side, and everything in the book starts making sense. You are such a great storyteller. The way you explain things in the videos with examples, it feels like I am listening to a story: "Once upon a time there was a king..." It is so soothing, and complex topics become easy. I feel you are my friend and teacher on my ML journey, someone who understands my pain and explains the hard things with ease. BTW, I did a Master's in Data Science at Northwestern University and got a good ML foundation from that course, but I can tell you I only feel complete now, after going through most of your videos. Mr. Starmer, we are lucky to have you as such a great teacher and mentor. You are gifted at teaching people. I pledge to support your channel from my heart. Thank you.
@statquest2 жыл бұрын
Wow! Thank you very much!!! :)
@wennie29393 жыл бұрын
Josh Starmer is THE BEST! I really appreciate your patience in explaining the concepts step-by-step!
@statquest3 жыл бұрын
Thank you very much! :)
@naf7540 Жыл бұрын
Dear Josh, how is it at all possible to deconstruct so clearly all these concepts, just incredible, thank you very much, your videos are addictive!!
@statquest Жыл бұрын
Thank you very much! :)
@RubenMartinezCuella3 жыл бұрын
Even though there are many other youtube channels that also explain NN, your videos are unique in the sense that you break down every single process into small operations easy to understand by anyone. Keep up the great work Josh, everyone here appreciates so much your effort!! :D
@statquest3 жыл бұрын
Thank you very much! :)
@simhasankar3311 Жыл бұрын
Imagine the leaps and bounds we could achieve in global education if this teaching method was implemented universally. We would have a plethora of students equipped with the analytical skills to tackle complex issues. Your contributions are invaluable. Thank you!
@statquest Жыл бұрын
Thank you so much!
@iZapz983 жыл бұрын
all your videos have helped me tremendously studying for my ML - exam, thank you
@statquest3 жыл бұрын
Great to hear!
@positive_freedom2 жыл бұрын
Your videos are truly astounding. I've gone through so many youtube playlists looking to understand Neural Networks, and none of them can come close to yours in terms of simplicity & content! Please keep up this amazing work for beginners like me :)
@statquest2 жыл бұрын
Glad you like them!
@YLprime10 ай бұрын
This channel is awesome, my deep learning knowledge is skyrocketing every day.
@statquest10 ай бұрын
bam!
@Lucas-Camargos Жыл бұрын
This is the best Neural Networks example video I've ever seen.
@statquest Жыл бұрын
Thank you very much! :)
@AbdulWahab-mp4vn Жыл бұрын
WOW! I have never seen anyone explain topics in such minute detail. You are an angel to us data science students! Love from Pakistan
@statquest Жыл бұрын
Thank you very much!
@abhishekjadia17032 жыл бұрын
Incredible !! ...You are not teaching, You are revealing !!
@statquest2 жыл бұрын
Wow, thank you!
@salahaldeen17512 жыл бұрын
I don't know where else I could have understood this the way I do now. Thanks, you're talented!!!
@statquest2 жыл бұрын
Thanks!
@vishnukumar45312 жыл бұрын
0 comments left unreplied! Josh, you are truly one of a kind! ❣❣❣
@statquest2 жыл бұрын
Thanks!
@vishnukumar45312 жыл бұрын
@@statquest TRIPLE BAAAM!❤❤
@anisrabahbekhoukhe3652 Жыл бұрын
I literally can't stop watching these vids, help me
@statquest Жыл бұрын
bam! :)
@farrukhzamir8 ай бұрын
Brilliantly explained. You explain the concept in such a manner that it becomes very easy to understand. God bless you. I don't know how to thank you really. Nobody explains like you.❤
@statquest8 ай бұрын
Thank you!
@saurabhdeshmane8714 Жыл бұрын
Incredibly done... it doesn't even feel like we are learning such complex topics... it keeps me engaged enough to go through the entire playlist. Thank you for such content!!
@statquest Жыл бұрын
Glad you liked it!
@yourfavouritebubbletea5683 Жыл бұрын
Incredibly well done. I'm astonished and thank you for letting me not have a traumatic start with ML
@statquest Жыл бұрын
Thank you! :)
@ligezhang4735 Жыл бұрын
This is so impressive! Especially for the visualization of the whole process. It really makes things very easy and clear!
@statquest Жыл бұрын
Thank you!
@susmitvengurlekar2 жыл бұрын
"I want to remind you" helped me understand why in the world is P(setosa) involved in output of versicolor and virginica. Great explanation!
@statquest2 жыл бұрын
Hooray!!! I'm glad the video was helpful.
@samerrkhann3 жыл бұрын
A huge appreciation for all the efforts you put. Thank you josh!
@statquest3 жыл бұрын
Thank you! :)
@tejaspatil39782 жыл бұрын
Your way of teaching is on the next level. Thanks for giving us these great sessions.
@statquest2 жыл бұрын
Thank you!
@johannesweber94106 ай бұрын
Nice video! At first I was a little confused (like always), but then I plugged your values and the exact structure of your neural network into my own small framework and compared the results. After I did this, I followed your instructions and implemented the backpropagation step by step. Thanks for the nice video!
@statquest6 ай бұрын
BAM!
@nabeelhasan65933 жыл бұрын
Finally, I am really thankful for all the hard work you put into these videos; they immensely helped me build a strong foundation in deep learning.
@statquest3 жыл бұрын
Thank you very much! :)
@pietrucc13 жыл бұрын
I started using machine learning techniques a little less than a month ago. I found this channel and it has helped me a lot, thank you very much!!
@statquest3 жыл бұрын
Thank you!
@rajpulapakura001 Жыл бұрын
Clearly and concisely explained! Thanks Josh! P.S. If you know your calculus, I would highly recommend trying to compute the derivatives yourself before seeing the solution - it helps a lot!
@statquest Жыл бұрын
bam! :)
@Meditator803 жыл бұрын
Thank you so much! It is so clear for explaining the calculation of Cross Entropy Derivative and how to use it in BP
@statquest3 жыл бұрын
Thank you very much! :)
@donfeto7636 Жыл бұрын
You are a national treasure, BAAAM. Keep making these videos, they are great.
@statquest Жыл бұрын
Thank you!
@sergeyryabov220011 ай бұрын
Thanks!
@statquest11 ай бұрын
TRIPLE BAM!!! Thank you so much for supporting StatQuest!!! :)
@chethanjjj3 жыл бұрын
@18:20 is what I've been looking for for a while. Thank you!
@statquest3 жыл бұрын
Bam! :)
@GamTinjintJiang2 жыл бұрын
Wow~ your videos are so intuitive to me. What a precious resource!
@statquest2 жыл бұрын
Thanks!
@Recordingization3 жыл бұрын
Thanks for the nice lecture! I finally understand the derivative of cross entropy and the optimization of the bias.
@statquest3 жыл бұрын
bam!
@RC4boumboum2 жыл бұрын
Your courses are so good! Thanks a lot for your time :)
@statquest2 жыл бұрын
You're very welcome!
@KayYesYouTuber Жыл бұрын
So beautiful. Never seen anything like this!!!
@statquest Жыл бұрын
Thank you!
@arielcohen2280 Жыл бұрын
I hate all the songs and the meaningless sound effects, but damn, I have been trying to understand this concept for a hell of a long time and you made it clear
@statquest Жыл бұрын
Noted!
@susmitvengurlekar2 жыл бұрын
There is nothing wrong with self-promotion and, frankly, you don't need promotion. Anyone who watches any one of your videos will prefer your videos over any others henceforth.
@statquest2 жыл бұрын
Wow! Thank you!
@bonadio603 жыл бұрын
Your explanation is fantastic!! Thanks
@statquest3 жыл бұрын
Thank you! :)
@GLORYWAVE.10 ай бұрын
Thanks Josh for an incredibly well put together video. I have two quick questions: 1) When you initially get that new b3 value of -1.23, and then say to repeat the process, I am assuming the process is repeated with a new 'batch' of 3 training samples, correct? i.e. you wouldn't use the same 3 that were just used? 2) Are these multi-classification models always structured in such a way that each 'batch' or 'iteration' includes 1 actual observed sample from each class like in this example? It appears that the Total Cross Entropy calculation and derivatives would not make sense otherwise. Thanks again!
@statquest10 ай бұрын
1) In this case, the 3 samples are all the data we have, so we reuse them for every iteration. If we had more data, we might have different samples in different batches, but we would eventually reuse these samples at some later iteration. 2) No. You just add up the cross entropy, regardless of how the samples are distributed, to get the total.
@jamasica58393 жыл бұрын
This is even more bonkers than Backpropagation Details Pt. 2 :O
@statquest3 жыл бұрын
double bam! :)
@charliemcgowan5983 жыл бұрын
Thank you so much for all your videos, they're actually amazing!
@statquest3 жыл бұрын
Glad you like them!
@samore11 Жыл бұрын
These videos are so good - the explanations and quality of production are elite. My only nitpick is that it is hard for me to see "x" and not think of the letter "x" rather than a multiplication sign - but that's a small nitpick.
@statquest Жыл бұрын
After years of using confusing 'x's in my videos, I've finally figured out how to get a proper multiplication sign.
@shreeshdhavle253 жыл бұрын
Finally! I was waiting so long for a new video...
@statquest3 жыл бұрын
Thanks!
@shreeshdhavle253 жыл бұрын
@@statquest Thanks to you, Josh..... Best content in the whole world.... Also, thanks to you and your content, I am working at Deloitte now.
@statquest3 жыл бұрын
@@shreeshdhavle25 Wow! That is awesome news! Congratulations!!!
@gabrielsantos193 ай бұрын
Thank you, Josh! 👍
@statquest3 ай бұрын
My pleasure!
@r0cketRacoon9 ай бұрын
Thank you very much for the video. Backpropagation with multiple outputs is not that hard for me, but it's really a mess when you do the computations
@statquest9 ай бұрын
Yep. The good news is that PyTorch will do all that for us.
@rahulkumarjha24042 жыл бұрын
Thank you for such an awesome video!!! I just have one doubt. At 18:12 of the video, the summation has 3 values because there are 3 items in the dataset. Let's say we have 4 items in the dataset, i.e., 2 items for setosa, 1 for virginica and 1 for versicolor. Then our summation would look like {(psetosa - 1) + (psetosa - 1) + psetosa + psetosa}, i.e., the summation runs over the rows setosadata_row1, setosadata_row2, versicolordata_row3, virginicadata_row4. Am I right?
@statquest2 жыл бұрын
yep
@rahulkumarjha24042 жыл бұрын
@@statquest Thank You!! Your entire neural network playlist is awesome.
@statquest2 жыл бұрын
@@rahulkumarjha2404 Hooray! Thank you!
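(A minimal numeric sketch of the sum discussed in this thread, for anyone who wants to see it with concrete numbers. All the predicted setosa probabilities below are made-up placeholders — in the video, each row's probability comes from running that row's petal and sepal widths through the network.)

# hypothetical predicted probability of setosa for each of 4 rows
p_setosa = [0.60, 0.55, 0.20, 0.15]
# 1 if the row really is setosa, 0 otherwise (rows 1-2 are setosa, rows 3-4 are not)
observed = [1, 1, 0, 0]

# per row, dCE/db3 = (predicted setosa probability - observed setosa value);
# the derivative of the total cross entropy is the sum over the rows
dCE_db3 = sum(p - o for p, o in zip(p_setosa, observed))
print(dCE_db3)  # (0.60-1) + (0.55-1) + 0.20 + 0.15 = -0.50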
@dianaayt9 ай бұрын
20:14 if we had a lot more training data, would we just add up the terms for all of the training data like this when doing backpropagation?
@statquest9 ай бұрын
Yes, or we can put the data into smaller "batches" and process the data with batches (so, if we had 10 batches, each with 50 samples each, we would only add up the 50 values in a batch before updating the parameters).
@r0cketRacoon9 ай бұрын
There are methods like mini-batch gradient descent and stochastic gradient descent; you should do some digging into them
@Waffano2 жыл бұрын
Watching these videos makes me wonder how in the world someone came up with this in the first place. I guess it slowly evolved from something simpler, but still, it would be cool to learn more about the history of neural networks :O If anyone knows of any documentaries or books, please do share ;)
@statquest2 жыл бұрын
A history would be nice.
@osamahabdullah37153 жыл бұрын
I really can't get enough of your videos, what an amazing way of explaining things. Thanks for sharing your knowledge with us. When is your next video coming, please?
@statquest3 жыл бұрын
My next video should come out in about 24 hours.
@osamahabdullah37153 жыл бұрын
@@statquest what a wonderful news, thank you sir
@madankhatri772711 ай бұрын
Your explanations of hard concepts are pretty amazing. I have been stuck on a very difficult concept called the Adam optimizer. Please explain it. You are my last hope.
@epiccabbage6530 Жыл бұрын
This has been extremely helpful, this series is great. I am a little confused, though, as to why we repeat the calculations for p.setosa, i.e. why we can't simply run through the calculations once and use the same p.setosa value 3 times (so, like, x-1 + x + x) for the bias recalculation. But either way this has cleared up a lot for me
@statquest Жыл бұрын
What time point, minutes and seconds, are you asking about?(unfortunately I can't remember all of the details in all of my videos)
@epiccabbage6530 Жыл бұрын
@@statquest Starting at 18:50, you go through three different observations and solve for the cross entropy. I'm curious as to why you need to look at three different observations, i.e. why you need to plug in values 3 times instead of just doing it once. If we want to solve for psetosa twice and psetosa-1 once, why do we need to do the equation three times instead of just doing it once? Why can't we just do 0.15-1 + 0.15 + 0.15?
@statquest Жыл бұрын
@@epiccabbage6530 Because each time the predictions are made using different values for the petal and sepal widths. So we take that into account for each prediction and each derivative relative to that prediction.
@epiccabbage6530 Жыл бұрын
@@statquest Right, but why do we look at multiple predictions in the context of changing the bias once? Is it just a matter of batch size?
@statquest Жыл бұрын
@@epiccabbage6530 Yes, in this example, we use the entire dataset (3 rows) as a "batch". You can either look at them all at once, or you can look at them one at a time, but either way, you end up looking at all of them.
@stan-152 жыл бұрын
Since you used 3 samples to get the values of the three cross-entropy derivatives, does this mean we must use multiple inputs for one gradient descent step when using cross entropy? (More precisely, does this mean we have to use n input samples, which together light up all n output features, in order to be able to compute the appropriate derivative of the bias, and thus in order to perform one single gradient descent step?)
@statquest2 жыл бұрын
No. You can use 1 input if you want. I just wanted to illustrated all 3 cases.
@shubhamtalks97183 жыл бұрын
BAM! Clearly explained.
@statquest3 жыл бұрын
Thanks!
@hangchen10 ай бұрын
Awesome explanation! Now I understand neural networks in more depth! Just one question - shouldn't the output of the softmax values sum to 1? @18:57
@statquest10 ай бұрын
Thanks! And yes, the output of the softmax should sum to 1. However, I rounded the numbers to the nearest 100th and, as a result, it appears like they don't sum to 1. This is just a rounding issue.
@hangchen10 ай бұрын
Oh got it! Right, if I add them up they come to 1.01, which is basically 1. I just eyeballed it. Should have done a quick mental calc haha! By the way, I am so honored to have your reply!! Thanks for making my day (again, BAM!)!@@statquest
@statquest10 ай бұрын
@@hangchen :)
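(For anyone who wants to verify the rounding issue, here is a quick sketch; the raw output values are made up and are not the ones in the video.)

import math

raw = [1.0, 1.0, 1.0]                    # hypothetical raw output values
exps = [math.exp(r) for r in raw]
softmax = [e / sum(exps) for e in exps]  # each value is 1/3

print(sum(softmax))                      # 1.0 (up to tiny floating point error)
print([round(p, 2) for p in softmax])    # [0.33, 0.33, 0.33] -> looks like it sums to 0.99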
@Pedritox09533 жыл бұрын
Great explanation
@statquest3 жыл бұрын
Bam! :)
@pedrojulianmirsky11532 жыл бұрын
Thank you for all your videos, you are the best! I have one question though. Let's suppose you have the worst possible fit for your model, where it predicts pSetosa = 0 for instances labeled Setosa, and pSetosa = 1 for those labeled either Virginica or Versicolor. Then, for each Setosa-labeled instance, you would get dCESetosa/db3 = pSetosa - 1 = -1, and for each non-Setosa instance dCEVersiOrVirg/db3 = pSetosa = +1. In summary, the total dCE/db3 would accumulate -1 for each Setosa instance and +1 for each non-Setosa instance. So, if you have for example a dataset with 5 Setosa, 2 Versicolor and 3 Virginica: dCE(total)/db3 = (-1-1-1-1-1) + (1+1) + (1+1+1) = -5+2+3 = 0. The total dCE/db3 would be 0, as if the model had the best fit for b3. Because of this cancellation between the opposite signs (+) and (-), the bias (b3) wouldn't be adjusted by gradient descent, even though the model classifies badly. Or maybe I misunderstood something haha. Anyway, I got into ML and DL mainly because of your videos, can't thank you enough!!!!!!!
@statquest2 жыл бұрын
To be honest, I don't think that is possible because of how the softmax function works. For example, if it was known that the sample was setosa, but the output value was 0, then we would have e^0 / (e^0 + e^versi + e^virg) = 1 / (1 + e^versi + e^virg) > 0.
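(A quick numeric check of that point; the raw output values for the other two species are made up.)

import math

raw_setosa = 0.0                 # hypothetical worst-case raw output for setosa
raw_versi, raw_virg = 2.0, 3.0   # made-up raw outputs for the other species

p_setosa = math.exp(raw_setosa) / (math.exp(raw_setosa) + math.exp(raw_versi) + math.exp(raw_virg))
print(p_setosa)                  # about 0.035 -- small, but always strictly greater than 0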
@ariq_baze47252 жыл бұрын
Thank you, you are the best
@statquest2 жыл бұрын
Thanks!
@yuewang39623 жыл бұрын
Caught a fresh one
@statquest3 жыл бұрын
:)
@АлександраРыбинская-п3л Жыл бұрын
Dear Josh, I adore your lessons! They make everything so clear! I have a small question regarding this video. Why do you say that the predicted species is setosa when the predicted probability for setosa is only 0.15 (17:13 - 17:20)? There is a larger value (0.46) for virginica in this case (17:14). Why don't we say it's virginica?
@statquest Жыл бұрын
You are correct that virginica has the largest output value - however, because we know that the first row of data is for setosa, for that row, we are only interested in the predicted probability for setosa. This gives us the "loss" (the difference between the known value for setosa, 1, and the predicted value for setosa, 0.1, except in this case we're using logs) for that first row. For the second row, the known value is virginica, so, for that row, we are only interested in the predicted probability for virginica.
@АлександраРыбинская-п3л Жыл бұрын
Thanks@@statquest
@marahakermi-nt7lc3 ай бұрын
Heyy Josh, I think there is a mistake in the video at 18:54: if the predicted value is setosa, shouldn't the corresponding raw output for setosa, and also the probability, be the biggest? Isn't that right?
@statquest3 ай бұрын
The video is correct. At that time point the weights in the model are not yet fully trained - so the predictions are not great, as you see. The goal of this example is to use backpropagation to improve the predictions.
@marahakermi-nt7lc3 ай бұрын
@@statquest I'm sorry Josh, my bad. You are a brilliant man, baaaaaam
@wuzecorporation6441 Жыл бұрын
18:04 Why are we taking the sum of the gradients of the cross entropy across different data points? Wouldn't it be better to take the gradient for one data point, do backpropagation, and then take the gradient of another data point and do backpropagation again?
@statquest Жыл бұрын
You can certainly do backpropagation using one data point at a time. However, in practice, it's usually much more efficient to do it in batches, which is what we do here.
@sanjanamishra368411 ай бұрын
@@statquest Thanks for the great series! I had a similar doubt regarding this. I understand the point of processing in batches and taking a batch-wise loss, but what I can't wrap my head around is why we need data points covering all three categories, i.e. setosa, virginica and versicolor. Does this mean that in practice we have to ensure that each batch covers all the classes, i.e. a classic data imbalance problem? I always thought that handling imbalance across the overall dataset was enough. Please clarify this, thanks!
@statquest11 ай бұрын
@@sanjanamishra3684 Who said you needed data points that predict all 3 species?
@nonalcoho3 жыл бұрын
It is really easy to understand even though I am not good at calculus. And I got the answer to the question I asked on the last video about the meaning of the derivative of softmax. I am really so happy! Btw, will you make more programming lessons like the ones you made before? Thank you very much!
@statquest3 жыл бұрын
I hope to do a "hands on" webinar for neural networks soon.
@nonalcoho3 жыл бұрын
@@statquest looking forward to it!
@andredahlinger69432 жыл бұрын
Hey Josh, awesome videos
@statquest2 жыл бұрын
I think the idea is to optimize for whatever your output ultimately ends up being.
@zahari_s_stoyanov2 жыл бұрын
I think he said that this optimization is done instead of, not after SSR. Rather than calculating SSR and dSSR , we go another step further by using softMax, then calculate CE and dCE, which puts the final answers between 0.0 and 1.0 and also provides simpler calculations for backprop :)
@ecotrix13211 ай бұрын
Thanks so much for posting these videos! I am curious about this: when using gradient descent with the SSR, one could get stuck at a local minimum. One shouldn't face this problem with cross entropy, right?
@statquest11 ай бұрын
No, you can always get stuck in a local minimum.
@Tapsthequant3 жыл бұрын
So much gold in this one video. How did you select the learning rate of 1? In general, how do you select learning rates? Do you have ways to dynamically alter the learning rate in gradient descent? Taking recommendations.
@statquest3 жыл бұрын
For this video I coded everything by hand and setting the learning rate to 1 worked fine and was super easy. However, in general, most implementations of gradient descent will dynamically change the learning rate for you - so it should not be something you have to worry about in practice.
@Tapsthequant3 жыл бұрын
Thank you 😊, you know I have been following this series and taking notes. I literally have a notebook. I also have Excel workbooks with implementations of the examples. I'm now at this video on CE, taking notes again. This is the softest landing I have ever had into a subject. Thank you 😊. Now, how do I take the subject of neural networks further after this series? I am learning informally. Thank you Josh Starmer,
@statquest3 жыл бұрын
@@Tapsthequant I think the next step is to learn about RNNs and LSTMs (types of neural networks). I'll have videos on those soon.
@ferdinandwehle21652 жыл бұрын
Hello Josh, your videos inspired me so much that I am trying to replicate the classification of the iris dataset. For my understanding, are the following statements true: 1) The weights between the blue/orange nodes and the three categorization outputs are calculated in the same fashion as the biases (B3, B4, B5) in the video, as there is only one chain rule “path”. 2) For weights and biases before the nodes there are multiple chain rule differentiation “paths” to the output: e.g. W1 can be linked to the output Setosa via the blue node, but could also be linked to the output Versicolour via the orange node; the path is irrelevant as long as the correct derivatives are used (especially concerning the SoftMax function). 3) Hence, this chain rule path is correct given a Setosa input: dCEsetosa/dW1 = (dCEsetosa/d”Psetosa”) x (d”Psetosa”/dRAWsetosa) x (dRAWsetosa/dY1) x (dY1/dX1) x (dX1/dW1) Thank you very much for your assistance and the more than helpful video. Ferdinand
@statquest2 жыл бұрын
I wish I had time to think about your question - but today is crazy busy so, unfortunately I can't help you. :(
@ferdinandwehle21652 жыл бұрын
@@statquest No worries. The essence of the question is: how to optimize W1? Maybe you could have a think about it on a calmer day (:
@statquest2 жыл бұрын
@@ferdinandwehle2165 Regardless of the details, I think you are on the right track. The w1 can be influenced by a lot more than b3 is.
@michaelyang34145 ай бұрын
Excellent work!!! Could you make one more video showing how to optimize all the parameters at the same time?
@statquest5 ай бұрын
I show that for a simple neural network in this video: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@michaelyang34145 ай бұрын
@@statquest Yes, I watched that video several times. Actually, I watched all 28 videos in your neural network/deep learning series several times. I am also a member and have bought your books. Thank you for your excellent work! But that video is just for one input and one output. Would you make another video to show how to handle multiple inputs and outputs, similar to the video you recommended?
@statquest5 ай бұрын
@@michaelyang3414 Thank you very much for your support! I really appreciate it. I'll keep that topic in mind.
@praveerparmar81573 жыл бұрын
Waiting for "Neural Networks in Python: from Start to Finish" :)
@statquest3 жыл бұрын
I'll start working on that soon.
@xian27083 жыл бұрын
Legend!
@مهیارجهانینسب2 жыл бұрын
Awesome video. I really appreciate how you explain all these concepts in a fun way. I have a question: in the previous video on softmax you said the predicted probabilities for the classes are not reliable, even though they correctly classify the input data, because of our random initial values for the weights and biases. Now, by using cross entropy, we basically multiply the observed probability in the dataset by log p and then optimize it. So is the value of the predicted probabilities for the different classes of an input reliable?
@statquest2 жыл бұрын
To be clear, I didn't say that the output from softmax was not reliable, I just said that it should not be treated as a "probability" when interpreting the output.
@MADaniel7173 жыл бұрын
If I want to find biases of other nodes, I just do the derivative with respect to them? What about the weights? Just became a member, you convinced me with these videos lol, congrats and thanks
@statquest3 жыл бұрын
Wow! Thank you for your support. For a demo of backpropagation, we start with one bias: kzbin.info/www/bejne/f3-ViaB4na5_qpY then we extend that to one bias and 2 weights: kzbin.info/www/bejne/n6rRY62adrGcn5o then we extend that to all biases and weights: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@MADaniel7173 жыл бұрын
@@statquest Thanks Josh! Maybe I missed it. I meant the hidden layers' weights and biases.
@statquest3 жыл бұрын
@@MADaniel717 Yes, those are covered in the links I provided in the last comment.
@aritahalder93972 жыл бұрын
Hi, do we have to consider the inputs as batches of setosa, versicolor and virginica?? What if, while calculating the derivative of the total CE, we had setosa in the 1st row as well as in the 2nd row?? What would the value of dCE(pred2)/db3 be?
@statquest2 жыл бұрын
We don't have to consider batches - we should be able to add up the losses from each sample for setosa.
@evilone13512 жыл бұрын
Excellent series! Enjoyed every one of them so far, but that's the one where I lost it :) Too many subscripts and quotes in formulas.. Math has been abstracted too much here I guess, sometimes just a formula makes it easier to comprehend :D
@statquest2 жыл бұрын
noted
@minerodo Жыл бұрын
Thank you!! I understood everything, but just a question: here you explain how to modify a single bias, and now I understand how to do it for each one of the biases. My question is, how do you backpropagate to the biases that are in the hidden layer? At what point? After you finish with b3, b4 and b5? Thanks!!
@statquest Жыл бұрын
I show how to backpropagate through the hidden layer in this video: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@sonoVR Жыл бұрын
This is really helpful! So am I right to assume that, in the end, when using one-hot encoding we can simplify it to d/dBn = Pn - Tn and d/dWni = (Pn - Tn)Xi? Here n indexes the outputs, P is the prediction, T is the one-hot encoded target, i indexes the inputs, Wni is the weight from that input to the respective output, and X is the input. Then, when backpropagating, we can transpose the weights, multiply them by the respective errors Pn - Tn in the output layer, and sum them to get an error for each hidden node, if I'm correct.
@statquest Жыл бұрын
For the weights, things are a little more complicated because the input is modified by previous weights and biases and the activation function. For more details, see: kzbin.info/www/bejne/n6rRY62adrGcn5o
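(A small sketch of the one-hot case for the last layer only. The numbers are made up, and 'y' stands for whatever values feed directly into the output nodes — e.g., the scaled hidden-node activations — not the raw petal/sepal inputs; gradients for earlier weights need the extra chain-rule terms mentioned above.)

# predicted softmax outputs and one-hot target for one sample (made-up numbers)
P = [0.15, 0.46, 0.39]   # predicted probabilities for setosa, versicolor, virginica
T = [1, 0, 0]            # the sample is known to be setosa
y = [0.8, 0.3]           # hypothetical values feeding into the output nodes

# gradient for each output bias: P_n - T_n
dCE_db = [p - t for p, t in zip(P, T)]

# gradient for the weight connecting feeding value y_i to output n: (P_n - T_n) * y_i
dCE_dW = [[(p - t) * y_i for y_i in y] for p, t in zip(P, T)]

print(dCE_db)   # [-0.85, 0.46, 0.39]
print(dCE_dW)   # the setosa row, for example, is approximately [-0.68, -0.255]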
@a909ym0u7Ай бұрын
I have a question: when we optimize b3 while holding b4 and b5 constant, does optimizing b4 and b5 afterwards affect b3, since their values are now changing?
@statquestАй бұрын
Regardless of the number of parameters you are estimating, you evaluate the derivatives with the current state of the neural network before updating all of the parameters. For more details, see: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@콘충이3 жыл бұрын
Appreciated it so much!
@statquest3 жыл бұрын
bam! :)
@Xayuap Жыл бұрын
Yo, Josh, in my example with two outputs, if I repeatedly adjust one b, then the other b needs almost no adjustment. Should I adjust both in parallel?
@statquest Жыл бұрын
Yes
@user-rt6wc9vt1p3 жыл бұрын
Is the process for calculating derivatives with respect to weights and biases the same for each layer we backpropagate through? Or would the derivative chain be made up of more parts for certain layers?
@statquest3 жыл бұрын
If each layer is the same, then the process is the same.
@user-rt6wc9vt1p3 жыл бұрын
great, thanks!
@muntedme2032 жыл бұрын
Awesome vid.
@statquest2 жыл бұрын
Thank you!
@hisyamzayd3 жыл бұрын
Thank you so much Mr. Josh, I wish I had this back when I first learned neural networks. Let me ask a question: does cross entropy require batch processing, a.k.a. multiple rows of data for each training step? Thank you
@statquest3 жыл бұрын
I don't think it requires batch processing.
@Waffano2 жыл бұрын
Thanks for all these great videos Josh. They are a great resource for my thesis writing! I have a question about the intuition behind all this: intuitively it really doesn't make sense to me why we need to include the error for virginica and versicolor when we are trying to optimize a value that only affects setosa. Would a correct intuition be: it is because they "indirectly" indicate how well the setosa predictions are? In other words, because of Soft Plus, we will always get a probability for setosa no matter what input we use? And then we might as well use all the data, since more data = better models? Hope I didn't miss anything in the video that explains this!
@statquest2 жыл бұрын
To be honest, I'm not exactly sure what time point (minutes and seconds) in the video you are asking about. However, in these examples we are solving for the derivatives with respect to the bias b3, which only affects the output value for setosa. We want that output value to be very high when we are classifying samples that are known to be setosa, and we want that output value to be very low when we are classifying samples that are known to be some other species. And, because we want it high in one case and low in all the others, we need to take all cases into account.
@Waffano2 жыл бұрын
@@statquest Thank you very much!
@zedchouZ2ed2 ай бұрын
At the end of this video, the backpropagation algorithm uses batch gradient descent to update b3, which means using the whole dataset to update one weight or bias. If we only used one sample it would be SGD, and if we had more data, split it into mini-batches, and fed them in one by one, it would be mini-batch gradient descent. Am I right about this training strategy?
@statquest2 ай бұрын
yep
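(A rough sketch of the three update strategies mentioned above; the data, gradient function, and learning rate are placeholders, not anything from the video.)

import random

data = list(range(12))   # placeholder training samples
lr = 0.1                 # placeholder learning rate
params = 0.0

def gradient(batch, params):
    # placeholder: in a real network this would be the summed dCE/dparameter over the batch
    return sum(batch) * 0.001

# batch gradient descent: every update uses the whole dataset
params -= lr * gradient(data, params)

# stochastic gradient descent: every update uses a single sample
for sample in data:
    params -= lr * gradient([sample], params)

# mini-batch gradient descent: every update uses a small chunk of the data
random.shuffle(data)
for i in range(0, len(data), 4):   # batches of 4
    params -= lr * gradient(data[i:i+4], params)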
@beshosamir89782 жыл бұрын
Hi Josh, I have a quick question. I saw a video on KZbin where the guy explaining the concept said they use the sigmoid function in the output layer for binary classification and ReLU for the hidden layers. So I think we run into the same problem here, which is that the gradient of the sigmoid function is too small, which makes us end up taking small steps. So I thought we could also use cross entropy in this situation, right?
@statquest2 жыл бұрын
I'm not sure I fully understand your question, but any time you have more than one category, you can use cross entropy.
@beshosamir89782 жыл бұрын
@@statquest I mean, can I use cross entropy with binary classification?
@statquest2 жыл бұрын
@@beshosamir8978 Yes.
@beshosamir89782 жыл бұрын
@@statquest So, is it smart to use it in a binary classification problem? Or is it better to just use the sigmoid function in the output layer?
@lokeshbansal27263 жыл бұрын
Thank you so much! You are making some amazing content. Can you please suggest a good book on neural networks in which the mathematics of the algorithms is explained, or can you tell us where you learned about machine learning and neural networks? Again, thank you for these precious videos.
@statquest3 жыл бұрын
Here's where I learned about the math behind cross entropy: www.mldawn.com/back-propagation-with-cross-entropy-and-softmax/ (by the way, I didn't watch the video - I just read the web page).
@sachinK-k5q9 ай бұрын
Please create a series like this for the single-layer perceptron as well, and show the derivatives too.
@statquest9 ай бұрын
I'll keep that in mind.
@environmentalchemist18123 жыл бұрын
Some topic suggestions: Could you go over the distinction between PCA and Factor Analysis, and describe the different factor rotations (orthogonal vs oblique, varimax, quartimax, equimax, oblimin, etc)?
@statquest3 жыл бұрын
I'll keep that in mind.
@grankoczsk3 жыл бұрын
Thank you so much
@statquest3 жыл бұрын
Thanks!
@user-rt6wc9vt1p3 жыл бұрын
Are we calculating the derivative of the total cost function (e.g., -log(a) - log(b) - log(c)), or just the loss for that respective weight's output?
@statquest3 жыл бұрын
We are calculating the derivative of the total cross entropy with respect to the bias, b3.
@shark-p4o11 ай бұрын
what's the difference between Softplus and Softmax ? Is it only about the softness of the toilet paper ? 🤣🤣🤣 just kidding, you do an awesome job, your videos are way above everybody else in ML / DL
@statquest11 ай бұрын
Thank you very much!
@lancelofjohn69953 жыл бұрын
Bam, this is a nice video.
@statquest3 жыл бұрын
Thank you! :)
@neelkamal3357Ай бұрын
thanks a lot sir
@statquestАй бұрын
:)
@danielsimion30214 ай бұрын
What about the derivatives with respect to the inner weights like w1 or w2, before the ReLU function? Because, for example, w1 affects all 3 raw output values, unlike b3, which only affects the first raw output.
@statquest4 ай бұрын
See: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@danielsimion30214 ай бұрын
@@statquest Thanks for your answer, I've already seen that video; my problem is that w1 affects all 3 raw outputs, so when you take the derivative of the predicted probability with respect to a raw output, which raw output should you use: setosa, virginica or versicolor? Whichever you choose, you get back to w1, because the setosa, virginica and versicolor raw outputs all have w1 in their expressions.
@statquest4 ай бұрын
@@danielsimion3021 You use them all.
@danielsimion30214 ай бұрын
@@statquest ok; i did it with pen and paper and finally understood. Thank u very much.
@statquest4 ай бұрын
@@danielsimion3021 bam! :)
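(A sketch of the idea worked out in this thread: because w1 sits before a hidden node, its gradient adds up the chain-rule paths through all three raw outputs. All numbers, the single-hidden-node structure, and the softplus activation are assumptions for illustration — swap in the derivative of whatever activation the network actually uses.)

import math

# made-up values for one sample
x = 0.5                     # the input value that w1 multiplies
w1 = 2.0
w_out = [1.5, -0.4, 0.9]    # weights from the hidden node to the 3 raw outputs
P = [0.15, 0.46, 0.39]      # softmax outputs for this sample
T = [1, 0, 0]               # the sample is known to be setosa

# derivative of the softplus activation y = log(1 + e^z) with respect to z, at z = w1 * x
z = w1 * x
dy_dz = math.exp(z) / (1.0 + math.exp(z))

# dCE/draw_n simplifies to (P_n - T_n) with softmax + cross entropy, and each raw output
# reaches w1 through its own w_out[n], so the three paths are added together
dCE_dw1 = sum((p - t) * w_n for p, t, w_n in zip(P, T, w_out)) * dy_dz * x
print(dCE_dw1)   # roughly -0.41 with these made-up numbers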
@jaheimwoo866 Жыл бұрын
Save my university life!
@statquest Жыл бұрын
bam!
@_epe25903 жыл бұрын
Please could you do videos on classification, specifically gradient descent for classification.
@statquest3 жыл бұрын
Can you explain how that would be different from what is in this video? In this video, we use gradient descent to optimize the bias term. In neural network circles, they call this "backpropagation" because of how the derivatives are calculated, but it is still just gradient descent.
@_epe25903 жыл бұрын
@@statquest Well, when I see others explaining it, it's usually with a 3-dimensional non-linear graph. When you demo it, the graph always looks like a parabola. Am I missing something important?
@statquest3 жыл бұрын
@@_epe2590 When I demo it, I try to make it as simple as possible by focusing on just one variable at a time. When you do that, you can often draw the loss function as a parabola. However, when you focus on more than one variable, the graphs get much more complicated.
@_epe25903 жыл бұрын
@@statquest Ok. And I love your videos by the way. They are easy to understand and absorb. BAM!
@dr.osamahabdullah13903 жыл бұрын
Is there any chance you could talk about deep learning or compressive sensing please? Your videos are so awesome
@statquest3 жыл бұрын
Deep learning is a pretty vague term. For some, deep learning just means a neural network with 3 or more hidden layers. For others, deep learning refers to a convolutional neural network. I explain CNNs in this video: kzbin.info/www/bejne/fnjac4t6gKueb6s
@kamshwuchin69073 жыл бұрын
Thank you for the effort you put into making these amazing videos!! They help me a lot in visualizing the concepts. Can you make a video about information gain too? Thank you!!
@statquest3 жыл бұрын
I'll keep that in mind.
@raminmdn3 жыл бұрын
@@statquest I think videos on general concepts of information theory (such as information gain) would be greatly beneficial for many, many people out there, and a very nice addition to the machine learning series. I have not been able to find videos as comprehensive (and at the same time as clearly explained) as yours anywhere on KZbin or in online courses, specifically when it comes to ideas and concepts that usually seem complicated.
@ΓάκηςΓεώργιος3 жыл бұрын
Nice video! I only have one question: how do I do it when there are more than 3 data points (for example, n for setosa, m for virginica, k for versicolor)?
@statquest3 жыл бұрын
You just run all the data through the neural network, as shown at 17:04, to calculate the cross entropy etc.
@ΓάκηςΓεώργιος3 жыл бұрын
Thank you a lot for your help Josh
@Xayuap Жыл бұрын
Hi, serious question: can I do the same with the final w weights? Something is not converging in my tests.
@statquest Жыл бұрын
What time point, minutes and seconds, are you asking about?
@Xayuap Жыл бұрын
@@statquest I mean the cross entropy adjustment for the b bias. Can I do the same for the w weights? I understand the cross entropy derivatives with respect to the final weights, when the measured species is setosa, to be dCe/dWyi = Psetosa × Yi and dCe/dWyi = (Psetosa - 1) × Yi, where Yi is the Y value coming out of the previous node.
@statquest Жыл бұрын
@@Xayuap I believe that is correct.
@Xayuap Жыл бұрын
Thanks. Well, if that is correct then maybe my weights are off; when I try to adjust both Ws, the derivatives converge to integer numbers other than 0. I'm not adjusting the B bias, only the final Ws
@rhn1223 жыл бұрын
Hey cool video, though I actually haven't fully watched your neural network playlists, just want to keep things simple with traditional statistics for now hehe! But I want to ask you about all these steps and formulas, do you actually always have in mind all of these methods and calculations, or only keep the essential parts and their ups & downs when actually solving practical problems? Because I love statistics, but can never fully commit myself to be in one with the calculation steps. I watched your videos to understand the under the hood process, but only keep the essential parts like why it works and its pitfalls, and leaving behind all the calculation tricks.
@rhn1223 жыл бұрын
As a note, I think understanding the process is crucial to fully understand its strengths and weaknesses, but for the actual formula most of the time if it's too complicated I'll just delegate it to the computer to be processed
@statquest3 жыл бұрын
It's perfectly fine to ignore the details and just focus on the main ideas.
@tulikashrivastava29053 жыл бұрын
Thanks for posting the NN video series. It came just in time, when I needed it 😊 You have a knack for splitting complex topics into logical parts and explaining them like a breeze 😀😀 Can I request some videos on gradient descent optimization and regularization?
@statquest3 жыл бұрын
I have two videos on Gradient Descent and five on Regularization. You can find all of my videos here: statquest.org/video-index/
@tulikashrivastava29053 жыл бұрын
@@statquest Thanks for your quick reply! I have seen those videos and they are great as usual 👍👍 I was asking about gradient descent optimization for deep networks (Momentum, NAG, Adagrad, Adadelta, RMSProp, Adam) and regularization techniques for deep networks (weight decay, dropout, early stopping, data augmentation and batch normalization).