The SoftMax Derivative, Step-by-Step!!!

87,128 views

StatQuest with Josh Starmer

1 day ago

Comments: 101
@statquest
@statquest 3 years ago
The full Neural Networks playlist, from the basics to deep learning, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk Support StatQuest by buying my books The StatQuest Illustrated Guide to Machine Learning, The StatQuest Illustrated Guide to Neural Networks and AI, or a Study Guide or Merch!!! statquest.org/statquest-store/
@charlesrambo7845
@charlesrambo7845 4 years ago
The quotient rule: low-D-high minus high-D-low, square the bottom and away we go! My teacher told me this 14 years ago, and I never forgot! Also, thanks for posting! I love these videos!
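For anyone decoding the mnemonic ("low" is the denominator, "high" is the numerator, and "D" means take the derivative), it spells out the standard quotient rule:

$$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{g(x)\,f'(x) - f(x)\,g'(x)}{g(x)^2}$$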
@statquest
@statquest 4 years ago
bam! :)
@JaviOrman
@JaviOrman 3 years ago
That’s a good teacher. Stealing this!
@RyleyTraverse
@RyleyTraverse 4 months ago
My teacher used, "low-D-high minus high-D-low, square the bottom and put it below!" hahaha
@rogerwilcoshirley2270
@rogerwilcoshirley2270 2 months ago
@@charlesrambo7845 It's just as easy to reconstruct it as d(f*(1/g))
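That reconstruction gives the same formula: treating the quotient as the product $f \cdot g^{-1}$ and applying the product and chain rules,

$$\frac{d}{dx}\left(f\cdot\frac{1}{g}\right) = \frac{f'}{g} - \frac{f\,g'}{g^2} = \frac{f'g - f\,g'}{g^2}$$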
@brianp9054
@brianp9054 2 years ago
I don't remember the last time I subscribed to a YouTube channel, but you got my subscription, and my gratitude is what I can give you in exchange for this magnificent, clear, SHORT and understandable video. THANKS!!!
@statquest
@statquest 2 years ago
Welcome aboard!
@JaviOrman
@JaviOrman 3 years ago
I literally spent 15 minutes trying to figure out this derivative with the (original) video on pause. As soon as I pressed resume, you pointed to this explanation! I now officially consider myself “hard-core” :)
@xiaoyangwu9270
@xiaoyangwu9270 3 years ago
I am taking your NN series online to have that protective bubble of easily understandable memories of knowledge before my professor dives into the real "academic knowledge" lol. Thanks for the video!
@statquest
@statquest 3 years ago
Bam!
@renjiehu9231
@renjiehu9231 4 years ago
Josh, thanks for these great videos; they really help me and so many others who like machine learning! You make great videos teaching people the ideas, but I really hope there can be more videos on how to implement these ideas in code. It would be wonderful if your videos combined coding and theory together.
@statquest
@statquest 4 years ago
I'll keep that in mind.
@omarthesolid
@omarthesolid 4 years ago
Your videos helped me understand ML in a simpler way than the literature. Thank you, and I'm waiting for your next videos.
@statquest
@statquest 4 years ago
Thank you very much! The next videos should come out on Monday.
@davidzarandi9287
@davidzarandi9287 3 years ago
Thank you very much. I was struggling to understand the SoftMax derivative, and finally managed to understand it.
@statquest
@statquest 3 years ago
Hooray! :)
@romellfudi
@romellfudi 4 years ago
Absolutely awesome, I love going over ML with this kind of video.
@statquest
@statquest 4 years ago
More to come!
@dearbass1637
@dearbass1637 3 years ago
The step-by-step derivative explanation is good.
@statquest
@statquest 3 years ago
bam!
@sagarpatel6062
@sagarpatel6062 1 year ago
Excellent video, brother ❤ Really so addicted to your videos and the way you explain every topic. Thanks, man! 🙌
@statquest
@statquest 1 year ago
Thanks!
@sanskarshrivastava5193
@sanskarshrivastava5193 3 years ago
StatQuest should be declared a universal treasure!
@statquest
@statquest 3 years ago
Triple bam! :)
@weipenghu4463
@weipenghu4463 9 months ago
I love that you used "hard-core" to describe the people watching this video.
@statquest
@statquest 9 months ago
bam! :)
@Jupiter-Optimus-Maximus
@Jupiter-Optimus-Maximus 1 year ago
Of the many AI videos on YouTube, yours are definitely among the very best. I'm even getting used to your constant singing. 🤣
@statquest
@statquest 1 year ago
Wow, thank you!
@lasha-georgeds4552
@lasha-georgeds4552 2 years ago
Remarkable sense of humor :D Laughing while studying
@statquest
@statquest 2 years ago
bam! :)
@hugocontreras2848
@hugocontreras2848 1 year ago
I see there is an error in the multiplication of -Psetosa and Pversicolor at 5:57; the correct value would be -0.069, not -0.07. Anyway, thank you for the video, it was most useful! You are doing a great favor to this world with this series.
@statquest
@statquest 1 year ago
Thanks!
@hugocontreras2848
@hugocontreras2848 1 year ago
My bad lol, I was a little sleepy; you just rounded the value haha. Thank you for the content! :)
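For anyone checking the arithmetic: the video's probabilities are not repeated in this thread, but values of roughly $p_{setosa} \approx 0.69$ and $p_{versicolor} \approx 0.10$ are consistent with the derivatives 0.21/-0.07/-0.15 quoted elsewhere in this thread (my inference, not a value from the source), and then

$$-p_{setosa} \times p_{versicolor} \approx -0.69 \times 0.10 = -0.069 \approx -0.07$$

so the -0.07 on screen is just -0.069 rounded.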
@MusobarMedia
@MusobarMedia 1 year ago
As always, Josh, thank you.
@statquest
@statquest 1 year ago
Thanks!
@pperez1224
@pperez1224 3 years ago
Quotient rule: d/dx(U(x)/V(x)) = (U'(x) V(x) - U(x) V'(x)) / V(x)^2. In our case U is U(x, y, z), so we use partial derivatives in three variables: setosa, versicolor, and virginica. The first equation we differentiate is d(E(x) / (E(x) + Constant)), where Constant = E(versicolor) + E(virginica). With clever identifications, Josh gets a simple final formula :)
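Carrying that sketch through in the same notation: let $u = e^{raw_{setosa}}$ and $v = e^{raw_{setosa}} + C$, where $C = e^{raw_{versicolor}} + e^{raw_{virginica}}$ is constant with respect to $raw_{setosa}$. Then $u' = v' = e^{raw_{setosa}}$, and the quotient rule gives

$$\frac{\partial\, p_{setosa}}{\partial\, raw_{setosa}} = \frac{u'v - u\,v'}{v^2} = \frac{e^{raw_{setosa}}\,v - e^{raw_{setosa}}\,e^{raw_{setosa}}}{v^2} = p_{setosa}\left(1 - p_{setosa}\right)$$

which is the simple final formula this comment refers to.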
@statquest
@statquest 3 years ago
:)
@김은맛있서
@김은맛있서 3 years ago
You are great! And cute when you say "Quotient Rule"!!!!
@statquest
@statquest 3 years ago
Thank you! 😃
@bangoshi
@bangoshi 2 years ago
I sincerely thank you.
@statquest
@statquest 2 years ago
Thanks!
@usergoogle7102
@usergoogle7102 3 years ago
I'm not hardcore, the coursework is... Thank you for helping me out.
@statquest
@statquest 3 years ago
Good luck! :)
@mahmoudshehata3343
@mahmoudshehata3343 4 years ago
Great video. I wonder if we will reach Gaussian process regression in the quest soon?
@statquest
@statquest 4 years ago
I'll keep that in mind.
@OmarMohamed-uo4rw
@OmarMohamed-uo4rw 2 years ago
Thank you so much!
@statquest
@statquest 2 years ago
You're welcome!
@sachink9102
@sachink9102 11 months ago
Excellent series!!! I think for the regression use case you could try a different example. The drug dosage example looks like a classification problem, and yet there we used SSR as the loss function. The iris flowers example, in contrast, was perfect for the classification problem.
@statquest
@statquest 11 months ago
What time point, minutes and seconds, are you referring to?
@nonalcoho
@nonalcoho 4 years ago
Thank you for the derivative! I have a question about 6:30: after we calculate 0.21/-0.07/-0.15, what is the next step for this network? I mean, at which point will this network use 0.21/-0.07/-0.15? Thank you for reading my question!
@statquest
@statquest 4 years ago
These derivatives will come in handy when we do backpropagation with Cross Entropy. The videos on Cross Entropy will be out soon.
@nonalcoho
@nonalcoho 4 years ago
@@statquest I see! I never connected it to cross entropy!!! Thank you so much!!!
@danielo6413
@danielo6413 3 years ago
Hey, great video! One question: when you talk about RAWsetosa, what is its value exactly? Is it like x, as in exp(x)? Thanks!
@statquest
@statquest 3 years ago
Early on, at 0:49, I say that the "raw setosa" value is the "raw output value for setosa". In other words, "raw setosa" is the output value from the neural net for setosa before we apply softmax. For more details, see the video that introduces softmax: kzbin.info/www/bejne/gaGuoJpjgZ6pm8k
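In symbols, the SoftMax output for setosa is

$$p_{setosa} = \frac{e^{raw_{setosa}}}{e^{raw_{setosa}} + e^{raw_{versicolor}} + e^{raw_{virginica}}}$$

so $raw_{setosa}$ plays the role of the $x$ in $e^x$: it is the argument, not the exponentiated value.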
@Tapsthequant
@Tapsthequant 3 years ago
Hard-core StatQuest I am, StatQuest
@statquest
@statquest 3 years ago
bam! :)
@blago7daren
@blago7daren 1 year ago
The video is great! But why is the probability of setosa in quotation marks? Doesn't softmax convert the raw outputs into probabilities based on the relative logits?
@statquest
@statquest 1 year ago
The outputs from the SoftMax abide by the technical definition of probabilities, but the actual values are dependent on the random values we used to initialize the model, so they shouldn't be trusted in the same way that we might trust the probabilities that were derived from some statistical framework. To see an example of what I'm talking about, see: kzbin.info/www/bejne/gaGuoJpjgZ6pm8ksi=FlU4E3gH2M0UJsJP&t=489
@RezaAmeriPhD
@RezaAmeriPhD 4 years ago
Hi Josh! Could you please make a video on what metric to look at to evaluate whether an ML model is overfitting?
@statquest
@statquest 4 years ago
I'll keep that in mind.
@magtazeum4071
@magtazeum4071 4 years ago
I'm in love with Josh..
@statquest
@statquest 4 years ago
:)
@lekanswansons3646
@lekanswansons3646 3 years ago
So do you add all three probabilities together when you get the derivatives, assuming the results for all three predicted outcomes are matrices? Simply put, do you need all three derivatives, or just the derivative of the output you are predicting?
@lekanswansons3646
@lekanswansons3646 3 years ago
Also, if setosa is the correct prediction, i.e. your yhat, what values represent the incorrect prediction in your code? (From what I understand, setosa is when i = j, and the other two are when i != j.)
@lekanswansons3646
@lekanswansons3646 3 years ago
OK, I guess the correct prediction, when i = j, is yhat(1 - yhat), and when i != j it is -yhat_i * yhat_j. When I add or subtract this to the first function, my predictions return NaN, so I'm kind of lost on what to do with the equation when i != j. Any help appreciated.
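For anyone stuck on the same point: the two rules quoted above (yhat_i(1 - yhat_i) when i = j, -yhat_i * yhat_j when i != j) are the entries of the softmax Jacobian. A minimal NumPy sketch (the function names and input values here are made up for illustration, not taken from the video):

```python
import numpy as np

def softmax(raw):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    e = np.exp(raw - np.max(raw))
    return e / e.sum()

def softmax_jacobian(p):
    # J[i, j] = d p_i / d raw_j = p_i * (delta_ij - p_j):
    # p_i * (1 - p_i) on the diagonal, -p_i * p_j off the diagonal.
    return np.diag(p) - np.outer(p, p)

raw = np.array([1.43, -0.40, 0.23])  # hypothetical raw output values
p = softmax(raw)
print(softmax_jacobian(p))
```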
@statquest
@statquest 3 years ago
The next video in this series shows how these derivatives are used in practice. So, to answer your question, see: kzbin.info/www/bejne/rnOomWlsi56akNE
@lekanswansons3646
@lekanswansons3646 3 years ago
@@statquest thanks, gonna check it now.
@lekanswansons3646
@lekanswansons3646 3 years ago
@@statquest Do the outputs need to be 'labeled'? For example, my softmax outputs a one-hot encoded vector of results [awaywin, draw, homewin], but they aren't labeled logically, so if I want to differentiate with respect to each outcome, would you say I need to select each element in the vector individually, e.g. dwin/result = dwin/daway + daway/ddraw + ddraw/dhome + dhome/result? You can tell me to fuck off if I'm annoying you : )
@amnont8724
@amnont8724 1 year ago
Hey Josh, why would we even want to find a derivative for the output layer, as you did with the SoftMax function?
@statquest
@statquest 1 year ago
We need the derivative to do backpropagation.
@amnont8724
@amnont8724 1 year ago
@@statquest I understand, thank you :)
@navsegda2007
@navsegda2007 4 years ago
Hi! Thanks for this great video!! Do you have (or plan to have) a video on the jackknife?
@statquest
@statquest 4 years ago
I'll keep it in mind. I have a video on bootstrap here (which is related): kzbin.info/www/bejne/n6SolJqleNKfhZI
@eligirl100
@eligirl100 4 years ago
Hey, any chance you could do a video on missing data and multiple imputation methods?
@statquest
@statquest 4 years ago
I'll keep that in mind.
@steelcitysi
@steelcitysi 4 years ago
Great video as usual! In one of your earlier videos you referred to Elements of Statistical Learning as the Bible of machine learning. This text is comparatively light on NNs. Do you have a Bible for NNs that you would recommend?
@statquest
@statquest 4 years ago
Not yet. I'm writing one, though. I hope for it to be out next year.
@feliciamarilyn
@feliciamarilyn 17 days ago
Hi, I'm having trouble relating raw setosa and P setosa. What is rawSetosa? Is it e^A or A?? Please, somebody help.
@statquest
@statquest 17 days ago
"rawSetosa" is the value we calculate before calculating the softmax - so, it is one of the 3 input values for the softmax function.
@nikachachua5712
@nikachachua5712 2 years ago
Can you do the backpropagation for this example, please?
@statquest
@statquest 2 years ago
I show it in part 7 of this series. So, you need to see part 6 first... kzbin.info/www/bejne/bHLVhKypatZ7dw ...then part 7... kzbin.info/www/bejne/rnOomWlsi56akNE
@dipankarmandal9442
@dipankarmandal9442 2 years ago
Can you please make a tutorial on RNN, LSTM and RL?
@statquest
@statquest 2 years ago
I am working on them.
@dipankarmandal9442
@dipankarmandal9442 2 years ago
@@statquest Thank you so much Sir. You are a real Guru.
@easyBob100
@easyBob100 6 days ago
"When using softmax as the activation function in the output layer of a neural network, the error for each class (or category) can be calculated as the difference between the predicted probability (y_hat) and the true label (y)." Is this all you have to do for softmax backprop? My networks already do this, so I guess I can skip the softmax layer on backprop? So confusing.
@statquest
@statquest 6 days ago
With softmax, the loss function is cross entropy (see: kzbin.info/www/bejne/bHLVhKypatZ7d7c ) and I show how that works with backpropagation here: kzbin.info/www/bejne/rnOomWlsi56akNE
@easyBob100
@easyBob100 6 days ago
@@statquest Yes. I do use cross entropy to SHOW the loss, but that's all I do with it. The error that gets backpropagated is still just the difference between the predicted output and the target output. Everything else I've tried fails big time and my network doesn't learn. EDIT: Note: I'm a programmer trying to do math, not a math guy trying to program lol.
@statquest
@statquest 6 days ago
@@easyBob100 Check out those links I put in the last comment. They might help.
@easyBob100
@easyBob100 6 days ago
@@statquest This is most likely a confusion on my part with terminology. Namely between calling something the "error" vs "gradient". I've watched many of your videos, sometimes many times too lol. And ya, I've watched those vids as well :).
@statquest
@statquest 6 days ago
@@easyBob100 Well, if you need code examples, you can find them here: github.com/StatQuest/signa
@felipe_marra
@felipe_marra 1 year ago
up
@statquest
@statquest 1 year ago
double up! :)
@ccuuttww
@ccuuttww 3 years ago
Hmmm, I think your drawing is messy; it is hard to read even though I know how to do the derivative.
@statquest
@statquest 3 years ago
Can you tell me what time point, minutes and seconds, is confusing?
@ccuuttww
@ccuuttww 3 years ago
@@statquest I truly understand all the stuff; it's just hard to read. Someone new to this topic may have some trouble.
@statquest
@statquest 3 years ago
@@ccuuttww OH I see. Understood.
@ccuuttww
@ccuuttww 3 years ago
@@statquest If you don't mind, I can help you tidy it up and send it to you tomorrow.
@statquest
@statquest 3 years ago
@@ccuuttww Sure!