Backpropagation Details Pt. 1: Optimizing 3 parameters simultaneously.

219,175 views

StatQuest with Josh Starmer

A day ago

Comments: 299
@statquest 3 years ago
The full Neural Networks playlist, from the basics to deep learning, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@sahanamd707 2 years ago
In a neural network, are the gradients for the parameters calculated in parallel? For example: when I start finding the gradients for all 7 parameters, do I calculate all 7 simultaneously using the previous iteration's values, or do I first calculate the bias gradient to get the new bias, then calculate the predicted values with the new bias, then calculate the gradient for w3, and so on down to w1?
@statquest 2 years ago
@@sahanamd707 Everything is done at the same time.
@sahanamd707 2 years ago
Thank you
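To make "everything is done at the same time" concrete, here is a minimal sketch (Python, with placeholder gradient values standing in for the real dSSR/dw3, dSSR/dw4 and dSSR/db3): every derivative is evaluated at the current parameter values first, and only then are all the parameters updated.

```python
params = {"w3": 0.36, "w4": 0.64, "b3": 0.0}
learning_rate = 0.1

def gradients(p):
    # Stand-in for the real derivatives; each one is evaluated using the
    # same (old) parameter values in p, never partially updated ones.
    return {"w3": 2.58, "w4": 1.26, "b3": 1.90}  # placeholder numbers

grads = gradients(params)           # step 1: all gradients from the old values
for name in params:                 # step 2: update every parameter at once
    params[name] -= learning_rate * grads[name]

print(params)
```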
@chaitanyasharma6270 3 years ago
The way you explain things, so patiently and in depth, makes me feel 200% more confident in the topic afterwards.
@statquest 3 years ago
Awesome! :)
@joserobertopacheco298 1 year ago
I'm writing from Brazil. This channel's playlist about neural networks is a masterpiece.
@statquest 1 year ago
Muito obrigado! (Thank you very much!) :)
@KzrLancelot 6 months ago
join a cartel
@shafir360 1 year ago
I am watching all of these even though I already graduated with a master's focused on machine learning and deep learning. It's actually amazing how much I am learning even as an intermediate student.
@statquest 1 year ago
bam!
@arindammitra2293 3 years ago
Triple BAM (explanation) + triple BAM (animations)... You are a truly great teacher, Josh Starmer :) :)
@statquest 3 years ago
Wow, thanks!
@victorreloadedCHANEL 1 year ago
We should all buy his book; he deserves it given the quality of these videos!!
@statquest 1 year ago
Thank you!!! :)
@peki_ooooooo 1 year ago
yes!
@MultiSamarjit 9 months ago
@@statquest Hey man, just bought your book; it will be arriving in a few days via Amazon. All these topics are covered, right?
@statquest 9 months ago
@@MultiSamarjit The basics of neural networks and backpropagation are covered. The other topics are listed here: statquest.org/statquest-store/
@TheAkiller101 3 years ago
I really like the medieval guitar sound you added when you said "fancy notation"; your effort can really be seen in the little details.
@statquest 3 years ago
Thanks!
@voyam 6 months ago
Had to watch 17:09 at least ten times. But now I get the most difficult part: the orange and blue curves represent... the orange and blue curves. Without that, I would be completely lost 😆. Thanks for the hard work. Amazing series!
@statquest 6 months ago
I'm glad you figured it out! :)
@eren_deniz 1 month ago
hahahaha
@KenJee_ds 4 years ago
I wish I had this when I was first learning backpropagation! Can I "work my way backward" with this knowledge haha
@statquest 4 years ago
BAM! :)
@romanrandall2106 3 years ago
Pro tip: you can watch movies on flixzone. Me and my gf have been using it for watching lots of movies lately.
@amoszahir7346 3 years ago
@Roman Randall Definitely, have been using Flixzone for years myself :)
@rajeevradnair 3 years ago
haha, good one!!
@eduardbenedic9844 2 years ago
@Roman Randall and @Amos Zahir are bots but nice one
@erenplayzmc9452 9 months ago
OMG THE HAPPINESS I WAS FEELING WHEN I UNDERSTOOD EVERYTHING, you seriously are a really good teacher.
@statquest 9 months ago
Thank you!
@vusalaalakbarova7378 2 years ago
Thanks Josh for these videos. I passed my data mining exam by watching your videos, and now I'm preparing for the ML exam. Your explanations are brilliant; I learn the material of three lectures from these 18-minute videos. Please continue to publish such valuable content, you save the lives of many people like me.
@statquest 2 years ago
Thank you and good luck with your exam! Let me know how it goes.
@vusalaalakbarova7378 2 years ago
@@statquest Josh, are you planning to make a video about batch normalization?
@statquest 2 years ago
@@vusalaalakbarova7378 Not soon. Currently I'm working on a series of videos about how to build neural networks with pytorch and pytorch_lightning.
@mattduchene66 7 months ago
Despite the simple explanations, these videos continuously make me doubt my mathematical abilities for about 15 minutes. But without fail, there’ll be a DOUBLE BAM! out of left field and suddenly everything’s clear in my head. Thank you! You’re doing God’s work.
@statquest 7 months ago
Bam! :)
@maayanmagenheim441 3 years ago
I'm a CS student at the Hebrew University of Jerusalem, currently taking an IML course. Your lectures help me and my friends so much, and I really want to thank you. You're a great & funny teacher and your lessons are a perfect example of how to teach in the 21st century. Thanks again.
@statquest 3 years ago
Wow! Thank you very much! BAM! :)
@averagecandy2581 11 months ago
The details are just out of this world. Amazing. Breathtaking; I'm short of words.
@statquest 11 months ago
Thanks!
@DharmendraKumar-DS 1 year ago
How the heck do you have this much understanding of each concept... you are irreplaceable.
@statquest 1 year ago
Thanks!
@magabosc2451 5 months ago
BAM!!! I'm doing my PhD in this field, and this is the BEST series of videos that I have watched since the beginning of my studies! Thank you so much for that :D
@statquest 5 months ago
Thanks and good luck!
@rohanmishra3115 3 years ago
What a great explanation of such a complex topic. I can't imagine the amount of effort you put in to create such detailed videos along with the spoken text. One of the best YouTube channels I have ever come across! Hats off to you... Don't BAM me :)
@statquest 3 years ago
Wow, thank you!
@joserobertopacheco298 1 year ago
I agree 100%
@nonalcoho 4 years ago
BAMMMMMMM! I like the animation in the last part and the music with Fan~cy notation lol
@statquest 4 years ago
BAM! :)
@georgeshibley9529 4 years ago
One of these days I'd love to see you build an NN, to watch the process you showed in these videos get lined up with some code, maybe Python or R. It's incredible work you do; hell, you are helping me survive my master's program. If you put it up, I'd trust the content. Thank you for all your hard work.
@statquest 4 years ago
Thank you! And good luck with your masters degree.
@anshuljain2258 7 months ago
Such hard work. Thank you Josh, you are helping generations with this + all your videos. Step-by-step learning with examples is the right way to learn anything!
@statquest 7 months ago
Thank you!
@ileshdhall 2 months ago
wow! WoW! WOW! I have always been scared of math, because it took me a hell of a lot of time to understand, but you explain it as smooth as butter. Thanks a lot, really!!
@statquest 2 months ago
Thank you very much!
@blueeyessti 3 months ago
These videos are so much better than 3blue1brown's: he starts with complicated analogies and examples and then delves into heavy math, whereas this simplifies the problem using simpler examples and works through all the small steps.
@statquest 3 months ago
Thank you!
@free_thinker4958 1 month ago
I totally agree with you 👏
@wliw3034 3 years ago
You are one of the best content creators I have ever seen.
@statquest 3 years ago
Wow, thanks!
@lisun7158 2 years ago
[Notes]
6:44 Notation for activation functions.
2:50 Initialize weights using a standard normal distribution. Q: Why N(0,1)? A: Just one of many ways to initialize weights [ref. 9:50 of kzbin.info/www/bejne/fXy9oIJ-jayWgtE&ab_channel=StatQuestwithJoshStarmer]. Initialize the bias with 0, since bias terms frequently start at 0.
4:33, 4:48 Plot the SSR with respect to b3.
@statquest 2 years ago
bam!
@tagoreji2143 2 years ago
A brief, in-depth explanation. Thank you, sir.
@statquest 2 years ago
Glad you liked it
@ericchao3017 2 years ago
Really loving these videos, thank you so much for your work Josh
@statquest 2 years ago
Thank you!
@Vanadium404 1 year ago
This NN series is so underrated, just 124K views, I mean come on.
@statquest 1 year ago
Thanks!
@boxiangwang 4 years ago
Mega BAMM!! I really love the explanation. Awesome!
@statquest 4 years ago
Thanks!
@vladimirfokow6420 1 year ago
Thank you for your clear explanations with the simple example! Great work, and very useful.
@statquest 1 year ago
Glad it was helpful!
@flyawayhome3 28 days ago
The little harpsichord really tickled me haha, love it
@statquest 27 days ago
:)
@mortyk182 6 months ago
Woah, these are some amazing teaching skills, sir; you're totally gifted.
@statquest 6 months ago
Thanks! 😃
@Viezieg 2 years ago
Thank you so much for these videos. I hated math back in high school, but now in my mid-20's I would rather do math than play video games, all thanks to your tutorials.
@statquest 2 years ago
Wow! That's awesome! Thank you!
@石政泰 5 months ago
I am on vacation in Hawaii but I am watching your neural network video. This video is so entertaining to watch :) Tai
@statquest 5 months ago
BAM! Have a great vacation! :)
@石政泰 5 months ago
@@statquest Thank you! You too, have a nice day.
@mohammadhaji2191 3 years ago
That was the best explanation I have ever seen. Thank you very much.
@statquest 3 years ago
Thank you! :)
@ayushipal7605 11 months ago
Hats off to you Josh!! So nicely explained ❤
@statquest 11 months ago
Glad you liked it!
@Tapsthequant 3 years ago
You make this stuff so accessible, well done!
@statquest 3 years ago
Thank you!
@alinadi9427 8 months ago
This playlist is excellent.
@statquest 8 months ago
Thank you!
@KayYesYouTuber 1 year ago
This is simply beautiful! You are the best.
@statquest 1 year ago
Thank you!
@girmazewdie8366 1 year ago
Thank you so much for sharing your knowledge; it has really, incredibly helped me understand the basics of the NN.
@statquest 1 year ago
Glad it was helpful!
@amarnathmishra8697 3 years ago
Well, you actually make complex things super easy. Hats off and, of course, BAAA...M!!!
@statquest 3 years ago
Bam! :)
@mahfuzurrahmanabeed4349 2 months ago
I wish I could have taken your classes when I was back in high school.
@statquest 2 months ago
bam! :)
@snp27182 3 years ago
You're a legend Doctor Starmer.
@statquest 3 years ago
Thanks!
@starkarabil9260 3 years ago
That was exactly what I needed. It would be great if you could 'also' do an application with one of the Python libraries, to show a real application by scripting with this knowledge.
@statquest 3 years ago
Thanks! I would like to do that.
@willw4096 1 year ago
Notes: 2:31 6:14 15:57 the "y"s are calculated based on other weights (w1 and w2)
@statquest 1 year ago
:)
@AnBru 1 year ago
Amazing video, thanks for all your hard work on this.
@statquest 1 year ago
Glad you enjoyed it!
@anashaat95 2 years ago
Great explanation as usual. Thank you very much.
@statquest 2 years ago
Thanks again!
@ilducedimas 2 years ago
God bless this Good Man.
@statquest 2 years ago
Thanks!
@puppergump4117 2 years ago
13:35 Do you mean the derivative of observed - predicted? Wouldn't that be a derivative of a single number? Or does it always just come out to be -1?
@statquest 2 years ago
To get a better understanding of how we determine this derivative, check out the StatQuest on The Chain Rule: kzbin.info/www/bejne/rZ2Unqyup9mEfrM It will explain exactly where that -1 comes from.
@puppergump4117 2 years ago
@@statquest Oh the derivative of the negative intercept? ok thanks
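For anyone else wondering where the -1 comes from: it is the derivative of the residual (Observed - Predicted) with respect to Predicted, which appears when the chain rule is applied to the squared residual. A sketch in the video's notation:

```latex
\frac{\partial\,\mathrm{SSR}}{\partial\,\mathrm{Predicted}_i}
= \frac{\partial}{\partial\,\mathrm{Predicted}_i}\big(\mathrm{Observed}_i-\mathrm{Predicted}_i\big)^2
= 2\,\big(\mathrm{Observed}_i-\mathrm{Predicted}_i\big)\times(-1)
```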
@nidakhan1412 1 year ago
Thank you so much, sir, for clearly explaining everything.
@statquest 1 year ago
Thanks!
@tinacole1450 1 year ago
Hi Josh! Love the videos. Do you have any posts on building neural network models in R/RStudio? Thanks, Tina
@statquest 1 year ago
Not yet!
@quantummusic2322 2 years ago
I love you Statquest
@statquest 2 years ago
:)
@killer-whale864 2 years ago
I hate stats, and I hate StatQuest. But I keep finding myself on this channel again and again.
@statquest 2 years ago
noted
@abhijeetmhatre9754 3 years ago
This is just awesome. I had started learning machine learning algorithms from multiple sources until I found your YouTube channel, and now I don't have to check any other source to understand any ML algorithm. Looking forward to more deep learning videos, as my area of interest is deep learning. Could you recommend a good book for deep learning? And thanks for such wonderful videos.
@statquest 3 years ago
This series ends (for now) with Convolutional Neural Networks, so just keep watching to learn about deep learning.
@robertdavis2855 2 years ago
I love you man! You have a sense of humor about you that is rare in deez parts lol
@statquest 2 years ago
Thank you!
@danielo6413 2 years ago
Hi Josh, great video as always. One question: if I want to speak in epoch and batch terms for this video, is it correct to say that it shows one epoch, consisting of one batch that contains all 3 data points we have (a batch gradient descent process)? Thanks a lot!!!
@statquest 2 years ago
Yes, that is correct.
@akaBryan 2 years ago
Hey, just a question! Around 14:00, why are you taking the derivative of the SSR with respect to w_3 and w_4 rather than y_2,i and y_1,i? What is the logic behind taking the derivative with respect to the weights rather than the functions themselves?
@akaBryan 2 years ago
Ah, never mind, it's because you want to optimize the weights w_3 and w_4, so you take their derivatives to get the step sizes and so forth... I'm so dumb haha! I'm assuming that in the next part you will optimize the weights w_1 and w_2 by also connecting them to the derivative of the loss function with respect to the weights, so it'll be a huge bonkers chain rule in action.
@statquest 2 years ago
Yes! It will be totally bonkers with chain rule action. :)
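A sketch of that chain rule for w_3, using the video's notation, where Predicted_i = w_3 y_1,i + w_4 y_2,i + b_3:

```latex
\frac{\partial\,\mathrm{SSR}}{\partial w_3}
= \sum_{i=1}^{3}\frac{\partial\,\mathrm{SSR}}{\partial\,\mathrm{Predicted}_i}
  \times\frac{\partial\,\mathrm{Predicted}_i}{\partial w_3}
= \sum_{i=1}^{3}-2\,\big(\mathrm{Observed}_i-\mathrm{Predicted}_i\big)\times y_{1,i}
```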
@samuelpolontalo6882 4 years ago
Best channel ever
@statquest 4 years ago
Wow! Thank you! :)
@Aaa-vh2lm 4 months ago
Absolutely amazing! I've got a question though: how do we know we are going in the right direction when calculating the new parameter?
@statquest 4 months ago
The derivative tells us what direction to change the parameter. To see more details, see: kzbin.info/www/bejne/qXXZZZlqqJeGeJo
@Aaa-vh2lm 4 months ago
@@statquest Thank you for answering even after 2 years! Funnily enough, while I wrote the elaboration of my question here, I stumbled upon the answer myself. Thank you again for your commitment. Let me tell you, that the work you do absolutely outclasses any learning material that I have stumbled across. I will definitely check out your book! Great work!
@jameelabot9122 4 months ago
Love your videos man, very helpful at providing detail without sacrificing clarity. However, I have noticed quite a few errors across the videos, generally small ones such as saying the wrong numbers when calling up examples, such as in this video at 9:26: input_3 would be 0, not 1. Again, it is not a major error and the information provided is nonetheless exemplary, but it does make following along a tad challenging when trying to listen to the video rather than watching it like a hawk. Keep up the good work man, much appreciated x
@statquest 4 months ago
I'm glad you like my videos. It is indeed unfortunate that a few of them have small "typos". However, the example you provide is not one of them. The inputs to the neural network are the x-axis coordinates, not the y-axis coordinates. The 3rd data point has an x-axis coordinate of 1 and a y-axis coordinate of 0. Thus, for the 3rd data point, the input to the neural network is 1 and the desired output is 0. So, not only is this not a major error, it's not an error at all.
@soraf583 4 years ago
Thanks for your great video, as always! I have a question though, after watching this video and the other SGD video you've made in the past. When calculating the gradients for each parameter with regular gradient descent, we plug all of the samples into the derivative of the loss function w.r.t. the current parameter, whereas with SGD we just randomly pick one sample for the same process. If that's the case, then what is the purpose of looping through all the samples (with regular GD) in a complete epoch if we are already using all the samples when calculating the gradients? Thanks in advance!
@statquest 4 years ago
I'm not sure I fully understand your question. The difference between "regular" and "stochastic" gradient descent in this context has to do with the summation. In "regular", the summation goes from 1 to 'n', where 'n' is the number of samples. In "stochastic", the summation goes from 1 to 'm', where 'm' < 'n' and is the number of samples randomly selected for the iteration. Does that help?
@soraf583 4 years ago
@@statquest Thank you for the quick reply! Yes that’s helpful and I think I’m understanding that part. I was mixing the concept of Gradient Descent with epoch/batch numbers, but I guess whether the GD is stochastic or not has nothing to do with the general epoch/batching concept when running a neural network, as we would still need to go over all the samples in a full epoch.
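A minimal sketch of the summation difference Josh describes (the per-sample gradient here is a made-up stand-in, just to make the snippet runnable):

```python
import random

def per_sample_gradient(sample, params):
    # Made-up stand-in: derivative of one sample's squared residual w.r.t. a bias
    x, observed = sample
    predicted = params["w"] * x + params["b"]
    return -2 * (observed - predicted)

def regular_gradient(samples, params):
    # "Regular" gradient descent: sum over ALL n samples every iteration
    return sum(per_sample_gradient(s, params) for s in samples)

def stochastic_gradient(samples, params, m):
    # Stochastic / mini-batch: sum over m randomly chosen samples, m < n
    return sum(per_sample_gradient(s, params) for s in random.sample(samples, m))

samples = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
params = {"w": 0.36, "b": 0.0}
print(regular_gradient(samples, params), stochastic_gradient(samples, params, m=1))
```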
@edphi 2 years ago
Thanks. Great video again and again.
@statquest 2 years ago
Thank you very much! :)
@白云开 3 years ago
BAM! Great work!
@statquest 3 years ago
Thank you!
@mikhailbaalberith 4 years ago
Hey Josh, this is dope. Hope you could do some videos about the Hessian and Jacobian matrices, Thanks.
@statquest 4 years ago
I'll keep those topics in mind.
@gero8049 3 years ago
I'm gonna make an AI agent that creates YouTube bots to promote your channel. You really deserve all the kudos.
@statquest 3 years ago
Bam!
@madghostek3026 1 year ago
Small question: since we fiddle with all (or some) of the parameters at once, and, for example, the bias is dependent on the weights on the graph, does that mean they fight with each other? Can something be done about it? We calculate the derivatives for the current forward pass, OK, but then changing all parameters at once to what they think is optimal might throw everything off, since they can't communicate in any way. How does it not explode?
@statquest 1 year ago
In my video on gradient descent, I show how to optimize two parameters at the same time here: kzbin.info/www/bejne/qXXZZZlqqJeGeJo In that video, we're trying to fit a straight line to some data points and are using gradient descent to find the best values for two parameters, the y-axis intercept and the slope. If you watch, you'll see a fancy graph, where one axis represents different values for the y-axis intercept and another axis represents different values for the slope. When we optimize both at the same time, we take a step towards a better intercept on that axis and take a step towards a better slope on its own axis, which is different and doesn't affect the one the intercept is on. So the parameters don't fight each other, because each one gets its own axis to work on. That being said, we can still get stuck in a local minimum, but it's not like progress in one parameter is negated by progress in another.
@madghostek3026 1 year ago
@@statquest Ah, this makes a lot of sense now. I think I know why it was misleading for me: in the end, all you see is a numerical value, the error, but behind the scenes the partial derivatives take the loss function apart in their own domains, so it's not just one number. Thank you for the very descriptive response!
@statquest 1 year ago
@@madghostek3026 bam! Your question is actually a very good one and maybe one day I'll make a short video that explains it for everyone.
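In symbols, the simultaneous update described above, with each parameter moving along its own axis using its own partial derivative and learning rate α:

```latex
\mathrm{intercept}_{\mathrm{new}} = \mathrm{intercept}_{\mathrm{old}} - \alpha\,\frac{\partial\,\mathrm{SSR}}{\partial\,\mathrm{intercept}},
\qquad
\mathrm{slope}_{\mathrm{new}} = \mathrm{slope}_{\mathrm{old}} - \alpha\,\frac{\partial\,\mathrm{SSR}}{\partial\,\mathrm{slope}}
```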
@rafibasha1840 3 years ago
@4:45 Hi Josh, why is the sum of squared residuals used for a classification problem?
@statquest 3 years ago
Because it works just fine in this simple example. However, if you keep watching the series, you'll see how to do backpropagation with ArgMax and SoftMax and Cross Entropy. Here's the whole playlist: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@rafibasha1840 3 years ago
@@statquest Thank you, Josh... I am watching your videos daily... Please make videos on RNNs, GANs, LSTMs, and NLP.
@statquest 3 years ago
@@rafibasha1840 I plan on making those in the spring.
@rafibasha1840 3 years ago
@@statquest Thank you, Josh
@alexfeng75 6 months ago
In "d SSR / d Predicted", is Predicted a single value like Predicted_i (with index i), or a collection of values, as i can range from 1 to 3?
@statquest 6 months ago
A collection of values. You can tell if you keep watching the video and see how it is used.
@alexfeng75 6 months ago
@@statquest thank you for the prompt reply, Josh! you are the best!
@fndpires 2 years ago
THIS MAN IS AN ANGEL! :D QUADRUPLE BAM!
@statquest 2 years ago
Thank you! :)
@dodosadventures7593 10 months ago
Hi Josh! Love your videos. Could you please explain why a normal distribution is used to initialize w3 and w4? Or, if you have already uploaded a video on the normal distribution, can you tag it?
@statquest 10 months ago
It's just a standard way to do it. However, you can use uniform distributions or other distributions if you would like. One thing people like about the normal distribution is that changing the standard deviation for each hidden layer can make it easier to train deeper models (models with lots of hidden layers).
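A minimal sketch of that standard initialization (NumPy assumed; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
w3, w4 = rng.standard_normal(2)      # weights drawn from N(0, 1)
b3 = 0.0                             # bias terms frequently start at 0
print(w3, w4, b3)
```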
@dahirou_harden 6 months ago
Just wanted to clarify. Is the output given at the end of each pass an actual function or just a set of 3 points (summed from y1 and y2)? Thanks!
@statquest 6 months ago
What time point, minutes and seconds, are you asking about?
@dahirou_harden 6 months ago
@@statquest Basically I'm just confused about whether the final curve approximating the 3 points is a "curve" as in a polynomial, or just a set of 3 points. Because when we add the two activation functions, you talked about adding them at each point as if we were adding the equations for the lines themselves in order to get the final line. But it seems like instead we're just adding the y values at each input (the 3 given inputs) rather than the lines themselves...?
@dahirou_harden 6 months ago
@@statquest At 4:03 for example.
@statquest 6 months ago
@@dahirou_harden The adding is done for all possible x-axis coordinates (or input values), and thus, we are adding the lines themselves, not just the 3 points. The points (or circles) on the lines are just to illustrate the concept of adding y-axis values, and do not to limit the adding to just those points.
@elmoreglidingclub3030 4 years ago
Great video and explanation. But I'm missing something simple. The blue and orange lines are added to render the green line, right? It appears (I'm squinting) that, after convergence, the middle dose (the 1/2 dose; actually, just to the left of it) value is 1 but the intersection of the blue and orange lines is at about -.5. Adding those together gives -1, not 1. What am I missing??
@statquest 4 years ago
You forgot to add the bias term.
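In the video's notation, the green squiggle is the weighted sum of the two activation curves plus the final bias, so eyeballing the blue and orange curves alone will always be off by b_3 (a sketch):

```latex
\mathrm{green}(x) = w_3\,y_1(x) + w_4\,y_2(x) + b_3
```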
@_epe2590 3 years ago
BAM!! I finally understand, but... am I correct to say that if I were optimizing 3 weights and biases at the same time, I would do gradient descent on a function with 3 dimensions (one for each weight and bias)??
@statquest 3 years ago
Yes
@emkahuda776 4 years ago
As usual, your videos are totally awesome; I like them a lot and they are easy to understand. I wonder if you will make a video about spatial transcriptomic analysis, since you uploaded the scRNA one three years ago and spatial analysis is now more popular?
@statquest 4 years ago
I'll keep it in mind.
@Ruhgtfo 3 years ago
Yeaaaa, finally a new episode
@statquest 3 years ago
:)
@kousthabkundu1996 4 years ago
Sir, one question I have: when you said we randomly select w3 and w4 from a standard distribution in the first iteration, is that any value from the standard distribution table, or do we select numbers with respect to the given dataset?
@statquest 4 years ago
In this example I selected random value from a standard normal distribution. This is a normal distribution with mean = 0 and standard deviation = 1 and is completely independent of the data.
@omkarghadge8432 3 years ago
YOU ARE THE BEST!
@statquest 3 years ago
Thanks!
@ertreri 2 months ago
Superb, thanks a lot.
@statquest 2 months ago
Thanks!
@akshaynn4651 2 years ago
When I plug the value -1.43 into the equation log(1 + e**x), I get 0.093. Should I use base 10 for the log, or a different one?
@statquest 2 years ago
In statistics, data science, machine learning and almost all programming languages, the default base for the log function is 'e', and that's what I use here.
@akshaynn4651 2 years ago
@@statquest Thanks, this was very helpful.
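A quick numeric check of the two bases (Python's math.log is the natural log):

```python
import math

x = -1.43
print(math.log(1 + math.exp(x)))    # natural log: ~0.215, what SoftPlus uses
print(math.log10(1 + math.exp(x)))  # base-10 log: ~0.093, the value in the question
```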
@NoNonsense_01 2 years ago
I think for the sake of clarity and rigour, it should be noted that all of the differentials are partial. Otherwise, some people may wonder why implicit differentiation wasn't used in such cases where W2 was differentiated with respect to W1 or vice versa.
@statquest 2 years ago
noted
@84mchel 3 years ago
Dw_3 = (observed - predicted) * y1. The output is also a softplus activation. Why isn't this derivative in the chain rule? Thank you!
@statquest 3 years ago
We include the derivative of the SoftPlus activation function in the next video (part 2), when we optimize all of the weights and biases, including the ones to the left of the activation functions: kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@salihylmaz4694 4 years ago
So underrated
@statquest 4 years ago
Glad you think so! :)
@creativeo91 4 years ago
Please make a tutorial on Gaussian mixture models and the EM algorithm
@statquest 4 years ago
I'll keep that in mind.
@creativeo91 4 years ago
@@statquest thanks.. It will be really helpful 🙂
@GuidedTrading_ 2 years ago
Basically, taking derivatives of the loss with respect to the unknown terms, to find how quickly the loss changes if we change the parameters, is the essence of this whole machine learning thing.
@statquest 2 years ago
yep
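That essence in one line, for any parameter θ, loss L and learning rate α:

```latex
\theta_{\mathrm{new}} = \theta_{\mathrm{old}} - \alpha\,\frac{\partial L}{\partial\theta}
```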
@ianholloway9493 1 month ago
Why do you not average the derivative of the SSR (the gradient)? What I mean by average is dividing the derivative of the SSR by the number of training examples. I read online that this is more common practice unless we are doing stochastic gradient descent. I was a little bit confused, as this was not clarified. Thanks for the video though, it really helped me understand the topic better.
@statquest 1 month ago
As the video shows, it works just fine without averaging the SSR. However, we have a relatively small dataset and that keeps the derivative from getting out of hand. If we had tons and tons of data, the SSR alone might lead to a massive derivative that's too big to be helpful, and averaging could help with that.
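The two conventions point in exactly the same direction; dividing by n only rescales the step, so averaging behaves like using a smaller learning rate:

```latex
\frac{\partial}{\partial w}\left(\frac{1}{n}\,\mathrm{SSR}\right) = \frac{1}{n}\,\frac{\partial\,\mathrm{SSR}}{\partial w}
```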
@SM-xn9bv 1 year ago
I can not thank you enough!
@statquest 1 year ago
Thanks!
@parijatkumar6866 4 years ago
Hey Josh, great video as always!! Can you also please point to some source with examples (with answers) which we can practice on our own? I know there are tons of them on internet, but you know, your selection will be really helpful as always!!
@statquest 4 years ago
I don't have anything yet, but I will create a "how to do neural networks" video soon.
@396me 9 months ago
If there are only 3 points in the inputs, how is it possible to get 5 points for the orange or blue curve 😢? Please help me understand.
@statquest 9 months ago
What time point, minutes and seconds, are you asking about?
@396me 9 months ago
@@statquest 11:13
@statquest 9 months ago
@@396me Since the range of possible input values goes from 0 to 1, we can just plug in numbers, from 0 to 1, to see the shape of the curve that the neural network is using for this dataset.
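A minimal sketch of that idea: plug many inputs between 0 and 1 (not just the 3 training points) through one hidden node to trace out its curve. The weight and bias here are placeholder values, not necessarily the ones in the video:

```python
import math

def softplus(v):
    return math.log(1 + math.exp(v))  # natural log

w1, b1 = 3.34, -1.43  # placeholder weight and bias for one hidden node

for x in [i / 10 for i in range(11)]:  # inputs from 0.0 to 1.0
    print(x, softplus(w1 * x + b1))
```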
@sattanathasiva8080 3 years ago
Many many thanks for your videos.
@statquest 3 years ago
Glad you like them!
@hamidfazli6936 2 years ago
You are amazing!
@statquest 2 years ago
Wow, thank you!
@gf1987 1 year ago
Very informative, ty
@statquest 1 year ago
:)
@cairoliu5076 4 years ago
Great content!
@statquest 4 years ago
Thanks!
@shubhamkumar-nw1ui 2 years ago
My regards to the friendly folks of the genetics department of the University of North Carolina at Chapel Hill
@statquest 2 years ago
Thanks!
@giorgosmaragkopoulos9110 9 months ago
So what is the clever part of backprop? Why does it have a special name instead of just being called "gradient estimation"? How does it save time? It looks like it just calculates all the derivatives one by one.
@statquest 9 months ago
Backpropagation refers to how the gradient is calculated. Gradient Descent is how the gradient is used.
@macknightxu2199 1 year ago
Hi, how should I understand "back"? Why not forward or some other direction? I mean, the video is nice, but it didn't explain clearly why backward is important. Why not forward?
@macknightxu2199 1 year ago
Got it. At the back, the derivatives are much simpler than the derivatives at the front. So, as we would like to go from simple to hard, we choose to go from back to front. That's why it's backpropagation, which is discussed in the next video. BR
@statquest 1 year ago
bam! :)
@chicagogirl9862 7 months ago
OMGGGGG, is that you singing in "The Big Bang Theory", S12 E24???!!!!!
@statquest 7 months ago
I wish! :)
@hungp.t.9915 2 years ago
Around 150,000 steps and w3, w4, b3 are still nowhere near -1.22, -2.30, 2.61 (lots and lots of zeros). I guess I need more steps to reach the values at the end of the video. May I ask how many steps you took, Mr. Josh?
@statquest 2 years ago
What learning rate are you using? I used 0.1 and optimized everything in less than 50,000 steps.
@hungp.t.9915 2 years ago
@@statquest I think I have trouble with gradient descent involving more than one parameter. Given: y₁,₁ = 0.21, y₁,₂ = 0.82, y₁,₃ = 2.04; y₂,₁ = 1.02, y₂,₂ = 0.26, y₂,₃ = 0.05; learning rate: 0.1
- First iteration, as shown in the video:
w3 = 0.36, w4 = 0.64, b3 = 0
predicted1: (y₁,₁ × w3) + (y₂,₁ × w4) + b3 = 0.72
predicted2: (y₁,₂ × w3) + (y₂,₂ × w4) + b3 = 0.46
predicted3: (y₁,₃ × w3) + (y₂,₃ × w4) + b3 = 0.77
d SSR / d w3 = 2.58, d SSR / d w4 = 1.26, d SSR / d b3 = 1.90
- Second iteration, maybe I am wrong somewhere in this step:
step size of w3 = 2.58 × 0.1 = 0.258; step size of w4 = 1.26 × 0.1 = 0.126; step size of b3 = 1.90 × 0.1 = 0.19
w3 = 0.36 − 0.258 = 0.1; w4 = 0.64 − 0.126 = 0.51; b3 = 0 − 0.19 = −0.19
predicted1: 0.21 × 0.1 + 1.02 × 0.51 − 0.19 = 0.35
predicted2: 0.03; predicted3: 0.04
d SSR / d w3 = −2 × (0 − 0.35) × 0.21 + −2 × (1 − 0.03) × 0.82 + −2 × (0 − 0.04) × 2.04 = −1.28
d SSR / d w4 = −2 × (0 − 0.35) × 1.02 + −2 × (1 − 0.03) × 0.26 + −2 × (0 − 0.04) × 0.05 = 0.2
d SSR / d b3 = −2 × (0 − 0.35) + −2 × (1 − 0.03) + −2 × (0 − 0.04) = −1.1
(all results are approximate)
@hungp.t.9915 2 years ago
No reply from Mr. Josh. Guess I will leave this one for the future. Anyway, nice video. A great help to people with no math background like me.
@statquest 2 years ago
@@hungp.t.9915 The KZbin comment section is not ideal for debugging code. However, one day I'll post mine and hopefully that will help.
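For future readers debugging the same thing, here is a minimal sketch of the loop under discussion, using the activation outputs quoted in the comment above and optimizing only w3, w4 and b3. Because the y values are held fixed, the converged numbers depend entirely on those inputs and need not match any particular target:

```python
y1 = [0.21, 0.82, 2.04]       # hidden-node outputs quoted above
y2 = [1.02, 0.26, 0.05]
observed = [0.0, 1.0, 0.0]

w3, w4, b3 = 0.36, 0.64, 0.0  # starting values
lr = 0.1

for step in range(50_000):
    predicted = [w3 * a + w4 * b + b3 for a, b in zip(y1, y2)]
    residuals = [o - p for o, p in zip(observed, predicted)]
    d_w3 = sum(-2 * r * a for r, a in zip(residuals, y1))
    d_w4 = sum(-2 * r * b for r, b in zip(residuals, y2))
    d_b3 = sum(-2 * r for r in residuals)
    w3 -= lr * d_w3
    w4 -= lr * d_w4
    b3 -= lr * d_b3

print(w3, w4, b3)
```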
@thepodfunnel 3 years ago
BAM! that was good!
@statquest 3 years ago
Thanks!
@karrde666666 3 years ago
This is the right way to learn; textbooks and lectures should be obsolete.
@statquest 3 years ago
bam! :)
@roberthuff3122 6 months ago
The nested chain rule.
@statquest 6 months ago
:)
@zer995 3 years ago
Triple BAM!!! That's what I said when I met my girl, married her, and had children :)
@statquest 3 years ago
:)