The full Neural Networks playlist, from the basics to deep learning, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@TheLLMGuy4 жыл бұрын
i think you are one of the few teachers on the planet who understands that the secret of understanding is simplicity! Thank you
@statquest4 жыл бұрын
Thanks!
@Priyanshuc24257 ай бұрын
Thank you for an amazing, fantastic video. No words are enough to say how great your channel is, and especially the Bam! 😊
@maksims46694 жыл бұрын
Throughout all my years of Bachelor studies I avoided computer science and statistics as much as possible, for I could not understand them. However, when I enrolled for my Masters I had no choice but to figure them out. Last semester I had a compulsory course in Computational Intelligence, so in order to understand the material I had to find some additional sources. That's how I encountered StatQuest. You explained everything so well that over the summer I was inspired to take an additional course in Machine Learning. This semester I took several courses in statistics and optimization, next semester I will certainly take more, and now I am seriously considering connecting my life to one of these fields. All this with the help of your channel. I have no doubt that there are, and will be, many more people like me, to whom you became a guiding light in their studies. Thank you for your work, Josh, and keep helping the curious ones find answers. Have a great year 2021!
@statquest4 жыл бұрын
Wow! Thank you very much! Good luck with your career and happy 2021!!!! :)
@MelonsIncorporated3 жыл бұрын
I'm surprised more teachers don't know about this channel. I'm not exaggerating when I say it's the MOST incredibly useful tool for helping me understand material in a class that makes no sense.
@statquest3 жыл бұрын
Wow, thank you!
@tensorsj4 жыл бұрын
Josh, I cannot stress enough how empathetic a teacher you are. I went through years and years of education in engineering, and I tend to forget the low-level math behind it all. Here I can rescue that with you. You are just an amazing human being :)
@statquest4 жыл бұрын
Thank you! :)
@Vanadium404 Жыл бұрын
This is the best explanation on YT. No fancy animations, just pure calculation. Better than 3B1B and others.
@statquest Жыл бұрын
Thanks! I specifically tried to go deeper into topics that 3B1B only skimmed over because I wanted our videos to complement each other.
@Nandeesh_N3 жыл бұрын
"Quadruple bam" is what I feel as soon as I learn something from your video. It's so amazing and when I get any doubt, first thing I do is to check a relevant video on your channel! Thank you Josh for these amazing videos!
@statquest3 жыл бұрын
Happy to help!
@mrglootie1014 жыл бұрын
Finally! Tbh, you're really good at teaching, Josh: simple but detailed. Keep it up! Bam! From Indonesia
@statquest4 жыл бұрын
Thanks! 😃
@kanui36184 жыл бұрын
Awesome, bro!
@harryliu100522 күн бұрын
Thank you, Josh. Over the past two years I've experienced some struggles and doubts, but after learning from your lectures I've developed a genuine passion for machine learning. I've decided to apply for a PhD in a related field; I hope it's not too late! Truly grateful for your guidance and support!
@statquest22 күн бұрын
Triple bam! I hope the Phd program goes well!
@seanfitzgerald61654 жыл бұрын
There were so many BAM's in this video it made my head spin, and I love it.
@statquest4 жыл бұрын
BAM!
@lisun71582 жыл бұрын
[Notes]
1:27, 2:45, 3:06 - Because of the chain rule, we get the formula for the derivative of SSR with respect to w1 (w3 is used).
7:31 - formula for the derivative of SSR with respect to b1 (w3 is used)
9:10 - formula for the derivative of SSR with respect to w2 (w4 is used)
9:15 - formula for the derivative of SSR with respect to b2 (w4 is used)
Formula for the derivative of SSR with respect to b3 = (d SSR / d Predicted) * (d Predicted / d b3) [ref. 13:24, kzbin.info/www/bejne/f3-ViaB4na5_qpY&ab_channel=StatQuestwithJoshStarmer]
Based on the formulas above (StatQuest gives the most succinct formulas I've ever seen), I get the answer to the question below.
Q: Why is it called "backward"? Why dynamic programming?
A: The backpropagation algorithm computes the gradient of the loss function with respect to each weight by the chain rule. The calculation of the gradient proceeds backwards through the network, computing the gradient one layer at a time, iterating backward from the last layer to avoid redundant calculations of intermediate terms in the chain rule. [ref. Wikipedia]
11:00 - use each derivative to calculate the respective step size and each new parameter value -- optimize all parameters of the NN simultaneously.
@statquest2 жыл бұрын
Double bam! :)
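For anyone following the notes above, here is one way to write out the chain they describe for the top hidden node, in LaTeX. The symbols are assumptions on my part for readability: i is the input value, x_1 is the value fed into the top node's SoftPlus activation, and y_1 is that activation's output; the structure (w_3 appearing in the derivatives for w_1 and b_1) comes from the notes and the video itself.

$$x_1 = w_1\, i + b_1, \qquad y_1 = \mathrm{SoftPlus}(x_1) = \ln\!\left(1 + e^{x_1}\right)$$

$$\frac{d\,\mathrm{SSR}}{d w_1}
= \frac{d\,\mathrm{SSR}}{d\,\mathrm{Predicted}} \times
  \frac{d\,\mathrm{Predicted}}{d y_1} \times
  \frac{d y_1}{d x_1} \times
  \frac{d x_1}{d w_1}
= \sum_{j} -2\left(\mathrm{Observed}_j - \mathrm{Predicted}_j\right) \times w_3 \times \frac{e^{x_{1,j}}}{1 + e^{x_{1,j}}} \times i_j$$

Setting the last factor to 1 instead of i_j gives the corresponding derivative with respect to b_1, which is why w_3 shows up in both.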
@kalpaashhar65224 жыл бұрын
Brilliantly explained. Not only is it easy to assimilate the information thanks to the colours, but the style in which you have broken down a complex explanation is a skill not many teachers have. Keep up the awesome work!
@statquest4 жыл бұрын
Many thanks!!
@None_me_2 жыл бұрын
This is the best thing ever... it's like you help me understand the complex concepts from 3Blue1Brown in the simplest ways in these videos. You both are the epitome of Bam!
@statquest2 жыл бұрын
BAM! Thank you very much! :)
@brucebalfour10023 жыл бұрын
I love your videos so much. So helpful. I always look forward to the BAMs, not only because they are fun, but also because it gives a sense of fulfillment. BAM from Switzerland.
@statquest3 жыл бұрын
BAM from North Carolina! :)
@ramyav46892 жыл бұрын
Dear Josh, Thank you so much for making this video. I have always been intrigued by the way you explain the mathematics of complicated concepts! It is just amazing.
@statquest2 жыл бұрын
Thank you!
@wenyange26074 жыл бұрын
Thank you for making this video. It's very clear, easy to follow, and super helpful for understanding the algorithms.
@statquest4 жыл бұрын
Glad it was helpful!
@nandakumar89367 ай бұрын
Every time you mention the SoftPlus activation function, you show the toilet paper pic. Every time. That's commitment right there.
@statquest7 ай бұрын
bam! :)
@alejandrocanada21804 жыл бұрын
I wish all the teachers in the world explained as well as you do. Thank you very much!
@statquest4 жыл бұрын
Wow, thank you!
@panku0772 жыл бұрын
A phenomenal example of teaching from first principles. Josh's brand of elegance and charisma kept me engaged and on track to mastering this topic.
@statquest2 жыл бұрын
Hooray! :)
@siddharthmodi27403 жыл бұрын
I don't believe it. How can someone explain this weird topic with this level of simplicity? Hats off to your efforts. Thank you, Josh.
@statquest3 жыл бұрын
Thank you!
@nabeelhasan65933 жыл бұрын
Understanding backpropagation always gives me a panic attack, but your style and simplicity are beyond amazing. Thanks for making such a complicated topic so easy to understand.
@statquest3 жыл бұрын
Thanks!
@nandankakadiya14943 жыл бұрын
The best, most perfect videos on machine learning I have ever seen. Bammmm, thank you!
@statquest3 жыл бұрын
Wow, thanks!
@TheGoogly703 жыл бұрын
Awesome! It appeared daunting in the beginning, but at the end it was so easy to understand. Great job!
@statquest3 жыл бұрын
Hooray! :)
@fengjeremy78782 жыл бұрын
Three lectures about backward propagation. Finally understand what's going on in this fancy technique. Thank you sir!
@statquest2 жыл бұрын
Triple bam!!! :)
@hunter88313 жыл бұрын
You are such a great teacher that you make machine learning seem so easy and motivate anyone to learn more about it
@statquest3 жыл бұрын
Wow, thank you!
@tangh1469 ай бұрын
the preschool vibe really does make everything much less intimidating. love u josh 🥺
@statquest9 ай бұрын
:)
@mathscorner92813 жыл бұрын
I wish I could find videos on this channel for every topic I'm having problems with. Sir, you are really a fabulous teacher.
@statquest3 жыл бұрын
Thank you! :)
@rajathk96914 жыл бұрын
I'm really confused right now because the quality of the content really makes me want to promote you but I also wanna keep this channel a secret to myself.
@statquest4 жыл бұрын
I'm glad you like my videos! :)
@aakarshanraj11762 жыл бұрын
@Rajath K "Share your knowledge. It’s a way to achieve immortality."
@jancvrcek15413 жыл бұрын
Erudite, funny, and free of charge (triple BAM!!). What more could we wish for? Absolutely love your videos!
@statquest3 жыл бұрын
Thank you very much! :)
@shivamgaur86242 жыл бұрын
I know I'm enjoying a video when I like it even before it starts. Amazing work!!
@statquest2 жыл бұрын
BAM! :)
@parijatkumar68664 жыл бұрын
As always brings a smile along with great clarity...
@statquest4 жыл бұрын
Thank you! :)
@huidezhu75664 жыл бұрын
This is the clearest explanation I can find. Amazing work
@statquest4 жыл бұрын
Thank you! :)
@mitchynz Жыл бұрын
I love this explanation so much... It actually helped me intuitively understand the chain rule too. I just purchased your book on machine learning, which is the perfect companion to this series.
@statquest Жыл бұрын
Thank you very much!
@azizmouhanna8996 Жыл бұрын
Thank you from the bottom of my heart, from a phd student who learns a lot from you 🙏
@statquest Жыл бұрын
Happy to help!
@powangsept3 жыл бұрын
Watching the videos on your channel makes me love statistics more and more! Thank you for the great videos!
@statquest3 жыл бұрын
Awesome, thank you!
@haozeli96604 жыл бұрын
damn, the quality of the content of this channel is just over the top. Hope u can get more views because u definitely deserve it!!!!!
@statquest4 жыл бұрын
Thank you!
@elhairachmohamedlimam96402 жыл бұрын
Thank you so much. I had wasted a lot of time trying to understand these things, but really, after watching your videos things became very easy. Thank you a lot!
@statquest2 жыл бұрын
Thanks!
@willw4096 Жыл бұрын
Thanks for the great video! My notes: 0:59 9:49 10:54 - 11:07
@statquest Жыл бұрын
Nice work!
@sagarchauhan359325 күн бұрын
Once I get a job, I am going to support this channel big time.
@statquest25 күн бұрын
BAM! :)
@simonkaerts2228Ай бұрын
With help of this video, I was able to program this example in Python and learn a lot from doing so! I really felt the BAM once it worked!!
@statquestАй бұрын
BAM! :)
@tupaiadhikari3 жыл бұрын
Thank You Josh, for these valuable Videos. I am immensely grateful to you for making these videos. You are truly a Legend in the Data Science Community. Love and Gratitude from Kolkata, India.
@statquest3 жыл бұрын
Thank you very much! :)
@tagoreji21432 жыл бұрын
Educating along with entertaining, and that too for a complicated topic. Thank you very much, Professor!
@statquest2 жыл бұрын
BAM! :)
@amyma22042 жыл бұрын
You saved tons of confused souls by this amazing explanation.
@statquest2 жыл бұрын
Thank you!
@parshanjavanrood3682 жыл бұрын
With all the hype around AI and machine learning, it's not easy to find sources that teach the statistics behind these subjects, and the ones that do teach the math make it very hard to understand (from the perspective of a second-year bachelor's student). But what you do is awesome. Thanks for this great series.
@statquest2 жыл бұрын
Thank you!
@zihaozhou12564 жыл бұрын
Honestly, some of these illustrations are better than the college professors'.
@statquest4 жыл бұрын
Thank you! :)
@aakarshanraj11762 жыл бұрын
you really explained it better than anyone on youtube. Thanks a lot, it was really helpful.
@statquest2 жыл бұрын
Thank you very much! :)
@kostjamarschke46132 жыл бұрын
Love the video and explanation, but I love the SoftPlus commercial every time you mention the activation function even more.
@statquest2 жыл бұрын
bam!
@lin14503 жыл бұрын
Thank you so much for your content. I will be forever grateful. The way you convey the information with such simplicity, step by step and with humor, makes it so fun to watch and builds up motivation. Concepts that seemed way out of reach for me are becoming something I trust I will truly and deeply understand one day! Thank you so much!
@statquest3 жыл бұрын
Thank you for your support!! When I have time, I'll design some more merch.
@lakshman5873 жыл бұрын
Million billion trillion thanks for these neural network videos, Josh! BAM!!! You are really awesome!!!!
@statquest3 жыл бұрын
BAM! :)
@tapanaritete2 жыл бұрын
Thank you!
@statquest2 жыл бұрын
Thank you so much for supporting StatQuest!!! BAM! :)
@yonasabebe2 жыл бұрын
You make it look so easy. Thanks for your effort and contribution.🙏
@statquest2 жыл бұрын
Thank you!
@anasmomani6473 жыл бұрын
You literally should get a Nobel Prize for your videos.
@statquest3 жыл бұрын
bam!
@adamwagnerhoegh9071 Жыл бұрын
Thanks!
@statquest Жыл бұрын
Hooray!!! Thank you so much for supporting StatQuest!!! TRIPLE BAM! :)
@magtazeum40714 жыл бұрын
this channel is lovely..Thank you Josh...
@statquest4 жыл бұрын
Thank you!
@franciscoruiz62692 жыл бұрын
You're a master! You have gained all my respect.
@statquest2 жыл бұрын
Wow, thanks!
@marccrepeau68533 жыл бұрын
Thanks for this great series of tutorials! One thing I'm confused about: in this video some of the final parameter estimates (after the 450 gradient descent steps) end up larger than the original (random) parameter estimates, and some end up smaller. But the derivatives calculated for the first gradient descent step are all positive, so if you multiply them by a constant (positive) learning rate you will end up *decreasing* all parameters (by subtracting a positive value from them). Thinking about that parabola in your gradient descent tutorial, you would be starting with tangent lines (derivatives) all on the right side of the parabola. All derivatives are positive and thus all the tangent lines have positive slopes. Gradient descent will subtract positive step sizes from all parameters. All parameters will thus *decrease*. So how do some final parameter values end up greater than the original (random) estimates? (For example w1 is originally set at 2.74 and the final estimate is 3.34)
@statquest3 жыл бұрын
I think the answer might just be that the 7-dimensional surface that we are trying to find the bottom of with gradient descent may have some complicated shape that includes local minima that we get out of because the step size is large enough to allow for that sort of thing.
@marccrepeau68533 жыл бұрын
@@statquest So if you step outside of a local minima then you might end up with negative derivatives in subsequent descent cycles? I think that makes sense!
@statquest3 жыл бұрын
@@marccrepeau6853 bam!
@praveerparmar81573 жыл бұрын
As a matter of fact, I was going bonkers trying to understand Backpropagation until I watched this video. Now I'm stable 😁
@statquest3 жыл бұрын
bam!
@MADaniel7173 жыл бұрын
I'm back Josh! Exactly what I needed. Now I'm going to try to implement a neural network from scratch in Python using your videos :D
@statquest3 жыл бұрын
Go for it!
@MADaniel7173 жыл бұрын
@@statquest github.com/danielmarostica/simple-neuralnet/blob/main/neural_network.py I did it! Can't believe haha. Thanks Josh!
@statquest3 жыл бұрын
@@MADaniel717 TRIPLE BAM!!!! Congratulations. That is awesome!!!
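For readers who, like the commenter above, want to try coding this network from scratch, here is a minimal Python sketch of the idea. It assumes the architecture from the video (one input, two SoftPlus hidden nodes, a linear output, SSR loss, and plain gradient descent on all seven parameters at once), but the data points, starting parameter values, learning rate, and number of steps below are placeholders, not the exact numbers used in the video.

```python
import numpy as np

def softplus(x):
    return np.log(1.0 + np.exp(x))

def softplus_derivative(x):          # derivative of SoftPlus is the logistic sigmoid
    return np.exp(x) / (1.0 + np.exp(x))

# Placeholder data: 3 (input, observed) pairs
inputs   = np.array([0.0, 0.5, 1.0])
observed = np.array([0.0, 1.0, 0.0])

# Placeholder starting values for the 7 parameters
w1, b1, w2, b2, w3, w4, b3 = 1.0, 0.0, -1.0, 0.0, 0.5, 0.5, 0.0
learning_rate = 0.1

for step in range(450):
    # Forward pass
    x1 = w1 * inputs + b1            # input to the top hidden node
    y1 = softplus(x1)                # output of the top hidden node
    x2 = w2 * inputs + b2            # input to the bottom hidden node
    y2 = softplus(x2)                # output of the bottom hidden node
    predicted = w3 * y1 + w4 * y2 + b3

    # Backward pass: chain rule, summed over all 3 samples
    d_ssr_d_pred = -2.0 * (observed - predicted)
    d_w3 = np.sum(d_ssr_d_pred * y1)
    d_w4 = np.sum(d_ssr_d_pred * y2)
    d_b3 = np.sum(d_ssr_d_pred)
    d_w1 = np.sum(d_ssr_d_pred * w3 * softplus_derivative(x1) * inputs)
    d_b1 = np.sum(d_ssr_d_pred * w3 * softplus_derivative(x1))
    d_w2 = np.sum(d_ssr_d_pred * w4 * softplus_derivative(x2) * inputs)
    d_b2 = np.sum(d_ssr_d_pred * w4 * softplus_derivative(x2))

    # Update every parameter simultaneously (step size = learning rate * derivative)
    w1 -= learning_rate * d_w1; b1 -= learning_rate * d_b1
    w2 -= learning_rate * d_w2; b2 -= learning_rate * d_b2
    w3 -= learning_rate * d_w3; w4 -= learning_rate * d_w4
    b3 -= learning_rate * d_b3

print(w1, b1, w2, b2, w3, w4, b3)
```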
Thank you, thank you, thank you thank you, thank you!!! I never thought I could understand this type of thing. If somebody asks me to explain this to them I'll say... THE CHAAAIN RULE!!! Thanks Josh! I'll donate part of my first salary to you eventually :)
@statquest2 жыл бұрын
Hooray! I'm glad my videos are helpful. :)
@pavapequeno2 жыл бұрын
Hi Josh, what you have made was for me the first definitive guide to advanced stats and ML that is accessible and understandable by anyone with a basic scientific background. Multiple Bam! I searched a while before stumbling on this treasure trove. Thank you so much! Will you also be releasing some videos on graph embeddings, GNNs/GCNs? I think you would have a very eager audience (certainly including me at least)!
@statquest2 жыл бұрын
Thanks! I'll keep those topics in mind, but for now I'm working on LSTM and Transformer neural networks.
@pavapequeno2 жыл бұрын
Thanks for the quick reply, Josh :-) I‘ll keep my eyes peeled for future videos. I guess graphs and graph embeddings is almost a playlist in itself! Once again thank you for opening my eyes to this wonderful world! All the best with the current new videos!
@simplifiedscience7497 Жыл бұрын
You are so amazing!! You really made me love machine learning!!
@statquest Жыл бұрын
Wow, thank you!
@pelocku12343 жыл бұрын
This video was great and made me want to build this in R. One note to others who are trying to use the same starting values and reach the same optimal values: you will need to use a learning rate other than 0.1, because 0.1 finds different optimal values that are not quite the same as the ones we see here. Just wanted to drop this note in case someone else tries to recreate it like I did. Again, it was the teaching that inspired me to do it.
@statquest3 жыл бұрын
Glad you were able to figure something out that worked for you! :)
@petercourt4 жыл бұрын
Fantastic work Josh! :)
@statquest4 жыл бұрын
Thank you! And thank you for your support!
@ek_minute_ Жыл бұрын
Thanks for the toilet paper reference for SoftPlus. Now I will always revise backpropagation every time I go to the toilet.
@statquest Жыл бұрын
bam?
@twandekorte20774 жыл бұрын
Great video. You are exceptional with regards to explaining difficult concepts in an easy and straightforward way :)
@statquest4 жыл бұрын
Thank you, and thanks for your support!
@devharal65412 жыл бұрын
You are best Josh!!
@statquest2 жыл бұрын
Thank you! :)
@xendu-d9v2 жыл бұрын
Sir, you are gold. Thanks
@statquest2 жыл бұрын
Thank you!
@giorgosmaragkopoulos91102 жыл бұрын
Thanks for the video Josh! Now I can flex that I coded my first neural net from scratch :D It works fine with small samples but when I create a large sample size, it fails (I guess that's why "batch size" exists in tensorflow instead of throwing everything at once). Do we know why this happens? Thanks
@statquest2 жыл бұрын
It's hard to say without knowing about the specific data and the specific neural network.
@sreerajnr6897 ай бұрын
This is such a fun to learn!! 😀😀
@statquest7 ай бұрын
Thanks!
@124357689111 ай бұрын
Thanks for the video. This is awesome!
@statquest11 ай бұрын
Glad you liked it!
@bhavikdhandhalya9 ай бұрын
Thank you for wonderful videos.
@statquest9 ай бұрын
Glad you like them!
@KiKitChannelАй бұрын
Thank you, teacher. I thought I'd never understand backpropagation.
@statquestАй бұрын
Bam! :)
@lizzy98749 ай бұрын
Hi Josh, it's a great video to help me understand NN. I have some questions regarding the parameter optimization. Is it possible that different parameters would need different numbers of steps to reach the optimal value? If so, how to determine the maximum # of steps or learning rate that works the best for all the parameter optimization?
@statquest9 ай бұрын
That's a great question and to be honest, I don't really know what the best strategy is for training a neural network. All I know is that, for the most part, people use the Adam optimizer (which, in a nutshell, essentially averages steps from stochastic gradient descent so that it's not quite as stochastic) and set the learning rate to something like 0.001, and that often works in practice.
@lizzy98749 ай бұрын
@@statquest Got it. Thanks for your answer! :)
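As a concrete illustration of the reply above, here is a minimal sketch of what that common setup might look like in PyTorch (an assumption on my part; the video itself does not use PyTorch): a small network with SoftPlus activations, the Adam optimizer, and a learning rate of 0.001. Adam keeps its own per-parameter statistics, so each parameter effectively still gets its own step.

```python
import torch

# A small network shaped like the one in the video: 1 input -> 2 SoftPlus nodes -> 1 output
model = torch.nn.Sequential(
    torch.nn.Linear(1, 2),
    torch.nn.Softplus(),
    torch.nn.Linear(2, 1),
)

# The commonly used default mentioned above: Adam with a learning rate of 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```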
@mainerengineer2 ай бұрын
Thank you!! A question though: do typical backpropagation algorithms update the weights after each sample, versus processing all the data and then updating the weights? Is that an important difference computationally?
@statquest2 ай бұрын
Usually they are updated in batches, as illustrated in this video. Here the "batch size" is 3.
@tymothylim65503 жыл бұрын
Wonderful video! Now I will always think of toilet paper when seeing the softplus activation function :)
@statquest3 жыл бұрын
BAM!!! :)
@vincentmatthew36066 ай бұрын
Hi there! How did you make the animation at 11:19? Can you tell me how to make that kind of animation? What software do you use? Thanks!
@statquest6 ай бұрын
I just drew a bunch of graphs in R (R is a programming language geared towards doing statistics and data analysis) and then jammed them all together back to back in Final Cut Pro.
@vincentmatthew36066 ай бұрын
@@statquest Oh, I see. May I get the code? I'm trying to do the same thing in Python. Your help would be much appreciated. Thank you.
@statquest6 ай бұрын
@@vincentmatthew3606 It's just a for loop. I'm not sure my R code will help much in Python.
@a_sun59413 жыл бұрын
I might have missed this in the videos, but I would like to ask: since all the parameters are updated at the same time, why is it called 'backpropagation'? The parameters aren't really updated from the last layer to the first layer; they are updated all at the same time, so why is it called 'back'?
@statquest3 жыл бұрын
I believe it is because the chain rule goes from the back to the front.
@a_sun59413 жыл бұрын
I see; I think I get it now. I was deriving dpred/dw1 directly from the output layer value and did not use dpred/dw2 or the intermediate layer values, so I did not realize the back-to-front relation in the derivatives. Just to sort out my understanding with an example: suppose we have input x; it goes through a connection * w1 + b1 and becomes w1*x + b1 in hidden layer 1 (let's call this value ya); then it goes through a softplus activation and becomes log(1 + e^ya) in hidden layer 2 (let's call this value yb); then it goes through a connection * w2 + b2 and becomes w2*yb + b2 in hidden layer 3 (let's call this value yc); then it goes through a softplus activation again and becomes log(1 + e^yc) in hidden layer 4 (let's call this value yd); and lastly, it goes through a connection * w3 + b3 and becomes w3*yd + b3 in the output layer (let's call this value ye).
To summarize:
ya = w1*x + b1
yb = log(1 + e^ya)
yc = w2*yb + b2
yd = log(1 + e^yc)
ye = pred = w3*yd + b3
And we want to compute dSSR/dw3, dSSR/dw2, and dSSR/dw1. Since dSSR/dwx = (dSSR/dpred) * (dpred/dwx), let's compute dpred/dwx:
dpred/dw3 = dye/dw3
dpred/dw2 = (dye/dyd) * (dyd/dyc) * (dyc/dw2)
dpred/dw1 = (dye/dyd) * (dyd/dyc) * (dyc/dyb) * (dyb/dya) * (dya/dw1)
From the longest one (dpred/dw1), we can clearly see there is a chain, haha, and it does go from the back layer to the front layer (ye -> ya).
@a_sun59413 жыл бұрын
And dpred/dw1 can even be written as dpred/dw2 * something, so we don't compute some terms repetitively? dpred/dw1 = (dpred/dw2) / (dyc/dw2) * (dyc/dyb) * (dyb/dya) * (dya/dw1)?? Is dpred/dw_n usually represented as dpred/dw_(n+1) * ...? Sorry, my replies here are way too long, and it's OK if the additional questions in this reply are not answered. :)
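Restating the reuse idea from the two comments above in LaTeX (nothing new, just the same chain written more compactly): because the leading factors of dpred/dw1 also appear in dpred/dw2, backpropagation can compute them once, starting from the output, and reuse them, which is the dynamic-programming flavor mentioned earlier in the thread.

$$\frac{d\,\mathrm{pred}}{d w_2}
 = \frac{d y_e}{d y_d}\,\frac{d y_d}{d y_c}\,\frac{d y_c}{d w_2},
\qquad
\frac{d\,\mathrm{pred}}{d w_1}
 = \underbrace{\frac{d y_e}{d y_d}\,\frac{d y_d}{d y_c}}_{\text{already computed}}\;
   \frac{d y_c}{d y_b}\,\frac{d y_b}{d y_a}\,\frac{d y_a}{d w_1}$$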
@juliank740810 ай бұрын
Thank you very much! Appreciated!
@statquest10 ай бұрын
You're welcome!
@subhashcs8553 Жыл бұрын
At 9:28 you explain that all parameters (weights and biases) are optimized simultaneously. But in backpropagation, don't they get updated iteratively (layer by layer, starting from the last layer and going until the input layer is reached)? (Anyone, please correct me if I am wrong.) So the weights and bias of the last layer (w3, w4, b3) are updated first. Then the weights and biases of the next-to-last layer, (w1, b1) and (w2, b2), are updated using the updated weights and bias of the last layer, w3, w4, b3. But in your explanation, the direction of updates is the opposite (forward).
@statquest Жыл бұрын
I believe all of the parameters are updated at the same time. My understanding is that the "back" in backpropagation refers to how the derivatives are calculated, from the back to the front with the chain rule. Not the order they are updated.
@viniciuskosmota63264 күн бұрын
Thank you for the class, Josh! Why not use dpred/dw1 directly in the dSSR/dw1 calculation?
@statquest4 күн бұрын
What time point, minutes and seconds, are you asking about?
@nikolatotev2 жыл бұрын
I have a question about backpropagation (edit after finishing writing: actually 2). In a real implementation, when performing backpropagation, do the weight values get updated after each layer is reached, or does the algorithm go through the whole network, saving how each weight and bias should change, and then update all of the values after reaching the start of the network? And a question related to a more complicated version of neural networks: in convolutional neural networks that use skip connections, during the forward pass the result from a previous layer gets concatenated with a deeper layer. My question is, when performing backpropagation, are the skip connections used to pass the gradient directly to a layer closer to the start of the network, or are the skip connections just ignored? I'm not sure if anyone else is struggling with backpropagation in CNNs; if there are more people, a video on the topic in your teaching style would be amazing!
@statquest2 жыл бұрын
My understanding is that, for each iteration of backpropagation, there is a single "forward pass", where the data is run through the neural network, and this is used to calculate the loss (in this case, the sum of the squared residuals), and then it does a single "backwards pass", where all of the parameters are updated at once. If the parameters were updated one at a time, then we would have to do a bunch of extra forward passes, one per parameter that we want to update, and that would probably take a lot longer. As for your question about CNNs, I don't know the answer, but, believe it or not, 2022 is the year of the neural network for StatQuest, and I plan on making a lot more videos about them, so I might get to this question before too long (unfortunately I can't promise I will!).
@jasonliu62394 жыл бұрын
Very helpful! Question: does it mean there is no order to optimizing the different weights and biases (w1, w2, b1, b2, w3, etc.), since they are optimized at the same time? Also, does it optimize w1 based on the initial w3 guess? I mean, if there is a different initial w3, how is it going to impact the w1 optimization? Thanks
@statquest4 жыл бұрын
All of the parameters are optimized at the same time, and yes, if you start with different random values for each parameter, that will affect the optimization. You can know this because each step requires running the data through the model with the unoptimized values.
@nehabalani72903 жыл бұрын
Wow, exactly my question. One additional question along the same lines: does the step size calculation give a separate value for each parameter, which then gives each parameter its new value, or does it give one value for the overall derivatives?
@statquest3 жыл бұрын
@@nehabalani7290 I'm not sure what you mean by "step size calculation", however, each backpropagation pass gives us new values for every single parameter we are trying to optimize.
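To make the reply above concrete, here is a tiny Python sketch of one backpropagation pass from the parameters' point of view; the numbers are made up for illustration. Each parameter has its own derivative, so each gets its own step size, and all of them are updated in the same pass.

```python
learning_rate = 0.1

# One derivative per parameter, all computed from the same forward pass via the chain rule
derivatives = {"w1": 0.85, "b1": -0.27, "w2": 1.10, "b2": 0.05}   # made-up numbers
parameters  = {"w1": 1.00, "b1": 0.00,  "w2": -1.00, "b2": 0.00}  # made-up numbers

for name in parameters:
    step_size = learning_rate * derivatives[name]    # a separate step size for each parameter
    parameters[name] = parameters[name] - step_size  # every parameter gets a new value this pass

print(parameters)
```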
@jiayiwu410111 ай бұрын
I like this beginning song a lot!
@statquest11 ай бұрын
bam! :)
@ignaciozamanillo96594 жыл бұрын
Thanks, as simple and good as always! Are you planning a tutorial on neural networks in R / Python? That would be great.
@statquest4 жыл бұрын
That's the plan! :)
@harishbattula26723 жыл бұрын
Thank you for the explanation.
@statquest3 жыл бұрын
You are welcome!
@Mastin70 Жыл бұрын
Very well explained, thanks a lot. But what if there are multiple minimums in gradient descent?
@statquest Жыл бұрын
That's a real problem. It is possible to get stuck in a local minimum that is not optimal. So, to get around this, we sometimes use stochastic gradient descent or try different initial values for the weights.
@Mastin70 Жыл бұрын
@@statquest thank you very much for your answer Josh!
@LEELEE-dg3xd Жыл бұрын
This video really helped me a lot!
@statquest Жыл бұрын
BAM! :)
@qqhuang-lx8ct23 күн бұрын
Thank you!! It helps me a lot! I hope I can pass the exam!
@statquest23 күн бұрын
Good luck!
@akshayrameshwar4869 Жыл бұрын
I have a question regarding steps, iterations, and epochs.
Q1) Let's say we are using gradient descent to optimize the weights, and instead of considering all the samples (let's say 3) at once, calculating the total SSR, and then updating the weights, we update the weights after each sample. Here we calculate the SSR for the 1st sample and then update the weights, but the new weights we calculate are not yet optimal, so do we iterate over the same 1st sample a few more times to get new predicted values and optimal weights? Or do we move on to the next (2nd) sample? If we move on to the 2nd sample, we won't be able to find the optimal weights for the 1st sample, and similarly we won't get optimal values for the 2nd sample, because after updating the weights just once we move on to the 3rd sample. So where are the minimum 1000 steps we take to find the optimal weights?
Q2) Are epochs the steps we use to reduce the gradient? Let's say we have 1000 samples with 2 features and we set the batch size to 500. Then in 1 epoch there will be 2 iterations, i.e., the weights will be updated 2 times: we calculate the average SSR for the first 500 samples, update the weights just once (as in the scenario above, we won't find the optimal values), and then move to the second batch of 500 and do the same thing. Now if we set epochs = 20, do we do the above steps 20 times? As in Q1, instead of updating the weights after each sample we are using batches, but the 1000 (or fewer) steps to find the optimal weights are still missing.
Q3) What if we take the batch size as 1000? Then the weights update after each epoch, and it is similar to the example in this video, so the epochs and steps should be equal in this scenario, right?
@statquest Жыл бұрын
1) If you are only going to look at one sample at a time, you typically look at one and calculate the SSR and update the weights and biases, then look at the next and update then look at the 3rd and update and repeat. You stop when you hit the maximum number of steps or the changes to the weights and biases are small. 2) You would run 500 samples through and calculate the SSR and then update. Then run the next 500 samples through and calculate the SSR and update. That's one epoch. Then repeat until you hit the maximum number of steps or the changes to the weights and biases are small. 3) Yes.
@akshayrameshwar4869 Жыл бұрын
@@statquest thank you
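A minimal Python sketch of the bookkeeping described in this thread, using the numbers from the question above (1000 samples, batch size 500, 20 epochs); the update itself is left as a comment, since it is the same forward pass / SSR / backward pass loop discussed elsewhere in these comments.

```python
n_samples, batch_size, n_epochs = 1000, 500, 20
update_steps = 0

for epoch in range(n_epochs):
    for start in range(0, n_samples, batch_size):
        batch_indices = range(start, min(start + batch_size, n_samples))
        # forward pass on this batch -> average SSR -> derivatives -> update ALL parameters once
        update_steps += 1

print(update_steps)   # 2 updates per epoch x 20 epochs = 40 parameter updates
```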
@justinwhite27253 жыл бұрын
Thank you for the animation at the end. I've been perplexed about why my network with 2 hidden layers seems to find a 'happy medium' where all the outputs are muddled values around the median. Your graph showed this is normal. I suspect this means I need more time to train the deeper layers toward the inputs.
@statquest3 жыл бұрын
Good luck! :)
@Luxcium9 ай бұрын
This was pretty cool to watch 😅but the more I watch the less remains 😢 which is so sad given how JS is so busy with the rest of his life and stuff 😮 I don’t know if he will ever have enough time to make videos awesome again MVAA
@statquest9 ай бұрын
:)
@durrotunnashihin54802 жыл бұрын
Question, please: why do we use both backpropagation and forward propagation in the training process, when forward propagation alone could possibly find the optimal parameters? Is it faster than forward propagation alone? Or is there another reason? Anyway, I watched a lot of videos from your channel; they are very interactive and make the complexity simpler. Thank you for the effort :)))
@statquest2 жыл бұрын
I'm not sure I understand your question. Why do you think it is possible to find the optimal parameters using only forward propagation? I'm pretty sure that would be impossible, but maybe you know something I do not.
@durrotunnashihin54802 жыл бұрын
@@statquest Let's say we train with forward propagation only. At iteration 500, we get all the gradients of the weights < 0.001. Is this not possible? Of course, in practice it could be difficult to get the optimal parameters with only a few training samples.
@statquest2 жыл бұрын
@@durrotunnashihin5480 Again, I've never heard of anyone training a neural network only using forward propagation. Can you provide me a link to a reference where this is done?
@BuzaBuza Жыл бұрын
Thanks, sir, for these thorough, excellent tutorials. I only have one doubt: at 10:09 you said it doesn't matter which derivative we start with. But from my understanding, it would be more effective if we started with the output layer and went back, since we always use the gradients from later layers in earlier layers, and hence the name backpropagation. Am I correct?
@statquest Жыл бұрын
When we are solving for the derivative equations, we start from the output layer and work our way back, via the chain rule, to the parameters we want to optimize. This is where the name "backpropagation" comes from.
@ouche714 жыл бұрын
I really liked this series, but I still feel confused about what happens when we have more than one feature. Keep up the great work!
@statquest4 жыл бұрын
We'll get there soon enough.
@ArifDolanGame4 жыл бұрын
glad I found this channel! very helpful 👍
@statquest4 жыл бұрын
Glad to hear it!
@mahdimohammadalipour30772 жыл бұрын
Thank you for this wonderful series. A question came to my mind: in software and packages, in order to calculate the gradient, do they simply apply numerical differentiation by changing a parameter a tiny bit, measuring its effect on the loss function, and then calculating the derivative with respect to that parameter by dividing those changes? Am I thinking about it the right way?
@statquest2 жыл бұрын
To be honest, I don't know how automatic differentiation works, but that is what most packages (like Tensors) use.
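For what it's worth, the finite-difference idea described in the question above can be sketched in a few lines of Python; this illustrates that idea only, not how packages like PyTorch or TensorFlow actually compute gradients (they use automatic differentiation, which applies the chain rule exactly, as in this video).

```python
def numerical_gradient(loss_fn, params, eps=1e-6):
    """Estimate d(loss)/d(parameter) for each parameter with a central difference."""
    gradients = {}
    for name, value in params.items():
        bumped_up   = dict(params, **{name: value + eps})
        bumped_down = dict(params, **{name: value - eps})
        gradients[name] = (loss_fn(bumped_up) - loss_fn(bumped_down)) / (2 * eps)
    return gradients

# Toy example: SSR for a one-parameter model, predicted = w * input
data = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
ssr = lambda p: sum((observed - p["w"] * x) ** 2 for x, observed in data)
print(numerical_gradient(ssr, {"w": 1.0}))
```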