This shows that the quality and value of a video don't depend on how fancy the animations are, but on how expert and pedagogically gifted the speaker is. Really brilliant! I assume you spent a lot of time designing that course, so thank you for this!
@ritvikmath • 4 years ago
Wow, thanks!
@backstroke0810 • 2 years ago
Totally agree. I learn a lot from his short videos. Precise, concise, enough math, enough playful examples. A true professor's mind.
@tzu-chunchen5139 • a year ago
This is the best explanation of Ridge regression that I have ever heard! Fantastic! Hats off!
@rez_daddy • 4 years ago
"Now that we understand the REASON we're doing this, let's get into the math." The world would be a better place if more abstract math concepts were approached this way, thank you.
@garbour456 • 3 years ago
good point
@GreenEyesVids • 4 years ago
Watched these 5 years ago to understand the concept and I passed an exam. Coming back to it now to refresh my memory, still very well explained!
@ritvikmath • 4 years ago
Nice! Happy to help!
@nadekang8198 • 5 years ago
This is awesome! Lots of machine learning books and online courses don't bother explaining the reasoning behind ridge regression; you helped me a lot by pulling out the algebraic and linear algebra proofs to show WHY IT IS this way! Thanks!
@siddharthkshirsagar2545 • 5 years ago
I was searching the whole internet for ridge regression and stumbled upon this video, which is by far the best explanation you can find anywhere. Thanks.
@zgbjnnw9306 • 2 years ago
It's so inspiring to see how you get rid of the c^2! I learned Ridge but didn't know why! Thank you for making this video!
@taareshtaneja7523 • 6 years ago
This is, by far, the best explanation of Ridge Regression that I could find on YouTube. Thanks a lot!
@BhuvaneshSrivastava • 5 years ago
Your data science videos are the best I have seen on YouTube so far. :) Waiting to see more
@ritvikmath • 5 years ago
I appreciate it!
@RobertWF42 • a year ago
Excellent video! One more thing to add: if you're primarily interested in causal inference, like estimating the effect of daily exercise on blood pressure while controlling for other variables, then you want an unbiased estimate of the exercise coefficient and standard OLS is appropriate. If you're more interested in minimizing error on blood pressure predictions and aren't concerned with the coefficients, then ridge regression is better. Also left out is how we choose the optimal value of lambda by running cross-validation over a selection of lambda values (I don't think there's a closed-form expression for lambda; correct me if I'm wrong).
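For anyone who wants to try that cross-validation step, here is a minimal sketch in Python; the toy data, the alpha grid, and the use of scikit-learn's RidgeCV are illustrative choices, not anything from the video:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Toy data with two nearly collinear predictors (purely illustrative)
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=100)

# There is no closed form for the best lambda (sklearn calls it alpha),
# so we search a grid of candidates with 5-fold cross-validation.
model = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print(model.alpha_)  # the lambda with the lowest cross-validated error
```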
@bettychiu7375 • 5 years ago
This really helps me! Definitely the best ridge and lasso regression explanation videos on YouTube. Thanks for sharing! :D
@q0x • 9 years ago
I think it's explained very fast, but still very clearly; for my level of understanding it's just perfect!
@alecvan7143 • 5 years ago
Amazing video, you really explained why we do things, which is what really helps me!
@TahaMVP • 6 years ago
Best explanation of any topic I've ever watched. Respect to you, sir.
@murraystaff568 • 8 years ago
Brilliant! Just found your channel and can't wait to watch them all!!!
@SarahPourmolamohammadi • 2 years ago
You are the best of all... you explained all the things, so nobody is going to have problems understanding them.
@yxs8495 • 8 years ago
This really is gold, amazing!
@nikunjgattani999 • 3 years ago
Thanks a lot.. I watched many videos and read blogs before this, but none of them clarified it at this depth.
@Lisa-bp3ec • 7 years ago
Thank you soooo much!!! You explain everything so clearly!! There's no way I couldn't understand!
@mikeperez4222 • 3 years ago
Anyone else get anxiety when he wrote with the marker?? Just me? Felt like he was going to run out of space 😂 Thank you so much though, very helpful :)
@surajshivakumar5124 • 3 years ago
This is literally the best video on ridge regression
@akino.3192 • 7 years ago
You, Ritvik, are simply amazing. Thank you!
@soudipsanyal • 6 years ago
Superb. Thanks for such a concise video. It saved a lot of time for me. Also, the subject was discussed fluently and was clearly understandable.
@teegnas • 4 years ago
These explanations are by far the best ones I have seen so far on YouTube... would really love to watch more videos on the intuitions behind more complicated regression models
@abhichels1 • 8 years ago
This is gold. Thank you so much!
@cu7695 • 7 years ago
I subscribed just after watching this. Great foundation for ML basics.
@theoharischaritidis4173 • 7 years ago
This really helped a lot. A big thanks to you Ritvik!
@babakparvizi2425 • 6 years ago
Fantastic! It's like getting the Cliff's Notes for Machine Learning. These videos are a great supplement/refresher for concepts I need to knock the rust off of. I think he takes about 4 shots of espresso before each recording, though :)
@aDifferentHandle • 6 years ago
The best ridge regression lecture ever.
@Viewfrommassada • 5 years ago
I'm impressed by your explanation. Great job
@ritvikmath • 5 years ago
Thanks! That means a lot
@ethanxia1288 • 9 years ago
Excellent explanation! Could you please do a similar video for Elastic-net?
@nickb6811 • 8 years ago
So so so very helpful! Thanks so much for this genuinely insightful explanation.
@charlesity • 4 years ago
Stunning! Absolute gold!
@wi8shad0w • 4 years ago
seriously!!!
@mortezaabdipour5584 • 6 years ago
It's just awesome. Thanks for this amazing explanation. Settled in my mind forever.
@Krishna-me8ly • 9 years ago
Very good explanation in an easy way!
@yanlinwang5703 • 3 years ago
The explanation is so clear!! Thank you so much!!
@jhhh0619 • 9 years ago
Your explanation is extremely good!
@mohamedgaal5340 • 2 years ago
I was looking for the math behind the algorithm. Thank you for explaining it.
@ritvikmath • 2 years ago
No problem!
@OmerBoehm • 2 years ago
Brilliant simplification of this topic. No need for a fancy presentation to explain the essence of an idea!!
@aarshsachdeva5785 • 7 years ago
You should add that all the variables (dependent and independent) need to be normalized prior to doing a ridge regression. This is because the betas in regular OLS can vary with the scale of the predictors, and ridge regression would penalize those predictors that must take on a large beta due to the scale of the predictor itself. Once you normalize the variables, your A^T A matrix becomes the correlation matrix of the predictors. The regression is called "ridge" regression because you add (lambda*I + A^T A), which adds the lambda value to the diagonal of the correlation matrix, forming a ridge. Great video overall, though, to start understanding this regression.
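To make that concrete, here is a minimal sketch in NumPy of the standardization plus the closed form described above; the helper name `ridge_closed_form` is ours, and the notation (A as the design matrix, lam as the penalty) follows the comment:

```python
import numpy as np

def ridge_closed_form(A, y, lam):
    """Ridge estimate (A^T A + lam*I)^(-1) A^T y on standardized inputs."""
    # Standardize each predictor so the penalty treats them on the same scale
    A = (A - A.mean(axis=0)) / A.std(axis=0)
    y = y - y.mean()
    # lam*I adds a "ridge" down the diagonal of A^T A
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
```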
@vishnu2avv • 7 years ago
Awesome, thanks a million for the great video! Searching to see whether you have done a video on LASSO regression :-)
@canernm • 4 years ago
Hi, and thanks for the video. Can you explain briefly why, when the m_i and t_i variables are highly correlated, the estimators β0 and β1 will have very big variance? Thanks a lot in advance!
@lanag873 • 2 years ago
Hi, same question here 😶‍🌫️
@e555t66 • a year ago
I don't have money to pay him, so I'm leaving a comment instead for the algo. He is the best.
@sanketchavan8 • 7 years ago
Best explanation of ridge regression so far.
@abhijeetsingh5049 • 8 years ago
Stunning!! Need more access to your coursework
@xwcao1991 • 3 years ago
Thank you. I'm making this comment because I know I will never need to watch it again! Clearly explained.
@ritvikmath • 3 years ago
Glad it was helpful!
@wi8shad0w • 4 years ago
THIS IS ONE HELL OF A VIDEO !!!!
@intom1639 • 6 years ago
Brilliant! Could you make more videos about cross-validation, RIC, BIC, and model selection?
@SUBHRASANKHADEY • 6 years ago
Shouldn't the radius of the circle be c instead of c^2 (at around 7:00)?
@RLDacademyGATEeceAndAdvanced • 2 years ago
Excellent approach to discussing Lasso and Ridge regression. It would have been even better if you had discussed how Lasso yields sparse solutions! Anyway, nice discussion.
@llmstr • a month ago
In your drawing at 9:00, if the level curves are converted back to a 3D graph, is the axis going up the loss function? Just want to clarify. Thanks!
@Thaifunn1 • 9 years ago
excellent video! Keep up the great work!
@nicolasmanelli7393 • 2 years ago
I think it's the best video ever made
@HeduAI • 7 years ago
I would trade diamonds for this explanation (well, figuratively! :) ) Thank you!!
@Sytch • 6 years ago
Finally, someone who talks quickly.
@prabhuthomas8770 • 6 years ago
SUPER!!! You have to become a professor and replace all the other ones!!
@LossAndWaste • 6 years ago
you are the man, keep doing what you're doing
@youyangcao3837 • 8 years ago
Great video, the explanation is really clear!
@JC-dl1qr • 7 years ago
great video, brief and clear.
@sachinrathi7814 • 3 years ago
Can anyone explain the statement "The efficiency property of an estimator says that the estimator is the minimum-variance unbiased estimator"? What does minimum variance denote here?
@sagarsitap3540 • 5 years ago
Thanks! Why can't lambda be negative? What if, to improve variance, the slope needs to be increased and not decreased?
@tsrevo1 • 7 years ago
Sir, a question about 4:54: I understand that in the tax/income example the VARIANCE of the beta0 and beta1 estimates is high, since there's an additional beta2 affecting things. However, the MEAN in the population should be the same, even with high variance, isn't it so? Thanks in advance!
@shiva6016 • 7 years ago
simple and effective video, thank you!
@nickwagner5173 • 6 years ago
We start out by adding a constraint that beta1 squared + beta2 squared must be less than c squared, where c is some number we choose. But then, after choosing lambda, we minimize F, and c ends up having no effect at all on our choice of the betas. I may be wrong, but it doesn't seem like c has any effect on our choice of lambda either. I find it strange that we start out with the criterion that beta1 squared + beta2 squared must be less than c squared, yet the choice of c is irrelevant. If someone can help me un-boggle my mind, that would be great.
@RobertWF42 • a year ago
Good question - I think it has to do with using the method of Lagrange multipliers to solve the constrained OLS optimization problem. The lambda gets multiplied by the expression in parentheses at 11:17, which includes the c-squared term. So whatever c-squared value you choose, it gets absorbed anyway when you multiply by lambda.
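In symbols, this is the standard Lagrange-multiplier correspondence (a sketch of the argument, not something worked out in the video):

```latex
\min_{\beta} \|y - A\beta\|^2 \quad \text{s.t.} \quad \|\beta\|^2 \le c^2
\qquad\Longleftrightarrow\qquad
\min_{\beta} \|y - A\beta\|^2 + \lambda\left(\|\beta\|^2 - c^2\right),
\quad \lambda \ge 0
```

The lambda times c^2 term is a constant in beta, so it drops out of the minimization over beta; every choice of c corresponds to some lambda, which is why c never appears in the final estimate and we can tune lambda directly instead.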
@kamesh7818 • 6 years ago
Excellent explanation, thanks!
@adityakothari193 • 7 years ago
Excellent explanation.
@sasanosia6558 • 6 years ago
Amazingly helpful. Thank you.
@prateekcaire4193 • a year ago
It is unintuitive that we constrain the weights (betas) to lie within c^2, yet the regularization expression includes not c but the sum of the squared weights. Certainly I am missing something here. Alternatively, why does adding the sum of squared betas (weights) to the cost function keep the betas within the constraint, so that they don't become large and vary across datasets?
@kartikkamboj295 • 5 years ago
Dude! Hats off 🙏🏻
@tamoghnamaitra9901 • 7 years ago
Beautiful explanation
@myazdani2997 • 7 years ago
I love this video, really informative! Thanks a lot
@Theateist • 6 years ago
Is the reason not to choose a big LAMBDA that we might get underfitting? If we choose a big LAMBDA we get small W, and then the output function (hypothesis) won't reflect our data and we might see underfitting.
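A quick sketch of that effect (arbitrary data; in scikit-learn's Ridge the lambda is called alpha):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)

# As lambda grows, the fitted weights are crushed toward zero (underfitting)
for alpha in (1e-3, 1.0, 1e6):
    print(alpha, Ridge(alpha=alpha).fit(X, y).coef_)
```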
@faeritaaf • 7 years ago
Thank you! Your explanations are really good, Sir. Do you have time to make a video explaining the adaptive lasso too?
@vinceb8041 • 4 years ago
Can anyone help me understand the effects of multicollinearity? I understand that the estimators will be highly variable, but why would they be very large?
@benxneo • 4 years ago
That's actually an interesting question; have you found an explanation for it? I can only say that regression depends on the variables being independent of each other, and multicollinearity makes it sensitive to small changes. But why the coefficients become larger I can't seem to understand.
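One way to see it is a quick simulation: with two nearly identical columns, OLS can put a huge positive weight on one and a huge negative weight on the other and still fit well, so the individual estimates blow up and swing between resamples even though their sum stays stable. A minimal sketch (the data and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
coefs = []
for _ in range(200):
    x1 = rng.normal(size=50)
    x2 = x1 + 0.001 * rng.normal(size=50)  # nearly collinear with x1
    y = x1 + x2 + rng.normal(size=50)
    A = np.column_stack([x1, x2])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # plain OLS fit
    coefs.append(beta)

coefs = np.array(coefs)
print(coefs.std(axis=0))        # per-coefficient spread: large
print(coefs.sum(axis=1).std())  # spread of beta1 + beta2: small
```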
@adarshnamdev5834 • 4 years ago
@ritvik when you said that the estimated coefficients have small variance, does that imply a tendency to obtain different estimated values of those coefficients? I tend to confuse this term 'variance' with the statistic variance (spread of the data!).
@benxneo • 4 years ago
Variance is the change in prediction accuracy of an ML model between the training data and the test data. Simply put, if an ML model predicts with accuracy "x" on the training data and its prediction accuracy on the test data is "y", then Variance = x - y. A smaller variance would thus mean the model is fitting less noise in the training data, reducing overfitting. This definition was taken from: datascience.stackexchange.com/questions/37345/what-is-the-meaning-of-term-variance-in-machine-learning-model Hope this helps.
@adarshnamdev5834 • 4 years ago
@@benxneo thanks mate!
@zw7453 • 3 years ago
best explanation ever!
@ronithsinha5702 • 6 years ago
Can someone explain why ridge regression leads to shrinkage of coefficients but not entirely-zero coefficients, whereas lasso causes some coefficients to become exactly zero?
@mnwepple • 9 years ago
Awesome video! Very intuitive and easy to understand. Are you going to make a video using the probit link?
@abeaumont10 • 6 years ago
Great videos, thanks for making them!
@meysamsojoudi3947 • 3 years ago
It is a brilliant video. Great!
@brendachirata2283 • 6 years ago
Hey, great video and excellent job!
@eDogBomb • 7 years ago
What is the intuition behind putting the constraint on the size of the beta coefficients rather than on their standard errors?
@Hazit90 • 8 years ago
excellent video, thanks.
@TURBOKNUL666 • 8 years ago
Great video! Thank you very much.
@kxdy8yg8 • 6 years ago
This is gold indeed!
@msloryg • 6 years ago
Thanks for this really helpful video! Could you explain why the independent variables in A should be standardized for Ridge and Lasso Regression?
@dimar4150 • 6 years ago
Can someone explain what the level curves mean?
@carlitors • 6 years ago
Google "contour plots"; it's not something that can be easily explained in a video. Usually taught in the first few lessons of Calculus III.
@zhongshanhu7376 • 9 years ago
Very good explanation in an easy way!!
@hunarahmad • 7 years ago
thanks for the nice explanation
@SiDanil • 7 years ago
What does the "level curve" mean?
@xiaoguangzhao34 • 7 years ago
awesome video, thank you very much!
@zehuilin8783 • 4 years ago
Hey Ritvik, I have a question about this one: I don't really know why we choose the point that is far from the origin. Which direction does gradient descent move in, and why? Please help me out here, thank you so much!
@dorukhansergin9831 • 8 years ago
Thank you, great video! Just a possible correction: at 6:35, shouldn't the radius be c instead of c^2?
@volintine • 8 years ago
Dorukhan Sergin: probably late, but the equation of a circle is x^2 + y^2 = r^2, so the radius is indeed c.
@kevinwong8020 • 4 years ago
I was taught that the name Ridge Regression comes from the lambda*I matrix. It looks like a ridged staircase shape.
@ibrahimkarabayir8963 • 9 years ago
Nice video. I have a question: lambda depends on c, doesn't it?
@JuPeggy • 7 years ago
excellent video! thank you!
@jakobforslin6301 • 2 years ago
You are awesome!
@ibrahimkarabayir8963 • 9 years ago
And is c a value that minimizes the VIF?
@justinm1307 • 7 years ago
this is great stuff
@nickwagner5173 • 6 years ago
Also, why can't we solve for lambda and the betas by taking partial derivatives, setting each equation to zero, and then solving the system of equations?