If you want to see why Lasso can set parameters to 0 and Ridge cannot, check out: kzbin.info/www/bejne/jp6VdJKdiaafbsU Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@JeanOfmArc3 ай бұрын
(Possible) Fact: 78% of people who understand statistics and machine learning attribute their comprehension to StatQuest.
@statquest3 ай бұрын
bam! :)
@sethmichael685513 күн бұрын
@@statquest Double Bam !!
@Phobos116 жыл бұрын
Good video, but it didn't really explain how LASSO gets to make a variable zero. What's the difference between squaring a term and using the absolute value that makes that happen?
@statquest6 жыл бұрын
Intuitively, as the slope gets close to zero, the ridge penalty (lambda * slope^2) shrinks quadratically, so squeezing out the last little bit of slope saves almost nothing in penalty while still increasing the sum of squared residuals. As a result, the optimal slope settles at a small value, but never exactly 0. In contrast, the absolute value penalty (lambda * |slope|) contributes a fixed amount per unit of slope, so it can keep outweighing the increase in the sum of squared residuals all the way down and push the slope to exactly 0.
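If a numerical example helps, here's a minimal Python sketch (my own illustration with scikit-learn and made-up data, not code from the video) with one useful predictor and one pure-noise predictor:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
weight = rng.normal(size=n)                      # useful predictor
airspeed = rng.normal(size=n)                    # useless predictor (pure noise)
y = 3 * weight + rng.normal(scale=0.5, size=n)   # y ignores airspeed entirely

X = np.column_stack([weight, airspeed])

print(Ridge(alpha=10).fit(X, y).coef_)    # airspeed coef is small, but not 0
print(Lasso(alpha=0.5).fit(X, y).coef_)   # airspeed coef is exactly 0
```

Ridge shrinks the noise coefficient toward 0 but never reaches it, while Lasso sets it to exactly 0 once the penalty outweighs the tiny improvement in fit.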
@statquest5 жыл бұрын
@@theethatanuraksoontorn2517 Maybe this discussion on stack-exchange will clear things up for you: stats.stackexchange.com/questions/151954/sparsity-in-lasso-and-advantage-over-ridge-statistical-learning
@programminginterviewprep18085 жыл бұрын
@@statquest Thanks for reading the comments and responding!
@statquest5 жыл бұрын
@@programminginterviewprep1808 I'm glad to help. :)
@Phobos115 жыл бұрын
@@statquest I didn't reply before, but the answer really helped me a lot, with basic machine learning and now artificial neural networks, thank you very much for the videos and the replies :D
@anuradhadas87954 жыл бұрын
The difference between BAM??? and BAM!!! is hilarious!!
@statquest4 жыл бұрын
:)
@SaiSrikarDabbukottu11 ай бұрын
@@statquest Can you please explain how the irrelevant parameters "shrink"? How does Lasso go to zero when Ridge doesn't?
@statquest11 ай бұрын
@@SaiSrikarDabbukottu I show how it all works in this video: kzbin.info/www/bejne/jp6VdJKdiaafbsU
@citypunter14136 жыл бұрын
One of the best explanations of Ridge and Lasso regression I have seen to date... Keep up the good work.... Kudos!!!
@statquest6 жыл бұрын
Thanks! :)
@marisa49422 жыл бұрын
I am eternally grateful to you and those videos!! Really saves me time in preparing for exams!!
@statquest2 жыл бұрын
Happy to help!
@perrygogas5 жыл бұрын
Some video ideas to better explain the following topics: 1. Monte Carlo experiments 2. Bootstrapping 3. Kernel functions in ML 4. Why ML is a black box
@statquest5 жыл бұрын
OK. I'll add those to the to-do list. The more people that ask for them, the more priority they will get.
@perrygogas5 жыл бұрын
@@statquest That is great! keep up the great work!
@gauravms66815 жыл бұрын
@@statquest yes we need it please do plsssssssssssssssssssssssssssssssss plsssssssssssssssssssssssssssssssssssssssssssssss
@InfinitesimallyInfinite5 жыл бұрын
Bootstrapping is explained well in Random Forest video.
@miguelsaravia80865 жыл бұрын
Do it for us... thanks good stuff
@chrisg09015 жыл бұрын
Don't think your Monty Python reference went unnoticed (Terrific and very helpful video, as always)
@statquest5 жыл бұрын
Thanks so much!!! :)
@ajha1004 жыл бұрын
Oh it absolutely did. And it was much loved!
@patrickwu58374 жыл бұрын
That "Bam???" cracks me up. Thanks for your work!
@statquest4 жыл бұрын
:)
@arpitqw15 жыл бұрын
why can't ridge reduce weight/parameter to 0 like lasso?
@hughsignoriello2 жыл бұрын
Love how you keep these videos introductory and don't go into the heavy math right away to confuse; Love the series!
@statquest2 жыл бұрын
Thank you!
@admw34366 жыл бұрын
My teacher is 75 years old and spent an hour explaining Lasso to us without actually explaining it. But this is a war I can win :), thanks to your efforts.
@statquest6 жыл бұрын
I love it!!! Glad my video is helpful! :) p.s. I got the joke too. Nice! ;)
@ak-ot2wn4 жыл бұрын
Why is this scenario so often the reality? Also, I check StatQuest's vids very often to really understand things. Thanks @StatQuest
@qiaomuzheng58002 жыл бұрын
Hi, I can't thank you enough for explaining the core concepts in such a short amount of time. Your videos help a lot! My appreciation is beyond words.
@statquest2 жыл бұрын
Thank you!
@alecvan71434 жыл бұрын
The beginning songs are always amazing hahaha!!
@statquest4 жыл бұрын
Awesome! :)
@quahntasy4 жыл бұрын
*Who else is here in 2020 and from India* BAM?
@statquest4 жыл бұрын
:)
@luisakrawczyk83195 жыл бұрын
How do Ridge or Lasso know which variables are useless? Will they not also shrink the parameters of important variables?
@suriahselvam90665 жыл бұрын
I am also looking for the answer to this. I'm just using my intuition here, but here's what I think. The least important variables have terrible predictive value, so the residuals along those dimensions are high to begin with, and shrinking their coefficients barely increases the residuals. If we create a penalty for including these variables (especially with a large lambda, comparable in magnitude to the squared residuals), a decrease in the coefficient of a "bad predictor" causes a comparatively small increase in residuals relative to the decrease in penalty. In contrast, decreasing the coefficient of a "good predictor" (which is less random) causes a significant increase in residuals, so that coefficient undergoes a smaller change. This is why the minimization reduces the coefficients of "bad predictors" faster than "good predictors". I take it this would be especially true when cross-validating.
@orilong5 жыл бұрын
If you draw the curves y = x and y = x^2, you will see that the gradient of y = x^2 vanishes near the origin. That means the penalty's pull toward zero fades away, so the coefficient is very hard to drive all the way to zero with an optimization approach like SGD. The gradient of y = x (i.e., |x| for positive x) stays constant, so the pull never fades.
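A tiny numeric sketch of the same point (just an illustration of the two penalty gradients, assuming lambda = 0.1):

```python
import numpy as np

betas = np.array([1.0, 0.1, 0.01, 0.001])
lam = 0.1

# Gradient of each penalty term with respect to beta:
print(2 * lam * betas)        # ridge (lam * beta^2): the pull fades as beta -> 0
print(lam * np.sign(betas))   # lasso (lam * |beta|): a constant pull toward 0
```

The ridge pull toward zero vanishes along with beta, but the lasso pull stays at lam no matter how small beta gets.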
@sanyuktasuman49934 жыл бұрын
Your intro songs remind me of Phoebe from the TV show "Friends", and the songs are great for starting the videos on a good note, cheers!
@statquest4 жыл бұрын
You should really check out the intro song for this StatQuest: kzbin.info/www/bejne/emHIl3t7f9iZftE
@jasonyimc4 жыл бұрын
So easy to understand. And I like the double BAM!!!
@statquest4 жыл бұрын
Thanks!
@clementbourgade24873 жыл бұрын
NOBODY IS GOING TO TALK ABOUT THE EUROPEAN / AFRICAN SWALLOW REFERENCE???? Are you all dummies or something? It made my day. Plus, the video is top notch, congratulations. BAMM!
@statquest3 жыл бұрын
bam!
@arpiharutyunyan84004 жыл бұрын
I think I'm in love with you ^_^
@statquest4 жыл бұрын
:)
@shyamparmar9835 жыл бұрын
I am sorry, but I'm not able to figure out why (regardless of the approach, Ridge or Lasso) the 'good' parameters 'slope' and 'diet difference' behave differently from the other two silly ones. I don't understand this, since you are applying the same lambda and absolute value to all 4 parameters. It'd be really kind of you to clear up my silly doubt. Thanks!
@tiborcamargo57326 жыл бұрын
That Monty Python reference though... good video btw :)
@statquest6 жыл бұрын
Ha! I'm glad you like the video. ;)
@abelgeorge49539 ай бұрын
Thank you for clarifying that the Swallow can be African or European
@statquest9 ай бұрын
bam! :)
@lavasrani388710 ай бұрын
Really love your videos!!!! But your songs are more like Phoebe's songs lol. They are fun to listen to.
@statquest10 ай бұрын
Ha, you should definitely check out this song: kzbin.info/www/bejne/emHIl3t7f9iZftE
@lavasrani388710 ай бұрын
Pee poooo! BAM! Double BAM!
@khanhtruong32545 жыл бұрын
Hi. Your videos are so helpful. I really appreciate the time you spend making them. I have one question related to this video: is the result of Lasso Regression sensitive to the units of the variables? For example, in the model: size of mice = B0 + B1*weight + B2*High Fat Diet + B3*Sign + B4*AirSpeed + epsilon. Suppose the original unit of weight in the data is grams. If we divide the weight by 1,000 to get the unit in kilograms, is the Lasso Regression different? As I understand it, the least squares estimate of B1 in kilograms should be 1,000 times higher than B1 in grams. Therefore, B1 in kilograms is more likely to vanish in Lasso, isn't it?
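Update: I think the answer is yes: the L1 penalty is applied to the raw coefficients, so rescaling a variable changes how strongly it gets penalized. The usual fix is to standardize the predictors before fitting, so the penalty treats them all on the same scale. A minimal sketch of that idea (just my own illustration with scikit-learn and made-up data, not anything from the video):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
grams = rng.normal(500, 100, size=100)      # weight in grams
y = 0.01 * grams + rng.normal(size=100)     # made-up mouse sizes

for X in (grams.reshape(-1, 1), (grams / 1000).reshape(-1, 1)):  # grams, then kg
    model = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
    print(model.named_steps["lasso"].coef_)  # same coefficient either way
```

After standardization, the fitted coefficient no longer depends on whether weight was recorded in grams or kilograms.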
@ファティン-z2vАй бұрын
Very, very well-explained video. It's an easy way to gain knowledge on matters that would otherwise look complicated and take a long time to understand when reading them from a textbook. I have never used Ridge or Lasso regression, I just stumbled upon the terms and got curious, but now I feel like I might have gained valuable data analysis knowledge that I can potentially use in the future.
@statquestАй бұрын
Glad it was helpful!
@abdulazizalhaidari76654 ай бұрын
Great work, thank you Josh! I'm trying to connect ideas from different perspectives/angles. Is the lambda here somehow related to a Lagrange multiplier?
@statquest4 ай бұрын
I'm not sure.
@gonzaloferreirovolpi12375 жыл бұрын
Hi man, really LOVE your videos. Right now I'm studying Data Science and Machine Learning, and more often than not your videos are the light at the end of the tunnel, so thanks!
@terrencesatterfield9610Ай бұрын
Wow. This new understanding just slammed into me. Great job. Thank you.
@statquestАй бұрын
Glad it was helpful!
@arnobchowdhury31914 жыл бұрын
Don't worry if your video doesn't get a million views... There are a lot fewer than a million smart people on the planet, and among those, even fewer are into machine learning and statistics. Just keep making better and better tutorials.
@statquest4 жыл бұрын
Thank you very much! I'll do my best. :)
@ryanzhao35025 жыл бұрын
Thx very much. A clear explanation of these similar models. A great video I will keep forever.
@IrishLam11 ай бұрын
At the end of the last video [Regularization Part 1: Ridge (L2) Regression], you mentioned the problem of how to estimate 10,000 parameters with only 500 samples and said you would talk about it in the next one, but after finishing this video I was still wondering how to deal with it... 🤣🤣 Am I watching these videos in the wrong order or what?
@statquest11 ай бұрын
You have the correct order. Unfortunately, all I have had time to do is provide a general intuition on how cross validation is used to find an optimal line, even when we don't have enough data.
@pratiknabriya55064 жыл бұрын
A StatQuest a day, keeps Stat fear away!
@statquest4 жыл бұрын
I love it! :)
@simrankalra40295 жыл бұрын
Thank you Sir! Great help.
@pomegranate8593 Жыл бұрын
me: watching these videos in full panic video: plays calming music me: :)
@statquest Жыл бұрын
bam! Good luck! :)
@kitkitmessi2 жыл бұрын
Airspeed of swallow lol. These videos are really helping me a ton, very simply explained and entertaining as well!
@statquest2 жыл бұрын
Glad you like them!
@curious_yang4 жыл бұрын
On top of a like, I would like to give you a TRIPLE BAM!!!
@statquest4 жыл бұрын
Thank you! :)
@RenoyZachariah2 жыл бұрын
Amazing explanation. Loved the Monty Python reference :D
@statquest2 жыл бұрын
:)
@rezaroshanpour9719 ай бұрын
Great.... Please continue teaching other models... thank you so much.
@statquest9 ай бұрын
Thanks!
@ainiaini44262 жыл бұрын
Hahaha.. That moment you said BAM??? I laughed out loud 🤣🤣🤣
@statquest2 жыл бұрын
:)
@AnaVitoriaRodriguesLima4 жыл бұрын
Thanks for posting, absolutely my new favourite youtube channel!!!!
@statquest4 жыл бұрын
Wow, thanks!
@hareshsuppiah98994 жыл бұрын
Statquest is like Marshall Eriksen from HIMYM teaching us stats. BAM? Awesome work Josh.
@statquest4 жыл бұрын
Thanks!
@ajha1004 жыл бұрын
I really appreciated the inclusion of swallow airspeed as a variable above and beyond the clear-cut explanation. Thanks Josh. ;-)
@statquest4 жыл бұрын
:)
@petrsomol3 жыл бұрын
Me too!
@AdamHetherwick6 ай бұрын
I caught that Monty Python reference haha :) African or European??
@statquest6 ай бұрын
BAM! :)
@hsinchen44034 жыл бұрын
Thank you so much for the video! I have watched several of your videos, and I prefer to watch your video first and then look at the real math formulas. When I do that, the formulas become so much easier to understand! For instance, I didn't even know what a 'norm' was, but after watching your video it was very easy to understand!
@statquest4 жыл бұрын
Awesome! I'm glad the videos are helpful. :)
@Jenna-iu2lx2 жыл бұрын
I am so happy to easily understand these methods after only a few minutes (after spending so many hours studying without really understanding what it was about). Thank you so much, your videos are incredibly helpful! 💯☺
@statquest2 жыл бұрын
Great to hear!
@rishatdilmurat8913 Жыл бұрын
Very nice explanations, better than UDEMY!
@statquest Жыл бұрын
Thanks a lot!
@mrknarf44384 жыл бұрын
Great video, clear explanation, loved the Swallows reference! Keep it up! :)
@statquest4 жыл бұрын
Awesome, thank you!
@emmanueluche3262 Жыл бұрын
Wow! so easy to understand this! Thanks very much!
@statquest Жыл бұрын
Thanks!
@Jan-oj2gn5 жыл бұрын
This channel is pure gold. This would have saved me hours of internet search... Keep up the good work!
@statquest5 жыл бұрын
Thank you! :)
@zebralemon Жыл бұрын
I enjoy the content and your jam so much! '~Stat Quest~~'
@statquest Жыл бұрын
Thanks!
@privatelabel38394 жыл бұрын
How do you know if a model is overfitting or not? I remember we could use cross-validation and compare the training error to the test error. But does it mean it's overfitting if the test error is higher than the training error?
@statquest4 жыл бұрын
Yes, however, if the test error is only a little worse than the training error, it's not a big deal.
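If it helps, here's a rough sketch of that check (my own illustration, not from the video):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Ridge(alpha=1.0).fit(X_train, y_train)
print(model.score(X_train, y_train))  # R^2 on the training data
print(model.score(X_test, y_test))    # R^2 on the test data; much lower => overfit
```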
@privatelabel38394 жыл бұрын
@@statquest Great. thanks!
@alexei.domorev Жыл бұрын
Josh - as always your videos are brilliant in their simplicity! Please keep up your good work!
@statquest Жыл бұрын
Thanks, will do!
@walkerbutin517110 ай бұрын
But what if there are two swallows carrying it together?
@statquest10 ай бұрын
double bam! :)
@manishsharma22114 жыл бұрын
Is this L1 regularisation? If not, could you please say which is L1 and which is L2?
@statquest4 жыл бұрын
Ridge = L2, Lasso = L1
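(In symbols: Ridge adds lambda * sum(beta_j^2) to the sum of squared residuals, and Lasso adds lambda * sum(|beta_j|).)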
@manishsharma22114 жыл бұрын
@@statquest Thanks mahn Baaamm😛♥️
@Endocrin-PatientCom5 жыл бұрын
Incredible great explanations of regularization methods, thanks a lot.
@statquest5 жыл бұрын
Thanks! :)
@arthurus83742 жыл бұрын
so incredible, so well explained
@statquest2 жыл бұрын
Thanks!
@joaocasas4 Жыл бұрын
My friend and I are studying together. When the first BAM came, we fell into laughter for about 5 min. Then the DOUBLE BAM would have caused catastrophic laughter if we hadn't stopped it. I want you to be my professor, please!
@statquest Жыл бұрын
BAM! :)
@stefanomauceri6 жыл бұрын
I prefer the intro where it is firmly claimed that StatQuest is bad to the bone. And yes, I think this is fundamental.
@statquest6 жыл бұрын
That’s one of my favorite intros too! :)
@statquest6 жыл бұрын
But I think my all time favorite is the one for LDA.
@stefanomauceri6 жыл бұрын
Yes I agree! Together these two could be the StatQuest manifesto summarising what people think about stats!
@statquest6 жыл бұрын
So true!
@lanchen50345 жыл бұрын
Thanks very much for this video, it really helped me with the concepts of Ridge Regression and Lasso Regression. I have a silly question: why can't the parameters in Ridge Regression shrink to zero, while in Lasso they can?
@statquest5 жыл бұрын
That's not a silly question at all, and there are lots of websites that dive into that answer. I'd just do a google search and you should find what you're looking for.
@jordanhe58522 жыл бұрын
This also makes me muddled.
@rakeshk6799 Жыл бұрын
Is there a more detailed explanation as to how some feature weights become zero in the case of Lasso, and why that cannot happen in Ridge? Thanks.
@statquest Жыл бұрын
Yes, see: kzbin.info/www/bejne/jp6VdJKdiaafbsU
@rakeshk6799 Жыл бұрын
@@statquest Thanks! I watched the video, but I am still not sure why there is a kink in the case of Lasso. What exactly creates that kink?
@statquest Жыл бұрын
@@rakeshk6799 The absolute value function.
@hanadiam89102 жыл бұрын
Million BAM for this channel 🎉🎉🎉
@statquest2 жыл бұрын
Thank you!
@adwindtf4 жыл бұрын
love your videos.... extremely helpful and crystal clearly explained.... but your songs..... let's say you have a very promising career as a statistician... no question
@statquest4 жыл бұрын
;)
@theuser810 Жыл бұрын
6:06 lol was that a Monty Python reference?
@statquest Жыл бұрын
Totes!
@arnobchowdhury31914 жыл бұрын
L1 regularization for more nitty-gritty
@statquest4 жыл бұрын
Yes. :)
@cloud-tutorials5 жыл бұрын
Two more use cases for Ridge/Lasso regression: 1) when there are few data points, 2) when there is high multicollinearity between variables.
@SieolaPeter Жыл бұрын
Finally, I found 'The One'!
@statquest Жыл бұрын
:)
@TM-do8ip2 жыл бұрын
6:05 Monty Python reference
@statquest2 жыл бұрын
Yep! :)
@2210duynn4 жыл бұрын
Very good video. You helped me a lot!!!!
@statquest4 жыл бұрын
Thanks! :)
@add6911 Жыл бұрын
Excellent video Josh! An amazing way to explain statistics. Thank you so much! Regards from Querétaro, México
@statquest Жыл бұрын
Muchas gracias! :)
@Azureandfabricmastery4 жыл бұрын
Hi Josh, thanks for the clear explanation of regularization techniques. Very exciting. God bless you for your efforts.
@statquest4 жыл бұрын
Glad you enjoyed it!
@xichuzhang48392 жыл бұрын
I thought he was gonna sing through the whole video.
@statquest2 жыл бұрын
:)
@ginofranciscocordova35462 жыл бұрын
BAMMMM!!!!!!!!!!!!!!!!!!!!!!!! Extremely useful
@statquest2 жыл бұрын
Thank you!
@gdivadnosdivad618511 ай бұрын
You are the best! I understand it now!
@statquest11 ай бұрын
Thanks!
@Unremarkabler3 жыл бұрын
BAM! your singing sounds seriously good!
@statquest3 жыл бұрын
:)
@raymilan23014 жыл бұрын
Thanks a lot for the explanation !!!
@statquest4 жыл бұрын
You are welcome!
@RAJIBLOCHANDAS2 жыл бұрын
Nice explanation!
@statquest2 жыл бұрын
Thanks!
@anujsaboo70814 жыл бұрын
Great video, one doubt. Since you say Lasso Regression can exclude useless variables from the model, can it assist with variable (or feature) selection, which I currently do in Linear Regression using p-values?
@statquest4 жыл бұрын
Yes! One of the things that Lasso Regression does well is help identify the optimal subset of variables that you should use in your model.
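For example, here's a minimal sketch of that idea (my own illustration; scikit-learn's LassoCV picks lambda by cross-validation, and the variables with nonzero coefficients are the ones it selected):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# 10 candidate features, only 3 of which actually influence y
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5, random_state=0)

lasso = LassoCV(cv=5).fit(X, y)
print(np.flatnonzero(lasso.coef_))  # indices of the variables Lasso kept
```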
@MrArunavadatta4 жыл бұрын
wonderfully explained
@statquest4 жыл бұрын
Thank you! :)
@yuzaR-Data-Science5 жыл бұрын
Thanks a lot! Amazing explanation! Please continue the great work and add more on statistics and probability in general, and machine learning in particular. Since Data Science is supposed to have a great future, I am certain that your channel will also prosper a great deal!
@statquest5 жыл бұрын
Thank you! :)
@praveerparmar81573 жыл бұрын
Just love the way you say 'BAM?'.....a feeling of hope mixed with optimism, anxiety and doubt 😅
@statquest3 жыл бұрын
:)
@samarkhan25095 жыл бұрын
Nice
@jonesbbq3074 жыл бұрын
How does it know which variables are useless tho?
@statquest4 жыл бұрын
If setting a variable's coefficient to 0 doesn't drastically reduce the ability to make good predictions, then that variable is not very useful.
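Roughly speaking, cross-validation does the testing for us: it tries a range of lambda values and keeps the one that predicts held-out data best. A minimal sketch (my own illustration, with made-up data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=150, n_features=8, n_informative=2,
                       noise=5, random_state=1)

for alpha in (0.01, 0.1, 1.0, 10.0):
    score = cross_val_score(Lasso(alpha=alpha), X, y, cv=5).mean()
    print(alpha, round(score, 3))  # the winning alpha decides which coefs hit 0
```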
@jonesbbq3074 жыл бұрын
StatQuest with Josh Starmer And this algorithm automatically does the tests?
@indian-de3 жыл бұрын
feeling better now….
@statquest3 жыл бұрын
bam!
@xendu-d9v2 жыл бұрын
Great people know the subtle differences that are not visible to common eyes. Love you, sir!
@statquest2 жыл бұрын
Thanks!
@RussianSUPERHERO2 жыл бұрын
I came for the quality content, fell in love with the songs and bam.
@statquest2 жыл бұрын
BAM! :)
@pencenewton4384 жыл бұрын
Bam!
@statquest4 жыл бұрын
:)
@pencenewton4384 жыл бұрын
@@statquest Double bam!!
@rishabhkumar-qs3jb3 жыл бұрын
Amazing video, explanation is fantastic. I like the song along with the concept :)
@statquest3 жыл бұрын
Bam! :)
@somakkamos6 жыл бұрын
Hmmm.... I am not sure how using the absolute value makes the penalty zero and removes the useless variables. Pls help.. I went through similar questions and your replies in the comments section.. and I'm still not sure.
@statquest6 жыл бұрын
To be honest, if you've already looked at my other comments, I can't help you much. However, check out The Elements of Statistical Learning - free download - web.stanford.edu/~hastie/ElemStatLearn/ Some folks like the explanation there.
@whispers1912 жыл бұрын
Thank you once again Josh!
@statquest2 жыл бұрын
bam!
@1852835 жыл бұрын
Great Video! Do you have any explanation on how Lasso reduces multicollinearity?
@statquest5 жыл бұрын
To be honest, while I understand why Lasso can make parameters equal to 0 and Ridge regression can't, I'm not sure why one method tends to reduce the parameter estimates for collinear variables as a group while the other reduces all but one.
@thej10915 жыл бұрын
Sensei!
@90fazoti4 жыл бұрын
Excellent, thanks for the help!
@statquest4 жыл бұрын
Thanks! :)
@lingaoxiao98082 жыл бұрын
Came just for the song 🤣
@statquest2 жыл бұрын
bam! :)
@pypypy4228 Жыл бұрын
Man... you are a genius...
@statquest Жыл бұрын
Thanks!
@davidmantilla18992 жыл бұрын
Best youtube channel
@statquest2 жыл бұрын
Thank you! :)
@takedananda4 жыл бұрын
Came here because I didn't understand it at all when my professor lectured about LASSO in my university course... I have a much better understanding now thank you so much!
@statquest4 жыл бұрын
Awesome!! I'm glad the video was helpful. :)
@tymothylim65503 жыл бұрын
Thank you, Josh, for this exciting and educational video! It was really insightful to learn both the superficial difference (i.e. how the coefficients of the predictors are penalized) and the significant difference in terms of application (i.e. some useless predictors may be excluded through Lasso regression)!
@statquest3 жыл бұрын
Double BAM! :)
@naomichin53472 жыл бұрын
I am eternally grateful to you. You've helped immensely with my last assessment at uni to finish my bachelor's.
@statquest2 жыл бұрын
Congratulations!!! I'm glad my videos were helpful! BAM! :)