Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets

  93,287 views

Derek Kane


For downloadable versions of these lectures, please go to the following link:
www.slideshare....
github.com/Der...
This lecture provides an overview of some modern regression techniques including a discussion of the bias variance tradeoff for regression errors and the topic of shrinkage estimators. This leads into an overview of ridge regression, LASSO, and elastic nets. These topics will be discussed in detail and we will go through the calibration/diagnostics and then conclude with a practical example highlighting the techniques.
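As a companion to the topics listed above, here is a minimal numerical sketch of shrinkage on synthetic data (illustrative only; not the lecture's own code): the closed-form ridge estimator (XᵀX + λI)⁻¹Xᵀy pulls the coefficient vector toward zero as λ grows.

```python
import numpy as np

# Synthetic regression problem with standardized predictors.
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize predictors
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ beta_true + rng.normal(size=n)
y = y - y.mean()                               # center the response

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, 0.0)       # lambda = 0 recovers ordinary least squares
beta_ridge = ridge(X, y, 50.0)    # a positive lambda shrinks the coefficients
# The L2 penalty trades a little bias for lower variance: the ridge
# coefficient vector is strictly smaller in norm than the OLS one.
```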

Comments: 50
@hannanazamkhan 6 years ago
Probably the best and the easiest to follow explanation of Ridge, LASSO and Elastic Net Regressions. Thanks.
@jnscollier 9 years ago
By the end of the video, I feel like I've just returned home after climbing a mental mountain. Putting it all together at the end with an example and walking through each model was truly an awesome and insightful experience. I 'feel' I learned a lot. Great teaching approach. Thanks a lot.
@DerekKaneDataScience 9 years ago
+jnscollier Thanks for the kind words and I'm glad you finished the marathon. A lesser person might not have made it to the peak of Mt. Everest. :0) There is so much theory behind these techniques it can really be overwhelming at first. L1 & L2 penalties, matrix notation, shrinkage estimators, etc... However, these are fantastic tools to have in your repertoire.
@carodak9849 7 years ago
Love the way you did this video. I was feeling exhausted and thought I would find another boring video, but you made it easy to follow and I've woken up.
@罗星-g4t 8 years ago
I am grateful for your video, it is very understandable!
@jitenjaipuria 7 years ago
Wow, thanks for doing the hard work of explaining complex statistics well.
@mdddd0731 8 years ago
Very good presentation. Can I find your code that produced the result in the prostate cancer example?
@giorgossartzetakis8771 5 years ago
An hour well spent. 2019
@sagarvadher 6 years ago
Awesome. Loved the last part!!
@sarahchen4385 9 years ago
Very concise; visual and well presented.
@DerekKaneDataScience 9 years ago
+Sarah Chen Thanks for the kind words and I am glad that you liked it.
@DezoCorka007 8 years ago
This may be true, if you are a professional mathematician. As a layman, I was lost after ca. 10 minutes and the rest of the presentation was a series of pictures and hieroglyphs that I had no clue about.
@wiscatbijles 9 years ago
At 7:28 the squares are not really what they should be. You take the x difference, where you should take the y difference and square that. Otherwise, a good video.
@DerekKaneDataScience 9 years ago
+Sjors van Heuveln Thank you for pointing this out. You are 100% correct about the y difference as the basis for the square. Thanks for watching.
@harlananelson 6 years ago
The error term is the vertical distance between the observed point and the regression line. The video incorrectly does not draw the side of each box extending from the observed point to the regression line. +Sjors van Heuveln is correct. Your picture shows Σ(Ŷ − f(x + (Ŷ − f(x))))² being minimized.
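A tiny sketch makes the thread's point concrete (made-up numbers, not from the video): ordinary least squares sums squared vertical (y-direction) differences between the observed points and the fitted line.

```python
import numpy as np

# Least squares minimizes squared *vertical* residuals y_i - f(x_i),
# not horizontal (x-direction) differences.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.9])
slope, intercept = np.polyfit(x, y, 1)    # ordinary least-squares line
residuals = y - (slope * x + intercept)   # vertical differences only
sse = np.sum(residuals ** 2)              # the quantity being minimized
```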
@albi232 9 years ago
Good tutorial, man. Good for the intuitions. I'll look for analytical derivations and code in other videos. Thanks.
@peiwang3223 5 years ago
Thank you sooo much, the explanation is so clear, you really saved my life!
@DerekKaneDataScience 5 years ago
Pei Wang thank you and you really made my day. Glad to help.🙂
@pinardemetci8868 6 years ago
Could you please label the x and y axes of your graphs in future presentations? (There are also a few inconsistencies with parameters, e.g. you use 'i' in the summation term but proceed with 'j' instead.)
@meghananaik8017 6 years ago
Very good presentation!!
@scargomez9437 7 years ago
Awesome. Why did you stop making videos?
@kaleabwoldemariam4288 8 years ago
Derek, thank you for providing very good learning material. Can you please post a video entirely dedicated to ridge regression? Is it possible to use ridge regression to estimate the coefficients and determine which covariates are important drivers of the model?
@DerekKaneDataScience 8 years ago
+Kaleab Woldemariam Thank you for the kind words, and I will definitely think about adding more ridge content. One limitation of the ridge regression technique is that it does not lend itself to gauging the importance of the covariates that are the key drivers. I typically flip to the LASSO and elastic net variants to get this assessment (those techniques eliminate the less important variables by shrinking their coefficients to 0). You could also consider running a PCA or employing variable selection routines to gauge variable importance before leveraging ridge regression.
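The selection behavior described in this reply can be sketched with a hand-rolled coordinate-descent LASSO on synthetic data (an illustration, not the lecture's code): the soft-thresholding update sets weak coefficients exactly to zero, which ridge's L2 penalty never does.

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0), the L1 proximal operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, lam * n) / col_ss[j]
    return beta

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
beta_true = np.array([4.0, 0.0, 0.0, -3.0, 0.0, 2.0])
y = X @ beta_true + rng.normal(size=n)
y = y - y.mean()

beta = lasso_cd(X, y, lam=0.5)
# Coefficients for the three noise variables come out *exactly* zero,
# so the surviving variables double as a variable-selection result.
```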
@MishaFeldman121 9 years ago
At 26:40, it's the 0.06 vertical line.
@jacobhorowitz9939 6 years ago
Where does the equation in the third bullet at 16:50 (for minimum lambda) come from? Did I miss where you defined these new parameters?
@roffpoff8221 6 years ago
WE WANT MORE !!!
@theq18 8 years ago
Thank you very much for the video and explanation
@petepittsburgh 8 years ago
How can ridge regression and LASSO techniques apply to binary dependent variables? Is a logit transformation necessary first?
@demudu 9 years ago
Good insights on ridge regression. Thank you.
@DerekKaneDataScience 9 years ago
demudu naganaidu I'm glad that you found some value in this demudu. Ridge regression is kind of tricky and I find that it takes a little bit of work to get comfortable with it. Good luck.
@goodmanryanc 7 years ago
Thanks Derek! That was helpful! I didn't feel that you explained how elastic nets capture multicollinear groups (schools of fish). It just looks like a blended version of Ridge and Lasso without creating any clusters/schools like you mentioned. Also, any insight into why Ridge outperformed Lasso and Elastic? Is that usually the case? Also, I think someone asked below... can these models be used for logistic regression (for classification/binary output - i.e. yes/no)? And, how do you handle binary inputs (I'm guessing no adjustment needed there)?
@hh636 6 years ago
At 58:00, why is 6.5 the ideal lambda? Is this eyeballed? Could it have been 6 or 7 instead?
@tobias2688 7 years ago
Hey Derek, thanks for this great video lecture! I just have a couple of questions: 1. What is the R matrix at 23:10? Is it a variance-covariance matrix? 2. Why does the ridge model lose to OLS beyond the dashed line at 31:41? 3. When you compare the three models at the end by their MSEs, are these MSEs in-sample or out-of-sample? Thank you very much!
@pasqualelaise1181 8 years ago
Excellent
@phebewu 8 years ago
Thank you. This is very valuable. I have one question about the elastic net and hope you can help. Since the elastic net will include groups of correlated variables, my question is: if I apply the elastic net, can I still interpret the coefficients' effect on the prediction? I remember that when multicollinearity is present, coefficients can flip sign. So I am concerned that I will not be able to interpret my coefficients. (In the end I want to be able to say that one set of variables has a positive impact on the predicted value while the other set has zero or negative impact.) Thank you!
@chloehe5523 6 years ago
Hello, do you think you could post the R code for your last example?
@PinkFloydTheDarkSide 7 years ago
This is a fantastic lecture video. One question: in the final comparison, the MSE for LASSO is higher by only 0.0124, but in return we are getting rid of 2 variables. Don't you think it is worth the trade-off? Derek, or anyone who is good at this, please answer. Thanks.
@ضياءبايشسلمانالعبودي 4 years ago
Thank you. I hoped it would also explain the adaptive LASSO method.
@danieldeychakiwsky1928 6 years ago
At 21:37 you state that X-transpose-X is the correlation matrix of the data X. How does XᵀX give you the correlation matrix?
@preeyank5 7 years ago
Thanks a lot, sir! Good explanation.
@chloehe5523 6 years ago
By the way, that's a really helpful lecture!!!
@chojojo5323 7 years ago
Thanks for your great video!!
@scarletovergods 7 years ago
Why use the log of features instead of raw features at 54:24?
@amineounajim9818 7 years ago
It's to correct skewness; it's also common practice in linear models because transforming features makes the residuals more normally distributed. (Look up feature transformation in linear models and the Box-Cox transform for further information.)
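A small illustration of this point on synthetic data (not the lecture's prostate example): the log transform pulls in the long right tail of a skewed feature, driving the sample skewness toward zero.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # right-skewed feature

def skewness(v):
    """Sample skewness: third central moment over sd cubed."""
    c = v - v.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5

skew_raw = skewness(x)          # strongly positive for lognormal data
skew_log = skewness(np.log(x))  # near zero: log(x) is normal here
```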
@dimar4150 6 years ago
BIC = Bayesian information criterion. AIC = Akaike information criterion
@sudheer596 6 years ago
LASSO 32:55 to 43:04
@ashishsinha2555 7 years ago
It's really a nice tutorial on different regression techniques. I want to use LASSO/elastic net in my Ph.D. research problem (correction of satellite-based rainfall using several independent variables such as location and topographical variables). May I have your personal email to discuss the problem with you?
@laurenceboulanger55 5 years ago
Gul du cat, nice.