Tutorial 28- Ridge and Lasso Regression using Python and Sklearn

118,059 views

4 years ago

Please join as a member in my channel to get additional benefits like materials in Data Science, live streaming for Members and many more
/ @krishnaik06
github url: github.com/krishnaik06/Regres...
#Regularization
Please do subscribe to my other channel too
/ @krishnaikhindi
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06

Comments: 111

@abhishekchatterjee9503 3 years ago

Watched the 2nd part just now. You're like a savior to me, as I have some deadlines due tomorrow and this helped me a lot, sir. Thank you very much.💯💯

@vishalvishwakarma7603 2 years ago

Beautifully explained. Really Helpful. Thank you!

@lotmoretolearn-dataanalyti9312 4 years ago

Great explanation. Very nice and simple explanation of ridge and lasso. Both the theoretical and the practical parts are good. Keep doing such videos.

@alakshendrasingh3425 4 years ago

You trained the model on the whole dataset first and only then split it for the prediction. Nothing wrong, but I don't think it's an ideal technique.

@ektamarwaha5941 4 years ago

agreed

@danielwang977 3 years ago

What would be a better technique? Obviously it's best to train with the whole data set right?

@sidkapoor9085 3 years ago

@@danielwang977 Test train split is the way to go. You can overfit the model if you train on the entire dataset.

@bhanuteja9408 3 years ago

@@danielwang977 No. 1. What is the use of testing again (we should not believe that accuracy) if the same dataset was already used in building the model? It is more of an overfitting case. 2. The dataset has to be split first; then one part is used for training and the other for testing. I think 2 is better. We get to know how the model works on foreign data. IMO.


@bassamal-kaaki3253 4 years ago

Excellent straightforward video :)

@iftekarpatel123 4 years ago

I'm the first to watch your video. I was watching your bias and variance video, and the notification came for this one. Big fan of yours, sir.

@shindepratibha31 4 years ago

I feel the steps for the regression process can be like this: 1) Split the data into train and test. 2) Use the train data for cross-validation and find the parameter with the minimum MSE. 3) Use the same parameter on the test data and check the accuracy of the different models.

@ilirsheraj2092 3 years ago

You are right, he has actually overfitted to the test data because the model was already trained on it. Splitting and then doing regularization is the correct way, but he probably didn't want to go into details here.
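
The three steps proposed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the synthetic data from `make_regression` and the alpha grid are assumptions chosen only so the example runs on its own.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (an assumption; the video uses a housing dataset).
X, y = make_regression(n_samples=300, n_features=10, noise=15.0, random_state=0)

# 1) Hold out a test set before any fitting happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 2) Cross-validate on the training data only to pick alpha.
param_grid = {"alpha": [1e-3, 1e-2, 1e-1, 1, 10, 100]}  # illustrative grid
search = GridSearchCV(Ridge(), param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_train, y_train)

# 3) Score the refit best model once on the untouched test set.
best_alpha = search.best_params_["alpha"]
test_mse = mean_squared_error(y_test, search.predict(X_test))
print(best_alpha, test_mse)
```

Because the test rows never reach `fit`, the printed test MSE is an estimate for data the model has not seen.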

@hemantkapoor6777 3 years ago

your videos are really nice keep it up!

@esakkiponraj.e5224 4 years ago

A small suggestion: better differentiate the terms alpha (for lasso) and lambda (for ridge). It is confusing.

@shashankverma4044 4 years ago

Excellent !!

@apoorvshrivastava3544 4 years ago

Sir, you are a saviour

@rajeshthumma5930 4 years ago

Hi Krish, you explained the concepts in a clear and easy-to-understand manner. If possible, can you please explain the other machine learning algorithms the way you explained the linear regression algorithm? I mean a theoretical explanation of all the important machine learning algorithms. Thank you.

@adarshnamdev5834 4 years ago

Hello Krish, thanks for making this wonderful video. Could you also please make a video on SVM and its underlying aspects like Kernelization etc...

@Kumar-oh2jl 4 years ago

I think a good example for regularization would be to show that the model's accuracy on training data is excellent and its accuracy on test data is bad, i.e. overfitting; then we can use regularization and compare the results.

@MasterofPlay7 4 years ago

or use roc and auc xD

@ayankoley5358 4 years ago

I'm following 'the complete machine learning playlist', but you're jumping steps and skipping many details, saying 'hope you know this'. I love your teaching though. Can you make a good, complete playlist?

@hasanzaman9783 3 years ago

Agreed.. I'm also following the complete list.

@utkarshsalaria3952 3 years ago

Yes, you said that right. The explanation is good, but it is difficult to follow the videos step by step even though the playlist is available. Some topics need to be rearranged and some need to be added to the playlist so that it becomes easy to follow the path. Still, it's the best material available on YouTube. You can find all the topics, but for that you need to search manually on his channel.

@Emotekofficial 3 years ago

Krish, thank you for this video, very informative. I have a question though. Don't you think that predicting on (x_test, y_test) with a model trained on (x, y) would predict memorized values? Wouldn't the prediction be more accurate and realistic for testing purposes if the model were trained on x_train and y_train rather than x, y?

@madhu1987ful 4 years ago

Krish, just the video I wanted, to see how ridge and lasso can be used in Python. Thanks a ton. Can you please explain what the inference is from the last 2 dist plots? I didn't get it.

@sanjanaprakash6847 3 months ago

The concept is well explained. However, as the ridge and lasso model is trained with the entire dataset and not x_train, data leakage has happened as the model is aware of the test inputs much before testing.

@sobhagyashri 4 years ago

Thankyou sooooo much.. :)

@anishyekhande6520 2 years ago

Thanks!

@2828jordan 4 years ago

Good videos. So far so good. In most of the videos, I feel the inference part is missing. What can we infer from the plots?

@laxitrana8605 3 years ago

@sandipansarkar9211 3 years ago

Superb explanation. Need to get my hands dirty in jupyter notebook. Thanks

@MrPiickel 4 years ago

Is the method correct? In my understanding we should:
1. Split the data into a train and a test set (maybe standardize first, depending on the data units).
2. Do the hyperparameter optimization on the train set with CV -> best model for the training data.
3. Predict with the best model on the test set -> realistic result for unseen data.

You did:
1. Hyperparameter optimization on all data using cross-validation -> best model for all data.
2. Split all data into a training and a test set.
3. Predict with the best model on the test set.

In my opinion your "best model" has already seen the data in the test set, so the result is not quite realistic. What do you think?

@madhavanrangarajan6097 4 years ago

True... was thinking the same

@jaysoni7812 3 years ago

The main aim of this video is to explain ridge and lasso regression. These small things we can do ourselves, so just do it yourself instead of hunting for this kind of small mistake.

@vinayakpawar2460 4 years ago

Hi Krish, amazing videos. On what basis is the list of alpha values decided? Also, can you please explain the 2 plots at the end in more detail? Thanks.

@dhivya_animal_lover 4 years ago

Thanks for your previous video, Krish. I am not getting whether Lasso or Ridge is better at the end of the code. Also, I referred to a blog on this topic and found that Elastic Net is more efficient. Can you kindly explain this? Dy

@preranav7149 7 months ago

Very helpful

@kushswaroop7436 1 year ago

How do we select the value of alpha/lambda? What is the ideal value?

@JaiSreeRam466 4 years ago

Please explain elastic regression as well

@onlydilip8124 3 years ago

cross_val_score is the cost function, as you said in the previous video

@mitulkoul1909 2 years ago

I had a doubt. what would happen if the best fit line has a slope 0 i.e parallel to the x axis. How would ridge and lasso regression help overcome overfitting in that case?

@NEHASHARMA-tm9gu 2 years ago

Just a small doubt, since I'm new to data science: you used the cross_val technique for Linear Regression but Grid Search for Ridge and Lasso?

@kapilchand6017 3 years ago

Hi Krish, can you please explain why lasso is better than ridge in the histogram of predictions? I wasn't able to follow the last minutes of the video. It would be great if you could clarify.

@Amir-English 2 months ago

Why did we split the data into train & test "after" doing fitting in lasso regression? I mean, shouldn't it be like splitting the data before fitting and creating a model? 8:34

@surajsoren637 3 years ago

can we use other scoring method rather than neg_mean_squared_error to solve the problem...If any please suggest...Please help me out..

@adityay525125 4 years ago

Sir after you made everything, I did not get things related to Cross Validation, can you please explain it in brief?

@nagashishsv843 4 years ago

Sir, if I use cv=10 then the MSE comes out even lower. So how do I choose the cv value appropriately? Does it depend on the dataset?

@ejikeozonkwo1705 2 years ago

Hi Krish, is there a reason you trained the whole data set before doing a train test split.

@deepalisharma1327 3 years ago

can someone please explain why are we subtracting y_test from prediction_ridge or prediction_lasso?

@techbenchers69 4 years ago

Is the deep learning playlist complete? If not, please complete it🙏🙏

@kks47 3 years ago

Hello Krish, I am confused at the end: which one performed well, Lasso or Ridge, in this case? Please give some feedback.

@viveksingh881 3 years ago

Ridge performed well... closer to zero means better performance

@raghavsinghal22 2 years ago

Is alpha the lambda which we are trying to find out in ridge?

@3pandya 4 years ago

Hi. I just want to know if I am not wrong, we need to use train_test_split method before training the data. right? But you trained the data and then split the data into train & test, which for sure do not give us an accurate prediction on future predictions. Please correct me if I am wrong. Thank you.

@shashankkhare1023 4 years ago

Hi, he used cross-validation first, which in itself does a train-test split. cv=5 means splitting into train and test 5 times, checking the scoring (neg MSE) on each test set and showing the mean value. So cross-validation itself takes care of testing on unseen data. Doing a train-test split manually later on is an optional choice. Hope this answer helps.

@shindepratibha31 4 years ago

@@shashankkhare1023 I think it is better to split the data into train and test first and then go for cross validation using train data. Cross validation splits train data into train and validation.
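
What `cv=5` does in `cross_val_score`, as discussed in this thread, can be seen in a small sketch. The synthetic data and the plain `LinearRegression` model are assumptions for illustration only; this is not the notebook from the video.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)

# cv=5: the rows are split into 5 folds; each fold is held out once
# while the model is fit on the remaining 4 folds.
scores = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_squared_error", cv=5
)

# The scores are negated MSE values, so values closer to zero are better.
mean_mse = -scores.mean()
print(scores.shape, mean_mse)
```

Each of the 5 scores comes from rows the model did not see during that fit. If the same data is also used to choose alpha, though, a separate held-out test set is still needed for a final, unbiased check.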

@shubhamkundu2228 3 years ago

Is it necessary to use GridSearchCV for ridge and lasso regression?

@arindamghosh3787 3 years ago

how to check which are the best variables left for the model

@RitwikDandriyal 4 years ago

Just curious but shouldn't we be standardizing our data when using linear regression? Or does sklearn automatically take care of that?

@hipraneth 4 years ago

I guess it needs to be standardized before applying the algorithm
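
On the standardization question: `Ridge` and `Lasso` in scikit-learn do not scale the features by default, so scaling is usually added as an explicit step. A minimal sketch, assuming synthetic data with deliberately mismatched feature scales, puts `StandardScaler` and the model in one pipeline:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=2)
# Put the features on very different scales to mimic raw, unscaled data.
X = X * np.array([1, 10, 100, 1000, 0.1, 0.01])

# Inside cross_val_score the whole pipeline is refit per split, so the
# scaler only ever sees the training folds of that split.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=5)
print(-scores.mean())
```

Keeping the scaler inside the pipeline avoids computing means and variances on rows that are later used for validation.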

@Prachi_Mayank 4 years ago

Sir, this line, mse =cross val, is showing an error on my system. The 'neg_mean part of the line is showing the error. What do I do, sir?

@manikantasai4766 4 years ago

Can you please make video on sentiment analysis on twitter

@yadavsanderamniwas 4 years ago

I have a question. What do you mean by stable in the last sns distplot graphs? Both ridge and lasso look the same to me. How is one more stable than the other?

@nabiltech1366 3 years ago

As you can see at 4:10, he said that the more of the data is close to zero, the better the model you have. In the histograms you can see that more of the data falls at zero using Ridge than Lasso.

@galymzhankenesbekov2924 4 years ago

Krish, as always amazing video. But why you decide to use cv=5, and how you come up with alpha values ? thanks

@krishnaik06 4 years ago

You can choose any value for cv, and alpha usually ranges between 0 and 1

@galymzhankenesbekov7242 4 years ago

Krish Naik thanks ! Could I also suggest a project for you to do ? Loan approval with online platform where you enter credentials of the client and it gives either to approve the loan or not . Is it just classification problem ?

@ali013579 4 years ago

I just don’t know where to start. By the way nice videos

@ali013579 4 years ago

wise guy you can do many things, for example code to drive your car while you are sleeping in the car :)

@shubhamkundu2228 3 years ago

cross_validation not explained clearly. What it is? Why it's needed to perform linear_regression here and what are those hyperparameters e.g cv used under cross_val_score ?

@barax9462 3 years ago

why im getting -1.34e+23 when i do mean_mse for linear reg??? is that bad?

@sufiyanansari1739 3 years ago

Cannot clone object. You should provide an instance of scikit-learn estimator instead of a class. what is this error
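
The message quoted above typically appears when the estimator class itself (`Ridge`) is passed where an instance (`Ridge()`) is expected. The sketch below reproduces the situation with synthetic data and an illustrative alpha grid, both of which are assumptions; the exact exception type and wording may differ between scikit-learn versions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=5, noise=5.0, random_state=3)
param_grid = {"alpha": [0.01, 0.1, 1, 10]}  # illustrative grid

# Wrong: Ridge (the class) instead of Ridge() (an instance).
try:
    GridSearchCV(Ridge, param_grid, cv=5).fit(X, y)
    failed = False
except Exception as exc:  # type and message vary by scikit-learn version
    failed = True
    print(type(exc).__name__, exc)

# Right: pass an instance, note the parentheses.
search = GridSearchCV(Ridge(), param_grid, cv=5).fit(X, y)
print(search.best_params_)
```

If the original code reads `GridSearchCV(ridge, ...)`, it is worth checking that `ridge` was assigned as `Ridge()` and not `Ridge`.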

@daspopdsa 1 year ago

Sir there is error like 'load_boston' has been removed from scikit-learn since version 1.2. What should i do???

@nikheleshpanigrahi4640 4 years ago

Hello Krish, can you tell me how are you selecting alpha(lambda) values ?

@danielwang977 3 years ago

The algorithm cycles through each of the parameters and uses cross validation to comprehensively test how accurate each parameter is.
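
The cycling described above can be made visible by printing the mean cross-validation score for every candidate alpha. This is a sketch under assumptions: the data is synthetic and the candidate list is illustrative, not the grid used in the video.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=10.0, random_state=4
)

# Candidates are chosen by hand, usually spread over several orders of magnitude.
param_grid = {"alpha": [1e-3, 1e-2, 1e-1, 1, 5, 10, 20]}
search = GridSearchCV(
    Lasso(max_iter=10000), param_grid, scoring="neg_mean_squared_error", cv=5
)
search.fit(X, y)

# One mean CV score per candidate; the best candidate has the highest
# (least negative) score.
mean_scores = search.cv_results_["mean_test_score"]
for alpha, score in zip(param_grid["alpha"], mean_scores):
    print(alpha, score)
print(search.best_params_, search.best_score_)
```

The `scoring` argument accepts other regression metrics as well, for example `"r2"` or `"neg_mean_absolute_error"`.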

@krunalpatel9952 4 years ago

Hi Krish, at 5:58 in the video you said that this best score helps us find out which lambda value is suitable. But the question is, how? You have listed those values as alpha values, and alpha as a learning rate should not be a very high number; instead it should be very small in order to reach the global minimum. Regards, Krunal

@rajathbk6915 4 years ago

Hey, we used alpha for ridge and lasso, not linear regression. Don't confuse the learning rate (alpha) with lambda. Those alpha values are for lambda.

@nabiltech1366 3 years ago

That's for gradient descent

@venkatasubbaraomandaleeka4973 2 years ago

I also had the same issue. He says lambda is selected using CV, so why did he give alpha values and print the best alpha value?

@jeevan88888 9 months ago

@@venkatasubbaraomandaleeka4973 he just typed lambda as alpha in the code for simplicity I guess..

@arvindtechnical940 4 years ago

Sir AirIndex Project Second Video?

@easewithjapanese1844 3 years ago

Hi Krish. Nice video but not getting my histogram graph. it throws a value error. please help

@sudarshansharma8647 4 years ago

Sir, will Ridge and Lasso Regression always give us a better result than linear regression or polynomial regression?

@mehdi9771 1 year ago

yes for sure

@RandomGuy-hi2jm 3 years ago

but sir u have not scaled the values????

@hrshtmlng 2 months ago

The load_boston dataset isn't available in sklearn any more; I'm using the California housing dataset instead
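
For anyone hitting the `load_boston` removal, a sketch of swapping in another regression dataset follows. It uses `load_diabetes` because that dataset ships with scikit-learn and needs no download; the California housing loader mentioned above is shown as a comment. The choice of `Ridge(alpha=1.0)` is an assumption for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# load_diabetes is bundled with scikit-learn, so nothing is downloaded.
# To use California housing instead (downloads the data on first call):
#   from sklearn.datasets import fetch_california_housing
#   X, y = fetch_california_housing(return_X_y=True)
X, y = load_diabetes(return_X_y=True)

scores = cross_val_score(
    Ridge(alpha=1.0), X, y, scoring="neg_mean_squared_error", cv=5
)
print(X.shape, -scores.mean())
```

The rest of the workflow (grid search over alpha, residual plots) stays the same; only the two lines that load `X` and `y` change.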

@garima2158 7 months ago

Error i m getting: It seems like you're encountering an issue because the load_boston function has been removed from scikit-learn since version 1.2 due to ethical concerns related to the dataset.

@priyanavthakur7014 5 months ago

me too, did you find any solution?

@abinsharaf8305 2 years ago

once we findout the best parameters do we use it ? where are we using it ?

@abinsharaf8305 2 years ago

Thanks so much for the video!

@Datacrunch777 3 years ago

How can I get a summary for ridge regression like the one for OLS (estimates, standard errors and p-values)? Kindly help, please.

@arjunkadam71 3 years ago

Use the statsmodels GAM library

@Datacrunch777 3 years ago

Can you tell me the proper command?

@arjunkadam71 3 years ago

@@Datacrunch777 explore the statsmodels library, bro

@narotian 3 years ago

Your theory videos are good, but I don't like the coding part; it looks way different from what I do. You should have tried doing it from scratch with some new dataset. I'm good with the theory, but now I'm getting confused by the coding part (everyone has their own way of coding).

@nabiltech1366 3 years ago

Same

@rajusrkr5444 4 years ago

please provide code also sir, when practicing it will take more time to watch the video and type the code in editor.

@adityay525125 4 years ago

It's on his GitHub profile

@sasikumar-fz1zy 4 years ago

refer the GitHub link in the description of this video

@d39-nischithhegde65 3 months ago

The Boston dataset has been deprecated

@adarshmamidi334 4 years ago

Reply in insta

@beautyofnature1541 3 years ago

Thank you for sharing your knowledge. I have recently started watching and have learned a lot. I was just practicing the above code on Colab and it's giving me the following error: AttributeError: 'Series' object has no attribute 'prediction_lasso' on the line sns.distplot(y_test.prediction_lasso) and also at sns.distplot(y_test.prediction_ridge). Can anyone please help with how to fix this?
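
The AttributeError above most likely comes from a typo: `y_test.prediction_lasso` asks the Series for an attribute named `prediction_lasso`, whereas the intended expression is the subtraction `y_test - prediction_lasso` (the residuals). A minimal sketch with synthetic data, which is an assumption, shows the corrected computation; NumPy's histogram stands in for the plot so the example runs without a display.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=5)
y = pd.Series(y, name="target")  # a pandas Series, as in the error message
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

prediction_lasso = Lasso(alpha=1.0).fit(X_train, y_train).predict(X_test)

# y_test.prediction_lasso -> AttributeError (attribute lookup on the Series).
# Intended: subtract the predictions from the true values.
residuals = y_test - prediction_lasso

# Bin counts of the residuals; these are what a histogram plot would show.
counts, bin_edges = np.histogram(residuals, bins=20)
print(residuals.mean(), counts.sum())
```

With the subtraction in place, `residuals` can be passed to the plotting call. Note that `sns.distplot` is deprecated in recent seaborn releases; `sns.histplot(residuals, kde=True)` is the usual replacement.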