Machine Learning Tutorial Python - 17: L1 and L2 Regularization | Lasso, Ridge Regression

287,652 views

Days ago

Comments: 198

@codebasics 2 years ago

Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

@bharathis9295 4 years ago

Statquest theory+Codebasics Practical implementation=😍😍😍

@codebasics 4 years ago

ha ha .. nice :) Yes I also like statquest.

@gokkulkumarvd9125 4 years ago

Exactly!

@ItsSantoshTiwari 3 years ago

Same😂👌

@abhinavkaul7187 3 years ago

@codebasics BAM!! :P Btw, the way you explained Yolo that was superb, bro!

@sandydsa 3 years ago

Yes! Minor comment, kindly please switch age and matches won. Got confused at first 😂

@ManigandanThangaraj 1 year ago

Nice explanation. Adding to that: L2 (Ridge): the goal is to handle multicollinearity and control the magnitude of the coefficients. The coefficients of highly correlated features are shrunk towards zero, but not exactly to zero, which improves stability and generalization. L1 (Lasso): the goal is to promote sparsity in the model by shrinking some coefficients exactly to zero, which makes it useful for feature selection and for preventing overfitting.
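
The difference described above can be sketched in a few lines. This is only an illustration under a strong simplifying assumption (an orthonormal design matrix, where both penalties have a closed form), not the code from the video: with the objective ||y - Xw||^2 + lam*||w||^2 ridge rescales every unregularized coefficient by 1/(1 + lam), while with (1/2)*||y - Xw||^2 + lam*||w||_1 lasso soft-thresholds it. The coefficient values and the function names are made up.

```python
def ridge_shrink(w, lam):
    """Ridge under an orthonormal design: scale the coefficient toward zero."""
    return w / (1.0 + lam)


def lasso_shrink(w, lam):
    """Lasso under an orthonormal design: soft-thresholding."""
    if abs(w) <= lam:
        return 0.0  # small coefficients are removed entirely
    return w - lam if w > 0 else w + lam


ols_coefficients = [3.0, -0.4, 0.2, -2.5]  # made-up unregularized weights
lam = 0.5

ridge = [ridge_shrink(w, lam) for w in ols_coefficients]
lasso = [lasso_shrink(w, lam) for w in ols_coefficients]

print("ridge:", ridge)  # every weight gets smaller, none becomes zero
print("lasso:", lasso)  # the two small weights become exactly zero
```

Ridge keeps all four features with smaller weights; lasso drops the two weak ones, which is why it is often described as doing feature selection.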

@r0cketRacoon 4 months ago

so, in what cases should we use L1 and L2?

@AlonAvramson 3 years ago

I have been following all 17 videos on ML you provided so far and found this is the best resource to learn from. Thank you!

@DrizzyJ77 6 months ago

Bro, you don't know how you've helped me in my computer vision journey. Thank you❤❤❤

@Hari-xr7ob 3 years ago

you should probably change the X and Y axes. Matches won is a function of Age. So, Age should be on X axis and Matches won on Y axis

@hansamaldharmananda9605 3 years ago

That will be more familiar. :D

@kj7767 1 year ago

@hansamaldharmananda9605 familiar where!

@parthasarothi2295 2 months ago

you just said my words

@gyanaranjanbal10 1 year ago

Clean, crisp and crystal clear, I was struggling to understand this from a long time, your 20 mins video cleared it in one attempt, thanks a lot💌💌

@bors1n 3 years ago

thank you a lot, I'm from Russia and I'm a student. I watch your videos about ML and it helps me to understand better

@codebasics 3 years ago

Glad to hear that!

@shashankdhananjaya9923 3 years ago

Couldn't have explained it any simpler. Perfect tutorial.

@codebasics 3 years ago

Glad it helped!

@ambujbaranwal9351 2 months ago

00:04 L1 and L2 regularization help address overfitting in machine learning
02:12 Balancing between underfitting and overfitting is crucial for effective model training.
04:26 Regularization shrinks parameters for better prediction function
06:47 L2 regularization penalizes the overall error and leads to simpler equations.
09:14 Filtering and handling NA values in a dataset
12:02 Dropping NA values and converting categorical features into dummies for machine learning in Python.
14:28 Understanding the issues of overfitting in linear regression model
17:00 Regularization techniques like L1 and L2 improve model accuracy.
19:16 Encouraging viewers to like and share the video

@RadioactiveChutney 2 years ago

Note for myself: This is the guy... his videos can clear doubts with codes.

@codebasics 2 years ago

ha ha .. thank you 🙏

@NafisAnsari-vr2xq 1 day ago

With different train/test split parameters (train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=2)) the test score is 0.13%; with the regular split parameters the scores are similar, around 67-68%. There is no point in trying regularization on these scores. I still tried it with some changes and there wasn't any significant change. For anyone in the same scenario, just remember that regularization helps when the training score is significantly better than the testing score (overfitting); it will not fix underfitting.

@ajaykushwaha-je6mw 3 years ago

Best tutorial on L1 and L2 Regularization.

@tusharsethi2801 3 years ago

One of the best videos out there for Regularization.

@yash422vd 3 years ago

As per the equation y = mX + c, you interchanged the y and X axes, if I'm not wrong. You are trying to predict matches won (yhat), which is on your horizontal axis, while age (X) is on the vertical axis. Using something unconventional may mislead new learners. X on the horizontal axis and y on the vertical axis is what we have learned since school. Assigning X and y to the axes accordingly (as per your explanation) would be a great help to learners. I hope you are not taking this personally. My apologies if so!

@lazzy5173 2 months ago

Summary:
- L1 regularization helps in feature selection.
- L2 regularization helps in preventing overfitting.

@atulupadhyay1542 3 years ago

machine learning concepts and practicals made easy, Thank you so much Sir

@codebasics 3 years ago

I am happy this was helpful to you.

@bruh-jr6wj 10 months ago

I believe the most appropriate imputing method here is to group by the similar type of houses and then fill with the mean value of the group. For example, if the average is, say, 90 m^2, and the home is only a flat, the building area is incorrectly imputed.
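
The group-wise imputation suggested above might look like the following sketch. The rows and the "house"/"flat" labels are made up for illustration; the point is only that each gap is filled with the mean of its own group rather than the global mean.

```python
from collections import defaultdict

# Hypothetical rows: (house type, building area in m^2); None marks a missing value.
rows = [
    ("house", 150.0),
    ("house", 130.0),
    ("house", None),
    ("flat", 60.0),
    ("flat", 50.0),
    ("flat", None),
]

# Collect the observed areas for each house type.
observed = defaultdict(list)
for house_type, area in rows:
    if area is not None:
        observed[house_type].append(area)

group_mean = {t: sum(v) / len(v) for t, v in observed.items()}

# Fill each gap with the mean of its own group instead of the global mean.
imputed = [
    (t, area if area is not None else group_mean[t])
    for t, area in rows
]

all_values = [a for _, a in rows if a is not None]
global_mean = sum(all_values) / len(all_values)
print(group_mean, global_mean)
```

Here the global mean (97.5) would overstate the flat and understate the house, while the group means (55 and 140) stay close to comparable homes. With pandas, a similar effect can be had with something like df["BuildingArea"].fillna(df.groupby("Type")["BuildingArea"].transform("mean")), assuming the columns have those names.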

@nexthome1445 4 years ago

Kindly make video on Feature selection for Regression and classification problem

@bhavikjain1077 3 years ago

A good video to understand the practical implementation of L1 and L2. Thank You

@haintuvn 3 years ago

Thank you for your interesting video. As far as I understand from the video, L1 and L2 regularization help to overcome the overfitting problem in linear regression! What about other algorithms (support vector machine, logistic regression...)? How can we overcome the overfitting problem there?

@nastaran1010 10 months ago

best learning with very good explanation. Thanks

@koustavbanerjee8195 3 years ago

Please do videos about XGBoost, LGBoost !! Your Videos Are Pure GOLD !!

@mukeshkumar-kh2fh 2 years ago

thank you for helping the DS community

@javiermarchenahurtado7013 2 years ago

Such a great video!! I was struggling to understand regularization and now it's crystal clear to me!

@piyushlanjewar6274 2 years ago

That's a really great explanation, Anyone can use this method in real use cases now. Keep it up.

@amruth3 3 years ago

Sir, all your videos are really helpful... Now I am giving you feedback on the video I am about to watch. This is also a beautiful video, and the hyperparameter tuning one is also a very good video... God bless you. You work hard to make things easy to understand.

@nationhlohlomi9333 1 year ago

I really love your content….. You change lives❤❤❤

@DarkTobias7 4 years ago

These are the videos we like!!!

@codebasics 4 years ago

Thanks DarkTobias. Good to see your comment.

@NeekaNeeksz 7 months ago

Clear introduction. Thanks

@kaizen52071 2 years ago

Nice video....good lesson......funny enough i see my house address in the dataset

@king1_one 9 months ago

good explanation sir and you need appreciation , i am here .

@phil97n 2 months ago

Awesome explanation, thanks.

@Ultimate69664 2 years ago

thank you ! this video saved my exam :)

@joehansie6014 3 years ago

All your videos are totally great. Keep working on it

@phuonglethithanh8498 1 year ago

Thank you for this video. Very straightforward and comprehensive ❤

@anvarshathik784 2 years ago

Machine learning concepts and practicals made easy, Thank you so much Sir

@codebasics 2 years ago

You are most welcome

@ankitmaheshwari7310 3 years ago

Good. Model representation is good. Hoping for some deeper knowledge in the next video

@nehareddy4619 2 years ago

I really liked your way of explanation sir

@vyduong276 1 year ago

I can understand it now, thanks to you 🥳

@ALLINONETV1 4 years ago

Please continue ....

@leonardomenar55 2 years ago

Excellent Tutorial, Thanks.

@mohammadrasheed9247 2 years ago

Nice Explanation. Also Recommended to play on 2X

@dylanloh5327 2 years ago

Thank you vm for this video. This is straight-forward and simple to understand!

@codebasics 2 years ago

👍👍😊

@vishvam1307 4 years ago

Nice explanation

@rohantalaviya136 7 months ago

Really great video

@PollyMwangi-cp3jn 8 months ago

Thanks so much sir. Great content

@davuthdy876 3 years ago

Thank you for sharing your video with the world.

@codebasics 3 years ago

I am glad you liked it

@priyankshekhar2454 3 years ago

Very good videos by you on each topic..thanks !!

@kouider76 3 years ago

Just came across this video accidentally simply great thank you

@joehansie6014 3 years ago

Simple but powerful😎👍

@marthanyarkoa9007 1 year ago

Thanks so simple ❤😊

@bryteakpakpavi637 3 years ago

You are the best.

@codebasics 3 years ago

Glad it was helpful!

@jongcheulkim7284 2 years ago

Thank you. This is very helpful.

@tanishsadhwani730 2 years ago

Amazing sir thank you so much

@ayenewyihune 2 years ago

Cool video

@analuciademoraislimalucial6039 3 years ago

Thank you so much teacher

@alielakroud1786 4 years ago

Hi Sir, thanks for all these tutorials on ML. I tried to use this syntax, but when I fit my model the score on the training data is 0.68, whereas the score on the test data is just weird: reg.score(X_test, Y_test) = -17761722756.9913
dummies = pd.get_dummies(df[['Suburb','Type','Method','SellerG','CouncilArea','Regionname']])
Merge = pd.concat([df, dummies], axis='columns')
final = Merge.drop(['Suburb','Type','Method','SellerG','CouncilArea','Regionname'], axis='columns')
final
The second part of my question: when I use L1 and L2 regularization the scores seem correct, 0.66 and 0.67. I would also mention that when I used LabelEncoder I got a test score of 0.44 and a training score of 0.42. Thanks in advance for your answers.

@ajgameboy6930 1 year ago

Same here, I really don't know what went wrong...

@ajgameboy6930 1 year ago

Hey, quick update, I found out the problem in my scenario... I had filled NaN values of price with mean, which caused the problem... Now that I have dropped 'em, it's working fine... Hope you had also solved the problem (you must've, ur comment is from 2 years back XD)
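
For readers puzzled by a test score like -17761722756.99 in this thread: the score method of scikit-learn regressors returns R^2, which has no lower bound. The sketch below uses made-up numbers and a hand-written r2_score helper (not the library function) to show how a single wildly wrong prediction, for example from an unregularized model with huge coefficients, can push R^2 far below zero.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

good = r2_score(y_true, [1.1, 1.9, 3.2, 3.8])          # close predictions
baseline = r2_score(y_true, [2.5, 2.5, 2.5, 2.5])      # always predict the mean
exploded = r2_score(y_true, [1.0, 2.0, 3.0, 4000.0])   # one wildly wrong prediction

print(good, baseline, exploded)
```

Predicting the mean gives exactly 0, so any negative score means the model does worse than that baseline on the test rows.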

@swL1941 1 year ago

Great video. However, it would have been better if you had provided the justification for assigning zeros to a few NaN values and the mean to a few records. I know "it's safest to assume", but then I believe in real-world projects we cannot just assume things.

@unifarzor7237 2 years ago

Always excellent lessons, thank you

@HA-bj5ck 11 months ago

Appreciate the efforts, but there were issues with the foundational understanding. Additionally, the inclusion of dummy variables expanded the columns to 745 without acknowledgement or communication regarding its potential adverse effects to viewers was not expected.
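
The column growth mentioned above follows directly from how dummy (one-hot) encoding works: every distinct value of every categorical column becomes its own 0/1 column. A minimal sketch with made-up rows (the "Suburb" and "Type" column names appear elsewhere in this thread; the values and the "Rooms" column are hypothetical):

```python
rows = [
    {"Suburb": "Abbotsford", "Type": "h", "Rooms": 2},
    {"Suburb": "Richmond", "Type": "u", "Rooms": 3},
    {"Suburb": "Carlton", "Type": "h", "Rooms": 1},
    {"Suburb": "Richmond", "Type": "t", "Rooms": 2},
]
categorical = ["Suburb", "Type"]

# One new 0/1 column per distinct value of each categorical column.
dummy_columns = sorted(
    {f"{col}_{row[col]}" for row in rows for col in categorical}
)

encoded = []
for row in rows:
    # Keep the numeric columns, replace the categorical ones with indicators.
    out = {k: v for k, v in row.items() if k not in categorical}
    for name in dummy_columns:
        col, value = name.split("_", 1)
        out[name] = 1 if row[col] == value else 0
    encoded.append(out)

print(len(rows[0]), "columns before,", len(encoded[0]), "columns after")
```

With hundreds of distinct suburbs or sellers the same mechanism produces hundreds of columns, which is one reason an unregularized linear model can overfit such data. In pandas, get_dummies accepts drop_first=True to drop one redundant indicator per column.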

@noahrubin375 3 years ago

Not all superheroes wear capes!

@m.shiqofilla4246 3 years ago

Very nice video sir, but I had hoped you would show a scatter plot of the data and the curve of the L1/L2 regression...

@aadityashukla8535 2 years ago

good theory!

@ravikumarrai7325 3 years ago

Awesome video....really awesome..

@codebasics 3 years ago

Glad you liked it

@denisvoronov6571 3 years ago

Nice example. Thank you so much!

@codebasics 3 years ago

Glad you liked it!

@nikolinastojanovska 2 years ago

great video, thanks!

@SahilAnsari-gl3xu 3 years ago

Thank a lot Sir❤️ Very good teaching style (theory+practical)👍

@sanooosai 10 months ago

thank you great work

@daretoschool4113 3 years ago

Please make video for genetic algorithm

@ayusharora2019 3 years ago

Very well explained !!

@codebasics 3 years ago

Glad it was helpful!

@JAVIERHERNANDEZ-wp6qj 1 year ago

Maybe in the Cost formula, the indices for summation should be different (in general): for the MSE term the sum should be over the entire training dataset (in this case n), and the sum for the regularization term should run over the number of features or columns in the dataset
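
Written with separate indices, as the comment suggests, the two regularized costs could look like this. The notation is an assumption (n training rows, m features, h_theta the model's prediction), since the video's exact formula is not reproduced in this thread:

```latex
% L2 (ridge): squared-error term over the n rows, penalty over the m feature weights
J_{\mathrm{L2}}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - h_\theta(x_i)\bigr)^2
                          + \lambda \sum_{j=1}^{m} \theta_j^{2}

% L1 (lasso): same error term, absolute values in the penalty
J_{\mathrm{L1}}(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - h_\theta(x_i)\bigr)^2
                          + \lambda \sum_{j=1}^{m} \lvert\theta_j\rvert
```

The intercept theta_0 is conventionally left out of the penalty sum, which is why j starts at 1.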

@cvino0618 1 month ago

there isn't a paid course on udemy better than the information I am gaining here

@MrMadmaggot 2 years ago

First when you apply lasso, you apply it apart from the first linear regression model you made right? Which means applying scikit Lasso is like making a linear regression but with regularization or it is applied to the linear regresion from the cell above?? So what if I use a knn or a forest?

@anseljanson5171 3 years ago

Thank you for this video. Why did you drop the NA values in the price column even though it had more than 7000 NA values? Won't it affect the prediction?

@mkt4941 3 years ago

You cannot accurately make an assumption as to what the price is based on the available data, so you have to drop it.

@anseljanson5171 3 years ago

@mkt4941 Thanks :)

@victorbenedict8743 4 years ago

Great tutorial sir. It's a privilege to be a fan of yours. Please sir, could you do a video on the steps to carry out when doing data cleaning for big data? Thank you.

@gouravsapra8668 2 years ago

Hi... 1) Shouldn't the equation be Theta0 + Theta1*x1 + Theta2*x1^2 + Theta3*x1^3 rather than Theta0 + Theta1*x1 + Theta2*x2^2 + Theta3*x3^3, because we have only one feature x? 2) In the regularization expression (the lambda part), my understanding is that we should not use "i" and "n" but rather "j" and "m". The reason is that in the first half of the equation we used "i" and "n" for the number of rows, whereas in the second half we need the number of features, so different indices should be used. Please correct me if my understanding is wrong.
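
Both points in this comment can be made concrete with a small sketch. It assumes a cubic polynomial in a single feature x and an L2 penalty that skips the intercept; the function names and data are made up, and this is not the code from the video. Note that the error term loops over the rows, while the penalty loops over the weights.

```python
def predict(theta, x):
    """Cubic in one feature: theta0 + theta1*x + theta2*x^2 + theta3*x^3."""
    return sum(t * x ** power for power, t in enumerate(theta))


def ridge_cost(theta, xs, ys, lam):
    n = len(xs)
    # Error term: one summand per training row (index i = 1..n).
    mse = sum((y - predict(theta, x)) ** 2 for x, y in zip(xs, ys)) / n
    # Penalty term: one summand per weight (index j = 1..m), intercept excluded.
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return mse + penalty


xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 5.0]
theta = [1.0, 0.0, 1.0, 0.0]  # y = 1 + x^2 fits these points exactly

cost_no_penalty = ridge_cost(theta, xs, ys, lam=0.0)
cost_with_penalty = ridge_cost(theta, xs, ys, lam=0.5)
print(cost_no_penalty, cost_with_penalty)
```

Even with a perfect fit, the penalized cost is nonzero because the weight on x^2 is charged by the lambda term.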

@EngineerNick 3 years ago

Thankyou for this it was very useful :)

@codebasics 3 years ago

Glad it was helpful!

@nomanshaikhali3355 4 years ago

Kindly explain Boosting algos!!

@thoeer913 2 years ago

I don't know how can you explain such a simple topic in so complicated manner. Your explanation caused more confusion than the topic itself.

@OceanAlves23 4 years ago

👨‍🎓👏✔, from Brazil-Teresina-PI

@codebasics 4 years ago

Thanks Ocean. I wish to visit Brazil one day (especially Amazon rain forest :) )

@arjunbali2079 2 years ago

thanks sir

@SGandhi 3 years ago

Can you make a video of ensemble model of using decision tree,knn and svm code

@adia9791 2 years ago

I think one must not use those imputations(mean) before train test split as it leads to data leakage, correct me if I am wrong.
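
The leakage concern raised here can be illustrated with a toy column. The values are made up, and the helper names are hypothetical; the idea is simply that the fill value should be learned from the training rows only and then reused on the test rows.

```python
# Hypothetical feature column; None marks a missing value.
train = [10.0, 20.0, None, 30.0]
test = [1000.0, None]


def mean_of_observed(values):
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def fill(values, fill_value):
    return [fill_value if v is None else v for v in values]


# Leaky: the fill value is computed from train and test together,
# so information from the test rows influences the training data.
leaky_mean = mean_of_observed(train + test)

# Safer: learn the fill value from the training rows only, then reuse it on test.
train_mean = mean_of_observed(train)
train_filled = fill(train, train_mean)
test_filled = fill(test, train_mean)

print(leaky_mean, train_mean)
```

In scikit-learn the same pattern is usually written by fitting an imputer such as SimpleImputer on the training split and calling transform on the test split.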

@Piyush-yp2po 4 months ago

Taking mean for prices would have been a better choice

@nikhilsingh1296 1 year ago

I really love learning from your videos, they are pretty awesome. Just a concern: in line 11 we ran the missing-value sum, which showed 7610 for Price, and in the next line, line 12, we dropped those 7610 rows, didn't we? Also, what was the other option if we had not dropped the values? Could we not split the dataset, impute the mean for 50 percent of the missing Price values and use those rows as a training set, and then run the test on the remaining missing Price values? I am not sure this is even a valid question, but I am a bit curious. Also, what was the scope for PCA here?

@slainiae 9 months ago

I agree. The missing 'Price' values could have been estimated using one of the previously presented algorithms.

@anjalipatel9028 9 months ago

Is L1/L2 regularization valid for regression algorithms only?

@rash_mi_be 3 years ago

In L2 regularization, how can theta reduce when lambda increases, and increase when lambda decreases?
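
One way to see the answer is a short derivation for a deliberately simplified model: one feature, no intercept, and an L2 penalty. This setup is an assumption chosen for illustration, not the model from the video.

```latex
\begin{align}
J(\theta) &= \sum_{i=1}^{n} (y_i - \theta x_i)^2 + \lambda\,\theta^{2} \\
\frac{dJ}{d\theta} &= -2\sum_{i=1}^{n} x_i\,(y_i - \theta x_i) + 2\lambda\,\theta = 0 \\
\theta^{*} &= \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^{2} + \lambda}
\end{align}
```

Lambda appears only in the denominator, so a larger lambda gives a smaller optimal theta and a smaller lambda lets theta grow back towards the unregularized least-squares value.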

@sudharsanb9391 4 years ago

Sir pls put a video on xgboost, adaboost and gradient boosting

@swaralipibose9731 4 years ago

Yes please

@DHAiRYA2801 4 years ago

Yes!

@codebasics 4 years ago

sure. looks like there is lot of demand for these topics, I have added them in my todo list

@duztv5370 4 years ago

@codebasics please sir, we will be expecting. Thanks

@sunzarora 3 years ago

Yes please!

@bhoomi5398 2 years ago

what is the dual parameter, and please explain what the primal form and dual form are

@TheOraware 3 years ago

0.01*4 = 0.04, where 0.01 is lambda and 4 is theta; 0.1*4 = 0.4, where 0.1 is lambda and 4 is again theta. When I increase lambda, the product lambda*theta increases, so the cost function increases, not the theta value (at 6:47).
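
The arithmetic in this comment holds for a fixed theta, but training does not keep theta fixed: it picks the theta that minimizes the whole cost. A minimal sketch with made-up data and a one-feature ridge model without intercept (an assumption for illustration) shows both effects side by side.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # made-up data that follows y = 2x exactly


def ridge_cost(theta, lam):
    """Squared error plus the L2 penalty lam * theta^2."""
    error = sum((y - theta * x) ** 2 for x, y in zip(xs, ys))
    return error + lam * theta ** 2


def best_theta(lam):
    """Theta that minimizes ridge_cost for a given lam (closed form)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


for lam in [0.0, 14.0, 42.0]:
    theta = best_theta(lam)
    # Columns: lambda, optimal theta, cost if theta stayed at 2, cost at the optimum.
    print(lam, theta, ridge_cost(2.0, lam), ridge_cost(theta, lam))
```

Keeping theta at 2 makes the cost grow with lambda, exactly as the comment computes, so the optimizer moves theta towards zero to pay a smaller penalty at the price of a slightly worse fit.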

@Piyush-yp2po 4 months ago

Got 45% accuracy for normal reg, l1 and l2

@_k_kd 7 months ago

but you have dropped more than 7000 prices with NA

@PRIYASHARMA-cr8ff 5 months ago

same I was also thinking the same

@tjbwhitehea1 3 years ago

Hey, great video thank you. Quick question - what's the best way to find the optimal alpha? Do you do a grid search?

@codebasics 3 years ago

Yes doing grid search would be a way
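
A grid search over alpha can be sketched without any library: fit the model once per candidate alpha on the training rows and keep the alpha with the lowest error on held-out rows. Everything below (data, candidate grid, the one-feature ridge model without intercept) is made up for illustration.

```python
# Made-up (x, y) pairs, split by hand into training and validation sets.
train = [(1.0, 2.2), (2.0, 3.9), (3.0, 6.3)]
valid = [(4.0, 8.0), (5.0, 10.1)]


def fit_ridge(data, alpha):
    """One-feature ridge without intercept: theta = sum(xy) / (sum(x^2) + alpha)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + alpha)


def mse(data, theta):
    return sum((y - theta * x) ** 2 for x, y in data) / len(data)


candidates = [0.0, 0.01, 0.1, 1.0, 10.0]

# Fit on the training set for every alpha, score on the validation set.
validation_error = {a: mse(valid, fit_ridge(train, a)) for a in candidates}
best_alpha = min(validation_error, key=validation_error.get)

print(validation_error)
print("best alpha:", best_alpha)
```

In scikit-learn, GridSearchCV automates the same loop with cross-validation instead of a single hold-out split, and RidgeCV and LassoCV search over alpha directly.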

@sunzarora 3 years ago

Nice video, my question is what will u do so accuracy will jump on this dataset from 67 to 90+?

@ajaysaroha2539 4 years ago

Sir, I am a fresher and want to make a career as a data analyst in the finance domain. I have no experience in finance, so how can I gain knowledge in this domain? Please give some suggestions.