Regularization - Explained!

13,948 views

CodeEmporium

1 year ago

We will explain Ridge, Lasso and a Bayesian interpretation of both.
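If you want to see the two penalties side by side before watching, here is a minimal scikit-learn sketch (the synthetic data and the alpha values are illustrative assumptions, not taken from the video): the L1 penalty tends to drive some coefficients exactly to zero, while the L2 penalty only shrinks them.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Illustrative synthetic data: 5 informative features and 5 pure-noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_coef = np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, -1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# Ridge (L2 penalty) shrinks every coefficient toward zero but rarely to exactly zero.
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso (L1 penalty) can set some coefficients to exactly zero (implicit feature selection).
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge coefficients:", np.round(ridge.coef_, 2))
print("lasso coefficients:", np.round(lasso.coef_, 2))
print("lasso zeroed out", int(np.sum(lasso.coef_ == 0.0)), "coefficients")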
ABOUT ME
⭕ Subscribe: kzbin.info...
📚 Medium Blog: / dataemporium
💻 Github: github.com/ajhalthor
👔 LinkedIn: / ajay-halthor-477974bb
RESOURCES
[1] Graphing calculator to plot nice charts: www.desmos.com
[2] Refer to Section 6.2 on "Shrinkage Methods" for mathematical details: hastie.su.domains/ISLR2/ISLRv...
[3] Karush-Kuhn-Tucker conditions for constrained optimization with inequality constraints: en.wikipedia.org/wiki/Karush-...
[4] stat exchange discussions on [3]: stats.stackexchange.com/quest...
[5] Proof of ridge regression: stats.stackexchange.com/quest...
[6] Laplace distribution (or double exponential distribution) used for lasso prior: en.wikipedia.org/wiki/Laplace...
[7] ‪@ritvikmath‬ 's amazing video for the bayesian interpretation of lasso and ridge regression: • Bayesian Linear Regres...
[8] Distinction between Maximum "Likelihood" Estimations and Maximum "A Posteriori" Estimations: agustinus.kristia.de/techblog...
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: imp.i384100.net/MathML
📕 Calculus: imp.i384100.net/Calculus
📕 Statistics for Data Science: imp.i384100.net/AdvancedStati...
📕 Bayesian Statistics: imp.i384100.net/BayesianStati...
📕 Linear Algebra: imp.i384100.net/LinearAlgebra
📕 Probability: imp.i384100.net/Probability
OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: imp.i384100.net/Deep-Learning
📕 Python for Everybody: imp.i384100.net/python
📕 MLOps Course: imp.i384100.net/MLOps
📕 Natural Language Processing (NLP): imp.i384100.net/NLP
📕 Machine Learning in Production: imp.i384100.net/MLProduction
📕 Data Science Specialization: imp.i384100.net/DataScience
📕 Tensorflow: imp.i384100.net/Tensorflow

Comments: 26
@ashishanand9642 17 days ago
Why is this so underrated? This should be on everyone's playlist for linear regression. Hats off man :)
@data_quest_studio4944 1 year ago
My man looks sharp and dapper
@CodeEmporium 1 year ago
Haha. Thanks! I think this shirt looked better on camera than in person. :)
@ajaytaneja111 1 year ago
Hi Ajay, great video, as always. One suggestion, with your permission ;) I think it might be worthwhile to introduce the concept of regularization by comparing feature elimination (which is equivalent to making the weight zero) vs. reducing the weight (which is regularization), elaborate on this, and then drift towards Lasso and Ridge. ;)
@lucianofloripa123 2 days ago
Good explanation!
@paull923 1 year ago
I had to watch it twice to truly digest it, but I like your approach to the contour plot in particular. I hope to boost your channel with my comments a tiny bit ;). tyvm! What I was taught and what is helpful to know imo: 1) Speaking on an abstract level about what regularization achieves: it punishes high-dimensional terms. 2) The notion of L1 and L2 regularization; and when you talk about the "Gaussian" for Ridge, you could also talk about the "Laplace" distribution instead of the double exponential distribution for Lasso regression.
@CodeEmporium 1 year ago
Thanks so much for your comments Paul! And yea, I feel like I have seen similar contour plots in books but never truly understood “why” they were like that until I started diving into details myself. Hopefully in the future I can explain it in a way that you’d be able to get it in a single pass through the video too :)
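On the Gaussian vs. Laplace point above: in the MAP view (see references [6]-[8] in the description), the prior placed on the weights is exactly what selects the penalty. A quick sketch in LaTeX notation:

\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\; \log p(y \mid X, \theta) + \log p(\theta)

% Gaussian prior \theta_j \sim \mathcal{N}(0, \tau^2):
\log p(\theta) = -\frac{1}{2\tau^2}\sum_j \theta_j^2 + \mathrm{const} \quad\Rightarrow\quad \text{L2 (Ridge) penalty}

% Laplace (double exponential) prior \theta_j \sim \mathrm{Laplace}(0, b):
\log p(\theta) = -\frac{1}{b}\sum_j |\theta_j| + \mathrm{const} \quad\Rightarrow\quad \text{L1 (Lasso) penalty}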
@blairnicolle2218 1 month ago
Excellent videos! Great graphing for intuition of L1 regularization where parameters become exactly zero (9:45) as compared with behavior of L2 regularization.
@cormackjackson9442 2 months ago
Such an awesome video! Can't believe I hadn't made the connection between ridge and Lagrangians; it literally has a lambda in it lol!
@cormackjackson9442 2 months ago
With the lasso intuition and the stepwise function you get for theta, how do you get the conditions on the right, i.e. yi < lambda/2? I thought perhaps instead of writing theta < 0, you are just using the implied relationship between yi and lambda. E.g. if theta < 0, and therefore |theta| = -theta, then after optimising that gives theta = y - lambda/2, i.e. y = lambda/2 + theta, but then I get the opposite conditions to yours... i.e. as theta is negative in this case, wouldn't that give y = lambda/2 + theta < lambda/2?
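For anyone stuck on the same step, one way to recover those conditions is to solve each sign case of the one-dimensional problem and then check when the solution is consistent with the assumed sign (this sketch assumes the per-observation objective (y - \theta)^2 + \lambda|\theta|; the constants shift if a 1/2 factor is used):

\min_{\theta}\; (y-\theta)^2 + \lambda|\theta|

\text{Case } \theta > 0:\quad -2(y-\theta) + \lambda = 0 \;\Rightarrow\; \theta = y - \tfrac{\lambda}{2},\ \text{consistent with } \theta > 0 \text{ only if } y > \tfrac{\lambda}{2}

\text{Case } \theta < 0:\quad -2(y-\theta) - \lambda = 0 \;\Rightarrow\; \theta = y + \tfrac{\lambda}{2},\ \text{consistent with } \theta < 0 \text{ only if } y < -\tfrac{\lambda}{2}

\text{Otherwise } (|y| \le \tfrac{\lambda}{2})\ \text{neither case is feasible and the minimum sits at } \theta = 0 \text{ (soft thresholding)}

Note the sign flip relative to the question: for theta < 0 the derivative of lambda*|theta| is -lambda, so theta = y + lambda/2 rather than y - lambda/2, which is why the condition comes out as y < -lambda/2 instead of y < lambda/2.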
@NicholasRenotte 1 year ago
Well hello everyone right back at you Ajay! These are fire, the live viz is on point!
@CodeEmporium 1 year ago
Thank you for noticing ma guy. I will catch up to the 100K gang soon. Pls wait for me 😂
@NicholasRenotte 1 year ago
@CodeEmporium 😂 you're one hunnit in my eyes 🙏
@fujinzhou7150 1 year ago
Love your awesome videos! Salute! Thank you so much!
@CodeEmporium 1 year ago
You are so welcome! I am happy this helps
@TheRainHarvester 1 year ago
Great content on your channel. I just found it! Heh, I used Desmos to debug/visualize too! I just added a video explaining easy multilayer backpropagation. The book math with all the subscripts is confusing, so I did it without any. Much simpler to understand.
@CodeEmporium 1 year ago
Thank you! And solid work on that explanation :)
@sivakrishna5530 11 months ago
Always find interesting things here. Keep going. Good luck.
@CodeEmporium 11 months ago
Hah! Glad that is the case. I am here to pique that interest :)
@kakunmaor 1 year ago
AWESOME!!!!! thanks!
@chadx8269 1 year ago
Nice explanation of the Bayesian view. Isn't regularization just the Lagrange multiplier? The optimum point is where the gradient of the constraint is proportional to the gradient of the cost function.
@abhirajarora7631 3 months ago
It is mathematically written in the same way, but they are not the same. Lagrange multipliers are used when you need to min/max a given function subject to a constraint, and then you solve for the value of lambda; in regularisation, we set the lambda value ourselves. Regularisation gives us a penalty if we take steps in a non-minimum direction and thus allows us to go back toward the correct direction in the following iteration.
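Put concretely, a standard sketch of the two formulations for the ridge case (see references [3]-[5] in the description):

\text{Constrained form:}\quad \min_{\theta}\; \|y - X\theta\|_2^2 \;\;\text{s.t.}\;\; \|\theta\|_2^2 \le t,\qquad
\mathcal{L}(\theta, \lambda) = \|y - X\theta\|_2^2 + \lambda\big(\|\theta\|_2^2 - t\big),\ \lambda \ge 0 \text{ fixed by the KKT conditions}

\text{Penalized form:}\quad \min_{\theta}\; \|y - X\theta\|_2^2 + \lambda\|\theta\|_2^2,\qquad \lambda \text{ chosen by the practitioner (e.g. cross-validation)}

For every constraint level t there is a lambda that yields the same minimizer, which is why the two views agree mathematically even though lambda plays a different role in each.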
10 months ago
Nice video, thanks! The only thing I think is slightly incorrect is describing polynomials of increasing degree as "complex". Since you are talking about maths, I was expecting to see the imaginary unit when I first heard "complex".
@alexandergeorgiev2631 1 year ago
How does Gauss-Newton for nonlinear regression change with (L2) regularization?
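A sketch of the standard answer (not from the video): with residuals r(\theta) = y - g(\theta), Jacobian J = \partial g / \partial\theta, and objective \|r(\theta)\|^2 + \lambda\|\theta\|^2, linearizing g around the current \theta turns the Gauss-Newton step \delta into the solution of damped normal equations:

\big(J^{\top}J + \lambda I\big)\,\delta = J^{\top} r(\theta) - \lambda\,\theta, \qquad \theta \leftarrow \theta + \delta

If the penalty is instead placed on the step \delta itself (i.e. \lambda\|\delta\|^2), the -\lambda\theta term drops and this reduces to the familiar Levenberg-Marquardt update (J^{\top}J + \lambda I)\,\delta = J^{\top}r.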
@lijinhui6902 9 months ago
thx !