
Machine Learning Lecture 20 "Model Selection / Regularization / Overfitting" -Cornell CS4780 SP17

20,896 views

Kilian Weinberger


Lecture Notes:
www.cs.cornell....

Comments: 33
@utkarshtrehan9128 · 3 years ago
The mediocre teacher tells. The good teacher explains. The superior teacher demonstrates. The great teacher inspires. ― William Arthur Ward
@venugopalmani2739 · 5 years ago
What a guy! Love the way you go about things, Prof. I hope you have a million subs soon.
@consumidorbrasileiro222 · 4 years ago
I don't. If he does, we're out of jobs!
4 months ago
I've never encountered anything better than this playlist. Thank you, Professor, for the detailed explanation and, most importantly, for presenting it in such an engaging way that it sparks a deep passion in everyone. Here I am in 2024, following since lecture one and feeling how much I have developed through your lectures.
@vaibhavsingh8715 · 2 years ago
This video and the whole playlist are a treasure trove for ML students.
@StevenSarasin · 9 months ago
God that variance explanation using the minimization graph at the end hit so hard. Loved that!
@sandeepreddy6295 · 4 years ago
The lecture explains the concepts very clearly; worth subscribing.
@alihajji2613 · 4 years ago
That was the best explanation I've ever seen. Thank you, sir, for sharing this knowledge with us.
@llll-dj8rn · 8 months ago
The fact that I am one of just 18k people who have watched this lecture is amazing to me. Thanks for this great content; it really satisfied my passion for ML.
@Ankansworld · 3 years ago
K for Kilian!! How amazing these lectures are. Thanks, Prof. :)
@PriyanshuSingh-hm4tn · 2 years ago
Amazing way of teaching. You're really a lifesaver, sir.
@deepfakevasmoy3477 · 4 years ago
Please, please share your other courses online. It's simply beautiful and very clearly explained. I just enjoy them without even trying hard to understand :)
@MrJackstarman · 5 years ago
Could you by any chance link the "projects" that you set for your class, please? It would be very beneficial for me; however, if it would be time consuming, don't worry about it :)
@billwindsor4224 · 4 years ago
@jack - Information about the projects and homework assignments is in the notes to Lecture #1.
@yamacgulfidanalumni6286 · 4 years ago
You should have 800k subs, not 8k
@AshutoshPandey-se8vt · 2 years ago
I guess even 80M would be too few for the content he provides.
@tanaytalukdar4875 · 2 years ago
Hi Professor, it's a great lecture series. Thank you for sharing it with us. My question: if the Bayes classifier has zero variance and zero bias error, why don't we always get the best result with Bayes?
@in100seconds5 · 4 years ago
Dear Kilian, one question please: when we want to detect underfitting, isn't it enough to just look at training error and test error? (If we see both are high, we conclude underfitting.) Why do we need that graph (the one with increasing training instances)?
@kilianweinberger698 · 4 years ago
Yes, that’s fair. If it is a very clear case (and train/test error are high and almost identical) then you won’t need the graph.
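(A rough sketch of that graph with increasing training instances, added here purely for illustration; the dataset, model, and subset sizes below are placeholders, not anything from the course. Training a deliberately weak model on growing subsets shows the underfitting signature: train and test error stay high and close together.)

```python
# Illustrative only: learning-curve check for under/overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for n in [50, 200, 800, len(X_tr)]:
    # deliberately weak model (shallow tree) -> high bias
    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr[:n], y_tr[:n])
    tr_err = 1 - clf.score(X_tr[:n], y_tr[:n])
    te_err = 1 - clf.score(X_te, y_te)
    print(f"n={n:>5}: train error={tr_err:.3f}, test error={te_err:.3f}")

# Underfitting: both curves flatten out high and close together.
# Overfitting: training error stays low while test error is much higher.
```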
@in100seconds5 · 4 years ago
Kilian Weinberger awesome. I got the idea. Thanks a lot.
@vincentxu1964 · 5 years ago
A little confusion about the noise. The lecture notes mention that the algorithm can never beat the noise term, because it is intrinsic to the data. But in the lecture you mentioned that we could reduce the noise by introducing more features. My confusion is: shouldn't introducing more features be a choice of the algorithm and be considered part of the algorithm? Many thanks.
@kilianweinberger698 · 5 years ago
Here I consider the feature extraction part independent of the algorithm. I.e., step 1: you create a data set (with features); step 2: you train an ML algorithm on that data set. In high-noise settings the algorithm (step 2) cannot really do much, but you can improve the results through the data (either by cleaning up features, removing mislabeled samples, or creating new features).
@ivanvignolles6665 · 5 years ago
In the case of the features, the way I see it is that the noise is a term representing all the features in the universe that I don't take into account but that could affect my data. For example, if I want to predict whether my water is boiling given the temperature, I would see that for a given temperature sometimes it is boiling and sometimes it is not; that could be caused by many factors, one of which is a variation in pressure. So if I now consider not only the temperature but also the pressure, i.e., add a new feature, the noise in my data should be reduced, because I'm considering a new factor that affects my data.
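(A toy numerical version of this boiling-water example, just to illustrate the point; all numbers and the linear boiling-point approximation are invented. With temperature alone, points near the decision boundary look like label noise; adding pressure as a second feature makes the problem nearly noise-free.)

```python
# Toy illustration: adding a relevant feature reduces the apparent noise.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
pressure = rng.uniform(70, 110, n)               # kPa, hypothetical range
boil_temp = 100 + 0.28 * (pressure - 101.3)      # rough linear approximation
temperature = rng.uniform(95, 105, n)            # degrees C, near the boiling point
y = (temperature > boil_temp).astype(int)        # boiling or not

X1 = temperature.reshape(-1, 1)                  # temperature only
X2 = np.column_stack([temperature, pressure])    # temperature + pressure

for name, X in [("temp only", X1), ("temp + pressure", X2)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name:>16}: test accuracy = {acc:.3f}")

# With temperature alone, examples near 100 C are inherently ambiguous (noise);
# once pressure is included the classes become (almost) linearly separable.
```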
@jiahao2709 · 4 years ago
Really beautiful explanation of the relation between regularization and early stopping. Just a question about the Bayes optimal classifier for regression at 22:48: I know it is used for classification, but how is it used for regression?
@kilianweinberger698 · 4 years ago
If you have P(Y|X) you would typically predict the expected value or its mode (depends a little on the application).
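(For completeness, the standard one-line derivation behind this, not something stated verbatim in the thread: under squared loss the Bayes optimal regressor predicts the conditional mean.)

```latex
h^*(x) \;=\; \arg\min_{c}\; \mathbb{E}\!\left[(Y - c)^2 \mid X = x\right]
       \;=\; \mathbb{E}\!\left[Y \mid X = x\right]
```

since the derivative \(-2\big(\mathbb{E}[Y \mid X = x] - c\big)\) vanishes exactly at the conditional mean; under absolute loss the minimizer is the conditional median instead.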
@colinmanko7002 · 4 years ago
Thank you! I've watched a few of these lectures now, and you have a brilliant way of sharing these concepts. I'm curious to hear your thoughts on why you would reserve a testing set in the implementation of k-fold cross-validation you describe. My understanding is that you propose to split the data D from distribution P into a training and testing set and then run k-fold cross-validation on the training set. You would then use the testing set to estimate the expected loss E[loss(h(x), y)] for (x, y) ~ P, i.e., the expected cost under distribution P. Yes, this would be an unbiased estimate of this cost. However, is there not some value of k for which simply running k-fold cross-validation on the entire dataset D would converge to this expectation? Of course k=1 would be biased, but you reduce that bias when you average over multiple cross-validation sets. My hypothesis would be that it converges to the expected cost under P and would be the error you could tell your boss. It boils down to the expectation in both scenarios really being the expectation of the cost on D; we use D because we don't have P. Would you point me in a good direction here?
@colinmanko7002 · 4 years ago
Hmm, there may be bias, given that we don't shuffle D each time we validate on a new subset. Will think on it.
@kilianweinberger698 · 4 years ago
The reason is that you use k-fold cross validation to pick your hyper-parameters across a set of many options. Let's say for nearest neighbors with k=1,3,5,7 you get an average cross validation error across the leave-out-sets of 3.1%, 4.2%, 2.7%, 3.5% respectively. So you pick k=5 and conclude your validation error is 2.7%. Very likely your true test error will be higher than that. The reason is that you picked the lowest value on the validation sets, so you are overly optimistic (i.e. you cheated ;-)). An estimate of the test error should always be that you have all hyper-parameters fixed prior to the evaluation, you run your classifier over the set that this classifier has never seen before and measure the error. Ultimately that's how it will also be when your classifier is exposed to new data. Hope this helps.
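(A minimal sketch of this workflow, purely illustrative: synthetic data, a made-up hyper-parameter grid, and scikit-learn helpers. The point is only that the winning CV error tends to be optimistic compared to the error measured once on a held-out test set.)

```python
# Pick k for k-NN with 5-fold CV on the training split, then evaluate once on
# a test set the chosen classifier has never seen.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

best_k, best_cv_err = None, np.inf
for k in [1, 3, 5, 7]:
    # average validation error across the 5 leave-out folds
    cv_err = 1 - cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X_tr, y_tr, cv=5).mean()
    if cv_err < best_cv_err:
        best_k, best_cv_err = k, cv_err

# refit on the full training split with the chosen k, then evaluate once
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_tr, y_tr)
test_err = 1 - final.score(X_te, y_te)
print(f"chosen k={best_k}, CV error={best_cv_err:.3f}, test error={test_err:.3f}")
# The CV error of the *winner* is typically a bit optimistic compared to test_err.
```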
@colinmanko7002 · 4 years ago
Kilian Weinberger thank you for your reply! And thanks again for your videos. I’ve really enjoyed them over the last week. I hope you have a good day
@deltasun · 4 years ago
Great, great lecture! I have one doubt: how does set size influence bias? After all, bias is related to the average classifier with respect to fixed-size datasets (the parameter n), right? If this n influences the variance, it should influence the bias as well, shouldn't it?
@kilianweinberger698 · 4 years ago
Actually n does not affect bias. Bias is the error that the expected classifier would still make. However, as n->large, the variance will become very small and the remaining error will be dominated by bias. So if you have n very large and still high error, it is usually good to fight bias (i.e. get a more powerful classifier).
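(A small simulation of this point, my own illustration with made-up data: a linear model fit to a quadratic target keeps roughly the same bias as n grows, while its variance shrinks toward zero.)

```python
# Estimate bias^2 and variance of a linear fit to quadratic data for growing n.
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(-1, 1, 50)
f_true = x_grid ** 2                       # true function; a line cannot represent it

def fit_linear(n):
    """Train a degree-1 polynomial on a fresh sample of size n, return predictions on x_grid."""
    x = rng.uniform(-1, 1, n)
    y = x ** 2 + rng.normal(0, 0.1, n)     # label noise, sigma = 0.1
    w = np.polyfit(x, y, deg=1)
    return np.polyval(w, x_grid)

for n in [10, 100, 10_000]:
    preds = np.stack([fit_linear(n) for _ in range(200)])  # 200 independent datasets
    avg_pred = preds.mean(axis=0)                          # the "expected classifier"
    bias2 = np.mean((avg_pred - f_true) ** 2)
    variance = preds.var(axis=0).mean()
    print(f"n={n:>6}: bias^2 ~ {bias2:.4f}, variance ~ {variance:.4f}")

# bias^2 stays near the same value while variance drops as n grows.
```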
@deltasun · 4 years ago
@kilianweinberger698 But if I have few data points, the possibility of being highly biased should be lower, right? (It is easier to fit fewer data points than lots of them.)
@kc1299 · 3 years ago
"Kernel = linear classifier on steroids" - Kilian, lol