Machine Learning Lecture 33 "Boosting Continued" -Cornell CS4780 SP17

17,286 views

Kilian Weinberger


Lecture Notes:
www.cs.cornell....

Comments: 27
@varunjindal1520 4 years ago
The fact that all the topics are covered so exhaustively makes this a must-watch. I started from Decision Trees, but I will re-watch the whole series. Thank you, Kilian, for posting these videos.
@amuro9616 5 years ago
I think the intuition explained in all his lectures is amazing and so helpful. This is probably one of the most approachable lecture series on ML there is.
@ioannisavgeros167 5 years ago
Thanks a lot, professor, for the entire course, and greetings from Greece. I studied data science and machine learning in my master's, but your lectures are a pure masterpiece.
@projectseller702 5 years ago
I watched some of the videos and they are really amazing. He explains things in a way that is very easy to understand. Thank you, Sir, for sharing them. I appreciate it.
@sudhanshuvashisht8960 4 years ago
Unlike the squared loss, to my understanding, the exponential loss won't be minimal at H = Y (the vector of labels). Take a very simple dataset with two labels, say [+1, +1]: the loss at H = [+1, +1] is 2*exp(-1*1), whereas at H = [+2, +2] it is 2*exp(-1*2), and the latter is clearly smaller. Does this contradict your contour lines at 5:05, Dr. Kilian? Grateful for all the explanations you've provided so far.
@kilianweinberger698 4 years ago
Yes, good point: for the exponential loss the solution is always somewhere in the limit. Still, the principle is the same ...
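(A quick numerical check of the exchange above, as a minimal sketch using the commenter's two-point example: the exponential loss keeps shrinking as the margins y_i*H_i grow, so its infimum of 0 is only approached in the limit and is never attained at any finite H.)

```python
import numpy as np

y = np.array([+1.0, +1.0])            # the two labels from the example above

def exp_loss(H):
    """Exponential loss: sum_i exp(-y_i * H_i)."""
    return np.exp(-y * H).sum()

for scale in [1, 2, 10, 100]:
    H = scale * y                     # predictions with ever larger margin
    print(scale, exp_loss(H))         # 2e^-1 ~ 0.736, 2e^-2 ~ 0.271, ... -> 0, but never 0
```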
@sebastianroubert7543 5 days ago
34:15 In your first demo, you have a training error of zero. But in the update rule the denominator contains error*(1-error), so shouldn't the weight update explode? I could understand it if you put a floor on the error value, so that in the limit all weight updates are about the same for all the data points, but I don't understand how to make the protection against zero error more explicit.
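(One common practical guard, sketched below under the assumption of the standard AdaBoost step alpha = 0.5*ln((1-eps)/eps): clamp the weighted error eps away from 0 and 1 before computing alpha, and renormalize the weights by their sum instead of dividing by 2*sqrt(eps*(1-eps)). This is not necessarily what the demo's code does; it is just one way to make the protection against zero error explicit.)

```python
import numpy as np

def adaboost_step(w, y, h_pred, eps_floor=1e-10):
    """One AdaBoost reweighting step with a floor on the weighted error.

    w      -- current sample weights (non-negative, summing to 1)
    y      -- true labels in {-1, +1}
    h_pred -- weak-learner predictions in {-1, +1}
    """
    miss = (h_pred != y)
    eps = np.clip(w[miss].sum(), eps_floor, 1.0 - eps_floor)  # keep eps away from 0 and 1
    alpha = 0.5 * np.log((1.0 - eps) / eps)                   # stays finite thanks to the clamp
    w = w * np.exp(-alpha * y * h_pred)                       # up-weight mistakes, down-weight hits
    return w / w.sum(), alpha                                 # renormalizing replaces the 2*sqrt(eps*(1-eps)) division
```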
@manikantabandla3923 1 year ago
Such an amazing lecture on gradient boosting. Can you provide any reference for Gradient Boosted Classification Trees? For example, what loss function is used in that case, and what training dataset is used for building the classifier ht? Thanks in advance.
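(This question is not answered in the thread. The standard reference is Friedman's 2001 paper "Greedy Function Approximation: A Gradient Boosting Machine". A minimal sketch of one common setup, assuming binary labels in {0, 1}, the logistic (log) loss, and scikit-learn regression trees as weak learners: each ht is a regression tree trained on the negative gradient of the loss at the current ensemble scores, i.e. the pseudo-residuals y - sigmoid(H).)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gradient_boosted_classifier(X, y, n_rounds=100, lr=0.1, max_depth=3):
    """Gradient boosting for binary classification (labels in {0, 1}) with log-loss.

    Each round fits a regression tree ht to the pseudo-residuals, i.e. the
    negative gradient of the log-loss with respect to the current scores H.
    """
    H = np.zeros(len(y))                 # current ensemble scores (log-odds), start at 0
    trees = []
    for _ in range(n_rounds):
        residuals = y - sigmoid(H)       # negative gradient of the log-loss at H
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        H += lr * tree.predict(X)        # small step in function space
        trees.append(tree)
    return trees

def predict_proba(trees, X, lr=0.1):
    """Probability of class 1 under the boosted ensemble."""
    H = lr * sum(tree.predict(X) for tree in trees)
    return sigmoid(H)
```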
@vocabularybytesbypriyankgo1558 4 months ago
Thanks a lot !!
@saad7417 5 years ago
Sir, could I please, pretty please, get the coding assignments too? The intuition building in these lectures is perfect; the only thing missing, I think, is the coding assignments.
@gregmakov2680 2 years ago
How does this video only have ten thousand views, while those junk videos get tons of viewers :D:D:D Students these days just don't know how to pick the good stuff at all.
@Gauravkriniitkgp 4 years ago
Hi professor, great intuition for explaining why AdaBoost overfits slowly, which can be observed on a log(#iterations) scale. One question though: I work with GBM and XGBoost all the time, and they also behave very similarly to AdaBoost when it comes to overfitting. Do you have any intuition for this?
@kilianweinberger698 4 years ago
Hmm, good question ... XGBoost in its vanilla form is just GBRT with squared loss, so the same rationale doesn't apply here. My guess would be that your data set is large enough that it simply takes a long time to overfit.
@Theophila-FlyMoutain 11 months ago
Hi Professor, thank you so much for the lecture. I wonder if AdaBoost could simply stop once the training error is zero? From your demo, after the training error gets close to zero and the exponential loss keeps getting smaller, the test error doesn't change much. I guess we don't need to waste time making the exponential loss smaller and smaller.
@kilianweinberger698 9 months ago
No, it typically doesn't stop when zero training error is reached. The reason is that even if the training error is zero, the training LOSS will still be >0 and can be reduced further (e.g. by increasing the margin of the decision boundary).
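(A small experiment one could run to see this, sketched with scikit-learn's AdaBoostClassifier on a made-up dataset. The tracked quantity is computed from staged_decision_function, so treat it as a surrogate exponential loss that follows the margins rather than the raw ensemble output H; on many datasets the 0/1 training error reaches zero long before this surrogate stops shrinking.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y_pm = 2 * y - 1                                    # labels mapped to {-1, +1}

clf = AdaBoostClassifier(n_estimators=300, random_state=0).fit(X, y)

staged = zip(clf.staged_predict(X), clf.staged_decision_function(X))
for t, (pred, score) in enumerate(staged, start=1):
    if t % 50 == 0:
        train_err = np.mean(pred != y)              # 0/1 training error
        surrogate = np.mean(np.exp(-y_pm * score))  # exponential loss of the staged score
        print(t, train_err, surrogate)
```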
@jiahao2709 4 years ago
A built-in safeguard against overfitting, because \alpha decreases.
@AmitKumar-vy3so 5 years ago
The two losses are the global loss and the one for the weak learners, right, Sir?
@galhulli5081 4 years ago
Hi Prof. Kilian. Quick question on the boosting method. I have already watched the videos twice (and I was also a certificate program student) but couldn't find an explanation. The previous lecture mentions that one of the requirements of boosting is that the weak learners must at least point in the right direction. How can we check that the weak learners point in the right direction before running boosting? Does this happen by trial and error, or is there a method? Thanks for the great class!!!
@kilianweinberger698 4 years ago
In AdaBoost, the error on the re-weighted training set must be <0.5 (or you would just flip the predictions), so you stop the moment your best weak learner has an error >=0.5 (which means you just cannot learn anything useful anymore). In AnyBoost the inner product of the weak learner predictions and the gradients should be >0. (Same thing, it can never be <0, or you would just flip the predictions.)
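(A minimal sketch of how those two checks might look inside a boosting loop, assuming labels and weak-learner outputs in {-1, +1}; the names w, y, h_pred and grad are hypothetical, and grad stands for the gradient of the loss with respect to the ensemble outputs. Note the sign convention: with the gradient written that way, the descent condition is an inner product with the negative gradient being > 0.)

```python
import numpy as np

def should_stop_adaboost(w, y, h_pred):
    """Stop if the best weak learner is no better than chance on the
    re-weighted training set, i.e. its weighted error is >= 0.5."""
    eps = w[h_pred != y].sum() / w.sum()
    return eps >= 0.5

def should_stop_anyboost(grad, h_pred):
    """Stop if the weak learner is not a descent direction: the inner
    product of its predictions with the negative gradient is <= 0."""
    return np.dot(-grad, h_pred) <= 0
```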
@galhulli5081 4 years ago
Thank you very much for the explanation, I will check my runs accordingly!
@sanjaykrish8719 4 years ago
Can we say AdaBoost is like coordinate descent in function space?
@kilianweinberger698 4 years ago
Yes, with an adaptive stepsize.
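(The "adaptive stepsize" can be read as an exact line search along the chosen coordinate: minimizing the exponential loss over the step length recovers the closed-form alpha = 0.5*ln((1-eps)/eps). A minimal numerical sketch of that equivalence, with made-up data and weak-learner outputs in {-1, +1}.)

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 200
y = rng.choice([-1.0, 1.0], size=n)           # labels
H = rng.normal(size=n)                        # current ensemble scores
h = np.where(rng.random(n) < 0.8, y, -y)      # weak learner: correct on ~80% of the points

w = np.exp(-y * H)
w /= w.sum()                                  # AdaBoost weights induced by the current H
eps = w[h != y].sum()                         # weighted error of the weak learner

# Exact line search: minimize the exponential loss along the coordinate h
exp_loss = lambda a: np.exp(-y * (H + a * h)).sum()
alpha_search = minimize_scalar(exp_loss).x

alpha_closed = 0.5 * np.log((1 - eps) / eps)  # AdaBoost's closed-form step
print(alpha_search, alpha_closed)             # the two agree up to numerical tolerance
```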
@gregmakov2680 2 years ago
Hahahaha, AdaBoost never overfits, of course!! Fixing one bug tends to create another bug :D:D
@MrAngryCucaracha 5 years ago
How can AdaBoost work for SVMs? Wouldn't the linear combination of such linear classifiers be linear?
@kilianweinberger698 5 years ago
In AdaBoost we assume each weak learner only outputs +1 / -1. So you have to take the output of each linear classifier and apply the sign() function. Now you are combining multiple linear classifiers in a non-linear fashion.
@MrAngryCucaracha 5 years ago
@@kilianweinberger698 Thank you very much for your answer, Prof. Weinberger. I see now that by taking only the sign of each classifier you are applying the step function, so they are no longer linear, similar to an activation function in a neural network (now I wonder if there would be any advantage to using other functions for boosting). I would also like to thank you for your amazingly clear and understandable course. I can say that I have understood all topics (even Gaussian processes, which I previously believed to be impossible), and I will be very interested to watch if you decide to do any further videos, either other courses or opinion pieces, and to contribute to any Patreon-like funding. Best regards.
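(A tiny numeric illustration of the point above, as a sketch with made-up 1-D data: two thresholded linear classifiers, combined with weights and a final sign(), represent a "band" that no single linear threshold on x can represent.)

```python
import numpy as np

x = np.array([-2.5, -1.5, -0.5, 0.0, 0.5, 1.5, 2.5])  # 1-D inputs (made up)
y = np.where(np.abs(x) < 1, 1, -1)                    # +1 inside the band (-1, 1), -1 outside

h1 = np.sign(x + 1)                                   # weak learner 1: +1 for x > -1
h2 = np.sign(1 - x)                                   # weak learner 2: +1 for x < +1
H = np.sign(h1 + h2 - 1)                              # weighted vote, then the final sign()

print(np.all(H == y))                                 # True: the combination is non-linear in x
```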