17. Learning: Boosting

313,141 views

MIT OpenCourseWare

MIT 6.034 Artificial Intelligence, Fall 2010
View the complete course: ocw.mit.edu/6-034F10
Instructor: Patrick Winston
Can multiple weak classifiers be used to make a strong one? We examine the boosting algorithm, which adjusts the weight of each classifier, and work through the math. We end with how boosting doesn't seem to overfit, and mention some applications.
License: Creative Commons BY-NC-SA
More information at ocw.mit.edu/terms
More courses at ocw.mit.edu

Comments: 139
@noobshady
@noobshady 5 жыл бұрын
“The definition of genius is taking the complex and making it simple.” ― Albert Einstein. Thank you, Dr. Winston and MIT. I really need to send these lectures to my teachers, whose only teaching method is reading slides.
@pumpitup1993
@pumpitup1993 4 жыл бұрын
Brother u are not the only one
@JACQNARC
@JACQNARC 3 жыл бұрын
@@pumpitup1993 True
@daniellsitio
@daniellsitio 3 жыл бұрын
RIP Prof Patrick, thank you for the kind lectures.
@omidmo7554
@omidmo7554 8 жыл бұрын
An outstanding teacher. I appreciate Dr Winston. He explains confusing stuff in a very simple way.
@AbhishekSharma-nr5fh
@AbhishekSharma-nr5fh 6 жыл бұрын
I
@ssssssstssssssss
@ssssssstssssssss 5 жыл бұрын
Yeah. And ya gotta love his dry humor.
@RazerBlackShark
@RazerBlackShark 5 жыл бұрын
true
@bruceWayne19993
@bruceWayne19993 2 жыл бұрын
Hi, I did not understand this concept. Can you share some references where I can understand it better?
@prasunshrestha7692
@prasunshrestha7692 3 жыл бұрын
I don't think I would have ever said this in academia, but I can binge-watch all his lectures. Amazing!
@EvanKozliner
@EvanKozliner 7 жыл бұрын
It's incredible that there are even empty seats in this lecture. Truly an amazing professor
@jamespaz4333
@jamespaz4333 4 жыл бұрын
That's not an issue. What about those sleeping in the front row? Too bad for them.
@CRJessen
@CRJessen 6 жыл бұрын
This is such a clear path to understanding. Thank you, Prof.Winston.
@emanuelen5
@emanuelen5 6 жыл бұрын
A comment on the transcription: a lot of the time when it is transcribed as [Inaudible], he is saying "Schapire", who is the inventor of boosting (Robert Schapire).
@MaxRoth
@MaxRoth 9 жыл бұрын
I am amazed that Dr. Winston uses no notes. This is all in his head. Crazy.
@krakenmetzger
@krakenmetzger 4 жыл бұрын
The trick is he writes the notes on the chalkboard beforehand
@Stl71
@Stl71 4 жыл бұрын
If he teaches the same material every year, then it's no surprise he can remember everything after a decade or so.
@philtinn3015
@philtinn3015 3 жыл бұрын
The morning before each lecture, Patrick rehearses the chalkboarding. Make no mistake: hard work pays off.
@MilesTeg87
@MilesTeg87 Жыл бұрын
@@Stl71 that's very true. It's easier to remember and be good at giving a practiced speech reinforced by repetition. What separates him from the ordinary is that he is constantly updating the material, thinking about it, working on it, teaching it, and also thinking about biology, psychology, evolution and how they integrate, explain, support or at least refute each other. He also has a script; obviously the class is practiced beforehand to fit the ideas into the time frame, and also to make room for his ramblings about how we became human (I'm a biologist, so I really enjoy and agree with his thinking). Having ideas pre-written on the board, plot twists (like writing decision trees and then changing that to tree stumps): he really likes, works on, believes in, and puts his knowledge into creating something superb.
@alexhwang334
@alexhwang334 Жыл бұрын
At least 4 decades of teaching this. But remember, things in AI evolved quite a bit; he did not get to repeat the same stale material year after year. You do need to know the material thoroughly to present the way he did. Most professors can't do that. In fact, I almost think that unless you can present without notes, you should not profess. He was one of the great ones.
@Timvoortaal
@Timvoortaal 7 жыл бұрын
Man, that straight line on the board in the beginning, what a pro
@shashankaich7632
@shashankaich7632 4 жыл бұрын
One of my greatest and most admired professors. An Inspiration for the whole generation.
@swimmingsun87
@swimmingsun87 9 жыл бұрын
handwriting is amazing
@vurtnesaerdna
@vurtnesaerdna 2 жыл бұрын
Thank you Mr Winston. Rest in peace, your spirit will always be with us!
@alaaeltayeb5794
@alaaeltayeb5794 4 жыл бұрын
thank you, may you rest in peace
@iPyson
@iPyson 6 жыл бұрын
The guy sleeping in the 5th row at 23:13 though... forever on the internet sleeping in class
@apanapane
@apanapane 7 жыл бұрын
Thank you for this lecture.
@alifawzi4566
@alifawzi4566 7 жыл бұрын
I would like to thank you for your fantastic contribution to all of science, and especially to the field of computing.
@naheliegend5222
@naheliegend5222 5 жыл бұрын
3:45: that freehand line is outstanding! :D
@GigaFro
@GigaFro 7 жыл бұрын
Phenomenal lecture. Easy to understand and, as said before, great handwriting. Thanks for sharing, it is much appreciated :)
@iliTheFallen
@iliTheFallen 7 жыл бұрын
Perfect teaching! Great job, Sir.
@suketudave7508
@suketudave7508 3 жыл бұрын
27:32 I love the way he always writes the 'e' in e-to-the-power-of...
@aop2182
@aop2182 5 жыл бұрын
I really enjoyed this video and watched it twice. He was talking about adaptive boosting (AdaBoost); if someone is interested in the minimum error bound, you can find a proof that the bound is exponential. I wish he had talked about gradient boosting and XGBoost as well! Thanks, MIT OpenCourseWare!
@fahedalenezy9355
@fahedalenezy9355 5 жыл бұрын
Hi, XGBoost hadn't been developed yet back in 2010; it was introduced in 2014.
@forthrightgambitia1032
@forthrightgambitia1032 2 жыл бұрын
Check Kilian Weinberger's lecture for gradient boosted trees.
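For readers curious about the gradient boosting mentioned in this thread, here is a minimal sketch using scikit-learn's GradientBoostingClassifier. It assumes scikit-learn is installed; the dataset and parameters are illustrative, not from the lecture, which covers AdaBoost only.

```python
# A minimal gradient-boosting sketch (illustrative only; the lecture covers
# AdaBoost, not this). Assumes scikit-learn is installed; dataset and
# parameters are made up for demonstration.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each round fits a small regression tree to the gradient of the loss,
# rather than reweighting examples the way AdaBoost does.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=1)
gbm.fit(X_tr, y_tr)
print("held-out accuracy:", gbm.score(X_te, y_te))
```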
@anuragsodhi
@anuragsodhi 7 жыл бұрын
Thanks for amazing lecture!
@zukofire6424
@zukofire6424 9 ай бұрын
I love Prof. Patrick Winston, I love MIT, TSM, I hate my school
@TheAIChannel
@TheAIChannel 7 жыл бұрын
Way to go, Doctor, the explanation is very clear and unique. I was just wondering if anyone has an idea of what application was being used to demonstrate the algorithm.
@Tzvetkov
@Tzvetkov 6 жыл бұрын
It's his own. He made it for the demonstrations, as far as I know from other comments.
@geevarghesegeorge1424
@geevarghesegeorge1424 4 жыл бұрын
This might seem a bit intimidating at first, but give it another go and you will be able to digest this!
@bohrbrar
@bohrbrar 8 жыл бұрын
Great lecture...
@zkhandwala
@zkhandwala 4 жыл бұрын
Great lecture. I would LOVE to see an updated version of it (without having to go to Cambridge...), as much has changed over the past 10 years. For one thing, I imagine the focus would now be on gradient boosting... Anyway, I'm curious to hear people's thoughts on the implied quiz question around 8m15s. I thought about it for a few minutes, and my feeling is that as long as all of the individual models have the same classification accuracy (i.e., the sizes of the small circles are the same), ensembling can never hurt. Yes/no?
@ramnewton8936
@ramnewton8936 4 жыл бұрын
Yes, I agree with you. Say the area of a single circle is A, and assume all the error circles have the same radius. The error of a vanilla model would be A. Now, for a 3-model ensemble to perform worse than the vanilla model, the error should be greater than A, i.e., the area of the union of regions covered by at least two intersecting circles should be greater than the area of a single circle. Intuitively, I feel we can never arrange the circles in such a way that this condition is met.
@anasbekheit5479
@anasbekheit5479 2 жыл бұрын
I know it's a bit late, but the idea of boosting is, statistically speaking, that if your base models have a better than 50% chance of being right, they'll tend to boost each other's performance; on the other hand, if they have a lower than 50% chance of being correct, they'll boost each other into misclassifying the dataset.
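A quick simulation of that intuition, as a sketch only: it assumes NumPy and treats the base votes as independent, which real boosted learners are not.

```python
# Illustrative sketch (assumes NumPy): majority vote over base classifiers
# that are each right with probability p, treated as independent here.
import numpy as np

rng = np.random.default_rng(0)

def majority_vote_accuracy(p, n_classifiers=11, n_trials=100_000):
    votes_correct = rng.random((n_trials, n_classifiers)) < p  # each vote right w.p. p
    return (votes_correct.sum(axis=1) > n_classifiers / 2).mean()

for p in (0.45, 0.55, 0.65):
    print(p, majority_vote_accuracy(p))
# p > 0.5: the vote is far better than any single classifier;
# p < 0.5: the vote is far worse, as the comment above says.
```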
@weiqiangdong1022
@weiqiangdong1022 4 жыл бұрын
Thank you. Rest in peace.
@calop002
@calop002 8 жыл бұрын
Awesome teacher
@irfanshaikh-ub9ks
@irfanshaikh-ub9ks 5 жыл бұрын
Your explanation is awesome. I request that what you explain in theory, you also show in practice (on a small sample); that would make it easier to understand.
@sounakbhowmik2841
@sounakbhowmik2841 8 ай бұрын
He is incredible
@nikhilkumarjha
@nikhilkumarjha 5 жыл бұрын
So well explained :)
@JohnForbes
@JohnForbes 9 жыл бұрын
Amazing!
@adityanakate6516
@adityanakate6516 7 жыл бұрын
just awesome
@chymoney1
@chymoney1 Жыл бұрын
Superb!! God bless MIT
@WahranRai
@WahranRai 4 жыл бұрын
Rest in peace, Professor
@HechTea
@HechTea 5 жыл бұрын
"In conclusion, this is magic." lmao
@aakashblu
@aakashblu 7 жыл бұрын
The last part, about the 'thank God hole', is an excellent explanation.
@sepehrgolestanian2431
@sepehrgolestanian2431 4 жыл бұрын
The lecture was ammmmaziiinggg!!!!
@jerrykam9247
@jerrykam9247 10 жыл бұрын
he draws a very "straight line"... amazing.. lol
@IonidisIX
@IonidisIX 5 жыл бұрын
Property of the thickness of the chalk and speed at which he was drawing. That is friction for you. It overpowered any tendency of his hand not to draw a straight line. :)
@luiservela
@luiservela 5 жыл бұрын
Is he saying that boosting doesn't overfit because it actually super-mega-over-fits so much that the volume of the "intruder" is too small to have any statistical significance? - brilliant.
@zhuyixue4979
@zhuyixue4979 5 жыл бұрын
I found the last bit (50:30), on why boosting doesn't overfit, insightful.
@Niels1234321
@Niels1234321 7 жыл бұрын
The not-overfitting thing is really mind-blowing, because it seems to me that the VC dimension of the demonstrated classifier is infinite. I was about to write a question like this: does the volume of the space in which the classification result depends on an outlier decrease in every case, or are there cases (of low probability) in which it occupies more volume? I guess that the volume decreases if there are good samples around the outlier, and that the volume can stay large if the outlier lies far away from the subspace in which the good samples lie. If that holds, it is still unlikely to get test data points in that volume even if it stays large. If somebody knows about this, please let me know.
@rubiskelter
@rubiskelter 6 жыл бұрын
Right aisle : 2:38 He's exited.
@seul-kiyeom6222
@seul-kiyeom6222 5 жыл бұрын
Respect !!
@nomercysar
@nomercysar 7 жыл бұрын
Oh, I wish I'd learned this in college. Close but not quite. Thanks MIT, I guess.
@amirsawiopa
@amirsawiopa 7 жыл бұрын
An amazing lecture; I've enjoyed every second. Question: would this work well for classification with a very unbalanced data set? The minority class is at about 1 percent.
@aop2182
@aop2182 5 жыл бұрын
why not just try it ?
@katateo328
@katateo328 Жыл бұрын
Yeah, the name "boosting" sounds mysterious, but it is actually extremely easy. Excellent concept.
@jayb6080
@jayb6080 8 жыл бұрын
Excellent in every way. Just one question: I tried to implement this simple version, but what I find strange is that some of my alphas are negative, which happens when the error is greater than or equal to 0.5; but if that happens, we don't have a weak learner, right? So what's the deal with this case? I noticed that in the demo some of the alphas were negative too. How can I deal with this case? I would appreciate answers. Thanks for the great lecture and for making this amazing knowledge available to the world!
@ConstantineKulak
@ConstantineKulak 8 жыл бұрын
+Gabriella Kiss If your binary classification algorithm gives >50% errors, just flip the sign and it becomes a "normal" weak classifier with less than a 50% error rate.
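A small sketch of that point (assumes NumPy): the AdaBoost vote weight alpha = 0.5 * ln((1 - error) / error) is negative exactly when the error exceeds 0.5, which is equivalent to flipping the classifier's sign and using a positive weight.

```python
# Sketch of the point above (assumes NumPy): alpha is negative exactly when
# error > 0.5, which is the same as using the flipped classifier with +alpha.
import numpy as np

def alpha(error):
    return 0.5 * np.log((1 - error) / error)

print(alpha(0.3))  # positive: the classifier is better than chance
print(alpha(0.7))  # negative, and equal to -alpha(0.3): "use the flipped classifier"
```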
@WepixGames
@WepixGames 4 жыл бұрын
R.I.P Patrick Winston
@katateo328
@katateo328 Жыл бұрын
It looks like the volume around correctly classified points could be computed, and that volume takes up a vast amount of the total volume; hence the algorithm does not overfit. But how do you compute the volume around misclassified points when all points are classified correctly?
@LinVincent
@LinVincent 9 жыл бұрын
This teacher commands respect without ever raising his voice; what a presence he has.
@myreneario7216
@myreneario7216 6 жыл бұрын
At 16:25 doesn't the orange line at the bottom symbolize the exact same thing as the orange line at the very left? Both say "Everything is +" or "Everything is -". And then we don't have 12 classifiers but only 10.
@user-ol2gx6of4g
@user-ol2gx6of4g 6 жыл бұрын
No, those are for different dimensions.
@rakeshovr1
@rakeshovr1 6 жыл бұрын
Could someone shed more light on this? I didn't quite catch it.
@aop2182
@aop2182 5 жыл бұрын
They are different dimensions, which means they are different tests. For example, x > -1 and y > -3: with either test you can say the samples are all + or all -, but they are still different tests.
@mixking5609
@mixking5609 5 жыл бұрын
Every line denotes two tests, and it still holds for the leftmost line. Therefore, 6 lines => 12 tests.
@ShubhamYadav-ut9ho
@ShubhamYadav-ut9ho 2 жыл бұрын
Can anyone please explain how the error rate is bounded by an exponential function? I'm kind of getting the idea, but there's still a small sense of doubt.
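For reference, the standard bound from Freund and Schapire's analysis, stated here as a sketch rather than as worked in the lecture: writing the t-th round's weighted error as epsilon_t = 1/2 - gamma_t, the training error of the final vote satisfies

```latex
% Standard AdaBoost training-error bound (Freund & Schapire), with
% \epsilon_t = \tfrac12 - \gamma_t the weighted error of round t:
\[
\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\{H(x_i)\neq y_i\}
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big).
\]
```

So as long as every weak classifier beats chance by some margin gamma_t, the training error falls exponentially in the number of rounds.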
@jonnyradars
@jonnyradars 4 жыл бұрын
FYI: the lecture is about AdaBoost only
@sainathkumar7126
@sainathkumar7126 9 жыл бұрын
I would like to know what software was used in this lecture. Very interesting. Also, can we have some practical examples of where boosting would be used? How does boosting fare in comparison to other classifiers?
@mitocw
@mitocw 9 жыл бұрын
Sainath Kumar Some of the demonstrations use the Java Runtime Environment. See the course on MIT OpenCourseWare at ocw.mit.edu/6-034F10 and see Demonstrations section for details.
@XArticSpartanX
@XArticSpartanX 4 жыл бұрын
@@mitocw typically when someone asks for software they are asking for the name of the program, not the language the software was written in
@gumikebbap
@gumikebbap 7 жыл бұрын
so how does the program choose the number of classifiers to use?
@sanjayharesh
@sanjayharesh 7 жыл бұрын
keep on adding a hypothesis unless the training error is 0.
@user-ol2gx6of4g
@user-ol2gx6of4g 6 жыл бұрын
Sanjay "unless" -> until
@BilalBarkati
@BilalBarkati 6 жыл бұрын
We already know the upper bound to the error rate epsilon so we can know beforehand how many iterations are needed.
@EranM
@EranM 5 жыл бұрын
Good question. It doesn't. It can train until it gets 100% on the training set. A good way to stop training and choose the number of classifiers is to evaluate a test set alongside training; when the test-set error stops decreasing, you know it's the best number of classifiers for that test set.
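A hedged sketch of that recipe using scikit-learn's AdaBoostClassifier and its staged_predict method; it assumes scikit-learn is installed, and the dataset and parameters are illustrative.

```python
# Monitor held-out error as rounds are added and keep the round count that
# minimizes it (assumes scikit-learn; dataset and parameters are illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

model = AdaBoostClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# staged_predict yields the ensemble's predictions after 1, 2, ..., n_estimators rounds.
val_errors = [np.mean(pred != y_val) for pred in model.staged_predict(X_val)]
best_rounds = int(np.argmin(val_errors)) + 1
print("best number of weak classifiers:", best_rounds)
```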
@henryzhang1809
@henryzhang1809 5 жыл бұрын
So why can several weak learners combine into a strong learner? Can we prove it probabilistically?
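One standard probabilistic argument, sketched for the idealized case of independent weak learners (which boosting does not literally provide): if each of T classifiers is correct with probability 1/2 + gamma, the majority vote errs only when at most half of them are correct, and Hoeffding's inequality gives

```latex
% Idealized argument with independent weak learners, each correct with
% probability 1/2 + \gamma; X_t indicates that learner t is correct.
\[
\Pr\big[\text{majority vote is wrong}\big]
\;=\; \Pr\Big[\tfrac{1}{T}\sum_{t=1}^{T} X_t \le \tfrac12\Big]
\;\le\; e^{-2\gamma^{2}T}.
\]
```

So the ensemble error shrinks exponentially in T. AdaBoost's actual analysis replaces the independence assumption with the reweighting argument from the lecture.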
@jbm5195
@jbm5195 3 жыл бұрын
Where can I get an explanation like this on bagging?
@raphaelseitz805
@raphaelseitz805 6 жыл бұрын
Why is a coin flip a weak classifier if p1>p2 with p1+p2=1? 0.5×p1+ 0.5×p2 still is 0.5.
@kumarrajendran1655
@kumarrajendran1655 6 жыл бұрын
A weak classifier is defined as something that has an error rate marginally lower than 50%. If you have a biased coin, it lands heads more or less than 50% of the time, so you just predict +1 every time it lands heads. When you do this, your error rate, let's say e, will be greater than or less than 0.5. If it's less than 0.5, you have a weak classifier. If it's greater than 0.5, predict -1 instead of +1 every time you get heads; again, you have a weak classifier.
@user-ol2gx6of4g
@user-ol2gx6of4g 6 жыл бұрын
because 50-50 is at maximum entropy and doesn't give you any useful information.
@doyltruddy902
@doyltruddy902 6 жыл бұрын
Could be a biased coin. You are assuming p1 = p2, but there could be a coin that is heavier on one side and so has a higher probability of landing on one side. That's all he meant.
@lucavecchi7638
@lucavecchi7638 5 жыл бұрын
This is not a demonstration that boosting doesn't overfit, or am I wrong?
@katateo328
@katateo328 Жыл бұрын
How is the volume around a misclassified point defined?
@solarstryker
@solarstryker 6 жыл бұрын
I didn't get the part where the new weights are scaled to sum to 1/2. What good does it do?
@kumarrajendran1655
@kumarrajendran1655 6 жыл бұрын
The new computation of the weights doesn't involve computing any complex mathematical functions, like logarithms; you just divide by 2(1 - e) or 2e. The other interpretation of the 1/2 (you have an equal number of positive and negative examples): this is the hardest setting for a binary classifier to get right. If, for example, your training data is skewed (not 50/50 positive and negative), you can get a lower than 0.5 error rate just by predicting +1 or -1 all the time. This, I think, is pretty significant; otherwise, instead of decision trees you could just randomly pick a dummy classifier that predicts +1 x percent of the time, where x is sampled from [0,100].
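A tiny numeric sketch of that rescaling, with made-up weights (assumes NumPy): after the update, the correctly classified examples carry total weight 1/2 and the misclassified ones carry total weight 1/2.

```python
# Made-up weights; the rescaling divides correct weights by 2(1 - e) and
# wrong weights by 2e, so each group sums to 1/2 afterwards.
import numpy as np

w = np.array([0.10, 0.20, 0.30, 0.25, 0.15])          # current weights, sum to 1
correct = np.array([True, True, False, True, False])  # this round's results

error = w[~correct].sum()                              # weighted error of this round
w_new = np.where(correct,
                 w / (2 * (1 - error)),                # correct mass -> 1/2
                 w / (2 * error))                      # wrong mass   -> 1/2

print(w_new[correct].sum(), w_new[~correct].sum())     # 0.5 0.5
```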
@Soumonomics
@Soumonomics 7 жыл бұрын
wow
@Proman155
@Proman155 5 жыл бұрын
At 4:00 he said that if the error rate is toward 1, we are dead... actually not true. It would mean that every classification is wrong, and simply inverting that terrible classifier would make it an awesome classifier. But amazing video, learnt so much, filled with aha moments! :D
@kaverisharma5368
@kaverisharma5368 9 жыл бұрын
Which Software is that?
@EranM
@EranM 5 жыл бұрын
How to calculate the error is missing.
@TylerHNothing
@TylerHNothing 5 жыл бұрын
it's at 18:02 after he introduced the classifier
@rouhollahabolhasani1853
@rouhollahabolhasani1853 4 жыл бұрын
Holy shit!
@tedz2usa
@tedz2usa 5 жыл бұрын
Lol 2 guys asleep at 23:16 suddenly woke up when he yelled "add weights"!
@kevincui1631
@kevincui1631 5 жыл бұрын
lol. He is a great teacher, but I have to admit his voice made me wanna sleep as well. Had to turn on 1.25 speed.
@rockstarchampion5831
@rockstarchampion5831 4 жыл бұрын
Where did that formula at 27:30 come from?
@jonnyradars
@jonnyradars 4 жыл бұрын
that's explained clearly here kzbin.info/www/bejne/gqSuXqt9Zsh_j6M
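For reference, the usual one-line derivation of the formula at 27:30, sketched here rather than taken from the linked video: alpha_t is chosen to minimize the round's normalizer Z_t.

```latex
% Choose \alpha_t to minimize
% Z_t = (1-\epsilon_t)\,e^{-\alpha_t} + \epsilon_t\,e^{\alpha_t}:
\[
\frac{dZ_t}{d\alpha_t}
= -(1-\epsilon_t)\,e^{-\alpha_t} + \epsilon_t\,e^{\alpha_t} = 0
\;\Longrightarrow\;
e^{2\alpha_t} = \frac{1-\epsilon_t}{\epsilon_t}
\;\Longrightarrow\;
\alpha_t = \tfrac12\,\ln\frac{1-\epsilon_t}{\epsilon_t}.
\]
```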
@nitinkhola4491
@nitinkhola4491 4 жыл бұрын
i do not understand why the students don't find the jokes funny :D. great prof!
@Noelson
@Noelson 8 жыл бұрын
switch speed to 1.25 :D
@vladimir0681
@vladimir0681 7 жыл бұрын
same. great lecture though ;)
@gumikebbap
@gumikebbap 6 жыл бұрын
you're a genius!
@user-ol2gx6of4g
@user-ol2gx6of4g 6 жыл бұрын
1.5x for me
@Lod531
@Lod531 5 жыл бұрын
@@user-ol2gx6of4g I go at least x25
@olesianitsovych4632
@olesianitsovych4632 8 ай бұрын
rip
@naheliegend5222
@naheliegend5222 4 жыл бұрын
8:45 what is the answer to his question?
@eslammessi100
@eslammessi100 4 жыл бұрын
I think the 3 circles would be inside each other.
@IamMoreno
@IamMoreno 5 жыл бұрын
Where can I find a playlist with all the videos on artificial intelligence?
@mitocw
@mitocw 5 жыл бұрын
Here is the link to the playlist: kzbin.info/aero/PLUl4u3cNGP63gFHB6xb-kVBiQHYe_4hSi. Good luck with your studies!
@benjaminkaarst
@benjaminkaarst 7 жыл бұрын
What do you mean by "data exaggeration"?
@syedehtesham6684
@syedehtesham6684 7 жыл бұрын
The "exaggeration" refers to the increased weights of the erroneously classified instances. Let me explain using the same example the prof used. Suppose you train on 100 instances and make a model (h1); now when you run the trained model against the 100 instances, you get 70 correctly and 30 wrongly classified instances. You increase the weights of the 30 wrongly classified instances and train the next model (h2). You can continue this until you reach a desired threshold.
@slkslk7841
@slkslk7841 4 жыл бұрын
@@syedehtesham6684 thanks
@BilalBarkati
@BilalBarkati 6 жыл бұрын
While explaining the advantages of thank-God hole number 1 around 46:00, the professor mentioned that we don't need to compute logarithms and also that we don't need to compute alphas. I don't understand why the alphas are not required, since we will need them to get the final answer: H(x) is a weighted sum of the h(x), and the weights are the alphas, so I think we need to compute them anyhow. Can someone please tell me what I am missing?
@kingmanzhang
@kingmanzhang 5 жыл бұрын
I have the same question. Did you figure it out?
@gauravsrivastava9428
@gauravsrivastava9428 5 жыл бұрын
The sum of the new weights coming from the old weights that were correctly classified will be 1/2. This means we can sum up all the old "correct" weights and scale them by some constant so that the resulting sum is 1/2; each new weight coming from these old weights is then the old weight times that constant. A similar technique can be used to get the new weights coming from the incorrectly classified old weights. I feel this is what he meant.
@aop2182
@aop2182 5 жыл бұрын
Because the new weights add up to 1/2, you just need to do some manipulation to make the sum equal 1/2, based on the previous correct/wrong predictions. I just wonder how to find those scales.
@qzorn4440
@qzorn4440 7 жыл бұрын
So, to solve a data set, is there a program to first determine which learning method is the correct choice to produce the correct results: KNN, SVM, boosting, etc.?
@dondan2504
@dondan2504 7 жыл бұрын
try weka
@Niels1234321
@Niels1234321 7 жыл бұрын
If you don't know what is best, use everything and combine the results. If you have arbitrary classifiers (some SVMs with different weightings, some NNs, some decision trees, all mixed), you can map your data x to a vector containing the classifier results and then train a simple linear classifier on top of it.
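That idea is usually called stacking; here is a minimal sketch using scikit-learn's StackingClassifier, assuming scikit-learn is installed. The base models and dataset are illustrative.

```python
# Each input is mapped to the base classifiers' outputs, and a simple linear
# model is trained on top (assumes scikit-learn; models/dataset illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("boost", AdaBoostClassifier()),
    ],
    final_estimator=LogisticRegression(),  # the simple linear classifier on top
)
print(stack.fit(X, y).score(X, y))
```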
@JD-ov5gt
@JD-ov5gt 2 жыл бұрын
Many stumps aka many winstons
@jasons8963
@jasons8963 6 жыл бұрын
Handwriting model
@zingg7203
@zingg7203 5 жыл бұрын
Neural nets naive? Time does not think so.
@kellybrower301
@kellybrower301 3 жыл бұрын
"That’s the thank God hole”
@premgarg5534
@premgarg5534 3 жыл бұрын
And my teacher is uploading on youtube in unlisted mode lol 😒😒
@AmeerulIslam
@AmeerulIslam 3 жыл бұрын
Boy I almost didn't understand anything!
@seanrimada8571
@seanrimada8571 7 жыл бұрын
Why is there a sheep on the first row?
@Soulless0815
@Soulless0815 6 жыл бұрын
Guess he is albino, as he has pretty bad eyesight; you can see him writing in the SVM video with his head literally 10 cm from his script...
@KARAB1NAS
@KARAB1NAS 5 жыл бұрын
I died from boredom