
Lecture 02 - Is Learning Feasible?

475,724 views

caltech

12 years ago

Is Learning Feasible? - Can we generalize from a limited sample to the entire space? Relationship between in-sample and out-of-sample. Lecture 2 of 18 of Caltech's Machine Learning Course - CS 156 by Professor Yaser Abu-Mostafa. View course materials in iTunes U Course App - itunes.apple.com/us/course/ma... and on the course website - work.caltech.edu/telecourse.html
Produced in association with Caltech Academic Media Technologies under the Attribution-NonCommercial-NoDerivs Creative Commons License (CC BY-NC-ND). To learn more about this license, creativecommons.org/licenses/b...
This lecture was recorded on April 5, 2012, in Hameetman Auditorium at Caltech, Pasadena, CA, USA.

Comments: 232
@iAmTheSquidThing
@iAmTheSquidThing 4 жыл бұрын
Learning is entirely feasible when this guy is your teacher.
@y-revar
@y-revar 5 ай бұрын
Absolutely (and there are > 1 color pair at our disposal to disambiguate two different concepts)
@michaelnguyen8120
@michaelnguyen8120 5 жыл бұрын
Does anyone else find this guy absolutely hilarious for some reason? Something about the look on his face makes you feel like he's constantly thinking "Yeah, I'm killing this lecture right now". When he whips out that smug half smile I can't help but laugh out loud. You can tell he loves teaching. Great set of lectures.
@-long-
@-long- 4 жыл бұрын
I agree, I saw wisdom in his face. #respect
@supriyamanna715
@supriyamanna715 2 жыл бұрын
actually the updating process is there in his mind, and always expressed in his face
@kora5
@kora5 7 жыл бұрын
Such a marvelous lecture! The logical sequence, the explanations, the jokes, the intuitive PowerPoint animations, ... I wish all my lectures were like this.
@tomvonheill
@tomvonheill 7 жыл бұрын
Agreed, this guy is SO good. Great job sprinkling in jokes to keep everyone's attention.
@kezwikHD
@kezwikHD 5 жыл бұрын
Ohhh yes... so much better than my university. I get more out of this than out of my own lectures, even though I don't speak that much English and am currently only in my first semester.
@plekkchand
@plekkchand 5 жыл бұрын
where was the joke?
@faruksn
@faruksn 10 жыл бұрын
19:00 Now, between you and me, I prefer the original formula better. Without the 2. However, the formula with the 2s has the distinct advantage of being … true. So we have to settle for that. Best quote ever on the Hoeffding's Inequality. :)
@supriyamanna715
@supriyamanna715 2 жыл бұрын
56:41 ooo, that's the hypothesis!! Nothing I need more than that. Thanks Professor!!!
@d13tr
@d13tr 5 жыл бұрын
If this video is confusing to you, consider the following: The example at 9:34 is only there to show that we can know something about the entire set based on a sample. Basically, it says the bigger the sample, the closer nu relates to mu (Hoeffding). At 28:00, forget the above example. We are not trying to make a hypothesis for that example; the new values have nothing to do with it. From this point on, we have a random data set (X) with an unknown function f. We want to know if we can make a hypothesis h to predict results. In other words: is learning feasible? So in the new bin, the proportion of points that are green is a measurement of how correct a hypothesis is. We do not know how many are green, so we take a sample. In this sample, we get the relation between correct and incorrect results of the hypothesis, and this says something about the entire bin (Hoeffding). So if the sample is sufficiently big and has a lot of positive predictions, then yes, learning is feasible. Or not? -> 33:30 okay? okay.
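To make the comment above concrete, here is a minimal Python sketch (the bin proportion mu, sample sizes, and tolerance are made-up illustrative values, not numbers from the lecture) that draws samples from a bin and checks how often |nu - mu| exceeds a tolerance epsilon, next to the Hoeffding bound 2*exp(-2*eps^2*N):

```python
import math
import random

def hoeffding_demo(mu=0.6, N=100, eps=0.05, trials=5000, seed=0):
    """Empirically estimate P[|nu - mu| > eps] and compare it with Hoeffding."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # nu = fraction of "green" marbles in one sample of N drawn from the bin
        nu = sum(rng.random() < mu for _ in range(N)) / N
        if abs(nu - mu) > eps:
            bad += 1
    bound = min(1.0, 2 * math.exp(-2 * eps**2 * N))
    print(f"N={N:<5} empirical P[|nu-mu|>{eps}] = {bad/trials:.3f}   bound = {bound:.3f}")

for n in (10, 100, 1000):
    hoeffding_demo(N=n)
```

The bigger the sample, the smaller both the empirical deviation frequency and the bound, which is exactly the "bigger sample, closer nu to mu" point.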
@AvielLivay
@AvielLivay 3 жыл бұрын
Yeah, it's so misleading. The same marbles can play two different roles. First they are used for measuring the probability of getting green, and the second time they are used for checking whether a hypothesis h is correct (h(x)=f(x)) or wrong (h(x)!=f(x)). He's a good professor, but he's confusing.
@radicalengineer2331
@radicalengineer2331 2 жыл бұрын
Can you please tell me how exactly mu is defined? When picking, do we mean buying both kinds of balls and defining the percentage of each that ends up in the bin after mixing, or something else? And how are we defining picking a red ball, given that balls are picked as a sample? Say we pick 9 balls at a time and define that fraction as nu, but how is that incorporated with mu?
@jorgetimes2
@jorgetimes2 4 ай бұрын
Well, this was for a long time the most baffling point of the entire lecture for me. However, when complemented with the book, it suddenly hit me: although we cannot explicitly compute f(x) when comparing it to g(x), since f is unknown and hence the colors themselves are completely unknown to us, what we CAN do is view randomly picked x's as samples from a probability distribution P, where x is red with probability mu and green with probability 1 - mu. That's it.
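A tiny sketch of that viewpoint, with a hypothetical f and h chosen purely for illustration (in practice f is unknown; it is invented here only so the coloring can be simulated): the color of a randomly drawn x is just the indicator h(x) != f(x), so mu is exactly the out-of-sample error of h.

```python
import random

# Hypothetical target and hypothesis, for illustration only (f is normally unknown).
def f(x):            # "true" target function
    return x > 0.3

def h(x):            # candidate hypothesis
    return x > 0.5

rng = random.Random(1)
xs = [rng.random() for _ in range(100_000)]   # x ~ P, here uniform on [0, 1]

# A marble is "red" exactly when h disagrees with f at that x.
mu = sum(h(x) != f(x) for x in xs) / len(xs)
print(f"estimated mu = P[h(x) != f(x)] ~ {mu:.3f}")   # about 0.2 for these choices
```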
@sagarrathi1
@sagarrathi1 5 жыл бұрын
How much must this person know, if he calls such a big concept just a simple tool. Respect.
@sharmilavelamur8342
@sharmilavelamur8342 8 жыл бұрын
Professor, your lectures are so enjoyable that I look forward to "learning" :). Thank you!
@NaveenKumar-nd5ts
@NaveenKumar-nd5ts 8 жыл бұрын
Brilliant lecture. A different approach than Andrew Ng's. Loved it!!
@mukulkumar2316
@mukulkumar2316 3 жыл бұрын
It's way better than Andrew Ng's. I am a math major.
@letswasteayear7908
@letswasteayear7908 Жыл бұрын
The way Andrew teaches is very bland, whereas this guy is a storyteller.
@thekolbaska
@thekolbaska 10 жыл бұрын
The Coursera ML course assumes you're an idiot to start with, teaches you little and then proclaims you an "expert". This course assumes substantial background, teaches you things in depth, and at the end is still humble about how much knowledge it gave you. Andrew Ng's YouTube lectures recorded at Stanford are quite good though.
@s9chroma210
@s9chroma210 4 жыл бұрын
Very well put, I was really frustrated with the coursera videos when I found this series and the experience has been much better.
@supriyamanna715
@supriyamanna715 2 жыл бұрын
@@s9chroma210 how are you nowadays??
@s9chroma210
@s9chroma210 2 жыл бұрын
@@supriyamanna715 I'm doing quite well. Did this and a few other courses which really helped!
@FsimulatorX
@FsimulatorX 2 жыл бұрын
@@s9chroma210 which Coursera course are you talking about that's "dumbing things down"?
@emrefisne9743
@emrefisne9743 8 ай бұрын
He is literally the manifestation of "how to teach".
@FsimulatorX
@FsimulatorX 2 жыл бұрын
This is such a nice complementary course to Andrew Ng’s videos. I would seriously consider paying for quality lectures like these. Eternally grateful to Caltech for providing this one free of charge!
@UserUser-pv2wo
@UserUser-pv2wo 8 жыл бұрын
Thanks to the professor and Caltech for sending me back to my youth! I recall myself excited by the outstanding lecturers I met then... His speech and passion perfectly colorize the topic and, I believe, are of great help to foreign students' understanding.
@zenicv
@zenicv 10 ай бұрын
Abu-Mostafa is a real genius at explaining complex things in simple terms... The real game changer is Hoeffding's inequality, because it allows us to model the learning problem in a way that bounds the uncertainty independently of the unknown parameter (mu). The only thing that remains is a tradeoff between error tolerance (epsilon) and sample size (N), captured by the relation P[|nu - mu| > epsilon] <= 2*exp(-2*epsilon^2*N).
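That tradeoff is easy to tabulate. A quick sketch (the epsilon and N values are arbitrary choices, not from the lecture) of the single-hypothesis bound 2*exp(-2*eps^2*N):

```python
import math

def hoeffding_bound(eps, N):
    """Right-hand side of Hoeffding: P[|nu - mu| > eps] <= 2*exp(-2*eps^2*N)."""
    return 2 * math.exp(-2 * eps**2 * N)

for eps in (0.1, 0.05, 0.01):
    for N in (100, 1_000, 10_000):
        print(f"eps={eps:<5} N={N:<6} bound={hoeffding_bound(eps, N):.4f}")
```

Tightening epsilon by a factor of 10 requires roughly 100 times more samples to keep the same bound, since N enters the exponent through eps^2*N.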
@sreeragm8366
@sreeragm8366 4 жыл бұрын
In a world of the best MOOCs, this playlist stands apart.
@ProdTheRs
@ProdTheRs 4 жыл бұрын
This lecture is perfect! I recommend complementing it with Andrew Ng's lecture 9 on learning theory from his YouTube machine learning course. This prof is VERY good at conveying the intuition behind it, while Ng goes more deeply into the maths. They complement each other perfectly.
@pinocolizziintl
@pinocolizziintl 11 жыл бұрын
It would be great to see the Professor in some courses on Coursera. He is one of the best I've ever heard. Thanks!
@user-wn1vz8yt9j
@user-wn1vz8yt9j 6 жыл бұрын
The Q&A session is great; it clears up a lot of the questions in my mind, especially for me, without having to look at other materials.
@kezwikHD
@kezwikHD 5 жыл бұрын
Hey, I just want to thank Caltech so much for this course. I am currently studying computer science in my first year, with the goal of machine learning and AI. But most of the courses I have to take are outside my interests. I understand that a lot of those courses are just basics for material in higher semesters, but since I am studying out of interest and not for the graduation, I only need the basics for the topics I am interested in in higher semesters, like machine learning. All the rest is not really needed. And that is why this course is really what I want, because I can learn about my interests and don't have to deal with all the other stuff that I specifically won't need. I even understand more in this course than I do in the lectures at my university, even though I don't speak that much English. Your professor is really good at explaining; keep that style of PDFs. Because the PDFs are why I don't get anything at my university... there is a minimum of 10 formulas and 150 words per slide, so basically every slide is just a wall of text the professor reads off of :/ Thank you Caltech
@FsimulatorX
@FsimulatorX 2 жыл бұрын
How are you doing today?
@kezwikHD
@kezwikHD 2 жыл бұрын
@@FsimulatorX Actually I now understand why we learn all the boring basics and all the theory. Still, I agree with a lot of the things I stated above (in very poor English, as I now must confess ^^). I actually did some courses on deep learning, pattern recognition, and artificial intelligence, but to be honest, my professors don't even bother explaining the material; they just give a ton of formulas and that's it. Therefore I learn much more just researching on my own. However, to answer your question, I am doing quite well. I understand much more than I did back then. I am working for an international company as a student and I am about to finish my Bachelor's degree. Thank you for asking :) To anyone feeling the same way about their own university: don't give up, it gets better, and even if the lectures don't, you still learn new things (and even if it is just "how to research", which by the way is the most valuable of all).
@neilbryanclosa462
@neilbryanclosa462 6 жыл бұрын
This is an amazing lecture. Looking forward to watching the next lecture videos.
@edvaned8207
@edvaned8207 4 жыл бұрын
Very grateful to UFRJ for the translation. An excellent initiative for all of us autonomous learners.
@philipralph
@philipralph 7 жыл бұрын
@45:24: I got 5 heads!!! Actually a total of 7 consecutive heads before my first tails. You always think it happens to someone else...
@Rafaelkenjinagao
@Rafaelkenjinagao 2 жыл бұрын
Props for the method of explaining Hoeffding's Inequality. I find that going over each element of the equation separately helped a lot in understanding it. Congratulations!
@msharee9
@msharee9 10 жыл бұрын
I cannot thank you enough for this lecture. You make machine learning math a piece of cake.
@rain531
@rain531 2 жыл бұрын
Day 2 done. Amazing lecture. Thank You Professor Yaser and Caltech for making them open to public.
@jyotsnamasand2414
@jyotsnamasand2414 Жыл бұрын
Superb teacher! His lectures are so clear and intuitive that they make 'learning' delightful.
@MatthewRalston89
@MatthewRalston89 4 жыл бұрын
Dr. Mostafa, thank you for the interesting lectures. I watched these while running on the treadmill for many days until they made sense. I am happy that you let the world watch your great work and amazing lecture style. More students would love math if they had you for a teacher. Thank you!
@FsimulatorX
@FsimulatorX 2 жыл бұрын
Your comment about running on a treadmill for many days while trying to understand this lecture made me laugh 😂
@ProdTheRs
@ProdTheRs 4 жыл бұрын
The coin analogy for multiple bins was SO SO SO SO GOOD.
@PotadoTomado
@PotadoTomado 8 жыл бұрын
Prof. Abu-Mostafa is the man! Super cool guy.
@LessTrustMoreTruth
@LessTrustMoreTruth 11 жыл бұрын
Thumbs up to Professor Abu-Mostafa. Fantastic professor, fantastic sense of humor.
@minhtamnguyen4842
@minhtamnguyen4842 5 жыл бұрын
Love both his voice and jokes. A brilliant professor
@ningli335
@ningli335 7 жыл бұрын
The professor teaches so well. Glad they made the video.
@dissonantiacognitiva7438
@dissonantiacognitiva7438 9 жыл бұрын
He sounds just like King Julian, the king of the lemurs, I keep on hoping he starts singing "I like to move it move it"
@bluekeybo
@bluekeybo 6 жыл бұрын
I've been laughing for 10 minutes straight
@manjuhhh
@manjuhhh 10 жыл бұрын
Thank you Caltech and Prof
@Nbecom
@Nbecom 11 жыл бұрын
The professor here is making a subtle but very important point. He is saying that given a set of samples, some hypothesis or other **must** agree with the data. And the more hypotheses there are, the more likely it gets that one of them will agree with the data (capital M in his lecture). This is a guaranteed fact (that some hypothesis will agree with the sample); we want to make sure that the probability of accidental agreement is small. Recall the professor's coin toss example.
@Satelliteua
@Satelliteua 4 жыл бұрын
What do you mean by "set of samples"? I thought when we're talking about multiple bins, we're talking about the same sample but different hypotheses applied to it.
@Omar-kw5ui
@Omar-kw5ui 4 жыл бұрын
​@@Satelliteua We are not talking about the same sample from each bin.. The way I understand the problem is this way: We have 1 bin containing all the possible data points (marbles). For each h (a possible hypothesis), we extract a random sample from the bin. Now, each h actually changes mu in the bin (since each h changes the colors of the marbles based on its conformity to f) - this is why the prof represents the problems as multiple bins. Now what happens when we pull out random samples from each bin? It might be the case that all the samples pulled out are correctly classified by h. Does that then mean that these data points were correctly classified because h tends to f? Well not necessarily. There are two things at play here. Let's first assume we are dealing with a single h (a single bin). From this bin we pull out a sample of data and check its classification accuracy. If we keep pulling out samples, then we might get a sample where h gets it all correct. So in this case, it was just luck that h got it all correct. Now though, we are dealing with multiple h (bins), and from each bin we are pulling out different samples. So now, like the coin example, we are actually likely to find the case (the h) where we pull out all heads, even though h is not close to f. This is why as M grows, we are likely to find a wrong model. This is what I understood. Sorry if my ideas are all over the place, its a difficult concept to put into words.
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
@@Omar-kw5ui Actually this is quite correct!
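A small simulation in the spirit of this thread (all numbers are illustrative): each "bin" here has the same mu, but once we select the hypothesis whose sample looks best, the deviation |nu - mu| can exceed the single-hypothesis Hoeffding bound, which is why M has to enter the final inequality.

```python
import math
import random

def selection_effect(M, mu=0.5, N=50, eps=0.1, trials=2000, seed=0):
    """Compare P[|nu - mu| > eps] for one fixed hypothesis vs. for the
    hypothesis we *select* (smallest sample error) out of M identical bins."""
    rng = random.Random(seed)
    single_bad = selected_bad = 0
    for _ in range(trials):
        nus = [sum(rng.random() < mu for _ in range(N)) / N for _ in range(M)]
        if abs(nus[0] - mu) > eps:      # a single, pre-specified hypothesis
            single_bad += 1
        if abs(min(nus) - mu) > eps:    # the hypothesis picked for looking best
            selected_bad += 1
    bound = min(1.0, 2 * math.exp(-2 * eps**2 * N))
    print(f"M={M:<4} single h: {single_bad/trials:.3f}   "
          f"selected g: {selected_bad/trials:.3f}   single-h bound: {bound:.3f}")

for M in (1, 10, 100):
    selection_effect(M)
```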
@ShubhamSharma-tn3wm
@ShubhamSharma-tn3wm 3 жыл бұрын
I was waiting for the Indian girl to ask questions here as well. I expect to hear her again in the future videos.
@Shahada2012
@Shahada2012 8 жыл бұрын
The best way to get a grasp of all these lectures is to do all the homework and projects. If you are a "learning machine", then start to read conference papers on machine learning... so you will be ready for research.
@toori2l5l
@toori2l5l 8 жыл бұрын
Where can I find conference papers?
@NhatTanDuong
@NhatTanDuong Жыл бұрын
Professor Yaser Abu-Mostafa is amazing!
@WhyMe432532
@WhyMe432532 11 жыл бұрын
Thanks for the excellent lecture. Really enjoying them. Very well explained.
@avirtser
@avirtser 9 жыл бұрын
Thank you very much - a brilliant work
@nissarali8788
@nissarali8788 6 жыл бұрын
beautiful so far !!
@olugbemieric1757
@olugbemieric1757 10 жыл бұрын
thanks a lot for sharing. This will surely be of help to me in my m.sc
@dank8981
@dank8981 10 жыл бұрын
I like how he says okaaaaaay !
@YousefHamza
@YousefHamza 9 жыл бұрын
There are 90 million Egyptians pronouncing it exactly like that XD
@brighty916
@brighty916 6 жыл бұрын
this is awesome man, tks.
@southshofosho
@southshofosho 8 жыл бұрын
He fields questions like a total G, I love this course
@taritgoswami9793
@taritgoswami9793 7 жыл бұрын
Really Brilliant lecture ..
@pablogarcia-zo1um
@pablogarcia-zo1um 7 жыл бұрын
fantastic lecture !!!! thanks a lot
@xhulioisufi2979
@xhulioisufi2979 2 жыл бұрын
Thank you sir for the recordings. Now I hope I can pass my course in "Statistical Foundation of Machine Learning". :)
@lolilops54
@lolilops54 11 жыл бұрын
Ahhh, thank you. Very well explained.
@helenlundeberg
@helenlundeberg 8 жыл бұрын
I love this guy !
@Shahada2012
@Shahada2012 8 жыл бұрын
Brilliant, ya Yaser.
@spike345185
@spike345185 4 жыл бұрын
Love this guy
@GauravJain108
@GauravJain108 5 жыл бұрын
Just awesome!!!!
@AndyLee-xq8wq
@AndyLee-xq8wq Жыл бұрын
Nice analogy!!
@htf7
@htf7 6 жыл бұрын
Great explanation, sir. I like your accent. Your accent really sounds like a Ratahan accent.
@thangbom4742
@thangbom4742 5 жыл бұрын
Excellent lecture. It looks like the inequality in the final verdict is so loose that the bound on P[|E_in(g) - E_out(g)| > eps], namely 2*M*exp(-2*eps^2*N), can easily exceed 1.
@izleaa
@izleaa 11 жыл бұрын
very nice explanation!
@esteban246
@esteban246 11 жыл бұрын
great teacher
@samchien474
@samchien474 11 жыл бұрын
(cont.) the formula with M on the RHS will be used, which is very conservative. Unfortunately, most hypothesis spaces are not finite, i.e., you have an infinite number of hypotheses in them, so you can't use M to measure the model complexity. In that case, we resort to something called the VC dimension and derive generalization bounds based on that.
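For a feel of how conservative that M on the RHS is, here is a quick sketch (epsilon, N, and the M values are arbitrary) of the final-verdict bound 2*M*exp(-2*eps^2*N):

```python
import math

def union_bound(M, eps, N):
    """Final verdict: P[|E_in(g) - E_out(g)| > eps] <= 2*M*exp(-2*eps^2*N)."""
    return 2 * M * math.exp(-2 * eps**2 * N)

eps, N = 0.05, 1000
for M in (1, 10, 100, 1000, 10**6):
    b = union_bound(M, eps, N)
    print(f"M={M:<8} bound={b:12.4f}  ({'vacuous' if b >= 1 else 'meaningful'})")
```

Once M is large (and it is infinite for most real hypothesis sets), the bound blows past 1 and says nothing, which is exactly why the VC dimension is brought in later to replace M.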
@veronicanwabufo5905
@veronicanwabufo5905 3 жыл бұрын
It's been an excellent lecture so far, though I am not very clear on what script H and each h mean.
@theodoregalanos9272
@theodoregalanos9272 7 жыл бұрын
Hello and thank you for the wonderful lectures! I'm new in this field and I am trying to combine it with Computational Geometry. As such, my problems are unique in the sense that the training sets can (usually) be constructed at the will of the modeller. The data are always (potentially) there; it is a matter of choosing which to produce. I was wondering if there is a theoretical or practical approach to an optimized selection of training samples from the whole? Does that relate to assigning a specific (hopefully optimal in some way) P(x), e.g. a uniform distribution which takes samples uniformly from the whole? Or is random selection still good enough in this case? Thank you in advance. Theodore.
@webbertiger
@webbertiger 11 жыл бұрын
I like the explanation that a complex g is too fitted to the historical data and likely to get a bigger Eout. I'm wondering whether most financial models in 2009 were like that, and not many people could understand them, so few economists realized the crash was coming until it was too late.
@danielgray8053
@danielgray8053 3 жыл бұрын
Lol, I love the lectures, but why the heck use "mu" and "nu"? It is so confusing. Just pick two completely different-sounding things: use the sample mean x-bar and the population mean mu, lol, duh.
@pavel4616
@pavel4616 7 ай бұрын
The professor's book also says that there is a problem because we use the same data points for all hypotheses instead of generating new data for each hypothesis. So it breaks the assumption of independence of data generation. Why wasn't it mentioned in the lecture?
@samchien474
@samchien474 11 жыл бұрын
I think the original Hoeffding's inequality applies when the hypothesis is specified BEFORE you see the data (e.g. a crazy hypothesis like: if it is raining then approve the credit card, otherwise not). However, in reality, we will learn a specific hypothesis using the data (e.g. using least squares to learn the regression coefficients); in that case, the learned hypothesis is g and you can consider it as chosen from a hypothesis space (H). If the hypothesis space is finite of size M, then
@pinocolizziintl
@pinocolizziintl 11 жыл бұрын
Thanks for pointing it out! I don't know how I could miss looking for an ML course on Coursera. The problem will be the overlap with the Cryptography one from Dan Boneh.
@pavel4616
@pavel4616 7 ай бұрын
I do not completely understand the analogy between learning and the 1000 coins. At 43:08 we try different hypotheses (because the bins are different) and try to select the best according to the sample. At 48:52 we have the same hypothesis (because the bins are the same) and try different samples.
@Temperissue
@Temperissue 10 жыл бұрын
thank you!!!
@aviraljanveja5155
@aviraljanveja5155 5 жыл бұрын
Brilliant ! :D
@sarnathk1946
@sarnathk1946 7 жыл бұрын
Awesome lecture. The 10-flip coin experiment was surprising and pretty interesting. A few observations: 1) By choosing a sufficiently big M, I can get an inequality like P[Bad Event] < 2, which is actually a no-brainer. 2) An absolute value of the "bound" is meaningless unless I know "mu", the unknown quantity. That said, practically we can overcome this with some educated guesses and possibly the central limit theorem too. 3) I am surprised there is no talk of the central limit theorem... I was expecting that something would be proved based on it... Possibly Hoeffding has a relation to it.
@granand
@granand 7 жыл бұрын
Thank you Caltech and Professor. But please, can you help me? It's been decades since I touched maths, and I need to catch up. Tell me which links I must read and understand so I can come back here and follow along like you smart guys.
@yairelblinger2891
@yairelblinger2891 9 жыл бұрын
This means that we assume that the data we have was sampled in the same way real data occurs. It does not seem a trivial assumption to me, but it makes a lot of sense to need such an assumption. Anyway, great lecture!
@pavansughosh
@pavansughosh 12 жыл бұрын
You know the output for in-sample cases (the training set). So if the output matches, the hypothesis is green for that sample point. (The target function still remains unknown.)
@RababMuhammadALy
@RababMuhammadALy 10 жыл бұрын
very nice
@rajkumarsaini7553
@rajkumarsaini7553 Жыл бұрын
What does it mean to have a stringent tolerance (around 57:57 in the video)? Basically, what if the inequality gives a bound of 2?
@marcogelsomini7655
@marcogelsomini7655 2 жыл бұрын
56:20 awesome , thx!!
@thanhquocbaonguyen8379
@thanhquocbaonguyen8379 2 жыл бұрын
Thank you for the lecture. It was really insightful, though it's hard for me to capture it all. I like the questions that the students ask. Why do we have multiple bins? They were cute though haha
10 жыл бұрын
This is the right answer (1/1024) for the first question in the "coin analogy"
@ahmedelsayed121
@ahmedelsayed121 6 жыл бұрын
I did not see that mentioned anywhere: Dr. Yaser has a book describing the course content in more detail, called "Learning From Data".
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
The book provides genuine additional insight to the lectures, and vice versa.
@don2186
@don2186 10 жыл бұрын
I am doing the ML course on Coursera. It is a very good start for someone who wants to get started, know what machine learning is all about, and do some exercises to get a feel for it. That said, simple derivations in calculus which you should know from high school are skipped and just the final formula is given, which is a little disappointing. I don't see how anyone can do machine learning without knowing basic calculus. Too much emphasis is placed on being nice.
@rehantahirch
@rehantahirch 11 жыл бұрын
The Prof. is amazing. He also looks like Prince Charles.
@ArunPaji
@ArunPaji 11 жыл бұрын
Something like Feynman's lectures is being attempted.
@lolilops54
@lolilops54 11 жыл бұрын
I'm getting confused towards the end. Once you have g, what use is it to compare it to every h in H? Surely, g is the best h in H, that's how it became g. Also, why does he add M to the inequality at the very end? Doesn't that just increase the value, the bigger H is? So with a large H and a subsequent large M, won't the comparison be totally redundant? I think he made that point at the end, but I don't see why he added it in, in the first place.
@Sonia1978NYC
@Sonia1978NYC 11 жыл бұрын
Probably Approximately Correct :-o I have the book by Leslie Valiant
@onurcanisler
@onurcanisler Жыл бұрын
*I understood the topic probably approximately correct.*
@dtung2008
@dtung2008 6 жыл бұрын
He didn't state Hoeffding's inequality correctly. The value of nu must be bounded within a range of width 1 for the formula to hold (at 20:30).
@jonsnow9246
@jonsnow9246 6 жыл бұрын
What was the conclusion of the lecture? I mean how did we prove that learning is feasible?
@desitravellers2023
@desitravellers2023 6 жыл бұрын
The main objective of learning, as laid out in the lecture, is finding a hypothesis that behaves similarly for the training data (in-sample) and test data (out-of-sample). No matter the performance of the hypothesis on the sample, if we can prove that the hypothesis performs approximately the same in-sample and out-of-sample, then we have essentially proved that learning is feasible, i.e. generalizing beyond the in-sample points is possible. The final modification to Hoeffding's formula states that with a reasonable choice of M, epsilon, and N, the probability of the in-sample performance deviating from the out-of-sample performance can indeed be bounded to an acceptable limit, thus proving learning is feasible. The fact that M is infinite in most of the models we come across, and that learning is still possible, is proved in the theory of generalization lecture. Thanks.
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
"Is learning feasible" means here: can we, based on observations of our in-sample data, make statements about our out-of-sample data? In other words, can we generalize observations on our selected sample to the entire population (in the bin)?
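One way to read "a reasonable choice of M, epsilon and N" from the answer above: fix the tolerance epsilon and a confidence level delta, then solve the final-verdict bound 2*M*exp(-2*eps^2*N) <= delta for N. A sketch (delta and the other numbers are arbitrary choices, not from the lecture):

```python
import math

def samples_needed(M, eps, delta):
    """Smallest N with 2*M*exp(-2*eps^2*N) <= delta,
    i.e. N >= ln(2*M/delta) / (2*eps^2)."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

for M in (1, 100, 10_000):
    print(f"M={M:<6} -> N >= {samples_needed(M, eps=0.05, delta=0.05)}")
```

N grows only logarithmically in M, which is why a finite (even very large) hypothesis set still leaves learning feasible.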
@VIVEKPANDEYIITB
@VIVEKPANDEYIITB 2 жыл бұрын
Since mu depends on the probability distribution, should it not be constant for all bins, i.e. for all h? And it should be nu that changes with h and the bins. Why is mu different for different bins?
@nayanvats3424
@nayanvats3424 3 жыл бұрын
How did we sum up the RHS in Hoeffding's inequality? I mean, each of the hypotheses will have a different bound (epsilon) and hence a different exponential term. So how do they sum up to be substituted by M times the exponential? Also, if we keep the bound the same for each inequality, won't the number of samples N change? How is the exponential consistent across all the hypotheses? Am I missing something?
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
Hmm... a lot of questions here. You start off by defining what you find to be the maximum "acceptable" deviation of your selected hypothesis. This acceptable value is epsilon. The deviation is between the in-sample error and the out-of-sample error. You cannot guarantee this, but you can ensure that the chance of exceeding this deviation is smaller than a certain probability. This is why the whole probability discussion is brought in. Now g is just one of the h's in the hypothesis set. So if the in- and out-of-sample errors of g deviate by more than epsilon, this implies that (at least) one of the h's deviates by more than epsilon, so we can say that it must be the case that h1 deviates by more than epsilon, or h2 deviates by more than epsilon, and so on. The probability of deviation between in-sample and out-of-sample is independent of the number of red and green balls, i.e. it is independent of any particular h; that is why each h has the same bound.
@namanvats9547
@namanvats9547 6 жыл бұрын
How would you know the value of E_out??
@RD-lf3pt
@RD-lf3pt 8 жыл бұрын
Umm... I'm probably missing something ;) What formula is he using to get to a 63% probability? If each coin gets 10 straight heads once every 1,024 times (say we run it infinitely many times, then the proportion should be 1 over 2 to the N, right? So 1 over 2 to the ten, so once every 1,024 times roughly), and because the probability for each coin is independent, doesn't that mean the probability should be almost 100%? (1000/1024) Ah, OK... So even if you had 100 trillion flips, each with a 99% chance of being heads, you still have a 0.01 x 0.01... (100 trillion times) chance of getting all tails. For this example, you have a 1/1024 chance of a coin being heads 10 consecutive times, so you have a (1-(1/1024)) chance of at least one of the 10 flips being tails... That is, if you have 1 in 1024 chances of it being heads 10 straight times, you have 1023 in 1024 chances of it not being that, and if it is not that, there is at least one tails somewhere that breaks the chain. So over 1000 coins, you have (1-(1/1024)) to the 1000, or (1023/1024) to the 1000, about a 37% chance of getting at least one tails in every set of 10. So roughly a 63% chance of getting 10 consecutive heads somewhere. That being said, I still thought that if the chances are 1 in 1024 to get 10/10 heads, then for every 1024 attempts, as the number of attempts goes towards infinity, we should get at least one run of 10 straight heads. So maybe it has to do with distribution? Sometimes you can get 2 or more sets of 10 straight heads in your lot of 1000, while other times you may get none. So the chance of finding (in a lot of 1000 tries) at least 1 set of 10 straight heads is 63%? (Because they can form clusters, and sometimes you will get a group with none.) Or maybe it doesn't have to do with that? I mean, what are probabilities, really? Say you have a 99% chance to get heads and 1% to get tails. You do it twice and the chance of getting at least one heads is really high, of course. But there is a 1/10,000 chance of actually getting tails and tails... So if you go towards infinity, you might think the distribution would be: 99% of the time, no matter where or in what order, you get heads, and 1% of the time you get tails. But for N tries, there is a 0.01 to the N chance of actually getting all tails... So you can do it 100 trillion times, or go towards infinity, and there is still a very, very, very small, but real, chance of getting all tails. The chance is there, and now let's suppose it happens... If that slim chance was the way events unfolded, then that outcome would have happened infinitely many times, and the 99% chance would mean nothing. You might say, well, if we run the experiment again, we will probably get heads 99% of the time, so the 99% vs 1% probability isn't wrong... But actually, this new set of samples can be concatenated with the last, as they go towards infinity, and the premise is that this will happen (eventually) infinitely many times, and ALL the times, as a single heads would break the chain... So now we might say it is unlikely, but think of a person witnessing the event... Wouldn't they say the chance of tails is 100%? So one important thing is that probabilities don't guarantee you will get heads and tails in a proportion of 1023/1024 and 1/1024. They really don't. A probability of 90% doesn't mean something will happen 90% of the time, but that we believe it has 9 chances out of 10 of being that. But once the drawing is made, it can happen only 70% of the time, or 2% of the time, and stay like this forever... At least that's my understanding of it after giving it some thought!
@fadaimammadov9316
@fadaimammadov9316 8 жыл бұрын
You are correct that the probability of one coin getting 10 straight heads is 1/(2^10). Let's call this a. The probability of NOT getting 10 straight heads with any of the 1000 coins is (1-a)^1000, and the probability of getting at least one such result is 1 - (1-a)^1000 = 62.36%.
@RD-lf3pt
@RD-lf3pt 8 жыл бұрын
Yeah, I know ;) I have to admit it puzzled me for a while until I figured it out (see paragraph three), though! Thanks for the reply and clear explanation!
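For anyone who wants to verify the number discussed in this thread, here is the exact calculation next to a small Monte Carlo check (1000 fair coins, 10 flips each, as in the lecture's example):

```python
import random

# Exact: probability that at least one of 1000 fair coins shows 10 straight heads.
a = 1 / 2**10                        # one coin: 10 straight heads with prob 1/1024
exact = 1 - (1 - a) ** 1000
print(f"exact     ~ {exact:.4f}")    # about 0.6236

# Simulation of the same experiment.
rng = random.Random(0)
trials = 2000
hits = sum(
    any(all(rng.random() < 0.5 for _ in range(10)) for _ in range(1000))
    for _ in range(trials)
)
print(f"simulated ~ {hits / trials:.4f}")
```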
@delightfulsunny
@delightfulsunny 10 жыл бұрын
Reminding myself that this is just foundation, and that it is dry and Zzzz... but must... keep... going. An hour later... really the summary is that the more your model caters to a specific sample, the more prone it is to failure when it comes to the unknown. It is like a Fourier series: fitting too well to the data can lead to not actually learning at all.
@alfonshomac
@alfonshomac 10 жыл бұрын
Maybe you'd like Stanford's course by Andrew Ng better. Google it and check it out; I like it.
@MrCmon113
@MrCmon113 5 жыл бұрын
No that was the last lecture. This one wasn't really about that.
@bertrandduguesclin826
@bertrandduguesclin826 3 жыл бұрын
Why not write the RHS of Hoeffding's inequality as min(1, 2exp(-2N*eps^2)), since a probability cannot exceed 1 anyway?
@jonsnow9246
@jonsnow9246 6 жыл бұрын
55:59 Overfitting!!!
@-long-
@-long- 4 жыл бұрын
awesome! thanks
@ajayram198
@ajayram198 6 жыл бұрын
This is quite a difficult lecture; I couldn't understand much of it!
@granand
@granand 7 жыл бұрын
Can someone list all the formulas I must know for the entire course?
@shakesbeer00
@shakesbeer00 6 жыл бұрын
Thanks for the excellent lecture. Here are a couple of questions: At 42:22, just to be more rigorous, would the P notation in this Hoeffding inequality depend on both X and y? At 50:02, how exactly is g defined here in order for this inequality to hold? The inequality seems to require that g minimize |Ein - Eout|? But that is not intuitive. Instead, it is more intuitive to have a g that minimizes Ein (or eventually Eout), based on the definitions of Ein and Eout earlier.
@shakesbeer00
@shakesbeer00 6 жыл бұрын
For my second question, I see it now, because it is a less-than-or-equal sign there, instead of an equals sign. That inequality always holds since g is one of the h's in H. Thus whatever criterion is used for defining g is fine, as long as g is one of the h's.
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
P is a selection probability assigned to X, i.e. it defines the probability of selecting certain x's. It has nothing to do with y (or Y).
@wafamribah4162
@wafamribah4162 6 жыл бұрын
One thing I couldn't figure out, though, is how the target function and the hypothesis would agree. How does the comparison occur?
@desitravellers2023
@desitravellers2023 6 жыл бұрын
Whatever happens in the bin is hypothetical. Just assume you have chosen a hypothesis h. This will agree with the target function in some cases and differ in others over the entire set of inputs, which is possibly infinite. The main takeaway is that you can compare it on the sample, which is the training data, for which the value of the target function is available. Thus the essence is: if you see that the hypothesis you have chosen agrees with the values of the target function on the sample, it will probably behave the same for out-of-sample data points, within a threshold (according to Hoeffding's formula). Feel free to ask if you have further queries.
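A minimal sketch of that comparison (the target f, the hypothesis h, and all numbers are hypothetical, made up only to show the mechanics): we can only score h against the known labels of the sample, giving E_in, and Hoeffding is what licenses treating E_in as an estimate of the unseen E_out.

```python
import random

rng = random.Random(2)

# Hypothetical target (unknown in practice) and a candidate hypothesis.
def f(x):
    return 1 if x[0] + x[1] > 1.0 else -1

def h(x):
    return 1 if x[0] > 0.5 else -1

# Training sample: the points and their labels y = f(x) are all we actually see.
xs = [(rng.random(), rng.random()) for _ in range(200)]
sample = [(x, f(x)) for x in xs]
E_in = sum(h(x) != y for x, y in sample) / len(sample)

# Out-of-sample error on fresh points (computable here only because we invented f).
fresh = [(rng.random(), rng.random()) for _ in range(100_000)]
E_out = sum(h(x) != f(x) for x in fresh) / len(fresh)

print(f"E_in ~ {E_in:.3f}   E_out ~ {E_out:.3f}   (Hoeffding says these track each other)")
```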
@adarshsingh6313
@adarshsingh6313 5 жыл бұрын
Sir, 1. Can you please explain the hypothesis and target function in the bin-and-marbles problem through some mathematical expression (as an example)?
@KieranMace
@KieranMace 6 жыл бұрын
Is a new hypothesis h_avg, defined as an average over a subset of hypotheses in H, necessarily also in H? Or does it depend on the functional form of H?
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
No; H can be any set of hypotheses. The set does not have to have any form of arithmetic closure.
@bhanumanagadeep
@bhanumanagadeep 6 жыл бұрын
In slide 23, why does the probability bound for g depend on all the hypotheses when we pick only one out of the multiple hypotheses? Shouldn't it be equal to the probability for the hypothesis chosen?
@roelofvuurboom5431
@roelofvuurboom5431 3 жыл бұрын
g is one of the hypotheses h, so what the dependency statement says is that if something applies to g, it must therefore apply to (at least) one of the h's.