One of the best ML teachers I've ever come across (all online tutorials included)!
@onurcanisler2 жыл бұрын
*Even my professor admires Professor Yaser Abu-Mostafa. The way he describes the theoretical parts is beautiful.*
@VietNguyenJT11 жыл бұрын
That second to last question, the "silly" one, actually elicited a pretty informative response.
@spritica11 жыл бұрын
The example from the lec-5 puzzle: N=3 (number of points), k=2; i.e., on any two points we can't have all patterns. Thus we get only 4 rows (not 2^3 = 8), as shown below:
S2-: -1 -1 -1
S2+: -1 -1 +1
S1 : -1 +1 -1
S1 : +1 -1 -1
The rows have been annotated with S2-, S2+, and S1 tags. (alpha=2, beta=1)
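Not from the lecture, but here is a brute-force sketch (my own) that confirms the puzzle's answer: among all ±1 rows of length N, the largest set in which no k columns show all 2^k patterns has exactly B(3, 2) = 4 rows.

```python
from itertools import product, combinations

def shatters_some_k(rows, k):
    # True if some set of k columns shows all 2^k patterns across the rows
    n = len(rows[0])
    for cols in combinations(range(n), k):
        patterns = {tuple(r[c] for c in cols) for r in rows}
        if len(patterns) == 2 ** k:
            return True
    return False

def max_rows(n, k):
    # Brute-force search for B(n, k): the largest set of +/-1 rows of
    # length n in which no k columns are shattered.
    all_rows = list(product([-1, +1], repeat=n))
    for size in range(len(all_rows), 0, -1):
        for subset in combinations(all_rows, size):
            if not shatters_some_k(list(subset), k):
                return size
    return 0

print(max_rows(3, 2))  # 4, matching the four rows in the puzzle
```

This only scales to tiny N, which is exactly why the lecture's recursive bound on B(N, k) is needed.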
@-long-5 жыл бұрын
Anyone who is confused by the calculation of B(N, k), especially by S1, should remember this: S1 is the group of rows whose pattern on the first N-1 points appears only once in the matrix. The constraint restricts these rows, which is what makes them different from S2; thus S1 and S2 are disjoint. Credit to David's comment below (his example is very useful for understanding this). Professor Yaser mixed the description of S1 with something else, which is why it sounds confusing. Besides that, this is a great lecture, as always.
@-long-5 жыл бұрын
And you might have noticed that this confusion resurfaces in the Q&A, when someone asks why alpha is different from beta.
@s254123 жыл бұрын
totally agreed, I was confused at 10:55 where it seems like he is explaining the S2s.
@willemhekman17886 жыл бұрын
What beautiful theory and beautifully explained!
@ChunSun12 жыл бұрын
S1: a row with a pattern on x_1 to x_{N-1} that we have only one row of, with x_N being either +1 or -1.
S2: a row with a pattern on x_1 to x_{N-1} that we have two rows of, with x_N being +1 AND -1.
@spritica11 жыл бұрын
You are probably looking at all 2^N patterns on x_1, ..., x_N. The fact that we are estimating B(N, k), where k is a break point, means that we only keep those rows of the matrix which do not show all possible patterns on any k points. This means that the matrix will have #rows
@akankshachawla22803 жыл бұрын
THANK YOU, it cleared a BIG doubt of mine.
@holopekochan5 жыл бұрын
After his explanation, it makes my mind much clearer. Good Prof~
@kevind.shabahang3 жыл бұрын
I love the abstractness of this level of analysis
@youweiliang5 жыл бұрын
The grouping of patterns may be a bit confusing. The patterns are actually grouped by whether their prefix appears once or appears twice. See also www.cs.rpi.edu/~magdon/courses/LFD-Slides/SlidesLect06.pdf
@sendhan64542 жыл бұрын
how does this univ have these slides
@moritzburmester3745 Жыл бұрын
@@sendhan6454 It's an online course by the co-author of the book LFD (Learning From Data).
@avimohan65947 жыл бұрын
"Yes you do!" Man, this guy is amazing. Completely deserves his Feynman Prize.
@avimohan65947 жыл бұрын
Maybe he should be given another one.
@danieljacales3264 жыл бұрын
What better way to pass the time in quarantine than to take this course. Awesome content. Thanks.
@saharqadan28057 жыл бұрын
In slide 5/17 @14:51, when the prof says that S1 is different from S2+, aren't the S2+ rows created from S1?
@-long-5 жыл бұрын
No, they are different; S1, S2-, and S2+ are disjoint.
@hariharanramamurthy99464 жыл бұрын
Yes, I agree with what you say, but I could not fathom the logic of why S1 and S2 are different.
@movax20h8 жыл бұрын
At 27:05 the prof says, "Now, we know for a fact that the maximal number is 4." That is not quite true: we only know that the maximal number is less than or equal to 4; it just happens to be equal here. Likewise, getting B(N, k) = 2^N for k > N is luck: it means the actual maximal number of dichotomies is less than or equal to 2^N. Yes, it is exactly 2^N here, but that does not follow from the construction. So this result gives us that the maximum is at most 4. In principle we still need to construct an example, and only if we manage to construct one with 4 rows do we know that 4 is the actual maximal value.
@po-entseng66026 жыл бұрын
If N < k, the number of dichotomies must equal 2^N; otherwise this N would itself be a break point, since a break point is defined as the minimum number of points x such that the number of dichotomies is less than 2^x.
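To make the break-point definition concrete, here is a small sketch (my own, not from the lecture) using the "positive rays" hypothesis set from earlier in the course, h(x) = sign(x - a). It generates m_H(N) = N + 1 dichotomies, so already at N = 2 we get 3 < 2^2 = 4, making k = 2 a break point.

```python
def positive_ray_dichotomies(xs):
    # Dichotomies generated by h(x) = sign(x - a) as the threshold a sweeps:
    # one threshold below all points, one between each adjacent pair, one above all.
    xs = sorted(xs)
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(+1 if x > t else -1 for x in xs) for t in thresholds}

print(len(positive_ray_dichotomies([0.3, 1.2, 2.5])))  # N + 1 = 4 < 2^3 = 8
```

Since m_H(2) = 3 < 4, no 2 points can be shattered, which is exactly the "minimum x" in the definition above.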
@manjuhhh10 жыл бұрын
Thank you Caltech and Prof Yaser Abu in particular.
@brainstormingsharing13094 жыл бұрын
Absolutely well done and definitely keep it up!!! 👍👍👍👍👍
@brod5154 жыл бұрын
24:25. "4 points what are you talking about... we have only 1 point" I died laughing
@coolrobotics5 жыл бұрын
So perfect ......... Can't be better than this..... Thumbs up...
@ajayram1986 жыл бұрын
Could someone please explain the boundary conditions obtained at 25:04?
@montintinmontintin12 жыл бұрын
Thanks. The professor says that each of the three variables can take the values +/-1. Not sure how any row could belong to Alpha and not Beta if that's the case.
@exmachina7673 жыл бұрын
(Commenting for people who may come in the future and have the same question) That’s because when enumerating the rows of B(N, k), it’s conceivable that some of the resulting valid patterns happen to have a unique prefix in the first N-1 columns. You can see this illustrated in the last two rows of the puzzle at the end of lecture 5 (those two rows would be in Alpha, while the first two, which are repeated, would be in Beta)
@AndyLee-xq8wq Жыл бұрын
I didn't fully understand the "What to do about E_out" part. How was the transition made to using 2N samples?
@montintinmontintin12 жыл бұрын
I need an example of a row in Alpha and NOT in Beta. It is not clear what that would look like. Anyone?
@jasmineakkal6 жыл бұрын
Teaching, Examples everything is great !!
@thangbom47426 жыл бұрын
As far as I understand, if the final hypothesis achieves one of the dichotomies during the training phase, then E_in is zero, isn't it?
@MartinLarsson12 жыл бұрын
I don't understand the criteria for why a row should be inside S_1. From my understanding, it sounded like _all_ possible rows were included in S_1. Could somebody explain it with other words?
@pratik67089 жыл бұрын
How are the rows in S1 different from S2+? If you cut out the extension, they are the same.
@DavidBudaghyan9 жыл бұрын
pratik chachad I was confused at first too, but the idea is very simple. For example, if you have this matrix:
1: [+1 +1 -1]
2: [+1 +1 +1]
3: [+1 -1 +1]
4: [-1 +1 +1]
The first two rows will be in S2, as they are identical except for the 3rd element. But the 3rd and 4th will be in S1, as there are no rows identical to them except in the last element. (If such rows existed, they would look like 5: [+1 -1 -1] and 6: [-1 +1 -1] respectively, and all of them would be included in S2.) Since 5 and 6 do not exist in the given matrix, [+1 -1 ?] has only one extension, 3: [+1 -1 +1]. So does [-1 +1 ?], which is 4: [-1 +1 +1]. But [+1 +1 ?] has both extensions (see 1 and 2).
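David's grouping rule can be written in a few lines (a sketch of my own, not from the lecture): count how often each (N-1)-prefix occurs among the (distinct) rows; prefixes occurring with both extensions form S2, the rest form S1.

```python
from collections import Counter

def split_s1_s2(rows):
    # Group rows of a B(N, k)-style matrix by their first N-1 entries.
    # Rows are distinct, so a prefix occurs once (-> S1) or twice (-> S2).
    prefix_counts = Counter(tuple(r[:-1]) for r in rows)
    s1 = [r for r in rows if prefix_counts[tuple(r[:-1])] == 1]
    s2 = [r for r in rows if prefix_counts[tuple(r[:-1])] == 2]
    return s1, s2

rows = [(+1, +1, -1), (+1, +1, +1), (+1, -1, +1), (-1, +1, +1)]
s1, s2 = split_s1_s2(rows)
print(len(s1), len(s2))  # 2 2 -> alpha = |S1| = 2, beta = |S2|/2 = 1
```

With the lecture's notation, B(N, k) = |S1| + |S2| = alpha + 2*beta.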
@a1988ditya9 жыл бұрын
+pratik chachad This is not clearly explained by him. According to him, S1 and S2 are disjoint and together they make up the B(N,k) matrix. So here |S2| = 2 * beta, where beta is the number of distinct patterns on the first N-1 points that appear with both extensions.
7 жыл бұрын
Thanks for the clarification. I wasn't getting it either.
@abdulateek37356 жыл бұрын
I had the same doubt. Thanks for clarifying ☺
@-long-5 жыл бұрын
Thank you for asking this, and thanks David for the great answer. One thing I think needs to be clarified: each row in S1 appears only once because of the constraint. That's why there is no additional [+1 -1 -1] besides the 3rd row in David's example.
@utkarsh-21st4 жыл бұрын
I like his choice of words.
@saifurrehman67562 жыл бұрын
I did not get anything from lectures 5 and 6. Can somebody guide me on how to understand this stuff? Any reading material or other sources?
@0LeonardoViana17 жыл бұрын
Am I right that all these explanations are for explaining combinatorial analysis?
@NitinKumar-xz8cz10 жыл бұрын
Shouldn't the E_out term in the VC inequality be E_in prime? Because he is relating the performance of E_in to E_in prime, not to E_out, which is what enabled him to restrict the analysis to the space of dichotomies only...?
@jaredtramontano52498 жыл бұрын
no. read the proof.
@slabod7 жыл бұрын
can you give the link to the written VC inequality proof mentioned in a lecture?
@storm311806 жыл бұрын
Hi, I'd love to lay hands on the written proof as well.
@a1988ditya9 жыл бұрын
B(N,k) = alpha + 2*beta confused me :) ... According to him, alpha actually corresponds to the sample space minus 2*beta. Thanks a lot Professor... You're awesome!!!!
@jjepsuomi11 жыл бұрын
Could someone please clarify the part about E_out at 52:50? I didn't understand the rest of the lecture even though I watched it many times :( A simple example?
@erenaydayanik9 жыл бұрын
Thank you for such a great video series, but I'm confused at one point. In the 4th slide of lecture 5, under the testing part, it is written that the bad-event probability should be less than or equal to the vanilla, plain Hoeffding inequality (without M). But on the last slide of this lecture it is written that P(E_in(g)-E_out(g))
@AlexEx708 жыл бұрын
The point is that g is any(!) hypothesis picked out of the whole hypothesis set. We don't know which hypothesis it is. But we need a guarantee that when we use this particular learning model (which is characterized by its set of possible hypotheses), we can pick any hypothesis and the probability that something goes wrong is still predictable and bounded.
@rahulkhapre93516 жыл бұрын
Could someone explain what he did when he took the 2 samples out of the bin?
@fierydino94023 жыл бұрын
For those who have hard time understanding B(n, k) like me, check www.cs.princeton.edu/courses/archive/spr08/cos511/scribe_notes/0220.pdf as well.
@thangbom47426 жыл бұрын
The break point idea is excellent. Give me a break point, and I can build a model that can learn as well as you want.
@vkvfoe6 жыл бұрын
he is amazing!
@mohammedzidan120311 жыл бұрын
Excellent lectures, Doctor. I honestly benefited from them a lot.
@3csgold4509 жыл бұрын
Hi, can you give a simple example of the theory of generalization? Thank you so much.
@gauravparmar180911 жыл бұрын
great explanation man!!
@yichen88842 ай бұрын
25:30 reminds me of dynamic programming
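It really is the same table-filling idea. A short sketch of my own: memoize the lecture's recurrence B(N, k) <= B(N-1, k) + B(N-1, k-1) (taken with equality here, which the closed form shows is tight) with the boundary conditions discussed around 25:04, and check it against the binomial sum.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(n, k):
    # Recurrence from the lecture: B(N, k) = B(N-1, k) + B(N-1, k-1)
    if k == 1:
        return 1   # boundary: no single point may show both patterns
    if n == 1:
        return 2   # boundary: one point, k >= 2, can take both values
    return B(n - 1, k) + B(n - 1, k - 1)

# Agrees with the closed form sum_{i=0}^{k-1} C(N, i)
for n in range(1, 10):
    for k in range(1, n + 2):
        assert B(n, k) == sum(comb(n, i) for i in range(k))
print(B(3, 2))  # 4
```

The dynamic-programming flavor comes from each entry of the B(N, k) table depending only on entries in the previous row.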
@rolanvc8 жыл бұрын
This is a great set of videos! Thank you very much, Prof Yaser! A question though: in the equation for the growth function, N is the number of points. For B(N,k), N is the number of columns (dimensions?)... this is confusing. I imagine this will be substituted into Hoeffding's inequality, where N is the amount of data... rows? Did I misunderstand something?
@TitusBigelow8 жыл бұрын
In B(N,k), N is still the number of points: x1, x2, x3, ..., xN are all input vectors. I think you were confusing these with scalar features (dimensions) of a single input. In this case, if we look at the first row, input x1 maps to +1, x2 maps to +1, ..., xN maps to +1. The dimensionality of the actual input x1 is irrelevant here. Hope I explained it well enough.
@developerpp3956 жыл бұрын
what is the reference text book for this class ?
@ZombieLincoln6666 жыл бұрын
Learning From Data
@chenyuzi11 жыл бұрын
brilliant explanation in slide 10
@clarkupdike65182 жыл бұрын
Actually the algebra needed to prove the induction step isn't that bad... you just have to plug in (N-1)! = N!/N and friends in a couple of spots. The combinatorial explanation was harder to follow... at least for me.
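For anyone who wants that algebra spelled out, the induction step reduces to Pascal's rule applied term by term (my own write-up, not from the slides):

```latex
\binom{N-1}{i} + \binom{N-1}{i-1}
  = \frac{(N-1)!}{i!\,(N-1-i)!} + \frac{(N-1)!}{(i-1)!\,(N-i)!}
  = \frac{(N-1)!\,\bigl[(N-i) + i\bigr]}{i!\,(N-i)!}
  = \binom{N}{i}
```

Summing this over the terms of $B(N-1,k) + B(N-1,k-1)$ gives $B(N,k) \le \sum_{i=0}^{k-1}\binom{N-1}{i} + \sum_{i=0}^{k-2}\binom{N-1}{i} = \sum_{i=0}^{k-1}\binom{N}{i}$, which is the polynomial bound from the lecture.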
@wimthiels6388 жыл бұрын
thanks, very intuitive explanation !
@yusufahmed32238 жыл бұрын
Very good lectures
@aliakbarpamz7 жыл бұрын
Explained well; I can understand it clearly.
@malharjajoo73938 жыл бұрын
Great answers
@actuallyactuary27873 жыл бұрын
I still haven't understood the significance of having a polynomial. Could somebody please dumb it down for me? :(
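The point is that a polynomial growth function loses to the exponential decay in the Hoeffding-style bound, so the bound goes to zero as N grows. A numerical sketch (my own, assuming the bound stated at the end of the lecture, P[|E_in - E_out| > eps] <= 4 m_H(2N) exp(-eps^2 N / 8), with m_H bounded by the polynomial sum):

```python
from math import comb, exp

def growth_bound(n, k):
    # Polynomial bound on the growth function once k is a break point:
    # m_H(n) <= sum_{i=0}^{k-1} C(n, i), a polynomial of degree k-1 in n.
    return sum(comb(n, i) for i in range(k))

def vc_bound(n, k, eps):
    # 4 * m_H(2N) * exp(-eps^2 * N / 8): polynomial times decaying exponential
    return 4 * growth_bound(2 * n, k) * exp(-(eps ** 2) * n / 8)

# For small N the bound is vacuous (> 1), but it eventually crushes to ~0:
for n in (100, 1000, 10000, 100000):
    print(n, vc_bound(n, k=4, eps=0.1))
```

If m_H(N) were 2^N instead, the exponential in N would cancel the exp(-eps^2 N / 8) decay and the bound would never become small; that is why a break point (and hence a polynomial) matters.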
@nguyensinhtu82486 жыл бұрын
I actually do not know how to apply this theory in practice.
@soultouch084 жыл бұрын
This is more about the generalization of classifiers and learning models. In practice you'll actually be using (probably already invented and tested) learning models (like neural networks), but this gives a conceptual understanding of why those things can learn in the first place.
@AlexEx708 жыл бұрын
Tried to watch at 0.5x speed to better understand the material, but couldn't cope. It makes me laugh all the time. It's like Boris Yeltsin reading the lecture :)
@jmalbornoz11 жыл бұрын
Professor Yaser is the MacDaddy!
@thanhquocbaonguyen83793 жыл бұрын
I almost burst out laughing at the question. The bottom line of the learnability of a model is that a learning model cannot learn everything...
@brunor86743 жыл бұрын
Ok.
@hariharanramamurthy99464 жыл бұрын
This professor really confuses alpha and beta, which got me stuck on this concept, and I am completely clueless about the logic.
@andychan73502 жыл бұрын
Good Content, but this professor's spoken English is hard to understand clearly.
@janasandeep7 жыл бұрын
1:15:35 👌
@sudn36826 жыл бұрын
He is not clear at explaining the evaluation of B(N,k).
@markh14626 жыл бұрын
Slide 10 is quite a sloppy way to write a proof, though... Otherwise, a great lecture.
@prazman6 жыл бұрын
I still like the way he argued for it. I guess that's what he could fit into a 60-minute class.
@evinism6 жыл бұрын
another hypothesis. more. more. OH NO. WE ARE IN TROUBLE
@marcogelsomini76552 жыл бұрын
ok that was hard
@samchan25357 жыл бұрын
Why is the Prof so funny?
@iSohrab7 жыл бұрын
This kind of lecture needs such a prof; otherwise everybody would sleep :)
@MrCmon1135 жыл бұрын
He isn't. VC theory is just an inherently funny subject.
@wallofstar7 жыл бұрын
Like his lecture... he somehow looks like Mr. Bean
@abdulateek37356 жыл бұрын
His nose only :p