Lecture 06 - Theory of Generalization

194,968 views

caltech


Comments: 96
@mikeshudani7984 9 years ago
One of the best ML teachers I've ever come across (all online tutorials inclusive)!
@onurcanisler 2 years ago
*Even my professor admires Professor Yaser Abu-Mostafa. The way he describes the theoretical parts is beautiful.*
@VietNguyenJT 11 years ago
That second to last question, the "silly" one, actually elicited a pretty informative response.
@spritica 11 years ago
The example from the lecture-5 puzzle: N = 3 (number of points), k = 2, i.e. on any two points we can't have all patterns. Thus we get only 4 rows (not 2^3 = 8), as shown below:
S2- : -1 -1 -1
S2+ : -1 -1 +1
S1  : -1 +1 -1
S1  : +1 -1 -1
The rows have been annotated with S2-, S2+ and S1 tags. (alpha = 2, beta = 1)
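A minimal Python sketch (not from the lecture; the four rows are just the ones listed above) that verifies k = 2 really is a break point for this set, i.e. no 2 of the 3 points receive all 4 possible patterns:

from itertools import combinations

rows = [(-1, -1, -1), (-1, -1, +1), (-1, +1, -1), (+1, -1, -1)]

def shatters_some_k_points(rows, k):
    # True if some choice of k columns (points) shows all 2^k +/- patterns
    n = len(rows[0])
    for cols in combinations(range(n), k):
        patterns = {tuple(r[c] for c in cols) for r in rows}
        if len(patterns) == 2 ** k:
            return True
    return False

print(shatters_some_k_points(rows, 2))  # False: no pair of points is shattered
print(len(rows))                        # 4, which matches B(3, 2)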
@-long- 5 years ago
Anyone who is confused by the calculation of B(N, k), especially by S1, remember this: S1 is the group of rows whose pattern on the first N-1 points appears only once, i.e. with only one of the two extensions to x_N; those are the rows restricted by the constraint, so they are different from S2, and thus S1 and S2 are disjoint. Credit to David's comment below (you will find his example very useful for understanding this). Professor Yaser mixed the description of S1 with something else, which is why it sounds confusing. Besides that, this is a great lecture as always.
@-long- 5 years ago
And you might have noticed that this confusion comes up again in the Q&A, when someone asks why alpha is different from beta.
@s25412 3 years ago
totally agreed, I was confused at 10:55 where it seems like he is explaining the S2s.
@willemhekman1788 6 years ago
What beautiful theory and beautifully explained!
@ChunSun 12 years ago
S1: a row with a pattern from x_1 to x_{N-1} that we have only one row of, with x_N being either +1 or -1.
S2: a row with a pattern from x_1 to x_{N-1} that we have two rows of, with x_N being +1 AND -1.
@spritica 11 years ago
You are probably looking at all 2^N patterns on x_1, ..., x_N. The fact that we are estimating B(N,k) means that k is a break point, so we keep only those rows in the matrix which do not produce all possible patterns on any k points. This means that the matrix will have fewer than 2^N rows.
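For very small N one can even brute-force that maximum directly. A sketch (my own, exponential-time, only sensible for tiny N) that recovers B(3, 2) = 4, the number from the lecture-5 puzzle:

from itertools import combinations, product

def B_brute(N, k):
    # largest number of distinct +/-1 rows of length N such that
    # no k columns exhibit all 2^k possible patterns
    all_rows = list(product((-1, +1), repeat=N))
    for size in range(len(all_rows), 0, -1):
        for subset in combinations(all_rows, size):
            ok = all(
                len({tuple(r[c] for c in cols) for r in subset}) < 2 ** k
                for cols in combinations(range(N), k)
            )
            if ok:
                return size
    return 0

print(B_brute(3, 2))  # 4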
@akankshachawla2280 3 years ago
THANK YOU, it cleared a BIG doubt of mine.
@holopekochan 5 years ago
After his explanation, things are much clearer in my mind. Good prof~
@kevind.shabahang 3 years ago
I love the abstractness of this level of analysis
@youweiliang 5 years ago
The grouping of patterns may be a bit confusing. The patterns are actually grouped by whether their prefix appears once or appears twice. See also www.cs.rpi.edu/~magdon/courses/LFD-Slides/SlidesLect06.pdf
@sendhan6454 2 years ago
How does this university have these slides?
@moritzburmester3745 1 year ago
@@sendhan6454 It's an online course by the co-author of the book LFD (Learning From Data).
@avimohan6594 7 years ago
"Yes you do!" Man, this guy is amazing. Completely deserves his Feynman award.
@avimohan6594 7 years ago
Maybe he should be given another one.
@danieljacales326 4 years ago
What better way to pass the time in quarantine than to take this course. Awesome content. Thanks.
@saharqadan2805 7 years ago
On slide 5/17 @14:51, when the prof says that S1 is different from S2+, aren't the S2+ rows created from S1?
@-long- 5 years ago
No, they are different; S1, S2-, and S2+ are disjoint.
@hariharanramamurthy9946 4 years ago
Yes, I agree with what you say, but I could not fathom the logic of why S1 and S2 are different.
@movax20h 8 years ago
At 27:05 the prof says, "Now, we know for a fact that the maximal number is 4." That is not quite true. We only know that the maximal number is less than or equal to 4; it just happens to be equal here. The fact that for k > N we get B(N,k) = 2^N is also luck: it means the actual maximal number of dichotomies is less than or equal to 2^N. Yes, it is exactly 2^N here, but that does not follow from the construction. So this result only tells us that the maximum is less than or equal to 4. In principle we still need to construct an example, and if we can construct one with 4, then we know 4 is the actual maximal value.
@po-entseng6602 6 years ago
If N < k, the number of dichotomies must equal 2^N; otherwise this N would itself be a break point, since the break point is defined as the minimum number of points x such that the number of dichotomies on any x points is less than 2^x.
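As a concrete illustration of both points (my own example, not from the lecture): for "positive rays" h(x) = +1 if x > a, else -1, the break point is k = 2, the bound gives B(N, 2) = N + 1, and enumerating the dichotomies on N points shows the bound is actually attained, i.e. the construction exists:

def positive_ray_dichotomies(points):
    # dichotomies produced by h(x) = +1 if x > a, else -1, as the threshold a varies
    pts = sorted(points)
    thresholds = [pts[0] - 1.0] + [(a + b) / 2.0 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    return {tuple(+1 if x > t else -1 for x in pts) for t in thresholds}

pts = [0.3, 1.2, 2.5, 4.0, 7.1]
print(len(positive_ray_dichotomies(pts)), len(pts) + 1)  # 6 6 -> m_H(N) = N + 1 = B(N, 2)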
@manjuhhh 10 years ago
Thank you Caltech and Prof Yaser Abu in particular.
@brainstormingsharing1309 4 years ago
Absolutely well done and definitely keep it up!!! 👍👍👍👍👍
@brod515 4 years ago
24:25. "4 points what are you talking about... we have only 1 point" I died laughing
@coolrobotics 5 years ago
So perfect ......... Can't be better than this..... Thumbs up...
@ajayram198 6 years ago
Could someone please explain the boundary conditions obtained at 25:04?
@montintinmontintin 12 years ago
Thanks. The professor says that each of the three variables can take the values +/-1. Not sure how any row could belong to Alpha and not Beta if that's the case.
@exmachina767 3 years ago
(Commenting for people who may come in the future and have the same question) That’s because when enumerating the rows of B(N, k), it’s conceivable that some of the resulting valid patterns happen to have a unique prefix in the first N-1 columns. You can see this illustrated in the last two rows of the puzzle at the end of lecture 5 (those two rows would be in Alpha, while the first two, which are repeated, would be in Beta)
@AndyLee-xq8wq 1 year ago
I didn't understand the "What to do about E_out" part well. How is the transition made to using 2N samples?
@montintinmontintin 12 years ago
I need an example of a row that is in Alpha and NOT in Beta. It is not clear what that would look like. Anyone?
@jasmineakkal 6 years ago
Teaching, examples, everything is great!!
@thangbom4742 6 years ago
As far as I understand, if the final hypothesis is a dichotomy during the training phase, then E_in is zero, isn't it?
@MartinLarsson 12 years ago
I don't understand the criterion for why a row should be inside S_1. From my understanding, it sounded like _all_ possible rows were included in S_1. Could somebody explain it in other words?
@pratik6708 9 years ago
How are the rows in S1 different from S2+? If you cut off the extension, they are the same.
@DavidBudaghyan 9 years ago
pratik chachad I was confused at first too, but the idea is very simple. For example, if you have this matrix:
1: [+1 +1 -1]
2: [+1 +1 +1]
3: [+1 -1 +1]
4: [-1 +1 +1]
The first two rows will be in S2, as they are identical except for the 3rd element. But the 3rd and 4th will be in S1, as there are no rows that are identical to them except for the last element. (If such rows existed, they would look like 5: [+1 -1 -1] and 6: [-1 +1 -1] respectively, and all of them would be included in S2.) Since 5 and 6 do not exist in the given matrix, [+1 -1 ?] has only one extension, namely 3: [+1 -1 +1]. So does [-1 +1 ?], which is 4: [-1 +1 +1]. But [+1 +1 ?] has both extensions (see 1 and 2).
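The same grouping can be written in a few lines (a sketch using David's 4-row matrix; alpha and beta are the counts from the lecture):

from collections import defaultdict

rows = [(+1, +1, -1), (+1, +1, +1), (+1, -1, +1), (-1, +1, +1)]

by_prefix = defaultdict(list)
for r in rows:
    by_prefix[r[:-1]].append(r)          # group rows by their first N-1 entries

S1 = [g[0] for g in by_prefix.values() if len(g) == 1]          # prefix has one extension only
S2 = [r for g in by_prefix.values() if len(g) == 2 for r in g]  # prefix has both extensions

alpha, beta = len(S1), len(S2) // 2
print(S1)                              # rows 3 and 4 of the matrix above
print(S2)                              # rows 1 and 2 of the matrix above
print(alpha, beta, alpha + 2 * beta)   # 2 1 4 -> the matrix has alpha + 2*beta rows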
@a1988ditya 9 years ago
+pratik chachad This is not clearly explained by him. According to him, S1 and S2 are disjoint and together they make up B(N,k). So here |S2| = 2*beta, where beta is the number of distinct patterns on the first N-1 points that appear in S2.
7 years ago
Thanks for the clarification. I wasn't getting it either.
@abdulateek3735 6 years ago
I had the same doubt. Thanks for clarifying ☺
@-long- 5 years ago
Thank you for asking this, and thanks David for the great answer. One thing I think needs to be clarified: each prefix in S1 appears only once because of the constraint. That's why there is no additional [+1 -1 -1] to pair with the 3rd row in David's example.
@utkarsh-21st 4 years ago
I like his choice of words.
@saifurrehman6756 2 years ago
I did not get anything from lectures 5 and 6. Can somebody guide me on how to understand this stuff? Any reading material or any other source?
@0LeonardoViana1 7 years ago
Am I right that all these explanations are there to explain combinatorial analysis?
@matthewmarston5149 1 year ago
Caltech, do I get better grades when the Instructors are speaking Russian, Chinese, and French Kaiser Tsar Matthew Floyd Marston Romanov Windsor 2 Rothschild Rockefeller Cartier 2
@NitinKumar-xz8cz 10 years ago
Shouldn't the term E_out in the VC inequality be E_in prime? Because he is relating the performance of E_in to E_in prime, not to E_out, which is what enabled him to restrict the analysis to the space of dichotomies only...?
@jaredtramontano5249 8 years ago
no. read the proof.
@slabod 7 years ago
Can you give a link to the written VC inequality proof mentioned in the lecture?
@storm31180 6 years ago
Hi, I'd love to lay hands on the written proof as well.
@a1988ditya 9 years ago
B(N,k) = alpha + 2*beta confused me :) ... According to him, alpha actually corresponds to the total number of rows minus 2*beta. Thanks a lot, Professor... You're awesome!!!!
@jjepsuomi 11 years ago
Could someone please clarify the part about E_out at 52:50? I didn't understand the rest of the lecture even though I watched it many times :( A simple example?
@erenaydayanik 9 years ago
Thank you for such a great video series, but I'm confused on one point. On the 4th slide of lecture 5, under the testing part, it is written that the bad-event probability should be less than or equal to the vanilla, plain Hoeffding inequality (without M). But on the last slide of this lecture it is written that P[|E_in(g) - E_out(g)| > epsilon] <= 4 m_H(2N) e^(-epsilon^2 N / 8), which is a much looser bound. Why the difference?
@AlexEx70 8 years ago
The point is that g is any(!) hypothesis picked out of the whole hypothesis set. We don't know in advance which hypothesis it will be. But we need a guarantee that when we use this particular learning model (which is characterized by its set of possible hypotheses), we can pick any hypothesis and the possibility that something goes wrong is still predictable and limited.
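To put some (made-up) numbers on why a polynomial growth function rescues the bound, here is a sketch evaluating the log of the final-slide bound 4 * m_H(2N) * e^(-eps^2 N / 8) for an assumed cubic growth function (as for a hypothesis set with a small break point) versus 2^(2N) (no break point):

import math

def log10_bound(log10_mH_2N, N, eps):
    # log10 of 4 * m_H(2N) * exp(-eps^2 * N / 8)
    return math.log10(4) + log10_mH_2N - (eps ** 2) * N / 8.0 * math.log10(math.e)

eps = 0.1
for N in (100, 1_000, 10_000, 100_000):
    poly = math.log10((2 * N) ** 3 + 1)   # polynomial growth function (break point exists)
    expo = 2 * N * math.log10(2)          # m_H(2N) = 2^(2N): no break point
    print(N, round(log10_bound(poly, N, eps), 1), round(log10_bound(expo, N, eps), 1))
# the polynomial column eventually goes strongly negative (bound -> 0),
# while the 2^(2N) column keeps growing, so that bound says nothing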
@rahulkhapre9351 6 years ago
Could someone explain what he did when he took out 2 samples from a bin?
@fierydino9402 3 years ago
For those who have a hard time understanding B(N, k) like me, check www.cs.princeton.edu/courses/archive/spr08/cos511/scribe_notes/0220.pdf as well.
@thangbom4742 6 years ago
The break point idea is excellent. Give me a break point, and I can build a model that learns as well as you want.
@vkvfoe 6 years ago
he is amazing!
@mohammedzidan1203 11 years ago
Excellent lectures, Doctor; honestly, I benefited from them a lot.
@3csgold450 9 years ago
Hi, can you give a simple example of the theory of generalization for a particular configuration? Thank you so much.
@gauravparmar1809 11 years ago
great explanation man!!
@yichen8884 2 months ago
25:30 reminds me of dynamic programming
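It does have exactly that flavor: the recursion B(N, k) <= B(N-1, k) + B(N-1, k-1), with boundaries B(N, 1) = 1 and B(1, k) = 2 for k > 1, fills a Pascal-like table. A small sketch (treating the recursion as an equality, which happens to reproduce the closed form; the lecture itself only needs the <= direction):

from math import comb

def B_table(N_max, k_max):
    B = [[0] * (k_max + 1) for _ in range(N_max + 1)]
    for N in range(1, N_max + 1):
        B[N][1] = 1                                  # boundary: k = 1
        for k in range(2, k_max + 1):
            B[N][k] = 2 if N == 1 else B[N - 1][k] + B[N - 1][k - 1]
    return B

B = B_table(10, 5)
for N in range(1, 11):
    for k in range(2, 6):
        assert B[N][k] == sum(comb(N, i) for i in range(k))   # matches sum_{i<k} C(N, i)
print(B[3][2], B[10][3])   # 4 56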
@rolanvc 8 years ago
This is a great set of videos! Thank you very much, Prof Yaser! A question though: in the equation for the growth function, N is the number of points. For B(N,k), N is the number of columns (dimensions?)... this is confusing. I imagine this will be substituted into Hoeffding's inequality, where N is the amount of data... rows? Did I misunderstand something?
@TitusBigelow 8 years ago
In B(N,k), N is still the number of points. x1, x2, x3, ..., xN are all input vectors. I think you were confusing these with scalar features (dimensions) of a single input. In this case, if we look at the first row, input x1 maps to +1, x2 maps to +1, ..., xN maps to +1. The dimensionality of the actual input x1 is irrelevant to this. Hope that explains it well enough.
@developerpp395 6 years ago
What is the reference textbook for this class?
@ZombieLincoln666 6 years ago
Learning From Data
@chenyuzi 11 years ago
brilliant explanation in slide 10
@clarkupdike6518 2 years ago
Actually the algebra needed to prove the induction step isn't that bad... you just have to plug in (N-1)! = N!/N and friends in a couple of spots. The combinatorial explanation was harder to follow... at least for me.
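For anyone who wants the induction step spelled out, it only needs Pascal's identity C(N-1, i) + C(N-1, i-1) = C(N, i); in LaTeX (a sketch in the lecture's notation, not his exact derivation):

B(N,k) \le B(N-1,k) + B(N-1,k-1)
       \le \sum_{i=0}^{k-1} \binom{N-1}{i} + \sum_{i=0}^{k-2} \binom{N-1}{i}
       = \binom{N-1}{0} + \sum_{i=1}^{k-1} \left[ \binom{N-1}{i} + \binom{N-1}{i-1} \right]
       = \binom{N}{0} + \sum_{i=1}^{k-1} \binom{N}{i}
       = \sum_{i=0}^{k-1} \binom{N}{i}.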
@wimthiels638 8 years ago
thanks, very intuitive explanation !
@yusufahmed3223 8 years ago
Very good lectures.
@aliakbarpamz 7 years ago
Explained well; I can understand it clearly.
@malharjajoo7393 8 years ago
Great answers
@actuallyactuary2787 3 years ago
I still haven't understood the significance of having a polynomial. Could somebody please dumb it down for me? :(
@nguyensinhtu8248 6 years ago
I actually do not know how to apply this theory in practice.
@soultouch08 4 years ago
This is more about the generalization of classifiers and learning models. In practice you'll actually be using (probably already invented and tested) learning models (like neural networks), but this gives the conceptual understanding of why those things can learn in the first place.
@AlexEx70 8 years ago
I tried watching at 0.5x speed to understand the material better, but couldn't cope; it makes me laugh all the time. It's like Boris Yeltsin reading the lecture)
@jmalbornoz 11 years ago
Professor Yaser is the MacDaddy!
@thanhquocbaonguyen8379 3 years ago
I almost burst out laughing at the question. The bottom line of the learnability of a model is that a learning model cannot learn everything...
@brunor8674 3 years ago
Ok.
@hariharanramamurthy9946 4 years ago
This professor really confuses alpha and beta, which got me stuck on this concept, and I am completely clueless about the logic.
@andychan7350 2 years ago
Good content, but this professor's spoken English is hard to understand clearly.
@janasandeep 7 years ago
1:15:35 👌
@sudn3682 6 years ago
Not clear at explaining the evaluation of B(N,k).
@markh1462 6 years ago
Slide 10 is quite a sloppy way to write a proof, though. Otherwise, great lecture.
@prazman 6 years ago
I still like the way he argued for it. Guess that's what he could fit into a 60 min class
@evinism 6 years ago
another hypothesis. more. more. OH NO. WE ARE IN TROUBLE
@marcogelsomini7655 2 years ago
ok that was hard
@samchan2535 7 years ago
Why is the Prof so funny?
@iSohrab 7 years ago
This kind of lecture needs such a prof; otherwise everybody would fall asleep :)
@MrCmon113 5 years ago
He isn't. VC theory is just an inherently funny subject.
@wallofstar 7 years ago
I like his lecture... he somehow looks like Mr. Bean
@abdulateek3735 6 years ago
His nose only :p