
Lecture 5 "Perceptron" -Cornell CS4780 SP17

43,687 views

Kilian Weinberger


Comments: 71
@emmafountain2059 11 months ago
Not gonna lie that MNIST demo was probably like the coolest perceptron demo I've ever seen. Seeing how the weight vector actually updates and how it affects the classification was honestly kinda beautiful.
@yus-2895 3 years ago
I am working hard to stop being the software engineer that uses black boxes. Data Science is a big field and your classes are always the first thing I revisit for core concepts and intuitions. Thanks Professor.
@Ankansworld 3 years ago
I started this playlist and am slowly getting more and more interested in the lectures. Thanking the Prof. a Kilian times :)
@StarzzLAB 3 years ago
I am getting interested in the lectures incredibly quickly.
@varunjindal1520 3 years ago
Professor Kilian's classes are so interactive that I find myself laughing along whenever the class laughs. Everything is explained in a very simple yet very powerful way.
@wtjin7652 3 years ago
One very impressive attribute of these lectures is the incredibly clear handwriting, something the vast majority of professors today do not possess.
@aayushchhabra7465 5 years ago
Thanks. Appreciate the hard work you put into this!
@rezaafra7703 2 years ago
Thank you professor Weinberger for these fantastic lectures.
@jianwenliu458 5 years ago
Great demo, thank you
@naifalkhunaizi4372 3 years ago
"It can be very romantic if it's read the right way" (NOTHING more romantic than this lecture). Thanks Professor Kilian!
@insoucyant 3 years ago
Your lectures are unbelievably amazing. Thanks Prof :)
@jahnvi8373 5 months ago
Thank you for these lectures!
@java2379 11 months ago
Thank you for presenting the algorithms and the math behind the scenes. At last I could understand why it 'works' instead of just trusting the algorithm. Many implementations introduce a learning-rate multiplier, which I now understand does not make sense for the one-layer perceptron. If you want to fix the 'bad coding practice' you mentioned in the video, just change the while(true){...} (which actually has no break, so it would loop forever) into do { ... } while (m > 0) and you are good to go.
@bnglr 4 years ago
I give a like for this dude's accent.
@vaibhavgupta951 3 years ago
Professor, you are awesome!
@fulincai6586 2 years ago
Thank you, professor. Your teaching is so great!
@hello-pd7tc 4 years ago
Day 5. Great stuff.
@KulvinderSingh-pm7cr 5 years ago
Notes here : www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote03.html
@DiegoAToala 4 years ago
Thank you, great lecture!
@zano7369 5 years ago
You rock! Can you post variations of your projects? Open Source Education for the win!
@kilianweinberger698 5 years ago
Sorry, I am trying to find a way to make them public, but for now I cannot do it yet. The reason is that they are still used at Cornell, and solutions may leak.
@VIVEKKUMAR-kv8si a year ago
@@kilianweinberger698 Any chance they are public now? It would help people who are self studying.
@bharasiva96 4 years ago
Proof of convergence of perceptron begins at 43:13
@anirbanghosh6328 4 years ago
Where can we get the projects?
@stanislavzamecnik3049 3 years ago
Thanks for the lecture, professor, it is great! I wanted to ask: at 38:30, when you were showing the bound on the number of mistakes for classifying just one point, I suppose the proof was for a w different from the zero vector? Otherwise k would always be bounded just by zero. How would the proof change with zero initialization?
@bharasiva96 4 years ago
The perceptron algorithm begins at 12:45
@VIVEKKUMAR-kv8si a year ago
At 37:40 isn't w(t+1) incorrectly labelled? That is the hyperplane. w(t+1) is the vector obtained by adding w(t) and x.
@durkaflip 4 years ago
How does the initial zero vector ever get smaller than zero and how does the w^Tx
@kilianweinberger698 4 years ago
It should be
@bharasiva96 4 years ago
@Kilian Weinberger: At 2:46 I am a little confused by the explanation. I completely agree with the idea that when you have two points that are far apart, using k-nearest neighbors is a little absurd: the assumption in k-nearest neighbors is that nearby points have the same labels, and in our case the two points may be closest to each other, but objectively they are too far apart for us to use the k-nearest neighbors algorithm.
The part after that is what confuses me a little. The rightmost point is an "x", let's call it x1, and the leftmost point is an "o", which we can call x2. You then go on to say that around x1 it's all "x", and around x2 it's all "o". My question is, how can you say this, since we just established that there are no other points close to x1 and x2, i.e. that there are no points between x1 and x2? Are you saying that if test points were placed near x1 or x2, they would be classified as an "x" or an "o" respectively? And when you say that, moving from x1 towards x2, you have no idea where you cross from the "x" label to the "o" label, aren't you assuming that the "x"s and "o"s are separated by a hyperplane, which may not be the case?
@kilianweinberger698 4 years ago
Well, it is not quite as binary. In low dimensional spaces (2d or 3d) locality can go a lot further and even e.g. 50 nearest neighbors may still be very similar. Hope this helps.
@bharasiva96 4 years ago
@@kilianweinberger698 I am sorry but I am still confused. I can't follow how that will answer my question though. I understand that in low dimensional spaces, the 50 nearest neighbors might be similar. But if I understand the problem at 2:46 correctly, isn't that example talking about higher dimensional spaces since the points were far apart due to the curse of dimensionality?
@iasudpfiajsdfa 4 years ago
"But my question is, how can you say this since we just established that there are no other points closest to this x1 and x2 i.e. that there are no points between x1 and x2? Are you saying that if there were test points that were put near x1 or x2, they would be classified as an "x" or an "o" respectively?" - Yes, I think that is what he means. Each position in the vector space has a label, we just don't know what that label is. So we collect a few samples and then guess new points based on the k nearest neighbors.
The problem is that the few samples we collected are (probably) spread really far apart because the vector space is so large. As we try to select points in the 'interior', we have trouble determining exactly where the points change from X's to O's. I don't think there is necessarily a nice, clean hyperplane separating those X's and O's; we could also have a curve separating them, or any other weird shape. The problem is that from a few data points scattered around the vector space we can't tell what is actually happening in these large gaps. This is how I understood it, at least. Hope it helps.
@bharasiva96 4 years ago
@@iasudpfiajsdfa Thank you so much for the reply. I was also thinking that maybe that's what he was talking about. I have to admit, though, I didn't consider the fact that the labels associated with the space could also be separated by curves and not just planes, although in hindsight it looks obvious and I look a little stupid now for not considering it. Thanks again. Your answer was really helpful.
@bharasiva96 4 years ago
@@iasudpfiajsdfa Just one more point to add here: in this argument we are still making the assumption that nearby points behave similarly, correct? That is, when you say there is a curve separating the Xs and Os, you are assuming that somehow, locally, the points will have the same label. You just don't know how the local behavior changes from one label to another on a global scale.
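A small sketch of the phenomenon this thread is discussing (my own illustration, not from the lecture): with a fixed number of samples, pairwise distances in the unit cube grow and concentrate as the dimension d increases, so the samples end up scattered far apart and "nearest" neighbors stop being meaningfully near.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100  # a fixed, small sample size

for d in [2, 10, 100, 1000]:
    X = rng.random((n, d))                    # n points in the d-dimensional unit cube
    diffs = X[:, None, :] - X[None, :, :]     # all pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)
    dists = dists[np.triu_indices(n, k=1)]    # keep each pair once
    print(f"d={d:4d}  min dist={dists.min():.2f}  mean dist={dists.mean():.2f}")
```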
@eswaraprasadp7930 5 years ago
In the demo, I didn't understand how the 0s and 7s end up being classified to one side after the algorithm finds a classification is wrong and adds or subtracts the pixel values. Can somebody explain this?
@SundaraRamanR 4 years ago
It's the same thing that happened with the circles and x's before. The pixel values are the elements of a 256-dimensional vector that describes a digit's image. By adding each of those vectors to our w, we are creating a 255-dimensional hyperplane that separates the region of 7-images from the region of 0-images.
@hdang1997 4 years ago
@@SundaraRamanR Sir, then how is it that the images are moving up and down, and not the hyperplane?
@evanm2024 3 years ago
The vertical axis represents the dot product w^T x.
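A small reconstruction of what this thread is describing (illustrative Python, not the actual MATLAB demo; the random `X` and `y` stand in for the digit images and their labels): each image x is a fixed vector, so when a mistake triggers the update w ← w + y x, every image's score w^T x changes. That is why the points move up and down in the plot while the decision threshold w^T x = 0 stays a fixed horizontal line.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 256))              # 20 fake "digit images", flattened to 256 pixels each
y = np.where(rng.random(20) < 0.5, -1, 1)   # -1 for "0", +1 for "7" (hypothetical labels)

w = np.zeros(256)
scores = X @ w                      # vertical axis of the demo: one score per image (all 0 at start)

i = int(np.argmax(y * scores <= 0)) # index of some misclassified image
w = w + y[i] * X[i]                 # perceptron update on that one image
new_scores = X @ w                  # every image's score moves, because w moved
print(np.round(np.c_[scores[:5], new_scores[:5]], 2))
```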
@miguelduqueb7065 2 years ago
I am very happy with these lectures because they offer both intuition and mathematical description. Thank you for uploading! I am watching them and taking notes eagerly. Question about this lecture: At time 39:00, I think the inequality should be >0, because before adding vector x k times we had w*x
@goldencircle4331 a year ago
After getting x wrong k times, we get the right class for x (i.e., on the (k+1)th attempt we correctly classify x). Therefore it is an upper bound.
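For readers following this thread, a sketch of the convergence argument under the standard assumptions (all inputs scaled so that ||x_i|| ≤ 1, some unit-norm w* separating the data with margin γ, and w initialized to the zero vector):

```latex
\begin{align*}
\text{On each mistake } (y\,\mathbf{w}^\top\mathbf{x} \le 0),\ \mathbf{w} \leftarrow \mathbf{w} + y\,\mathbf{x}: \quad
 & \mathbf{w}^\top\mathbf{w}^* \text{ grows by } y\,\mathbf{x}^\top\mathbf{w}^* \ge \gamma, \\
 & \|\mathbf{w}\|^2 \text{ grows by } 2\,y\,\mathbf{w}^\top\mathbf{x} + \|\mathbf{x}\|^2 \le 1. \\
\text{After } M \text{ mistakes (starting from } \mathbf{w}=\mathbf{0}): \quad
 & M\gamma \;\le\; \mathbf{w}^\top\mathbf{w}^* \;\le\; \|\mathbf{w}\| \;\le\; \sqrt{M}
   \;\;\Longrightarrow\;\; M \le 1/\gamma^2 .
\end{align*}
```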
@vladimirdyagilev8946 5 years ago
Is it enough to watch the lectures or should we be completing the required readings?
@kilianweinberger698 5 years ago
I would recommend reading through the class notes. If you understood everything just from viewing the lectures, then it shouldn’t take you very long. If not, you see what you missed.
@JoaoVitorBRgomes 3 years ago
At circa 25:30 you say that if the margin is bigger, it is faster to converge (fewer iterations). Does that mean that one-hot encoding (farms in Kansas) will help the perceptron (or perhaps linear regression) find the separating hyperplane faster?
@kilianweinberger698 3 years ago
Sometimes. A one-hot encoding increases the dimensionality, which can help increase the margin. But keep in mind that you also re-scale the data to the unit sphere.
@JoaoVitorBRgomes 3 years ago
@@kilianweinberger698 Thank you, professor; although we are thousands of kilometers apart, I give you a virtual handshake of gratitude!
@krsnandhan3107 2 years ago
Hi, thank you for the amazing lecture. I have a small doubt regarding the assumption of the perceptron: how can we check that there exists a hyperplane that linearly separates the data before running the algorithm?
@kilianweinberger698 2 years ago
Well, that's the weakness of the Perceptron. If there is no such hyperplane, it will never converge. That's why people use Logistic Regression or SVM in practice.
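One practical way to answer the question above (not something covered in the lecture): strict linear separability can be posed as a linear-programming feasibility problem, asking whether some (w, b) satisfies y_i (w·x_i + b) ≥ 1 for every training point. A sketch using scipy.optimize.linprog; the function name and setup are my own, assuming SciPy is available:

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Return True if some (w, b) satisfies y_i * (w . x_i + b) >= 1 for all i.
    X: n x d array of inputs, y: labels in {-1, +1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Decision variables z = (w_1, ..., w_d, b); encode -y_i * (x_i . w + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    return res.success
```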
@deltasun 4 years ago
Great lecture, thank you! One curiosity: does the fact that the perceptron converges only in the linearly separable case depend specifically on the descent algorithm (actually an SGD), or on the loss function itself? Namely, the loss function is (I guess) -\sum_i y_i (w^T x_i), where the sum is over misclassified points only; i.e., we are minimizing the distance of misclassified points from the boundary (a perfectly intuitive thing to do, actually). If I minimize this with, say, a Newton method, does it converge in the linearly non-separable case?
@TheTacticalDood a year ago
As far as I know, there are variants of the perceptron algorithm that converge even in the linearly non-separable case. However, of course, the resulting hyperplane of such approaches will never reach zero misclassification error. So yes, it depends on the update rule (or the descent algorithm, as you called it), not on the error.
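The loss function the question above refers to can be written as follows (my transcription of the comment's formula; the second form makes explicit that only misclassified points contribute):

```latex
\mathcal{L}(\mathbf{w})
 \;=\; -\sum_{i \in \mathcal{M}(\mathbf{w})} y_i\,\mathbf{w}^\top\mathbf{x}_i
 \;=\; \sum_{i=1}^{n} \max\!\bigl(0,\, -\,y_i\,\mathbf{w}^\top\mathbf{x}_i\bigr),
 \qquad
 \mathcal{M}(\mathbf{w}) \;=\; \{\, i : y_i\,\mathbf{w}^\top\mathbf{x}_i \le 0 \,\}.
```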
@SundaraRamanR 4 years ago
Just finished this lecture, now I'm looking forward to at least one Valentine poem about this proof in the next session. I'll be disappointed if there isn't one.
@cuysaurus 4 years ago
How would it work if the two classes were in the same quadrant, e.g., the first quadrant?
@kilianweinberger698 4 years ago
As long as they are linearly separable, it will always work.
@shahedulislam94 5 years ago
37:35 Isn't w_{t+1} the new hyperplane, and the sum of the vectors the new w?
@kilianweinberger698 5 years ago
w_{t+1} should be the new hyperplane
@Rizwankhan2000 3 years ago
Great lecture. Is it possible to download the MATLAB code used for the demo?
@Rizwankhan2000 3 years ago
OK, I coded it myself :-) kzbin.info/www/bejne/aqjbpIZ-oqZmoMU
@AbdullahPunctureWale-II 5 years ago
Professor, thanks for the videos... how do we master the math behind all of this?
@AbdullahPunctureWale-II 5 years ago
Finally cracked the math behind the perceptron and SVM... looking forward to watching your sessions soon... thank you... 😂😂😂
@hdang1997 4 years ago
@@AbdullahPunctureWale-II How and from where?
@AbdullahPunctureWale-II 4 years ago
@@hdang1997 I did it in Excel.
@hdang1997 4 years ago
@@AbdullahPunctureWale-II Can you please share it with me? My email ID is 1705138@kiit.ac.in
@DommageCollateral 11 months ago
Please make a short video in German showing how to build an AI with PyTorch. There are hardly any courses in German.
@rishabhpoddar1 5 years ago
Why is the hyperplane perpendicular to the w vector?
@Donutswithlazerz 5 years ago
So, you can think of the definition of orthogonal vectors, which is that the dot product of the vectors is 0.
@SundaraRamanR 4 years ago
By definition. There's always some vector that's perpendicular to the hyperplane, and we choose to call that w.
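A one-line version of the argument in the two replies above, assuming the bias has been absorbed so the hyperplane passes through the origin:

```latex
\mathbf{x}_1, \mathbf{x}_2 \in H = \{\mathbf{x} : \mathbf{w}^\top\mathbf{x} = 0\}
\;\Longrightarrow\;
\mathbf{w}^\top(\mathbf{x}_1 - \mathbf{x}_2)
 = \mathbf{w}^\top\mathbf{x}_1 - \mathbf{w}^\top\mathbf{x}_2
 = 0,
```

so w has zero dot product with every direction lying inside the hyperplane, i.e. it is perpendicular to it.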
@xiaoweidu4667 3 years ago
Stupid questions fragmented a nice lecture.
@anonymousperson9757 a year ago
25:57 Are we ever going to find out what passiveaggressive.m does?
Lecture 4 "Curse of Dimensionality / Perceptron" -Cornell CS4780 SP17
47:43
Machine Learning Lecture 32 "Boosting" -Cornell CS4780 SP17
48:27
Kilian Weinberger
Рет қаралды 34 М.
女孩妒忌小丑女? #小丑#shorts
00:34
好人小丑
Рет қаралды 76 МЛН
Look at two different videos 😁 @karina-kola
00:11
Andrey Grechka
Рет қаралды 15 МЛН
PEDRO PEDRO INSIDEOUT
00:10
MOOMOO STUDIO [무무 스튜디오]
Рет қаралды 8 МЛН
MIT Introduction to Deep Learning | 6.S191
1:09:58
Alexander Amini
Рет қаралды 497 М.
Machine Learning Lecture 26 "Gaussian Processes" -Cornell CS4780 SP17
52:41
Perceptrons V Neurons - Machine Learning Is Not Like Your Brain #2
5:33
Future AI Society
Рет қаралды 7 М.
Eugene Dynkin: Seventy Years in Mathematics
1:16:02
Cornell University
Рет қаралды 3,9 М.
Machine Learning Lecture 33 "Boosting Continued" -Cornell CS4780 SP17
47:54
Machine Learning Lecture 30 "Bagging" -Cornell CS4780 SP17
49:43
Kilian Weinberger
Рет қаралды 24 М.
Machine Learning Lecture 31 "Random Forests / Bagging" -Cornell CS4780 SP17
47:25
Lecture 1 | The Perceptron - History, Discovery, and Theory
1:09:13
Carnegie Mellon University Deep Learning
Рет қаралды 39 М.
Machine Learning Lecture 10 "Naive Bayes continued" -Cornell CS4780 SP17
48:33
女孩妒忌小丑女? #小丑#shorts
00:34
好人小丑
Рет қаралды 76 МЛН