Not gonna lie, that MNIST demo was probably the coolest perceptron demo I've ever seen. Seeing how the weight vector actually updates and how it affects the classification was honestly kinda beautiful.
@yus-2895 (3 years ago)
I am working hard to stop being a software engineer who uses black boxes. Data science is a big field, and your classes are always the first thing I revisit for core concepts and intuitions. Thanks, Professor.
@Ankansworld (3 years ago)
I started this playlist and am slowly getting more and more interested in the lectures. Thanking the Prof. a Kilian times :)
@StarzzLAB (3 years ago)
I am getting interested in the lectures incredibly quickly.
@varunjindal1520 (3 years ago)
Professor Kilian's classes are so interactive that I find myself laughing whenever the class laughs. Everything is explained in a very simple yet very powerful way.
@wtjin7652 (3 years ago)
One very impressive attribute of these lectures is the incredibly clear handwriting, something the vast majority of professors today do not have.
@aayushchhabra7465 (5 years ago)
Thanks. Appreciate the hard work you put into this!
@rezaafra7703 (2 years ago)
Thank you professor Weinberger for these fantastic lectures.
@jianwenliu458 (5 years ago)
Great demo, thank you
@naifalkhunaizi4372 (3 years ago)
"It can be very romantic if it's read the right way" (NOTHING more romantic than this lecture). Thanks Professor Kilian!
@insoucyant (3 years ago)
your lectures are unbelievably amazing. Thanks Prof :)
@jahnvi8373 (5 months ago)
thank you for these lectures!
@java2379 (11 months ago)
Thank you for presenting the algorithms and the math behind the scenes. At last I could understand why it 'works' instead of just trusting the algorithm. Many implementations introduce a learning-rate multiplier, which I now understand does not make sense for the single-layer perceptron. If you want to fix the 'bad coding practice' you mentioned in the video, just change the while(true) { ... } (which actually has no break, so it will probably loop forever) into do { ... } while (m > 0) and you are good to go.
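For readers who want that termination condition spelled out, here is a minimal sketch of a single-layer perceptron trainer, written from scratch for illustration (it is not the code from the demo). It assumes a numpy array X with one example per row and labels y in {-1, +1}; since Python has no do-while, the loop simply repeats full passes until a pass produces zero mistakes, which is the same exit condition as do { ... } while (m > 0).

```python
import numpy as np

def perceptron_train(X, y, max_passes=1000):
    """Train a single-layer perceptron; stop when a full pass makes no mistakes."""
    n, d = X.shape
    w = np.zeros(d)                            # zero initialization, as in the lecture
    for _ in range(max_passes):                # safety cap in case the data are not separable
        m = 0                                  # mistakes made during this pass
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:    # misclassified (or exactly on the boundary)
                w += y[i] * X[i]               # perceptron update: no learning rate needed
                m += 1
        if m == 0:                             # the do { ... } while (m > 0) exit condition
            break
    return w
```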
@bnglr (4 years ago)
I give a like for this dude's accent.
@vaibhavgupta951 (3 years ago)
Professor, you are awesome!
@fulincai6586 (2 years ago)
Thank you, professor. Your teaching is so great!
@hello-pd7tc (4 years ago)
Day 5. great stuff.
@KulvinderSingh-pm7cr (5 years ago)
Notes here: www.cs.cornell.edu/courses/cs4780/2017sp/lectures/lecturenote03.html
@DiegoAToala (4 years ago)
Thank you, great lecture!
@zano7369 (5 years ago)
You rock! Can you post variations of your projects? Open Source Education for the win!
@kilianweinberger698 (5 years ago)
Sorry, I am trying to find a way to make them public, but for now I cannot do it yet. The reason is that they are still used at Cornell, and solutions may leak.
@VIVEKKUMAR-kv8si (a year ago)
@@kilianweinberger698 Any chance they are public now? It would help people who are self studying.
@bharasiva96 (4 years ago)
Proof of convergence of perceptron begins at 43:13
@anirbanghosh6328 (4 years ago)
Where can we get the projects?
@stanislavzamecnik3049 (3 years ago)
Thanks for the lecture, professor, it is great! I wanted to ask: at 38:30, when you were showing the bound on the number of mistakes for the case where we are classifying just one point, I suppose the proof was for a w different from the zero vector; otherwise k would always be bounded by zero. How would the proof change with zero initialization?
@bharasiva96 (4 years ago)
The perceptron algorithm begins at 12:45
@VIVEKKUMAR-kv8si (a year ago)
At 37:40, isn't w(t+1) incorrectly labelled? What is drawn there is the hyperplane; w(t+1) is the vector obtained by adding w(t) and x.
@durkaflip (4 years ago)
How does the initial zero vector ever get smaller than zero and how does the w^Tx
@kilianweinberger698 (4 years ago)
It should be
@bharasiva96 (4 years ago)
@Kilian Weinberger: At 2:46 I am a little confused by the explanation. I completely agree that when you have two points that are far apart, using k-nearest neighbors is a little absurd: the assumption in k-nearest neighbors is that nearby points have the same labels, and in our case the two points may be each other's nearest neighbors, but objectively they are too far apart for the algorithm to make sense. The part after that is what confuses me a little. The right-most point is an "x" (let's call it x1), and the left-most point is an "o" (call it x2). You then say that around x1 it is all "x", and around x2 it is all "o". But how can you say this, since we just established that there are no other points close to x1 and x2, i.e., no points between x1 and x2? Are you saying that if there were test points near x1 or x2, they would be classified as an "x" or an "o" respectively? And when you say that, moving from x1 towards x2, you have no idea where the labels change from "x" to "o", aren't you assuming that the "x" and "o" regions are separated by a hyperplane, which may not be the case?
@kilianweinberger698 (4 years ago)
Well, it is not quite as binary. In low dimensional spaces (2d or 3d) locality can go a lot further and even e.g. 50 nearest neighbors may still be very similar. Hope this helps.
@bharasiva96 (4 years ago)
@@kilianweinberger698 I am sorry, but I am still confused; I can't follow how that answers my question. I understand that in low-dimensional spaces the 50 nearest neighbors might be similar, but if I understand the problem at 2:46 correctly, isn't that example about higher-dimensional spaces, since the points were far apart due to the curse of dimensionality?
@iasudpfiajsdfa (4 years ago)
"But my question is, how can you say this since we just established that there are no other points closest to this x1 and x2 i.e. that there are no points between x1 and x2? Are you saying that if there were test points that were put near x1 or x2, they would be classified as an "x" or an "o" respectively?" - Yes, I think that is what he means. Each position in the vector space has a label, we just don't know what that label is. So we collect a few samples and then guess new points based on the K nearest neighbors. The problem is, that the few samples we collected are (probably) spread really far apart because the vector space is so large. As we try and select points on the 'interior' we have trouble determining exactly where the points change from X's to O's. I don't think there is necessarily a nice, clean hyperplane separating those X's and O's; we could also have a curve separating them or any other weird shape; the problem is that from a few datapoints scattered around the vector space we can't tell what is actually happening in these large gaps. This is how I understood it at least. Hope it helps.
@bharasiva96 (4 years ago)
@@iasudpfiajsdfa Thank you so much for the reply. I was also thinking that maybe that's what he was talking about. I have to admit, though, that I didn't consider that the labels associated with the regions could also be separated by curves and not just planes, although in hindsight it looks obvious and I feel a little stupid now for not considering it. Thanks again. Your answer was really helpful.
@bharasiva96 (4 years ago)
@@iasudpfiajsdfa Jacob Stewart Just one more point to add here: in this argument, we are still making the assumption that nearby points behave similarly, correct? That is, when you say there is a curve separating the Xs and Os, you are assuming that locally the points will have the same label; you just don't know how that local behavior changes from one label to another on a global scale.
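A small, self-contained illustration of the effect this thread is discussing (my own sketch, not course code): sample points uniformly in the unit cube and watch how, as the dimension d grows, the nearest and farthest points from a query end up almost the same distance away, so "locality" carries less and less information.

```python
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((1000, d))            # 1000 points uniform in the unit cube [0, 1]^d
    q = rng.random(d)                    # one query point
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d={d:5d}  min={dists.min():.3f}  max={dists.max():.3f}  "
          f"max/min ratio={dists.max() / dists.min():.2f}")
# As d grows, the max/min ratio shrinks toward 1: the "nearest" neighbor
# is barely nearer than the farthest point, which is the curse of dimensionality
# being discussed at 2:46.
```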
@eswaraprasadp7930 (5 years ago)
In the demo, I didn't understand how the 0s and 7s end up classified to one side after the algorithm finds a classification is wrong and adds or subtracts the pixel values. Can somebody explain this?
@SundaraRamanR (4 years ago)
It's the same thing that happens with the circles and x's before. The pixel values are the elements of a 256-dimensional vector that describes a digit's image. By adding each of those vectors to (or subtracting them from) our w, we are building the normal vector of a hyperplane that separates the region of 7-images from the region of 0-images.
@hdang1997 (4 years ago)
@@SundaraRamanR Sir, then how is it that the images are moving up and down, and not the hyperplane?
@evanm2024 (3 years ago)
The vertical axis represents the dot product w^T x.
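Assuming the demo really does plot each digit at height w^T x (which is how I read this thread too), here is a rough sketch of why the images move while the boundary does not: the threshold is the fixed line at 0, and every update to w changes all of the scores at once. The data and names below are made up purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_scores(w, X, y):
    """Plot each image's score w^T x; the decision boundary is the fixed line at 0."""
    scores = X @ w                          # one dot product per image
    plt.scatter(np.arange(len(scores)), scores, c=y, cmap="bwr")
    plt.axhline(0.0, color="k")             # seen along w, the hyperplane is just score = 0
    plt.ylabel("w^T x")
    plt.show()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 256))              # stand-ins for 16x16-pixel digit images
y = np.sign(rng.normal(size=50))            # fake +1 / -1 labels
w = rng.normal(size=256)
plot_scores(w, X, y)
# After a perceptron update w += y_i * X[i], calling plot_scores again shows every
# point at a new height, while the zero line (the boundary) has not moved.
```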
@miguelduqueb7065 (2 years ago)
I am very happy with these lectures because they offer both intuition and mathematical description. Thank you for uploading! I am watching them and taking notes eagerly. Question about this lecture: At time 39:00, I think the inequality should be >0, because before adding vector x k times we had w*x
@goldencircle4331 (a year ago)
After getting x wrong k times, we get the right class for x (i.e., on the (k+1)-th attempt we classify x correctly). Therefore it is an upper bound.
@vladimirdyagilev8946 (5 years ago)
Is it enough to watch the lectures or should we be completing the required readings?
@kilianweinberger698 (5 years ago)
I would recommend reading through the class notes. If you understood everything just from viewing the lectures, then it shouldn’t take you very long. If not, you see what you missed.
@JoaoVitorBRgomes (3 years ago)
At around 25:30 you say that if the margin is bigger it is faster to converge (fewer iterations). Does that mean that one-hot encoding (the farms-in-Kansas example) will help the perceptron (or perhaps linear regression) find the separating hyperplane faster?
@kilianweinberger698 (3 years ago)
Sometimes. A one-hot encoding increases the dimensionality which can help increase the margin. But keep in mind that you also re-scale the data to the unit-sphere.
@JoaoVitorBRgomes (3 years ago)
@@kilianweinberger698 thank you, professor, although we are thousands of kilometers apart, I give you a virtual handshake of gratitude!
@krsnandhan3107 (2 years ago)
Hi, thank you for the amazing lecture. I have a small doubt regarding the assumption of the perceptron: how can we check that the data are linearly separable, i.e., that such a hyperplane exists, before using the algorithm?
@kilianweinberger698 (2 years ago)
Well, that's the weakness of the Perceptron. If there is no such hyperplane, it will never converge. That's why people use Logistic Regression or SVM in practice.
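If you do want an upfront check, one option (my sketch, not something from the lecture) is to pose separability as a linear-programming feasibility problem: after appending a constant-1 feature to absorb the bias, the data are linearly separable exactly when some w satisfies y_i * w^T x_i >= 1 for every i. scipy's linprog can answer that, assuming X is an n-by-d numpy array and y holds labels in {-1, +1}.

```python
import numpy as np
from scipy.optimize import linprog

def is_linearly_separable(X, y):
    """Feasibility LP: does some w satisfy y_i * w^T x_i >= 1 for every i?"""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # constant feature absorbs the bias b
    A_ub = -(y[:, None] * Xb)                   # y_i w^T x_i >= 1  <=>  -y_i x_i^T w <= -1
    b_ub = -np.ones(len(X))
    c = np.zeros(Xb.shape[1])                   # zero objective: we only care about feasibility
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
    return res.success                          # True iff a separating w exists
```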
@deltasun (4 years ago)
Great lecture, thank you! One curiosity: does the fact that the perceptron converges only in the linearly separable case depend specifically on the descent algorithm (actually an SGD), or on the loss function itself? Namely, the loss function is (I guess) -\sum_i y_i (w^T x_i), where the sum is over misclassified points only; i.e., we are minimizing the distance of the misclassified points from the boundary (a perfectly intuitive thing to do, actually). If I minimize this with, say, a Newton method, does it converge in the linearly non-separable case?
@TheTacticalDood (a year ago)
As far as I know, there are variants of the perceptron algorithm that converge even in the linearly non-separable case. However, of course, the resulting hyperplane of such approaches will never reach zero misclassification error. So yes, it depends on the update rule (or the descent algorithm, as you called it), not on the error.
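The best-known such variant is probably the "pocket" perceptron: run the ordinary updates, but keep in your pocket the weight vector that has made the fewest training mistakes so far, and return that one. A rough sketch under the same array conventions as the earlier snippets (again my illustration, not course code):

```python
import numpy as np

def pocket_perceptron(X, y, max_updates=10000):
    """Perceptron updates, but remember the best w seen so far (fewest mistakes)."""
    n, d = X.shape
    w = np.zeros(d)
    best_w, best_err = w.copy(), np.inf
    rng = np.random.default_rng(0)
    for _ in range(max_updates):
        errors = np.where(y * (X @ w) <= 0)[0]   # indices of misclassified points
        if len(errors) < best_err:               # better than anything in the pocket?
            best_w, best_err = w.copy(), len(errors)
        if len(errors) == 0:                     # separable case: perfect, stop early
            break
        i = rng.choice(errors)                   # ordinary perceptron step on one mistake
        w = w + y[i] * X[i]
    return best_w
```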
@SundaraRamanR (4 years ago)
Just finished this lecture, now I'm looking forward to at least one Valentine poem about this proof in the next session. I'll be disappointed if there isn't one.
@cuysaurus (4 years ago)
How would it work if the two classes were in the same quadrant, e.g., the first quadrant?
@kilianweinberger698 (4 years ago)
As long as they are linearly separable, it will always work.
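One wrinkle worth spelling out for the same-quadrant case: a hyperplane through the origin often cannot separate such classes, but appending a constant-1 feature (the same lifting trick used in the separability check sketched earlier, which absorbs the bias b into w) makes the standard perceptron work whenever an affine separating hyperplane exists. A toy sketch with made-up numbers:

```python
import numpy as np

# Two small clusters that sit entirely inside the first quadrant of R^2:
X = np.array([[1.0, 1.0], [1.5, 1.2], [3.0, 3.0], [3.5, 3.2]])
y = np.array([-1, -1, 1, 1])

Xb = np.hstack([X, np.ones((4, 1))])   # lift each point to [x1, x2, 1]

w = np.zeros(3)
for _ in range(100):                   # plain perceptron in the lifted space
    for xi, yi in zip(Xb, y):
        if yi * np.dot(w, xi) <= 0:
            w += yi * xi

# w[:2] is the normal vector of the separating line; w[2] plays the role of the bias b.
print("w =", w[:2], "b =", w[2], "predictions:", np.sign(Xb @ w))
```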
@shahedulislam94 (5 years ago)
37:35 Isn't wt+1 the new hyperplane and the sum of the vectors the new w?
@kilianweinberger698 (5 years ago)
w_{t+1} should be the new hyperplane
@Rizwankhan2000 (3 years ago)
Great lecture. Is it possible to download Matlab code used for demo?
@Rizwankhan2000 (3 years ago)
Ok, I coded it myself :-) kzbin.info/www/bejne/aqjbpIZ-oqZmoMU
@AbdullahPunctureWale-II (5 years ago)
Professor, thanks for the videos... how do we master the math behind all these...
@AbdullahPunctureWale-II (5 years ago)
Finally cracked the math behind the perceptron and SVM... looking forward to watching your sessions soon... thank you... 😂😂😂
@hdang1997 (4 years ago)
@@AbdullahPunctureWale-II How and from where?
@AbdullahPunctureWale-II (4 years ago)
@@hdang1997 I did it in Excel.
@hdang1997 (4 years ago)
@@AbdullahPunctureWale-II Can you please share it with me? My email ID is 1705138@kiit.ac.in
@DommageCollateral (11 months ago)
Please make a short video in German where you show how to build an AI with PyTorch. There are hardly any courses in German.
@rishabhpoddar1 (5 years ago)
why is the hyperplane perpendicular to the w vector?
@Donutswithlazerz (5 years ago)
So, think of the definition of orthogonal vectors: their dot product is 0. The hyperplane is the set of points x with w^T x + b = 0, so for any two points x1 and x2 on it we have w^T (x1 - x2) = 0; every direction lying within the hyperplane is therefore orthogonal to w, which is exactly what "perpendicular to the hyperplane" means.
@SundaraRamanR (4 years ago)
By definition. There's always some vector that's perpendicular to the hyperplane, and we choose to call that w.
@xiaoweidu4667 (3 years ago)
stupid questions fragmented a nice lecture.
@anonymousperson9757 (a year ago)
25:57 Are we ever going to find out what passiveaggressive.m does?