Exactly what I was looking for. I searched a lot: as usual I started with CS229 (Stanford), moved on to MIT's AI course, but this is by far the most precise explanation of SVMs (particularly the math behind them). Thanks.
@andreamercuri9984 4 years ago
One of the best machine learning courses I've ever attended. "Maximization is for losers" is a gem! 😂 Thank you very much Professor Kilian, you're great.
@vaibhavsingh8715 2 years ago
I always reflexively raise my hand whenever he says "raise your hand if you are with me". Thank you so much, Professor. These videos are a treasure trove. I wish I had been there in the class.
@nikhilsaini6441 4 years ago
The quality of these lectures is so good that I hit the "Like" button first and then watch the video. Thank you Prof.
@jiahao2709 5 years ago
11:50 SVM start
@nawafalrasheed109 4 years ago
Thank you :)
@178msut 2 years ago
Using this to supplement Cornell's current CS 4780 professor's lectures and I'm finding them to be more helpful tbh. Excellent quality and passion.
@hamzak93 9 months ago
Can't thank you enough for these lectures Professor Weinberger!
@hamzaleb9215 5 years ago
The best course on SVM. Thank you Kilian.
@Went12435 2 years ago
The best SVM explanation and derivation that I have ever found on YouTube.
@florianellsaesser4165 4 years ago
Amazing lectures. So well explained and great humour. Thank you!
@darshansolanki5535 3 years ago
The best derivation of SVMs I have ever seen, by anyone.
@SAINIVEDH 4 years ago
Ahh... felt like watching a movie. Best intro to SVMs on YT.
@saikumartadi8494 4 years ago
It would be great if you could make the course assignments available to the YouTube audience as well. Thanks a lot for the video; as always, it's fabulous :)
Your lesson cleared up my understanding of SVMs. Thank you so much.
@in100seconds5 4 years ago
Dear Kilian, please share other courses that you teach. It is a wonderful resource.
@kilianweinberger698 4 years ago
I will. Unfortunately this is the only one that was recorded so far.
@deepfakevasmoy3477 4 years ago
exactly :) that would be so great!
@DavesTechChannel 5 years ago
Great teacher
@yuniyunhaf5767 5 years ago
he made ML simple. thank you
@prwi87 1 year ago
6:00 It may be that the notation has changed, but there is no such thing (at least as of today) as (y - Xw)^2, since we cannot square a vector; it should be ||y - Xw||^2, the squared l2-norm. 23:00 The method of finding the distance between the point and the hyperplane was very clever, but the numerator should be the absolute value of w^T x + b, since the distance must always be positive. I think there is a simpler method, but it requires more linear algebra, so that may be why the Professor took this approach. 37:10 I have denoted it as "brilliant move!!" in my notes!
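For readers following along, this is the distance formula the comment is describing, written out as a sketch in the lecture's notation; the absolute value in the numerator is what keeps the distance non-negative:

```latex
% Distance from a point x_i to the hyperplane H = {x : w^T x + b = 0}.
% The absolute value keeps d(x_i) >= 0 regardless of which side x_i lies on.
\[
  d(x_i) \;=\; \frac{\lvert w^{\top} x_i + b \rvert}{\lVert w \rVert_2}
\]
```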
@rajeshs2840 5 years ago
I raise my hands... Mr. Kilian ...
@ankurnath695 4 years ago
great lecture, sir!
@abs4413 1 year ago
Dear Professor. At 35:40 you talk of the 'trick' to rescale w and b such that MIN |wx + b| = 1 (over all data points). Is it not more accurate to state that we do not rescale w and b for this trick, but rather that we choose b such that our trick works? The outer maximization changes w to minimize the norm and thus the direction of the decision boundary, while our value for b is such that MIN |wx + b| = 1. With direction above I mean that w defines the decision boundary (perpendicular to it) while b can only make the decision boundary move in a parallel direction. I hope I have explained myself clearly. Thank you for your lectures!
@kilianweinberger698 1 year ago
Yes that’s fair.
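To spell out the rescaling this exchange refers to, here is a short sketch: for any c > 0, the parameters (w, b) and (w/c, b/c) describe the same hyperplane, so the scale can be pinned down by a constraint while the orientation is still optimized.

```latex
% Scale invariance behind the trick at 35:40: for any c > 0,
%   w^T x + b = 0   <=>   (w/c)^T x + (b/c) = 0 ,
% so one may fix the scale of (w, b) by imposing
\[
  \min_{i}\,\lvert w^{\top} x_i + b \rvert \;=\; 1 ,
\]
% which leaves only the orientation of w (and the offset b) free to optimize.
```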
@kunindsahu974 3 years ago
I have to ask though: do support vector machines still find much application today? They are outclassed on structured data by ensemble methods, and even on unstructured data deep learning outperforms them.
@imedkhabbouchi2161 2 years ago
Very good lecture, many thanks for the explanations as well as for the humor :D Could you please share the demo from this class?
@saquibmansoor7146 3 years ago
Awesome lecture. Sir, the link to Ben Taskar's notes on your webpage is not working.
@JoaoVitorBRgomes 3 years ago
@33:20, @kilian Weinberger, why do you have to multiply by yi? Wouldn't it be enough to require w^T xi + b >= 0?
@kilianweinberger698 3 years ago
So if yi=+1 you want w xi+b>0, and if yi=-1 you want w xi+b<0; multiplying by yi combines both cases into yi(w xi+b)>0. And if you want to make sure they are unambiguously positive or negative (not zero), you enforce yi(w xi+b)>=1.
@JoaoVitorBRgomes 3 years ago
@@kilianweinberger698 Is yi(w xi + b) >= 1 hard-margin classification? Also, was the 1 chosen arbitrarily, or is it because we normalized, dividing by the norm of w?
@kilianweinberger698 3 years ago
Yes, and yes. It is the hard margin formulation. If you were to make the margin C (where C is any positive constant), you could always divide both sides of the inequality by C and obtain an inequality >=1, and a slightly rescaled w and b.
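For completeness, a sketch of the hard-margin formulation this exchange is about, in the lecture's notation:

```latex
% Hard-margin SVM: maximize the margin by minimizing w^T w,
% subject to every point lying on the correct side with margin at least 1.
\[
  \min_{w,\,b}\; w^{\top} w
  \quad\text{s.t.}\quad
  y_i \left( w^{\top} x_i + b \right) \ge 1 \;\;\; \forall i .
\]
```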
@anunaysanganal 4 years ago
Wouldn't SVMs over-fit the data by default? Since we are trying to find a separating hyperplane in infinite dimensions (if we use the RBF kernel), we are bound to find a hyperplane that separates the classes perfectly. Hence, aren't we essentially over-fitting the data?
@kilianweinberger698 4 years ago
Yes, and no. You can always get 0% training error (provided all points are unique), but by searching for the maximum margin hyperplane you also regularize your solution (i.e. make it simpler). Allowing slack variables allows you to find a simpler solution (larger margin) at the cost of potentially misclassifying some of the training points. So, setting the regularization hyper-parameter (C or lambda) correctly is crucial.
@anunaysanganal 4 years ago
@@kilianweinberger698 Thank you very much! I get it now.
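As a small illustration of the point about setting the regularization hyper-parameter, here is a minimal sketch assuming scikit-learn's SVC; the synthetic dataset and the values of C are arbitrary and not from the lecture:

```python
# Minimal sketch (assumes scikit-learn): the C parameter trades margin width
# against training errors, as described in the reply above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Small synthetic binary classification problem, just for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for C in (0.01, 1.0, 100.0):
    # Larger C: fewer margin violations allowed, smaller margin, more risk of over-fitting.
    clf = SVC(kernel="rbf", C=C)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"C={C:>6}: cross-validated accuracy = {score:.3f}")
```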
@MrSirawichj 4 years ago
In the margin equation, how does the length of w in the denominator become wTw without the square root?
@yasaradeel 4 years ago
wTw is a positive value. Maximising an increasing function's square root is identical to maximising the function itself.
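In symbols, the argument is that the square root is monotonically increasing, so the optimizers coincide (a sketch; the constraints are the same on both sides):

```latex
% Since t -> sqrt(t) is strictly increasing on [0, infinity),
% maximizing 1/||w|| is the same as minimizing ||w||, which in turn
% is the same as minimizing ||w||^2 = w^T w.
\[
  \arg\max_{w,b}\; \frac{1}{\sqrt{w^{\top} w}}
  \;=\; \arg\min_{w,b}\; \sqrt{w^{\top} w}
  \;=\; \arg\min_{w,b}\; w^{\top} w .
\]
```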
@lvlanson 2 years ago
Dear Mr. Weinberger, first of all big thanks for this very good lecture. You explain this very nicely and it is easy to understand. I have a question about 34:41. Why did you drop the root for w^T w? Normally, for the norm in Euclidean space we have p=2 in the Minkowski norm, so there should be a root. Why is it okay to drop it? All the best from Germany :)
@kilianweinberger698 2 years ago
I just squared both sides of the equation. Roots are kind of a pain to deal with. :-)
@YassineFAQIR-b4l 10 months ago
Amazing content for free. 😎
@mostafaatallah7001 4 years ago
Thank you very much, Sir, for these remarkable lectures. I would be grateful if you could answer the following question: in the lecture you mentioned that (wTxi+b) must be >= 0; does that imply it is okay to have points on the decision boundary, where wTxi+b = 0? Should the correct condition be that wTxi+b is strictly greater than 0, as in the Perceptron algorithm? Again, thank you very much Sir, you are an awesome teacher.
@kilianweinberger698 4 years ago
For the Perceptron this is important to avoid a situation where you perform no update (e.g. if you initialize with w=0, the all-zeros vector). In the case of SVM this is not so critical. If your inputs have w'x+b=0 you would still incur a loss, and the optimization problem would try to move it to the correct side.
@mostafaatallah7001 4 years ago
@@kilianweinberger698 Thank you very much
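To make the "you would still incur a loss" remark concrete, here is a sketch assuming the soft-margin hinge loss that the course introduces for SVMs; a point sitting exactly on the boundary still contributes a positive loss:

```latex
% Hinge loss of a point exactly on the decision boundary, i.e. w^T x_i + b = 0:
\[
  \ell(x_i, y_i) \;=\; \max\!\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)
  \;=\; \max(0,\; 1 - 0) \;=\; 1 \;>\; 0 ,
\]
% so the optimization is still pushed to move such points to the correct side.
```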
@JoaoVitorBRgomes 3 years ago
@killian weinberger: at circa 6:59 you ask if we have questions... yes. Where did the wT come from? You were dealing only with the xT, Xw, and y variables! How did that wT pop up there?
@kilianweinberger698 3 years ago
(Xw-Y)'=w'X'-Y' (where ' is the transpose operator)
@JoaoVitorBRgomes 3 years ago
@@kilianweinberger698 I thought you had written Xw, not X (dot) w.
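Written out, the step at 6:59 uses exactly that transpose identity; a sketch of the expansion of the squared loss:

```latex
% Expanding the squared loss using (Xw - y)^T = w^T X^T - y^T
% (the cross terms are equal scalars, hence the factor 2):
\[
  \lVert Xw - y \rVert_2^2
  \;=\; (Xw - y)^{\top}(Xw - y)
  \;=\; w^{\top} X^{\top} X w \;-\; 2\, y^{\top} X w \;+\; y^{\top} y ,
\]
% which is where the w^T terms come from.
```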
@abhishekbalawan6817 3 years ago
Sir, at around 36:00 you mention rescaling w and b so that the minimization reduces to the value 1. But are we not supposed to scale w outside the minimization as well? I mean, why are we not changing w in max 1/(wTw) simultaneously?
@kilianweinberger698 3 years ago
Yes, and indeed the re-scaling does happen everywhere through the constraint. One way to think about it is that beforehand we could change the orientation of w and the scale of w. Now the scale of w is fixed (through a constraint) but the orientation is still something we are optimizing. So the objective function now has one less degree of freedom. I hope this helps.
@abhishekbalawan6817 3 years ago
@@kilianweinberger698 Thank you for the reply, sir. My understanding is that w is expressed in terms of variables rather than a fixed numeric quantity, and both its direction and magnitude depend on those variables, which need to be considered. Correct me if I am wrong.
@sonalika2405 4 years ago
Sir, can you please provide the link to your matrix cookbook?
Sir, what is an integer SVM? Is it possible to get an integer margin for an SVM, or weights in integers?
@kilianweinberger698 4 years ago
Not sure, but it is not very popular. ;-)
@bharatbajoria 4 years ago
In the norm of d, shouldn't there be a radical sign over wTw in the denominator, around 38:00?
@SureshKumar-yc5lc 4 years ago
It doesn't matter: the square root is monotonically increasing, so minimizing wTw also minimizes sqrt(wTw). Also, wTw has nicer mathematical properties (convex optimization!). Hope this helps! Cheers from folks @IISc.
@bharatbajoria 4 years ago
@@SureshKumar-yc5lc got it, thank you.
@spartacusche 4 years ago
37:00 I don't understand how min wx+b = 1 comes about.
@kilianweinberger698 4 years ago
Oh, that comes from the constraint and the objective. Assume for contradiction that the minimum is wx+b=c where c>1. Then you could define a new hyperplane with parameters w/c and b/c, which still satisfies the constraints but has a lower objective value ||w/c||^2 = ||w||^2/c^2 < ||w||^2, a contradiction. So at the optimum the minimum must be exactly 1.
@rahulverma-fg1ou 2 years ago
The demo is missing?
@jaydhanwant4072 3 years ago
Alright, welcome everybody
@sagar35756 5 years ago
Wow
@vatsan16 4 years ago
wait so if someone publishes a paper that uses SVM, the patent troll could sue them???
@kilianweinberger698 4 years ago
Actually, I believe the patent has run out a couple of years ago. So you are good to go :-)
@saikrishna-ee9mz 5 years ago
Can't it be simplified when X is a square matrix, when computing w in closed form?
@ugurkap 5 years ago
Yes, you can simplify it if X is square in theory, but in reality that would never happen, because it means you have as many features as training examples. You wouldn't get a healthy result.
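For concreteness, the simplification being discussed (a sketch; it only applies when X is square and invertible, which, as noted, essentially never happens in practice):

```latex
% Ordinary least-squares closed form, and its special case for invertible X:
\[
  w = (X^{\top} X)^{-1} X^{\top} y
  \quad\Longrightarrow\quad
  w = X^{-1} (X^{\top})^{-1} X^{\top} y = X^{-1} y
  \quad \text{if } X \text{ is square and invertible.}
\]
```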
@HuyNguyen-fp7oz 4 years ago
Why don't you discuss the relation between the functional margin and the geometric margin?
@kilianweinberger698 4 years ago
Sorry, there is only so much time I can spend on SVMs.