Exactly what I was looking for. I searched a lot: as usual I started with CS229 (Stanford), moved on to MIT's AI course, but this is by far the most precise explanation of SVMs (particularly the math behind them). Thanks.
@andreamercuri9984 4 years ago
One of the best machine learning courses I've ever attended. "Maximization is for losers" is a gem! 😂 Thank you very much Professor Kilian, you're great.
@vaibhavsingh8715 2 years ago
I always reflexively raise my hand whenever he says "raise your hand if you are with me". Thank you so much, Professor. These videos are a treasure trove. I wish I had been there in the class.
@nikhilsaini6441 4 years ago
The quality of these lectures is so good that I hit the "Like" button first and then watch the video. Thank you Prof.
@jiahao2709 5 years ago
11:50 SVM start
@nawafalrasheed109 4 years ago
Thank you :)
@178msut 2 years ago
Using this to supplement Cornell's current CS 4780 professor's lectures and I'm finding them to be more helpful tbh. Excellent quality and passion.
@hamzak93 9 months ago
Can't thank you enough for these lectures Professor Weinberger!
@hamzaleb9215 5 years ago
The best course on SVM. Thank you Kilian.
@Went12435 2 years ago
The best SVM explanation and derivation that I have ever found on YouTube.
@florianellsaesser4165 4 years ago
Amazing lectures. So well explained and great humour. Thank you!
@darshansolanki5535 3 years ago
The best derivation of SVMs I have ever seen, by anyone.
@SAINIVEDH 4 years ago
Ahh... felt like watching a movie. Best intro to SVMs on YT.
@saikumartadi8494 4 years ago
It would be great if you could make the course assignments available to the YouTube audience as well. Thanks a lot for the video; as always, it's fabulous :)
Your lesson cleared up my understanding of SVMs. Thank you so much.
@in100seconds5 4 years ago
Dear Kilian, please share other courses that you teach. It is a wonderful resource.
@kilianweinberger698 4 years ago
I will. Unfortunately this is the only one that was recorded so far.
@deepfakevasmoy3477 4 years ago
exactly :) that would be so great!
@DavesTechChannel 5 years ago
Great teacher
@yuniyunhaf5767 5 years ago
he made ML simple. thank you
@prwi87 1 year ago
6:00 It may be that the notation has changed, but there is no such thing (at least as of today) as (y - Xw)^2, since we cannot square a vector; it should be ||y - Xw||^2, the squared l2-norm. 23:00 The method of finding the distance between the point and the hyperplane was very clever, but the numerator should be the absolute value of w^T x + b, since the distance must always be positive. I think there is a simpler method, but it requires more linear algebra, so that may be why the Professor took this approach. 37:10 I have denoted it as "brilliant move!!" in my notes!
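For readers following along, this is the distance formula the comment is describing, written out as a sketch in the lecture's notation; the absolute value in the numerator is what keeps the distance non-negative:

```latex
% Distance from a point x_i to the hyperplane H = {x : w^T x + b = 0}.
% The absolute value keeps d(x_i) >= 0 regardless of which side x_i lies on.
\[
  d(x_i) \;=\; \frac{\lvert w^{\top} x_i + b \rvert}{\lVert w \rVert_2}
\]
```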
@rajeshs2840 5 years ago
I raise my hands... Mr. Kilian ...
@ankurnath695 4 years ago
great lecture, sir!
@abs4413 1 year ago
Dear Professor. At 35:40 you talk of the 'trick' to rescale w and b such that MIN |wx + b| = 1 (over all data points). Is it not more accurate to state that we do not rescale w and b for this trick, but rather that we choose b such that our trick works? The outer maximization changes w to minimize the norm and thus the direction of the decision boundary, while our value for b is such that MIN |wx + b| = 1. With direction above I mean that w defines the decision boundary (perpendicular to it) while b can only make the decision boundary move in a parallel direction. I hope I have explained myself clearly. Thank you for your lectures!
@kilianweinberger698 1 year ago
Yes that’s fair.
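To spell out the rescaling this exchange refers to, here is a short sketch: for any c > 0, the parameters (w, b) and (w/c, b/c) describe the same hyperplane, so the scale can be pinned down by a constraint while the orientation is still optimized.

```latex
% Scale invariance behind the trick at 35:40: for any c > 0,
%   w^T x + b = 0   <=>   (w/c)^T x + (b/c) = 0 ,
% so one may fix the scale of (w, b) by imposing
\[
  \min_{i}\,\lvert w^{\top} x_i + b \rvert \;=\; 1 ,
\]
% which leaves only the orientation of w (and the offset b) free to optimize.
```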
@kunindsahu974 3 years ago
I have to ask though: do support vector machines still find much application today? They are outclassed on structured data by ensemble methods, and even on unstructured data deep learning outperforms them.
@imedkhabbouchi2161 2 years ago
Very good lecture, many thanks for the explanations as well as for the humor :D Could you please share the demo from this class?
@saquibmansoor7146 3 years ago
Awesome lecture. Sir, the link to Ben Taskar's notes on your webpage is not working.
@JoaoVitorBRgomes 3 years ago
@33:20, @kilian Weinberger, why do you have to multiply by yi? Wouldn't it be enough to require w^T xi + b >= 0?
@kilianweinberger698 3 years ago
So if yi=+1 you want w xi+b>0, and if yi=-1 you want w xi+b<0; multiplying by yi combines both cases into yi(w xi+b)>0. And if you want to make sure they are unambiguously positive or negative (not zero), you enforce yi(w xi+b)>=1.
@JoaoVitorBRgomes 3 years ago
@@kilianweinberger698 Is yi(w xi + b) >= 1 hard-margin classification? Also, was the 1 chosen arbitrarily, or is it because we normalized, dividing by the norm of w?
@kilianweinberger698 3 years ago
Yes, and yes. It is the hard margin formulation. If you were to make the margin C (where C is any positive constant), you could always divide both sides of the inequality by C and obtain an inequality >=1, and a slightly rescaled w and b.
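For completeness, a sketch of the hard-margin formulation this exchange is about, in the lecture's notation:

```latex
% Hard-margin SVM: maximize the margin by minimizing w^T w,
% subject to every point lying on the correct side with margin at least 1.
\[
  \min_{w,\,b}\; w^{\top} w
  \quad\text{s.t.}\quad
  y_i \left( w^{\top} x_i + b \right) \ge 1 \;\;\; \forall i .
\]
```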
@anunaysanganal 4 years ago
Wouldn't SVMs over-fit the data by default? Since we are trying to find a separating hyperplane in infinite dimensions (if we use the RBF kernel), we are bound to find a hyperplane that separates the classes perfectly. Hence, aren't we essentially over-fitting the data?
@kilianweinberger698 4 years ago
Yes, and no. You can always get 0% training error (provided all points are unique), but by searching for the maximum margin hyperplane you also regularize your solution (i.e. make it simpler). Allowing slack variables allows you to find a simpler solution (larger margin) at the cost of potentially misclassifying some of the training points. So, setting the regularization hyper-parameter (C or lambda) correctly is crucial.
@anunaysanganal 4 years ago
@@kilianweinberger698 Thank you very much! I get it now.
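As a small illustration of the point about setting the regularization hyper-parameter, here is a minimal sketch assuming scikit-learn's SVC; the synthetic dataset and the values of C are arbitrary and not from the lecture:

```python
# Minimal sketch (assumes scikit-learn): the C parameter trades margin width
# against training errors, as described in the reply above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Small synthetic binary classification problem, just for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for C in (0.01, 1.0, 100.0):
    # Larger C: fewer margin violations allowed, smaller margin, more risk of over-fitting.
    clf = SVC(kernel="rbf", C=C)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"C={C:>6}: cross-validated accuracy = {score:.3f}")
```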
@MrSirawichj 4 years ago
In the margin equation, how does the length of w in the denominator become wTw without the square root?
@yasaradeel 4 years ago
wTw is a positive value. Maximising an increasing function's square root is identical to maximising the function itself.
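In symbols, the argument is that the square root is monotonically increasing, so the optimizers coincide (a sketch; the constraints are the same on both sides):

```latex
% Since t -> sqrt(t) is strictly increasing on [0, infinity),
% maximizing 1/||w|| is the same as minimizing ||w||, which in turn
% is the same as minimizing ||w||^2 = w^T w.
\[
  \arg\max_{w,b}\; \frac{1}{\sqrt{w^{\top} w}}
  \;=\; \arg\min_{w,b}\; \sqrt{w^{\top} w}
  \;=\; \arg\min_{w,b}\; w^{\top} w .
\]
```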
@lvlanson 2 years ago
Dear Mr. Weinberger, first of all big thanks for this very good lecture. You explain this very nicely and it is easy to understand. I have a question about 34:41. Why did you drop the root for w^T w? Normally, for the norm in Euclidean space we have p=2 in the Minkowski norm, so there should be a root. Why is it okay to drop it? All the best from Germany :)
@kilianweinberger698 2 years ago
I just squared both sides of the equation. Roots are kind of a pain to deal with. :-)
@YassineFAQIR-b4l 10 months ago
Amazing content for free. 😎
@mostafaatallah7001 4 years ago
Thank you very much, Sir, for these remarkable lectures. I would be grateful if you could answer the following question: in the lecture you mentioned that (wTxi+b) must be >= 0; does that imply it is okay to have points on the decision boundary, where wTxi+b = 0? Should the correct condition be that wTxi+b is strictly greater than 0, as in the Perceptron algorithm? Again, thank you very much Sir, you are an awesome teacher.
@kilianweinberger698 4 years ago
For the Perceptron this is important to avoid a situation where you perform no update (e.g. if you initialize with w=0, the all-zeros vector). In the case of SVM this is not so critical. If your inputs have w'x+b=0 you would still incur a loss, and the optimization problem would try to move it to the correct side.
@mostafaatallah7001 4 years ago
@@kilianweinberger698 Thank you very much
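To make the "you would still incur a loss" remark concrete, here is a sketch assuming the soft-margin hinge loss that the course introduces for SVMs; a point sitting exactly on the boundary still contributes a positive loss:

```latex
% Hinge loss of a point exactly on the decision boundary, i.e. w^T x_i + b = 0:
\[
  \ell(x_i, y_i) \;=\; \max\!\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)
  \;=\; \max(0,\; 1 - 0) \;=\; 1 \;>\; 0 ,
\]
% so the optimization is still pushed to move such points to the correct side.
```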
@JoaoVitorBRgomes 3 years ago
@killian weinberger: at circa 6:59 you ask if we have questions... yes. Where did the wT come from? You were dealing only with the xT, Xw, and y variables! How did that wT pop up there?
@kilianweinberger698 3 years ago
(Xw-Y)'=w'X'-Y' (where ' is the transpose operator)
@JoaoVitorBRgomes 3 years ago
@@kilianweinberger698 I thought you had written Xw, not X (dot) w.
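Written out, the step at 6:59 uses exactly that transpose identity; a sketch of the expansion of the squared loss:

```latex
% Expanding the squared loss using (Xw - y)^T = w^T X^T - y^T
% (the cross terms are equal scalars, hence the factor 2):
\[
  \lVert Xw - y \rVert_2^2
  \;=\; (Xw - y)^{\top}(Xw - y)
  \;=\; w^{\top} X^{\top} X w \;-\; 2\, y^{\top} X w \;+\; y^{\top} y ,
\]
% which is where the w^T terms come from.
```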
@abhishekbalawan6817 3 years ago
Sir, at around 36:00 you mention rescaling w and b so that the minimization reduces to the value 1. But are we not supposed to scale w outside the minimization as well? I mean, why are we not changing w in max 1/(wTw) simultaneously?
@kilianweinberger698 3 years ago
Yes, and indeed the re-scaling does happen everywhere through the constraint. One way to think about it is that beforehand we could change the orientation of w and the scale of w. Now the scale of w is fixed (through a constraint) but the orientation is still something we are optimizing. So the objective function now has one less degree of freedom. I hope this helps.
@abhishekbalawan6817 3 years ago
@@kilianweinberger698 Thank you for the reply, sir. My understanding is that w is expressed in terms of variables rather than a fixed numeric quantity, and both its direction and magnitude depend on those variables, which need to be considered. Correct me if I am wrong.
@sonalika2405 4 years ago
Sir, can you please provide the link to your matrix cookbook?
Sir, what is an integer SVM? Is it possible to get an integer margin for an SVM, or weights in integers?
@kilianweinberger698 4 years ago
Not sure, but it is not very popular. ;-)
@bharatbajoria 4 years ago
In the norm of d, shouldn't there be a radical sign over wTw in the denominator, around 38:00?
@SureshKumar-yc5lc 4 years ago
It doesn't matter: the square root is monotonically increasing, so minimizing wTw also minimizes sqrt(wTw). Also, wTw has nicer mathematical properties (convex optimization!). Hope this helps! Cheers from folks @IISc.
@bharatbajoria 4 years ago
@@SureshKumar-yc5lc got it, thank you.
@spartacusche 4 years ago
37:00 I don't understand how min wx+b = 1 comes about.
@kilianweinberger698 4 years ago
Oh, that comes from the constraint and the objective. Assume for contradiction that the minimum is wx+b=c where c>1. Then you could define a new hyperplane with parameters w/c and b/c, which still satisfies the constraints but has a lower objective value ||w/c||^2 = ||w||^2/c^2 < ||w||^2, a contradiction. So at the optimum the minimum must be exactly 1.
@rahulverma-fg1ou 2 years ago
The demo is missing?
@jaydhanwant4072 3 years ago
Alright, welcome everybody
@sagar35756 5 years ago
Wow
@vatsan16 4 years ago
wait so if someone publishes a paper that uses SVM, the patent troll could sue them???
@kilianweinberger698 4 years ago
Actually, I believe the patent has run out a couple of years ago. So you are good to go :-)
@saikrishna-ee9mz 5 years ago
Can't it be simplified when X is a square matrix, when computing w in closed form?
@ugurkap 5 years ago
Yes, you can simplify it if X is square in theory, but in reality that would never happen, because it means you have as many features as training examples. You wouldn't get a healthy result.
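For concreteness, the simplification being discussed (a sketch; it only applies when X is square and invertible, which, as noted, essentially never happens in practice):

```latex
% Ordinary least-squares closed form, and its special case for invertible X:
\[
  w = (X^{\top} X)^{-1} X^{\top} y
  \quad\Longrightarrow\quad
  w = X^{-1} (X^{\top})^{-1} X^{\top} y = X^{-1} y
  \quad \text{if } X \text{ is square and invertible.}
\]
```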
@HuyNguyen-fp7oz 4 years ago
Why don't you discuss the relation between the functional margin and the geometric margin?
@kilianweinberger698 4 years ago
Sorry, there is only so much time I can spend on SVMs.