I am pretty convinced that this guy can explain the theory of relativity to a toddler. Pure respect!!
@rajeshs2840 4 years ago
He seems to be a great prof.. I love him..
@godsgaming9120 3 years ago
Studying at the best uni in India and still completely dependent on these lectures. Amazing explanations! Hope you visit IIT Delhi sometime!
@rhngla 1 year ago
The subscript of x_i implicitly switches from referring to i-th feature dimension to i-th data sample at the 7:40 mark for the discussion on kernels. Just a note to prevent potential confusion arising from this.
@KulvinderSingh-pm7cr 5 years ago
Kernel trick on the last data set blew my mind... literally !!
@rajeshs2840 4 years ago
Watching your lectures feels like being a kid in a candy shop.. I love them..
@mrobjectoriented 4 years ago
The climax at the end is worth the 50-minute lecture!
@j.adrianriosa.4163 3 years ago
I can't even explain how good these explanations are!! Thank you, sir!
@81gursimran 4 years ago
Love your lectures! Especially the demos!
@blz1rorudon 4 years ago
Mad respect to this guy. I might consider being his lifelong disciple or some shit. Noicce!!
@venugopalmani2739 5 years ago
How I start my day every day these days: "Welcome everybody! Please put away your laptops." You should make a t-shirt or something with that line.
@sandeepreddy6295 4 years ago
One more very good lecture from the playlist.
@rishabhkumar-qs3jb 2 years ago
Best machine learning course I have ever watched. Amazing...:)
@shashihnt 3 years ago
Extra claps for the demos, they are so cool.
@jiahao2709 5 years ago
I really like your course, very interesting!
@jayye7847 5 years ago
Amazing! Good explanation! Very helpful! Great thanks from China.
@hassamsheikh 5 years ago
I wonder how lucky the grad students are whose adviser is this professor.
@nabilakdim2767 2 years ago
Intuition about kernels: a good kernel says two different points are "similar" in the attribute space when their labels are "similar" in the label space.
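That intuition is easy to see numerically. A minimal sketch, assuming the RBF kernel as one concrete choice (the points and the bandwidth sigma are made up for illustration):

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # RBF kernel: similarity decays exponentially with squared distance
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

a = np.array([0.0, 0.0])
b = np.array([0.1, 0.0])   # close to a -> kernel value near 1
c = np.array([5.0, 5.0])   # far from a -> kernel value near 0

assert rbf_kernel(a, b) > rbf_kernel(a, c)
```

Nearby points get a similarity close to 1, distant points a similarity close to 0, which is exactly the "similar in attribute space" half of the intuition.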
@Tyokok 6 months ago
One question: around 17:33 you say "this is only true for linear classifiers", but from the induction proof it seems it can also apply to linear regression. Why do we never see linear regression written as w = sum(alpha_i * x_i)? Thank you so much for the great, great lecture!
@chanwoochun7694 4 years ago
I should have taken this course and intro to wine before graduating...
@anunaysanganal 3 years ago
Great lecture! I just don't understand why we are using linear regression for classification. Can we use the sigmoid instead? It also has the w^T x component, so we can kernelize that as well.
@kilianweinberger698 3 years ago
Just because it is simple. It is not ideal, but also not terrible. But yes, typically you would use the logistic loss, which makes more sense for classification.
@sudhanshuvashisht8960 3 years ago
Though the proof by induction showing that the W vector can be expressed as a linear combination of all input vectors makes sense, another point of view is confusing me. Here it is: say I have 2-d training data with only 2 training points and I map them to three-dimensional space using some kernel function. Since I have only two data points (vectors) in three dimensions, expressing W as a linear combination of these vectors implies the span of W is limited to the plane formed by these two vectors, which seems to reduce/defy the purpose of mapping to a higher-dimensional space. The same example becomes pragmatic when, say, we have 10k training points and we map each of them to a million-dimensional space.
@kilianweinberger698 3 years ago
Yes, good point. That’s why e.g. SVM with RBF kernel are often referred to as non-parametric. They become more powerful (the number of parameters and their expressive power increases) as you obtain more training data.
@sudhanshuvashisht8960 3 years ago
@@kilianweinberger698 Thanks Professor. This brings me to the next question: In my example (2 training points in 3-d space), does this mean there might be a better solution (in terms of lower loss) that is not in the plane spanned by those 2 training points?
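On the question in this thread: with a bit of l2-regularization the optimum provably lies in the span of the training points (this is the representer theorem), so within that span no better regularized solution exists. A minimal numpy sketch, assuming ridge regression and using 2 points in 3-d as in the example above (data and lambda are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))   # 2 training points in 3-d, as in the example
y = rng.normal(size=2)
lam = 0.1                     # l2-regularization strength

# dual/kernelized solution: w = X^T alpha, i.e. a combination of the x_i
alpha = np.linalg.solve(X @ X.T + lam * np.eye(2), y)
w_dual = X.T @ alpha

# primal ridge solution, searched over ALL of 3-d space
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# both coincide: the unrestricted optimum already lies in the span
assert np.allclose(w_dual, w_primal)
```

The unconstrained minimizer over the full 3-d space equals the one built only from the two training points, so restricting w to their span loses nothing once the regularizer is in place.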
@KulvinderSingh-pm7cr 5 years ago
Infinite dimensions made me remember Dr. Strange!!
@rahuldeora1120 4 years ago
Great lecture! But I had a question: in the induction proof at 15:14, since we can initialize any way, what if I initialize w to a value that cannot be written as a linear combination of the x's? Then every iteration I will add a linear combination of the x's, but the total still won't be a linear combination of the x's. Will this not disprove the induction for some initializations?
@kilianweinberger698 4 years ago
Yes, good catch. If the x's do not span the full space (i.e. n < d), the induction only holds if w is initialized within their span; in practice these scenarios are avoided by adding a little bit of l2-regularization.
@rahuldeora1120 4 years ago
@@kilianweinberger698 Thank you for taking the time to reply. When you say "in practice these scenarios are avoided by adding a little bit of l2-regularization," how does l2-regularization make the weight vector a linear combination of the input data when the input does not span the space?
@rahuldeora5815 4 years ago
@@rahuldeora1120 Do reply
@jiviteshsharma1021 4 years ago
@@rahuldeora1120 It doesn't make the data span the space, but since we are enclosed in a ball, the regularized solution is the best estimate among all the global minima present, thus kind of making it seem as though the data spans the regularized space. This is what I understood; hope I'm not wrong, professor.
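The point in this thread can be sketched numerically: with the squared loss, if w starts inside the span of the x_i (e.g. at zero), every gradient step adds a linear combination of the x_i, so w never leaves the span. A toy check, with dimensions made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 10))  # 5 points in 10 dims: inputs do NOT span the space
y = rng.normal(size=5)

w = np.zeros(10)              # initialization inside the span (the zero vector)
for _ in range(200):
    grad = X.T @ (X @ w - y)  # squared-loss gradient: a combination of the x_i
    w -= 0.01 * grad

# w stays in the row space of X: projecting onto it changes nothing
P = X.T @ np.linalg.pinv(X @ X.T) @ X   # projector onto the span of the x_i
assert np.allclose(P @ w, w, atol=1e-8)
```

Start w outside the span instead, and the component outside the span is never touched by the updates, which is exactly the edge case the professor concedes above.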
@bryanr2978 2 years ago
Hi Prof. Kilian, around 34:53 in the Q&A you said that we could just set zero weights for the features we don't care about. I was a bit confused how you could potentially do this, since you only have one alpha_i for the i-th observation. If you assign it zero, it would zero out all the features of x_i. Am I wrong?
@71sephiroth 4 years ago
Whenever there is a breakthrough in ML there is always that exp(x) sneaking around somehow (boosting, t-SNE, RBF...)
@kilianweinberger698 4 years ago
Good point ... maybe in the future we should start all papers with “Take exp(x) ...”. ;-)
@prattzencodes7221 3 years ago
It started with Gauss and his bell-shaped distribution, I guess? 😏😏
@dimitriognibene8945 4 years ago
@kilian weinberger the 2 HD in North Korea feel very alone
@KK-mt4km 4 years ago
respect
@padmapriyapugazhendhi7465 4 years ago
How can you say the gradient is a linear combination of the inputs??
@kilianweinberger698 4 years ago
Not for every loss, but for many of them. E.g. for the squared loss, take a look at the gradient derivation in the notes. The gradient consists of a sum of terms \sum_i gamma_i * x_i.
@padmapriyapugazhendhi7465 4 years ago
@@kilianweinberger698 I am sorry if it's a silly doubt. I thought that a linear combination means the coefficients of the x_i should be constants independent of x_i. When gamma itself depends on x_i, isn't it then a non-linear combination?
@kilianweinberger698 4 years ago
No, it is still linear. The gradient being a linear combination of the inputs just means that the gradient always lies in the space spanned by the input vectors. Whether the coefficients are a function of x_i or not doesn't matter in this particular context. Hope this helps. (Btw, you are not alone, a lot of students find that confusing...)
@padmapriyapugazhendhi7465 4 years ago
Thank you for your patient replies. Just one more intriguing question: is x_i a vector, so that I can write x_i as (x_i1, x_i2, ..., x_in)?
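The claim in this exchange is easy to verify: for the squared loss the gradient is literally sum_i gamma_i * x_i with gamma_i = w^T x_i - y_i, even though each gamma_i depends on x_i. A minimal numpy check (data made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(4, 3))   # rows are the input vectors x_i
y = rng.normal(size=4)
w = rng.normal(size=3)

# coefficients gamma_i depend on x_i -- the combination is still "linear"
# in the sense that the result lies in the span of the x_i
gamma = X @ w - y
grad_as_combo = sum(g * x for g, x in zip(gamma, X))

# identical to the usual matrix form of the squared-loss gradient
assert np.allclose(grad_as_combo, X.T @ (X @ w - y))
```

Both expressions are the same vector, and by construction it sits inside the span of the rows of X, which is all the professor's statement requires.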
@dolphinwhale6210 5 years ago
Are these lectures for an undergraduate or a graduate program?
@kilianweinberger698 5 years ago
Both, but the class typically has more undergrads than graduate students.
@mfavier 2 years ago
About the inductive proof around 15:14: I think we should also specify that, because the linear space generated by the inputs is closed, the sequence of w_i converges to a w that is in that same linear space. Otherwise we are merely saying that w is in the closure of that linear space.
@raydex7259 4 years ago
(A question I asked myself after 11 minutes): in this video a linear *classifier* is used as the example, yet squared loss is used. Squared loss does not make much sense for classification, or am I wrong?
@kilianweinberger698 4 years ago
Well, it is not totally crazy. In practice people still use the squared loss often for classification, just because it is so easy to implement and comes with a closed form solution. But you are right, if you want top performance a logistic loss makes more sense - simply because if the classifier is very confident about a sample it can give it a very large or very negative inner-product, whereas with a squared loss it is trying to hit the label exactly (e.g. +1 or -1, and a +5 would actually be penalized).
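That last point is easy to check numerically: a confidently correct score of +5 on a label of +1 is heavily penalized by the squared loss but almost not at all by the logistic loss. A small sketch:

```python
import numpy as np

def squared_loss(score, label):
    # tries to hit the label exactly; overshooting is penalized
    return (score - label) ** 2

def logistic_loss(score, label):
    # rewards large confident scores with the correct sign
    return np.log(1 + np.exp(-label * score))

print(squared_loss(5.0, 1.0))   # 16.0 -- confident and correct, yet heavily penalized
print(logistic_loss(5.0, 1.0))  # ~0.0067 -- essentially no penalty
```

This is exactly the "+5 would actually be penalized" effect described above, and why the logistic loss is the more natural choice when top classification performance matters.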