Lecture 3 "k-nearest neighbors" -Cornell CS4780 SP17

74,409 views

A day ago

Comments: 118

@alexenax1109 5 years ago

The lecture starts at 2:00! Amazing explanation at 27:10 of how to pick the right algorithm for your dataset; otherwise you end up making bad ML choices! The lecture starts to approach the k-NN algorithm at 36:00 (before that it's about the training, validation, and test sets and about minimizing the expected error).

@omalve9454 1 year ago

I raise my hand unconsciously when you say "Raise your hand if you understood." Best lectures ever!

@ehfo 5 years ago

your lectures aren't boring at all!!!

@sachinpaul2111 3 years ago

“When you watch these from bed, they get boring”: sorry Professor, I’m rewatching this class for the fifth time and it has NEVER bored me. Every time I rewatch, I get new appreciation for a new subtlety of the things you say. It’s gotten to the point that I kinda imitate you when I’m interviewing with companies. In my interviews, it kinda takes the pressure off when I just think of it as your class and me explaining what’s taught in class.

@kilianweinberger698 3 years ago

Haha, thanks, and good luck with your interviews!

@jachawkvr 4 years ago

I took a grad-level class in machine learning and got an A, but only now do I realize how crappy my professor was and how little I actually understood. I am really glad I am able to view these lectures for free. Thank you, Dr. Weinberger!

@gurdeeepsinghs 3 years ago

At 39 minutes Prof. Weinberger said "raise your hand if that makes sense" and I actually did!! Super high-quality content here. That's the level of engagement being created across the world. Respect from India!!

@RENGcast 5 years ago

you sir, are GOAT of ML

@sekfook97 3 years ago

This lecturer has tremendous charisma!

@abimaeldominguez4126 4 years ago

These videos help people from other countries who for some reason can't get a degree in machine learning. In my case, I now know exactly why I should not randomly split the datasets that I use at my work. Thanks so much.

@juliocardenas4485 2 years ago

Absolutely!! I have shared summaries of these lectures translated to Spanish. I live in the US but grew up in México

@randajer 9 months ago

Want to get started on some machine learning studying and this is great! Easy to watch while performing menial tasks at work and I can review anything I have questions on at home. Having the notes available to read from ahead of time and then look at during and after the video is tremendous for understanding, thank you very much for providing everyone with such a great font of knowledge.

@jorgebetancourt2610 1 year ago

Professor Weinberger, I have taken two graduate-level courses in ML, and I believed I had an understanding until I started your course at eCornell. Man, Build your University! I’m speechless about the level of quality of your lectures! Thank you!

@77styl 3 years ago

I just finished my first semester studying Data Science and today was supposed to be my first day of holidays, yet I have already watched three of the lectures and am still going. I knew how to apply some of the algorithms in R, but knowing the intuition behind them makes it much clearer. Thank you, Professor Weinberger, for the amazing content.

@meenakshisarkar7529 4 years ago

Welcome to 2020 where your entire college semester is done from your bedroom. :D

@TrentTube 5 years ago

Thank you for speaking to the assumptions associated with different models and the chaos of data in the real world.

@Karim-nq1be 8 months ago

I was looking for an answer that was quite technical in another video but I got hooked. Thank you so much for providing such great knowledge.

@nolancao2878 4 years ago

thanks for the lessons and especially providing coursework, notes, and exams

@Oscar-ip3ys 2 years ago

Thanks for the lecture! The party game example is really insightful and one that you for sure remember in the future. I also appreciate the jokes a lot, they make the lectures highly engaging!

@luq2337 4 years ago

My uncle recommended this channel to me. Very, very, very great class!!!

@pranavhegde98 2 years ago

This series of lectures brought back my love of learning

@vieacademy6235 4 years ago

Many thanks for the systematic presentation of ML. You make it so easy to follow the subject.

@anmolagarwal999 1 year ago

Excellent intuition on why validation sets are needed: 13:20

@vaibhavgupta550 5 years ago

Amazing lectures sir.. Loved them.. This was just the thing I was looking for and not able to find earlier.

@satviktripathi6601 4 years ago

one of my favorite lectures on ML

@rachel2046 3 years ago

I honestly have more respect for Cornell because of Professor Weinberger's lectures.

@maliozers 5 years ago

Who has access to attend this class and prefers to watch online, really???

@manjulbalayar9704 1 year ago

Hello Prof Weinberger, I am really enjoying your lectures a lot. Wish I was there in-person in this Fall or next Spring. I was wondering if us viewers online could have access to some older homeworks or assignments for practice. That would be the best! Thanks!

@alexenax1109 5 years ago

Thanks from Italy!!!

@michaelmellinger2324 2 years ago

2:00 Lecture begins - Recap of last lecture
7:50 Can’t split train/test any way we want
11:35 Very often people split train/validation/test. Take best on validation set
24:20 Question for class: as n goes to infinity…
26:15 Weak law of large numbers. The average of a random variable becomes the expected value in the limit
27:30 How to find the hypothesis class H. The Party Game.
36:00 k-Nearest Neighbors
41:45 Only as good as its distance metric

@habeebijaz5907 5 months ago

He is Hermann Minkowski and was Einstein's teacher. Minkowski metric is the metric of flat space time and forms the backbone of special relativity. The ideas developed by Minkowski were later extended by Einstein to develop the theory of general relativity.

@janbolmer4965 5 years ago

Thanks for uploading these Lectures!

@VIVEKKUMAR-kv8si 1 year ago

Was there a paper on the medical problem with only 11 samples? I was doing a study of small-sample-size problems and was curious what sort of algorithms were used on such a small dataset.

@abhinav9561 4 years ago

KNN starts at 35:57

@SumitSharma-pu6yi 2 years ago

Hello Dr Kilian, Greetings from India ! I loved your videos. Could you please take up some modules/lectures specialized in deep learning. Will go binge watching on that too. 😃 Best, Sumit

@geethasaikrishna8286 4 years ago

Once again, thanks Prof. Kilian Weinberger for the amazing lecture. One question on the lecture notes: in the 1-NN Convergence Proof section it says "Bad news: We are cursed!!", and the convergence proof is for n tending to infinity, but after watching the lecture it seems the curse occurs when the number of dimensions (d) becomes large or tends to infinity. So did I misinterpret the statement that we are cursed when n tends to infinity?

@adityabhardwaj408 5 years ago

This is a great way of letting seekers study. However, is there a way to add the questions raised by students in a link? The recording has some noise that makes it hard to hear the questions well. Adding the questions would add more value: we would be able to relate our questions to theirs and we would have fewer doubts!

@whyitdoesmatter2814 4 years ago

Thanks a lot for your enthusiasm! Coming back to the discussion you had early on concerning splitting the dataset into training, cross-validation, and test sets: my understanding is that for a given dataset D with m values, the first step is to train the algorithm on the training set to obtain a parameter, evaluate each parameter on the cross-validation set, pick the smallest one, retrain it on a new training set (training plus cross-validation set), and finally test it on the test set. Is that correct? Also, concerning the kNN algorithm, do you obtain the k parameter on the training set or the cross-validation set? I am a bit confused. Best regards, Axel from Norway.

@kilianweinberger698 4 years ago

Yes, if by “smallest one” you mean the one that leads to the smallest error. For kNN you can even compute the leave-one-out error, i.e. you go through each training sample, pretend it was a test sample, and check whether you would classify it correctly with k=1,3,5,7,...,K. After you have done this for the whole set, you pick the k that led to the fewest misclassifications (and in case of a tie the smallest k). Hope this helps.
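
The leave-one-out procedure described in this reply can be sketched roughly as follows. This is only an illustration, not code from the course: it assumes squared Euclidean distance and a plain majority vote, and the helper names `loo_errors` and `pick_k` as well as the toy data are made up for the example.

```python
from collections import Counter

def squared_distance(a, b):
    # Squared Euclidean distance (an assumed choice of metric).
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def loo_errors(X, y, ks):
    """Count leave-one-out misclassifications of k-NN for every k in ks."""
    n = len(X)
    errors = {k: 0 for k in ks}
    for i in range(n):
        # Pretend point i is a test point: rank all OTHER points by distance.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: squared_distance(X[i], X[j]))
        for k in ks:
            votes = Counter(y[j] for j in others[:k])
            prediction = votes.most_common(1)[0][0]
            if prediction != y[i]:
                errors[k] += 1
    return errors

def pick_k(errors):
    # Fewest misclassifications wins; a tie goes to the smallest k.
    return min(errors, key=lambda k: (errors[k], k))

# Hypothetical toy data: two well-separated 1-D clusters.
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]
errors = loo_errors(X, y, ks=[1, 3, 5])
print(errors, pick_k(errors))
```

With only three points per class, k=5 always pulls in more neighbors from the wrong cluster than from the right one, which is why a small k wins on this toy set.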

@anmolmonga1933 4 years ago

@@kilianweinberger698 Can you do hyperparameter tuning on the training/validation sets for multiple algorithms like SVM and Random Forest and then compare results on the test set, or should comparing the output of multiple models also be done on the training/validation sets, if you are reproducing it for a paper?

@bharatbajoria 4 years ago

knn starts at 36:02

@VIVEKKUMAR-kv8si 1 year ago

You said if it's iid data, split it uniformly at random. What should have been the correct approach for the spam filter case then? Is it iid? I think not since some mails might be similar to others. Thank you.

@kilianweinberger698 1 year ago

You have to split by time. Let's say you have 4 weeks worth of data, put the first 3 weeks into training and the last week into validation. This way you simulate the real application case, namely that you train on past data to predict the labels of future data.
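
A minimal sketch of the time-based split described above, assuming each email is stored as a `(timestamp, text, label)` tuple. The function name `split_by_time` and the toy data are hypothetical and only serve the illustration.

```python
from datetime import datetime, timedelta

def split_by_time(records, cutoff):
    """Put everything before `cutoff` into training, the rest into validation."""
    train = [r for r in records if r[0] < cutoff]
    validation = [r for r in records if r[0] >= cutoff]
    return train, validation

# Hypothetical toy data: one email per day for 4 weeks (28 days).
start = datetime(2024, 1, 1)
emails = [(start + timedelta(days=d), f"email {d}", d % 2) for d in range(28)]

# First 3 weeks for training, last week for validation.
train, validation = split_by_time(emails, cutoff=start + timedelta(weeks=3))
print(len(train), len(validation))
```

Because every training timestamp precedes every validation timestamp, the validation error simulates predicting future mail from past mail, which a uniformly random split would not.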

@maddai1764 5 years ago

Thanks a lot for this nice course. I think at 48:06 it's just 32 and not 32 to the power of 32. Am I missing something, dear @kilian?

@kilianweinberger698 5 years ago

yep, you are right. well spotted :-)

@homeroni 4 years ago

"Choosing between your mama and papa or something, what are you gonna do? I like them both."

@harshavardhanasrinivasan3125 2 years ago

What is the programming language of choice for writing the assignments and project?

@doyourealise 2 years ago

Hello sir :) How are you? Hope you are doing well. This is 2022 and nothing can beat your ML lectures. Watching them again :)

@WellItsNotTough 5 months ago

We have a quiz question in the lecture notes: "How does k affect the classifier? What happens if k = n? What happens if k = 1?" I do not think it is discussed in the lectures. In my opinion, k is the only hyperparameter in this algorithm. For k = n, we take the mode of the labels of the entire dataset as the output for the test point, whereas for k = 1 the test point is assigned the label of its nearest neighbor. I have a doubt here: since we are using a distance metric, what if we have 2 points (for simplicity) that are at equal distance from the test point and have different labels? What happens in that case for k = 1? Similarly, for k = n, if we have equal proportions of the binary class labels, how does the mode work in that case?

@kilianweinberger698 3 months ago

Yes, for k=n it is the mode and k=1 is the nearest neighbor. If the label assignment is a draw (e.g. two points are equidistant), a common option is to break ties randomly.
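
To make the two extremes and the tie-breaking concrete, here is a small sketch. It assumes squared Euclidean distance and random tie-breaking for both equidistant neighbors and drawn votes; the function `knn_predict` and its `rng` parameter are made up for this example.

```python
import random
from collections import Counter

def knn_predict(X, y, query, k, rng=random):
    """Majority vote among the k nearest points; all ties are broken randomly."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Sort by distance; the random second key breaks distance ties randomly.
    order = sorted(range(len(X)),
                   key=lambda j: (sq_dist(X[j], query), rng.random()))
    votes = Counter(y[j] for j in order[:k])
    best = max(votes.values())
    # If several labels share the top vote count, pick one of them at random.
    winners = sorted(label for label, count in votes.items() if count == best)
    return rng.choice(winners)

# Hypothetical toy data: 2 points of class "a", 3 points of class "b".
X = [[0.0], [1.0], [5.0], [6.0], [7.0]]
y = ["a", "a", "b", "b", "b"]
print(knn_predict(X, y, [0.2], k=1))       # label of the single nearest neighbor
print(knn_predict(X, y, [0.2], k=len(X)))  # mode of all labels, whatever the query
```

With k = n the query point no longer matters at all, which is one way to see why a very large k underfits.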

@WellItsNotTough 3 months ago

@@kilianweinberger698 Thank you for the answer, Prof. Weinberger, and for this amazing series as well!

@danielvillarraga2225 4 years ago

Professor Kilian, I am coming to Cornell to enroll in a Ph.D. in civil engineering this fall. I have watched some of your lectures and find them really engaging. I have some understanding of most of the topics in this course, but I would like to take some classes on ML. Would you recommend that I enroll in this course or another one? Is this a grad course?

@kilianweinberger698 4 years ago

Welcome to Cornell! This is a graduate course, offered every fall. It’s probably a good choice if you want to learn the basics in ML. It also “unlocks” several more specialized courses.

@danielvillarraga2225 4 years ago

@@kilianweinberger698 thank you, professor. I will try to enroll this fall.

@vamsikrishnaj4429 4 years ago

@@kilianweinberger698 Is this lecture series, along with implementing these methods with Python libraries, enough so that I can dive into deep learning? Please reply!

@bharatbajoria 3 years ago

Sir, you mentioned a case with 11 data points at 23:00. How about we try bootstrapping on it and then find the best hypothesis class and function subsequently?

@ChandraveshChaudhari 3 years ago

Guys, where is the video lecture for the 1-NN convergence proof? Cover and Hart 1967 [1]: As n→∞, the 1-NN error is no more than twice the error of the Bayes Optimal classifier.

@nuoalei1626 3 years ago

I want that too.

@patrikpersson6059 4 years ago

Love your lectures! You briefly mentioned metric learning in regards to finding a good distance function, do you know of any good primers or general reading advice on this topic?

@kilianweinberger698 4 years ago

Maybe read one of my first papers on Large Margin Nearest Neighbors ( papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification )

@muratcan__22 5 years ago

perfect courses sir, thanks.

@KOSem-ke9jn 5 years ago

Hi Professor, thanks for making these videos publicly available. In your formalisation of the algorithm you define a test point as x (presumably a vector), but in your specification of the conditions for points excluded from the k-NN you introduce y’ and y’’ which, to me, either seem redundant if x' and x'' are vectors or have not been consistently applied if a point is now a tuple (x,y) in which case the distance function should be applied to 2 tuples. Am I missing something?

@bluejimmy168 5 years ago

I also don't understand that part. At 39:45 he uses (x',y'); I'm not sure if he meant an ordered pair or two vectors named x', y'. Is there a difference between a vector and a tuple?

@KOSem-ke9jn 5 years ago

@@bluejimmy168 Hi, yes the notation is a bit confusing in my opinion. I think there is a technical difference between a vector and a tuple; what I meant above was whether x represents the entire vector object or it represents a value in one co-ordinate in a 2 co-ordinate vector representation which I call a tuple - an ordered pair is a tuple, I think.

@kilianweinberger698 5 years ago

sorry, yes, I was a little sloppy there. :-/ I hope you can figure it out from the context.

@KOSem-ke9jn 5 years ago

@@kilianweinberger698 Yes it's clear - just wanted to confirm that I hadn't missed anything. Your lectures are lucid on the whole. Many thanks for sharing.

@subhanali4535 5 years ago

Sir, I want to learn deep learning. Can I skip the rest of the classes? I have watched the first 3 classes. Please guide me.

@kilianweinberger698 5 years ago

hmm, you may need to be patient. I would recommend you understand logistic regression and gradient descent. If you cannot wait, skip after that, but you are missing out on some important concepts.

@subhanali4535 5 years ago

@@kilianweinberger698 Thank you so much Sir

@abhinavmishra9401 4 years ago

@Kilian Weinberger I have the same situation. But I can go further than gradient descent. How far do you recommend going before jumping to deep learning so that the loss in understanding DL is minimized?

@kevinchittilapilly8221 4 years ago

Hi Sir. Can you please guide me as to where I should study the maths required for ML? I did a few courses but they only covered basic calculus and such. I had no clue about the weak law of large numbers you talked about at 26:50. Please help.

@kilianweinberger698 4 years ago

Maybe check out Khan Academy www.khanacademy.org/ It is pretty good.

@kevinchittilapilly8221 4 years ago

Thanks a lot sir

@rytonmoffatanalytica 3 years ago

@@kilianweinberger698 thank you!!!!!!!!!!!!!!!!

@sairajrege3340 4 years ago

Is the algorithm only affected by the Euclidean distance, or does the number of classified points also matter?

@minhtamnguyen4842 5 years ago

just so brilliant

@ivanehsan2683 3 years ago

Can D(validation) also be defined as a beta test for h(x)?

@prathikshaav9461 5 years ago

Is there a link to the homework, exams, and their solutions? It would be helpful.

@kilianweinberger698 5 years ago

Past 4780 exams are here: www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian past Exams.zip?dl=0 Past 4780 Homeworks are here: www.dropbox.com/s/tbxnjzk5w67u0sp/Homeworks.zip?dl=0

@saikumartadi8494 4 years ago

@@kilianweinberger698 Sir, it would be very helpful if you shared the assignments too, because from your demonstrations I see they are much different from the general ones we get in other colleges, and we can learn a lot from them. I learn a lot from your lectures. Every video I saw is the best I have watched on that topic.

@Dendus90 5 years ago

Dear Professor, this is the best ML lecture I've ever seen. Are you going to provide more materials of that kind? PS. Are you looking for any postdocs? ;)

@kilianweinberger698 5 years ago

Thanks! Unfortunately not at the moment.

@antokay5530 5 years ago

Professor, in regards to your spam classifier example, instead of splitting train and test data by time, what if you eliminated all duplicate emails prior to splitting and training? would that work in this case? thank you and thanks for posting these!

@kilianweinberger698 5 years ago

The problem is that there may be new spam types that appear. E.g. imagine on Saturday spammers suddenly start sending out "lottery spam". Even if the emails are not identical, your spam filter would pick up on the word "lottery" as very spammy - but this is unrealistic, as in the real world you wouldn't have seen any such spam before. Hope this makes sense.

@KulvinderSingh-pm7cr 6 years ago

Thanks professor !!!

@waihan6772 9 months ago

great lecture!

@ting-yuhsu4229 4 years ago

YOU ARE AWESOME!

@hrushikeshvaidya9466 5 years ago

Just to make sure, the x and z in the distance function (at 42:50) are the rth dimensions of the position vectors of the two points being considered, right?

@kilianweinberger698 5 years ago

x and z are the two vectors and [x]_r is the r-th dimension of vector x. Hope this helps.
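
As an illustration of this notation, and assuming the distance function under discussion is the Minkowski distance (the thread does not spell the formula out, so treat that as an assumption), the sum over the coordinates [x]_r and [z]_r could be written like this. The function name `minkowski_distance` and the default p=2 are choices made for the example.

```python
def minkowski_distance(x, z, p=2):
    """(sum over r of |[x]_r - [z]_r|^p)^(1/p) for two equal-length vectors."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same number of dimensions")
    # zip pairs up the r-th coordinate of x with the r-th coordinate of z.
    return sum(abs(x_r - z_r) ** p for x_r, z_r in zip(x, z)) ** (1.0 / p)

x = [0.0, 0.0]
z = [3.0, 4.0]
print(minkowski_distance(x, z, p=1))  # Manhattan distance
print(minkowski_distance(x, z, p=2))  # Euclidean distance
```

Here x and z are whole vectors, and the loop variable pairs `x_r`, `z_r` play the role of [x]_r and [z]_r.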

@hrushikeshvaidya9466 5 years ago

@@kilianweinberger698 Oh, I get it now. Thanks for the clarification, professor! I look forward to coming to Cornell this fall

@soulwreckedyouth877 4 years ago

Thanks from Germany

@marcogelsomini7655 2 years ago

He is the best

@adiflorense1477 4 years ago

14:46 What is the difference between a validation dataset and a testing dataset? I think they are the same

@kishorekhaturia7066 3 years ago

No, the validation set is the part of the training data used while building the model, and the test set is used to analyse how well your model generalizes.

@vikramm4967 3 years ago

Is it possible to get the test questions?

@vivekmittal2290 5 years ago

Sir, where can I find the project files?

@jasongomez6783 4 years ago

Now in 2020 all classes are online :(. I am an undergrad and I want to learn about machine learning

@yuniyunhaf5767 5 years ago

this is amazing, thank u sir

@gregmakov2680 2 years ago

The reason that "most people do not do it right, actually" is not the people themselves; it is that the gap between the theoretical model and practical situations is not described clearly in almost any lecture in any class in the world! This gap confuses students a lot, including me :D

@semrana1986 4 years ago

After a certain time the students were trying to buy some more time by stalling the Professor from moving on... been there, done that

@hello-pd7tc 4 years ago

Day 3 ✅

@mohamedanwar3867 1 year ago

Thank you sir

@subhasdh2446 2 years ago

My normal speed for most YouTube lectures is 1.5X and sometimes 1.75X. I think you're speaking a bit fast, because 1.5X sounds way too fast and I had to switch to 1.25X.