K Nearest Neighbor classification with Intuition and practical solution

Рет қаралды 157,809

Krish Naik

Күн бұрын

Пікірлер

@programmingpurpose1329 3 жыл бұрын

This explaination is one of the most precise explanation that I have seen on Internet.

@nelsondelarosa5490 Жыл бұрын

This is in fact well explained, defining every term, and assuming no previous knowledge. Thanks so much!

@sivareddynagireddy56 2 жыл бұрын

No words about u r explanation sir,simple lucid way explanation !!!!!

@sandipansarkar9211 4 жыл бұрын

Cool. Also finished my practice in Jupyter notebook. Thanks

@vaibhavchaudhary1569 3 жыл бұрын

Feature scaling (StandardScalar) should be applied after train test split. As it will not lead to information leak.

@datascience3046 2 жыл бұрын

TRUE

@kiruthigan2014 4 жыл бұрын

Loved Ur videos and Ur taste in music..kadhal vanthale in the bookmark 😂❤️🔥

@surendar3543 2 жыл бұрын

Neenga tamilah

@Tapsthequant 5 жыл бұрын

Thank you, you asked a question I had in my head, looking forward to applying the suggested solution, about imbalanced dataset...

@TechnoSparkBigData 5 жыл бұрын

Sir you are great inspiration to me. Thanks a lot for making every complex problem simpler.

@appiahdennis2725 3 жыл бұрын

Respect Krish Naik

@ijeffking 5 жыл бұрын

Very well explained again. Thank you so much.

@shubhamsongire6712 3 жыл бұрын

Thank you so much Krish for this great playlist. You are gem

@903vishnu 3 жыл бұрын

really its good... but you mentioned K=150, as per my knowledge we are not supposed to take even number. there might be chance of equal number of classes got selected nearest neighbor... algorithm may not be able to estimate the class for new record...

@deshduniya360scan7 3 жыл бұрын

Explain like a pro,thank you

@sazeebulbashar5686 2 жыл бұрын

Thank You Naik...... This is a very helpful video

@MaanVihaan 4 жыл бұрын

Very nice sir ur explanation and coding technique is very nice.... I am new learner of data science please keep uploading such video and new techniques of different kinds of algorithms which help us make easy to understand to deal with different kinds of datasets.

@padduchennamsetti6516 3 ай бұрын

congratulations krish on 1million subscribers🥳

@DeepakSharma-od5ym 4 жыл бұрын

error_rate = [] for i in (1,40): knn = KNeighborsClassifier(n_neighbors=i) knn.fit(X_train,y_train) pred_i = knn.predict(X_test) error_rate.append(np.mean(pred_i != y_test)) plt.figure(figsize = (10,6)) plt.plot(range(1,40), error_rate, color = 'blue', linstyle = 'dashed', marker = 'o' ) plt.xlabel('k') plt.ylabel('error rate') My above code giving error "x and y must have same first dimension, but have shapes (39,) and (2,)" Please suggest

@shreeyajoshi9771 2 жыл бұрын

Thank you very much for this video. Helped a lot!

@Kishor-ai Жыл бұрын

Thanks for this video krish naik sir🤩

@codyphillippi8831 3 жыл бұрын

This is awesome! Thank you so much. I am working on a project at work for lead segmentation to help us find our "ideal lead" for our sales reps with a lot of very messy data. This is a great starting point. Quick question (might be a loaded question ha) - after we find these clusters, how do we go about seeing the "cluster profiles"? Or what all data points make up these clusters (in categorical form)

@joeljoseph26 11 ай бұрын

use any visualization library to see the clustering.

@shyam15287 5 жыл бұрын

All the best Superb Explanation you are a superb resource u will reach great heights continue ur good work

@scifimoviesinparts3837 3 жыл бұрын

At 18:52, you said larger value of K will lead to overfitting, which is not true. Smaller value of K leads to overfitting. I think, if there are 2 K-values giving same error, we choose the one that is bigger because it is less impacted by outliers.

@mahikhan5716 3 жыл бұрын

how can we choose optimal value of k by KNN ?

@joeljoseph26 11 ай бұрын

Minkowski distance = (Manhattan) and (Euclidean)

@ManashreeKorgaonkar Жыл бұрын

Thank you so much for sharing this information. I'd just one doubt sir if we will scale before train_test_split wont it be lead to data leakage? as during scaling process during fit when it consider average of all the data points it also take the value of test data set so my model will already have some hint regarding it??

@rambaldotra2221 3 жыл бұрын

Grateful Sir ✨✨Thanks A lot.

@indrajitbanerjee5131 3 жыл бұрын

Nicely explained.

@asawanted 3 жыл бұрын

What if we choose a K value and hit a local optima? How would we know if I should stop at that K value or proceed to a higher value in search of global optima?

@sandipansarkar9211 4 жыл бұрын

great explanation Krish.

@abdelrhmandameen2215 3 жыл бұрын

Great work, thank you

@louerleseigneur4532 3 жыл бұрын

Thanks Krish

@adhvaithstudio6412 3 жыл бұрын

can you exlplain how hyper parameters will helps in what scenarious

@krish3486 Жыл бұрын

Sir why there we check only 1 to 40 neighbours only in the for loop

@vignesh7687 3 жыл бұрын

Sooper Explanation Krish. I have a doubt here.. When do we need to use MinMaxScaler() and when do we use StandardScaler()? Is there any difference? or we have to try using both and see which gives better results? Please clarify

@ankurkaiser 11 ай бұрын

Hope this answer finds you well, MinMaxScaler() an StandardScaler() are basically the same standard process except for Normalization data does't follow Gaussian Distribution and for Standardization it should. Normalization is used with models like KNN and Neutral Networks. It rescales the data between 0 to 1, so if your data doesnt't follow GD and you data ppoints are basically closer to 0/1 or if your business requirements are to normalize the data you go with MinMaxScaler(), else in general we use StandardScaler(), and its fast and easier to implement.

@chaithanyar9961 5 жыл бұрын

Hello sir , will this code work in tensor flow?? any changes to be made if I want excecute it in tf

@laxmiagarwal3285 4 жыл бұрын

This is very nice video.. But I'm having one doubt..what value u are taking for calculating the mean of error rate as prediceted values are in terms of 0 and 1

@mdmynuddin1888 3 жыл бұрын

if both category same neighbor point , than which category belongs to new data point?

@yathishl9973 4 жыл бұрын

Hi Krish, you are really amazing. I learn many things from you. I have a doubt, what measures should I take if the error rate increases with K-Value, please advice

@KnowledgeAmplifier1 4 жыл бұрын

Then you should decrease that k value, too small k value leads to overfitting, too large k value leads to underfitting , you have to wisely choose some middle value ☺️so that both bias and variance become as less as possible

@adipurnomo5683 3 жыл бұрын

If K too small that Will sensitive to outlier, if K too large that other class Will be included

@aditisrivastava7079 5 жыл бұрын

Very well explained

@princessazula99 2 жыл бұрын

for my assignment i am not allowed to use import packages for knn but I have to write it myself. do you have a code without the imported knn method?

@ramu7762 3 жыл бұрын

spot on. thank you.

@colabwork1910 2 жыл бұрын

Awesome

@shaz-z506 4 жыл бұрын

Hi Krish, Just wanna verify since you've said at 5:10 that model is ready, but KNN is instance-based learning with no parameter assumption then I don't think so that it creates any model out of the algorithm. Please let me know I'm wrong as I need some clarity.

@devinpython5555 4 жыл бұрын

Could you please explain to me why fit and transform is done for the x values (in the above example leaving target column rest data is x values)

@vishalrai2859 4 жыл бұрын

yes i also want to know

@adipurnomo5683 3 жыл бұрын

Because transform it for scale input data.

@shaz-z506 5 жыл бұрын

Hi Krish, In what scenario we'll use manhattan over euclidean.

@madeye1258 4 жыл бұрын

5.03 , if we are classifying the points based on the number of points next to it, then why we need to calculate the distance in step 2

@adipurnomo5683 3 жыл бұрын

Because calculate distance purpose is to sort value training data point before voting based K value.

@manusharma8527 5 жыл бұрын

i am not getting any any Classified Data csv file on Keggal.Please can you tell me the real name of that csv file

@birendrasingh7133 3 жыл бұрын

Awesome 😁

@makrandrastogi5588 3 жыл бұрын

can anybody tell why in most of the cases we use euclidian distance and not manhattan distance ?

@parammehta3559 3 жыл бұрын

Sir is it normal that sometimes as the value of n_neighbors is increasing the error rate is also increasing?

@vishalaaa1 4 жыл бұрын

Thank you

@Anubhav_Rajput07007 3 жыл бұрын

#Hi Krish, hope you are doing well. i trying to find the best value for K. but the code is not execute.. its running last 20 mint.

@techtalki 3 жыл бұрын

It will check all the cases of 'K'. If you want to speed up choose the less value of K or a smaller dataset.

@vibhutigoyal769 3 жыл бұрын

Is knn non- linear algorithm???

@sunilkumarkatta9062 3 жыл бұрын

How we will get error value to calculate accurate k value😅

@uchuynguyen9927 4 жыл бұрын

where you taking np.mean(pred_i != y_test), i think it should be pred_i = knn.predict(y_test) so then we will compare the predict y_test to actual y_test, then we''ll find the errors. If i wrong can somebody explain, thank you!

@manikaransingh3234 4 жыл бұрын

No, I'm sorry but you're not right! actually, pred_i is already predicted values with knn model (what you say it should be, its already done in the line above) There is nothing like finding error because it is a classification problem, not a regression one. suppose, pred_i=[1,1,1,1,0,1,0,0,0,1] test_i= [1,1,1,1,1,1,0,1,0,1] pred_i != test_i will result in [f,f,f,f,t,f,f,t,f,f] f= False, t=True thenn np.mean will take mean of true values which in this case will be 0.2(The error). I hope you get it

@SportsKiCharcha 3 жыл бұрын

@@manikaransingh3234 mean of true values mean??

@SportsKiCharcha 3 жыл бұрын

how is it 0.2 ??

@manikaransingh3234 3 жыл бұрын

@@SportsKiCharcha mean of true values is the number of True values divided by the sample. The result has 2 True values out of 10. 2/10 = 0.2

@SportsKiCharcha 3 жыл бұрын

Ok got it..thanks

@shayanshafiqahmad 5 жыл бұрын

What is the reason for taking pred_i !=y_test?

@shivaprakashranga8688 4 жыл бұрын

Pred_i value contains all the prediction values (like 1,0,1,0,0,1...) upon y_test(1,0,0,1,1,...) when K=1, pred_i !=y_test takes the value which is not predicted correctly(error) . No need to take correct predicted values. ex: out of 100 data points 60 not predicted correctly wrt y_test so these 60 data points we calculate mean. This will be continue for K=2,3.. upto 40. which ever having low mean value that we consider for K (elbow method)

@im_joshila_aj 4 жыл бұрын

So this pred_i! =y_test will return a true /false value right? In the form of 0 and 1 and then mean will be calculated?

@weekendresearcher 4 жыл бұрын

So, shud K always be an Odd number?

@KnowledgeAmplifier1 4 жыл бұрын

If you choose k value odd ,then there is more probability that tie will not occur, but still there are tie breaker available , so that we can have flexibility in choosing the k value☺️like sometime we consider weighted Nearest Neighbours or use the class with the nearest neighbor among tied groups, sometime we use a random tiebreaker among tied groups etc☺️✌️

@sriramswar 5 жыл бұрын

Hi Krish, unable to open ipynb file in Jupiter noteboox. Getting the below error: Error loading notebook Unreadable Notebook: C:\Users\Srira\01_K_Nearest_Neighbors.ipynb NotJSONError('Notebook does not appear to be JSON: \'\ \

@krishnaik06 5 жыл бұрын

Dear Sriram I am able to open the ipynb file. Please use the jupyter notebook to open the file

@sriramswar 5 жыл бұрын

@@krishnaik06 Hi Krish, I used Jupyter Notebook only. Not sure, if there is a problem at my end. Also, a suggestion! It would be better if random_state parameter is used in the code/tutorial so that everyone gets consistent results. I got different results when I tried the same code and I was confused for a moment and then understood the reason. Others may get confused, so just giving a suggestion!

@krishnaik06 5 жыл бұрын

Then probably there may be a problem with jupyter notebook file

@tagheuer001 2 жыл бұрын

Why do repeat phrases so many times?

@sensei-guide 5 жыл бұрын

As my k value increase my error rate also increasing bro

@ahmedjyad 4 жыл бұрын

It's a normal outcome and common example of overfitting. Basically, if you k value is too high, you risk the chance of having an algorithm that just outputs the class with the highest occurrence.

@yathishl9973 4 жыл бұрын

@@ahmedjyad Hi, thanks for your input, please advice how to correct it

@ahmedjyad 4 жыл бұрын

@@yathishl9973 Changing your k value is an example of hyperparamter tuning, which is a process to find the best parameter that produces the best classification model. You shouldn't really have a very high k value because that would result in over-fitting. So basically you getting higher error as u increase the k value is basically correct itself as it is expected. I hope you understand what I am trying to say. If not feel free to reach out to me.

@dragolov 3 жыл бұрын

These are 2 musical (jazz) solos generated using K Nearest Neighbor classifier: kzbin.info/www/bejne/sKWWoI1nipp0etE kzbin.info/www/bejne/iZnIpa2VaLCKodU

@ArunKumar-yb2jn 3 жыл бұрын

Krish - This seems to be a repeat of over a thousand similar videos on the internet, barring a few. What new insight have you brought here? You didn't define what that Y and X were and simply jumped into drawing X marks on the chart. Why do we need intuition of KNN? Why can't we really understand what it IS? This sort of explanation 'appears' to be clear, but in fact it really doesn't add to a student's understanding. Please take some actual data points and run the algorithm.