This explanation is one of the most precise explanations I have seen on the Internet.
@nelsondelarosa5490 Жыл бұрын
This is in fact well explained, defining every term, and assuming no previous knowledge. Thanks so much!
@sivareddynagireddy562 жыл бұрын
No words for your explanation, sir. Such a simple, lucid explanation!
@sandipansarkar92114 жыл бұрын
Cool. Also finished my practice in a Jupyter notebook. Thanks!
@vaibhavchaudhary15693 жыл бұрын
Feature scaling (StandardScaler) should be applied after the train/test split, so that it does not lead to information leakage.
@datascience30462 жыл бұрын
TRUE
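For anyone who wants the concrete pattern, here is a minimal sketch (assuming scikit-learn, with X and y standing for your features and labels) of fitting the scaler on the training split only, so no test-set statistics leak in:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# split first, then scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std learned from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test set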
@kiruthigan20144 жыл бұрын
Loved your videos and your taste in music... Kadhal Vanthale in the bookmarks 😂❤️🔥
@surendar35432 жыл бұрын
Are you Tamil?
@Tapsthequant5 жыл бұрын
Thank you, you asked a question I had in my head about imbalanced datasets; looking forward to applying the suggested solution.
@TechnoSparkBigData5 жыл бұрын
Sir, you are a great inspiration to me. Thanks a lot for making every complex problem simpler.
@appiahdennis27253 жыл бұрын
Respect Krish Naik
@ijeffking5 жыл бұрын
Very well explained again. Thank you so much.
@shubhamsongire67123 жыл бұрын
Thank you so much, Krish, for this great playlist. You are a gem.
@903vishnu3 жыл бұрын
It's really good... but you mentioned K=150; as far as I know, we are not supposed to take an even number. There is a chance that an equal number of each class gets selected among the nearest neighbors, and the algorithm may not be able to decide the class for the new record.
@deshduniya360scan73 жыл бұрын
Explained like a pro, thank you.
@sazeebulbashar56862 жыл бұрын
Thank you, Naik... This is a very helpful video.
@MaanVihaan4 жыл бұрын
Very nice, sir; your explanation and coding technique are very good. I am a new learner of data science. Please keep uploading such videos, covering new techniques and different kinds of algorithms, to help us understand how to deal with different kinds of datasets.
@padduchennamsetti65163 ай бұрын
Congratulations, Krish, on 1 million subscribers 🥳
@DeepakSharma-od5ym4 жыл бұрын
error_rate = []
for i in (1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
plt.figure(figsize=(10,6))
plt.plot(range(1,40), error_rate, color='blue', linstyle='dashed', marker='o')
plt.xlabel('k')
plt.ylabel('error rate')

My above code is giving the error "x and y must have same first dimension, but have shapes (39,) and (2,)". Please suggest.
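The shape mismatch comes from the loop header: for i in (1,40) iterates over the two tuple elements 1 and 40, so error_rate ends up with 2 entries, while range(1,40) passed to plt.plot has 39. There is also a typo: linstyle should be linestyle. A corrected sketch (assuming X_train, X_test, y_train, y_test already exist):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

error_rate = []
for i in range(1, 40):  # range(1, 40), not the tuple (1, 40)
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))  # fraction of misclassified test points

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed', marker='o')  # 'linestyle', not 'linstyle'
plt.xlabel('k')
plt.ylabel('error rate')
plt.show()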
@shreeyajoshi97712 жыл бұрын
Thank you very much for this video. Helped a lot!
@Kishor-ai Жыл бұрын
Thanks for this video, Krish Naik sir 🤩
@codyphillippi88313 жыл бұрын
This is awesome! Thank you so much. I am working on a project at work for lead segmentation, to help us find our "ideal lead" for our sales reps, with a lot of very messy data. This is a great starting point. Quick question (might be a loaded question, ha): after we find these clusters, how do we go about seeing the "cluster profiles", i.e., what data points make up these clusters (in categorical form)?
@joeljoseph2611 ай бұрын
Use any visualization library to see the clustering.
@shyam152875 жыл бұрын
All the best. Superb explanation; you are a superb resource. You will reach great heights, continue your good work.
@scifimoviesinparts38373 жыл бұрын
At 18:52, you said a larger value of K will lead to overfitting, which is not true; a smaller value of K leads to overfitting. I think if there are two K values giving the same error, we choose the bigger one because it is less impacted by outliers.
@mahikhan57163 жыл бұрын
How can we choose the optimal value of k in KNN?
@joeljoseph2611 ай бұрын
Minkowski distance generalizes both: with p = 1 it becomes Manhattan distance, and with p = 2 it becomes Euclidean distance.
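For reference, the Minkowski distance between points $x$ and $y$ with parameter $p$ is

$$d_p(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^p \Big)^{1/p},$$

which reduces to Manhattan distance at p = 1 and Euclidean distance at p = 2.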
@ManashreeKorgaonkar Жыл бұрын
Thank you so much for sharing this information. I just have one doubt, sir: if we scale before train_test_split, won't it lead to data leakage? During the scaling process, when fit computes the average over all the data points, it also takes in the test-set values, so my model will already have some hint about them?
@rambaldotra22213 жыл бұрын
Grateful Sir ✨✨Thanks A lot.
@indrajitbanerjee51313 жыл бұрын
Nicely explained.
@asawanted3 жыл бұрын
What if we choose a K value and hit a local optimum? How would we know whether to stop at that K value or proceed to a higher value in search of the global optimum?
@sandipansarkar92114 жыл бұрын
Great explanation, Krish.
@abdelrhmandameen22153 жыл бұрын
Great work, thank you
@louerleseigneur45323 жыл бұрын
Thanks Krish
@adhvaithstudio64123 жыл бұрын
Can you explain how hyperparameters help, and in what scenarios?
@krish3486 Жыл бұрын
Sir, why do we check only 1 to 40 neighbors in the for loop?
@vignesh76873 жыл бұрын
Super explanation, Krish. I have a doubt here: when do we need to use MinMaxScaler() and when do we use StandardScaler()? Is there any difference, or do we have to try both and see which gives better results? Please clarify.
@ankurkaiser11 ай бұрын
Hope this answer finds you well. MinMaxScaler() and StandardScaler() serve the same purpose of feature scaling, with one key difference: normalization does not assume the data follows a Gaussian distribution, while standardization does. Normalization (MinMaxScaler) rescales the data to the 0-1 range and is often used with models like KNN and neural networks. So if your data doesn't follow a Gaussian distribution, or your business requirement is to normalize the data, go with MinMaxScaler(); otherwise, StandardScaler() is the common default, and it's fast and easy to implement.
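A minimal side-by-side sketch (assuming scikit-learn and an existing feature matrix X_train):

from sklearn.preprocessing import MinMaxScaler, StandardScaler

# MinMaxScaler: rescales each feature to the [0, 1] range
minmax = MinMaxScaler()
X_minmax = minmax.fit_transform(X_train)

# StandardScaler: centers each feature to mean 0 and unit variance
standard = StandardScaler()
X_standard = standard.fit_transform(X_train)

In practice, it is also reasonable to try both and keep whichever gives better cross-validated results.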
@chaithanyar99615 жыл бұрын
Hello sir, will this code work in TensorFlow? Are there any changes to be made if I want to execute it in TF?
@laxmiagarwal32854 жыл бұрын
This is a very nice video, but I'm having one doubt: what value are you taking for calculating the mean of the error rate, since the predicted values are in terms of 0 and 1?
@mdmynuddin18883 жыл бұрын
If both categories have the same number of neighboring points, then which category does the new data point belong to?
@yathishl99734 жыл бұрын
Hi Krish, you are really amazing; I learn many things from you. I have a doubt: what measures should I take if the error rate increases with the K value? Please advise.
@KnowledgeAmplifier14 жыл бұрын
Then you should decrease that k value. Too small a k value leads to overfitting, too large a k value leads to underfitting; you have to wisely choose some middle value ☺️ so that both bias and variance are as low as possible.
@adipurnomo56833 жыл бұрын
If K is too small, the model will be sensitive to outliers; if K is too large, points from other classes will be included.
@aditisrivastava70795 жыл бұрын
Very well explained
@princessazula992 жыл бұрын
For my assignment I am not allowed to import packages for KNN; I have to write it myself. Do you have code without the imported KNN method?
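In case it helps, here is a minimal from-scratch sketch in plain Python/NumPy (no scikit-learn; the function and variable names are just illustrative):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # majority vote among the k nearest labels (y_train must be a NumPy array)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# example usage: pred = knn_predict(X_train, y_train, X_test[0], k=5)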
@ramu77623 жыл бұрын
Spot on. Thank you.
@colabwork19102 жыл бұрын
Awesome
@shaz-z5064 жыл бұрын
Hi Krish, just want to verify: you said at 5:10 that the model is ready, but KNN is instance-based learning with no parametric assumptions, so I don't think it creates any model out of the algorithm. Please let me know if I'm wrong, as I need some clarity.
@devinpython55554 жыл бұрын
Could you please explain why fit and transform are done for the X values (in the above example, leaving out the target column, the rest of the data is the X values)?
@vishalrai28594 жыл бұрын
Yes, I also want to know.
@adipurnomo56833 жыл бұрын
Because fit learns the scaling parameters, and transform is what actually scales the input data.
@shaz-z5065 жыл бұрын
Hi Krish, in what scenario would we use Manhattan distance over Euclidean?
@madeye12584 жыл бұрын
At 5:03, if we are classifying the points based on the number of points next to them, then why do we need to calculate the distance in step 2?
@adipurnomo56833 жыл бұрын
Because the purpose of calculating the distances is to sort the training data points and pick the K nearest ones before voting.
@manusharma85275 жыл бұрын
I am not finding any "Classified Data" CSV file on Kaggle. Can you please tell me the real name of that CSV file?
@birendrasingh71333 жыл бұрын
Awesome 😁
@makrandrastogi55883 жыл бұрын
Can anybody tell me why, in most cases, we use Euclidean distance and not Manhattan distance?
@parammehta35593 жыл бұрын
Sir, is it normal that sometimes, as the value of n_neighbors increases, the error rate also increases?
@vishalaaa14 жыл бұрын
Thank you
@Anubhav_Rajput070073 жыл бұрын
Hi Krish, hope you are doing well. I am trying to find the best value for K, but the code does not finish executing; it has been running for the last 20 minutes.
@techtalki3 жыл бұрын
It will check all the values of K. If you want to speed it up, choose a smaller range of K values or a smaller dataset.
@vibhutigoyal7693 жыл бұрын
Is KNN a non-linear algorithm?
@sunilkumarkatta90623 жыл бұрын
How will we get the error values to work out an accurate k value? 😅
@uchuynguyen99274 жыл бұрын
Where you're taking np.mean(pred_i != y_test), I think it should be pred_i = knn.predict(y_test), so that we compare the predicted y_test to the actual y_test and then find the errors. If I'm wrong, can somebody explain? Thank you!
@manikaransingh32344 жыл бұрын
No, I'm sorry, but you're not right! pred_i already holds the values predicted by the KNN model (what you say it should be is already done in the line above). There is nothing like "finding the error" by regression here, because it is a classification problem, not a regression one. Suppose:

pred_i = [1,1,1,1,0,1,0,0,0,1]
y_test = [1,1,1,1,1,1,0,1,0,1]

Then pred_i != y_test results in [F,F,F,F,T,F,F,T,F,F] (F = False, T = True), and np.mean takes the mean of those boolean values, which in this case is 0.2 (the error rate). I hope you get it.
@SportsKiCharcha3 жыл бұрын
@@manikaransingh3234 What does "mean of true values" mean?
@SportsKiCharcha3 жыл бұрын
How is it 0.2?
@manikaransingh32343 жыл бұрын
@@SportsKiCharcha The mean of the boolean values is the number of True values divided by the sample size. The result has 2 True values out of 10; 2/10 = 0.2.
@SportsKiCharcha3 жыл бұрын
Ok, got it... thanks!
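For anyone still unsure, the worked example above runs like this in NumPy:

import numpy as np

pred_i = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1])
y_test = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 1])

mismatch = pred_i != y_test  # boolean array with True at the 2 misclassified positions
error = np.mean(mismatch)    # True counts as 1, False as 0, so 2/10 = 0.2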
@shayanshafiqahmad5 жыл бұрын
What is the reason for taking pred_i != y_test?
@shivaprakashranga86884 жыл бұрын
pred_i contains all the prediction values (like 1,0,1,0,0,1,...) against y_test (1,0,0,1,1,...) for K=1. pred_i != y_test picks out the values that were not predicted correctly (the errors); there is no need to count the correctly predicted values. For example, out of 100 data points, if 60 are not predicted correctly w.r.t. y_test, the mean over those mismatches gives the error rate. This continues for K = 2, 3, ... up to 40, and whichever K has the lowest mean error is the one we consider (the elbow method).
@im_joshila_aj4 жыл бұрын
So this pred_i != y_test will return True/False values, right? In the form of 0 and 1, and then the mean is calculated?
@weekendresearcher4 жыл бұрын
So, should K always be an odd number?
@KnowledgeAmplifier14 жыл бұрын
If you choose an odd k value, there is a higher probability that a tie will not occur, but there are still tie-breakers available, so we have flexibility in choosing the k value ☺️. Sometimes we use weighted nearest neighbors, sometimes the class of the single nearest neighbor among the tied groups, and sometimes a random tie-breaker among the tied groups ☺️✌️ (see the sketch below).
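One concrete tie-breaker available in scikit-learn is distance weighting, which makes an exact vote tie unlikely even with an even k; a sketch, assuming the usual train/test arrays:

from sklearn.neighbors import KNeighborsClassifier

# weights='distance' gives closer neighbors more say, so even-k vote ties rarely occur
knn = KNeighborsClassifier(n_neighbors=4, weights='distance')
knn.fit(X_train, y_train)
pred = knn.predict(X_test)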
@sriramswar5 жыл бұрын
Hi Krish, unable to open the ipynb file in Jupyter Notebook. Getting the below error: Error loading notebook. Unreadable Notebook: C:\Users\Srira\01_K_Nearest_Neighbors.ipynb NotJSONError('Notebook does not appear to be JSON: ...
@krishnaik065 жыл бұрын
Dear Sriram, I am able to open the ipynb file. Please use Jupyter Notebook to open the file.
@sriramswar5 жыл бұрын
@@krishnaik06 Hi Krish, I used Jupyter Notebook only; not sure if there is a problem at my end. Also, a suggestion: it would be better if the random_state parameter were used in the code/tutorial so that everyone gets consistent results. I got different results when I tried the same code and was confused for a moment before understanding the reason. Others may get confused too, so just giving a suggestion!
@krishnaik065 жыл бұрын
Then there is probably a problem with the Jupyter notebook file.
@tagheuer0012 жыл бұрын
Why do you repeat phrases so many times?
@sensei-guide5 жыл бұрын
As my k value increases, my error rate also increases, bro.
@ahmedjyad4 жыл бұрын
It's a normal outcome and a common example of underfitting. Basically, if your k value is too high, you risk ending up with an algorithm that just outputs the class with the highest occurrence.
@yathishl99734 жыл бұрын
@@ahmedjyad Hi, thanks for your input; please advise how to correct it.
@ahmedjyad4 жыл бұрын
@@yathishl9973 Changing your k value is an example of hyperparameter tuning, which is the process of finding the parameter values that produce the best classification model. You shouldn't have a very high k value because that would result in underfitting, so getting a higher error as you increase k is expected behavior, not something to correct. I hope you understand what I am trying to say; if not, feel free to reach out to me.
@dragolov3 жыл бұрын
These are 2 musical (jazz) solos generated using a K-Nearest-Neighbor classifier: kzbin.info/www/bejne/sKWWoI1nipp0etE kzbin.info/www/bejne/iZnIpa2VaLCKodU
@ArunKumar-yb2jn3 жыл бұрын
Krish, this seems to be a repeat of over a thousand similar videos on the internet, barring a few. What new insight have you brought here? You didn't define what Y and X were and simply jumped into drawing X marks on the chart. Why do we need the intuition of KNN? Why can't we really understand what it IS? This sort of explanation "appears" to be clear, but in fact it doesn't add to a student's understanding. Please take some actual data points and run the algorithm.