Very well explained! Teachers like you must be appreciated!!
@rachittoshniwal 4 years ago
Wow, thanks Nikhil! Appreciate your kind words! :)
@kumarnikhil8197 4 years ago
@@rachittoshniwal I know it's a really tough job to make educational videos, which hardly get many views compared to the filth piling up on YT. Please don't lose motivation; just remember there is always that one struggling person who, with your help, can sleep peacefully that night.
@rachittoshniwal 4 years ago
@@kumarnikhil8197 you're making me nervous now, Nikhil :p Thanks btw!
@r.s.572 5 months ago
Thank you for explaining this! :) Poor PhDs are thankful for people like you who use their free time to make such videos!
@ritvikpalvankar1903 2 years ago
Hello, thank you so much for the clear explanation. I was asked this question in an interview, and I think I did a good job thanks to watching this video a day before. :)
@rachittoshniwal 2 years ago
Wow, I'm so glad it helped Ritvik! I hope you get the job! :)
@pushpakkothekar9271 2 years ago
Learned KNN imputation, buddy, thank you. Liked and subscribed, brother...
@DrizzyJ77 8 months ago
Thanks! Needed a clear explanation for the class I missed 😅
@tridibpal857 2 years ago
Sir, you are awesome. Please take a bow.
@rachittoshniwal 2 years ago
Haha, thanks!
@ivanrazu 4 years ago
Nice example, I do have a question. When you do the imputation for the other missing values, do you use the imputed value you just found when computing distances with respect to that row? Or do you do all imputations simultaneously?
@rachittoshniwal 4 years ago
Hi Ivan! No, we do not use any newly imputed values when imputing the other values. All NaNs get imputed independently of each other, so in a sense, yes, they all get imputed simultaneously.
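The behaviour described in this reply can be checked with a small sketch using scikit-learn's KNNImputer (the numbers below are made up, not from the video):

```python
# Toy sketch: KNNImputer fills every NaN from the originally observed
# values only, so all missing entries are, in effect, imputed at once.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [60.0, 70.0, 80.0],
    [np.nan, 72.0, 79.0],   # person 1 is missing one rating
    [58.0, np.nan, 81.0],   # person 2 is missing another rating
    [61.0, 69.0, 83.0],
])

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)  # no NaNs remain; observed entries are unchanged
```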
@ivanrazu 4 years ago
@@rachittoshniwal Ok got it. Thank you, Rachit!
@rachittoshniwal 4 years ago
@@ivanrazu :)
@kennethbassett6330 2 years ago
Thanks for the great video! I have a question: let's say I am finding the 5 nearest neighbors and trying to fill in a missing value in column A for a certain data point. If one of its nearest neighbors is also missing a value in column A, should I take the average of the remaining 4 neighbors, or should I include the next closest (6th) neighbor in the average?
@prithvisingh4173 3 years ago
Nice, bro, you got my concepts and doubts cleared. Thanks...
@rachittoshniwal 3 years ago
I'm glad it helped, Prithvi!
@pumpitup1993 4 years ago
Very nicely explained! Can you do the same for MICE imputation?
@rachittoshniwal 4 years ago
I'm glad you liked it! I'll look into MICE!
@rachittoshniwal 4 years ago
Hi Sourav, I've just published one on MICE here: kzbin.info/www/bejne/jYHMioKJaNZ-bZI Do check it out and let me know if you do (or do not!) find it useful :)
@pumpitup1993 4 years ago
@@rachittoshniwal Yes, I just saw it; really helpful, thanks a lot!
@akshatjain1746 2 months ago
Short, simple, informative!
@bhavnatanwar8591 3 years ago
Your video is really helpful :)) I have a question: does the KNN imputer use the imputed values in further calculations? What I mean is, suppose we have imputed the missing values in the first column and now have to impute the values in the second column. Does it use the values imputed in the first column, or does it still treat them as missing when computing the weights?
@rachittoshniwal 3 years ago
Hi Bhavna, thanks! Well, no. It doesn't take the imputed values in one column into account while imputing other columns. It treats the "original" missing values as missing. All columns are basically independent of each other during imputation.
@bhavnatanwar8591 3 years ago
@@rachittoshniwal Thanks, this was really helpful :))
@Mflegend426 3 years ago
Very nice explanation, bro. Is there any method to select the number of neighbours while imputing values?
@rachittoshniwal 3 years ago
Thanks Ajeeth, glad it helped! Well, you could try a grid search, or even give the elbow method a shot.
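The grid-search suggestion can be sketched by tuning the imputer's `n_neighbors` inside a pipeline; the dataset, model, and parameter grid below are arbitrary placeholders, not from the video:

```python
# Hedged sketch: pick n_neighbors for KNNImputer by cross-validated
# grid search on a downstream model's score.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.impute import KNNImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_diabetes(return_X_y=True)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan  # knock out ~10% at random

pipe = Pipeline([("impute", KNNImputer()), ("model", Ridge())])
search = GridSearchCV(pipe, {"impute__n_neighbors": [2, 5, 10]}, cv=3)
search.fit(X_missing, y)
print(search.best_params_)  # the k with the best cross-validated score
```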
@heteromodal 3 years ago
Thank you again for a great tutorial! Can you give an outline of when this method would be preferable to, for example, MICE?
@rachittoshniwal 3 years ago
First of all, thanks! I'm glad you liked it! Well, MICE is helpful when the features are "correlated", and if you know that's the case, go ahead with it. Otherwise, look at other methods (like KNN, for example). But it's mostly trial and error really: the imputer that gives the best results is the best one!
@heteromodal 3 years ago
@@rachittoshniwal Thanks again! Really appreciate your videos and responses! :)
@rachittoshniwal 3 years ago
@@heteromodal thanks! My pleasure!
@ethiopiansickness 3 years ago
I'm surprised you don't have more subscribers. A lot of your videos are at the top of search queries on YouTube, so I'm sure you will eventually get the subscribers and views you deserve. Keep up the great work!
@rachittoshniwal 3 years ago
Haha! Thank you Shiffraw! Appreciate that!
@ismafoot11 3 years ago
Excellent video! However, what is the impact of doing this when the features have extreme variability? For example, one column ranges between 0 and 1 million while the other columns hover around 10-20. Should you normalize/standardize your data beforehand? If so, which one, and how would you do it if you have missing values in that column?
@rachittoshniwal 3 years ago
Thanks! Yes, we should ideally normalize the data if the features are on different scales. Scikit-learn will ignore the missing values and scale the columns based on the non-missing values; the NaNs remain NaNs. You can then impute those values. There is no one correct answer as to whether to normalize or standardize: trial and error, whichever works best on your data.
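That scale-then-impute behaviour can be checked directly; a small sketch with made-up numbers:

```python
# scikit-learn scalers fit on the non-missing values and leave NaNs as
# NaN, so you can scale first and run KNNImputer afterwards.
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

X = np.array([
    [1_000_000.0, 12.0],
    [250_000.0, np.nan],
    [np.nan, 18.0],
    [500_000.0, 15.0],
])

X_scaled = MinMaxScaler().fit_transform(X)  # NaNs survive scaling
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X_scaled)
print(X_scaled)
print(X_imputed)  # all values now in [0, 1], no NaNs left
```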
@tugce2326 2 years ago
Hi Rachit, very nicely explained! That's why I want to ask you something. I have 440 data points belonging to 9 precipitation observation stations (data matrix: 440×9). There are missing values at each station. None of the 9 precipitation series shows a normal distribution. However, the missingness in the 9 precipitation series is completely random. My questions are: 1) Can I use the k-NN/random forest/MICE methods even though the 9 precipitation series are not normally distributed? 2) Are there any prerequisites/conditions for using these methods? 3) Could I use these methods if my data were not MCAR?
@rachittoshniwal 2 years ago
Hi! Following a distribution is not a prerequisite for imputation. However, if the data is skewed, it is better to go for median imputation than mean, because the median is a better approximation in that case. MICE works better when the data is MAR; if not, you might get suboptimal results. At the end of the day, though, finding the best method is mostly trial and error. Hope it helps!
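The median-vs-mean point is easy to see on a small skewed sample (toy numbers):

```python
import numpy as np

# One extreme value drags the mean far from a "typical" observation,
# while the median stays put, which is why it suits skewed data.
incomes = np.array([20, 22, 25, 24, 23, 500])
print(np.mean(incomes))    # ≈ 102.3, pulled up by the outlier
print(np.median(incomes))  # 23.5, still representative
```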
@SumitKumar-sj5xw 2 years ago
Very good explanation!
@TheReluctantCoder 3 years ago
Very good explanation! Thank you!
@TheElementFive 2 years ago
Suppose you want to apply this technique to a dataset where the outcome variable is discrete. Would it be logical to limit the set of neighbors to those belonging to the same class as the row you are imputing (i.e., calculate the Euclidean distance between the row to impute and only those rows for which y_current_row == y_neighbor_row)?
@ThePablo505 2 years ago
Thank you so much
@md.faisalsohail9108 4 years ago
Simply awesome. Thanks, brother.
@rachittoshniwal 4 years ago
I'm glad you liked it! :)
@md.faisalsohail9108 4 years ago
@@rachittoshniwal Hope you and your channel grow exponentially.
@rachittoshniwal 4 years ago
@@md.faisalsohail9108 whoa! Thank you for the kind words! :)
@rizkiekaputri2122 2 years ago
Please add subtitles to this video. My final project is about this topic, and I really hope you put subtitles here so I can understand what you are explaining.
@shoaibahmed5848 a year ago
What about the missing values in row 1 and row 4? Is it necessary to fill those values too?
@noorbariahmohamad8759 2 years ago
Prof, what if the NaNs happen at the same time? I mean Friends, GOT, Suits, Breaking Bad, and HIMYM all missing in row 2. Can we still impute using the kNN method?
@barathwajas6702 2 years ago
Hi Rachit, quick question: how do you evaluate and tune the model to see whether the imputer predicted the correct (or a nearby) value? Thanks in advance.
@rachittoshniwal 2 years ago
We can only judge the goodness of the imputation by the model performance. If we get a good final model, it means the imputer was able to get close to the real values.
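One way to sketch that comparison; the dataset, models, and missingness pattern below are placeholders, not from the video:

```python
# Compare imputers by downstream cross-validated model score and keep
# whichever scores better, as suggested above.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_diabetes(return_X_y=True)
rng = np.random.default_rng(42)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan

scores = {}
for imputer in (SimpleImputer(strategy="mean"), KNNImputer(n_neighbors=5)):
    pipe = make_pipeline(imputer, Ridge())
    scores[type(imputer).__name__] = cross_val_score(pipe, X_missing, y, cv=5).mean()
print(scores)  # the higher R^2 marks the better imputer for this data
```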
@barathwajas6702 2 years ago
@@rachittoshniwal Correct, but in your example, was there any tuning done? If so, can you share that insight? TIA.
@NitinMukeshIITB 3 years ago
Awesome explanations
@rachittoshniwal 3 years ago
Thanks Nitin! I'm glad it helped!
@mohitgoyal229 3 years ago
Rachit, can you recommend some books where we can find techniques like these in detail?
@rachittoshniwal 3 years ago
I don't really have any good recommendations, but you can check the scikit-learn documentation for the algorithm you want; it usually refers to a research paper or a good reference on which the implementation is based.
@KartikRai-YrIDDCompSciEngg 2 years ago
What if (row 3, col 0) and (row 4, col 0) also had missing values? Then the mean ((50 + 29) / 2) would not be possible, so how does the algorithm proceed?
@RS-fe1hk 4 years ago
Two doubts: 1) If we set k-neighbors = 2 and the NaN value is present in the 1st row instead of the 2nd row, which rows will be selected for calculating the Euclidean distance? 2) For the weight 'total / present coords', what are the total and present-coord values if both values are NaN? For example, say instead of 85 there is a NaN value; then what are the total and present-coord values when comparing with the 1st row?
@rachittoshniwal 4 years ago
If I understand your second question: total will always be 4 in this case, because there are 4 other columns. For a pair of coordinates to count as a "present coord", both values must be present. So if 85 were a NaN, then while comparing person 2 with person 1, both would be NaN in the HIMYM column, and that column wouldn't be counted in "present coords". I didn't quite get your first question. By 1st row, do you mean person 1 or person 0?
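The "total / present coords" weighting in this reply matches scikit-learn's `nan_euclidean_distances`, the metric KNNImputer uses under the hood; a small check with made-up numbers:

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

a = np.array([[3.0, np.nan, 5.0, 7.0]])
b = np.array([[1.0, 2.0, np.nan, 6.0]])

# Only columns 0 and 3 are present in BOTH rows: 4 total coords, 2 present.
sq = (3 - 1) ** 2 + (7 - 6) ** 2   # squared distance over present coords
manual = np.sqrt(4 / 2 * sq)       # scale by total / present = 4 / 2
print(manual, nan_euclidean_distances(a, b)[0, 0])  # both ≈ 3.1623
```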
@RS-fe1hk 4 years ago
@@rachittoshniwal That answers the 2nd question, thanks... And for the 1st question, 1st row means index 1, not index 0 (i.e., Friends = 44).
@rachittoshniwal 4 years ago
@@RS-fe1hk So you mean to say that for person 1, both Friends and HIMYM are NaNs?
@RS-fe1hk 4 years ago
@@rachittoshniwal Yeah... If we try to fill the NaN for person 1 and give k-neighbors as 2, how will it select rows? Because above person 1 there is only 1 row (person 0), right? So in that case, which rows will get selected for imputation?
@rachittoshniwal 4 years ago
@@RS-fe1hk It doesn't matter how many rows are above or below the row with the missing value; it will scan through all rows in the dataset and find the top 2 neighbors. I hope that solves your query. Let me know if it doesn't!
@yv4000 3 years ago
Do we need to scale the features before imputation?
@rachittoshniwal 3 years ago
If the features are on different scales, then yes.
@yv4000 3 years ago
@@rachittoshniwal How should we scale a feature with null values present? MinMaxScaler won't work with null values present.
@rachittoshniwal 3 years ago
@@yv4000 It does work with missing data; it just ignores the presence of those missing values.
@yv4000 3 years ago
@@rachittoshniwal Thanks!
@venkateshwarlusonnathi4137 3 years ago
Shouldn't you normalize the values before doing KNN? Or is it because all of them are supposed to be in the same 0-100 range that we don't need to here?
@rachittoshniwal 3 years ago
Yes, precisely. Since all features are on the same scale, it isn't mandatory to perform normalization. But if the features are on different ranges, then you should.
@anirudhgupta455 3 years ago
How would imputation happen if any of these variables were categorical?
@rachittoshniwal 3 years ago
Yeah, the KNN imputer wouldn't work particularly well for categorical data.