How To Handle Missing Values in Categorical Features

115,011 views

Krish Naik

5 years ago

Hello all, here is a video that explains in detail how we can handle missing values in categorical features.
You can buy my book on Finance with Machine Learning and Deep Learning from the below url
amazon url: www.amazon.in/Hands-Python-Fi...
Buy the Best book of Machine Learning, Deep Learning with python sklearn and tensorflow from below
amazon url:
www.amazon.in/Hands-Machine-L...
Connect with me here:
Twitter: / krishnaik06
Facebook: / krishnaik06
instagram: / krishnaik06
Subscribe to my unboxing channel
/ @krishnaikhindi
Below are the various playlists created on ML, Data Science and Deep Learning. Please subscribe and support the channel. Happy Learning!
Deep Learning Playlist: • Tutorial 1- Introducti...
Data Science Projects playlist: • Generative Adversarial...
NLP playlist: • Natural Language Proce...
Statistics Playlist: • Population vs Sample i...
Feature Engineering playlist: • Feature Engineering in...
Computer Vision playlist: • OpenCV Installation | ...
Data Science Interview Question playlist: • Complete Life Cycle of...
🙏🙏🙏🙏🙏🙏🙏🙏
YOU JUST NEED TO DO 3 THINGS to support my channel: LIKE, SHARE & SUBSCRIBE

Comments: 117
@mohitupadhayay1439 2 years ago
This was such an amazing life saver. I didn't even know I had this question and the video just popped up. Didn't find this tutorial anywhere else.
@doop9134 1 year ago
I was stuck for days trying to figure out how to predict missing data using ML. This helped me understand so so so much better! 😍 Thank you so much!! 🙏💚
@aksontv 4 years ago
Finally found the right man to learn data science and ML from. Thank you sir!
@gabrielburgos2533 1 year ago
You are the MVP, when no one has the answer, you do.
@soumikchakraborty90 4 years ago
You are just awesome bro. Please make a video on AIC, AUC, ROC curve.
@duvanmartinez8586 4 years ago
Great work, you're awesome, you're the best youtuber I've found.
@abinashkumarsinha8958 2 years ago
This helped me a lot in my project work. Very useful and very well explained.
@keshavbansal5148 4 years ago
Started this playlist today, loving it.
@pallabsaha4098 4 years ago
Very well explained. If you could show the same on a dataset with code, that would be very helpful. Thank you sir for your videos. Love them all.
@Susa270 2 years ago
Hello @Krish Naik, hope you are doing well 🙂 First of all, I would like to thank you for such knowledgeable videos. Most of the time your videos are really a beam of hope. Can you please let me know where I can check the actual code for the concepts mentioned above? It is a little difficult to apply them in a live scenario. Please guide, a humble request.
@sandeepnallala48 2 years ago
Doing great work, Krish. Thanks a lot. Loved your videos :)
@AmitYadav-ig8yt 4 years ago
One more question: in some datasets we find columns with many categories, e.g. a car name column will have many car names. In such a case, if we use this unsupervised technique to create clusters, won't there be too many clusters?
@anandacharya9919 4 years ago
Thank you for this video. Please also make a video on how to handle missing values and outliers in continuous variables.
@amedyasar9468 3 years ago
It was quite a short explanation with nice points to understand. Thanks!
@another_hindu 3 years ago
Hello sir, maybe I am here too late but I still hope that you will acknowledge this question, as it might be of immense value. I have a disputed question which basically revolves around the KNN imputer, scaling and the concept of data leakage. As the KNN imputer works on the same principles as the KNN algorithm, it shares the pros and cons of the KNN algorithm, right? So won't it be better to simply scale the data first? Also, if I am separating the train and test data in order to avoid data leakage, should I split the data and then scale and impute? Or should I impute and then split and scale? If I split first, which is the most common preference, which stats should I use for the user input? And lastly, how should I handle the label-encoded columns, if any? Nobody is discussing this, although it is one of the most important problems a person is likely to face. Can you please make a video on this?
@tumul1474 4 years ago
Thank you sir! Amazing video as always.
@thatguyadarsh 3 years ago
Amazing!! Use an ML model to predict the NaN values.. That is clever, sir.
@chandrasekarank8583 4 years ago
Sir, what if I label encode the data and then use a simple imputer, which will replace the NaN values with the mean or median as I want? Sir, please tell me whether this is a valid way to do it.
@hv3300 4 years ago
Excellent video, as usual.
@muzamilshah8028 4 years ago
Let's say I want to predict the value for f1 in row 2 as you mentioned, but what if we also have missing values in f2 and f3, though not in the same row? What do we do in that scenario?
@shivambhayre5056 4 years ago
I have no words to say, just thanks 🙏
@ankurbanerji6605 3 years ago
Great explanation sir! Can you explain how to handle the missing values for multiple columns in a dataset?
@andyjackson4563 1 year ago
Thanks for explaining these methods
@AmitYadav-ig8yt 4 years ago
Sir, can we get the code for the "create a classifier algorithm" method for missing values?
@lukaszmichalak9985 4 years ago
Don't you increase correlation between features with those methods? If so, what effect will that have on the output model and its predictions?
@aronpollner 1 year ago
Is there a Multivariate Imputer implementation for categorical values like a class from sklearn?
@Geethu_Mohan_DA 1 year ago
Easy to understand. Thank you
@mohiuddinshojib2647 1 year ago
That is really informative.
@shaileshsahu9551 4 years ago
Please add a video in the Data Science and ML playlist on how to create our own predictor or estimator classifier algorithm to predict both categorical and continuous variables.
@daniellazarolazaro1033 3 years ago
Thank you so much, this video actually helps a lot when you just got started like me hahahha, as I was saying, thank you so much for this great great great work!!!
@madhurchaudhary5109 3 years ago
Hi Krish, this is well explained!! I have an ID column which has unique values, but for some records the ID is null. How can I handle this type of data?
@itsmoolya 4 years ago
This is a good explanation!
@user-vy4jo3lt2v 9 months ago
If we want to apply the classifier algorithm to multiple columns, is that possible?
@divyaharshad9985 5 months ago
For technique 3, will it lead to multicollinearity in the data?
@ZUBINABRAHAM 3 years ago
Thanks for the video, it was informative. Can we use KNN?
@fahimekheradmand5880 4 years ago
Excellent, Thank you
@abhipraydumka8587 4 years ago
Can you tell me how to assign a unique category, let's say U (undefined), to missing categorical data?
@theoutlet9300 3 years ago
Since we are using the output to predict our feature and then the feature to predict our output, wouldn't it cause problems in prediction?
@abdulhakeem4715 1 month ago
Clean explanation.
@preetnandeshwar5331 3 years ago
Which missing categorical value method suits which dataset, and why? Or do we just have to use hit and trial? Please, anyone, help me. I am a beginner.
@mitultank7872 2 years ago
If I have missing values in a numerical column and I want to fill them based on another categorical column, how can I handle that?
@Nursin-rg1ey 1 year ago
Thanks very much sir
@ommehta4501 2 years ago
If we have a date categorical feature with some missing values, please tell me how to deal with it.
@Saikrishna-lx9it 4 years ago
Hi bro, can you make one end-to-end chatbot video using Rasa NLU, which would be useful for everyone interested in NLP?
@raghavkumar8333 4 years ago
Sir, I have a student attrition dataset where I need to predict the reasons why students who got admission in 1st year drop out in 2nd year. A year consists of 2 terms, and I have the grades of students (a, b, c, d) in 6 different courses in the 1st and 2nd terms. Most of the grade columns for the 6 courses in the 2nd term are missing. Intuitively, I think this could be a reason for dropping out. My questions are: 1) Should I impute missing values in this case? It is possible that the values are not really missing and those students had already dropped out. So, should I create dummy variables instead? 2) If I impute the missing values, what technique should I use to impute those missing categorical variables?
@aditya_baser 4 years ago
Here, you only had one categorical column. What if you have multiple categorical columns, how do you go about with the missing value treatment in that case?
@MegaJaivardhan 4 years ago
Love you bro.. could you make a video on AUC and ROC curve?
@kumarraju2923 4 years ago
How are the initial clusters selected for missing values?
@amitjajoo9510 4 years ago
Sir, thanks for making the feature engineering playlist.
@anuragmishra6262 4 years ago
Can you please show practical implementation of the same. Thanks 😊
@tahamansoor599 4 years ago
It's great. It would be better if you showed us a hands-on example with a dataset.
@clivefernandes5435 4 years ago
Is method 3 widely used? Never heard of it.
@jaiminshah143 3 years ago
How to handle missing (NaN) values in a column having binary data values, i.e. just 0 or 1?
@madunishant6052 4 years ago
Thanks! 😊
@Analystmind 1 year ago
What if my model's missing values are not categorical but numeric?
@RAJI11000 4 years ago
Sir, how can we impute if the feature value is like 100 mbps?
@chirumadderla8129 2 years ago
If there are several missing values in the solar radiation data during the night and early morning hours, how do I handle them? The dataset I considered covers one year.
@bismeetsingh352 4 years ago
What do you do when you have missing values in textual data?
@sachinborgave8094 4 years ago
Excellent sir, can you please provide Python source code, i.e. how to fill missing category data using logistic regression?
@CheeseKransky12 4 years ago
Thanks Krish
@RajaKumar-ne9bt 2 years ago
Why are we skipping the output when doing clustering?
@AmitYadav-ig8yt 4 years ago
Just a request: could you please upload the code for this as well? I saw that code is missing for the techniques in many videos. It will be very helpful if you provide us the code. Thanks a lot.
@nasiksami2351 3 years ago
Amazing!
@sachinborgave8094 4 years ago
Hello sir... Please make a video on how to fill missing categories using logistic regression...
@AutitsicDysexlia 3 years ago
This is what I did in DAX, but I did it in a more complex way... because I was using DAX. But it's effectively a RandomForest method that I used.
@hindajjouri9151 6 months ago
thank you
@VikasSharma-ye7pu 4 years ago
Hi Krish... Please make a video explaining 2 Kaggle competition projects...
@akshayvilayatkar7985 4 years ago
How can we handle alphanumeric missing values in a dataset? I cannot get out of this problem. Please help, Krish.
@sadikbilal5149 2 years ago
Nice. Please, do you have code to implement these techniques?
@ele_wings7521 4 years ago
thank you sir...
@Raja-tt4ll 4 years ago
very nice video
@RK-un6ou 3 years ago
Why do we fill NaN values with the mean or median? And why won't it affect the dataset? Can you explain this a bit?
@ashokpalivela311 4 years ago
thank you😍
@chinmaybhat9636 4 years ago
Can you share the same thing by taking one dataset and showcasing it?
@RishikeshGangaDarshan 3 years ago
How to handle this in a regression problem?
@sriraj8392 2 years ago
Sir, will you teach offline classes...?
@saurabhpathare4157 3 years ago
I am always reluctant to delete or use mode for categorical values. This video explains a lot. Good approach! In technique 3, which classifier do you recommend for best efficiency?
@riteshmukhopadhyay6922 2 years ago
KNN. There is no particular way as such; it depends on the dataset.
@192Kiran 4 years ago
Krish, could you please do this with datasets?
@sandyjust 4 years ago
Great explanation of the concept. With the unsupervised technique we might be in a situation where both male and female fall under group 2. Then what would our approach be?
@kaustabhmandal7483 4 years ago
I have also observed that in this video. You can assign the category with the maximum frequency in that cluster.
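The cluster-then-majority idea from this reply can be sketched roughly as follows. This is only an illustration under assumptions: the toy data, the column names (`Age`, `Salary`, `Gender`), the choice of `KMeans` and the number of clusters are all made up here and are not taken from the video.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups, Gender missing in rows 6 and 7.
df = pd.DataFrame({
    "Age":    [21, 23, 22, 60, 62, 61, 24, 59],
    "Salary": [20, 22, 21, 90, 95, 92, 23, 91],
    "Gender": ["Male", "Male", "Female", "Female", "Female", "Male", np.nan, np.nan],
})

# Cluster on the complete input features only (Gender is not used here).
feature_cols = ["Age", "Salary"]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(df[feature_cols])

# Most frequent known Gender inside each cluster.
cluster_mode = (
    df.dropna(subset=["Gender"])
      .groupby("cluster")["Gender"]
      .agg(lambda s: s.mode().iloc[0])
)

# Fill each missing Gender with the majority category of its cluster.
df["Gender"] = df["Gender"].fillna(df["cluster"].map(cluster_mode))
print(df)
```

When a cluster contains both categories, as in the question above, the majority vote simply picks the more frequent one; a tie would fall back to whichever value `mode()` lists first.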
@shaikhkashif9973 1 year ago
Sir, should we handle outliers first or fill the null values first? Please answer.
@pankajkar2008 4 years ago
pure concepts
@1a17890 4 months ago
Sirji, can you kindly show how it's done?
@napoleonx5259 1 year ago
Well done, Krishna ❤
@sandipansarkar9211 2 years ago
finished watching
@ashwinkrishnan4285 3 years ago
If we apply a classifier algorithm to predict whether the Gender feature is male or female using the other features, including the output feature, on the training dataset, and use it to fill the missing values of the Gender feature (the test dataset), then when we finally build the model to predict the output, won't it be influenced? Wouldn't data leakage have happened, since we used the output to fill the missing column values? Please clarify this point, Krish.
@chirathabey7729 3 years ago
It won't matter as much, because even though we are training with the output feature included, it is used ONLY for predicting the missing samples, and there are far fewer missing samples than other samples. If the proportion of missing samples is considerably high and they occur in many other features, then it will certainly create a bias in the final prediction.
@AmitYadav-ig8yt 4 years ago
Sir, you took a dataset which has missing values in just one column. You talked about predicting the missing values by using the other columns as the training set. Let's say we have a dataset in which every column has some missing values. In such a case, which columns should be used to predict the missing values?
@kannadarecipes-6626 4 years ago
Following
@habilmohammed5127 4 years ago
Following
@leilafakhraei78 4 years ago
Following
@barnadipdey8486 4 years ago
Yes Amit, I have the same query. If you have solved this, please DM me.
@mohammadarif8057 4 years ago
Sir, can you provide a practical approach with a complex dataset? That would be great, thank you.
@janinajochim1843 4 years ago
Thank you for the video! Would you happen to know what to do in cases where the value is "missing by design"? I have a case where I am using the variable "Father's reaction to pregnancy". It has missing values for participants who did not know the father of the child, because they didn't get this question :/
@sawradipsaha5377 4 years ago
Maybe you can consider that as a different category.
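A minimal sketch of treating "missing" as its own category, assuming the data sits in a pandas DataFrame. The column name `father_reaction` and the label `"Not asked"` are hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical survey column where NaN means the question was never asked.
df = pd.DataFrame({
    "father_reaction": ["Happy", np.nan, "Upset", np.nan, "Happy"],
})

# Replace NaN with an explicit label so it becomes a separate category.
df["father_reaction"] = df["father_reaction"].fillna("Not asked")

counts = df["father_reaction"].value_counts()
print(counts)
```

The same call would also cover the earlier question about assigning a placeholder such as `"U"` to missing categorical values.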
@AmitYadav-ig8yt 4 years ago
You said to create a classifier to predict the missing values. What should we do if we have a linear regression problem with missing values there? Should we create a classifier for that too? Please respond.
@chirathabey7729 3 years ago
Yes, if the missing value you are trying to predict belongs to a categorical variable. When you are predicting missing values, the variable with the missing values becomes your output variable and the rest of the variables become the input variables. You can think of it as solving an entirely independent problem.
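Several comments ask for code for this approach, so here is a rough sketch of the idea described in the reply: rows where the categorical column is known form the training set, and rows where it is missing are predicted. The toy data, the column names and the choice of `RandomForestClassifier` are assumptions made for illustration; the thread does not name a specific model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data with Gender missing in two rows.
df = pd.DataFrame({
    "Age":    [22, 35, 47, 51, 29, 40, 33, 58],
    "Salary": [30, 52, 75, 80, 41, 66, 50, 90],
    "Gender": ["Male", "Female", "Male", np.nan, "Female", "Male", np.nan, "Female"],
})

feature_cols = ["Age", "Salary"]

# Rows with a known Gender are the training set; rows without are to be predicted.
known = df[df["Gender"].notna()]
unknown = df[df["Gender"].isna()]

# The column with missing values is treated as the target of its own small problem.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(known[feature_cols], known["Gender"])

# Write the predictions back into the missing positions.
df.loc[unknown.index, "Gender"] = clf.predict(unknown[feature_cols])
print(df)
```

With so few rows the predictions themselves mean little; the point is only the split into known and unknown rows and the fit/predict/fill sequence.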
@dineshkumar-kc7vt 4 years ago
I'm unable to overcome this problem. I first applied get_dummies to the dataset and then wanted to handle the missing values, but I'm getting this error: TypeError: '(slice(None, None, None), slice(0, 2, None))' is an invalid key. Please help me.
@chirathabey7729 3 years ago
Before you apply One-Hot-Encoding, do the missing value treatment first.
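The ordering suggested in this reply might look like the sketch below, assuming pandas. The toy columns and the use of the most frequent category as the fill value are illustrative assumptions, and the sketch does not try to reproduce or diagnose the exact `TypeError` quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing category.
df = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female", "Male"],
    "Salary": [50000, 62000, 58000, 61000],
})

# Step 1: treat the missing values first (here: fill with the most frequent category).
most_frequent = df["Gender"].mode()[0]
df["Gender"] = df["Gender"].fillna(most_frequent)

# Step 2: one-hot encode only after no NaN remains in the column.
encoded = pd.get_dummies(df, columns=["Gender"])
print(encoded)
```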
@Justme-dk7vm 2 months ago
Sir why do you have the same voice as my college chairman? 😩💓
@analistaremoto 3 years ago
Niiiiiice!
@junaidlatif2881 1 year ago
But how to apply!
@archanapereira1333 4 years ago
How to identify dependent and independent variables in a dataset?
@chirathabey7729 3 years ago
It depends on the problem description, which describes what the problem is. Your output variable / dependent variable is the one that answers your problem. The rest of the features become your independent variables.
@shivambhayre5056 4 years ago
If it is a quantitative variable, we can replace missing values with the mean
@AmitYadav-ig8yt 4 years ago
Is it a question? If yes, then yep, you can use the mean to replace quantitative missing values.
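For a numeric column, mean replacement could be sketched like this, assuming pandas; the `Age` column and its values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing entry.
df = pd.DataFrame({"Age": [20.0, np.nan, 30.0, 40.0]})

# mean() skips NaN by default, so this is the mean of the observed values.
mean_age = df["Age"].mean()

# Replace the missing entry with that mean.
df["Age"] = df["Age"].fillna(mean_age)
print(df)
```

Swapping `mean()` for `median()` gives the median variant mentioned elsewhere in the thread, which is less sensitive to outliers.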
@vasusharma1773 4 years ago
Sir, if you could just show this in code, it would be very helpful.
@cutyoopsmoments2800 4 years ago
Bro I want to make my career in Machine Learning. Kindly guide...
@jaypatil4786 4 years ago
I have one easy question, but I don't remember the answer now: please tell me how to view how many missing values there are in a dataset.
@saravananm2280 4 years ago
dataset.isnull().sum()
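A small runnable illustration of the one-liner above, assuming `dataset` is a pandas DataFrame. The toy columns are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
dataset = pd.DataFrame({
    "Gender": ["Male", np.nan, "Female", np.nan],
    "Age": [25, 32, np.nan, 41],
    "City": ["Pune", "Delhi", "Mumbai", "Pune"],
})

# Number of missing values per column.
missing_per_column = dataset.isnull().sum()

# Total number of missing values in the whole DataFrame.
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing:", total_missing)
```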
@nhprml6324 4 years ago
We can replace missing values with the corresponding feature's mean value.
@arjyabasu1311 4 years ago
Sir please upload the implementation of these methods !!
@harshtiwari8765 4 years ago
Can you send me the notes for feature engineering which were given by Krish Naik? Help is appreciated.
@martinlyuba5105 10 months ago
Great tutorial. Your email please.