Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews

Emma Ding

Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.
Feature hashing
en.wikipedia.org/wiki/Feature...
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing
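
The three encodings named in the chapter list can be sketched roughly as follows. This is a minimal, standard-library-only illustration of the usual textbook definitions, not code from the video: the helper names (`one_hot`, `dummy`, `feature_hash`), the color levels, the bucket count of 8, and the choice of MD5 as the hash are all assumptions made for the example.

```python
import hashlib

def one_hot(value, levels):
    """One indicator per level: k levels -> k columns."""
    return [1 if value == level else 0 for level in levels]

def dummy(value, levels):
    """Drop the first level as the baseline: k levels -> k - 1 columns."""
    return [1 if value == level else 0 for level in levels[1:]]

def hash_bucket(value, n_buckets):
    """Map a category to a bucket index with a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def feature_hash(value, n_buckets):
    """Hashing trick: the vector width stays n_buckets however many levels exist."""
    vec = [0] * n_buckets
    vec[hash_bucket(value, n_buckets)] = 1
    return vec

levels = ["red", "green", "blue"]   # hypothetical example levels
print(one_hot("green", levels))     # [0, 1, 0]
print(dummy("green", levels))       # [1, 0]
print(dummy("red", levels))         # [0, 0], the baseline level is all zeros
print(feature_hash("green", 8))     # one 1 in a fixed-width vector of length 8
```

With very many levels, one-hot and dummy encoding grow one column per level, while the hashed vector keeps a fixed width at the cost of possible collisions (two levels landing in the same bucket).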

Comments: 16
@linghaoyi 1 year ago
Thank you. Merry Christmas and Happy New Year!
@emma_ding 1 year ago
Many of you have asked me to share my presentation notes, and now… I have them for you! Download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!
@junlizhou7167 1 year ago
Thanks for the informative video, Emma! Love the Notion notes you created.
@emma_ding 1 year ago
So glad you enjoyed it! Thank you for watching. 😊
@qingxiawang161 1 year ago
Hi Emma, thank you very much for the informative video. I really learned a lot from it! Keep up the good work ❤
@hsuya3925 1 year ago
Hi Emma, very informative video. Thanks for working on all these types of videos and sharing them with us. I wanted to know: is your Notion page public? Or could you share it, if possible?
@Dr_Hermit 1 year ago
I have been waiting for these as well. :)
@emma_ding 1 year ago
Of course! I'm working on getting all the notes organized and shareable in one location, and I will let you know as soon as they are ready! :)
@emma_ding 1 year ago
@sukumargv @hsuya3925 Here you go! You can now download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!
@nitishjambhurkar7990 1 year ago
Hi Emma, thank you so much for this insight. In addition, I'd like to know how to handle very large datasets. I was asked about this in an interview and was unable to answer correctly. How would you handle very large datasets, and what steps would you take to load them? It would be great if you could make a video on this topic.
@jet3111 1 year ago
Hi Emma, thank you for the very informative video. It would be great to discuss embedding methods for handling categorical data.
@emma_ding 1 year ago
Great suggestion! I've added it to my list of content ideas. 😊 Thanks for watching!
@saudiorchestra6443 9 months ago
How do we deal with a category that appears for the first time in the test data? For example, in the training data I have a column for jobs that contains Doctor, Nurse, Lab technician, and Administrator, and I used one-hot encoding for it. What if the test data has an additional job, Surgeon? How do we handle this situation?
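
One common option for the situation described in this comment is to fix the set of levels at training time and encode any unseen category as an all-zero row; scikit-learn's `OneHotEncoder(handle_unknown="ignore")` behaves this way. The sketch below illustrates the idea in plain Python using the jobs from the comment. The helper names `fit_levels` and `encode` are hypothetical, and this is one possible approach, not an answer given in the video.

```python
def fit_levels(train_values):
    """Record the distinct categories seen during training, in sorted order."""
    return sorted(set(train_values))

def encode(value, levels):
    """One-hot encode; a category not seen in training becomes an all-zero row."""
    return [1 if value == level else 0 for level in levels]

train_jobs = ["Doctor", "Nurse", "Lab technician", "Administrator"]
levels = fit_levels(train_jobs)

print(levels)                     # ['Administrator', 'Doctor', 'Lab technician', 'Nurse']
print(encode("Nurse", levels))    # [0, 0, 0, 1]
print(encode("Surgeon", levels))  # [0, 0, 0, 0], unseen in training
```

The all-zero row keeps the feature width identical between training and test data, though the model then has no signal for the new category. Alternatives include mapping rare or unseen values to an explicit "Other" level, or using feature hashing, which needs no fixed vocabulary.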
@rakeshkumarsharma2250 1 year ago
How do I convert a pincode/postal code?
@sruthimallarapu7662 1 year ago
Hi Emma, can decision trees handle string categorical values (for example, a "gender" column that takes "M" or "F")? Is it not necessary to convert the strings to numeric values?
@georgezevallos 5 months ago
All ML algorithms require converting the strings into numerical values. Even NLP does it. Hope it helps.