Handling Imbalanced Dataset in Machine Learning: Easy Explanation for Data Science Interviews

19,667 views

Emma Ding


Imbalanced data is one of the most common machine learning problems you’ll come across in data science interviews. In this video, I cover what an imbalanced dataset is, what problems it causes, and how to deal with it when the minority class makes up only 1% of the data.
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 Introduction
01:20 Interview Questions
01:38 Imbalanced Data
03:15 Why it causes problems?
04:27 How to deal with imbalanced data?
08:13 Model-level methods
11:33 Evaluation Metrics
13:25 Outro
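
A minimal sketch of one commonly cited reason imbalance causes trouble, assuming a hypothetical toy dataset of 1,000 labels with a 1% minority class. The data, the always-majority "model", and the helper names are illustrative assumptions and are not taken from the video. It shows how accuracy can look excellent while precision and recall reveal that the minority class is never detected.

```python
# Hypothetical toy data: 1,000 labels, 1% of which are the minority (positive) class.
labels = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the majority (negative) class.
predictions = [0] * len(labels)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (no positive predictions)."""
    return numerator / denominator if denominator else 0.0


tp, fp, fn, tn = confusion_counts(labels, predictions)
accuracy = (tp + tn) / len(labels)
precision = safe_div(tp, tp + fp)
recall = safe_div(tp, tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Under these assumptions the baseline gets 990 of 1,000 labels right, yet it finds none of the 10 minority examples, which is why metrics other than accuracy are usually reported for rare-event problems.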

Comments: 39
@AnkurSingh-mk9rc 1 year ago
Thanks Emma, these short videos come in handy when preparing for interviews.
@elonchan9675 1 year ago
Hi Emma, this is a really good summary video on imbalanced datasets. Thank you and keep up the good work!
@dle3528 1 year ago
This video is amazing. It was easy to understand and summarized different possibilities for dealing with unbalanced data. Congratulations! Keep helping people. I am very grateful for your explanation!
@ayambavictorndoma5672 5 months ago
I enjoyed this video. Thanks for this Emma
@ankgup87 2 months ago
This video helped me clear an interview. Subscribed. Thank you.
@spikeydude114 1 year ago
Great topic! Thanks for covering
@user-qr4pi4ow7b 4 months ago
Emma, great explanation and to the point.
@psg9278 4 months ago
Best video on ML. I understood it very clearly. Thank you!
@Itsdanielpeng 1 year ago
This is really helpful. Thank you so much for putting out these videos!
@emma_ding 1 year ago
So glad you find them helpful, Daniel! Thanks for watching. 😊
@emma_ding 1 year ago
Many of you have asked me to share my presentation notes, and now… I have them for you! Download all the PDFs of my Notion pages at www.emmading.com/get-all-my-free-resources. Enjoy!
@jerrywang1550 1 year ago
Is it possible to share your Notion file? Thank you.
@emma_ding 1 year ago
@jerrywang1550 You can download all the PDFs of my Notion pages at emmading.com/resources by navigating to the individual posts. Enjoy!
@jerrywang1550 1 year ago
@emma_ding I mean your Notion files, not the PDFs. Thank you.
@SonuKumar-gt5xs 1 year ago
Hi Emma, these videos are really good. Can you make a video on time series analysis?
@machinelearning6817 1 year ago
Subscribed!!
@efeincir3254 9 months ago
Wonderful!
@ATN_AI 9 months ago
Hi! Is there a way you can share this Notion document? Thank you!! Great content.
@sanyam5685 1 year ago
Thanks Emma, can we also have a series of videos on deploying ML models in production?
@emma_ding 1 year ago
Thanks for your comment, Sanyam! 😊 I've added your idea to my list of content suggestions.
@sambidpradhan32 1 year ago
Hey Emma, big fan of your work 😀. I'm looking for a series on model deployment, if you can cover things like processing (batch/stream), serving (batch/real-time), and learning (offline/online) in production. Sorry if it is a big ask 🥲
@emma_ding 1 year ago
Thanks for your comment! I've added your suggestions to my list of content ideas. 😊
@michaeldarmanis8477 3 months ago
In my view, imbalanced data does not pose a problem. In classification one ought to model the class membership probabilities, and these may be small. As long as they are correct, there is no problem. One should, of course, use proper scoring rules (i.e. not accuracy) to optimize the classifier. Tetlock's Superforecasting serves as a wonderful and very readable introduction to predicting unbalanced classes.
@thedislikebutton163 1 year ago
Check out this paper on Gumbel loss/activation for the LVIS long-tailed dataset. It's an interesting method for imbalanced datasets.
@shilashm5691 1 year ago
Paper link?
@kaikapioka9711 9 months ago
?
@jasonswift7468 1 year ago
Hi Emma. Could you talk about ChatGPT (including its model, dataset, algorithms, system design, etc.) in your next video? Thank you.
@emma_ding 1 year ago
Thanks for your comment! 😊 I've added your idea to my list of content suggestions.
@Aria-ow4cl 1 year ago
Hi, Emma! Thanks for sharing. Very helpful materials. But I ran into a problem when downloading the presentation notes: the notes for the imbalanced dataset video seem to be missing. When I click the imbalanced dataset notes, it actually opens the notes for encoding categorical data. Could you please help with this?
@emma_ding 1 year ago
Thank you so much for letting me know! I apologize for the mix-up, and have corrected the issue. Thanks for your patience. 💛
@kevinpoisson713 1 year ago
In the ‘why imbalance is important’ part, the accuracy issue for a model predicting rare events can be solved by relying on other evaluation metrics such as precision and recall, isn’t that right? That doesn't explain the why.
@mihretdesta9153 1 year ago
Hey Emma, please send me the code for imbalanced image datasets.
@srhrsh100 5 months ago
You are just reading the text written in the book. Try to explain with examples and in further detail, beyond what is already mentioned in the book.
@pacesferry 1 year ago
Hi, audio clipping detected.
@DSlayer007 11 months ago
Is 75:25 an imbalanced dataset?
@pixelmasque 1 year ago
A gorgeous ML scientist
@mihretdesta9153 11 months ago
Please reply to me.
@zbynekba 17 days ago
Your content is good, but your strong accent needs improvement.
@mohamedsamy2895 1 month ago
So bad