Dealing with Imbalanced Datasets in ML Classification Problems | DataHour by Damini Dasgupta

  5,475 views

Analytics Vidhya



Comments: 13
@alexilaiho6441 3 months ago
The talk fails to cover how to deal with imbalanced datasets using SMOTE, and also how to use focal loss for neural nets.
@Analyticsvidhya 3 months ago
DataHour Resources 🔗 bit.ly/3xUaue4
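Since the comment above names SMOTE and focal loss without showing them, here is a rough sketch of both (not code from the talk; `smote` and `binary_focal_loss` are hypothetical helper names, written in plain Python so it runs without extra packages). SMOTE creates a synthetic minority point as x_new = x_i + λ(x_nn − x_i), where λ is drawn uniformly from [0, 1) and x_nn is one of the k nearest minority-class neighbours of x_i. Focal loss replaces cross-entropy with FL(p_t) = −α_t (1 − p_t)^γ log(p_t), which shrinks the loss on easy, confidently correct examples so that hard (often minority-class) examples carry more weight during training.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: create n_new synthetic points from minority-class vectors only.

    Each new point lies on the segment between a randomly chosen minority point
    and one of its k nearest minority neighbours (Euclidean distance).
    Needs at least two minority points.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    y_true: 0/1 labels; p_pred: predicted probabilities of class 1.
    gamma=0 with alpha=0.5 gives half the ordinary binary cross-entropy.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1 - p
        alpha_t = alpha if y == 1 else 1 - alpha
        total += -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# Usage on toy data
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_points = smote(minority, n_new=4, k=2)
easy = binary_focal_loss([1], [0.95])  # confident, correct prediction
hard = binary_focal_loss([1], [0.30])  # poor prediction
print(easy < hard)  # True
```

In practice you would more likely use a maintained implementation, such as `SMOTE` from the imbalanced-learn package, and your deep learning framework's loss utilities; the sketch is only meant to show what each technique computes. As with any resampling, SMOTE should be applied to the training split only.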
@shreyashimukhopadhyay6354 1 year ago
Great tutorial on imbalanced data 👍 Analytics Vidhya, can you please share the notebook?
@Analyticsvidhya 1 year ago
Dear Shreyashi, here's the download link: drive.google.com/drive/folders/1KV-CNZmuy8sqYDc1Jri-hCvYCeHSbYxs?usp=sharing
@shreyashimukhopadhyay6354 1 year ago
@@Analyticsvidhya Thank you so much! 👍
@younesgasmi8518 8 months ago
Thank you so much, miss. My question is: can I use the undersampling technique before splitting the dataset into training and testing sets, since there isn't any data leakage when we use this method?
@Analyticsvidhya 8 months ago
Good question, but undersampling should still be done after splitting, and only on the training set. Random undersampling adds no synthetic points, yet resampling before the split causes other problems. Here's why:
➡️ Unrealistic test set: If you undersample before splitting, the test set ends up balanced too, so it no longer reflects the real-world class distribution, and metrics that depend on the class ratio, such as precision and accuracy, will not match what the model achieves on real data.
➡️ Hidden leakage with informed methods: Techniques such as Tomek links or NearMiss decide what to drop based on neighbouring points. Applied before the split, they also "clean" the future test set, which makes it easier than real data and inflates the scores.
➡️ Where resampling belongs: Change the class ratio in the training set only; the test set should keep the actual imbalance so the evaluation stays honest.
Also keep in mind that undersampling discards majority-class examples, which can throw away useful information. It's always a good idea to compare it with oversampling or SMOTE before making a final decision.
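The usual safe order is to split first and resample only the training portion, so the test set keeps the real class ratio. Below is a minimal sketch of that order in plain Python; `stratified_split` and `random_undersample` are hypothetical helpers written for illustration, and the 900/100 toy dataset is made up.

```python
import random


def stratified_split(X, y, test_frac=0.25, seed=0):
    """Split rows into train/test, keeping each class's share the same in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    def pick(ids):
        return [X[i] for i in ids], [y[i] for i in ids]

    return pick(train_idx), pick(test_idx)


def random_undersample(X, y, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Toy data: 900 majority rows (label 0) and 100 minority rows (label 1).
X = [[float(i)] for i in range(1000)]
y = [0] * 900 + [1] * 100

# 1) Split first, 2) resample the training part only, 3) leave the test set alone.
(X_train, y_train), (X_test, y_test) = stratified_split(X, y, test_frac=0.25)
X_train_bal, y_train_bal = random_undersample(X_train, y_train)

print(y_test.count(0), y_test.count(1))            # 225 25 (real 9:1 ratio kept)
print(y_train_bal.count(0), y_train_bal.count(1))  # 75 75 (balanced for training)
```

With scikit-learn and imbalanced-learn, the same order would typically be `train_test_split(..., stratify=y)` followed by `RandomUnderSampler().fit_resample(X_train, y_train)`; imbalanced-learn's `Pipeline` also applies samplers only during fitting, which keeps cross-validation folds clean.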
@gecarter53 1 year ago
Great seminar. Is the code publicly available?
@Analyticsvidhya 1 year ago
Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/
@chukwumanwakpa3330 1 year ago
Thank you so much, Analytics Vidhya. I must commend you all for my improvement in deploying ML algorithms to solve problems. Just one observation: I think it would be better if we could get the slides and the datasets used in all lectures. Thank you. Much love from Nigeria.
@Analyticsvidhya 1 year ago
Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/
@therevolution8611 1 year ago
Can I use oversampling if I have multi-label text for classification?
@Analyticsvidhya 1 year ago
Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/
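On the multi-label oversampling question: one simple option, assuming each document's labels are stored as a set, is random oversampling that duplicates documents carrying rare labels until each label reaches a target count. The helper `oversample_multilabel` below is hypothetical and only a sketch; note that duplicating a document also raises the counts of every other label it carries, so common labels that co-occur with rare ones can overshoot the target. SMOTE-style interpolation does not apply to raw text directly, since it needs numeric feature vectors. As elsewhere, this should be run on the training split only.

```python
import random
from collections import Counter


def oversample_multilabel(texts, labels, target=None, seed=0):
    """Duplicate documents that carry rare labels (simple random oversampling).

    texts:  list of documents.
    labels: list of label sets, one per document.
    target: desired minimum count per label; defaults to the most frequent label's count.
    """
    rng = random.Random(seed)
    texts, labels = list(texts), [set(ls) for ls in labels]
    counts = Counter(lab for ls in labels for lab in ls)
    if target is None:
        target = max(counts.values())
    # Visit labels from rarest to most common.
    for lab, _ in sorted(counts.items(), key=lambda kv: kv[1]):
        carriers = [i for i, ls in enumerate(labels) if lab in ls]
        while counts[lab] < target:
            i = rng.choice(carriers)
            texts.append(texts[i])
            labels.append(set(labels[i]))
            counts.update(labels[i])  # every label of the copy goes up by one
    return texts, labels


docs = ["refund please", "app crashes", "love it", "crash after refund"]
tags = [{"billing"}, {"bug"}, {"praise"}, {"bug", "billing"}]
new_docs, new_tags = oversample_multilabel(docs, tags)
print(len(new_docs), new_docs[-1])  # 5 love it
```

Exact duplicates can make a text model overfit to the repeated documents, so it is worth checking results against the un-resampled baseline on a held-out set.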