The talk doesn't cover how to deal with imbalanced datasets using SMOTE, or how to use Focal Loss for neural nets.
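For anyone looking for the focal-loss part, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), written for a single prediction in plain Python. The defaults α = 0.25 and γ = 2 are commonly used values and are assumptions here. In a real neural net you would apply the same formula to tensors in your framework. The factor (1 - p_t)^γ shrinks the loss on easy, well-classified examples (mostly the majority class), so training focuses on the hard ones.

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one prediction: -alpha_t * (1 - p_t)**gamma * log(p_t).
    p is the predicted probability of the positive class; y is 0 or 1."""
    p = min(max(p, eps), 1 - eps)              # clamp to avoid log(0)
    p_t = p if y == 1 else 1 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha   # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

def binary_cross_entropy(p, y, eps=1e-7):
    """Plain cross-entropy, for comparison."""
    p = min(max(p, eps), 1 - eps)
    return -math.log(p if y == 1 else 1 - p)

# An easy, well-classified negative vs. a hard, misclassified positive
easy = binary_focal_loss(0.05, 0)
hard = binary_focal_loss(0.10, 1)
print("focal:", easy, hard)
print("bce:  ", binary_cross_entropy(0.05, 0), binary_cross_entropy(0.10, 1))
```

With gamma=0 and alpha=0.5 the focal loss is just half the cross-entropy, so γ is what controls how strongly easy examples are down-weighted.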
@Analyticsvidhya 3 months ago
DataHour Resources 🔗 bit.ly/3xUaue4
@shreyashimukhopadhyay6354 1 year ago
Great tutorial on imbalanced data 👍 Analytics Vidhya, can you please share the notebook?
@Analyticsvidhya 1 year ago
Dear Shreyashi, here's the download link: drive.google.com/drive/folders/1KV-CNZmuy8sqYDc1Jri-hCvYCeHSbYxs?usp=sharing
@shreyashimukhopadhyay6354 1 year ago
@@Analyticsvidhya Thank you so much! 👍
@younesgasmi8518 8 months ago
Thank you so much, miss. My question is: can I use undersampling before splitting the dataset into training and testing sets, since there isn't any data leakage when we use this method?
@Analyticsvidhya 8 months ago
Good question! In most cases, though, undersampling should be done after splitting, and only on the training set. Here's why: ➡️ Evaluation: If you undersample before the split, the test set is also artificially balanced, so metrics such as accuracy and precision no longer reflect the class imbalance your model will face on real-world data. ➡️ Data leakage: You're right that random undersampling only drops majority-class rows, so on its own it doesn't leak information the way SMOTE or oversampling can when applied before the split (synthetic or duplicated points derived from test rows end up in training). Even so, applying all resampling to the training set only (and within each cross-validation fold) is the safest habit. ➡️ Drawbacks: Undersampling discards potentially valuable majority-class data. It's always a good idea to compare it with other techniques such as oversampling or SMOTE before making a final decision.
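A minimal sketch of the "split first, then undersample only the training set" workflow, in plain Python on hypothetical toy data (900 negatives, 100 positives). The helper names `stratified_split` and `random_undersample` are illustrative, not from the talk's notebook. In practice, scikit-learn and imbalanced-learn provide tested versions of both steps.

```python
import random
from collections import Counter

def stratified_split(X, y, test_frac=0.25, seed=0):
    """Hypothetical helper: split each class separately so train and test keep the original ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy data: 900 negatives (0), 100 positives (1)
X = list(range(1000))
y = [0] * 900 + [1] * 100

# 1) Split first, so the test set keeps the real-world 9:1 ratio
X_tr, y_tr, X_te, y_te = stratified_split(X, y, test_frac=0.25)
# 2) Undersample the training set only
X_tr_bal, y_tr_bal = random_undersample(X_tr, y_tr)

print("test:", Counter(y_te))
print("train (balanced):", Counter(y_tr_bal))
```

The model is then trained on the balanced training data but evaluated on a test set that still looks like the data it will see in production.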
@gecarter53 1 year ago
Great seminar. Is the code publicly available?
@Analyticsvidhya 1 year ago
Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/
@chukwumanwakpa3330 1 year ago
Thank you so much, Analytics Vidhya. I must commend you all for my improvement in deploying ML algorithms to solve problems. Just one observation: I think it would be better if we could get the slides and all the datasets used in every lecture. Thank you. Much love from Nigeria.
@Analyticsvidhya 1 year ago
Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/
@therevolution8611 1 year ago
Can I use oversampling if I have multi-label text for classification?
@Analyticsvidhya 1 year ago
Dear Learner, Refer to our DataHack Platform for DataHour Material and Speaker Coordinates: datahack.analyticsvidhya.com/contest/all/