5 ways to work with imbalanced data | Imbalanced dataset machine learning | Imbalanced data

Views: 18,557

Unfold Data Science

A day ago

Comments: 46
@enchanted_swiftie 2 years ago
I ran into the same imbalance problem in my dataset and researched the different methods available. Here is the shortlist I put together, which might help you somewhere.
Possible solutions:
1. Make some changes in the algorithm
• Adjust the class weights so the model becomes more sensitive to the minority class
• Adjust the decision threshold (we can check it with the PR curve)
• Penalize the algorithm by setting class_weight='balanced'
2. Set the minority examples aside and treat the remaining data as a single class
• Here we can treat the problem as an "anomaly detection" problem instead of classification. For anomaly detection, "Isolation Forest" tends to give promising results
3. Balance the dataset by sampling
• Undersample
• Oversample & SMOTE
4. Ensemble learning by downsampling
• It bootstraps different samples, each time balancing the classes by undersampling the majority class, and then aggregates the results by voting
5. Use other techniques
• Algorithms such as Tomek links (which remove the majority-class example from each pair of opposite-class nearest neighbours, to sharpen the class boundary)
• Focal loss
I also looked through Kaggle notebooks, where people have found that XGBoost slightly outperforms other algorithms, even though it still requires setting class weights.
- This was my cheat sheet of the 5 ways. Share your thoughts!!
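A few of the ideas in this shortlist can be sketched in plain Python. This is only an illustration under stated assumptions: the function names (`balanced_class_weights`, `random_oversample`, `random_undersample`) and the toy data are made up for this example and do not come from the video or its notebook. The weight formula, n_samples / (n_classes * class_count), is the one scikit-learn documents for class_weight='balanced'.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 majority rows (class 0) and 2 minority rows (class 1).
labels = [0] * 8 + [1] * 2
rows = [[float(i)] for i in range(10)]

weights = balanced_class_weights(labels)        # {0: 0.625, 1: 2.5}
over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
```

Random oversampling only repeats existing rows, whereas SMOTE synthesizes new minority points by interpolating between neighbours; the sketch above shows the simpler variant so that it stays dependency-free.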
@UnfoldDataScience 2 years ago
Very good explanation, and thanks for putting the learning here. I will pin this comment on top for others' benefit. My view: Data Science is all about trying, experimenting, failing and learning. Then something very good comes up.
@enchanted_swiftie 2 years ago
@@UnfoldDataScience Won't lie, when I started watching your videos, your explanations made things much simpler. You know, I used to freak out (sorry for the words) on hearing about DBSCAN, Hierarchical Clustering and what not, but when I see those topics explained by you I feel comfortable that I will now understand them. You explain so simply but accurately, without missing the important things. PS: I was introduced to the assumptions of linear regression by your channel. Before that I knew the model, and only then came to know that there is something called "assumptions" and how important they are!! Totally missed by the instructions on online courses! Your channel is a huge contribution to the data science community on YT.
@KastijitBabar 3 months ago
You are the best Data Science And Machine Learning Teacher I have ever seen. Thanks a lot!!
@UnfoldDataScience 3 months ago
You are welcome!
@sreebvmcreation9388 A month ago
Thank you sir, I was searching for methods for imbalanced data and finally found them in your video. Thank you so much once again. Of all the methods, which one is the best?
@karthebans248 2 years ago
Learned new things about the balancing of data sets for Imbalanced data sets. Thanks.
@UnfoldDataScience 2 years ago
Welcome.
@nivednambiar6845 2 years ago
An important concept when dealing with classification. Thanks for sharing, Aman 👍👍
@UnfoldDataScience 2 years ago
Thanks Nived.
@zahedinima732 2 years ago
Such a clear and concise explanation. Thank you, Aman!
@UnfoldDataScience 2 years ago
Thanks a lot.
@mamataparab9803 2 years ago
Hello Aman, this is the third time I have watched this video, simply to learn your way of explaining things. Is it possible for you to create a video or give us some notes so we can find all the important questions for ensembling techniques?
@UnfoldDataScience 2 years ago
Thanks Mamata, I do keep sharing on Instagram. Please follow "unfolddatascience" on Instagram.
@mamataparab9803 2 years ago
Sure, Aman. Thank you
@atod2572 A year ago
Awesome explanation. Can you please tell us when to use which technique? I mean with an example dataset and the choice of sampling technique for it.
@bijaynayak6473 2 years ago
Very nice explanation, kudos.
@UnfoldDataScience 2 years ago
Thanks for liking, Bijay.
@ayushparihar5989 A year ago
Good explanation
@swapnilgiram1355 3 days ago
Can we use the SMOTE technique?
@NeeRaja_Sweet_Home 2 years ago
Hi Aman, most videos show imbalanced datasets for classification problems, but how do we check for and handle an imbalanced dataset in a regression problem? Thanks,
@riva.4484 A year ago
Thank you so much! This video helped me a lot. I have a question: how can we decide which way is the best fit for our imbalanced dataset?
@UnfoldDataScience A year ago
It's always trial and error.
@younesgasmi8518 8 months ago
Can I use oversampling or undersampling before splitting the dataset into training and testing sets?
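On the ordering question, the usual recommendation is to split first and resample only the training portion, so that duplicated (or synthetic) rows can never end up in the test set. Below is a minimal plain-Python sketch of that order of operations; `stratified_split`, `oversample_indices` and the toy labels are hypothetical names and data chosen for illustration, not code from the video.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split row indices into train/test, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx


def oversample_indices(indices, labels, seed=0):
    """Repeat random minority indices until all classes in `indices` are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in indices)
    target = max(counts.values())
    out = list(indices)
    for cls, cnt in counts.items():
        pool = [i for i in indices if labels[i] == cls]
        out.extend(rng.choice(pool) for _ in range(target - cnt))
    return out


# Hypothetical toy labels: 16 majority rows (class 0), 4 minority rows (class 1).
labels = [0] * 16 + [1] * 4

# Step 1: split. Step 2: balance the training indices only.
train_idx, test_idx = stratified_split(labels)
train_balanced = oversample_indices(train_idx, labels)
```

The test indices keep the original class ratio, so metrics computed on them reflect the imbalance the model will see in practice.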
@dhanushraj3697 A year ago
The video was good, but I request that you add some extra information and explanation for each method.
@snehalvaidya5843 2 years ago
Thanks for sharing knowledge 🙂, please share how to explain PCA in front of an interviewer.
@UnfoldDataScience 2 years ago
kzbin.info/www/bejne/paTKooSvbq2lbtU
@avikdinda7827 A month ago
Does oversampling the whole dataset cause data leakage issues? And when I apply SMOTE to the training data after the train-test split, precision on the minority class is poor although recall is OK. What can I do to improve the precision of the minority class?
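One option that may help here is the threshold adjustment mentioned in the pinned comment: raising the decision threshold typically trades some recall for precision. The sketch below sweeps a few thresholds over made-up scores; `precision_recall_at`, the scores and the labels are assumptions for illustration only, and in practice the threshold would be chosen on a validation set rather than the test set.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall for class 1 when predicting 1 if score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores for the minority class (1) and the true labels.
scores = [0.1, 0.2, 0.3, 0.4, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

# Sweep thresholds and record (precision, recall) at each one.
results = {t: precision_recall_at(scores, labels, t) for t in (0.5, 0.6, 0.7)}
# At 0.5: precision 4/6, recall 1.0. At 0.7: precision 1.0, recall 0.75.
```

Which point on that trade-off is acceptable depends on the relative cost of false positives and false negatives for the problem at hand.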
@dd3371 2 years ago
Thanks very much for sharing and explaining. What are your thoughts on logistic regression? Would imbalanced data still be a problem if you build the model as a GLM using logistic regression?
@nagarajsundar7931 2 years ago
Hi Aman, thanks for explaining the various methods. One question: when should we use which method?
@UnfoldDataScience 2 years ago
Thanks Naga, there can't be a one-to-one, go-to rule. There are some pointers which I can cover in a different video. Thanks for asking.
@dilshadmuhammed8224 8 months ago
In my case I have more than 2 classes, and those classes are text labels, e.g. well being, business analytics, etc. How do I balance such classes?
@sadhnarai8757 2 years ago
Very nice Aman
@UnfoldDataScience 2 years ago
Thank you
@chalmerilexus2072 2 years ago
Which method is preferable?
@UnfoldDataScience 2 years ago
This is discussed towards the end.
@maasahebbiustad8514 2 years ago
Hello sir, how do I solve a classification problem in which the training data has only one class? I get: 'This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1'. Please help me out.
@mihretdesta9153 A year ago
hey sir, how about imbalanced image data for deep learning?
@UnfoldDataScience A year ago
Data augmentation is one option.
@tharindumadusanka3038 2 years ago
I am doing market basket analysis (MBA) with the apriori algorithm in Google Colab. The problem is that when I use more than 20 rows of CSV transaction data, it displays an error. If the number of rows is less than 20, the expected result comes out.
@UnfoldDataScience 2 years ago
That's not a number-of-rows problem. There is probably some hidden issue with row number 21. I am just guessing.
@ratnajyotibhowmick9801 2 years ago
Please share the source of the notebook. Thanks.
@UnfoldDataScience 2 years ago
drive.google.com/drive/u/0/folders/13pZrCIqk1XN6W4I95A07bK8YRHBB3btt
@hasantalib6254 11 months ago
Hello, I'm interested to know from you how I can deal with unbalanced panel data. How can I transform the data when there is a missing year?
@PalaSheshu111 A year ago
github link