This is why you should care about unbalanced data .. as a data scientist

16,548 views

ritvikmath

Days ago

Comments: 25
@jessibenzel243 3 years ago
We just talked about this in my machine learning course this week!! Great timing! This video is very helpful.
@haneulkim4902 2 years ago
Great content; this kind of practical content is gold. Thank you :)
@pgbpro20 3 years ago
ritvikmath coming with a video of one of my favorite topics - instant like!
@chenxiaodu2557 4 months ago
It should be "imbalanced data" instead of "unbalanced data"
@brenoingwersen784 3 months ago
Lol 😂
@tech-n-data 1 year ago
Thank you so much for all you do.
@igorbreeze3734 2 years ago
Hi! Great video. Would you consider creating a full in-depth CatBoost tutorial on some random data? It would be super useful.
@JessWLStuart 11 months ago
Well presented!
@joelrubinson9973 2 years ago
Very interesting. AdTech modeling of conversions caused by advertising always suffers from imbalance. (Conversion rates are usually in the low-to-mid single digits.)
@aghazi94 2 years ago
you are seriously so underrated
@d.a.k.o.s9163 1 year ago
Great video! But with such an unbalanced dataset, don't you think it would be better to go for an anomaly detection algorithm instead of a classification algorithm?
@bmebri1 3 years ago
Excellent video! One question though: are certain classification models immune from class imbalance? Thanks!
@LanNguyen-eq6lf 3 years ago
To my knowledge, no classification model is immune to imbalanced datasets, because they are all data-driven. However, you can still get very good accuracy from an imbalanced dataset. This happens when inter-class separability is very high; for example, detection of water bodies (often a minority class) over a large area is often quite accurate.
@davidzhang4825 2 years ago
Great video. For other ML algorithms like logistic regression, SVM, KNN, etc., can we implement the first method (upweighting the minority class)? Or is this only applicable to decision trees?
@zahrashekarchi6139 1 year ago
Great demo! Just one thought: why did you not talk about downsampling the majority class and what its impact might be?
@douwe7493 6 months ago
This is something I am wondering about too!
@Sameerahmed373 2 years ago
Can we customise the loss function? For example, more weight for misclassifying a true minority-class example and less weight for the other error?
@bernardfinucane2061 2 years ago
You could predict that aircraft engines NEVER fail and almost always be right.
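The point above can be checked with a little arithmetic. This is a minimal sketch with made-up numbers (10,000 engines, 10 failures; neither figure comes from the video): a model that always predicts "no failure" scores very high accuracy while catching none of the failures.

```python
# Hypothetical numbers for illustration only: 10,000 engines, 10 of which fail.
n_total = 10_000
n_fail = 10

y_true = [1] * n_fail + [0] * (n_total - n_fail)  # 1 = failure, 0 = no failure
y_pred = [0] * n_total                            # the "engines never fail" model

# Accuracy: share of engines whose label was predicted correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total

# Recall on the failure class: share of real failures that were caught.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / n_fail

print(f"accuracy = {accuracy:.3f}")
print(f"recall   = {recall:.3f}")
```

Accuracy alone hides the problem here, which is why a per-class metric such as recall is worth reporting alongside it.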
@mrirror2277 2 years ago
Hi, just wondering if SMOTE is applicable to image data? I saw only one article on it online, so I am not sure if it even works, since generating synthetic images is likely much harder.
@shahrinnakkhatra2857 7 months ago
That's where image augmentation comes into play. You can create different variations of an image through transformations such as rotating and flipping.
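A rough sketch of the flips and rotation mentioned in this reply, using plain nested lists as a stand-in for a grayscale image. The function names and the tiny 2x2 "image" are hypothetical, and this is simple augmentation, not SMOTE; real pipelines would normally use an image library instead.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Reverse the order of the rows (top becomes bottom)."""
    return [row[:] for row in img[::-1]]

def rotate_90_clockwise(img):
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*img[::-1])]

# Hypothetical 2x2 grayscale "image" of a minority-class example.
tiny = [[1, 2],
        [3, 4]]

# Each variant keeps the same label, so one minority image yields several samples.
variants = [
    tiny,
    flip_horizontal(tiny),
    flip_vertical(tiny),
    rotate_90_clockwise(tiny),
]

for v in variants:
    print(v)
```

Whether a given transformation is safe depends on the task: a flipped digit or road sign may no longer belong to the same class.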
@Septumsempra8818 3 years ago
Are you familiar with latent vectors in network analysis? S/o from South Africa
@junkbingo4482 3 years ago
Hi. When people have problems with unbalanced data, it's just proof that they did not understand what they were doing. When I was young (a long time ago), our teachers wanted us to do things 'step by step' to be (nearly) sure we knew what we were calculating. As that's not the case anymore, yes, people don't get the methodology and the maths but practice data science, which is sad.
@junkbingo4482 3 years ago
Oops, Nuance wrote 'yes'!! Thanks to LSTM, I did not check my post, sorry! ;-)
@danielwiczew 3 years ago
Okay, but with oversampling, how do you use cross-validation? Because if you use it on the oversampled dataset, you'll have data leakage.
@ritvikmath 3 years ago
I think you'd want to define the folds on the original data and then oversample holding some folds fixed. Example: 3-fold CV.
- split the original data into 3 folds (A, B, C)
- consider (A, B) as training data -> oversample that data -> validate using C
- repeat with A and then B as the validation fold
- note that there is no data leak in this case
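The procedure in this reply can be sketched in plain Python. This is only an illustration under assumptions: the helper names (`make_folds`, `oversample`, `cv_splits`) are hypothetical, the toy labels are made up, and random duplication stands in for whichever oversampling method is actually used. The key property is that folds are defined on the original rows first, and only the training rows are oversampled.

```python
import random

def make_folds(n_rows, k, seed=0):
    """Shuffle the original row indices and deal them into k folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def oversample(indices, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(rows) for rows in by_class.values())
    out = []
    for rows in by_class.values():
        out.extend(rows)
        out.extend(rng.choices(rows, k=target - len(rows)))
    return out

def cv_splits(labels, k=3, seed=0):
    """Yield (oversampled training indices, untouched validation indices) per fold."""
    folds = make_folds(len(labels), k, seed)
    for j, val_idx in enumerate(folds):
        train_idx = [i for f, fold in enumerate(folds) if f != j for i in fold]
        yield oversample(train_idx, labels, seed), val_idx

# Hypothetical toy labels: 6 minority rows (1) and 54 majority rows (0).
labels = [1] * 6 + [0] * 54

for train_idx, val_idx in cv_splits(labels, k=3):
    leaked = set(train_idx) & set(val_idx)
    print(len(train_idx), len(val_idx), "leaked rows:", len(leaked))
```

Because duplicates are drawn only from the training folds, no copy of a validation row can end up in the training set, which is the leak raised in the question.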