We just talked about this in my machine learning course this week!! Great timing! This video is very helpful.
@pgbpro20 · 3 years ago
ritvikmath coming with a video of one of my favorite topics - instant like!
@haneulkim4902 · 3 years ago
Great content, practical videos like this are gold. Thank you :)
@tech-n-data · 2 years ago
Thank you so much for all you do.
@JessWLStuart · 1 year ago
Well presented!
@igorbreeze3734 · 2 years ago
Hi! Great video. Is there any chance you could create a full, in-depth CatBoost tutorial on some random dataset? It would be super useful.
@joelrubinson9973 · 3 years ago
Very interesting. AdTech modeling of conversions as caused by advertising always suffers from imbalance. (Conversion rates are usually in the low to mid single digits.)
@d.a.k.o.s9163 · 1 year ago
Great video! But don't you think that with such an unbalanced dataset it would be better to go for an anomaly detection algorithm instead of a classification algorithm?
@aghazi94 · 3 years ago
You are seriously so underrated.
@danielwiczew · 3 years ago
Okay, but with oversampling, how do you use cross-validation? If you run it on the oversampled dataset, you'll have a data leak.
@ritvikmath · 3 years ago
I think you'd want to define the folds on the original data and then oversample, holding some folds fixed. Example with 3-fold CV:
- Split the original data into 3 folds (A, B, C).
- Treat (A, B) as training data, oversample that data, then validate using C.
- Repeat using A and then B as the validation sets.
Note that there is no data leak in this case.
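This fold-wise scheme can be sketched with scikit-learn; the two-feature dataset, class sizes, and oversampling routine below are all hypothetical, purely for illustration:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# toy imbalanced data: 380 majority points, 20 minority points
X = np.vstack([rng.normal(loc=0.0, size=(380, 2)),
               rng.normal(loc=2.0, size=(20, 2))])
y = np.concatenate([np.zeros(380, dtype=int), np.ones(20, dtype=int)])

scores = []
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # oversample the minority class INSIDE the training folds only
    maj = np.where(y_tr == 0)[0]
    mino = np.where(y_tr == 1)[0]
    boot = rng.choice(mino, size=len(maj), replace=True)
    X_bal = np.vstack([X_tr[maj], X_tr[boot]])
    y_bal = np.concatenate([y_tr[maj], y_tr[boot]])
    model = LogisticRegression().fit(X_bal, y_bal)
    # the validation fold stays untouched, so no duplicated row can leak into it
    scores.append(f1_score(y[val_idx], model.predict(X[val_idx])))
```

Because every duplicated minority row lives entirely inside the training folds, no copy of a validation point can appear in training, which is exactly the leak being avoided.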
@bmebri1 · 3 years ago
Excellent video! One question though: are certain classification models immune from class imbalance? Thanks!
@LanNguyen-eq6lf · 3 years ago
To my knowledge, no classifier is immune to imbalanced datasets, because they are all data-driven. However, you can still get very good accuracy from an imbalanced dataset when inter-class separability is very high: for example, detection of water bodies (often a minority class) over a large area is often quite accurate.
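The separability point can be illustrated with a quick sketch (hypothetical synthetic data, not from the video): a rare class centred far from the majority is classified almost perfectly despite a 19:1 imbalance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 5% minority class, but centred ~6 standard deviations away: highly separable
X = np.vstack([rng.normal(loc=0.0, size=(380, 2)),
               rng.normal(loc=6.0, size=(20, 2))])
y = np.concatenate([np.zeros(380, dtype=int), np.ones(20, dtype=int)])

model = LogisticRegression().fit(X, y)
accuracy = (model.predict(X) == y).mean()
minority_recall = (model.predict(X)[y == 1] == 1).mean()
# both should be near 1.0 despite the imbalance
```

Shrink `loc=6.0` toward the majority's centre and both numbers fall quickly, which is the usual imbalanced-data situation.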
@Sameerahmed373 · 3 years ago
Can we customise the loss function? For example, more weight for misclassifying the minority class and less weight for the other kind of error?
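Most libraries expose exactly this kind of weighting; in scikit-learn it is the `class_weight` argument, which scales each class's contribution to the loss. A minimal sketch on hypothetical synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# 190 majority points, 10 minority points, with some class overlap
X = np.vstack([rng.normal(loc=0.0, size=(190, 2)),
               rng.normal(loc=1.5, size=(10, 2))])
y = np.concatenate([np.zeros(190, dtype=int), np.ones(10, dtype=int)])

plain = LogisticRegression().fit(X, y)
# penalise minority-class mistakes 19x more (inverse class frequency);
# class_weight="balanced" computes these ratios automatically
weighted = LogisticRegression(class_weight={0: 1, 1: 19}).fit(X, y)

recall_plain = (plain.predict(X)[y == 1] == 1).mean()
recall_weighted = (weighted.predict(X)[y == 1] == 1).mean()
```

The weighted model trades some majority-class accuracy for better minority-class recall; the same `class_weight` argument also exists on `SVC` and `DecisionTreeClassifier`.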
@davidzhang4825 · 2 years ago
Great video. For other ML algorithms like logistic regression, SVM, KNN, etc., can we implement the first method (upweighting the minority class), or is it only applicable to decision trees?
@zahrashekarchi6139 · 2 years ago
Great demo! Just one thought: why didn't you talk about downsampling the majority class and what its impact can be?
@douwe7493 · 10 months ago
This is something I am wondering about too!
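Downsampling the majority class, as asked about above, is straightforward to try; a minimal sketch (the data and class sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical dataset: 950 majority rows, 50 minority rows
X = rng.normal(size=(1000, 3))
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

maj_idx = np.where(y == 0)[0]
min_idx = np.where(y == 1)[0]
# keep all minority rows, sample an equal number of majority rows
keep_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
keep = np.concatenate([keep_maj, min_idx])
X_down, y_down = X[keep], y[keep]
```

The obvious trade-off is that this throws away majority-class data, which hurts most when the dataset is small to begin with; oversampling and class weighting avoid that loss.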
@chenxiaodu2557 · 8 months ago
It should be "imbalanced data" instead of "unbalanced data"
@brenoingwersen784 · 7 months ago
Lol 😂
@mrirror2277 · 3 years ago
Hi, just wondering if SMOTE is applicable to image data? I've only seen one article on it online, so I'm not sure it even works, since generating synthetic images is likely much harder.
@shahrinnakkhatra2857 · 1 year ago
That's where image augmentation comes into play. You can create different variations of an image through transformations such as rotating and flipping.
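Such label-preserving transformations can be sketched with plain NumPy (the image below is a random stand-in; real pipelines typically use a library such as torchvision):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))  # stand-in for a minority-class image

def augment(image, rng):
    """Random horizontal flip plus a random 90-degree rotation."""
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)   # mirror left-right
    k = int(rng.integers(0, 4))          # 0-3 quarter turns
    return np.rot90(image, k, axes=(0, 1))

# several label-preserving variants of one rare image
variants = [augment(img, rng) for _ in range(5)]
```

Each variant keeps the original label, so augmenting only the minority class acts as a domain-aware form of oversampling.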
@bernardfinucane2061 · 3 years ago
You could predict that aircraft engines NEVER fail and almost always be right.
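That trap is easy to show in numbers (the failure rate here is hypothetical): a "never fails" predictor scores 99.9% accuracy while catching zero failures, which is exactly why accuracy alone misleads on imbalanced data.

```python
import numpy as np

# hypothetical history: 1 engine failure in 1000 flights
y = np.zeros(1000, dtype=int)
y[0] = 1
never_fails = np.zeros_like(y)              # always predict "no failure"

accuracy = (never_fails == y).mean()        # 0.999: looks excellent
recall = (never_fails[y == 1] == 1).mean()  # 0.0: misses every failure
```

Metrics like recall, precision, or F1 on the minority class expose what accuracy hides here.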
@Septumsempra8818 · 3 years ago
Are you familiar with latent vectors in network analysis? Shout-out from South Africa!
@junkbingo4482 · 3 years ago
Hi. When people have problems with unbalanced data, it's just proof that they don't understand what they're doing. When I was young (a long time ago), our teachers made us do things step by step so we would be (nearly) sure we knew what we were calculating. That's no longer the case: people practice data science without getting the methodology or the maths, which is sad.
@junkbingo4482 · 3 years ago
Oops, autocorrect wrote "yes"! Thanks to LSTMs, I didn't check my post, sorry! ;-)