Ajinkya More | Resampling techniques and other strategies

Views: 18,038

PyData

Comments: 15
@WalterReade 8 years ago
Thanks for the video; it was excellent and I learned a great deal. I'd suggest, though, that you split out the test data _before_ you apply the under/over sampling algorithm (to the train data only). That would give a much better comparison of the algorithms, showing how they perform on the unmodified test data.
@ajinkyamore7090 8 years ago
Thanks. The train/test split is the first step (see cell number 2 in the notebook) and none of the under/over sampling methods are applied to the test set. The performance comparison is indeed on the unmodified test data.
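The order described in this reply (split first, then resample the training portion only) can be sketched as follows. This is a minimal illustration, not the code from the notebook: it uses only the Python standard library, the helper names `train_test_split` and `random_oversample` are hypothetical, the toy data is made up, and plain random oversampling stands in for whichever under/over sampling method is being compared.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle (features, label) rows and split them into train and test lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def random_oversample(rows, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[1] == minority_label]
    majority = [r for r in rows if r[1] != minority_label]
    if not minority or len(minority) >= len(majority):
        return rows[:]  # nothing to balance
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Toy imbalanced data (made-up values): 90 negatives, 10 positives.
data = [((float(i),), 0) for i in range(90)] + [((100.0 + i,), 1) for i in range(10)]

train, test = train_test_split(data)        # step 1: split first
train_resampled = random_oversample(train)  # step 2: resample the training rows only

# The test rows are never passed to the sampler, so any metric computed on
# `test` reflects the original, unmodified class distribution.
```

Because `test` is set aside before any duplication happens, no copy of a test row can leak into the resampled training set.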
@WalterReade 8 years ago
I just noticed that as I was going through your notebook on github (thanks for uploading!) and was going to edit my comment. Yes, that makes perfect sense. What initially confused me was that the graphs are showing the decision boundary on the train data (and I was thinking it was the test data).
@WalterReade 8 years ago
I do like the graphs showing the decision boundary on the train data, since they show how the under/over sampling algs modify the data. I forked the notebook and am going to add the plots of the decision boundary on the test data as well.
@ajinkyamore7090 8 years ago
Yes, the idea was to show how the changes in the data distribution affect the decision boundary.
@OmyTrenav 8 years ago
Great talk. Thanks!
@Johnnyboycurtis 8 years ago
Great presentation!
@rebiiahmed7836 8 years ago
Thank you for your presentation! Could you please upload the code, for example as a notebook file?
@ajinkyamore7090 8 years ago
Thanks! Here is a link to the notebook github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb and the slides www.slideshare.net/AjinkyaMore3/python-resampling
@WalterReade 8 years ago
I found it with a bit of digging: github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb
@WalterReade 8 years ago
LOL . . . I should have refreshed the comments before posting my comment. :-)
@rebiiahmed7836 8 years ago
Many thanks, Mr. Ajinkya More.
@EarlWallaceNYC 8 years ago
Great video, thanks. Where can I get the slides you used? (I found your paper on arXiv, but it doesn't have the code.)
@ajinkyamore7090 8 years ago
Thanks! Here is a link to the notebook github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb and the slides www.slideshare.net/AjinkyaMore3/python-resampling
@berry4862 8 years ago
Optimizing an arbitrary metric is rather useless for business. In particular, what is the business meaning of optimizing for precision on the normal cases? Something like alarms per month may well be meaningful, but that would be Recall(pos)/Prec(pos).
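One way to read the ratio in this comment: with alarms = TP + FP, Recall(pos) = TP / (TP + FN) and Prec(pos) = TP / (TP + FP), so Recall(pos) / Prec(pos) = (TP + FP) / (TP + FN), i.e. the number of alarms raised per actual positive case. The sketch below checks that identity on made-up counts; the function name `alarm_metrics` and all numbers are assumptions for illustration, not figures from the talk.

```python
def alarm_metrics(tp, fp, fn):
    """Return precision, recall, and alarm count for the positive class."""
    alarms = tp + fp       # every predicted positive raises an alarm
    actual_pos = tp + fn   # every real positive case, caught or missed
    precision = tp / alarms if alarms else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall, alarms

# Hypothetical month: 30 true alarms, 90 false alarms, 10 missed cases.
precision, recall, alarms = alarm_metrics(tp=30, fp=90, fn=10)

# Alarms can be recovered from the ratio and the number of real positives.
actual_positives = 30 + 10
alarms_from_ratio = actual_positives * recall / precision

print(precision, recall, alarms, alarms_from_ratio)
```

Under this reading, the ratio only becomes an alarm count once it is multiplied by the number of real positive cases in the period, which is a property of the data rather than of the classifier.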