Thanks for the video; it was excellent and I learned a great deal. I'd suggest, though, that you split out the test data _before_ you apply the under/over sampling algorithm (to the train data only). That would give a much better comparison of the algorithms, showing how they perform on the unmodified test data.
@ajinkyamore7090 8 years ago
Thanks. The train/test split is the first step (see cell number 2 in the notebook) and none of the under/over sampling methods are applied to the test set. The performance comparison is indeed on the unmodified test data.
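For readers following the thread, the ordering discussed here (split first, then resample only the training portion) can be sketched as below. This is a minimal standard-library illustration, not the notebook's code: the helpers `train_test_split` and `random_oversample` and the toy data are hypothetical, and plain random oversampling stands in for whichever resampling method is being compared.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into (train, test). Hypothetical helper."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def random_oversample(rows, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Toy imbalanced data (assumed for illustration): 90 negatives, 10 positives.
data = [((i,), 0) for i in range(90)] + [((i,), 1) for i in range(90, 100)]

# Step 1: hold out the test set before any resampling.
train, test = train_test_split(data)

# Step 2: resample the training set only; the test set keeps its original class balance.
train_balanced = random_oversample(train)

print("train (balanced):", Counter(label for _, label in train_balanced))
print("test (unmodified):", Counter(label for _, label in test))
```

Any model comparison would then be scored on `test`, which still has the original class distribution.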
@WalterReade 8 years ago
I just noticed that as I was going through your notebook on github (thanks for uploading!) and was going to edit my comment. Yes, that makes perfect sense. What initially confused me was that the graphs show the decision boundary on the train data (and I was thinking it was the test data).
@WalterReade 8 years ago
I do like the graphs showing the decision boundary on the train data, since they show how the under/over sampling algs modify the data. I forked the notebook and am going to add the plots of the decision boundary on the test data as well.
@ajinkyamore7090 8 years ago
Yes, the idea was to show how the changes in the data distribution affect the decision boundary.
@OmyTrenav 8 years ago
Great talk. Thanks!
@Johnnyboycurtis 8 years ago
Great presentation!
@rebiiahmed7836 8 years ago
Thank you for your presentation! Could you please upload the code, for example as a notebook file?
@ajinkyamore7090 8 years ago
Thanks! Here is a link to the notebook: github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb and the slides: www.slideshare.net/AjinkyaMore3/python-resampling
@WalterReade 8 years ago
I found it with a bit of digging: github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb
@WalterReade 8 years ago
LOL . . . I should have refreshed the comments before posting my comment. :-)
@rebiiahmed7836 8 years ago
Many thanks to you, Mr. Ajinkya More.
@EarlWallaceNYC 8 years ago
Great video, thanks. Where can I get the slides you used? (I found your paper on arXiv, but it doesn't have the code.)
@ajinkyamore7090 8 years ago
Thanks! Here is a link to the notebook: github.com/irreducible/PyData-Resampling/blob/master/PyData-Resampling-nb.ipynb and the slides: www.slideshare.net/AjinkyaMore3/python-resampling
@berry4862 8 years ago
Optimizing an arbitrary metric is rather useless for business. In particular, what is the business meaning of optimizing for precision of normal cases? Something like alarms per month may well be meaningful, but that would be Recall(pos)/Prec(pos).
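One way to read the ratio in this comment: the number of alarms raised is TP + FP, which equals TP / Prec(pos), and TP equals Recall(pos) times the number of actual positives. So alarms scale with Recall(pos)/Prec(pos), multiplied by how many true positive cases occur in the period. The sketch below works through that arithmetic; the function name `alarm_count` and the monthly figures are assumptions made up for illustration, not numbers from the talk.

```python
def alarm_count(n_positive, recall, precision):
    """Total flagged cases (TP + FP) implied by positive-class recall and precision.

    TP = recall * n_positive, and precision = TP / (TP + FP),
    so TP + FP = recall * n_positive / precision.
    """
    if precision <= 0:
        raise ValueError("precision must be positive")
    true_positives = recall * n_positive
    return true_positives / precision


# Hypothetical month: 100 actual positive cases, half of them caught,
# and one in four alarms being a real positive.
alarms = alarm_count(n_positive=100, recall=0.5, precision=0.25)
print(alarms)  # 200.0
```

Under this reading, two models with the same recall but different precision translate directly into different alarm volumes, which is the kind of quantity a business can budget for.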