There are so many ML papers these days that authors have to resort to click-baity titles. What a time to be alive.
@connor-shorten 5 years ago
Aside from the hard example selection, is this identical to the RevNet technique for saving memory needed for backprop?
@YannicKilcher 5 years ago
In my opinion, RevNet and SB are somewhat orthogonal. RevNet still computes the original loss and gradients, but does so with lower memory requirements, while SB computes fewer gradients but retains all memory requirements. The cost of RevNet is that it is restricted to certain architectures and needs more computation; the cost of SB is the bias introduced into the loss function.
@connor-shorten 5 years ago
@@YannicKilcher Interesting, thank you! Do you think the restriction in RevNet limits the representational capacity?
@YannicKilcher 5 years ago
As a theoretical class of functions, probably not (or not much) but in a given practical situation, it might have an influence. It might be worth asking people working on normalizing flows etc. about how much their (very similar) constraints are hurting them.
@connor-shorten 5 years ago
@@YannicKilcher Interesting, thanks for the discussion!
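The reversibility constraint discussed in this thread can be illustrated with a two-way residual coupling (a minimal numpy sketch of a RevNet-style forward/inverse pair; `F` and `G` are hypothetical stand-ins for the learned residual functions, not RevNet's actual layers):

```python
import numpy as np

# Reversible (RevNet-style) block: inputs can be reconstructed exactly
# from the outputs, so activations need not be stored for backprop.
def F(x):
    return np.tanh(x)      # hypothetical residual function

def G(x):
    return 0.5 * x         # hypothetical residual function

def forward(x1, x2):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Recompute the inputs from the outputs; this is what saves memory.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

The architectural restriction Yannic mentions is visible here: the input must be split into two streams, and each residual function may only read from the other stream, so that inversion stays exact.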
@simleek 5 years ago
This actually seems a lot like intrinsically motivated AI. The only difference is that those AIs move to obtain inputs with high loss (and a high decrease in loss) instead of selecting neurons or examples within a batch during training.
@sehbanomer8151 5 years ago
Won't it just overfit to the selected hard examples and underfit to the easy ones?
@YannicKilcher 5 years ago
One could argue that at that point the previously "easy" samples will become "hard" and will be upweighted. But the essence of your comment is correct: there is definitely a bias introduced by the procedure.
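The selection dynamics discussed above can be sketched on a toy problem (a minimal numpy illustration of per-batch top-k selection; all names and numbers are my own, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression y = 2x with a per-example squared loss.
w = 0.0
X = rng.normal(size=64)
y = 2.0 * X

def per_example_loss(w, X, y):
    return (w * X - y) ** 2

# Selective backprop (sketch): forward the full batch,
# but compute gradients only for the k highest-loss examples.
k = 16
losses = per_example_loss(w, X, y)
hard = np.argsort(losses)[-k:]   # indices of the k hardest examples

# Gradient of the squared loss w.r.t. w, restricted to the hard subset.
grad = np.mean(2 * (w * X[hard] - y[hard]) * X[hard])
w -= 0.1 * grad
```

Examples that are easy at one value of `w` can become the hardest ones after an update, which is why the selection keeps shifting; still, the effective objective is biased toward the tail of the loss distribution, as the thread notes.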
@guanfuchen2741 3 years ago
I think this will be difficult for multi-GPU training, because the GPUs forward once and then sync the results across the whole node's batch for both forward and backward. It becomes a tradeoff between the extra forward time and the backward time saved by skipping samples.
@AntonPanchishin 5 years ago
Thanks for this review, Yannic. I've been using a loss function to do something similar and was interested to see how the different ways of training on the hardest examples stacked up. Here's an interactive Colab notebook that demos 'regular' training on MNIST, a per-batch focus on the biggest losers, and a per-epoch focus. The notebook also includes the code needed to change 'regular' training into this new method; it turns out to be very easy to do with only a couple of lines of code, and it works with Keras. colab.research.google.com/drive/1QrSimz0aDKt7-C8Chg9zZXne2pmPoqPf
@jordyvanlandeghem3457 4 years ago
Thanks Anton, very easily explained. That type of explanatory notebook helps you see through the fluff and hype introduced in research papers and just focus on fast empirical results by applying it yourself.
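Anton's notebook contrasts per-batch and per-epoch selection; the per-epoch variant can be sketched like this (a toy numpy illustration under my own assumptions, not the notebook's Keras code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset: fit y = 3x with per-example squared loss.
X = rng.normal(size=200)
y = 3.0 * X
w = 0.0

for epoch in range(5):
    # Per-epoch variant: rank the whole dataset by current loss once,
    # then run the epoch only on the hardest half.
    losses = (w * X - y) ** 2
    hard = np.argsort(losses)[len(X) // 2:]
    for i in hard:
        grad = 2 * (w * X[i] - y[i]) * X[i]
        w -= 0.02 * grad
```

The per-batch variant in the earlier discussion re-ranks inside every minibatch instead, so its notion of "hard" updates much more frequently; the per-epoch version trades freshness of the ranking for fewer loss evaluations.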
@superkhoy 5 years ago
What resources do you recommend for starting with DL? Anything in R?
@YannicKilcher 5 years ago
Nothing in R. Start learning high-level frameworks like Keras, Sonnet, or PyTorch Lightning.
@DrAhdol 5 years ago
This approach seems like a derivative of boosting.
@AntonPanchishin 5 years ago
It sure does seem that way, applied to NNs.
@herp_derpingson 5 years ago
Great paper, though why hasn't anyone thought about this before?
@YannicKilcher 5 years ago
People have thought about this in one way or another; see, for example, active learning or boosting.
@tan-uz4oe 5 years ago
Prioritized experience replay is very similar to this paper, and it was published in 2015.
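For comparison, the prioritized sampling step of that 2015 method can be sketched as follows (a minimal numpy illustration; the priority values here are made up, and the buffer is just an array):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical replay-buffer priorities (e.g. |TD error| per transition).
priorities = np.array([0.1, 2.0, 0.5, 4.0, 0.2])
alpha = 0.6   # how strongly to favor high-error transitions

# Sampling probability P(i) proportional to p_i ** alpha.
probs = priorities ** alpha
probs /= probs.sum()

# Sample a minibatch in proportion to priority, mirroring how
# selective backprop favors high-loss examples.
batch = rng.choice(len(priorities), size=3, p=probs)

# Importance-sampling weights (N * P(i)) ** -beta correct the bias
# this non-uniform sampling introduces; normalized by the max.
beta = 0.4
weights = (len(priorities) * probs[batch]) ** (-beta)
weights /= weights.max()
```

The parallel to the bias discussion earlier in the thread is direct: both methods skew training toward high-error examples, but prioritized experience replay explicitly reweights to compensate, while selective backprop accepts the biased objective.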