Accelerating Deep Learning by Focusing on the Biggest Losers

2,748 views

Yannic Kilcher


Comments: 21
@herp_derpingson · 5 years ago
There are so many ML papers these days that the authors have to resort to click-baity titles. What a time to be alive.
@connor-shorten · 5 years ago
Aside from the hard example selection, is this identical to the RevNet technique for saving memory needed for backprop?
@YannicKilcher · 5 years ago
In my opinion, RevNet and SB are somewhat orthogonal. RevNet still computes the original loss and gradients, but does so with lower memory requirements, while SB computes fewer gradients but keeps the usual memory requirements. The cost of RevNet is that it is restricted to certain architectures and needs more computation; the cost of SB is the bias introduced into the loss function.
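To make that concrete, here is a minimal one-pass sketch of the SB idea in PyTorch (my own illustration, not the paper's implementation; the model and keep fraction are placeholders, and as I understand it the paper collects the selected examples and runs a separate forward/backward pass on just those, rather than masking the loss as done here):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss(reduction="none")         # per-example losses
keep_fraction = 0.25                                       # assumed hyperparameter

def train_step(x, y):
    optimizer.zero_grad()
    losses = criterion(model(x), y)                # forward on the whole batch
    k = max(1, int(keep_fraction * losses.numel()))
    hard_losses, _ = torch.topk(losses, k)         # keep only the "biggest losers"
    hard_losses.mean().backward()                  # gradients come only from those
    optimizer.step()
```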
@connor-shorten · 5 years ago
@YannicKilcher Interesting, thank you! Do you think the restriction in RevNet limits the representational capacity?
@YannicKilcher · 5 years ago
As a theoretical class of functions, probably not (or not much), but in a given practical situation it might have an influence. It might be worth asking people working on normalizing flows etc. about how much their (very similar) constraints are hurting them.
@connor-shorten · 5 years ago
@YannicKilcher Interesting, thanks for the discussion!
@simleek · 5 years ago
This actually seems a lot like intrinsically motivated AI. The only difference is that those agents act to seek out high-loss (and decreasing-loss) inputs, instead of selecting neurons or examples within a batch during training.
@sehbanomer8151 · 5 years ago
Won't it just overfit to the selected hard examples and underfit to the easy ones?
@YannicKilcher · 5 years ago
One could argue that at that point the previously "easy" samples will become "hard" and will be upweighted. But the essence of your comment is correct: there is definitely a bias introduced by the procedure.
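As a side note on how the selection works (from my reading of the paper; the history size and exponent below are made-up values): each example is kept with a probability that grows with its loss percentile among recently seen losses, rather than by a hard cutoff, which softens, but does not remove, this bias. A rough sketch:

```python
import numpy as np
from collections import deque

recent_losses = deque(maxlen=1024)   # rolling history of per-example losses
beta = 2.0                           # selection exponent (made-up value)

def keep_probability(loss):
    recent_losses.append(loss)
    # empirical CDF: fraction of recent losses that this example's loss exceeds
    percentile = np.mean([l <= loss for l in recent_losses])
    return percentile ** beta        # high-loss examples are kept more often

def keep(loss):
    return np.random.rand() < keep_probability(loss)
```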
@guanfuchen2741 · 3 years ago
I think this will be difficult for multi-GPU training: each step forwards once and then syncs the results across the whole multi-GPU batch for the forward and backward passes, so there is a tradeoff between the extra forward time and the per-sample backward time that is saved.
@AntonPanchishin · 5 years ago
Thanks for this review Yannic. I've been using a loss function to do something similar and was interested to see how the different ways of training on the hardest examples stacked up. Here's an interactive Colab notebook that demos 'regular' training on MNIST, a per-batch focus on the biggest losers, and a per-epoch focus. The notebook also includes the code needed to turn 'regular' training into this new method; it turns out to take only a couple of lines and works easily with Keras (roughly along the lines of the sketch below). colab.research.google.com/drive/1QrSimz0aDKt7-C8Chg9zZXne2pmPoqPf
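Not the notebook itself, but the rough shape of that change in Keras (the model, dataset shape, and keep fraction are placeholders): score a batch with per-example losses, then train only on the hardest slice.

```python
import numpy as np
from tensorflow import keras

# Placeholder model for MNIST-shaped inputs.
model = keras.Sequential([keras.Input(shape=(28, 28)),
                          keras.layers.Flatten(),
                          keras.layers.Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

def train_on_hardest(x_batch, y_batch, keep_fraction=0.25):
    # Per-example losses from a plain forward pass (no training).
    preds = model(x_batch, training=False)
    per_example_loss = keras.losses.sparse_categorical_crossentropy(y_batch, preds).numpy()
    k = max(1, int(keep_fraction * len(x_batch)))
    hardest = np.argsort(per_example_loss)[-k:]   # indices of the biggest losers
    return model.train_on_batch(x_batch[hardest], y_batch[hardest])
```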
@jordyvanlandeghem3457 · 4 years ago
Thanks Anton, very clearly explained. That type of explanatory notebook helps you see through the fluff and hype in research papers and focus on quick empirical results by applying the method yourself.
@superkhoy · 5 years ago
What resources do you recommend for starting with DL? Anything in R?
@YannicKilcher · 5 years ago
Nothing in R. Start by learning high-level frameworks like Keras, Sonnet, or PyTorch Lightning.
@DrAhdol · 5 years ago
This approach seems like a derivative of boosting.
@AntonPanchishin · 5 years ago
It sure does seem that way, applied to neural networks.
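For comparison, the classic boosting move is to reweight examples between weak learners rather than skip backward passes; a toy AdaBoost-style update with made-up numbers (illustration only):

```python
import numpy as np

weights = np.ones(6) / 6                      # uniform example weights
misclassified = np.array([0, 1, 0, 0, 1, 0])  # made-up mistakes of the current learner

err = np.sum(weights * misclassified)                 # weighted error rate
alpha = 0.5 * np.log((1 - err) / err)                 # learner weight (AdaBoost)
weights *= np.exp(alpha * (2 * misclassified - 1))    # upweight mistakes, downweight hits
weights /= weights.sum()                              # hard examples count more next round
```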
@herp_derpingson · 5 years ago
Great paper, though why hasn't anyone thought about this before?
@YannicKilcher · 5 years ago
People have thought about this in one way or another; see, for example, active learning or boosting.
@tan-uz4oe · 5 years ago
Prioritized experience replay is very similar to this paper, and it was published in 2015.
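For reference, prioritized experience replay samples stored transitions with probability roughly proportional to the magnitude of their TD error raised to a power, which is the same "spend compute where the loss is biggest" idea. A tiny sketch with made-up numbers:

```python
import numpy as np

alpha = 0.6                                   # prioritization exponent (typical choice)
td_errors = np.array([0.1, 2.0, 0.5, 3.0])    # made-up |TD errors| of buffered transitions

priorities = np.abs(td_errors) ** alpha
probs = priorities / priorities.sum()         # P(i) proportional to |delta_i| ** alpha
batch = np.random.choice(len(td_errors), size=2, p=probs, replace=False)
```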