Yannic you're spoiling us. I hope you're able to keep your pace once (if???) this virus dies down a bit.
@sayakpaul3152 4 years ago
Thanks for the wonderfully detailed walkthrough :) It might be worth mentioning that while training neural nets it's also possible to train them in a pruning-aware fashion, with all the good stuff like pruning schedules, maximum achievable sparsity, etc.
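For readers unfamiliar with pruning schedules: one common choice (an assumption here, not something the comment or the video specifies) is a polynomial ramp that raises the target sparsity from an initial to a final value over a window of training steps. A minimal sketch, with all names and defaults hypothetical:

```python
def polynomial_sparsity(step, begin_step, end_step,
                        initial_sparsity=0.0, final_sparsity=0.9, power=3):
    """Target fraction of pruned weights at a given training step.

    Sparsity rises quickly at first and flattens out near `end_step`,
    so most pruning happens while the network can still recover.
    """
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


# Print the target sparsity at a few points of a 100-step schedule.
for step in (0, 25, 50, 75, 100):
    print(step, polynomial_sparsity(step, begin_step=0, end_step=100))
```

In this sketch, `final_sparsity` plays the role of the "maximum achievable sparsity" knob: in practice it would be tuned until accuracy starts to drop.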
@milkteamx7183 1 year ago
Amazing explanation! Thank you so much! I just looked through your channel and am excited to find that you have many of these videos. Just subscribed!
@jivan476 3 years ago
Could it be that the "winning tickets" can be identified after only a handful of training epochs instead of after a full training (e.g. 50 epochs or more)? If yes, it would mean that we can train for 3-4 epochs, prune 50% of the weights, then re-start the training on these weights only (with same initialisation as before), rinse and repeat. In theory it could allow faster training.
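The loop proposed above (train briefly, prune half of the surviving weights, reset the survivors to their original initialisation, repeat) could be sketched as follows. This is a plain-Python illustration on a flat list of weights; `train_fn` is a hypothetical stand-in for a short training run, and whether a few epochs are enough to identify a winning ticket is exactly the open question being asked.

```python
def prune_smallest(trained, mask, fraction):
    """Return a new mask with `fraction` of the surviving weights removed,
    dropping the survivors with the smallest trained magnitude."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(alive) * fraction)
    # Sort surviving indices by |trained weight|, smallest first.
    to_prune = set(sorted(alive, key=lambda i: abs(trained[i]))[:n_prune])
    return [keep and i not in to_prune for i, keep in enumerate(mask)]


def reset_to_init(init, mask):
    """Survivors go back to their original initial values; pruned weights are zero."""
    return [w if keep else 0.0 for w, keep in zip(init, mask)]


def find_ticket(init, train_fn, rounds, fraction=0.5):
    """Repeat: reset to init, train briefly, prune. Returns (mask, ticket weights)."""
    mask = [True] * len(init)
    for _ in range(rounds):
        trained = train_fn(reset_to_init(init, mask), mask)
        mask = prune_smallest(trained, mask, fraction)
    return mask, reset_to_init(init, mask)


# Toy usage: "training" that leaves the weights unchanged (placeholder only).
init = [0.4, -0.1, 0.3, -0.2]
mask, ticket = find_ticket(init, train_fn=lambda w, m: w, rounds=2)
print(mask, ticket)
```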
@wenhanzhou5826 1 year ago
I think yes, because there is a paper that discusses how different weight initializations create different local minima in the loss landscape for the same data. What you can do is start with a really big network and a large learning rate. The network will find one of the local minima quickly, and then you just start pruning to get to the lowest point of that minimum.
@nbrpwng 4 years ago
This is actually reminiscent of how human brains develop from childhood to adulthood. At birth, humans have far more connections between their neurons and connections primarily die off as they learn and mature, much more than new neurons and connections are formed. And yet humans can still learn despite connection removal, and possibly because of it.
@gorgolyt 4 years ago
Great observation. That could simply be pruning, which doesn't decrease performance, and improves energy efficiency for the organism. But it could be something deeper and more important.
@wolfgangmitterbaur3942 2 years ago
Thanks a lot for this video. It explains the essentials of the paper very well, and it is easy to follow for a non-native speaker, which is important as well!
@MrNightLifeLover 4 years ago
Very well explained, thanks! Please keep reviewing papers!
@chesstanay 9 months ago
Where can I read more about the related finding at 17:16?
@HassanBinHaroon 1 year ago
Hi @Yannic Kilcher! Is it possible that the weights that should be pruned are not the ones close to zero (very small in magnitude)? Might it be a better idea to monitor the weights during the initial training of the complete network and prune based on how far each weight has traveled during that training? 🤔 Kindly enlighten me on this!
@jrkirby93 4 years ago
I love the idea of sparse neural nets. It feels kinda icky looking at these grossly overparameterized models that are often SOTA and thinking: "Right now, this is the best way of doing this." Pruning is a good technique for finding sparse neural nets. I thought this was a great paper when I first read it. But I've been working on my own research that approaches sparse NNs from the other direction. Instead of starting with fully connected layers and pruning, I start with extremely sparse layers and build them up, one edge at a time. It requires quite a different training procedure though. Instead of back-propagation and gradient descent, I take advantage of the piecewise linear properties of ReLU to guarantee a fully piecewise linear neural net. This allows me to explicitly find the optimal next edge, and its optimal value, in a single optimization step. I hope to finish implementing my research in the coming weeks, and would be happy to show you in more detail if you're interested.
@jepkofficial 4 years ago
What happened with this research?
@jrkirby93 4 years ago
@@jepkofficial Wow was that really 6 months ago? I still haven't finished implementing it. Hard to focus when working alone on independent research. Thanks for the reminder, I should return to that project and get it done.
@Leibniz_28 3 years ago
How's the research going?
@laurenpinschannels 3 years ago
checking in on this again, on the off chance you didn't get distracted from this one :)
@Poof57 2 years ago
@@jrkirby93 woohoo another reminder here :P
@HassanBinHaroon 1 year ago
Hi @Yannic Kilcher! It seems that the random initialization is very important before pruning, right? Only the lucky weights (in terms of random initialization) are kept after pruning. If the random initialization is bad and there are no (or very few) lucky candidate weights, what should one do in that case? Is there a particular random initialization recommended by the paper or by practice? There are some recommended random initialization methods, like Glorot or He.
@HassanBinHaroon 1 year ago
Hi @Yannic Kilcher! Can't we control the Random Initialization to keep almost every weight in the network (to get the most out of the original network)? Can't every weight win the lottery?
@freemind.d2714 3 years ago
A very good hypothesis; it makes a lot of sense.
@TimScarfe 4 years ago
Great video! Looking forward to having a discussion on our street talk podcast!
@thejll 9 months ago
Very interesting. Does anyone know of software that allows doing this pruning?
@HappyManStudiosTV 4 years ago
hey! have you seen uber's follow up work? they basically say that the trick is just to prune weights that are going *towards* 0, not near 0
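One way to read the distinction drawn above is as two different scoring rules for the pruning mask. The sketch below contrasts them on toy numbers. The function names are made up for illustration, and the rule used for "going towards 0" (prune the weights whose magnitude shrank the most during training) is an assumed interpretation of the comment, not a confirmed description of the follow-up work.

```python
def keep_mask(scores, keep_fraction):
    """Keep the highest-scoring fraction of weights; prune the rest."""
    n_keep = int(round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [i in kept for i in range(len(scores))]


def final_magnitude(init, final):
    """Classic criterion: prune weights that END UP near zero."""
    return [abs(f) for f in final]


def magnitude_change(init, final):
    """Assumed alternative: prune weights that MOVED toward zero.
    The score is negative when |w| shrank during training."""
    return [abs(f) - abs(i) for i, f in zip(init, final)]


init = [0.9, 0.1, -0.5, 0.2]
final = [0.6, 0.3, -0.7, 0.05]

# Weight 0 is still large after training but shrank; weight 1 is small but grew.
print(keep_mask(final_magnitude(init, final), keep_fraction=0.5))
print(keep_mask(magnitude_change(init, final), keep_fraction=0.5))
```

On these toy numbers the two rules disagree about weights 0 and 1, which is the whole point of the distinction.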
@TimScarfe 4 years ago
@HappyManStudiosTV Interesting
@jordyvanlandeghem3457 4 years ago
can you link the paper? :) thanks!
@joirnpettersen 4 years ago
What if, instead of pruning the weights, you assume the low-magnitude weights were initialized incorrectly, and re-train the dense network where the high-magnitude weights are kept at their initial initialization and the low-magnitude weights get new values?
@YannicKilcher 4 years ago
I've never heard this idea. Nice, might be worth a try. I doubt you're gonna get a massive improvement, but it might be interesting to analyze whether you could find an even smaller winning hypothesis.
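The idea from this exchange can be sketched in a few lines: rank weights by their magnitude after training, keep the large ones at their original initial values, and redraw the small ones. Everything here is hypothetical (the function name, the Gaussian redraw, the `scale` value); it only illustrates the mechanics and says nothing about whether the approach helps.

```python
import random


def reinit_small_weights(init, trained, fraction, rng, scale=0.1):
    """Keep large-magnitude weights at their original init; redraw the rest.

    `fraction` is the share of weights (smallest |trained| first) that get
    fresh random values. The Gaussian redraw and `scale` are assumptions.
    """
    n_redraw = int(len(init) * fraction)
    by_magnitude = sorted(range(len(trained)), key=lambda i: abs(trained[i]))
    redraw = set(by_magnitude[:n_redraw])
    return [rng.gauss(0.0, scale) if i in redraw else w
            for i, w in enumerate(init)]


rng = random.Random(0)
init = [0.4, -0.1, 0.3, -0.2]
trained = [1.0, 0.01, -0.02, 0.9]

# Weights 1 and 2 ended up small, so only they receive new initial values.
new_init = reinit_small_weights(init, trained, fraction=0.5, rng=rng)
print(new_init)
```

The dense network would then be retrained from `new_init`, and the procedure could be repeated.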
@araldjean-charles3924 1 year ago
For the initial conditions that work, has anybody looked at how much wiggle room you have? Is there an epsilon-neighborhood of the initial state you can safely start from, and how small is epsilon?
@vishwajitkumarvishnu3878 4 years ago
How do you read and understand any paper so fast? Does it come with practice, or is there a particular way to read the different sections? I want to do that. Uploading a video on how to read a paper might help :)
@YannicKilcher 4 years ago
After you've read a bunch, the structure, the methods, and the ideas become repetitive across the entire field, which speeds up the reading process a lot. I guess I can do a video on that, but it will be pretty straightforward and obvious.
@vishwajitkumarvishnu3878 4 years ago
@@YannicKilcher it'll be helpful if you make a video. Thanks a lot
@kevalan1042 3 years ago
did they check if those initial weights already tend to be relatively large ?
@eugening 4 years ago
Good discussion. The sound is a bit too soft.
@MrSb192 3 years ago
Question: suppose we have a network N that we train up to a certain accuracy on some data, prune p% of the weights using some algorithm (one-shot, IMP, etc.) and revert the remaining weights to their initial values. Now, is there any way to ensure that the resulting pruned network will always perform better than the original when trained for the same number of iterations? I mean, is there any pruning algorithm that can guarantee finding a lottery ticket within the network every time we use it? Or is it just trial and error (which is why, I guess, the term lottery ticket is used)?
@herp_derpingson 4 years ago
Reminds me of dropout for some reason. Except we are throwing away the dropped out neurons.
@JungleEd17 4 years ago
I watched it 2x, but I think the connections are thrown out, not the neurons. What's interesting here though: 1. The weights are what is important. 2. Pruning involves throwing out both weight AND structure. Why not keep the structure but choose new weights? Perhaps it just randomly started at a plateau of a local min, or randomization ended up creating redundancies. Jump the weight really far away and try again.
@fsxaircanada01 4 years ago
I think the motivation is that activations are not the biggest source of memory access and energy loss. If we can get rid of 90% of weights, then it could mean speed and energy improvements
@Blooper1980 3 years ago
Interesting.. Just need to take the sick out of your mouth next time.