Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)

Views: 13,770

Yannic Kilcher

1 day ago

Comments: 33

@michaelcarlon1831 4 years ago

Yet another super contribution! In some ways this kind of video is more valuable than the original paper!

@JaswinderSingh-oe5ie 4 years ago

please keep making these videos, it is amazing

@josephsantarcangelo9310 4 years ago

Thanks, Yannic. It's difficult to keep up with all the advancements, you make it a lot easier.

@HappyManStudiosTV 4 years ago

Thank you for covering this!

@burhanrashidhussein6037 3 years ago

Thanks for these amazing videos. Your opinion looks valid; these methods need to be benchmarked on more complex tasks!

@vivekkumar531 4 years ago

Thank you so much for the video!!

@christianleininger2954 4 years ago

Great job, would love more RL papers :)

@jrkirby93 4 years ago

I wanna see someone distill "lottery ticket" networks to 1% of the parameters, and then double the number of layers with all that freed RAM, distill them again, rinse and repeat.

@YannicKilcher 4 years ago

good idea, but sparse neural networks are still not really a thing, so I don't think this is going to save you much RAM at actual runtime.
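
To illustrate the point about RAM, here is a minimal NumPy sketch, not taken from the paper or the video. It assumes a hypothetical 1000x1000 float32 layer pruned to roughly 1% of its weights by magnitude. Stored as an ordinary dense array, the pruned layer occupies exactly as many bytes as the unpruned one. A coordinate-style layout (values plus row and column indices, assumed here purely for comparison) would be much smaller, but it only helps if the runtime actually computes on that layout.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense layer: 1000 x 1000 float32 weights.
weights = rng.standard_normal((1000, 1000)).astype(np.float32)

# Keep roughly the 1% of weights with the largest magnitude.
threshold = np.quantile(np.abs(weights), 0.99)
mask = np.abs(weights) >= threshold
pruned = weights * mask

# Dense storage: every zero still occupies 4 bytes.
dense_bytes = pruned.nbytes

# Assumed coordinate layout: one float32 value plus two int32 indices per kept weight.
rows, cols = np.nonzero(mask)
coo_bytes = (
    pruned[mask].nbytes
    + rows.astype(np.int32).nbytes
    + cols.astype(np.int32).nbytes
)

print(f"dense: {dense_bytes} bytes, coordinate layout: {coo_bytes} bytes")
```

The dense figure is what a standard framework tensor would occupy after masking, which is why pruning alone does not free memory for extra layers.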

@araldjean-charles3924 1 year ago

For the initial conditions that work, has anybody looked at how much wiggle room you have? Is there an epsilon-neighborhood of the initial state you can safely start from, and how small is epsilon?

@ans1975 4 years ago

Sorry for the silly question... what software can be used to do similar things? I need it for a classroom on much more basic things. Thanks, and by the way, these videos are great and generate true value.

@YannicKilcher 4 years ago

Thanks. I use OneNote. I made a video on how to do online education where I link to all my setup.

@justinking5964 2 years ago

English is not my first language, but I believe I can explain them clearly to somebody I can trust. To predict 10 numbers in Three Drums, one actually doesn't have to pay attention to them all; just 2 numbers are enough.

@deoabhijit5935 2 years ago

Thanks for the explanation :)

@michael-nef 4 years ago

how do you make so many videos

@seanjhardy 4 years ago

I would assume it's because they're unedited videos (just him talking with no cuts), which only require around 30 minutes to 2 hours to read, come up with talking points, and then record a video.

@YannicKilcher 4 years ago

You forget the large quantities of chocolate I need to consume to keep it up ;)

@seanjhardy 4 years ago

@@YannicKilcher oh absolutely, you need something to fuel this phenomenal work! More people in the AI community need to see these videos, you have such insightful analysis.

@jeremyscheurer3797 4 years ago

Hey, can I ask a follow-up question? I was wondering where you find so many interesting papers. Obviously you can scroll through some of the top conferences and look for whatever catches your eye. But I see a high diversity in the papers that you present and thought maybe you have an interesting way to go about this (some blogs, some techniques, etc.). Or to phrase it differently: if you need a new topic for a video, what do you do?

@arkasaha4412 4 years ago

@@jeremyscheurer3797 I think reddit might be one of his sources.

@bluestar2253 3 years ago

Lottery tickets got me here!

@robbiero368 4 years ago

Makes me think using simplistic random initialisation isn't the best thing to do, as randomness is inherently lumpy. From computer graphics we know of better stochastic sampling methods and maybe something like that would be better to start out with, since ultimately you are trying to sample a high dimensional landscape.

@robbiero368 4 years ago

You could even think about doing a relaxation step after initialisation that moves randomised weights away from each other, so that none are too close together, perhaps.
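
One way to picture the "better stochastic sampling" idea from the two comments above is stratified (jittered) sampling, a standard technique in computer graphics. The sketch below is only an illustration of the commenter's suggestion, not something the paper or the video proposes. The helper `stratified_uniform` and the range [-0.1, 0.1] are hypothetical. It splits the range into `n` equal bins and draws exactly one value per bin, so no two values can clump inside the same bin.

```python
import numpy as np

def stratified_uniform(n, low, high, rng):
    """Draw n values in [low, high), exactly one per equal-width bin."""
    edges = np.linspace(low, high, n + 1)
    # One uniform draw inside each bin [edges[i], edges[i + 1]).
    samples = rng.uniform(edges[:-1], edges[1:])
    # Shuffle so that a weight's position is unrelated to its bin.
    rng.shuffle(samples)
    return samples

rng = np.random.default_rng(0)
n = 16
plain = rng.uniform(-0.1, 0.1, size=n)           # ordinary i.i.d. draws
stratified = stratified_uniform(n, -0.1, 0.1, rng)

# Largest gap between neighbouring values, a rough measure of "lumpiness".
print("i.i.d. max gap:    ", np.diff(np.sort(plain)).max())
print("stratified max gap:", np.diff(np.sort(stratified)).max())
```

By construction the stratified gaps can never exceed two bin widths, whereas i.i.d. draws carry no such guarantee. Whether this would help training is an open question that the thread does not answer.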

@MrEmretaha 4 years ago

How the hell is figure 6 possible? No optimization, just random init, and 40% accuracy. wtf

@YannicKilcher 4 years ago

by masking weights, you actively change the signal propagation, so it's entirely possible. what I find surprising is that the large-final selection criterion to achieve that is so simple.
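
For readers following this exchange, here is a minimal sketch of what a "large-final" selection criterion could look like when applied to untrained weights. It assumes the criterion means "keep the weights whose trained magnitude is largest"; the helper name `large_final_mask`, the array shapes, and the 20% keep fraction are hypothetical. The "trained" weights are a random stand-in, since no real training is run here.

```python
import numpy as np

def large_final_mask(w_final, keep_fraction):
    """Boolean mask keeping the keep_fraction of weights with the largest |w_final|."""
    k = int(round(keep_fraction * w_final.size))
    if k == 0:
        return np.zeros(w_final.shape, dtype=bool)
    # The k-th largest magnitude serves as the cutoff.
    cutoff = np.partition(np.abs(w_final).ravel(), -k)[-k]
    return np.abs(w_final) >= cutoff

rng = np.random.default_rng(0)
w_init = rng.standard_normal((64, 32))
# Stand-in for trained weights (assumption: in practice these come from real training).
w_final = w_init + 0.5 * rng.standard_normal((64, 32))

mask = large_final_mask(w_final, keep_fraction=0.2)

# The mask is applied to the *initial* weights: kept entries stay untrained,
# all other entries are zeroed, which changes how signals propagate.
masked_init = w_init * mask
print("kept weights:", int(mask.sum()), "of", mask.size)
```

Note that the mask is computed from the final weights but applied to the initial ones, so any accuracy above chance would come from which connections survive, not from learned values.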

@MrEmretaha 4 years ago

I understand that it is possible, but in the paper they presented it as if it is a "result" after repeated tests. That is confusing. Although the mask itself is trained, in the end it is a binary mask. It is like rerouting, but in a very limited case. Considering the random initialization, 40% accuracy needs more rigorous explaining, since it is somewhat against "common sense".

@YannicKilcher 4 years ago

Ok true, it is rather unexpected. Maybe because the task itself isn't super hard

@MrEmretaha 4 years ago

Yeah, that is a possibility, though I worked on pruning before, and it is pretty easy to get a well-behaved network to perform below random without pruning before. So this result is kind of surprising to me. But as you said, it is probably due to the task. I doubt that they get anything well above random with CIFAR-10.

@kevalan1042 3 years ago

@@MrEmretaha This is over a year later, but I completely agree with you :) mind blown