Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (Paper Explained)

13,770 views

Yannic Kilcher

Days ago

Comments: 33
@michaelcarlon1831 4 years ago
Yet another super contribution! In some ways this kind of video is more valuable than the original paper!
@JaswinderSingh-oe5ie 4 years ago
please keep making these videos, it is amazing
@josephsantarcangelo9310 4 years ago
Thanks, Yannic. It's difficult to keep up with all the advancements, you make it a lot easier.
@HappyManStudiosTV 4 years ago
Thank you for covering this!
@burhanrashidhussein6037 3 years ago
Thanks for these amazing videos. Your opinion looks valid: these methods need to be benchmarked on more complex tasks!
@vivekkumar531 4 years ago
Thank you so much for the video!!
@christianleininger2954 4 years ago
Great job, would love more RL papers :)
@jrkirby93 4 years ago
I wanna see someone distill "lottery ticket" networks to 1% of the parameters, and then double the number of layers with all that freed RAM, distill them again, rinse and repeat.
@YannicKilcher 4 years ago
good idea, but sparse neural networks are still not really a thing, so I don't think this is going to save you much RAM at actual runtime.
@araldjean-charles3924 a year ago
For the initial conditions that work, has anybody looked at how much wiggle room you have? Is there an epsilon-neighborhood of the initial state you can safely start from, and how small is epsilon?
@ans1975 4 years ago
Sorry for the silly question... what software can be used to do similar things? I need it for a classroom on much more basic things. Thanks, and by the way, these videos are great and generate true value.
@YannicKilcher 4 years ago
Thanks. I use OneNote. I made a video on how to do online education where I link to all my setup.
@justinking5964 2 years ago
English is not my first language, but I believe I can explain them clearly to somebody I can trust. To predict 10 numbers in Three Drums, one actually doesn't have to pay attention to them all. Just 2 numbers are enough.
@deoabhijit5935 2 years ago
Thanks for the explanation :)
@michael-nef 4 years ago
how do you make so many videos
@seanjhardy 4 years ago
I would assume it's because they are unedited videos (just him talking with no cuts), which only require around 30 minutes to 2 hours to read the paper, come up with talking points, and then record a video.
@YannicKilcher 4 years ago
You forget the large quantities of chocolate I need to consume to keep it up ;)
@seanjhardy 4 years ago
@@YannicKilcher oh absolutely, you need something to fuel this phenomenal work! More people in the AI community need to see these videos, you have such insightful analysis.
@jeremyscheurer3797 4 years ago
Hey, can I ask a follow-up question? I was wondering where you find so many interesting papers. Obviously, you can scroll through some of the top conferences and look for whatever catches your eye. But I see a high diversity in the papers that you present and thought maybe you have an interesting way to go about this (some blogs, some techniques, etc.)? Or, to phrase it differently: if you need a new topic for a video, what do you do?
@arkasaha4412 4 years ago
@@jeremyscheurer3797 I think Reddit might be one of his sources.
@bluestar2253 3 years ago
Lottery tickets got me here!
@robbiero368 4 years ago
Makes me think using simplistic random initialisation isn't the best thing to do, as randomness is inherently lumpy. From computer graphics we know of better stochastic sampling methods and maybe something like that would be better to start out with, since ultimately you are trying to sample a high dimensional landscape.
@robbiero368 4 years ago
You could even think about doing a relaxation step after initialisation that moves the randomised weights away from each other, so that none are too close together, perhaps.
@MrEmretaha 4 years ago
How the hell is figure 6 possible? No optimization, just random init, and 40% accuracy. wtf
@YannicKilcher 4 years ago
by masking weights, you actively change the signal propagation, so it's entirely possible. what I find surprising is that the large-final selection criterion to achieve that is so simple.
@MrEmretaha 4 years ago
I understand that it is possible, but in the paper they presented it as if it were a "result" after repeated tests. That is confusing. Although the mask itself is trained, in the end it is a binary mask. It is like rerouting, but in a very limited case. Considering the random initialization, 40% accuracy needs a more rigorous explanation, since it is somewhat against "common sense".
@YannicKilcher 4 years ago
Ok true, it is rather unexpected. Maybe because the task itself isn't super hard
@MrEmretaha 4 years ago
Yeah, that is a possibility, though I worked on pruning before, and it is pretty easy to get a well-behaving network to perform below random without pruning before. So this result is kind of surprising to me. But as you said, it is probably due to the task. I doubt that they would get anything well above random with CIFAR-10.
@kevalan1042 3 years ago
@@MrEmretaha This is over a year later, but I completely agree with you :) mind blown
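For readers following this thread, here is a minimal Python sketch of the idea being discussed: build a binary mask with the large-final criterion (keep the weights whose magnitude is largest after training) and apply it to the untrained initial weights. The helper names large_final_mask and apply_mask are hypothetical, the weight values are made up for illustration, and this is a sketch of the criterion only, not the paper's actual code.

```python
def large_final_mask(final_weights, keep_fraction):
    """Return a 0/1 mask that keeps the weights with the largest final magnitude."""
    n_keep = int(round(keep_fraction * len(final_weights)))
    # Indices ordered by |w_final|, largest first.
    order = sorted(
        range(len(final_weights)),
        key=lambda i: abs(final_weights[i]),
        reverse=True,
    )
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(final_weights))]


def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]


# Illustrative values only (assumption): a real w_final would come from training.
w_init = [0.5, -0.1, 0.3, -0.4]
w_final = [0.9, -0.05, 0.02, -1.2]

mask = large_final_mask(w_final, keep_fraction=0.5)
masked_init = apply_mask(w_init, mask)  # initial weights, masked, never trained
```

The mask is chosen from the trained weights but applied to the initial ones, which is why the masked network can carry information from training without any of its surviving weights having been updated.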
@llewellyngreen402 4 years ago
Have you won??
@YannicKilcher 4 years ago
Every time
@11rsort 2 years ago
like