Yet another super contribution! In some ways this kind of video is more valuable than the original paper!
@JaswinderSingh-oe5ie · 4 years ago
Please keep making these videos, they are amazing.
@josephsantarcangelo9310 · 4 years ago
Thanks, Yannic. It's difficult to keep up with all the advancements; you make it a lot easier.
@HappyManStudiosTV · 4 years ago
Thank you for covering this!
@burhanrashidhussein6037 · 3 years ago
Thanks for these amazing videos. Your opinion seems valid; these methods need to be benchmarked on more complex tasks!
@vivekkumar531 · 4 years ago
Thank you so much for the video!!
@christianleininger2954 · 4 years ago
Great job, would love more RL papers :)
@jrkirby93 · 4 years ago
I wanna see someone distill "lottery ticket" networks to 1% of the parameters, and then double the number of layers with all that freed RAM, distill them again, rinse and repeat.
@YannicKilcher · 4 years ago
Good idea, but sparse neural networks are still not really a thing, so I don't think this would save you much RAM at actual runtime.
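A minimal PyTorch sketch of this point, with layer sizes chosen arbitrarily for illustration: lottery-ticket-style magnitude pruning zeros out entries but keeps the dense tensor, so runtime memory does not shrink unless a genuinely sparse format (with its own kernel-speed caveats) is used.

```python
import torch
import torch.nn as nn

layer = nn.Linear(1024, 1024)

# Magnitude pruning in the lottery-ticket style: keep only the top 1%
# of weights by absolute value, zeroing the rest with a binary mask.
k = int(0.01 * layer.weight.numel())
cutoff = layer.weight.abs().flatten().kthvalue(layer.weight.numel() - k).values
mask = (layer.weight.abs() > cutoff).float()

with torch.no_grad():
    layer.weight.mul_(mask)  # 99% of entries are now exactly zero...

# ...but the tensor is still stored densely, so the memory footprint
# is unchanged no matter how sparse the weights are.
print(layer.weight.element_size() * layer.weight.numel(), "bytes, dense")

# A truly sparse layout would shrink storage, but unstructured-sparse
# kernels are rarely competitive on current hardware.
sparse_w = layer.weight.to_sparse()
print(sparse_w.values().numel(), "nonzeros of", layer.weight.numel())
```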
@araldjean-charles392 · 1 year ago
For the initial conditions that work, has anybody looked at how much wiggle room you have? Is there an epsilon-neighborhood of the initial state you can safely start from, and how small is epsilon?
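A hypothetical experiment sketch for this question (the helpers `train_and_eval`, `ticket_init`, and `ticket_mask` are assumed, not from the paper or the video): perturb only the winning ticket's surviving initial weights by noise of scale eps and check whether training still succeeds; sweeping eps would bound the size of the safe neighborhood.

```python
import copy
import torch

def perturbed_ticket_accuracy(ticket_init, ticket_mask, train_and_eval, eps):
    model = copy.deepcopy(ticket_init)
    with torch.no_grad():
        for w, m in zip(model.parameters(), ticket_mask):
            # Nudge only the unmasked (surviving) weights by Gaussian
            # noise with standard deviation eps.
            w.add_(eps * torch.randn_like(w) * m)
    return train_and_eval(model, ticket_mask)

# Sweep eps to estimate how small the safe neighborhood is:
# for eps in (0.0, 1e-3, 1e-2, 1e-1):
#     print(eps, perturbed_ticket_accuracy(init, mask, train_and_eval, eps))
```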
@ans1975 · 4 years ago
Sorry for the silly question: what software can be used to do similar things? I need it for a classroom on much more basic material. Thanks, and by the way, these videos are great and generate true value.
@YannicKilcher · 4 years ago
Thanks. I use OneNote. I made a video on how to do online education where I link to all my setup.
@justinking5964 · 2 years ago
English is not my first language, but I believe I can explain it clearly to somebody I can trust. To predict 10 numbers in Three Drums, you don't actually have to pay attention to them all; just 2 numbers are enough.
@deoabhijit5935 · 2 years ago
Thanks for the explanation :)
@michael-nef · 4 years ago
How do you make so many videos?
@seanjhardy · 4 years ago
I would assume it's because they're unedited videos (just him talking, with no cuts), which only require around 30 minutes to 2 hours to read the paper, come up with talking points, and then record.
@YannicKilcher · 4 years ago
You forget the large quantities of chocolate I need to consume to keep it up ;)
@seanjhardy · 4 years ago
@YannicKilcher Oh absolutely, you need something to fuel this phenomenal work! More people in the AI community need to see these videos; you have such insightful analysis.
@jeremyscheurer3797 · 4 years ago
Hey, can I ask a follow-up question: where do you find so many interesting papers? Obviously you can scroll through some of the top conferences and look for whatever catches your eye, but I see high diversity in the papers you present and thought maybe you have an interesting way to go about this (some blogs, some techniques, etc.). Or to phrase it differently: when you need a new topic for a video, what do you do?
@arkasaha4412 · 4 years ago
@jeremyscheurer3797 I think Reddit might be one of his sources.
@bluestar2253 · 3 years ago
Lottery tickets got me here!
@robbiero368 · 4 years ago
Makes me think that simplistic random initialisation isn't the best thing to do, since randomness is inherently lumpy. From computer graphics we know of better stochastic sampling methods, and maybe something like that would be a better starting point, since ultimately you are trying to sample a high-dimensional landscape.
@robbiero368 · 4 years ago
You could even think about doing a relaxation step after initialisation that moves the randomised weights away from each other, so that perhaps none end up too close together.
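A rough sketch of that idea (not from the paper; the step size and iteration count are arbitrary): after a standard random init, run a few repulsion steps that push the rows of a weight matrix apart, loosely analogous to blue-noise or Poisson-disk sampling in graphics.

```python
import torch

def relax_init(weight, steps=10, lr=0.01):
    w = weight.clone()
    for _ in range(steps):
        # Pairwise differences between rows (each row = one neuron).
        diff = w.unsqueeze(1) - w.unsqueeze(0)          # (n, n, d)
        dist = diff.norm(dim=-1, keepdim=True) + 1e-8   # (n, n, 1)
        # Inverse-square repulsion summed over all other rows; the
        # self-term vanishes because diff[i, i] is exactly zero.
        force = (diff / dist.pow(3)).sum(dim=1)
        w = w + lr * force
    return w

w0 = torch.randn(64, 128) * 0.05
w_relaxed = relax_init(w0)
```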
@MrEmretaha · 4 years ago
How the hell is Figure 6 possible? No optimization, just random init, and 40% accuracy. Wtf.
@YannicKilcher · 4 years ago
By masking weights you actively change the signal propagation, so it's entirely possible. What I find surprising is that the large-final selection criterion that achieves it is so simple.
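A sketch of the "large final" criterion being discussed (`train` and `evaluate` are assumed helpers, not the paper's code): train a copy of the network, keep the weights with the largest final magnitude, apply that binary mask to the untrained random init, and evaluate with no optimization at all.

```python
import copy
import torch

def large_final_mask_eval(model, train, evaluate, keep_frac=0.2):
    init_state = copy.deepcopy(model.state_dict())
    trained = train(copy.deepcopy(model))       # an ordinary training run

    masked = copy.deepcopy(model)
    masked.load_state_dict(init_state)          # back to the random init
    with torch.no_grad():
        for w_final, w_init in zip(trained.parameters(), masked.parameters()):
            k = int(keep_frac * w_final.numel())
            cutoff = w_final.abs().flatten().kthvalue(w_final.numel() - k).values
            # Keep only the weights that ended up large after training.
            w_init.mul_((w_final.abs() > cutoff).float())
    return evaluate(masked)  # can land well above chance without training
```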
@MrEmretaha · 4 years ago
I understand that it is possible, but in the paper it is presented as a "result" after repeated tests, and that is confusing. Although the mask itself is trained, in the end it is a binary mask. It is like rerouting, but in a very limited case; given the random initialization, 40% accuracy needs more rigorous explanation, since it goes somewhat against "common sense".
@YannicKilcher · 4 years ago
OK, true, it is rather unexpected. Maybe it's because the task itself isn't super hard.
@MrEmretaha · 4 years ago
Yeah, that is a possibility. Having worked on pruning before, though, I know it is pretty easy to push a well-behaving network below random accuracy just by pruning it, without any retraining, so this result is kind of surprising to me. But as you said, it is probably due to the task; I doubt they would get anything well above random on CIFAR-10.
@kevalan1042 · 3 years ago
@MrEmretaha This is over a year later, but I completely agree with you :) Mind blown.