When you say it can train to full accuracy in the same number of steps, isn't that sort of untrue? If training the reduced network takes the same number of steps (N) as training the big network, wouldn't we have spent 2N steps by the time we have found and trained the winning ticket?
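To make the arithmetic concrete, here is a rough sketch of the step count. It assumes every pruning round trains the network for a full run of N steps before pruning, and that the final pruned network then gets one more full run. The function name and parameters are made up for illustration and are not from the video.

```python
def total_training_steps(steps_per_run: int, pruning_rounds: int) -> int:
    """Rough total cost of finding and then training a winning ticket.

    Assumption: each pruning round costs one full training run, and the
    final pruned network is trained for one more full run afterwards.
    """
    if steps_per_run < 0 or pruning_rounds < 0:
        raise ValueError("steps_per_run and pruning_rounds must be non-negative")
    # pruning_rounds runs to find the mask, plus 1 run to train the ticket
    return steps_per_run * (pruning_rounds + 1)


N = 1000
print(total_training_steps(N, 1))  # one-shot pruning: 2N
print(total_training_steps(N, 5))  # five iterative rounds: 6N
```

So under these assumptions one-shot pruning gives the 2N above, and iterative pruning costs even more in total. The "same number of steps" claim would then only describe the final run of the pruned network.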
@EPenguin82 1 year ago
Fascinating!
@araldjean-charles3924 1 year ago
For the initial conditions that work, has anybody looked at how much wiggle room you have? Is there an epsilon-neighborhood of the initial state you can safely start from, and how small is epsilon?
@jerryyuan6456 2 months ago
Did you find anything on this?
@forecastinglottery6153 3 years ago
What computing power was used to achieve such results?