Leo Dirac (@leopd) gives a geometric intuition for what happens when you train a deep neural network, starting with a physics analogy for how SGD works and then describing the shape of neural network loss surfaces.
This talk was recorded live on 12 Nov 2019 as part of the Seattle Applied Deep Learning (sea-adl.org) series.
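To make the physics analogy concrete, here is a minimal sketch (not from the talk) of SGD with momentum on a toy quadratic loss: the parameter vector behaves like a ball whose velocity accumulates the downhill force of the gradient, with the momentum coefficient playing the role of (inverse) friction. The loss surface and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def loss_grad(w):
    """Gradient of a toy loss surface f(w) = (w0**2 + 5 * w1**2) / 2."""
    return np.array([w[0], 5.0 * w[1]])

# Heavy-ball SGD: velocity accumulates gradient "force"; momentum < 1 acts as friction.
w = np.array([2.0, 2.0])   # parameters: the ball's position on the loss surface
v = np.zeros_like(w)       # the ball's velocity
lr, momentum = 0.05, 0.9   # step size and momentum coefficient (illustrative values)

for step in range(100):
    g = loss_grad(w)            # force pulling the ball downhill
    v = momentum * v - lr * g   # velocity update: inertia plus applied force
    w = w + v                   # position update

print(w)  # approaches the minimum at the origin
```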
References from the talk:
Loss Surfaces of Multilayer Networks: arxiv.org/pdf/...
Sharp minima papers:
- Modern take: arxiv.org/abs/...
- Hochreiter & Schmidhuber, 1997: www.bioinf.jku....
SGD converges to limit cycles: arxiv.org/pdf/...
Entropy-SGD: arxiv.org/abs/...
Parle: arxiv.org/abs/...
FGE: arxiv.org/abs/...
SWA: arxiv.org/pdf/...
SWA implementation in PyTorch: pytorch.org/bl...
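On that last reference: below is a minimal sketch of how stochastic weight averaging is typically wired up with PyTorch's torch.optim.swa_utils. The model, data, and schedule values are placeholder assumptions for illustration, not the setup from the linked post.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

# Toy setup: a tiny regression model and synthetic data (placeholders).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
loader = [(torch.randn(16, 10), torch.randn(16, 1)) for _ in range(8)]
loss_fn = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
swa_model = AveragedModel(model)               # keeps a running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05)  # learning rate used during the SWA phase
swa_start = 15                                 # epoch at which averaging begins (assumed)

for epoch in range(20):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)     # fold the current weights into the average
        swa_scheduler.step()

update_bn(loader, swa_model)  # recompute BatchNorm statistics for the averaged weights
# swa_model now holds the averaged network used for evaluation.
```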