Geometric Intuition for Training Neural Networks

  18,205 views

Seattle Applied Deep Learning

Leo Dirac (@leopd) gives a geometric intuition for what happens when you train a deep neural network, starting with a physics analogy for how SGD works and then describing the shape of neural-network loss surfaces.
This talk was recorded live on 12 Nov 2019 as part of the Seattle Applied Deep Learning (sea-adl.org) series.
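As a rough illustration of the physics analogy from the talk, the sketch below shows SGD with momentum on a toy quadratic loss, where the parameter vector behaves like a ball that builds up velocity as it rolls downhill. This is a minimal illustrative sketch, not code from the talk; the loss function, learning rate, and momentum value are assumptions.

```python
import numpy as np

def toy_loss_grad(w):
    """Gradient of an illustrative quadratic bowl L(w) = 0.5 * w @ A @ w (assumed, not from the talk)."""
    A = np.diag([1.0, 25.0])   # ill-conditioned bowl: steep in one direction, shallow in the other
    return A @ w

w = np.array([2.0, 2.0])       # position of the "ball" in parameter space
v = np.zeros_like(w)           # velocity the ball has built up so far
lr, momentum = 0.02, 0.9       # step size and friction-like momentum coefficient (assumed values)

for _ in range(300):
    g = toy_loss_grad(w)       # downhill force at the current position
    v = momentum * v - lr * g  # keep most of the old velocity, add the new push
    w = w + v                  # move the ball
print(w)                       # ends up near the minimum at the origin
```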
References from the talk:
Loss Surfaces of Multilayer Networks: arxiv.org/pdf/...
Sharp minima papers:
- Modern take: arxiv.org/abs/...
- Hochreiter & Schmidhuber, 1997: www.bioinf.jku....
SGD converges to limit cycles: arxiv.org/pdf/...
Entropy-SGD: arxiv.org/abs/...
Parle: arxiv.org/abs/...
FGE: arxiv.org/abs/...
SWA: arxiv.org/pdf/...
SWA implementation in PyTorch: pytorch.org/bl...
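Since the last reference points to PyTorch's SWA implementation, here is a minimal hedged sketch of how Stochastic Weight Averaging is typically wired up with torch.optim.swa_utils. The model, toy data, and schedule constants below are placeholder assumptions for illustration, not details from the talk.

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

# Placeholder model and toy data (assumptions for illustration).
model = torch.nn.Sequential(
    torch.nn.Linear(10, 16), torch.nn.BatchNorm1d(16), torch.nn.ReLU(), torch.nn.Linear(16, 2)
)
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(20)]
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

swa_model = AveragedModel(model)              # keeps a running average of the weights
swa_scheduler = SWALR(optimizer, swa_lr=0.05) # learning rate used during the averaging phase
swa_start = 5                                 # epoch at which averaging begins (assumed)

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)    # fold the current weights into the running average
        swa_scheduler.step()

update_bn(loader, swa_model)                  # recompute BatchNorm statistics for the averaged weights
```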

Comments: 20
@susmitislam1910 · 3 years ago
For those who are wondering, yes, he's the grandson of the late great Paul Dirac.
@miguelduqueb7065 · 2 years ago
Such insights so easily explained denote a deep understanding of the topic and great teaching skills. I am eager to see more lectures or talks by this author. Thanks.
@oxfordsculler8013 · 3 years ago
Great video. Why no more? These are very insightful.
@ramkitty · 3 years ago
This is a great lecture that ends at Wolfram's argument for quantum physics and relativity, and at what I think is manifest as Orch-OR-type consciousness through Penrose's twistor collapse.
@matthewhuang7857 · 2 years ago
Thanks for the talk, Leo! I'm now a couple of months into ML and this level of articulation really helped a lot. I know this is probably a rookie mistake in this context, but when it's hard for my model to converge, I often assumed it was because it had reached a 'local minimum'. My practice is often to bump up the learning rate significantly, hoping the model can leap over it and get to a point where it can re-converge. According to what you said, there is evidence conclusively showing there are no local minima in these loss functions. I'm wondering which specific papers you were talking about. Regards, Matt
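The "bump the learning rate and re-converge" practice described in the comment above is close to what warm-restart schedules automate. Below is a minimal hedged sketch using PyTorch's CosineAnnealingWarmRestarts scheduler; the model, toy data, and restart period are placeholder assumptions, not anything from the talk or the comment.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 1)                 # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# The LR decays over T_0 = 10 epochs, then jumps back up (a "restart"), mimicking a manual LR bump.
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10)

loss_fn = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1) # toy data

for epoch in range(30):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()                           # once per epoch; the LR resets every T_0 epochs
    print(epoch, optimizer.param_groups[0]["lr"])
```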
@uwe_sterr · 4 years ago
Hi Leo, thanks for this very impressive way of making somewhat complicated concepts so easy to understand with simple but well-structured visualisations.
@linminhtoo · 3 years ago
Very nice (and certainly mind-blowing) video, but according to kzbin.info/www/bejne/bWnZommhnNiHl5o, that complicated loss landscape at 13:51 is not actually a ResNet but a VGG. The ResNet one looks a lot smoother due to the residual skip connections.
@LeoDirac · 3 years ago
Thanks for the kind words. The creators of that diagram called it a "ResNet" - see the first page of the referenced paper: arxiv.org/pdf/1712.09913.pdf. Skip connections make the loss surface smoothER, but remember that these surfaces have millions of dimensions. There are zillions of ways to visualize them in 2 or 3 dimensions, and every view discards tons of information. It's totally reasonable to expect that one view would look smooth and another very lumpy, for the same surface. TBH I don't know exactly what the authors of this paper did - they refer to "skip connections" a lot, and talk about ResNets with and without them. I'm not sure if they mean "residuals" when they say "skip connections", but I'm not sure I'd call a ResNet without RESiduals a RESnet myself. If you remove the residuals, it's architecturally a lot closer to a traditional CNN like VGG / AlexNet / LeNet and not what I would call a ResNet at all.
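To make the reply's point about low-dimensional views of a million-dimensional surface concrete, here is a small sketch of the random-direction slicing idea behind such plots: evaluate the loss at theta + alpha*d1 + beta*d2 over a grid of (alpha, beta). The tiny model and data are stand-ins, and this sketch omits the filter normalization that the cited paper applies.

```python
import torch

# Stand-in model and data (assumptions for illustration).
model = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
x, y = torch.randn(64, 5), torch.randn(64, 1)
loss_fn = torch.nn.MSELoss()

theta = [p.detach().clone() for p in model.parameters()]  # center point of the 2-D slice
d1 = [torch.randn_like(p) for p in theta]                 # two random directions in weight space
d2 = [torch.randn_like(p) for p in theta]

def loss_at(alpha, beta):
    """Loss evaluated at theta + alpha*d1 + beta*d2."""
    with torch.no_grad():
        for p, t, a, b in zip(model.parameters(), theta, d1, d2):
            p.copy_(t + alpha * a + beta * b)
        return loss_fn(model(x), y).item()

grid = torch.linspace(-1.0, 1.0, 21)
surface = [[loss_at(a.item(), b.item()) for b in grid] for a in grid]  # 21x21 slice of the surface

# Restore the original weights after probing the slice.
with torch.no_grad():
    for p, t in zip(model.parameters(), theta):
        p.copy_(t)
```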
@katiefaery · 4 years ago
He’s a great speaker. Really well explained. Thanks for sharing.
@PD-vt9fe · 4 years ago
Thank you so much for this excellent talk.
@MrArihar · 4 years ago
Really useful resource with intuitively understandable explanations! Thanks a lot!
@RobertElliotPahel-Short · 4 years ago
This is such a great talk! Keep it up my dude!!
@matthewtang1489 · 4 years ago
This is so coooooollll!!!!!!!
@elclay · 3 years ago
Please share the slides, sir.
@hanyanglee9018 · 2 years ago
17:00 is all you need.
@berargumen2390 · 4 years ago
This video led me to my "aha" moment, thanks.
@bluemamba5317 · 4 years ago
Was it the pink shirt, or the green belt?
@abhijeetvyas7365 · 4 years ago
Dude, awesome!
@srijeetful · 4 years ago
nice one