Why Are Neural Network Loss Landscapes So Weirdly Connected?

2,117 views

Tunadorable

28 days ago

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
arxiv.org/abs/2104.11044
Support my learning journey by becoming either a YouTube or Patreon member!
/ @tunadorable
patreon.com/Tunadorable?...
Discuss this stuff with other Tunadorks on Discord
/ discord
All my other links
linktr.ee/tunadorable
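
For context on the paper's central topic: monotonic linear interpolation (MLI) is the observation that if you linearly interpolate a network's weights from their initial values to their trained values, the training loss often decreases monotonically along that straight line. Below is a minimal, purely illustrative sketch of that check; the toy loss function and the parameter vectors `theta_init` / `theta_final` are invented here and are not from the paper or the video.

```python
# Toy sketch of a monotonic linear interpolation (MLI) check.
# The quadratic "loss" and the parameter vectors are hypothetical stand-ins.
import numpy as np

def loss(theta):
    # Stand-in loss; in practice this would be the training loss of a network
    # whose weights are set to `theta`.
    return np.sum((theta - 3.0) ** 2)

theta_init = np.random.randn(10)   # weights at initialization (made up)
theta_final = np.full(10, 3.0)     # weights after "training" (toy optimum)

alphas = np.linspace(0.0, 1.0, 50)
losses = [loss((1 - a) * theta_init + a * theta_final) for a in alphas]

# MLI holds on this path if the loss never increases from init to final weights.
is_monotonic = all(l2 <= l1 + 1e-8 for l1, l2 in zip(losses, losses[1:]))
print("MLI holds on this path:", is_monotonic)
```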

Comments: 18
@rosieposiebias 25 days ago
14:22 Layer Normalization and Batch Normalization are not the same thing. It doesn't take away from the quality of the video or anything, just commenting :)
@Tunadorable 25 days ago
omg thank you, that felt like way too simple of an explanation but I didn't want to linger on it. this is one of those things where I can't memorize a given technical term to save my life and it ends up biting me in the butt every few months, even though once upon a time I understood the concept -_- ugh
@aleksandrkhavarovskiy991 24 days ago
BatchNorm and LayerNorm are based on the same formula: they standardize values toward a normal-like distribution. The goal of these layers is to force the activations they see into a normalized distribution, and additional trainable parameters gamma and beta are used to shape (scale and shift) the resulting bell curve. It's widely used after convolutional operations as it helps the network converge faster during training. Batch Normalization has the added step of computing the mean and variance over the entire batch, although in practice a running average of those statistics can be used instead.
@aleksandrkhavarovskiy991 24 days ago
Batch Normalization makes an assumption about the layer outputs: specifically, that the output should follow a normal distribution. But that is only an assumption we make; the optimal distribution for a layer may not be normal, at least not during the intermediate steps of training. This assumption about the distribution could be what causes the MLI property to not hold.
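
A rough sketch (my own illustration, not from the video or the comments) of the distinction this thread is pointing at: both normalizations use the same (x - mean) / std form with learnable gamma and beta, but BatchNorm computes its statistics per feature across the batch dimension, while LayerNorm computes them per example across the feature dimension. The array shapes and values below are invented.

```python
# Batch normalization vs layer normalization: same formula, different axes.
import numpy as np

x = np.random.randn(32, 64)              # (batch, features) activations, made up
gamma, beta = np.ones(64), np.zeros(64)  # learnable scale/shift parameters
eps = 1e-5

# BatchNorm (training mode): per-feature mean/var computed over the batch.
bn_mean = x.mean(axis=0)
bn_var = x.var(axis=0)
x_bn = gamma * (x - bn_mean) / np.sqrt(bn_var + eps) + beta

# LayerNorm: per-example mean/var computed over that example's own features,
# so it does not depend on the rest of the batch.
ln_mean = x.mean(axis=1, keepdims=True)
ln_var = x.var(axis=1, keepdims=True)
x_ln = gamma * (x - ln_mean) / np.sqrt(ln_var + eps) + beta

print(x_bn.shape, x_ln.shape)  # both (32, 64), normalized along different axes
```

In PyTorch these roughly correspond to nn.BatchNorm1d and nn.LayerNorm; BatchNorm additionally keeps running statistics for use at inference time, LayerNorm does not.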
@drdca8263 23 days ago
I did not expect the analogy with dive spots in RSE, haha
@sarthakmishra1483 26 days ago
Nice analysis, made me want to revisit my optimisation notes.
@stereoplegic 26 days ago
LeNet is a CNN by Yann LeCun (now Chief AI Scientist at Meta) et al.
@easydoesitismist 26 days ago
Cool paper and analysis
@beagle989 19 days ago
good video! and neat paper
@InfiniteQuest86 24 days ago
Interesting. This all seemed self-evident to me. I didn't realize this stuff wasn't known. SGD is literally designed to do this. I suppose it may not work as intended, so it's good someone checked. Then ADAM is literally designed to avoid MLI. Hmm, good to know someone did the work to check this stuff, but it seems like a weird paper.
@SudhirYadav-kz6ts 24 days ago
How can you say this? Can you point to some reading material?
@jsparger 23 days ago
What do they mean when they say Nguyen implies that all global minima are connected? Isn’t there only one global minimum? Somebody unpack that for me please.
@Tunadorable 23 days ago
oooof, apologies if I'm going a bit too basic here or not giving a good enough explanation. So basically, once upon a time we used to think our 3D intuitions could be applied to high-dimensional loss landscapes, which they most definitely cannot. Then we started seeing really weird stuff that didn't make sense under those 3D intuitions. For an example relevant to this case: if you were to train 1000 randomly initialized NNs, you'd find that instead of them all reaching the same global minimum, meaning having roughly the same parameters, they actually all end up with completely different parameters and yet all reach minimal loss. AND, if you linearly interpolate between them, you find regions of higher loss in between (there's a toy sketch of this interpolation right after this thread). If 3D intuitions were correct, you might interpret this as many distinct bottoms of the valley (loss landscape) that all just happen to sit at the exact same elevation (loss). But that didn't really make sense, why would there just happen to be tons and tons of equal-elevation bottoms to the valley? The reality is that there is in fact only one bottom, but that bottom is a huge, weirdly shaped, high-dimensional manifold. This weird reality is behind a lot of misconceptions still taught in ML classes, such as the myth of local minima. The paper that comes to mind as helping me first understand this a bit better was section 2 of arxiv.org/abs/1406.2572
@jsparger 22 days ago
@@Tunadorable thanks that’s very interesting!
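
To make the interpolation experiment described in that reply concrete, here is a toy, purely hypothetical sketch: two "trained" parameter vectors that both reach low loss on an invented 2D loss surface, with the straight line between them passing through higher-loss territory. None of this comes from the paper; it only illustrates the idea of a loss barrier between independently trained solutions.

```python
# Toy illustration of a loss barrier between two separately trained solutions.
# The 2D "loss" below is invented purely to make the barrier visible; real
# loss landscapes live in millions of dimensions.
import numpy as np

def loss(theta):
    # Two equally good minima near (-2, 0) and (+2, 0), with a bump between them.
    x, y = theta
    return min((x + 2) ** 2, (x - 2) ** 2) + y ** 2 + 2.0 * np.exp(-x ** 2)

theta_a = np.array([-2.0, 0.0])  # "network A" after training (hypothetical)
theta_b = np.array([+2.0, 0.0])  # "network B" after training (hypothetical)

alphas = np.linspace(0.0, 1.0, 101)
path_losses = [loss((1 - a) * theta_a + a * theta_b) for a in alphas]

# The barrier is how much higher the path gets than the worse of the endpoints.
barrier = max(path_losses) - max(loss(theta_a), loss(theta_b))
print(f"endpoint losses: {loss(theta_a):.3f}, {loss(theta_b):.3f}")
print(f"loss barrier along the straight line: {barrier:.3f}")
```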
@NeelBanodiya 24 days ago
You must have nice music taste
@DouglasASean 26 days ago
You are mapping an n-dimensional surface, of course it would look weird; we don't usually come across those in our daily experience.
@Tunadorable 26 days ago
ofc
@kylev.8248 25 days ago
Yeeees 🥰