Lecture 7 - Deep Learning Foundations: Neural Tangent Kernels

25,642 views

Soheil Feizi

1 day ago

Comments: 30
@TheAIEpiphany · 2 years ago
Cool video, thanks!
00:00:00 Intro: linear regression
00:23:55 NTKs start here
01:01:33 Link between NNs and ODEs (ordinary differential equations)
@debadeepta · 4 years ago
Really nice lecture! I was looking to quickly learn NTKs before diving deep into the original papers and this really helped.
@zl7460 · 3 years ago
+1. The most well-explained DL lecture I've seen in a long time.
@StratosFair · 2 years ago
Incredibly clear lecture, allowed me to fill the gaps in my understanding of NTK. Thank you, professor!
@dv019 · 4 years ago
Great video, thank you! To the student asking about Kernels: the word is overloaded. It is used in linear algebra to mean the set of all vectors mapped to 0 by a linear transformation. Sometimes Green's functions in PDEs are called integral kernels. In general a kernel is "the central or most important part of something". I don't like how overloaded the word is either, but c'est la vie.
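A quick side-by-side of the two usages mentioned above, in my own notation rather than the lecture's:

\ker(A) = \{\, x : A x = 0 \,\}                      (linear algebra: null space of a linear map A)
k(x, x') = \langle \phi(x), \phi(x') \rangle          (kernel methods: inner product of feature maps)

The NTK is of the second kind, with feature map \phi(x) = \nabla_w f(w_0, x).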
@weisenjiang9179 · 4 years ago
Great intro to NTK, benefited me a lot.
@mstislavmaslennikov326 · 2 years ago
The lecturer is imho doing a great job explaining difficult material!
@AyushSharma-ie7tj · 1 year ago
Really nice lecture with a very even pace. Thank you for sharing.
@MetaOptimizer · 3 years ago
41:07 Do we consider the large width parameter (m) in empirical observations to correspond to an extremely large network such as GPT-3? In other words, could I interpret "the width of parameters" as "the number of trainable parameters"? Thanks for your valuable lecture :)
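For context, a sketch in my own notation (not a quote from the slides): "width" in the NTK results means the number of hidden units m per layer, not the total number of trainable parameters. For a two-layer network on inputs x \in \mathbb{R}^d,

f(w, x) = \frac{1}{\sqrt{m}} \sum_{i=1}^{m} a_i \, \sigma\Big( \sum_{j=1}^{d} W_{ij} x_j \Big),

the width is m while the trainable-parameter count is m(d + 1). A model like GPT-3 is large in both senses, but the theorems are stated in the limit of the per-layer width m \to \infty.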
@sikun7894 · 4 years ago
Thank you so much for sharing these lectures! Really useful
@DarkNinja-24 · 2 years ago
Beautiful explanation!
@itachi7243456 · 4 years ago
These are fantastic, thanks!
@nhl8586 · 2 years ago
Super useful for understanding NTK in 15 mins!
@joonho0 · 4 years ago
Thanks a lot for sharing this lecture!
@meghbhalerao5208 · 2 years ago
If I understand right, the NTK is derived only when we consider the quadratic MSE loss, right? Can it be generalized to other loss functions?
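A sketch of why the kernel itself is loss-agnostic (my notation, assuming gradient flow on a generic differentiable loss L over the training set):

\frac{dw}{dt} = -\nabla_w L, \qquad \frac{d f(w, x)}{dt} = -\sum_{x'} K(x, x') \, \frac{\partial L}{\partial f(w, x')},

so the NTK K governs the training dynamics for any differentiable loss. Squared error is the special case where the right-hand side is linear in f, which is what allows a closed-form kernel-regression analysis.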
@yuwu7547 · 2 years ago
Very useful and easy-to-follow lecture. Thanks a lot!
@AlexanderGoncharenko-e7o · 3 years ago
Awesome lesson! Straightforward and clear!
@chongyizheng7758 · 3 years ago
Question about the first-order Taylor approximation of the neural network: why is the first term f(w_0, x) not included in the kernel function, given that it is nonlinear w.r.t. x?
@ramanasubramanyam1110 · 3 years ago
The first derivative is included (and called the NTK) because it resembles the operation of a kernel on an input, i.e., a transformation function mapping to a higher dimension.
@chongyizheng7758 · 3 years ago
@@ramanasubramanyam1110 Thanks for your reply, but that isn't quite what I was asking. Let me clarify: my question is about the constant (first) term f(w_0, x) at 41:16, not the derivative (second) term in the equation. f(w_0, x) also seems to depend nonlinearly on x, so why is it excluded from the definition of the NTK?
@hw1451 · 2 years ago
I think since it's a constant, we can always subtract it from y.
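For readers following this thread, the linearization under discussion is (standard NTK notation, reconstructed here rather than copied from the slide):

f(w, x) \approx f(w_0, x) + \nabla_w f(w_0, x)^\top (w - w_0),

so the feature map is \phi(x) = \nabla_w f(w_0, x) and the kernel is k(x, x') = \langle \nabla_w f(w_0, x), \nabla_w f(w_0, x') \rangle. The first term f(w_0, x) does depend on x, but it does not multiply the trainable offset w - w_0; it is a fixed shift of the prediction and can be absorbed into the target, \tilde{y} = y - f(w_0, x), which is the point made in the reply above.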
@yuzhema2506 · 2 years ago
Thanks for the nice lecture! One question: the bias term in the Taylor approximation seems to depend on x, which means the bias term varies across inputs x. This differs from the traditional kernel view, where the bias term is the same for every transformed input phi(x). In other words, for the NTK, the inputs in the transformed space do not strictly follow the same linear model. How do we interpret this deviation? Thanks
@sayeedchowdhury11 · 3 years ago
Thanks for the nice lecture. I have a query: since we're evaluating the gradient at w_0, does that mean the kernel is computed from gradients of an untrained NN that has just been initialized? I.e., is f(w, x) a trained NN or just an initialized one?
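As I read it, yes: the kernel in that expression uses gradients taken at the random initialization w_0, before any training, and the infinite-width results then argue that this kernel stays nearly constant during training. A minimal NumPy sketch of the empirical NTK of a two-layer ReLU network at initialization (my own illustration, not code from the lecture):

import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 1000                        # input dimension, hidden width

# Random initialization w_0 = (W, a); nothing here is trained.
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

def grads(x):
    # Gradient of f(w, x) = (1/sqrt(m)) * sum_i a_i * relu(W_i . x)
    # with respect to all parameters, evaluated at the initialization.
    pre = W @ x                       # pre-activations, shape (m,)
    act = np.maximum(pre, 0.0)        # ReLU outputs
    ind = (pre > 0).astype(float)     # ReLU derivative
    g_a = act / np.sqrt(m)                                # df/da_i
    g_W = (a * ind)[:, None] * x[None, :] / np.sqrt(m)    # df/dW_ij
    return np.concatenate([g_a, g_W.ravel()])

def empirical_ntk(x1, x2):
    # Empirical NTK at w_0: inner product of parameter gradients.
    return grads(x1) @ grads(x2)

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
print(empirical_ntk(x1, x2))

As the width m grows, this random kernel concentrates around the deterministic infinite-width NTK.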
@sinaasadiyan · 2 years ago
Great explanation, just subscribed!
@tanchienhao · 2 years ago
Thanks for the awesome lectures!!
@da_lime · 2 years ago
Awesome, thanks!
@chenamora1653 · 3 years ago
So amazing
@vi5hnupradeep · 2 years ago
Thank you so much!
@ihany9061 · 3 years ago
lifesaver!
@freerockneverdrop1236 · 5 months ago
The formula for the neural network in this video should be a two-level summation instead of a one-level one.