Neural Ordinary Differential Equations

57,375 views

Yannic Kilcher

Comments: 49
@jordibolibar6767 · 3 years ago
After reading and watching various articles and videos, I must say this is the clearest explanation I've found so far. Thanks!
@nathancooper1001 · 5 years ago
Best explanation I've found so far on this. Good job!
@siclonman · 5 years ago
I spent 3 hours yesterday trying to figure out what the hell was happening in this paper, and I wake up to this...THANK YOU
@KMoscRD · 1 year ago
This series explaining deep learning papers is so good.
@zyadh2399 · 3 years ago
This is my video of the year, thank you for the explanation.
@cw9249 · 1 year ago
Wow, this seems like by far the most distinctive type of network in deep learning. Everything else kind of falls into a few categories that can all be conceptually interconnected in some way; this is not even close.
@DonHora · 4 years ago
So clear now, many thanks! +1 follower
@shorray · 3 years ago
Great video, thanks! I didn't get the part with the encoder; where is the video you talked about? I mean Figure 6: is that supposed to work with a NODE? I'd love it if somebody could explain it.
@chrissteel7889 · 3 years ago
Really great explanation, very clear and concise.
@ericstephenvorm · 3 years ago
Cheers! That was an excellent video. Thanks so much for putting it together!
@peterhessey7732 · 3 years ago
Super helpful video, thanks!
@SuperSarvagya · 5 years ago
Thanks for making this video. This was really helpful
@zitangsun8688 · 3 years ago
Please see the caption of Fig. 2: "If the loss depends directly on the state at multiple observation times, the adjoint state must be updated in the direction of the partial derivative of the loss with respect to each observation." Why do we need to add an offset for each observation?
@albertlee5312 · 4 years ago
Thank you! I am trying to understand the implementation in Python, but I am confused about why we still need 2-3 Conv2D layers with activation functions if we consider the hidden layers as a continuous function that can be solved by an ODE solver. Could you please help me with this?
@YannicKilcher · 4 years ago
The network doesn't represent the continuous function, but is a discrete approximation to the linear update equation of ODEs.
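To make that concrete, here is a minimal sketch (illustrative only, not the paper's reference implementation; a toy MLP stands in for the Conv2D stack). The layers in the code define the derivative f(z, t, θ); the solver then evaluates that same small network many times along the trajectory instead of stacking distinct layers:

```python
# Minimal Neural ODE forward pass with a fixed-step Euler solver (illustrative sketch).
# The small network below is f(t, z); the solver calls it repeatedly, which is why an
# implementation still contains ordinary layers even though "depth" is continuous.
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    """Parameterizes dz/dt = f(z, t, theta) with a tiny MLP (stand-in for the Conv2D stack)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.Tanh(), nn.Linear(hidden, dim)
        )

    def forward(self, t, z):
        # append the scalar time so f can depend on t as well as the state
        t_col = torch.full_like(z[..., :1], float(t))
        return self.net(torch.cat([z, t_col], dim=-1))

def odeint_euler(func, z0, t0=0.0, t1=1.0, steps=20):
    """Forward Euler: z_{k+1} = z_k + h * f(t_k, z_k). A ResNet block is the h = 1 special case."""
    h = (t1 - t0) / steps
    z, t = z0, t0
    for _ in range(steps):
        z = z + h * func(t, z)
        t = t + h
    return z

func = ODEFunc()
z0 = torch.randn(8, 2)          # batch of inputs = initial states z(t0)
z1 = odeint_euler(func, z0)     # "output activations" = state z(t1)
```

Swapping the Euler loop for an adaptive solver (as the paper does) changes only how the trajectory is discretized, not the layers inside f.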
@handokosupeno5425 · 1 year ago
Amazing explanation
@Alex-rt3po · 1 year ago
How does this relate to liquid neural networks? That paper is also worthy of a video from you, I think.
@ClosiusBeg · 2 years ago
OK... and how do we find the adjoint equation? What is it, what does it mean, and why are we allowed to do this?
@Mylad · 2 months ago
You are an angel and a savior
@alekhmahankudo1051 · 5 years ago
I could not understand why we need to compute dL/dz(0); don't we just need dL/dθ to update our parameters? I would appreciate it if anybody could answer my query.
@wujiewang8781 · 4 years ago
They are related through the augmented dynamics; you can look at the appendix of the paper.
@Hawiyah_galery71532 · 2 months ago
Article source?
@hdgdhdhdh · 3 years ago
Hi, thanks for the crisp explanation. Is there any forum or link I can join for ODE-related issues/tasks? I have just started working on ODEs and would appreciate some help or discussions on the topic. Thanks!
@herp_derpingson · 5 years ago
Let me try to summarize; tell me if I understood it right. There is a neural network which tries to predict the hidden activations at each layer (in a continuous space) of another neural network. So the integral of the outputs of this network should be the activations of the final layer (x1) of the network we are trying to predict, and likewise its input should be the initial activations (x0). The loss is therefore the deviation between the ground truth and the integral of the first network from x0 to x1. The integration is done by a numerical ODE solver such as the Euler method, and it must be continuous and differentiable. t is a hyperparameter: an arbitrarily chosen "depth" of the network we are trying to predict.
@YannicKilcher · 5 years ago
Yes that sounds legit. The main contribution of the paper is the way they implement backpropagation without having to differentiate through the ODE solver.
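For anyone wondering what "without differentiating through the ODE solver" can look like in code, below is a rough sketch of the adjoint method under the simplifying assumption of a fixed-step Euler solver and plain autograd vector-Jacobian products (the paper's implementation instead packs everything into a single augmented call to an adaptive solver; the function names here are illustrative):

```python
# Rough sketch of the adjoint method with a fixed-step Euler solver (illustrative,
# not the paper's implementation). Gradients are obtained by integrating an
# augmented ODE backward in time, so no intermediate solver states are stored.
import torch

def adjoint_gradients(func, z1, dLdz1, params, t0=0.0, t1=1.0, steps=20):
    """Integrate the augmented state (z, a, dL/dtheta) backward from t1 to t0:
       dz/dt = f,   da/dt = -a^T df/dz,   d(dL/dtheta)/dt = -a^T df/dtheta."""
    h = (t1 - t0) / steps
    z, a = z1.detach(), dLdz1.detach()
    grad_p = [torch.zeros_like(p) for p in params]
    t = t1
    for _ in range(steps):
        with torch.enable_grad():
            z_req = z.detach().requires_grad_(True)
            f_out = func(t, z_req)
            # vector-Jacobian products a^T df/dz and a^T df/dtheta via autograd,
            # i.e. one local backward pass per step, never through the solver itself
            vjps = torch.autograd.grad(
                f_out, [z_req] + list(params), grad_outputs=a, allow_unused=True
            )
        a_dfdz, a_dfdp = vjps[0], vjps[1:]
        # one Euler step of the augmented dynamics, moving backward in time
        z = z.detach() - h * f_out.detach()   # re-solve the state trajectory backward
        a = a + h * a_dfdz                    # adjoint state update
        grad_p = [g + h * (gp if gp is not None else 0.0) for g, gp in zip(grad_p, a_dfdp)]
        t = t - h
    return a, grad_p                          # a is dL/dz(t0); grad_p approximates dL/dtheta

# usage sketch (reusing the names from the forward-pass example above):
# z1 = odeint_euler(func, z0); L = loss(z1); dLdz1 = torch.autograd.grad(L, z1)[0]
# dLdz0, dLdtheta = adjoint_gradients(func, z1, dLdz1, list(func.parameters()))
```

Because only the current augmented state (z, a, dL/dθ) is carried, memory does not grow with the number of solver steps; the price is a second ODE solve during the backward pass.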
@moormanjean5636 · 3 years ago
Amazing video.. new subscriber for sure
@daveb4446 · 1 year ago
What is the scientific term for “oh crap okay”?
@liangliangyan7528 · 2 years ago
Thank you for this video; it may be useful for me.
@keeplearning7588 · 1 year ago
What does the t mean? The number of layers?
@iqraiqra6627 · 4 years ago
Hey guys, can anyone help me write a research proposal on ODE topics?
@2bmurders · 5 years ago
I think I'm misunderstanding something from the paper (maybe it's the comparison with residual networks). Is this concept using the idea of having a fixed size neural network that is approximating an unknown differential equation that would then be numerically integrated across to some arbitrary time step (the prediction) in the future for the output? That would make sense to me. But the paper seems to sort of hint at still having x layers as the approximating intermediate steps of the approximated differential equation when the paper references the hidden states of the network. That part is what's throwing me off.
@YannicKilcher · 5 years ago
Not sure I understand your problem, but your first statement is correct. And the "arbitrary" time step in the future is a fixed time, I think. I guess easiest would be to always go from 0 to 1 with t. The question of the paper is, if this whole procedure approximates some neural network with h hidden layers, how big is h?
@2bmurders · 5 years ago
Thanks for the follow-up. After letting this paper sink in for a bit, I think it's finally clicked for me... and now I feel a little dumb for not getting it the first time, because it's pretty straightforward (I must have been overthinking it). I'm curious whether it's possible to parameterize the width dynamics over time in tandem with depth via a neural network (probably as a PDE at that point). Regardless, this paper is really exciting.
@bhanusri3732 · 4 years ago
Where does the da(t)/dt equation come from?
@YannicKilcher · 4 years ago
It comes from the ODE literature. I think to really understand this paper you might need to dive into that.
@bhanusri3732 · 4 years ago
@@YannicKilcher How do we know that da(t)/dt is proportional to -a(t) and df/dz? Is it found through experimenting? I'm a noob, sorry if my doubt is too basic.
@bhanusri3732 · 4 years ago
@@YannicKilcher Do we apply the chain rule? How does it work for this particular equation?
@wujiewang8781 · 4 years ago
@@bhanusri3732 You can look into the SI of the paper; the derivation is not too bad.
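For reference, a compressed sketch of the appendix argument (higher-order terms dropped): define the adjoint as a(t) = ∂L/∂z(t), and apply the chain rule across one infinitesimal solver step z(t+ε) = z(t) + ε f(z(t), t, θ) + O(ε²):

```latex
a(t) \;=\; a(t+\varepsilon)\,\frac{\partial z(t+\varepsilon)}{\partial z(t)}
     \;=\; a(t+\varepsilon)\Bigl(I + \varepsilon\,\frac{\partial f(z(t),t,\theta)}{\partial z}\Bigr) + O(\varepsilon^{2})
\;\;\Longrightarrow\;\;
\frac{da(t)}{dt} \;=\; \lim_{\varepsilon\to 0^{+}}\frac{a(t+\varepsilon)-a(t)}{\varepsilon}
\;=\; -\,a(t)^{\top}\frac{\partial f(z(t),t,\theta)}{\partial z}
```

The same style of argument applied to the parameters gives the quantity accumulated during the backward solve, dL/dθ = -∫_{t1}^{t0} a(t)^⊤ ∂f/∂θ dt, which is the paper's expression for the parameter gradient.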
@sarvagyagupta1744 · 4 years ago
Thanks for making this video. I had questions while reading this paper and you covered those topics. But I still don't understand how we get equation 4. Also, when we go from eq. 4 to eq. 5, the integration is over a very small time step, right? It's not over the whole period as shown in the diagram. Let me know.
@conjugategradient5498 · 4 years ago
I suspect that the primary reason the RNN has such jagged lines is ReLU. I'm curious to see what the results would look like with tanh.
@moormanjean5636 · 3 years ago
RNNs are a time-discrete analog of Neural ODEs, which could also contribute to the jerkiness.
@wasgeht2409 · 5 years ago
u are fucking good my friend
@-mwolf · 1 year ago
Thanks!
@bertchristiaens6355 · 4 years ago
Fantastic videos and channel!! PS: I noticed that in your "DL architectures" playlist a few videos are duplicated (e.g. Linformer appears 7 times).
@YannicKilcher · 4 years ago
Thanks. I know, my playlists are a mess :-S
@bertchristiaens6355 · 4 years ago
@@YannicKilcher but your videos are 👌 though
@ethansmith7608 · 1 year ago
diffusion, before it was cool
Related videos:
GPT-2: Language Models are Unsupervised Multitask Learners · 27:33 · Yannic Kilcher · 30K views
Neural Ordinary Differential Equations · 35:33 · Andriy Drozdyuk · 26K views
NeurIPS 2020 Tutorial: Deep Implicit Layers · 1:51:35 · Zico Kolter · 48K views
David Duvenaud | Reflecting on Neural ODEs | NeurIPS 2019 · 21:02 · Preserve Knowledge · 27K views
Hopfield Networks is All You Need (Paper Explained) · 1:05:16 · Yannic Kilcher · 100K views
Neural ODEs (NODEs) [Physics Informed Machine Learning] · 24:37 · Steve Brunton · 69K views
Were RNNs All We Needed? (Paper Explained) · 27:48 · Yannic Kilcher · 53K views
Liquid Neural Networks · 49:30 · MITCBMM · 255K views
Neural Differential Equations · 35:18 · Siraj Raval · 139K views
Hardy's Integral · 13:47 · Michael Penn · 1.2K views