Neural Ordinary Differential Equations

52,199 views

Yannic Kilcher

1 day ago

arxiv.org/abs/1806.07366
Abstract:
We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.
Authors:
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, David Duvenaud
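
For a concrete picture of the abstract's core idea, here is a minimal sketch (my illustration, not the authors' code): a tiny MLP parameterizes the derivative dz/dt of the hidden state, and a fixed-step Euler loop stands in for the paper's black-box adaptive solver. The weights and step count are arbitrary placeholders.

```python
import numpy as np

# Stand-ins for the learned parameters of the dynamics network.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(50, 2)) * 0.1, np.zeros(50)
W2, b2 = rng.normal(size=(2, 50)) * 0.1, np.zeros(2)

def f(z, t):
    """The neural network: it outputs the derivative dz/dt, not the next layer."""
    return W2 @ np.tanh(W1 @ z + b1) + b2

def odeint_euler(f, z0, t0=0.0, t1=1.0, steps=100):
    """Fixed-step Euler solver standing in for the paper's adaptive solver."""
    z, t = z0.copy(), t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * f(z, t)  # one Euler step
        t += h
    return z

z0 = np.array([1.0, -1.0])  # "input layer" state z(t0)
z1 = odeint_euler(f, z0)    # "output layer" state z(t1)
print(z1)
```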

Comments: 47
@nathancooper1001 · 5 years ago
Best explanation I've found so far on this. Good job!
@siclonman · 5 years ago
I spent 3 hours yesterday trying to figure out what the hell was happening in this paper, and I wake up to this...THANK YOU
@jordibolibar6767 · 3 years ago
After reading and watching various articles and videos, I must say this is the clearest explanation I've found so far. Thanks!
@ericstephenvorm · 3 years ago
Cheers! That was an excellent video. Thanks so much for putting it together!
@KMoscRD · 6 months ago
This series explaining deep learning papers is so good.
@zyadh2399 · 3 years ago
This is my video of the year. Thank you for the explanation.
@chrissteel7889 · 3 years ago
Really great explanation, very clear and concise.
@SuperSarvagya · 4 years ago
Thanks for making this video. This was really helpful
@DonHora · 4 years ago
So clear now, many thanks! +1 follower
@peterhessey7732 · 3 years ago
Super helpful video, thanks!
@handokosupeno5425 · 7 months ago
Amazing explanation
@cw9249 · 11 months ago
Wow, this seems by far the most distinctive type of network in deep learning. Everything else kind of falls into a few categories that can all be conceptually interconnected in some way; this is not even close.
@liangliangyan7528 · 2 years ago
Thank you for this video; it may be useful for me.
@moormanjean5636 · 2 years ago
Amazing video.. new subscriber for sure
@shorray · 3 years ago
Great video, thanks! I didn't get the part with the encoder. Where in the video do you talk about it? I mean Figure 6: is that supposed to work with a NODE, or... hmm... I'd love it if somebody could explain.
@-mwolf · 11 months ago
Thanks!
@hdgdhdhdh · 2 years ago
Hi, thanks for the crisp explanation. Is there any forum or link I can join for ODE-related issues/tasks? I have just started working on ODEs and would appreciate some help or discussion on the topic. Thanks!
@wasgeht2409 · 5 years ago
You are damn good, my friend.
@bertchristiaens6355 · 4 years ago
Fantastic videos and channel!! PS: I noticed that in your "DL architectures" playlist a few videos are duplicated (e.g., Linformer appears 7 times).
@YannicKilcher · 4 years ago
thanks. I know my playlists are a mess :-S
@bertchristiaens6355 · 4 years ago
@YannicKilcher But your videos are 👌 though.
@zitangsun8688 · 3 years ago
Please see the caption of Fig. 2: "If the loss depends directly on the state at multiple observation times, the adjoint state must be updated in the direction of the partial derivative of the loss with respect to each observation." Why do we need to add an offset for each observation?
@albertlee5312 · 4 years ago
Thank you! I am trying to understand the implementation in Python, but I am confused about why we still need 2-3 Conv2D layers with activation functions if the hidden layers are a continuous function that can be solved by ODE solvers. Could you please help me with this?
@YannicKilcher · 4 years ago
The network doesn't represent the continuous function, but is a discrete approximation to the linear update equation of ODEs.
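
To illustrate this point with a toy snippet (my sketch, not code from the video or paper): a residual block is exactly one Euler step with step size fixed at 1, so the discrete Conv2D layers live inside the dynamics function f, while the solver supplies the "depth".

```python
import numpy as np

def f(z, t):
    # Placeholder dynamics; in the paper this is the (e.g. convolutional) network.
    return -z

z, t, h = np.array([1.0]), 0.0, 0.1

z_resnet = z + f(z, t)      # residual block: an Euler step with h = 1
z_euler  = z + h * f(z, t)  # Euler step; h -> 0 recovers the continuous ODE
```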
@sarvagyagupta1744 · 3 years ago
Thanks for making this video. I had questions while reading this paper and you covered those topics. But I still don't understand how we get equation 4. Also, when we go from eq. 4 to eq. 5, the integration is over a very small time step, right? It's not over the whole period as shown in the diagram. Let me know.
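
For reference, equations (4) and (5) of the paper are the adjoint dynamics and the parameter gradient. As I read them (reconstructed here, so double-check against the paper), the integral in (5) does run over the whole interval; a solver just accumulates it step by step:

```latex
% Adjoint state: a(t) = \partial L / \partial z(t).
% Eq. (4): the adjoint obeys its own ODE, obtained from the chain rule:
\frac{da(t)}{dt} = -a(t)^{\top}\,\frac{\partial f(z(t), t, \theta)}{\partial z}
% Eq. (5): the parameter gradient integrates contributions along the
% whole trajectory, solved backwards from t_1 to t_0:
\frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top}\,
    \frac{\partial f(z(t), t, \theta)}{\partial \theta}\, dt
```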
@ClosiusBeg · 1 year ago
OK... and how do we find the adjoint equation? What is it, what does it mean, and why can we do this?
@Alex-rt3po · 11 months ago
How does this relate to liquid neural networks? That paper is also worthy of a video from you, I think.
@alekhmahankudo1051 · 5 years ago
I could not understand why we need to compute dL/dz(0); don't we need just dL/dθ for updating our parameters? I would appreciate it if anybody could answer my query.
@wujiewang8781 · 4 years ago
They are related through the augmented dynamics; you can look at the appendix of the paper.
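
For context, the augmented dynamics mentioned above (reconstructed from the paper's appendix, so treat the exact form as a paraphrase) stack the state, the adjoint, and the parameter adjoint into one backward ODE solve, which is why dL/dz(t_0) comes along essentially for free:

```latex
\frac{d}{dt}\begin{bmatrix} z(t) \\ a(t) \\ a_{\theta}(t) \end{bmatrix}
= \begin{bmatrix}
    f(z(t), t, \theta) \\
    -a(t)^{\top}\,\partial f / \partial z \\
    -a(t)^{\top}\,\partial f / \partial \theta
  \end{bmatrix},
\qquad
\begin{bmatrix} z(t_1) \\ a(t_1) \\ a_{\theta}(t_1) \end{bmatrix}
= \begin{bmatrix} z_{t_1} \\ \partial L / \partial z(t_1) \\ 0 \end{bmatrix}.
% One backward solve from t_1 to t_0 yields dL/dz(t_0) = a(t_0)
% and dL/dtheta = a_theta(t_0) in a single pass.
```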
@herp_derpingson · 4 years ago
Let me try to summarize; tell me if I understood it right. There is a neural network which tries to predict the hidden activations at each layer (in a continuous space) of another neural network. So the integral of the outputs of this neural network should be the activations of the final layer (x1) of the network we are trying to predict; similarly, the input should be the initial activations (x0). The loss is therefore the deviation of the ground truth from the integral of the first network from x0 to x1. The integration is done by some numerical ODE solver like the Euler method, so the network must be continuous and differentiable. t is a hyperparameter: an arbitrarily chosen "depth" of the network we are trying to predict.
@YannicKilcher · 4 years ago
Yes, that sounds legit. The main contribution of the paper is the way they implement backpropagation without having to differentiate through the ODE solver.
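
A minimal sketch of what that looks like in practice, using the authors' torchdiffeq library (the dynamics module and loss here are toy placeholders): odeint_adjoint obtains gradients by solving the adjoint ODE backwards rather than backpropagating through the solver's internals, giving constant memory cost in depth.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Parameterizes the derivative dz/dt = f(z, t)."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 50), nn.Tanh(), nn.Linear(50, dim))

    def forward(self, t, z):
        return self.net(z)

func = ODEFunc()
z0 = torch.randn(16, 2)       # batch of initial states z(t0)
t = torch.tensor([0.0, 1.0])  # integrate from t0 = 0 to t1 = 1
z1 = odeint(func, z0, t)[-1]  # final state z(t1)

loss = z1.pow(2).mean()       # toy loss
loss.backward()               # gradients via the adjoint method,
                              # not by differentiating through the solver
```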
@2bmurders · 5 years ago
I think I'm misunderstanding something from the paper (maybe it's the comparison with residual networks). Is the idea to have a fixed-size neural network that approximates an unknown differential equation, which is then numerically integrated out to some arbitrary time step (the prediction) for the output? That would make sense to me. But when the paper references the hidden states of the network, it seems to hint at still having x layers as the intermediate steps of the approximated differential equation. That part is throwing me off.
@YannicKilcher · 5 years ago
Not sure I understand your problem, but your first statement is correct. And the "arbitrary" time step in the future is a fixed time, I think; the easiest would be to always go from t = 0 to t = 1. The question the paper asks is: if this whole procedure approximates some neural network with h hidden layers, how big is h?
@2bmurders · 5 years ago
Thanks for the follow-up. After letting this paper sink in for a bit, I think it's finally clicked for me... and now I feel a little dumb for not getting it the first time, because it's pretty straightforward (I must have been overthinking it). I'm curious whether it's possible to parameterize the width dynamics over time in tandem with depth via a neural network (probably as a PDE at that point). Regardless, this paper is really exciting.
@iqraiqra6627 · 4 years ago
Hey guys, can anyone help me write a research proposal on ODE topics?
@daveb4446 · 11 months ago
What is the scientific term for “oh crap okay”?
@conjugategradient5498 · 3 years ago
I suspect the primary reason the RNN has such jagged lines is the ReLU. I'm curious what the results would look like with tanh.
@moormanjean5636 · 2 years ago
RNNs are a time-discrete analog of neural ODEs, which could also contribute to the jaggedness.
@keeplearning7588 · 10 months ago
What does the t mean? The number of layers?
@bhanusri3732 · 4 years ago
Where does the da(t)/dt equation come from?
@YannicKilcher · 4 years ago
It comes from the ODE literature. I think to really understand this paper you might need to dive into that.
@bhanusri3732 · 4 years ago
@YannicKilcher How do we know that da(t)/dt is proportional to −a(t) and ∂f/∂z? Is it found through experimenting? I'm a noob, sorry if my doubt is too basic.
@bhanusri3732 · 4 years ago
@YannicKilcher Do we apply the chain rule? How does it work in this particular equation?
@wujiewang8781 · 4 years ago
@bhanusri3732 You can look at the SI (supplementary information) of the paper; the derivation is not too bad.
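
For anyone else stuck on this thread: yes, it is the chain rule, not an experiment. A rough reconstruction of the appendix derivation (so verify against the paper):

```latex
% The adjoint is a(t) = dL/dz(t). Chain rule across an infinitesimal step:
a(t) = a(t+\varepsilon)\,\frac{\partial z(t+\varepsilon)}{\partial z(t)},
\qquad
z(t+\varepsilon) = z(t) + \varepsilon f(z(t), t, \theta) + O(\varepsilon^{2}).
% Substitute the Taylor expansion and take the one-sided limit:
\frac{da(t)}{dt}
= \lim_{\varepsilon \to 0^{+}} \frac{a(t+\varepsilon) - a(t)}{\varepsilon}
= \lim_{\varepsilon \to 0^{+}} \frac{a(t+\varepsilon)
  - a(t+\varepsilon)\bigl(I + \varepsilon\,\partial f/\partial z\bigr)}{\varepsilon}
= -a(t)\,\frac{\partial f(z(t), t, \theta)}{\partial z}.
```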
@ethansmith7608 · 9 months ago
diffusion, before it was cool