After reading and watching various articles and videos, I must say this is the clearest explanation I've found so far. Thanks!
@nathancooper1001 · 5 years ago
Best explanation I've found so far on this. Good job!
@siclonman · 5 years ago
I spent 3 hours yesterday trying to figure out what the hell was happening in this paper, and I wake up to this...THANK YOU
@KMoscRD · 1 year ago
This series explaining deep learning papers is so good.
@zyadh2399 · 3 years ago
This is my video of the year, thank you for the explanation.
@cw9249 · 1 year ago
wow this seems by far the most distinctive type of network in deep learning. everything else kind of falls into a few categories, but can all be conceptually interconnected in some way. this is not even close
@DonHora · 4 years ago
So clear now, many thanks! +1 follower
@shorray · 3 years ago
Great video, thanks! I didn't get the part with the encoder; where in the video did you talk about that? I mean Figure 6: is it supposed to work with a NODE or... hmm... would love it if somebody could explain.
@chrissteel7889 · 3 years ago
Really great explanation, very clear and concise.
@ericstephenvorm · 3 years ago
Cheers! That was an excellent video. Thanks so much for putting it together!
@peterhessey7732 · 3 years ago
Super helpful video, thanks!
@SuperSarvagya · 5 years ago
Thanks for making this video. This was really helpful
@zitangsun8688 · 3 years ago
Please see the caption of Fig. 2: "If the loss depends directly on the state at multiple observation times, the adjoint state must be updated in the direction of the partial derivative of the loss with respect to each observation." Why do we need to add an offset for each observation?
@albertlee5312 · 4 years ago
Thank you! I am trying to understand the implementation in Python, but I am confused about why we still need 2-3 Conv2D layers with activation functions if we consider the hidden layers as a continuous function that can be solved by ODE solvers. Could you please help me with this?
@YannicKilcher · 4 years ago
The network doesn't represent the continuous function, but is a discrete approximation to the linear update equation of ODEs.
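A minimal sketch of that point, assuming PyTorch and the third-party torchdiffeq package (not code from the video or the paper): the Conv2D layers parameterize the derivative f(z, t, θ) that the solver integrates; they are not the input-to-output mapping itself.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumes `pip install torchdiffeq`


class ODEFunc(nn.Module):
    """Small conv net parameterizing the dynamics dz/dt = f(z, t, theta)."""

    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, t, z):
        # t is unused in this sketch; it could be injected, e.g. as an extra channel.
        return self.net(z)


func = ODEFunc()
z0 = torch.randn(8, 64, 6, 6)   # hidden state z(t0), e.g. coming out of an encoder
t = torch.tensor([0.0, 1.0])    # integrate the dynamics from t0 = 0 to t1 = 1
z1 = odeint(func, z0, t)[-1]    # the solver, not the conv net, produces z(t1)
```

So the 2-3 Conv2D layers are evaluated many times, once per solver function evaluation, rather than once per "layer".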
@handokosupeno5425 · 1 year ago
Amazing explanation
@Alex-rt3po · 1 year ago
How does this relate to liquid neural networks? That paper is also worthy of a video from you I think
@ClosiusBeg · 2 years ago
OK... and how do you find the adjoint equation? What is it, what does it mean, and why can we do it?
@Mylad · 2 months ago
You are an angel and a savior
@alekhmahankudo1051 · 5 years ago
I could not understand why we need to compute dL/dz(0); don't we just need dL/dθ for updating our parameters? I would appreciate it if anybody could answer my query.
@wujiewang8781 · 4 years ago
They are related through the augmented dynamics; you could look at the appendix of the paper.
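For reference, the adjoint relations from the paper (its eqs. 4-5 and the appendix): the adjoint a(t) = ∂L/∂z(t) is integrated backwards in time, and the parameter gradient is read off the same augmented solve, which is why dL/dz(t0) and dL/dθ come out together:

$$a(t) = \frac{\partial L}{\partial z(t)}, \qquad \frac{da(t)}{dt} = -\,a(t)^{\top}\frac{\partial f(z(t),t,\theta)}{\partial z}, \qquad \frac{dL}{d\theta} = -\int_{t_1}^{t_0} a(t)^{\top}\frac{\partial f(z(t),t,\theta)}{\partial \theta}\,dt.$$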
@Hawiyah_galery71532 · 2 months ago
Article source?
@hdgdhdhdh · 3 years ago
Hi, thanks for the crisp explanation. However, is there any forum or link I can join for ODE-related issues/tasks? I have just started working on ODEs and would appreciate some help or discussion related to the topic. Thanks!
@herp_derpingson · 5 years ago
Let me try to summarize; tell me if I understood it right. There is a neural network which tries to predict the hidden activations at each layer (in a continuous space) of another neural network. So the integral of the outputs of this network should be the activations of the final layer (x1) of the neural network we are trying to predict. Similarly, the input should be the initial activations (x0). Therefore, the loss is the deviation between the ground truth and the integration of the first neural network from x0 to x1. The integration is done with a numerical ODE solver like Euler's method, and it must be continuous and differentiable. t is a hyperparameter: an arbitrarily chosen "depth" of the neural network we are trying to predict.
@YannicKilcher · 5 years ago
Yes that sounds legit. The main contribution of the paper is the way they implement backpropagation without having to differentiate through the ODE solver.
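A toy version of the forward pass in that summary, with a fixed-step Euler solver (just a sketch, not the paper's reference code; the paper backpropagates with the adjoint method instead of differentiating through these steps):

```python
import torch


def euler_integrate(f, z0, t0=0.0, t1=1.0, steps=10):
    """Integrate dz/dt = f(z, t) from t0 to t1 with a fixed-step Euler solver."""
    z, t = z0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        z = z + h * f(z, t)   # one Euler step: z <- z + h * dz/dt
        t = t + h
    return z


# A toy dynamics function standing in for the learned network f(z, t, theta).
f = lambda z, t: torch.tanh(z)

z0 = torch.randn(4, 16)        # "initial activations" x0
z1 = euler_integrate(f, z0)    # approximation of the "final activations" x1
```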
@moormanjean5636 · 3 years ago
Amazing video.. new subscriber for sure
@daveb4446 · 1 year ago
What is the scientific term for “oh crap okay”?
@liangliangyan7528 · 2 years ago
Thank you for this video, maybe it will be useful for me.
@keeplearning7588 · 1 year ago
What does the t mean? The number of layers?
@iqraiqra6627 · 4 years ago
Hey guys, can anyone help me write a research proposal on ODE topics?
@2bmurders · 5 years ago
I think I'm misunderstanding something from the paper (maybe it's the comparison with residual networks). Is the idea to have a fixed-size neural network that approximates an unknown differential equation, which is then numerically integrated out to some arbitrary time step in the future (the prediction) for the output? That would make sense to me. But the paper seems to hint at still having x layers as the intermediate steps of the approximated differential equation when it references the hidden states of the network. That part is what's throwing me off.
@YannicKilcher · 5 years ago
Not sure I understand your problem, but your first statement is correct. And the "arbitrary" time step in the future is a fixed time, I think; I guess the easiest is to always go from 0 to 1 with t. The question of the paper is: if this whole procedure approximates some neural network with h hidden layers, how big is h?
@2bmurders · 5 years ago
Thanks for the follow-up. After letting this paper sink in for a bit, I think it's finally clicked for me... and now I feel a little dumb for not getting it the first time, because it's pretty straightforward (I must have been overthinking it). I'm curious whether it's possible to parameterize the width dynamics over time in tandem with depth via a neural network (probably as a PDE at that point). Regardless, this paper is really exciting.
@bhanusri3732 · 4 years ago
Where does the da(t)/dt equation come from?
@YannicKilcher · 4 years ago
It comes from the ODE literature. I think to really understand this paper you might need to dive into that.
@bhanusri3732 · 4 years ago
@@YannicKilcher How do we know that da(t)/dt is directly proportional to -a(t) and df/dz? Is it found through experimenting? I'm a noob, sorry if my doubt is too basic.
@bhanusri3732 · 4 years ago
@@YannicKilcher Do we apply the chain rule? How does it work in this particular equation?
@wujiewang8781 · 4 years ago
@@bhanusri3732 You can look into the SI of the paper; the derivation is not too bad.
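A sketch of that derivation (following the appendix of the paper): yes, it is the chain rule, applied to the solution map over a small step ε and then taking the limit ε → 0:

$$z(t+\varepsilon) = z(t) + \int_t^{t+\varepsilon} f(z(s), s, \theta)\,ds =: T_\varepsilon(z(t)), \qquad a(t) = a(t+\varepsilon)\,\frac{\partial T_\varepsilon(z(t))}{\partial z(t)} \ \text{(chain rule)},$$

$$\frac{da(t)}{dt} = \lim_{\varepsilon \to 0^{+}} \frac{a(t+\varepsilon) - a(t)}{\varepsilon} = -\,a(t)\,\frac{\partial f(z(t), t, \theta)}{\partial z}.$$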
@sarvagyagupta1744 · 4 years ago
Thanks for making this video. I had questions while reading this paper and you covered those topics. But I still don't understand how we get equation 4. Also, when we go from eq. 4 to 5, the integration is over a very small time step, right? It's not over the whole period as shown in the diagram. Let me know.
@conjugategradient5498 · 4 years ago
I suspect the primary reason the RNN has such jagged lines is ReLU. I'm curious to see what the results look like with tanh.
@moormanjean5636 · 3 years ago
RNNs are a time-discrete analog of Neural ODEs, which could also contribute to the jerkiness.
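That analogy can be written in one line: a residual or recurrent update is one explicit Euler step of the continuous dynamics, so the discrete model only changes state at the observation times, which plausibly contributes to the jagged look:

$$z_{t+1} = z_t + f(z_t, \theta_t) \ \text{(ResNet/RNN step)} \qquad \longleftrightarrow \qquad \frac{dz(t)}{dt} = f(z(t), t, \theta) \ \text{(Neural ODE)}.$$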
@wasgeht2409 · 5 years ago
u are fucking good my friend
@-mwolf · 1 year ago
Thanks!
@bertchristiaens6355 · 4 years ago
Fantastic videos and channel!! PS: I noticed that in your playlist "DL architectures" a few videos are duplicated (e.g., Linformer appears 7 times).