NeurIPS 2020 Tutorial: Deep Implicit Layers

45,304 views

Zico Kolter

3 years ago

This is a video recording of our NeurIPS 2020 tutorial, Deep Implicit Layers: Neural ODEs, Deep Equilibrium Models, and Beyond, by David Duvenaud, Zico Kolter, and Matt Johnson. Full information about the tutorial, including extensive notes in Colab notebook form, is available on our website: implicit-layers-tutorial.org
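For readers who want the one-line version before diving in: an implicit layer defines its output not by an explicit forward computation but as the solution of an equation. A rough sketch of the two flagship examples covered in the tutorial (the notation here is illustrative, not lifted from the slides):

```latex
% Deep equilibrium (DEQ) layer: the output is a fixed point of a
% single weight-tied cell f.
z^\star = f(z^\star, x;\, \theta)

% Neural ODE layer: the output is the terminal state of an ODE whose
% dynamics are given by a neural network f.
\frac{dz(t)}{dt} = f(z(t), t;\, \theta), \qquad z(0) = x, \qquad \text{output } z(T)
```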

Comments: 25
@keraeduardo · 3 months ago
I am a graduate student in physics. This video is clear, easy to follow, and highly informative. Many thanks for making this video public! This is very helpful for me.
@alicsir · 6 months ago
Thanks for making this video public. The explanations are very intuitive and clear.
@alexeychernyavskiy4193 · 3 years ago
Thank you guys! Very solid video, and good tempo. You present the material with a smile, in a very user-friendly manner; that's a rare delicacy :) I wish your trio new successes in the coming year. A separate thank-you for the website and the code! I think I will try to apply DEQ to image denoising.
@sippy_cups · 3 years ago
Awesome! Really well presented!
@khuongnguyenduy2156 · 3 years ago
Thank you very much for sharing this amazing tutorial!
@ezamora1981 · 3 years ago
Very cool idea! Congratulations, and thanks for the tutorial.
@gewang9770 · 2 years ago
I like this tutorial very much!
@elisim7 · 2 years ago
Great tutorial and notes!
@ansha2221 · 3 years ago
Thank you for sharing this.
@CristianGarcia · 3 years ago
Thanks for the tutorial! I have a question about the representations created by DEQs: in normal deep networks, depth lets you compose features, and deeper layers are supposed to hold higher-level representations. Does the same story apply to DEQs, or is there a similar way to understand their computation?
@jiangao5652 · 3 years ago
This work is amazing! When I saw GPT-3 use 175 billion parameters to build a language model, I just felt hopeless. It would be fairer to compare state-of-the-art performance while also accounting for model complexity.
@dominikklotz1035 · 3 years ago
Great idea.
@Kram1032 · 8 days ago
I wonder how much can be done here with stochastic continuous evaluations in the spirit of MCMC or recent "Walk on Stars"-style evaluations, where you don't have any discretization error at all but trade that off for some noise...
@omarsharif4676 · 2 years ago
Thank you for a very informative video. I have a very limited mathematics background and was wondering if there are any good resources to better understand differentiation through ODEs. Please let me know if you know of such resources, if you see my comment. Cheers!
@kimchi_taco · 3 years ago
I learned a lot, thank you very much. I have two questions about DEQ. 1. Why does the equilibrium point z* matter? How is z* a better representation than any intermediate representation z_t? 2. ALBERT is BERT with the weights shared across all transformer layers. The way DEQ saves memory sounds like ALBERT computing the gradient of only the last layer and updating the "shared" weights; ALBERT actually computes the gradients of all layers and updates the "shared" weights with the average of those gradients. Why does DEQ work even though it ignores the gradients of the intermediate layers?
@zicokolter9110 · 3 years ago
Thanks for the questions. For 1), this is mainly an empirical issue, but in practice we do see that "deeper" networks (even in the weight-tied setting) appear to work better, and thus the equilibrium point works best as the final representation (plus it allows efficient differentiation). 2) Yes, ALBERT stores all the intermediate activations and computes gradients through the whole unrolled network. The idea of the DEQ model is that this is actually unnecessary, precisely via the implicit differentiation method we discuss in the tutorial.
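To make the reply above concrete, here is a minimal JAX sketch of a DEQ layer with implicit differentiation, loosely in the spirit of the tutorial's Colab notebooks at implicit-layers-tutorial.org; the tanh cell, the naive solver, and all names here are illustrative assumptions rather than the tutorial's exact code:

```python
import jax
import jax.numpy as jnp
from functools import partial

def f(params, x, z):
    # One weight-tied "cell"; the DEQ output is the fixed point of this map.
    W, U, b = params
    return jnp.tanh(W @ z + U @ x + b)

def fwd_iteration(g, z0, n_iter=50):
    # Crude forward solver: plain fixed-point iteration (the tutorial
    # discusses better solvers, such as Newton or Anderson acceleration).
    z = z0
    for _ in range(n_iter):
        z = g(z)
    return z

@partial(jax.custom_vjp, nondiff_argnums=(0,))
def fixed_point(f, params, x):
    return fwd_iteration(lambda z: f(params, x, z), jnp.zeros_like(x))

def fixed_point_fwd(f, params, x):
    z_star = fixed_point(f, params, x)
    return z_star, (params, x, z_star)

def fixed_point_bwd(f, res, z_bar):
    # Implicit differentiation: solve u = z_bar + (df/dz)^T u at z* by
    # another fixed-point iteration, then push u through the VJP with
    # respect to (params, x). No unrolled forward activations are stored.
    params, x, z_star = res
    _, vjp_z = jax.vjp(lambda z: f(params, x, z), z_star)
    u = fwd_iteration(lambda u: z_bar + vjp_z(u)[0], jnp.zeros_like(z_bar))
    _, vjp_px = jax.vjp(lambda p, x_: f(p, x_, z_star), params, x)
    return vjp_px(u)

fixed_point.defvjp(fixed_point_fwd, fixed_point_bwd)

# Usage: gradients flow through the equilibrium via the custom VJP.
key = jax.random.PRNGKey(0)
d = 4
W = 0.1 * jax.random.normal(key, (d, d))  # small weights keep f a contraction
U, b = jnp.eye(d), jnp.zeros(d)
x = jnp.ones(d)
loss = lambda params: jnp.sum(fixed_point(f, params, x) ** 2)
grads = jax.grad(loss)((W, U, b))
```

Because `fixed_point_bwd` re-solves a small fixed-point problem instead of backpropagating through the unrolled forward iterations, only `z_star` is saved for the backward pass; that is the memory saving contrasted with ALBERT above.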
@ezamora1981 · 3 years ago
Hi Zico Kolter, great work! What about the inference time of DEQs compared to DNNs? Is it similar? Another question: do you recommend using JAX instead of PyTorch or TensorFlow 2?
@adrianbergesenfedaque8016 · 2 years ago
Hi, I'm just getting started with DILs/DEQs, but from what I can tell, their inference time tends to be about 2x slower than DNNs'. Still, depending on your application, it might not matter at all; e.g., in my case we are interested in processing requests on the minute, while a feed-forward DNN takes milliseconds to do inference, so doubling the milliseconds is not going to be a problem. In fact, our hope is that solving the optimization problem directly via this method will save time overall (compared to a DNN plus an optimization algorithm).
@vishwajitkumarvishnu3878 · 3 years ago
Shouldn't the last partial derivative at 54:00 in the backward pass be d1(z*, x, theta)? It's written as d2(z*, x, theta).
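For context, this is the standard implicit-function-theorem computation for the DEQ backward pass, in positional notation where d_i f means the derivative of f with respect to its i-th argument; whether the slide at 54:00 uses this same argument ordering is an assumption here:

```latex
% Differentiate the equilibrium condition z^* = f(z^*, x, \theta)
% with respect to \theta and solve for \partial z^* / \partial \theta:
\frac{\partial z^\star}{\partial \theta}
  = \bigl(I - \partial_1 f(z^\star, x, \theta)\bigr)^{-1}\,
    \partial_3 f(z^\star, x, \theta)
```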
@kimchi_taco · 3 years ago
Awesome, but the closed captions are a little bit out of sync. Could you sync them?
@zicokolter9110 · 3 years ago
Thanks for pointing this out! We've re-uploaded them to properly sync. They should work correctly now.
@DasGrosseFressen · 3 years ago
Really cool. One question, though: what is the fuss about neural ODEs? Honestly, I think I am missing something. They look just like taking a firing-rate model as an RNN... What is the difference?
@user-we2zz3gc3l · 3 years ago
50:56
@cexploreful · 2 years ago
Earned a like + sub at minute 1:47.
@rohullahalavi · 2 years ago
like