1:16 Overview of energy based models
15:20 Start of the paper
30:05 Experiments
@BJCaasenbrood 4 years ago
Love your style of presenting papers, it's clear and well-structured. Keep up the good work!
@PeterOtt 4 years ago
I'm 10 minutes in and so far this is a great summary of what energy based learning is. I've heard the name but had no idea what it was before now!
@matterhart 4 years ago
Best video yet, the longer intro was totally worth it. Timestamps would be great though.
@TijsMaas 4 years ago
Nice explanation. I was going to say it reminds me a lot of neurosymbolic concept learning; however, I just found out this work was published before NS-CL.
@slackstation 4 years ago
Great explanation. It's both a new and challenging concept mathematically. Thank you for the clear explanation.
@jugrajsingh3299 4 years ago
A great way of presenting the link between existing knowledge and the problem and solution addressed by the paper, in a simple way 😃
@herp_derpingson 4 years ago
Once we have GPUs large enough, this could be a game changer for solving abstract reasoning problems and procedurally generated matrices.
@atursams6471 3 years ago
28:00 Could you explain what backpropagation through an optimization procedure means?
@pastrop2003 4 years ago
Somehow I'm starting to think that if this model is further developed and then married up with a lot of compute, we may get something looking like AGI?????
@welcomeaioverlords 4 years ago
Super interesting, thanks for breaking it down!
@vojtechkubin1590 4 years ago
Genius design, beautiful presentation! But there is one thing I don't understand: "why only 11.5K subscribers"?
@YannicKilcher 4 years ago
It's a very exclusive club ;)
@snippletrap 4 years ago
Almost quadrupled by now
@sarvagyagupta1744 2 years ago
This is a great paper and a great job explaining it. I kept wondering while watching whether this is the concept behind the attention mechanism.
@vsiegel 4 years ago
A good example for something mind bending: Imagine a differentiable cat.
@CristianGarcia 4 years ago
Great video! This reminds me of the differentiable ODEs paper.
@lislouise2305 3 years ago
I'm not sure if I understand. So a deep neural network is an energy based model because you want to minimize the loss? Then deep learning models are just energy based models and there's no difference?
@amrmartini3935 3 years ago
Wouldn't a structured SVM framework provide a backprop-able loss that avoids having to backprop through SGD? You just need a solid (or principled) learning framework where a max/min/argmax/argmin is part of the definition of the loss function.
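For anyone curious what that looks like concretely, here is a rough sketch of such a max-margin loss (my own illustration with made-up names like energy_fn, a_true and candidates, not from the paper): by Danskin's theorem the max over candidates gives a valid subgradient, so you never differentiate through the inner minimization itself.

import torch

def structured_hinge(energy_fn, x, w, a_true, candidates, margin=1.0):
    # candidates: alternative attention assignments competing against the true one
    e_true = energy_fn(x, a_true, w)
    # we want E(true) to be lower than every E(wrong) by at least the margin
    violations = torch.stack([margin + e_true - energy_fn(x, a, w) for a in candidates])
    # the gradient of the max only needs dE/dtheta at the most-violating candidate,
    # not gradients through an inner SGD loop
    return torch.relu(violations.max())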
@Laszer271 4 years ago
Great presentation. I wonder how much time it took you to understand a paper like this (not taking into account planning out the presentation).
@nbrpwng 4 years ago
So if it can be considered that inferring an x or a or w from the others, using an existing energy function, is “learning”, then maybe learning the energy function parameters is “meta learning” in a way? But maybe not, and I guess it’s just a less important matter of definition.
@YannicKilcher 4 years ago
That's a good observation! It's maybe a bit tricky to call that learning, as we usually understand learning as something that has a lasting impact over time. Here, once you have inferred an x, you go to the next one and start over.
@blizzard072 3 years ago
Wouldn't this connect somehow with the recent iMAML paper that you reviewed? Backpropagating through SGD seemed worth trying.
@jrkirby93 4 years ago
So if I understand correctly, w is rather arbitrary. It depends specifically on the energy function and how it's trained. I guess if they have n concepts to learn, they make w an n-dim vector and encode each concept as one-hot. This paper does not explore out-of-distribution concepts, but I suppose theoretically you could interpolate them.

In all these problems the elements of x are positionally independent. If you swap the first and last element of x, and swap the first and last element of the attention vector, you ought to get the same result. Do they test that this is true in practice? Does this technique require positional independence? Could enforcing positional independence more strictly give performance benefits?

If you make a neural net piecewise linear the entire way through, you can calculate the function of the loss (or energy) with respect to a single parameter completely, and find the minima of that function in a computationally efficient manner. This is the key component of my current research. I wonder if this concept learning would benefit from attempting that instead of gradient descent.
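A quick way to empirically check that positional-independence question (a rough sketch of my own, not from the paper; energy_fn, x, a and w are placeholder names): permute the entities and the attention vector together and see whether the energy stays the same.

import torch

def permutation_check(energy_fn, x, a, w, atol=1e-5):
    # x: (n_entities, d) entity states, a: (n_entities,) attention, w: concept code
    perm = torch.randperm(x.shape[0])
    e_orig = energy_fn(x, a, w)
    e_perm = energy_fn(x[perm], a[perm], w)
    # True (up to tolerance) if the model treats the entities as an unordered set
    return torch.allclose(e_orig, e_perm, atol=atol)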
@sexybunny223-v5s 4 days ago
Hi, how is your research going? Sounds quite interesting.
@SergeyVBD 4 years ago
Using gradient descent at inference time is not a new idea. I've definitely seen classic computer vision algorithms that do this. Is deep learning now considered classic machine learning lol? I think the main contribution of this paper is the formulation of these concepts. That seems promising.
@amrmartini3935 3 years ago
Yeah structured models and EBMs are full of this. The looped inference is a major bottleneck for any computational learning research in this area. It's why the computational community has moved away from PGMs in the first place.
@joirnpettersen 4 years ago
Interesting concept. Do you know why it has the name "Energy" function? Is it like, the more energy the more unstable it is?
@YannicKilcher 4 years ago
I think it comes from physics. Think of the potential energy of a pendulum, for example. It will converge to the place where this energy is the lowest. I might be very wrong, though.
@joirnpettersen 4 years ago
@YannicKilcher Oh yeah, of course. Like how Snell's law can be thought of as light minimizing its travel time (Fermat's principle).
@BJCaasenbrood 4 years ago
I agree with Yannic. The energy function is positive for all values of x, and close to zero at an equilibrium. The name also implies that the unknown energy function E(x) is differentiable, in contrast to arbitrary generic objective functions in AI. Generally, in physics, one minimizes a potential energy function to find the solution to complex nonlinear problems, also through gradient descent methods. The advantage is that, for a convex energy, the Hessian (i.e., the second derivative of E(x) w.r.t. x) is positive definite, since the energy keeps increasing the further you move from equilibrium, similar to an elastic spring storing more energy the further you stretch it. An energy function, which is just the name for something with characteristics similar to potential energy, therefore offers good numerical stability and convergence!
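If it helps, the spring picture as formulas (my own toy example, not from the paper):

E(x) = \tfrac{1}{2} k (x - x_0)^2, \quad k > 0
\frac{\partial E}{\partial x} = k (x - x_0), \qquad \frac{\partial^2 E}{\partial x^2} = k > 0

So gradient descent x \leftarrow x - \eta \, k (x - x_0) slides down to the equilibrium x_0, where the energy reaches its minimum of zero; the positive second derivative is exactly the positive-definite-Hessian property mentioned above (for this convex energy).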
@snippletrap 4 years ago
@YannicKilcher It does come from physics, but the lineage is through Hopfield nets and the Ising models that inspired them.
@theodorosgalanos9663 4 years ago
Yannic, from your point of view, as a highly experienced researcher and a person who dissects papers like this 'for a living', how hard would it be to write the code for this one? I haven't found anything online, and I wonder if the reason it wasn't shared is that it might be a bit... difficult or hard to organize?
@YannicKilcher 4 years ago
No, I think as long as you have a clear picture in your mind of what the "dataset" is, and what counts as X and what counts as Y in each sample, you should be fine. The only engineering difficulty here is backpropagating through the inner optimization procedure.
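For anyone wondering what that engineering difficulty looks like, here is a rough PyTorch-style sketch (my own, not from the paper; energy_fn, a0 and a_true are placeholder names): unroll a few inner gradient steps on the attention and keep the graph so the outer loss can reach the energy function's parameters.

import torch

def infer_attention(energy_fn, x, w, a0, steps=5, lr=0.1):
    # inner optimization: descend the energy w.r.t. the attention a
    a = a0
    for _ in range(steps):
        e = energy_fn(x, a, w)
        # create_graph=True keeps higher-order terms so the outer backward
        # can differentiate through every inner step
        (grad_a,) = torch.autograd.grad(e, a, create_graph=True)
        a = a - lr * grad_a
    return a

# outer loop (sketch):
# a_hat = infer_attention(energy_fn, x, w, a0)   # a0 must require grad
# loss = ((a_hat - a_true) ** 2).mean()
# loss.backward()  # gradients flow through the unrolled steps into energy_fn's parameters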
@sau002 1 year ago
Thank you.
@xgplayer 4 years ago
But you do perform gradient descent when training the generator in the GANs framework, don't you?
@YannicKilcher 4 years ago
Yes, but the gradient descent isn't part of the model itself.
@Chhillee 4 years ago
This paper wasn't published anywhere, was it? I see an ICLR workshop version, but the full version doesn't seem to have been accepted at any conference.
@CristianGarcia 4 years ago
Welcome to arXiv
@patrickjdarrow 4 years ago
"you can gradient descent on colors" = 🤯
@vsiegel 4 years ago
You can even do that on cats!
@datgatto3911 4 years ago
Nice video. P.S.: "nice demonstration" of the GAN discriminator at 05:44 =)))
@MrAlextorex 4 years ago
Found better justifications in a slide deck here: when Y is high dimensional (or simply combinatorial), normalizing becomes intractable... See: cs.nyu.edu/~yann/talks/lecun-20050719-ipam-2-ebm.pdf
@AI_ML_DL_LLM 9 months ago
EBMs are coming to fruition, considering the recent leak about Q*
@HB-kl5ik 9 months ago
That guy on Twitter who leaked that is stupid.
@Elstuhn 11 months ago
Bro I'm laughing so hard at 5:23 rn I'm so sorry for being so immature