Concept Learning with Energy-Based Models (Paper Explained)

33,258 views

Yannic Kilcher

1 day ago

Comments: 54
@Chhillee 4 years ago
1:16 Overview of energy-based models
15:20 Start of the paper
30:05 Experiments
@BJCaasenbrood 4 years ago
Love your style of presenting papers, it's clear and well-structured. Keep up the good work!
@PeterOtt 4 years ago
I'm 10 minutes in, and so far this is a great summary of what energy-based learning is. I'd heard the name but had no idea what it was before now!
@matterhart 4 years ago
Best video yet, the longer intro was totally worth it. Timestamps would be great though.
@TijsMaas 4 years ago
Nice explanation. I was going to say it reminds me a lot of neurosymbolic concept learning; however, I just found out this work was published before NS-CL.
@slackstation 4 years ago
Great explanation. The concept is both new and mathematically challenging. Thank you for the clear explanation.
@jugrajsingh3299 4 years ago
Great way of presenting, in a simple way, the link between current knowledge and the problem and solution addressed by the paper 😃
@herp_derpingson 4 years ago
Once we have large enough GPUs, this would be a game changer for solving abstract reasoning problems and procedurally generated matrices.
@atursams6471 3 years ago
28:00 Could you explain what backpropagation through an optimization procedure means?
@pastrop2003 4 years ago
Somehow, I'm starting to think that if this model is developed further and then married up with a lot of compute, we may get something looking like AGI?????
@welcomeaioverlords 4 years ago
Super interesting, thanks for breaking it down!
@vojtechkubin1590 4 years ago
Genius design, beautiful presentation! But there is one thing I don't understand: "why only 11.5K subscribers"?
@YannicKilcher 4 years ago
It's a very exclusive club ;)
@snippletrap 4 years ago
Almost quadrupled by now
@sarvagyagupta1744 2 years ago
This is a great paper, and great job explaining it. I kept wondering while watching whether this is the concept behind the attention mechanism.
@vsiegel 4 years ago
A good example of something mind-bending: imagine a differentiable cat.
@CristianGarcia 4 years ago
Great video! This reminds me of the differentiable ODEs paper.
@lislouise2305 3 years ago
I'm not sure if I understand. So a deep neural network is an energy-based model because you want to minimize the loss? Then deep learning models are just energy-based models and there's no difference?
@amrmartini3935 3 years ago
Wouldn't a structured SVM framework provide a backprop-able loss that avoids having to backprop through SGD? You just need a solid (or principled) learning framework where a max/min/argmax/argmin is part of the definition of the loss function.
@Laszer271 4 years ago
Great presentation. I wonder how much time it took you to understand such a paper (not taking into account planning out this presentation).
@nbrpwng 4 years ago
So if inferring an x, a, or w from the others, using an existing energy function, can be considered "learning", then maybe learning the energy function's parameters is "meta-learning" in a way? But maybe not; I guess it's just a less important matter of definition.
@YannicKilcher 4 years ago
That's a good observation! It's maybe a bit tricky to call that learning, as we usually understand learning as something that has a lasting impact over time. Here, once you have inferred an x, you go to the next one and start over.
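(Editor's sketch of what "inferring an x" under a fixed energy function could look like; the quadratic toy energy and all names below are illustrative, not the paper's actual model.)

```python
# Editor's sketch: test-time inference as gradient descent on x under a
# FIXED energy function. The toy quadratic energy is illustrative only;
# the paper uses a learned relational neural network E(x, a, w).
import torch

def toy_energy(x, w):
    # Low energy when the state x matches the configuration encoded by w.
    return ((x - w) ** 2).sum()

w = torch.tensor([1.0, -2.0, 0.5])      # fixed concept code
x = torch.zeros(3, requires_grad=True)  # state to be inferred
opt = torch.optim.SGD([x], lr=0.1)

for _ in range(100):                    # inner inference loop
    opt.zero_grad()
    toy_energy(x, w).backward()
    opt.step()

print(x.detach())  # approaches w; the energy function itself never changes
```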
@blizzard072 3 years ago
Wouldn't this connect somehow with the recent iMAML paper that you reviewed? Backpropagating through SGD seemed worth trying.
@jrkirby93 4 years ago
So if I understand correctly, w is rather arbitrary. It depends specifically on the energy function and how it's trained. I guess if they have n concepts to learn, they make w an n dim vector, and encode each concept with 1-hot. This paper does not explore out of distribution concepts, but I suppose theoretically, you could interpolate them.

In all these problems the elements of x are positionally independent. If you swap the first and last element of x, and swap the first and last of the attention vector, you ought to get the same result. Do they test that this is true in practice? Does this technique require positional independence? Could enforcing positional independence more strictly give performance benefits?

If you make a neural net piecewise linear the entire way through, you can calculate the function of the loss (or energy) with respect to a single parameter completely, and find the minima of that function in a computationally efficient manner. This is the key component of my current research. I wonder if this concept learning would benefit from attempting that instead of gradient descent.
@sexybunny223-v5s 4 days ago
Hi, how is your research going? Sounds quite interesting.
@SergeyVBD 4 years ago
Using gradient descent at inference is not a new idea. I've definitely seen classic computer vision algorithms that did this. Is deep learning now considered classic machine learning lol? I think the main contribution of this paper is the formulation of these concepts. That seems promising.
@amrmartini3935 3 years ago
Yeah structured models and EBMs are full of this. The looped inference is a major bottleneck for any computational learning research in this area. It's why the computational community has moved away from PGMs in the first place.
@joirnpettersen 4 years ago
Interesting concept. Do you know why it has the name "Energy" function? Is it like, the more energy the more unstable it is?
@YannicKilcher 4 years ago
I think it comes from physics. Think of the potential energy of a pendulum, for example. It will converge to the place where this energy is the lowest. I might be very wrong, though.
@joirnpettersen 4 years ago
@@YannicKilcher Oh yeah, of course. Like how Snell's law can be thought of as minimizing the energy along the light ray's path.
@BJCaasenbrood 4 years ago
I agree with Yannic. The energy function is positive for all values of X, and close to zero at an equilibrium. The name also implies that the unknown energy function E(x) is differentiable, in contrast to generic objective functions in AI. Generally, in physics, they aim to minimize the potential energy function to find the solution to complex nonlinear problems, also through gradient descent methods. The advantage is that the Hessian (i.e., the second derivative of E(x) w.r.t. X) is always positive definite, since the energy function is always increasing for every X, similar to an elastic spring storing more energy the further you stretch it. An energy function, which is just a definition of a thing with characteristics similar to potential energy, therefore offers good numerical stability and convergence!
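(Editor's worked toy version of the spring analogy above; the symbols are illustrative, not from the video.)

```latex
% A spring with stiffness k > 0 and equilibrium at x = 0:
E(x) = \tfrac{1}{2} k x^{2}, \qquad \nabla_x E(x) = k x, \qquad E''(x) = k > 0.
% Gradient descent on x with step size \eta contracts toward the equilibrium:
x_{t+1} = x_t - \eta \, k \, x_t = (1 - \eta k)\, x_t \;\to\; 0
\quad \text{whenever } 0 < \eta k < 2.
```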
@snippletrap 4 years ago
@@YannicKilcher It does come from physics, but the lineage is through Hopfield nets and the Ising models that inspired them.
@theodorosgalanos9663 4 years ago
Yannic, from your point of view, as a highly experienced researcher and a person who dissects papers like this 'for a living', how hard would it be to write the code for this one? I haven't found anything online, and I wonder whether the reason it wasn't shared is that it might be a bit... difficult or hard to organize?
@YannicKilcher 4 years ago
No, I think as long as you have a clear picture in your mind of what the "dataset" is, and what counts as X and what counts as Y in each sample, you should be fine. The only engineering difficulty here is backpropagating through the inner optimization procedure.
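(For anyone attempting this, a minimal sketch of one standard way to backpropagate through an unrolled inner gradient-descent loop, using PyTorch's create_graph=True; the tiny MLP, shapes, and supervised outer loss are the editor's assumptions, not necessarily the authors' setup.)

```python
# Editor's sketch: unrolled inner gradient descent on x, differentiated
# end-to-end w.r.t. the energy network's parameters. Names and shapes are
# illustrative assumptions.
import torch
import torch.nn as nn

energy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))

def infer_x(x0, w, steps=10, lr=0.1):
    x = x0
    for _ in range(steps):
        e = energy_net(torch.cat([x, w], dim=-1)).sum()
        # create_graph=True keeps the inner gradient in the autograd graph,
        # so the outer loss can backpropagate through every inner step.
        (grad_x,) = torch.autograd.grad(e, x, create_graph=True)
        x = x - lr * grad_x
    return x

outer_opt = torch.optim.Adam(energy_net.parameters(), lr=1e-3)
x0 = torch.randn(8, 2, requires_grad=True)  # initial guess for the state
w = torch.randn(8, 2)                       # concept code per example
x_target = torch.randn(8, 2)                # supervision for the inferred state

x_pred = infer_x(x0, w)
outer_loss = ((x_pred - x_target) ** 2).mean()
outer_opt.zero_grad()
outer_loss.backward()   # gradients flow through all inner optimization steps
outer_opt.step()
```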
@sau002 1 year ago
Thank you.
@xgplayer 4 years ago
But you do perform gradient descent when training the generator in the GANs framework, don't you?
@YannicKilcher 4 years ago
Yes, but the gradient descent isn't part of the model itself.
@Chhillee 4 years ago
This paper wasn't published anywhere, was it? I see an ICLR workshop version, but the full version doesn't seem to have been accepted at any conference.
@CristianGarcia 4 years ago
Welcome to arXiv
@patrickjdarrow 4 years ago
"you can gradient descent on colors" = 🤯
@vsiegel 4 years ago
You can even do that on cats!
@datgatto3911 4 years ago
Nice video. P.S.: "nice demonstration" of the GAN discriminator at 05:44 =)))
@MrAlextorex 4 years ago
Found better justifications in a slide deck here: when Y is high-dimensional (or simply combinatorial), normalizing becomes intractable... See: cs.nyu.edu/~yann/talks/lecun-20050719-ipam-2-ebm.pdf
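(In symbols, the intractability argument from that slide deck, as the editor reads it:)

```latex
% Turning an energy into a normalized conditional distribution needs the
% partition function Z(x), a sum or integral over every possible y:
P(y \mid x) \;=\; \frac{e^{-\beta E(x, y)}}{Z(x)}, \qquad
Z(x) \;=\; \int_{\mathcal{Y}} e^{-\beta E(x, y')} \, dy'.
% When y is high-dimensional or combinatorial, Z(x) is intractable, which is
% why EBMs work with E(x, y) directly and only compare or minimize energies.
```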
@AI_ML_DL_LLM 9 months ago
EBM is coming to fruition considering the recent leak on Q*
@HB-kl5ik 9 months ago
The guy on Twitter who leaked that is stupid
@Elstuhn 11 months ago
Bro I'm laughing so hard at 5:23 rn I'm so sorry for being so immature
@kormannn1 9 months ago
this aged well
@AndreiMargeloiu 4 years ago
447 likes and 0 dislikes - truly incredible!
@snippletrap 4 years ago
5:25 slow down Yannic