Love the energy from Prof. Yann LeCun. His excitement about the topic, and the small smiles he has when he talks about how fresh this content is, are amazing. Thanks a lot, Prof. Alfredo!
@alfcnz • 2 years ago
😄😄😄
@oguzhanercan4701 • 9 months ago
The most important video on the internet for computer vision researchers. I rewatch it several times a year.
@alfcnz • 9 months ago
😀😀😀
@gonzalopolo2612 • 3 years ago
Again, @Alfredo Canziani, thank you very much for making this public; this is amazing content. I have several questions (I refer to the instant(s) in the video):

16:34 and 50:43 => An unconditional model is when the input is partially observed but you don't know exactly which part.
- What is test/inference in these unconditional EBMs? Is there a proper split between training and inference/test in the unconditional case?
- How do models like PCA or K-means fit here, i.e. what are the partially observed inputs Y? For example, in K-means you receive all the components of Y; I don't see how they are partially observed.

25:10 and 1:01:50 => With the joint embedding architecture:
- What would inference be with this architecture, inferring a Y from a given X by minimizing the cost C(h, h')? I know you could run gradient descent on Y backward through the Pred(y) network, but the purpose of inferring Y given X in this architecture is not clear to me (see the toy sketch at the end of this comment).
- What does "Advantage: no pixel-level reconstruction" in green mean? (I suspect this may have something to do with my question just above.)
- Can this architecture also be trained as a latent-variable EBM, or is it always trained in a contrastive way?
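To make the inference question concrete, here is the kind of procedure I have in mind: a toy PyTorch sketch (the energy network, shapes, and step sizes are made up by me, not from the lecture) that runs gradient descent on y while keeping the weights fixed.

import torch

# Toy sketch: "inference" in an EBM as gradient descent over y for a fixed x.
torch.manual_seed(0)
x = torch.randn(8)
net = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

def energy(x, y):
    # Made-up energy: a small network scoring the (x, y) pair.
    return net(torch.cat([x, y])).squeeze()

y = torch.zeros(2, requires_grad=True)      # initial guess for the unobserved y
optimizer = torch.optim.SGD([y], lr=0.1)    # only y is optimized, not the weights
for _ in range(100):
    optimizer.zero_grad()
    energy(x, y).backward()                 # gradient w.r.t. y; parameters stay fixed
    optimizer.step()
# y is now a (local) minimizer of E(x, ·) for this particular x.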
@damnit258 • 3 years ago
Gold. I've watched last year's lectures and I'm filling in the gaps with this year's ones.
@alfcnz • 3 years ago
💛🧡💛
@anondoggo • 2 years ago
Dr. Yann only mentioned this in passing at 20:00, but I just wanted to clarify: why do EBMs offer more flexibility in the choice of scoring and objective functions? It's from page 9 of the slides. Thank you!
@anondoggo • 2 years ago
Never mind, I should have just watched on: at 1:04:27 Yann explains how probabilistic models are EBMs where the objective function is the NLL.
@anondoggo • 2 years ago
Then, by extension, the scoring function for a probabilistic model is probably restricted to a probability.
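Writing that out the way I understand it (my own sketch, using the F_w and β notation from the lecture, so I may be off): the EBM defines

P_w(y|x) = exp(−βF_w(x, y)) / ∫y′ exp(−βF_w(x, y′))

and the (scaled) NLL objective becomes

−(1/β) log P_w(y|x) = F_w(x, y) + (1/β) log ∫y′ exp(−βF_w(x, y′)),

i.e. the energy of the observed pair plus the free energy over all y′.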
@anondoggo • 2 years ago
The info at 18:17 is underrated.
@MehranZiadloo • 10 months ago
Not to nitpick, but I believe there's a minus sign missing at 49:22, in the denominator of P(y|x), at the far end (right side of the screen) behind the beta.
@alfcnz • 10 months ago
Oh, yes indeed! Yann is a little heedless when crafting slides 😅
@MehranZiadloo • 10 months ago
@@alfcnz These things happen. I just wanted to make sure that I'm following the calculations correctly. Thanks for confirming.
@alfcnz • 10 months ago
Sure sure 😊
@RJRyan • 3 years ago
Thank you so much for making these lectures public! The slides are very difficult to read because they are overlaid on Yann's face and the background image. I imagine this could be an accessibility issue for anyone with vision impairments, too.
@alfcnz • 2 years ago
That's why we provide the slides 🙂🙂🙂
@lucamatteobarbieri2493 • 1 year ago
A cool thing about prediction systems is that they can also be used to predict the past, not only the future. For example, if you see something falling, you intuitively predict both where it is going and where it came from.
@kalokng3572 • 2 years ago
Hi Alfredo, thank you for making the course public. It's super useful, especially for those who are self-learning cutting-edge AI concepts, and I've found EBMs fascinating. I have a question regarding EBMs: how should I describe "overfitting" in the context of an EBM? Does it mean the energy landscape has a very small volume surrounding the training data points?
@alfcnz • 2 years ago
You're welcome. And yes, precisely. And underfitting would be having a flat manifold.
@buoyrina9669 • 2 years ago
I guess I need to watch this many times to get what Yann was trying to explain :)
@alfcnz • 2 years ago
It's alright. It took me ‘only’ 5 repetitions 😅😅😅
@arashjavanmard5911 • 2 years ago
Great lecture, thanks a lot. It would also be great if you could point us to a reference book or publications for this lecture. Thanks a lot in advance.
@alfcnz • 2 years ago
I'm writing the book right now. A bit of patience, please 😅😅😅
@Vikram-wx4hg • 2 years ago
@@alfcnz Looking forward to the book, Alfredo. Can you give a ballpark estimate of the 'patience' here? :-)
@alfcnz • 2 years ago
The first draft will see the light by the end of summer ’22.
@anondoggo • 2 years ago
@@alfcnz omg, I'm so excited
@iamyouu • 3 years ago
Is there any book I can read to learn more about these methods? Thank you.
@alfcnz • 3 years ago
I'm writing the book. It'll take some time.
@iamyouu • 3 years ago
@@alfcnz thank you so much!
@alfcnz • 3 years ago
❤️❤️❤️
@my_master55 • 2 years ago
Hi, Alfredo 👋 Am I missing something, or does this lecture not cover the "non-contrastive joint embedding" methods Yann was talking about at 1:34:40? I also briefly checked the next lectures but didn't find anything related. Could you please point me to it? 😇 Thank you for the video, btw; brilliant as always :)
@anondoggo • 2 years ago
If you open the slides for lecture 6 you can find a whole page on non-contrastive embeddings.
@aljjxw • 3 years ago
What are the research papers from Facebook mentioned around 1:30?
@alfcnz • 3 years ago
All references are written on the slides. At that timestamp I don't hear Yann mentioning any paper.
@hamedgholami261 • 2 years ago
So that is what contrastive learning is all about!
@alfcnz • 2 years ago
It seems so 😀😀😀
@SnoSixtyTwo • 2 years ago
Thanks a whole bunch for this lecture; after watching it twice I think I'm starting to grasp it :) One thing that confuses me, though: in the very beginning it is mentioned that x may or may not be adapted when searching for the optimum location. I cannot quickly come up with an example where I would want that. Wouldn't that mean I'm just discarding the info in x, and, in the case of modeling with latent variables, my inference now becomes a function of z exclusively?
@alfcnz • 2 years ago
You need to write down the timestamp in minutes:seconds if you want me to be able to address any particular aspect of the video.
@SnoSixtyTwo • 2 years ago
@@alfcnz Thanks for taking the time to respond! Here we go, 15:20
@НиколайНовичков-е1э • 3 years ago
Thank you, Alfredo! :)
@alfcnz • 3 years ago
You're welcome 🥰🥰🥰
@mpalaourg8597 • 3 years ago
I tried to compute the derivative Yann mentioned (1:07:45), but I'm probably missing something, because my final result doesn't have the integral (only −P_w(·) ...). Is there any supplementary material with these calculations? Thanks again for your amazing and hard work!
@alfcnz • 3 years ago
Uh… can you share your calculations? I can have a look. Maybe post them in the Discord server, maths room, so that others may be able to help as well.
@mpalaourg8597 • 3 years ago
@@alfcnz It was my bad. I... misunderstood the formula for P_w(y|x) and thought there was an integral in the numerator (over all y's), but that didn't make any sense to me, so I checked your notes again and... voilà, I got the right answer. Is the Discord open to us too? I thought it was only for NYU students. I'll definitely join then (learning alone isn't fun :P).
@alfcnz • 3 years ago
Discord is for *non* NYU students. I have another communication system set up for them.
@hamidrezaheidarian8207 • 11 months ago
Hi Alfredo, which book on DL do you recommend that has the same sort of structure as the content of this course?
@alfcnz • 11 months ago
The one I’m writing 😇
@hamidrezaheidarian8207 • 11 months ago
@@alfcnz Great, I think it would be a great companion to these lectures, looking forward to it.
@ShihgianLee • 2 years ago
I spent some time deriving the step mentioned at 1:07:44. I made my best effort to get to the final result, but I'm not sure all my steps are correct, so I hope my fellow students can point out any mistakes. Due to the lack of LaTeX support in YouTube comments, I'll try to make the steps as clear as possible: I use the derivative of the log to get to the second step, then the Leibniz integral rule to move the partial derivative inside the integral in the third step. The rest is pretty straightforward, hopefully. Thank you!

∂/∂w (1/β) log[ ∫y′ exp(−βFw(x, y′)) ]
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∂/∂w ∫y′ exp(−βFw(x, y′))
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ ∂/∂w exp(−βFw(x, y′))
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ exp(−βFw(x, y′)) ∂/∂w (−βFw(x, y′))
= − [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ exp(−βFw(x, y′)) ∂/∂w Fw(x, y′)
= − ∫y′ [exp(−βFw(x, y′)) / ∫y″ exp(−βFw(x, y″))] ∂/∂w Fw(x, y′)
= − ∫y′ Pw(y′|x) ∂/∂w Fw(x, y′)
@hamedgholami261 • 2 years ago
Can you put up a link to a LaTeX file? I did the derivative and may be able to help.
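In the meantime, here is a quick numerical sanity check of the final identity above (a toy example: the energy F_w(x, y′) = (y′ − wx)² + wy′, the value of β, and the grid are arbitrary choices of mine, not from the course).

import numpy as np

# Checks: d/dw (1/β) log ∫ exp(−β F_w(x, y′)) dy′  ==  −∫ P_w(y′|x) ∂F_w/∂w dy′
beta, x, w = 2.0, 1.5, 0.7
y = np.linspace(-10.0, 10.0, 20001)          # dense grid standing in for ∫ dy′
dy = y[1] - y[0]

def F(w):
    return (y - w * x) ** 2 + w * y          # made-up energy on the grid

def log_partition(w):
    return np.log(np.sum(np.exp(-beta * F(w))) * dy) / beta

# Left side: finite-difference derivative of the free-energy term.
eps = 1e-5
lhs = (log_partition(w + eps) - log_partition(w - eps)) / (2 * eps)

# Right side: expectation of ∂F/∂w under the Gibbs density P_w(y′|x).
p = np.exp(-beta * F(w))
p /= p.sum() * dy
dF_dw = -2 * x * (y - w * x) + y
rhs = -np.sum(p * dF_dw) * dy

print(lhs, rhs)                              # both should come out ≈ -1.75 for these made-up numbers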
@cambridgebreaths3581 • 3 years ago
Perfect. Thank you so much :)
@alfcnz • 3 years ago
😇😇😇
@bmahlbrand • 3 years ago
How do you use autograd in PyTorch for "non-stochastic" gradient descent?
@shiftedabsurdity • 3 years ago
Probably conjugate gradient.
@alfcnz • 3 years ago
If the function I have is not approximate (not like the per-batch approximation of the dataset loss), then you're performing non-stochastic GD. The stochasticity comes from the approximation to the objective function.
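For instance, a minimal full-batch (non-stochastic) gradient descent loop with PyTorch autograd could look like this; the data and the linear model are just a toy sketch, not from the course.

import torch

# Full-batch GD: the loss is computed on the *entire* dataset at every step,
# so autograd returns the exact gradient rather than a mini-batch estimate.
torch.manual_seed(0)
X = torch.randn(100, 3)                                   # whole dataset kept in memory
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

w = torch.zeros(3, requires_grad=True)
for step in range(500):
    loss = ((X @ w - y) ** 2).mean()                      # full-dataset objective
    loss.backward()                                       # exact gradient of that objective
    with torch.no_grad():
        w -= 0.1 * w.grad                                 # plain, deterministic GD update
    w.grad.zero_()
print(w)                                                  # should approach [1, -2, 0.5]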
@vageta008 • 3 years ago
Interesting, energy-based models do something very similar to metric learning (or am I missing something?).
@alfcnz • 2 years ago
Indeed metric learning can be formulated as an energy model. I'd say energy models are like a large umbrella under which many conventional models can be recast.
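As a made-up illustration of that umbrella view, a siamese metric-learning setup can be read as an energy model: the energy of a pair is the distance between embeddings, and a contrastive loss pushes that energy down for similar pairs and up (toward a margin) for dissimilar ones. A sketch only; the encoder, shapes, and margin are arbitrary choices, not from the lecture.

import torch
from torch import nn

f = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))   # shared encoder

def energy(x1, x2):
    # Energy of a pair = squared distance between embeddings.
    return ((f(x1) - f(x2)) ** 2).sum(dim=1)

def contrastive_loss(x1, x2, same, margin=1.0):
    e = energy(x1, x2)
    # Pull energy down for similar pairs, push it up (below the margin) otherwise.
    return torch.where(same, e, torch.clamp(margin - e, min=0.0)).mean()

x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
same = torch.tensor([True, False, True, False])
loss = contrastive_loss(x1, x2, same)
loss.backward()   # gradients shape the energy landscape over pairs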
@arcman9436 • 3 years ago
Very interesting.
@alfcnz • 3 years ago
🧐🧐🧐
@pratik245 • 3 years ago
The French language seems better suited for music. It has a sweet tonality.