Love the energy from Prof. Yann LeCun. His excitement about the topic, and the small smiles he has when he talks about how fresh this content is, are amazing. Thanks a lot, Prof. Alfredo!
@alfcnz • 2 years ago
😄😄😄
@oguzhanercan4701 • 9 months ago
The most important video on the internet for computer vision researchers. I rewatch it several times a year.
@alfcnz • 9 months ago
😀😀😀
@gonzalopolo2612 • 3 years ago
Again, @Alfredo Canziani, thank you very much for making this public; this is amazing content. I have several questions (I refer to the instant(s) in the video):

16:34 and 50:43 => An unconditional model is when the input is partially observed but you don't know exactly which part.
- What is test/inference in these unconditional EBMs? Is there a proper split between training and inference/test in the unconditional case?
- How do models like PCA or K-means fit here, i.e. what are the partially observed inputs Y? For example, in K-means you receive all the components of Y; I don't see how they are partially observed.

25:10 and 1:01:50 => With the joint embedding architecture:
- What would inference be with this architecture, inferring a Y from a given X by minimizing the cost C(h, h')? I know you could run gradient descent on Y backward through the Pred(y) network, but the purpose of inferring Y given X in this architecture is not clear to me (see the toy sketch at the end of this comment).
- What does "Advantage: no pixel-level reconstruction" in green mean? (I suspect this may have something to do with my question just above.)
- Can this architecture also be trained as a latent-variable EBM, or is it always trained in a contrastive way?
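To make the inference question concrete, here is the kind of procedure I have in mind: a toy PyTorch sketch (the energy network, shapes, and step sizes are made up by me, not from the lecture) that runs gradient descent on y while keeping the weights fixed.

import torch

# Toy sketch: "inference" in an EBM as gradient descent over y for a fixed x.
torch.manual_seed(0)
x = torch.randn(8)
net = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))

def energy(x, y):
    # Made-up energy: a small network scoring the (x, y) pair.
    return net(torch.cat([x, y])).squeeze()

y = torch.zeros(2, requires_grad=True)      # initial guess for the unobserved y
optimizer = torch.optim.SGD([y], lr=0.1)    # only y is optimized, not the weights
for _ in range(100):
    optimizer.zero_grad()
    energy(x, y).backward()                 # gradient w.r.t. y; parameters stay fixed
    optimizer.step()
# y is now a (local) minimizer of E(x, ·) for this particular x.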
@damnit258 • 3 years ago
Gold. I've watched last year's lectures and I'm filling in the gaps with this year's ones.
@alfcnz • 3 years ago
💛🧡💛
@anondoggo • 2 years ago
Dr. Yann only mentioned this in passing at 20:00, but I just wanted to clarify: why do EBMs offer more flexibility in the choice of scoring and objective functions? It's from page 9 of the slides. Thank you!
@anondoggo • 2 years ago
Never mind, I should have just watched on: at 1:04:27 Yann explains how probabilistic models are EBMs where the objective function is the NLL.
@anondoggo • 2 years ago
Then, by extension, the scoring function for a probabilistic model is probably restricted to a probability.
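Writing that out the way I understand it (my own sketch, using the F_w and β notation from the lecture, so I may be off): the EBM defines

P_w(y|x) = exp(−βF_w(x, y)) / ∫y′ exp(−βF_w(x, y′))

and the (scaled) NLL objective becomes

−(1/β) log P_w(y|x) = F_w(x, y) + (1/β) log ∫y′ exp(−βF_w(x, y′)),

i.e. the energy of the observed pair plus the free energy over all y′.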
@anondoggo • 2 years ago
The info at 18:17 is underrated.
@MehranZiadloo • 10 months ago
Not to nitpick, but I believe there's a minus sign missing at 49:22, in the denominator of P(y|x), at the far end (right side of the screen) behind the beta.
@alfcnz • 10 months ago
Oh, yes indeed! Yann is a little heedless when crafting slides 😅
@MehranZiadloo • 10 months ago
@@alfcnz These things happen. I just wanted to make sure that I'm following the calculations correctly. Thanks for confirming.
@alfcnz • 10 months ago
Sure sure 😊
@RJRyan • 3 years ago
Thank you so much for making these lectures public! The slides are very difficult to read because they are overlaid on Yann's face and the background image. I imagine this could be an accessibility issue for anyone with vision impairments, too.
@alfcnz • 2 years ago
That's why we provide the slides 🙂🙂🙂
@lucamatteobarbieri2493 • 1 year ago
A cool thing about prediction systems is that they can also be used to predict the past, not only the future. For example, if you see something falling, you intuitively predict both where it is going and where it came from.
@kalokng3572 • 2 years ago
Hi Alfredo, thank you for making the course public. It's super useful, especially for those who are self-learning cutting-edge AI concepts, and I've found EBMs fascinating. I have a question regarding EBMs: how should I describe "overfitting" in the context of an EBM? Does it mean the energy landscape has a very small volume surrounding the training data points?
@alfcnz • 2 years ago
You're welcome. And yes, precisely. And underfitting would be having a flat manifold.
@buoyrina9669 • 2 years ago
I guess I need to watch this many times to get what Yann was trying to explain :)
@alfcnz • 2 years ago
It's alright. It took me ‘only’ 5 repetitions 😅😅😅
@arashjavanmard5911 • 2 years ago
Great lecture, thanks a lot. It would also be great if you could point us to a reference book or publications for this lecture. Thanks a lot in advance.
@alfcnz • 2 years ago
I'm writing the book right now. A bit of patience, please 😅😅😅
@Vikram-wx4hg • 2 years ago
@@alfcnz Looking forward to the book, Alfredo. Can you give a ballpark estimate of the 'patience' here? :-)
@alfcnz • 2 years ago
The first draft will see the light by the end of summer ’22.
@anondoggo • 2 years ago
@@alfcnz omg, I'm so excited
@iamyouu • 3 years ago
Is there any book I can read to learn more about these methods? Thank you.
@alfcnz • 3 years ago
I'm writing the book. It'll take some time.
@iamyouu • 3 years ago
@@alfcnz thank you so much!
@alfcnz • 3 years ago
❤️❤️❤️
@my_master55 • 2 years ago
Hi, Alfredo 👋 Am I missing something, or does this lecture not cover the "non-contrastive joint embedding" methods Yann was talking about at 1:34:40? I also briefly checked the next lectures but didn't find anything related. Could you please point me to it? 😇 Thank you for the video, btw; brilliant as always :)
@anondoggo • 2 years ago
If you open the slides for lecture 6 you can find a whole page on non-contrastive embeddings.
@aljjxw • 3 years ago
What are the research papers from Facebook mentioned around 1:30?
@alfcnz • 3 years ago
All references are written on the slides. At that timestamp I don't hear Yann mentioning any paper.
@hamedgholami261 • 2 years ago
So that is what contrastive learning is all about!
@alfcnz • 2 years ago
It seems so 😀😀😀
@SnoSixtyTwo • 2 years ago
Thanks a whole bunch for this lecture; after watching it twice I think I'm starting to grasp it :) One thing that confuses me, though: in the very beginning it is mentioned that x may or may not be adapted when searching for the optimum location. I cannot quickly come up with an example where I would want that. Wouldn't that mean I'm just discarding the info in x, and, in the case of modeling with latent variables, my inference now becomes a function of z exclusively?
@alfcnz • 2 years ago
You need to write down the timestamp in minutes:seconds if you want me to be able to address any particular aspect of the video.
@SnoSixtyTwo • 2 years ago
@@alfcnz Thanks for taking the time to respond! Here we go, 15:20
@НиколайНовичков-е1э • 3 years ago
Thank you, Alfredo! :)
@alfcnz • 3 years ago
You're welcome 🥰🥰🥰
@mpalaourg8597 • 3 years ago
I tried to compute the derivative Yann mentioned (1:07:45), but I'm probably missing something, because my final result doesn't have the integral (only −P_w(·) ...). Is there any supplementary material with these calculations? Thanks again for your amazing and hard work!
@alfcnz • 3 years ago
Uh… can you share your calculations? I can have a look. Maybe post them in the Discord server, maths room, so that others may be able to help as well.
@mpalaourg8597 • 3 years ago
@@alfcnz It was my bad. I... misunderstood the formula for P_w(y|x) and thought there was an integral in the numerator (over all y's), but that didn't make any sense to me, so I checked your notes again and... voilà, I got the right answer. Is the Discord open to us too? I thought it was only for NYU students. I'll definitely join then (learning alone isn't fun :P).
@alfcnz • 3 years ago
Discord is for *non* NYU students. I have another communication system set up for them.
@hamidrezaheidarian8207 • 11 months ago
Hi Alfredo, which book on DL do you recommend that has the same sort of structure as the content of this course?
@alfcnz • 11 months ago
The one I’m writing 😇
@hamidrezaheidarian8207 • 11 months ago
@@alfcnz Great, I think it would be a great companion to these lectures, looking forward to it.
@ShihgianLee • 2 years ago
I spent some time deriving the step mentioned at 1:07:44. I made my best effort to get to the final result, but I'm not sure all my steps are correct, so I hope my fellow students can point out any mistakes. Due to the lack of LaTeX support in YouTube comments, I'll try to make the steps as clear as possible: I use the derivative of the log to get to the second step, then the Leibniz integral rule to move the partial derivative inside the integral in the third step. The rest is pretty straightforward, hopefully. Thank you!

∂/∂w (1/β) log[ ∫y′ exp(−βFw(x, y′)) ]
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∂/∂w ∫y′ exp(−βFw(x, y′))
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ ∂/∂w exp(−βFw(x, y′))
= (1/β) [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ exp(−βFw(x, y′)) ∂/∂w (−βFw(x, y′))
= − [1 / ∫y′ exp(−βFw(x, y′))] ∫y′ exp(−βFw(x, y′)) ∂/∂w Fw(x, y′)
= − ∫y′ [exp(−βFw(x, y′)) / ∫y″ exp(−βFw(x, y″))] ∂/∂w Fw(x, y′)
= − ∫y′ Pw(y′|x) ∂/∂w Fw(x, y′)
@hamedgholami261 • 2 years ago
Can you put up a link to a LaTeX file? I did the derivative and may be able to help.
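In the meantime, here is a quick numerical sanity check of the final identity above (a toy example: the energy F_w(x, y′) = (y′ − wx)² + wy′, the value of β, and the grid are arbitrary choices of mine, not from the course).

import numpy as np

# Checks: d/dw (1/β) log ∫ exp(−β F_w(x, y′)) dy′  ==  −∫ P_w(y′|x) ∂F_w/∂w dy′
beta, x, w = 2.0, 1.5, 0.7
y = np.linspace(-10.0, 10.0, 20001)          # dense grid standing in for ∫ dy′
dy = y[1] - y[0]

def F(w):
    return (y - w * x) ** 2 + w * y          # made-up energy on the grid

def log_partition(w):
    return np.log(np.sum(np.exp(-beta * F(w))) * dy) / beta

# Left side: finite-difference derivative of the free-energy term.
eps = 1e-5
lhs = (log_partition(w + eps) - log_partition(w - eps)) / (2 * eps)

# Right side: expectation of ∂F/∂w under the Gibbs density P_w(y′|x).
p = np.exp(-beta * F(w))
p /= p.sum() * dy
dF_dw = -2 * x * (y - w * x) + y
rhs = -np.sum(p * dF_dw) * dy

print(lhs, rhs)                              # both should come out ≈ -1.75 for these made-up numbers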
@cambridgebreaths3581 • 3 years ago
Perfect. Thank you so much :)
@alfcnz • 3 years ago
😇😇😇
@bmahlbrand • 3 years ago
How do you use autograd in PyTorch for "non-stochastic" gradient descent?
@shiftedabsurdity • 3 years ago
Probably conjugate gradient.
@alfcnz • 3 years ago
If the function I have is not approximate (not like the per-batch approximation of the dataset loss), then you're performing non-stochastic GD. The stochasticity comes from the approximation to the objective function.
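For instance, a minimal full-batch (non-stochastic) gradient descent loop with PyTorch autograd could look like this; the data and the linear model are just a toy sketch, not from the course.

import torch

# Full-batch GD: the loss is computed on the *entire* dataset at every step,
# so autograd returns the exact gradient rather than a mini-batch estimate.
torch.manual_seed(0)
X = torch.randn(100, 3)                                   # whole dataset kept in memory
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

w = torch.zeros(3, requires_grad=True)
for step in range(500):
    loss = ((X @ w - y) ** 2).mean()                      # full-dataset objective
    loss.backward()                                       # exact gradient of that objective
    with torch.no_grad():
        w -= 0.1 * w.grad                                 # plain, deterministic GD update
    w.grad.zero_()
print(w)                                                  # should approach [1, -2, 0.5]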
@vageta008 • 3 years ago
Interesting, energy-based models do something very similar to metric learning (or am I missing something?).
@alfcnz • 2 years ago
Indeed metric learning can be formulated as an energy model. I'd say energy models are like a large umbrella under which many conventional models can be recast.
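As a made-up illustration of that umbrella view, a siamese metric-learning setup can be read as an energy model: the energy of a pair is the distance between embeddings, and a contrastive loss pushes that energy down for similar pairs and up (toward a margin) for dissimilar ones. A sketch only; the encoder, shapes, and margin are arbitrary choices, not from the lecture.

import torch
from torch import nn

f = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))   # shared encoder

def energy(x1, x2):
    # Energy of a pair = squared distance between embeddings.
    return ((f(x1) - f(x2)) ** 2).sum(dim=1)

def contrastive_loss(x1, x2, same, margin=1.0):
    e = energy(x1, x2)
    # Pull energy down for similar pairs, push it up (below the margin) otherwise.
    return torch.where(same, e, torch.clamp(margin - e, min=0.0)).mean()

x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
same = torch.tensor([True, False, True, False])
loss = contrastive_loss(x1, x2, same)
loss.backward()   # gradients shape the energy landscape over pairs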
@arcman9436 • 3 years ago
Very interesting.
@alfcnz • 3 years ago
🧐🧐🧐
@pratik245 • 3 years ago
The French language seems better suited for music. It has a sweet tonality.