Really appreciate your generosity in sharing these courses online. We live in the best era.
@alfcnz (3 years ago)
😇😇😇
@hasanrants (29 days ago)
Alfredo, thank you very much for this concrete explanation. Yann's lectures are full of dense knowledge, and this practicum filled some of the doubts and gaps I had. Appreciated.
@alfcnz (28 days ago)
🥳🥳🥳
@donthulasumanth5415 (13 days ago)
@23:59 The number of trainable params in the decoder is 2: one for w1 cos(z) and one for w2 sin(z), respectively.
@alfcnz (11 days ago)
Precisely 🙂
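That parameter count can be checked with a quick sketch (a hypothetical pure-Python stand-in for the lecture's toy decoder, assuming dec(z) = (w1·cos z, w2·sin z); not the actual course code):

```python
import math

# Toy decoder from the lecture: dec(z) = (w1·cos z, w2·sin z).
# Its only trainable parameters are the two scalars w1 and w2.
class Decoder:
    def __init__(self, w1=1.0, w2=1.0):
        self.params = {"w1": w1, "w2": w2}  # the 2 trainable parameters

    def __call__(self, z):
        return (self.params["w1"] * math.cos(z),
                self.params["w2"] * math.sin(z))

dec = Decoder()
print(len(dec.params))  # → 2
print(dec(0.0))         # → (1.0, 1.0 * sin 0) = (1.0, 0.0)
```

The latent z itself is not a parameter: it is inferred per sample, not learned.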
@punkbuster89 (3 years ago)
"we gonna actually see next week how do we learn shi... eeerm stuff...." Cracked me up XD thanks for the amazing course BTW, really enjoying it!
@alfcnz (3 years ago)
🥳🥳🥳
@bibhabasumohapatra (1 year ago)
I have watched this video 5-6 times over the last year. But now, when I finally understood 50% of it, I was like: wow. Insane… Amazing.
@alfcnz (1 year ago)
Check out the 2023 edition! It’s much better! 🥳🥳🥳
@abdulmajidmurad4667 (3 years ago)
Thanks, Alfredo (pretty cool plots, btw).
@alfcnz (3 years ago)
😍😍😍
@soumyasarkar4100 (3 years ago)
Wow, what cool visualisations!!
@alfcnz (3 years ago)
🎨🖌️👨🏼🎨
@kseniiaikonnikova576 (3 years ago)
yay, new video! 😍 Alfredo, many thanks!
@alfcnz (3 years ago)
🥳🥳🥳
@НиколайНовичков-е1э (3 years ago)
Thank you, Alfredo!
@alfcnz (3 years ago)
Welcome, Nikolay! 😊😊😊 (originally in Bulgarian)
@НиколайНовичков-е1э (3 years ago)
It's in another language :))))
@alfcnz (3 years ago)
Oops, do you speak Russian?
@НиколайНовичков-е1э (3 years ago)
Yeah, it's my native language. I'm from Russia :))
@alfcnz (3 years ago)
@@НиколайНовичков-е1э Haha, okay, I got the language wrong earlier 😅😅😅
@rsilveira79 (3 years ago)
Awesome material, really looking forward to the next classes. Thanks for all the effort you put into designing these classes.
@alfcnz (3 years ago)
You're welcome 😁
@bibhabasumohapatra (1 year ago)
Basically, in layman's terms, we are choosing the best y_pred out of n candidate y_preds for each ground-truth y. Right?
@alfcnz (6 months ago)
Yes, that’s correct! And we need to do so because otherwise we would be learning to predict the average target y.
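That "keep the best candidate, not the average" idea can be sketched in a few lines (a toy pure-Python illustration; the unit-circle decoder and the 48-candidate grid are assumptions matching the lecture's example, not the actual course code):

```python
import math

def decoder(z, w1=1.0, w2=1.0):
    # toy decoder from the lecture: z ↦ (w1·cos z, w2·sin z)
    return (w1 * math.cos(z), w2 * math.sin(z))

def energy(y, y_hat):
    # squared Euclidean reconstruction error
    return (y[0] - y_hat[0])**2 + (y[1] - y_hat[1])**2

def free_energy(y, n_candidates=48):
    # F(y) = min over z of E(y, z): keep the single best prediction
    # for this y, instead of averaging all candidates together
    zs = [2 * math.pi * k / n_candidates for k in range(n_candidates)]
    return min(energy(y, decoder(z)) for z in zs)

y = (1.0, 0.0)         # a point exactly on the manifold
print(free_energy(y))  # → 0.0: some candidate z reconstructs y perfectly
```

Minimizing over z is what prevents the collapse to the average target: each y is matched against its own best ŷ.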
@pastrop2003 (3 years ago)
Thank you, Alfredo, great video. I have been reading about energy models for a few weeks already and still have a nagging question: is the energy function a generalized loss function? I keep thinking that I can reframe any traditional neural network loss as an energy function. What am I missing here?
@alfcnz (3 years ago)
Next episode I'll talk about the loss. Stay tuned.
@robinranabhat3125 (1 year ago)
THANK YOU !
@alfcnz (1 year ago)
You're welcome! 🥰🥰🥰
@chetanpandey8722 (1 year ago)
Thank you for making such amazing videos. I have a doubt: whenever you talk about optimizing the energy function to find the minimum, you say we should use gradient descent and not stochastic gradient descent. In my understanding, in gradient descent we compute the gradients over the whole dataset and then make an update, while in the stochastic case we take random data points to compute the gradient and then update. So I cannot understand what the problem with stochastic gradient descent is.
@alfcnz (10 months ago)
The energy is a scalar value for a given input x, y, z. You want to minimise this energy, for example, wrt z. There’s nothing stochastic here. When training a model, we minimise the loss by following a noisy gradient computed for a given per-sample (or per-batch) loss.
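A minimal illustration of that deterministic inference step, assuming the toy unit-circle decoder from the lecture and a hand-derived gradient (all names hypothetical):

```python
import math

def energy(y, z):
    # E(y, z) = ||y - dec(z)||^2 with dec(z) = (cos z, sin z)
    return (y[0] - math.cos(z))**2 + (y[1] - math.sin(z))**2

def dE_dz(y, z):
    # analytic derivative of the energy wrt the latent z
    return 2 * (y[0] * math.sin(z) - y[1] * math.cos(z))

def infer_z(y, z=0.5, lr=0.1, steps=200):
    # plain (deterministic) gradient descent on z: the gradient is the
    # exact gradient of E at each step — nothing is sampled, so there
    # is no "stochastic" variant of this inner minimization
    for _ in range(steps):
        z -= lr * dE_dz(y, z)
    return z

y = (0.0, 1.0)                      # target at angle pi/2 on the circle
z_star = infer_z(y)
print(round(energy(y, z_star), 6))  # → 0.0 (z converges near pi/2)
```

The stochasticity in SGD comes from subsampling the dataset when training the weights; here there is a single fixed sample, so there is nothing to subsample.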
@mikhaeldito (3 years ago)
Office hours, please!
@alfcnz (3 years ago)
Okay, okay.
@francistembo650 (3 years ago)
First comment. Couldn't resist my favourite teacher's video.
@alfcnz (3 years ago)
❤️❤️❤️
@blackcurrant0745 (2 years ago)
At 17:56 you say there are 48 different z's, then later at 29:43 there are only 24 of them, and then at 42:45 one can count 48 lilac z points in the graph. What's the reason for changing the number of z points back and forth?
@alfcnz (2 years ago)
Good catch! z is continuous; the number of distinct values I pick is arbitrary. In the following edition of these slides there are no distinct z points any more: they are shown as a continuous line, so there are infinitely many z. Why 24 and 48? The 24 are my y: I used to generate them with 24 equally spaced z. When I show the 'continuous' manifold, I should show more points than training samples, so I doubled them, hence 48. It looks like I didn't use the doubled version for the plot with the 24 squares. In the following edition of this lesson (not online, because only minor changes have been made and these videos take me forever to put together) and in the book (which replaces my video-editing time) there are no more discrete dots for the latents.
@WolfgangWaltenberger (3 years ago)
These are super cool, pedagogical videos. I wonder what software stack you guys are using to produce them.
@alfcnz (3 years ago)
Hum, PowerPoint, LaTeXiT, matplotlib, Zoom, Adobe After Effects and Premiere.
@sutharsanmahendren1071 (3 years ago)
Thank you for your great explanation and for making your course material available to all. I have a small doubt at 45:25, where you compute the energy for all inferences from the z samples: is Euclidean distance the right way to compute the distance from the reference point y to all the points ŷ on the manifold? Would it be more appropriate if points from the bottom half of the manifold resulted in more energy than the first half?
@alfcnz (3 years ago)
E is a function of y and z. Given a y, say y', E is a function of z only. What I'm computing there is E(y', z) for a few values of z. In this example, for every z the decoder will give me ỹ. Finally, the energy function of choice, in this case, is the reconstruction error.
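A small sketch of that view: for a fixed observation y′, E(y′, z) becomes a function of z alone, evaluated at a few values of z (toy unit-circle decoder; an assumption for illustration, not the course code):

```python
import math

def decoder(z):
    # toy decoder: z ↦ (cos z, sin z), a point on the unit circle
    return (math.cos(z), math.sin(z))

def energy(y, z):
    # the energy function of choice here: squared reconstruction
    # error ||y - dec(z)||^2
    yh = decoder(z)
    return (y[0] - yh[0])**2 + (y[1] - yh[1])**2

y_prime = (0.0, 2.0)  # fixed y'; E(y', z) is now a function of z only
zs = [2 * math.pi * k / 48 for k in range(48)]
profile = {z: energy(y_prime, z) for z in zs}
z_best = min(profile, key=profile.get)
print(z_best)  # the z whose decoding lands closest to y' (near pi/2 here)
```

Scanning the profile over z shows the energy landscape for that one observation; its minimum marks the closest point of the manifold.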
@sutharsanmahendren1071 (3 years ago)
@@alfcnz Thank you so much for your reply. I understand that reconstruction error is one of the choices for energy function here.
@alfcnz (3 years ago)
@@sutharsanmahendren1071 okay, so your question is… not yet answered? Or did I nail it above?
@sutharsanmahendren1071 (3 years ago)
@@alfcnz Actually, my question is: is reconstruction error the best choice for an EBM? (Funny ideas: construct a KNN graph with the ŷ manifold and the observation y and find the shortest path from y to all other ŷ; or, instead of computing the energy between two points, can't we measure the energy between the two distributions formed by y and ŷ in an EBM?)
@keshavsingh489 (3 years ago)
Great explanation. Just one question: why is it called an energy function, when it looks just like a loss function with a latent variable?
@alfcnz (3 years ago)
A “loss” measures the network performance and it's minimised during training. We'll see more about this in the next episode. An “energy” is an actual output produced by a model and it's used during inference. In this episode we didn't train anything, still we've used gradient descent to perform inference of latent variables.
@keshavsingh489 (3 years ago)
Thank you so much for explaining. Looking forward to the next lecture.
@flaskapp9885 (3 years ago)
Amazing video, Alfredo :)
@alfcnz (3 years ago)
And more to come! 😇😇😇
@flaskapp9885 (3 years ago)
@@alfcnz Thanks! Please make a guide video on becoming an NLP engineer or something :) There's nothing like that for NLP engineers on the internet :)
@alfcnz (3 years ago)
@@flaskapp9885 NLP engineering? 🤔🤔🤔 What is it?
@flaskapp9885 (3 years ago)
@@alfcnz Yes sir, NLP engineering. I'm thinking of doing that. :)
@alfcnz (3 years ago)
@@flaskapp9885 I don't know what that is. Can you explain?
@datonefaridze1503 (2 years ago)
You explain like Andrew Ng; giving examples is essential for proper understanding. Thank you so much, great content.
@alfcnz (2 years ago)
❤️❤️❤️
@mythorganizer4222 (3 years ago)
Hello Mr. Canziani!
@alfcnz (3 years ago)
“Prof” Canziani 😜
@mythorganizer4222 (3 years ago)
@@alfcnz I am sorry, Professor Canziani. I want to tell you: your videos are the best learning resource for people who want to study deep learning but can't afford it. Oh, your videos and also Deep Learning by Ian Goodfellow; it is a very good book. Thank you for all the effort you put in, sir :D
@alfcnz (3 years ago)
😇😇😇
@anondoggo (2 years ago)
So inference means we're given x and y and we want to predict an energy score E(x, y)? I thought the LV-EBM was supposed to produce predictions for y; better go back to the slides :/
@anondoggo (2 years ago)
OK, so I think what's going on is: during training, y is the target for which we should output a low E; during inference, y is an input, and we're choosing the y that gives the lowest energy. Mind is blown :/
@alfcnz (2 years ago)
I think your sentence is broken. «during inference…» we'd like to test how far a given y is from the data manifold.