Thanks for posting these! With them, you reach a very wide audience and help anyone who does not have access to such teachers and universities! 👏
@alfcnz3 жыл бұрын
Yup, that's the plan! 😎😎😎
@Navhkrin2 жыл бұрын
Hello Ms Coffee beans
@AICoffeeBreak2 жыл бұрын
@@Navhkrin Hello! ☕
@makotokinoshita63373 жыл бұрын
You’re doing a massive favor to the community of people who want access to high-quality content without paying a huge amount of money. Thank you so much!
@alfcnz3 жыл бұрын
You're welcome 😇😇😇
@sutirthabiswas82733 жыл бұрын
Going from seeing Yann's name in a research paper during a literature survey in my internship program to attending his lectures is quite a thrill. Quite enriching and mathematically profound stuff here. Thanks for sharing it for free!
@alfcnz3 жыл бұрын
You're welcome 😊😊😊
@dr.mikeybee3 жыл бұрын
Wow! Yann is such a great teacher. I thought I knew this material fairly well, but Yann is enriching my understanding with every slide. It seems to me that his teaching method is extremely efficient. I suppose that's because he has such a deep understanding of the material.
@alfcnz3 жыл бұрын
🤓🤓🤓
@thanikhurshid74033 жыл бұрын
You are a great man. Thanks to you, someone even in a third-world country can learn DL from one of the inventors himself. THIS IS CRAZY!
@alfcnz3 жыл бұрын
😇😇😇
@dr.mikeybee3 жыл бұрын
At 1:05:40 Yann is explaining the two Jacobians, but I was having trouble getting the intuition. Then I realized that the first Jacobian gets the gradient used to modify the weights w[k+1] of function z[k+1], and the second Jacobian back-propagates the gradient to function z[k], which can then be used to calculate the gradient at k for yet another Jacobian to adjust the weights w[k]. So one Jacobian is for the parameters and the other is for the state, since both the parameter variable and the state variable are column vectors. Yann explains it really well. I'm amazed that I seem to be understanding this complicated mix of symbols and logic. Thank you.
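For anyone who wants to poke at this, here's a rough PyTorch sketch of what I mean (my own toy example, not from the slides): the weight gradient updates layer k+1, while the state gradient is what flows back to layer k.

import torch

zk = torch.randn(4, requires_grad=True)    # state coming out of layer k
W = torch.randn(3, 4, requires_grad=True)  # parameters of layer k+1
zk1 = W @ zk                               # z[k+1] = W[k+1] z[k]
c = zk1.sum()                              # stand-in for the cost
c.backward()
print(W.grad)   # dc/dW[k+1]: the "parameter" Jacobian path, used to update the weights
print(zk.grad)  # dc/dz[k]:   the "state" Jacobian path, back-propagated to layer k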
@alfcnz3 жыл бұрын
👍🏻👍🏻👍🏻
@johnhammer86683 жыл бұрын
Thanks very much for the content. What a time to be alive. To hear from the master himself.
@alfcnz3 жыл бұрын
💜💜💜
@WeAsBee2 жыл бұрын
The discussions on stochastic gradient descent (12:23) and on Adam (1:16:15) are great. They clear up a common misconception.
@alfcnz2 жыл бұрын
🥳🥳🥳
@neuroinformaticafbf53133 жыл бұрын
I just can't believe this content is free. Amazing! Long live open source! Thank you, Alfredo :)
@alfcnz3 жыл бұрын
❤️❤️❤️
@OpenAITutor4 ай бұрын
I can totally see how a quantum computer could be used to perform gradient descent in all directions simultaneously, helping to find the true global minimum across all valleys in one go! 😲 It's mind-blowing to think about the potential for quantum computing to revolutionize optimization problems like this!
@fuzzylogicq3 жыл бұрын
Man!! These are gold, especially for people who don't have access to these kinds of teachers and methods of teaching, plus the material, etc. (that's a lot of people, actually).
@alfcnz3 жыл бұрын
🤗🤗🤗
@НиколайНовичков-е1э3 жыл бұрын
I have watched this lecture twice in the last year. Mister LeCun is great! :)
@alfcnz3 жыл бұрын
Professor / doctor LeCun 😜
@mahdiamrollahi84563 жыл бұрын
It is my honor to learn from you and Sir...
@alfcnz3 жыл бұрын
Don't forget to subscribe to the channel and like the video to manifest your appreciation.
@mahdiamrollahi84563 жыл бұрын
@@alfcnz Ya, I did that 🤞
@alfcnz3 жыл бұрын
🥰🥰🥰
@dr.mikeybee3 жыл бұрын
Just FYI, at 1:01:00 Yann correctly says dc/dzg, but the diagram has dc/zg. Should that also be dc/dwg and dc/dwf?
@alexandrevalente99943 жыл бұрын
I really love that discussion about solving non-convex problems... finally we get out of the books! At least we unleash our minds.
@alfcnz3 жыл бұрын
🧠🧠🧠
@sobhanahmadianmoghadam92112 жыл бұрын
Hello. Isn't ds[0] * dc / ds[0] + ds[1] * dc / ds[1] + ds[2] * dc / ds[2] = 3dc instead of dc? (At time 41:00)
@mahdiamrollahi84562 жыл бұрын
Hello Alfredo, at 1:11:50, where do we have the loops in the gradient graph? Is there a prime example? Thanks
@alfcnz2 жыл бұрын
That would be a system that we don't know how to handle. Every other connection is permitted.
@mahdiamrollahi84562 жыл бұрын
@@alfcnz 🙏🌿
@dr.mikeybee3 жыл бұрын
How do you perturb the output and backprop? Earlier the derivative of the cost function was 1. (around 1:50:00)
@alfcnz3 жыл бұрын
I've listened to it and there's no mention of backprop at that timestamp.
@dr.mikeybee3 жыл бұрын
@@alfcnz Thank you Alfredo. I probably messed up. It's where Yann mentions Q-learning and Deep Mind. I imagine he will cover all this in a later lecture. Thank you for doing all this. Sorry for all the comments. I'm just enjoying this challenging material a lot. I just forked your repo, and I'm starting the first notebook. Cheers!
@dr.mikeybee3 жыл бұрын
I see what I did. I gave you the end of video timestamp. My bad. LOL!
@dr.mikeybee3 жыл бұрын
It's just after 1:27:00.
@gurdeeepsinghs3 жыл бұрын
Alfredo Canziani ... drinks are on me if you ever visit India ... this is extremely high quality content!
@alfcnz3 жыл бұрын
Thanks! I prefer food, though 😅 And yes, I'm planning to come over soon-ish.
@alexandrevalente99943 жыл бұрын
About the code in PyTorch (51:00 in the video)... the code instantiates the mynet class and stores the reference in the model variable, but nowhere does it call the "forward" method... so how does the out variable receive any output from the model object? Is there some PyTorch magic that is not explained here?
@alfcnz3 жыл бұрын
Yup. When you call an nn.Module, its forward method gets called, with some other bookkeeping happening before and after it.
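Something like this minimal sketch (the class and sizes are made up):

import torch
from torch import nn

class MyNet(nn.Module):                # hypothetical tiny model
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)
    def forward(self, x):
        return self.linear(x)

model = MyNet()
out = model(torch.randn(1, 10))        # calling the module runs hooks and then forward()
# i.e. roughly equivalent to model.forward(...) plus the extra bookkeeping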
@alexandrevalente99943 жыл бұрын
@@alfcnz Oh yes! My bad... I was distracted. Indeed, mynet inherits from nn.Module, and I suppose forward is the implementation of an abstract method.
@alfcnz3 жыл бұрын
Correct. 🙂🙂🙂
@lam_roger3 жыл бұрын
At the 40:41 section: is the purpose of using backpropagation to find the derivative of the cost function w.r.t. z, in order to find the best direction to "move"? I've only gotten through half of the lecture, so forgive me if this is answered later.
@alfcnz3 жыл бұрын
Say z = f(wᵀx). If you know ∂C/∂z, then you can compute ∂C/∂w = ∂C/∂z ∂f/∂w.
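You can verify it with autograd; a tiny sketch (the variable names are mine):

import torch

x = torch.randn(5)
w = torch.randn(5, requires_grad=True)
z = torch.tanh(w @ x)     # z = f(wᵀx), here f = tanh
C = (z - 1) ** 2          # some scalar cost
C.backward()
print(w.grad)             # ∂C/∂w, accumulated via the chain rule ∂C/∂z · ∂z/∂w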
@mpalaourg85973 жыл бұрын
Thank you so much, Alfredo, for organizing the material in such a nice and compact way for us! The insights of Yann and your examples, explanations and visualization are an awesome tool for anybody willing to learn (or to remember stuff) about deep learning. Greetings from Greece and I owe you a coffee, for your tireless effort. PS. Sorry for my bad English. I am not a native speaker.
@alfcnz3 жыл бұрын
I'm glad the content is of any help. Looking forward to getting that coffee in Greece. I've never visited… 🥺🥺🥺 Hopefully I'll fix that soon. 🥳🥳🥳
@mpalaourg85973 жыл бұрын
@@alfcnz Easy fix, I'll send a Pull Request in no time!
@alfcnz3 жыл бұрын
For coming to Greece? 🤔🤔🤔
@inertialdataholic92783 жыл бұрын
51:05 shouldn't it be self.m0(z0) as it takes in the flattened input?
@alfcnz3 жыл бұрын
Of course.
@alexandrevalente99943 жыл бұрын
Does the trick explained for normalizing training samples (01:20:00) also apply to convolutional neural networks?
@alfcnz3 жыл бұрын
Indeed.
@mdragon6580 Жыл бұрын
1:02:09 The "Einstein summation convention" is being used here. The student asking the question is not familiar with it, and Yann doesn't seem to realize that.
@alfcnz Жыл бұрын
It’s not. It’s just a vector matrix multiplication.
@mdragon6580 Жыл бұрын
@@alfcnz Ohhh I see. I was reading ∂c/∂z_f as "the f-th entry of the vector", but it actually denotes the entire vector. Similarly, I was reading ∂z_g/∂z_f as "the (g,f)-th entry of the Jacobian", whereas it actually denotes the entire Jacobian matrix. Sorry, I misread. Yann's notation for the (i,j)-th entry of the Jacobian matrix is given in the last line of the same slide. Thank you so much, Alfredo, for your quick reply above! And thank you so much for putting these videos on YouTube for everyone!
@mataharyszary3 жыл бұрын
This intimate atmosphere allows for a better understanding of the subject matter. Great questions 【ツ】 and of course great answers. Thank you
@alfcnz3 жыл бұрын
You're welcome 😁😁😁
@monanasery19926 ай бұрын
Thank you so much for sharing this 🥰 This was the best video for learning gradient descent and backpropagation.
@alfcnz6 ай бұрын
🥳🥳🥳
@dr.mikeybee3 жыл бұрын
I thought that Haar-like features were not that recognizable. (1:48:00)
@mahdiamrollahi84562 жыл бұрын
So, how can we tell that there is at least some pattern in our distribution, so that some model can find it? Suppose we are trying to predict the MD5 hash of a string. In that case we may know ourselves that there is no pattern to learn, but how can we tell for any other problem? Thanks
@adarshraj67213 жыл бұрын
Love from India, sir. I really like the discussion and doubt-clearing parts. Hope to join NYU for my MS in 2023. :)
@alfcnz3 жыл бұрын
💜💜💜
@dr.mikeybee3 жыл бұрын
I don't know if this helps anyone, but it might. Weighted sums like s[0] are always to the first power. There are no squared or cubed weighted sums. So, by the power rule, the derivative of nx to the first power is simply n. The derivative of ws[0] is always the weight w. That's why the application of the chain rule is so simple. Here's some more help. If y = 2x, y' = 2. If q = 3y, q' = 3; so the derivative of the composition is 2 * 3. Picture the graph of the composed function: what is its slope? It's 6. And however many layers you add in a neural net, the partial slopes will be products of the weights.
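If you want to see the 2 * 3 = 6 show up numerically, here's a quick autograd check (just my own toy code):

import torch

x = torch.tensor(1.0, requires_grad=True)
y = 2 * x      # y' = 2
q = 3 * y      # q' = 3, so dq/dx = 2 * 3 = 6 by the chain rule
q.backward()
print(x.grad)  # tensor(6.)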
@alfcnz3 жыл бұрын
Things get a little more fussy when moving away from the 1D case, though. 😬😬😬
@mahdiamrollahi84562 жыл бұрын
Following the contours, there are infinitely many values of w that give the same loss. So do we get the same predictions for all of these parameter settings whose losses are equal?
@mahdiamrollahi84562 жыл бұрын
22:27 how do we ensure convexity...
@alfcnz2 жыл бұрын
We don't.
@mahdiamrollahi84562 жыл бұрын
@@alfcnz Yes 😅, I just wanted to point to the question you asked, sir, which he answered at that timestamp. Thanks 🙏
@alexandrevalente99943 жыл бұрын
About the notebooks... are there corrections? Or can we send them to you? Thanks
@alfcnz3 жыл бұрын
What notebooks would you want to send to me? 😮😮😮
@pranabsarma183 жыл бұрын
Great videos. May I know what the L stands for in the video titles, e.g. 01L?
@alfcnz3 жыл бұрын
Lecture.
@pranabsarma183 жыл бұрын
@@alfcnz What about the videos which do not have an L? It might sound silly but I am so confused 😂
@alfcnz3 жыл бұрын
Those are my sessions, the practica. So they should have a P, if I wanted to be super precise.
@alfcnz3 жыл бұрын
At the beginning there were only my videos. Yann's videos were not initially going to come online. It's too much work…
@pranabsarma183 жыл бұрын
Thank you Alfredo. ☺️🤗
@balamanikandanjayaprakash63783 жыл бұрын
Hi Alfredo, at 20:58 Yann mentions that the "objective function needs to be continuous and differentiable almost everywhere". What does he mean? Isn't a differentiable function always continuous? Also, is there a function that is differentiable only in some parts? Can someone give me an example from deep learning? Please help me out. And thanks for these amazing videos!!!
@aniketthomas63873 жыл бұрын
I think he meant that the function has to be continuous everywhere, but not necessarily differentiable everywhere; it should be differentiable "almost" everywhere, as with the ReLU function max(0, x): it is non-differentiable at x = 0, but it is differentiable elsewhere and continuous everywhere. So if the function is differentiable everywhere, that is awesome, but it is not a necessary condition. The point is that it should be continuous so that we can estimate the gradients and there is no break in the function. If there is a break somewhere in your objective function, you can't estimate the gradient and your network has no way of knowing what to do. If I am wrong, please do correct me.
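A quick check with ReLU (my own snippet; I believe autograd simply returns 0 for the gradient exactly at x = 0, i.e. it picks one valid subgradient at the kink):

import torch

x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
y = torch.relu(x).sum()
y.backward()
print(x.grad)  # expected: tensor([0., 0., 1.])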
@alfcnz3 жыл бұрын
Yup, Aniket's correct.
@balamanikandanjayaprakash63783 жыл бұрын
Hey, Thanks for the explanation !!
@hyphenpointhyphen5 ай бұрын
Why can't we use counters for the loops in neural nets? Wouldn't a loop make the network more robust, in the sense of stabilizing the output?
@alfcnz5 ай бұрын
You need to add a timestamp if you’re expecting an answer to a specific part of the video. Otherwise it’s impossible for me to understand what you’re talking about.
@hyphenpointhyphen5 ай бұрын
@@alfcnz Sorry, around 34:39 - thanks for replying
@geekyrahuliitm2 жыл бұрын
@Alfredo, this content is amazing. I have two questions, though; it would be great if you could help me with them. Does this mean that in SGD we compute a weight-update step for each sample (picked randomly)? And if we perform the update on every sample individually, how does that affect the training time? Does it increase or decrease compared to batch GD?
@maxim_ml2 жыл бұрын
SGD _is_ mini-batch GD
@jobiquirobi1233 жыл бұрын
Great content! It’s just great to have this quality information available
@alfcnz3 жыл бұрын
You're welcome 🐱🐱🐱
@mahdiamrollahi84563 жыл бұрын
How do libraries like PyTorch or TensorFlow calculate the derivative of a function? Do they compute the limit (f(x+dx) − f(x))/dx numerically, or do they have pre-defined derivatives?
@alfcnz3 жыл бұрын
Each function f comes with its analytical derivative f'. Forward calls f, while backward calls f'.
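You can see the pattern by writing your own differentiable function; a sketch using the standard torch.autograd.Function API:

import torch

class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2                 # f(x)
    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x    # f'(x): analytical, no finite differences

x = torch.tensor(3.0, requires_grad=True)
Square.apply(x).backward()
print(x.grad)                         # tensor(6.)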
@mahdiamrollahi84563 жыл бұрын
@@alfcnz Actually I asked that before I watched it at 54:30, Regards 🤞
@alfcnz3 жыл бұрын
If you remove ' and ", that becomes a link. 🔗🔗🔗
@mahdiamrollahi84563 жыл бұрын
@@alfcnz Cool !
@dr.mikeybee3 жыл бұрын
Just to clarify, the first code you show defines a model's graph, but it is untrained; so it can't be used yet for inference.
@alfcnz3 жыл бұрын
You need to tell me minutes:seconds, or I have no clue what you're asking about.
@dr.mikeybee3 жыл бұрын
50:00
@isurucumaranathunga3 жыл бұрын
Thank you so much for this valuable content. This teaching method is extremely amazing.
@alfcnz3 жыл бұрын
Yay! I'm glad you fancy it! 😊😊😊
@ayushimittal64963 жыл бұрын
Hi Alfredo! Thank you so much for posting these lectures here! I wanted to know if there's any textbook for this course that I could refer to, along with following the lectures. Thanks :)
@alfcnz3 жыл бұрын
Yes, I'm writing it. Hopefully a draft will be available by December. 🤓🤓🤓
@juliusolaifa51113 жыл бұрын
@@alfcnz The eagernessssssssssss
@alfcnz3 жыл бұрын
Not sure anything will come out _this_ December, though…
@juliusolaifa51113 жыл бұрын
I'm hanging in there for whenever it does come out. Alfredo, may I email you about the possibility of PhD supervision?
@alfcnz3 жыл бұрын
Uh… are you an NYU student?
@bhaswarbasu2288 Жыл бұрын
Where can I get the slides?
@alfcnz Жыл бұрын
The course website. 😇
@alexandrevalente99943 жыл бұрын
Is this the paper to read in order to better understand backprop (the way it is explained in this video)? Or should we read some other work from Yann?
@alfcnz3 жыл бұрын
What paper? You need to point out minutes:seconds if you want me to address a specific question regarding the video.
@alexandrevalente99943 жыл бұрын
@@alfcnz I forgot to paste the link… I'll do it later. It's from 1988… I will review the link.
@xXxBladeStormxXx3 жыл бұрын
Can you please link the reinforcement learning course Yann mentioned? Or at least the name of the instructor; I couldn't fully catch it.
@alfcnz3 жыл бұрын
Without telling me minute:second I have no clue what you're talking about.
@xXxBladeStormxXx3 жыл бұрын
@@alfcnz Oh right! sorry. It's at 1:18
@xXxBladeStormxXx3 жыл бұрын
@@alfcnz Actually, after re-listening to it, it sounded a lot clearer. It's the NYU reinforcement learning course by Lerrel Pinto.
@alfcnz3 жыл бұрын
Yes, that's correct. 😇😇😇
@alexandrevalente99943 жыл бұрын
One of my questions was overlooked... "What is the difference between lesson x and lesson xL?" So what is the difference between 01 and 01L, for example?
@alfcnz3 жыл бұрын
Lecture and practica. This used to be a playlist of only practica. Then it turned into a full course.
@mohammedelfatih80182 жыл бұрын
How can I press the like button more than once?
@matthewevanusa88533 жыл бұрын
One reason I agree it's better not to call a unit a "neuron" is the growing acceptance that single neurons in the brain are capable of complex computation via dendritic compartment computation
@alfcnz3 жыл бұрын
If this is a question or note about the content, you need to add minutes:seconds, or I have no clue what you're referring at.
@matthewevanusa88533 жыл бұрын
@@alfcnz Ah, sorry. It was just to add on: at ~31:25 Prof. LeCun explains why people don't like to refer to the units as 'neurons' per se.
@alfcnz3 жыл бұрын
Cool! 😇😇😇
@andylee82832 жыл бұрын
Thank you for sharing; these help me more and more.
@alfcnz2 жыл бұрын
🤓🤓🤓
@AIwithAniket3 жыл бұрын
I didn't get "if batch-size >> num_classes then we are wasting computation". Could someone explain?
@alfcnz3 жыл бұрын
You need to add minutes:seconds, or I cannot figure out what you're talking about.
@AIwithAniket3 жыл бұрын
@@alfcnz wow I didn't expect a reply this soon 💜. My question was from 30:17
@alfcnz3 жыл бұрын
Let's say you have exactly 10 images, one per digit. Now clone them 6k times, so you have a data set of 60k samples (the same size as MNIST). Now, if your batch is anything larger than 10, say 20 (you pick two images per digit), you're computing the same gradient twice for no good reason. Now take the real MNIST. It is certainly not as bad as the toy data set described above, but most images of a given digit look very similar (hopefully so, otherwise it would be impossible to recognise them)! So you're in a very, very similar situation.
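Here's a tiny sketch of the extreme case (toy numbers of my own): duplicating samples in a mini-batch leaves the averaged gradient unchanged, it only costs more compute.

import torch

w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor([2.0])                 # one unique sample
((w * x) ** 2).mean().backward()
g_unique = w.grad.clone()
w.grad.zero_()

x_cloned = x.repeat(10)                 # the same sample cloned ten times
((w * x_cloned) ** 2).mean().backward()
print(g_unique, w.grad)                 # identical gradients, ten times the work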
@AIwithAniket3 жыл бұрын
@@alfcnz oh got it. Thanks for explaining with the intuitive example 🙏
@alfcnz3 жыл бұрын
That's Yann's 😅😅😅
@alexandrevalente99943 жыл бұрын
What is the difference between lesson x and lesson xL ?
@alfcnz3 жыл бұрын
L stands for lecture. Initially I was going to publish only my sessions. Then I added Yann's.
@DaHrakl2 жыл бұрын
20:03 Doesn't he contradict himself? First he mentions that smaller batches are better (I assume that by "better" he meant model quality) in most cases, and a few seconds later he says that it's just a hardware matter.
@alfcnz2 жыл бұрын
We use mini-batches because we use GPUs or other accelerators. Learning-wise, we would prefer purely stochastic gradient descent (batch size of 1).
@aymensekhri21333 жыл бұрын
Thank you very much
@alfcnz3 жыл бұрын
You're very welcome 🐱🐱🐱
@wangyeelinpamela Жыл бұрын
This might be the Lerrel Pinto course he references at 1:26: kzbin.info/www/bejne/qXzUq2yKlKuSe7c
@chrcheel2 жыл бұрын
This is wonderful. Thank you ❤️
@alfcnz2 жыл бұрын
😃😃😃
@wangvince58576 ай бұрын
I noticed Yann's bored face when he tried to explain the chain rule at 38:43 lol
@alfcnz6 ай бұрын
🤣🤣🤣
@advaitathreya55582 жыл бұрын
12:55
@alfcnz2 жыл бұрын
?
@advaitathreya55582 жыл бұрын
@@alfcnz A timestamp for myself to visit later :)
@alfcnz2 жыл бұрын
🤣🤣🤣
@alexandrevalente9994 Жыл бұрын
Two years later... in the video, at 54:30... x is still not fixed... it must be s0... 🤣🤣🤣🤣🤣🤣🤣🤣🤣 Just a joke ;-)
@alfcnz Жыл бұрын
😭😭😭
@alexandrevalente9994 Жыл бұрын
@@alfcnz Hahaaaaa... Computer experts... What can you do? That's how we are 😂😂😂😂😂😂😂