01L - Gradient descent and the backpropagation algorithm

58,967 views

Alfredo Canziani (冷在)

1 day ago

Comments: 157
@AICoffeeBreak 3 years ago
Thanks for posting these! With this, you reach a very wide audience and help anyone who does not have access to such teachers and universities! 👏
@alfcnz 3 years ago
Yup, that's the plan! 😎😎😎
@Navhkrin 2 years ago
Hello Ms Coffee beans
@AICoffeeBreak 2 years ago
@@Navhkrin Hello! ☕
@makotokinoshita6337 3 years ago
You're doing a massive favor for the community that wants access to high-quality content without paying a huge amount of money. Thank you so much!
@alfcnz 3 years ago
You're welcome 😇😇😇
@sutirthabiswas8273 3 years ago
Going from seeing Yann's name in a research paper during a literature survey in my internship program to attending his lectures is quite a thrill. Quite enriching and mathematically profound stuff here. Thanks for sharing it for free!
@alfcnz 3 years ago
You're welcome 😊😊😊
@dr.mikeybee 3 years ago
Wow! Yann is such a great teacher. I thought I knew this material fairly well, but Yann is enriching my understanding with every slide. It seems to me that his teaching method is extremely efficient. I suppose that's because he has such a deep understanding of the material.
@alfcnz 3 years ago
🤓🤓🤓
@thanikhurshid7403 3 years ago
You are a great man. Thanks to you someone even in a third world country can learn DL from one of the inventors himself. THIS IS CRAZY!
@alfcnz 3 years ago
😇😇😇
@OpenAITutor 4 months ago
I can totally see how a quantum computer could be used to perform gradient descent in all directions simultaneously, helping to find the true global minimum across all valleys in one go! 😲 It's mind-blowing to think about the potential for quantum computing to revolutionize optimization problems like this!
@johnhammer8668 3 years ago
Thanks very much for the content. What a time to be alive. To hear from the master himself.
@alfcnz 3 years ago
💜💜💜
@WeAsBee 2 years ago
The discussions on stochastic gradient descent (12:23) and on Adam (1:16:15) are great. They address a general misconception.
@alfcnz 2 years ago
🥳🥳🥳
@fuzzylogicq 3 years ago
Man!! These are gold, especially for people who don't have access to these types of teachers and methods of teaching, plus the material etc. (that's a lot of people, actually).
@alfcnz 3 years ago
🤗🤗🤗
@neuroinformaticafbf5313 3 years ago
I just can't believe this content is free. Amazing! Long life to Open Source! Grazie Alfredo :)
@alfcnz 3 years ago
❤️❤️❤️
@dr.mikeybee 3 years ago
At 1:05:40 Yann is explaining the two Jacobians, but I was having trouble getting the intuition. Then I realized that the first Jacobian gets the gradient to modify the weights w[k+1] for function z[k+1], and the second Jacobian backpropagates the gradient to function z[k], which can then be used to calculate the gradient at k for yet another Jacobian to adjust weights w[k]. So one Jacobian is for the parameters and the other is for the state, since both the parameter variable and the state variable are column vectors. Yann explains it really well. I'm amazed that I seem to be understanding this complicated mix of symbols and logic. Thank you.
@alfcnz 3 years ago
👍🏻👍🏻👍🏻
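
A minimal PyTorch sketch of the two Jacobian-vector products described in the comment above (variable names are illustrative, not taken from the slides): one product sends the gradient into the parameters of the layer, the other sends it back to the state so the layer below can do the same.

```python
import torch

torch.manual_seed(0)
z_k = torch.randn(4, requires_grad=True)   # state entering the layer, z[k]
w = torch.randn(3, 4, requires_grad=True)  # parameters of the layer, w[k+1]
z_k1 = torch.tanh(w @ z_k)                 # z[k+1] = f(w[k+1] z[k])
c = z_k1.sum()                             # some scalar cost downstream

# incoming gradient from above: ∂c/∂z[k+1]
dc_dz_k1 = torch.autograd.grad(c, z_k1, retain_graph=True)[0]

# Jacobian w.r.t. the parameters: ∂c/∂w = ∂c/∂z[k+1] · ∂z[k+1]/∂w
dc_dw = torch.autograd.grad(z_k1, w, grad_outputs=dc_dz_k1, retain_graph=True)[0]

# Jacobian w.r.t. the state: ∂c/∂z[k] = ∂c/∂z[k+1] · ∂z[k+1]/∂z[k]
dc_dz_k = torch.autograd.grad(z_k1, z_k, grad_outputs=dc_dz_k1)[0]

print(dc_dw.shape, dc_dz_k.shape)  # torch.Size([3, 4]) torch.Size([4])
```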
@monanasery1992 6 months ago
Thank you so much for sharing this 🥰 This was the best video for learning gradient descent and backpropagation.
@alfcnz 6 months ago
🥳🥳🥳
@mahdiamrollahi8456 3 years ago
It is my honor to learn from you and from the professor...
@alfcnz 3 years ago
Don't forget to subscribe to the channel and like the video to manifest your appreciation.
@mahdiamrollahi8456 3 years ago
@@alfcnz Ya, I did that 🤞
@alfcnz 3 years ago
🥰🥰🥰
@НиколайНовичков-е1э 3 years ago
I have watched this lecture twice in the last year. Mister LeCun is great! :)
@alfcnz 3 years ago
Professor / doctor LeCun 😜
@gurdeeepsinghs 3 years ago
Alfredo Canziani ... drinks are on me if you ever visit India ... this is extremely high quality content!
@alfcnz 3 years ago
Thanks! I prefer food, though 😅 And yes, I'm planning to come over soon-ish.
@alexandrevalente9994 3 years ago
I really love that discussion about solving non-convex problems... finally we get out of the books! At least we unleash our minds.
@alfcnz 3 years ago
🧠🧠🧠
@mdragon6580 1 year ago
1:02:09 The "Einstein summation convention" is being used here. The student asking the question is not familiar with this convention, and Yann doesn't seem to realize that the student is unfamiliar with this convention
@alfcnz 1 year ago
It's not. It's just a vector-matrix multiplication.
@mdragon6580 1 year ago
@@alfcnz Ohhh I see. I was reading ∂c/∂z_f as "the f-th entry of the vector", but it actually denotes the entire vector. Similarly, I was reading ∂z_g/∂z_f as "the (g,f)-th entry of the Jacobian", whereas it actually denotes the entire Jacobian matrix. Sorry, I misread. Yann's notation for the (i,j)-th entry of the Jacobian matrix is given in the last line of the same slide. Thank you so much Alfredo for your quick reply above! And thank you so so much for putting these videos on YouTube for everyone!
@mataharyszary 3 years ago
This intimate atmosphere allows for a better understanding of the subject matter. Great questions 【ツ】 and of course great answers. Thank you
@alfcnz 3 years ago
You're welcome 😁😁😁
@mpalaourg8597 3 years ago
Thank you so much, Alfredo, for organizing the material in such a nice and compact way for us! The insights of Yann and your examples, explanations and visualization are an awesome tool for anybody willing to learn (or to remember stuff) about deep learning. Greetings from Greece and I owe you a coffee, for your tireless effort. PS. Sorry for my bad English. I am not a native speaker.
@alfcnz 3 years ago
I'm glad the content is of any help. Looking forward to getting that coffee in Greece. I've never visited… 🥺🥺🥺 Hopefully I'll fix that soon. 🥳🥳🥳
@mpalaourg8597 3 years ago
@@alfcnz Easy fix, I'll send a Pull Request in no time!
@alfcnz 3 years ago
For coming to Greece? 🤔🤔🤔
@dr.mikeybee 3 years ago
I don't know if this helps anyone, but it might. Weighted sums like s[0] are always to the first power. There are no squared or cubed weighted sums. So, by the power rule, the derivative of nx to the first power is equal to n. The derivative of ws[0] is always the weight w. That's why the application of the chain rule is so simple. Here's some more help. If y=2x, y'=2. If q=3y, q'=3; so y(q(x))' = 2 * 3. Picture the graph of y(q(x)). What is the slope? It's 6. And however many layers you add in a neural net, the partial slopes will be multiples of the weights.
@alfcnz 3 years ago
Things get a little more fussy when moving away from the 1D case, though. 😬😬😬
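
A tiny autograd check of the 1D chain-rule example above (an illustrative sketch, not lecture code): with y = 2x and q = 3y, the slope of the composition is 2 · 3 = 6.

```python
import torch

x = torch.tensor(5.0, requires_grad=True)
y = 2 * x      # inner function, slope 2
q = 3 * y      # outer function, slope 3
q.backward()   # the chain rule multiplies the per-layer slopes
print(x.grad)  # tensor(6.)
```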
@mahdiamrollahi8456 2 years ago
Hello Alfredo, at 1:11:50, where do we have the loops in gradient graph? Is there any prime example? Thanks
@alfcnz 2 years ago
That would be a system that we don't know how to handle. Every other connection is permitted.
@mahdiamrollahi8456 2 years ago
@@alfcnz 🙏🌿
@dr.mikeybee 3 years ago
How do you perturb the output and backprop? Earlier the derivative of the cost function was 1. (around 1:50:00)
@alfcnz 3 years ago
I've listened to it and there's no mention of backprop at that timestamp.
@dr.mikeybee 3 years ago
@@alfcnz Thank you Alfredo. I probably messed up. It's where Yann mentions Q-learning and DeepMind. I imagine he will cover all this in a later lecture. Thank you for doing all this. Sorry for all the comments. I'm just enjoying this challenging material a lot. I just forked your repo, and I'm starting the first notebook. Cheers!
@dr.mikeybee 3 years ago
I see what I did. I gave you the end of video timestamp. My bad. LOL!
@dr.mikeybee 3 years ago
It's just after 1:27:00.
@hyphenpointhyphen 5 months ago
Why can't we use counters for the loops in neural nets? Would a loop not make the network more robust, in the sense of stabilizing the output?
@alfcnz 5 months ago
You need to add a timestamp if you’re expecting an answer to a specific part of the video. Otherwise it’s impossible for me to understand what you’re talking about.
@hyphenpointhyphen 5 months ago
@@alfcnz Sorry, around 34:39 - thanks for replying
@mahdiamrollahi8456 2 years ago
22:27 How do we ensure convexity...?
@alfcnz 2 years ago
We don't.
@mahdiamrollahi8456 2 years ago
@@alfcnz Yes 😅, I just wanted to point to the question you asked, sir, which he answered at that point in the video. Thanks 🙏
@mahdiamrollahi8456 2 years ago
Following the contours, there are infinitely many values of w that give the same loss. So do we get the same prediction for all these parameters for which the loss is equal?
@lam_roger 3 years ago
At the 40:41 section: is the purpose of using backpropagation to find the derivative of the cost function wrt z, in order to find the best direction to "move"? I've only gotten through half of the lecture, so forgive me if this is answered later.
@alfcnz 3 years ago
Say z = f(wᵀx). If you know ∂C/∂z, then you can compute ∂C/∂w = ∂C/∂z ∂f/∂w.
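
A short sketch of the same point in code (illustrative names, assuming a scalar z for simplicity): once ∂C/∂z is known, one more Jacobian-vector product gives ∂C/∂w.

```python
import torch

x = torch.randn(4)
w = torch.randn(4, requires_grad=True)
z = torch.tanh(w @ x)        # z = f(wᵀx)
C = (z - 1.0) ** 2           # some cost downstream of z

dC_dz = torch.autograd.grad(C, z, retain_graph=True)[0]   # ∂C/∂z
dC_dw = torch.autograd.grad(z, w, grad_outputs=dC_dz)[0]  # ∂C/∂z · ∂z/∂w
print(dC_dw)                 # same value backprop gives by differentiating C w.r.t. w directly
```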
@alexandrevalente9994 3 years ago
Does the trick explained for normalizing training samples (01:20:00) also apply to convolutional neural networks?
@alfcnz 3 years ago
Indeed.
@mahdiamrollahi8456 2 years ago
So, how can we tell whether there is at least some pattern in our distribution, so that some model could find it? Suppose we are trying to predict the MD5 hash of a string. In that case we ourselves may know that there is no pattern to exploit, but how can we tell for any other problem? Thanks
@inertialdataholic9278 3 years ago
51:05 shouldn't it be self.m0(z0) as it takes in the flattened input?
@alfcnz 3 years ago
Of course.
@dr.mikeybee 3 years ago
Just FYI, at 1:01:00 Yann correctly says dc/dzg, but the diagram has dc/zg. Should that also be dc/dwg and dc/dwf?
@alexandrevalente9994 3 years ago
About the code in PyTorch (51:00 in the video)... the code instantiates the mynet class and stores the reference in the model variable, but nowhere does it call the "forward" method, so how does the out variable receive any output from the model object? Is there some PyTorch magic which is not explained here?
@alfcnz 3 years ago
Yup. When you call an nn.Module, its forward method gets called, with some other stuff happening before and after.
@alexandrevalente9994 3 years ago
@@alfcnz Oh yes! My bad... I was distracted... Indeed mynet inherits from nn.Module, and I suppose forward is the implementation of an abstract method.
@alfcnz 3 years ago
Correct. 🙂🙂🙂
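
A minimal sketch of that "PyTorch magic" (a hypothetical mynet-style class, not the exact code from the slide): calling the module object goes through nn.Module.__call__, which runs some hooks and then the module's forward.

```python
import torch
from torch import nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.m0 = nn.Linear(784, 100)
        self.m1 = nn.Linear(100, 10)

    def forward(self, x):
        z0 = x.view(-1)                  # flatten the input
        z1 = torch.relu(self.m0(z0))
        return self.m1(z1)

model = MyNet()
out = model(torch.randn(28, 28))  # __call__ dispatches to forward(); no explicit .forward() needed
print(out.shape)                  # torch.Size([10])
```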
@adarshraj6721 3 years ago
Love from India, sir. I really like the discussion and doubt-clearing part. Hope to join NYU for my MS in 2023. :)
@alfcnz 3 years ago
💜💜💜
@sobhanahmadianmoghadam9211 2 years ago
Hello. Isn't ds[0] * dc / ds[0] + ds[1] * dc / ds[1] + ds[2] * dc / ds[2] = 3dc instead of dc? (At time 41:00)
@isurucumaranathunga 3 years ago
Thank you so much for this valuable content. This teaching method is extremely amazing.
@alfcnz 3 years ago
Yay! I'm glad you fancy it! 😊😊😊
@dr.mikeybee 3 years ago
I thought that Haar-like features were not that recognizable. (1:48:00)
@jobiquirobi123 3 years ago
Great content! It’s just great to have this quality information available
@alfcnz 3 years ago
You're welcome 🐱🐱🐱
@dr.mikeybee 3 years ago
Just to clarify, the first code you show defines a model's graph, but it is untrained, so it can't yet be used for inference.
@alfcnz 3 years ago
You need to tell me minutes:seconds, or I have no clue what you're asking about.
@dr.mikeybee 3 years ago
50:00
@mahdiamrollahi8456 3 years ago
How do libraries like PyTorch or TensorFlow calculate the derivative of a function? Do they compute lim (f(x+dx) - f(x))/dx numerically, or do they have pre-defined derivatives?
@alfcnz 3 years ago
Each function f comes with its analytical derivative f'. Forward calls f, while backward calls f'.
@mahdiamrollahi8456 3 years ago
@@alfcnz Actually I asked that before I watched it at 54:30, Regards 🤞
@alfcnz 3 years ago
If you remove ' and ", that becomes a link. 🔗🔗🔗
@mahdiamrollahi8456 3 years ago
@@alfcnz Cool !
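
As a sketch of what "each function comes with its analytical derivative" can look like (an illustrative custom function, not PyTorch's actual source), forward computes f and backward multiplies the incoming gradient by f', with no finite differences involved.

```python
import torch

class MySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2                 # f(x) = x²

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x    # f'(x) = 2x, the analytical derivative

x = torch.tensor(3.0, requires_grad=True)
y = MySquare.apply(x)
y.backward()
print(x.grad)  # tensor(6.)
```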
@alexandrevalente9994 3 years ago
About the notebooks... are there corrections? Or can we send them to you? Thanks
@alfcnz 3 years ago
What notebooks would you want to send to me? 😮😮😮
@pranabsarma18 3 years ago
Great videos. May I know what the L stands for in the video titles, e.g. 01L?
@alfcnz 3 years ago
Lecture.
@pranabsarma18 3 years ago
@@alfcnz What about the videos which don't have an L? It might sound silly but I am so confused 😂
@alfcnz 3 years ago
Those are my sessions, the practica. So they should have a P, if I wanted to be super precise.
@alfcnz 3 years ago
At the beginning there were only my videos. Yann's videos were not initially going to come online. It's too much work…
@pranabsarma18 3 years ago
Thank you Alfredo. ☺️🤗
@alexandrevalente9994 3 years ago
One of my questions was overlooked... "What is the difference between lesson x and lesson xL?" So what is the difference between 01 and 01L, for example?
@alfcnz 3 years ago
Lecture and practica. This used to be a playlist of only practica. Then it turned into a full course.
@andylee8283 2 years ago
Thank you for sharing; these help me more and more.
@alfcnz 2 years ago
🤓🤓🤓
@bhaswarbasu2288 1 year ago
Where can I get the slides?
@alfcnz 1 year ago
The course website. 😇
@mohammedelfatih8018 2 years ago
How can I press the like button more than once?
@geekyrahuliitm 2 years ago
@Alfredo, this content is amazing, although I have two questions. It would be great if you could help me with them: Does this mean that in SGD we compute a weight update step for every sample (picked randomly)? If we perform it on all the samples individually, how does that affect the training time? Will it increase or decrease compared to batch GD?
@maxim_ml 2 years ago
SGD _is_ mini-batch GD
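
For context, a minimal sketch of the usual training loop (toy data and model, purely illustrative): each update uses one randomly sampled mini-batch, which is what is commonly called SGD in practice.

```python
import torch
from torch import nn

X, y = torch.randn(60000, 20), torch.randint(0, 10, (60000,))  # toy dataset
model = nn.Linear(20, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

batch_size = 64
for step in range(1000):
    idx = torch.randint(0, X.size(0), (batch_size,))  # sample a random mini-batch
    loss = criterion(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()  # one weight update per mini-batch, not per pass over the data
```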
@AIwithAniket 3 years ago
I didn't get "if batch-size >> num_classes then we are wasting computation". Could someone explain?
@alfcnz 3 years ago
You need to add minutes:seconds, or I cannot figure out what you're talking about.
@AIwithAniket 3 years ago
@@alfcnz wow I didn't expect a reply this soon 💜. My question was from 30:17
@alfcnz 3 years ago
Let's say you have exactly 10 images, one per digit. Now clone them 6k times, so you have a data set of size 60k samples (same size as MNIST). Now, if your batch is anything larger than 10, say 20 (you pick two images per digit), for example, you're computing the same gradient twice for no good reason. Now take the real MNIST. It is certainly not as bad as the toy data set described above, but most images for a given digit look very similar (hopefully so, otherwise it would be impossible to recognise)! So, you're in a very very similar situation.
@AIwithAniket 3 years ago
@@alfcnz oh got it. Thanks for explaining with the intuitive example 🙏
@alfcnz 3 years ago
That's Yann's 😅😅😅
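
A quick sketch of that intuition with toy numbers (not MNIST): if a batch contains the same sample twice, the averaged batch gradient equals the single-sample gradient, so the extra copy adds computation but no new information.

```python
import torch
from torch import nn

model = nn.Linear(5, 3)
criterion = nn.CrossEntropyLoss()  # default 'mean' reduction over the batch
x, y = torch.randn(1, 5), torch.tensor([2])

g_single = torch.autograd.grad(criterion(model(x), y), model.weight)[0]

x2, y2 = x.repeat(2, 1), y.repeat(2)  # "batch" of two identical samples
g_batch = torch.autograd.grad(criterion(model(x2), y2), model.weight)[0]

print(torch.allclose(g_single, g_batch))  # True
```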
@alexandrevalente9994 3 years ago
What is the difference between lesson x and lesson xL ?
@alfcnz 3 years ago
L stands for lecture. Initially I was going to publish only my sessions. Then I added Yann's.
@alexandrevalente9994 3 years ago
Is this the paper to read in order to better understand backprop (the way it is explained in this video)? Or should we read some other work by Yann?
@alfcnz 3 years ago
What paper? You need to point out minutes:seconds if you want me to address a specific question regarding the video.
@alexandrevalente9994 3 years ago
@@alfcnz I forgot to paste the link… I'll do it later. It's from 1988… I will review the link.
@balamanikandanjayaprakash6378 3 years ago
Hi Alfredo, at 20:58 Yann mentioned the objective function needs to be "mostly continuous and differentiable almost everywhere". What does he mean? Isn't a differentiable function always continuous? Also, is there a function that is differentiable only in some places? Can someone give me an example among deep learning functions? Please help me out. And thanks for these amazing videos!!!
@aniketthomas6387 3 years ago
I think he meant that the function has to be continuous everywhere, but it does not have to be differentiable everywhere; it should be differentiable "almost" everywhere, as with the ReLU function max(0, x): it is non-differentiable at x = 0, but differentiable everywhere else and continuous everywhere. So if the function is differentiable everywhere, that is awesome, but it is not a necessary condition. The thing is, it should be continuous so that we can estimate the gradients and there's no break in the function. If there's a break somewhere in your objective function, you can't estimate a gradient and your network has no way of knowing what to do. If I am wrong, please do correct me.
@alfcnz 3 years ago
Yup, Aniket's correct.
@balamanikandanjayaprakash6378 3 years ago
Hey, Thanks for the explanation !!
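
A tiny check of the ReLU example above (assuming PyTorch's usual convention of returning 0 for the subgradient at the kink):

```python
import torch

x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1.]); a value of 0 is chosen at x = 0
```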
@chrcheel 2 years ago
This is wonderful. Thank you ❤️
@alfcnz 2 years ago
😃😃😃
@ayushimittal6496 3 years ago
Hi Alfredo! Thank you so much for posting these lectures here! I wanted to know if there's any textbook for this course that I could refer to, along with following the lectures. Thanks :)
@alfcnz 3 years ago
Yes, I'm writing it. Hopefully a draft will be available by December. 🤓🤓🤓
@juliusolaifa5111 3 years ago
@@alfcnz The eagernessssssssssss
@alfcnz 3 years ago
Not sure anything will come out _this_ December, though…
@juliusolaifa5111 3 years ago
I'm hanging in there for whenever it does come out. Alfredo, can I email you about the possibility of PhD supervision?
@alfcnz 3 years ago
Uh… are you an NYU student?
@xXxBladeStormxXx 3 years ago
Can you please link the reinforcement learning course Yann mentioned? Or at least the name of the instructor; I couldn't fully make it out.
@alfcnz 3 years ago
Without telling me minute:second I have no clue what you're talking about.
@xXxBladeStormxXx 3 years ago
@@alfcnz Oh right! sorry. It's at 1:18
@xXxBladeStormxXx 3 years ago
@@alfcnz Actually, after re-listening to it, it sounded a lot clearer. It's the NYU reinforcement learning course by Lerrel Pinto.
@alfcnz 3 years ago
Yes, that's correct. 😇😇😇
@matthewevanusa8853 3 years ago
One reason I agree it's better not to call a unit a "neuron" is the growing acceptance that single neurons in the brain are capable of complex computation via dendritic compartment computation
@alfcnz 3 years ago
If this is a question or note about the content, you need to add minutes:seconds, or I have no clue what you're referring at.
@matthewevanusa8853 3 years ago
@@alfcnz Ah, sorry. It was just to add on, at ~31:25, when Prof. LeCun explains why people don't like to refer to the units as 'neurons' per se.
@alfcnz 3 years ago
Cool! 😇😇😇
@aymensekhri2133 3 years ago
Thank you very much
@alfcnz 3 years ago
You're very welcome 🐱🐱🐱
@DaHrakl 2 years ago
20:03 Doesn't he contradict himself? First he mentions that smaller batches are better (I assume that by "better" he meant model quality) in most cases, and a few seconds later he says that it's just a hardware matter.
@alfcnz 2 years ago
We use mini-batches because we use GPU or other accelerators. Learning wise, we would prefer purely stochastic gradient descent (batch size of 1).
@wangyeelinpamela 1 year ago
This might be the Lerrel Pinto course he references at 1:26: kzbin.info/www/bejne/qXzUq2yKlKuSe7c
@wangvince5857 6 months ago
I noticed Yann's bored face when he tried to explain the chain rule at 38:43, lol
@alfcnz 6 months ago
🤣🤣🤣
@advaitathreya5558 2 years ago
12:55
@alfcnz 2 years ago
?
@advaitathreya5558 2 years ago
@@alfcnz A timestamp for myself to visit later :)
@alfcnz 2 years ago
🤣🤣🤣
@alexandrevalente9994 1 year ago
Two years later... in the video, at 54:30... x is still not fixed... it must be s0... 🤣🤣🤣🤣🤣🤣🤣🤣🤣 Just a joke ;-)
@alfcnz 1 year ago
😭😭😭
@alexandrevalente9994 1 year ago
@@alfcnz Hahaaaaa... Computer experts... What can you do? That's how we are 😂😂😂😂😂😂😂