Gradient Descent (C1W2L04)

144,528 views

DeepLearningAI


Take the Deep Learning Specialization: bit.ly/3csURe6
Check out all our courses: www.deeplearni...
Subscribe to The Batch, our weekly newsletter: www.deeplearni...
Follow us:
Twitter: / deeplearningai_
Facebook: / deeplearninghq
Linkedin: / deeplearningai

Comments
@sarfarazmemon2429 6 years ago
If you don't know, don't worry about it :-)
@thepresistence5935 3 years ago
I melted
@tzaidi2349 6 months ago
I had no intent of watching up to this point, but I'm loving it. Thanks for your and your team's hard work!
@williamgenetelli7334 4 years ago
I would like to say thank you. I've been looking for this for weeks and I now fully understand neural networks. This is helping me out so much with a university project. So thank you!
@thepresistence5935 3 years ago
If you don't understand, here's my understanding of gradient descent: imagine you're standing on top of a mountain and want to come down to the ground (the global minimum). You take many small steps to get there; don't think "if I take one large step I'll get there sooner", because a single large step can overshoot badly. It's the same here: our initial loss value is high, and the aim is to bring it down to the global minimum (the smallest value). To do that we repeatedly update the parameters with the gradient descent formula rather than staying at the random initial weights. Thanks! Correct me if I am wrong.
@abcxyz9723 2 years ago
The small steps are required because the global optimum is unknown. By taking small steps in the direction of steepest descent you approach the optimum iteratively. If you took one large step in the direction of steepest descent at the beginning, you would likely end up at a point that is not the optimum. The direction of steepest descent can change at each step you take.
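To make the intuition in the thread above concrete, here is a minimal one-dimensional gradient descent sketch. The cost J(w) = (w - 3)^2, the starting point, and the learning rate are all invented for illustration; they are not from the video.

```python
# Minimal 1-D gradient descent sketch. The toy cost J(w) = (w - 3)**2 has its
# minimum at w = 3; its derivative is dJ/dw = 2 * (w - 3).

def dJ_dw(w):
    return 2.0 * (w - 3.0)

w = 10.0        # arbitrary starting point "on top of the mountain"
alpha = 0.1     # learning rate: the size of each small step

for _ in range(100):
    w = w - alpha * dJ_dw(w)   # step opposite to the slope, i.e. downhill

print(w)        # approaches 3.0, the minimum of this toy cost
```

With a much larger learning rate (say alpha = 1.5) the same loop overshoots and diverges, which is the "large step" failure mode the comment warns about.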
@TomM-p3o 1 year ago
I haven't seen calculus/linear algebra/etc. in 25 years, but your explanations are easily digestible.
@arthurkalb1817 2 years ago
Use of ∂ (the "del"), instead of d, indicates that the other variables are held constant. There is a huge difference; it's not insignificant.
@nijhumreza9878 5 years ago
It turns out that you don't have to worry about anything.
@danielteixeira6497 4 years ago
Andrew reached nirvana through coding, some next level stuff
@classicalfandom8219 1 year ago
If you don't know calculus, I think you should take a Khan Academy course and watch some 3Blue1Brown videos to better understand what Andrew is explaining about gradient descent and why we want to get to the bottom of that valley in the graph.
@arghavankayvani1059 6 years ago
One of the best! Thank you for the great lecture.
@bpaudel2239 3 years ago
Taking the steepest route downhill is only guaranteed to reach the global minimum if the optimisation problem is convex.
@aayushpaudel2379 4 years ago
Now that I have gone through this lecture, I will worry about nothing in the days to come. ;)
@codingwithsam4992 1 year ago
It's so amazing that what humans think is a danger to them is nothing but a calculus minimization problem 😆
@charlos1388 3 years ago
The fancy d (∂) refers to partial derivatives, while the normal d refers to the ordinary derivative; the two coincide when we're working with a function of a single real variable. The theta notation in the previous video simplifies the gradient notation here, compacting the two updates for w and b, which are really one parameter in n+1 dimensions. I mean, if you're updating both parameters one after the other, why not update the single parameter (w, b)? Or is it that the step length for w and b is not the same? Excuse me if this question is clarified later in the course.
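On the question above about updating (w, b) as one parameter: with a shared learning rate, updating w and b one after the other gives the same result as a single update of the stacked vector. A small numpy sketch, where grad_w and grad_b are made-up placeholder values standing in for ∂J/∂w and ∂J/∂b (not course code):

```python
import numpy as np

# Placeholder gradients standing in for dJ/dw (a vector) and dJ/db (a scalar).
grad_w = np.array([0.2, -0.1])
grad_b = 0.05
alpha = 0.01

# Separate updates, as written in the video...
w = np.array([1.0, 2.0])
b = 0.5
w = w - alpha * grad_w
b = b - alpha * grad_b

# ...match one update of the stacked parameter theta = (w, b).
theta = np.array([1.0, 2.0, 0.5])
theta = theta - alpha * np.append(grad_w, grad_b)

print(np.allclose(theta, np.append(w, b)))  # True
```

So writing the two updates on separate lines appears to be a matter of presentation rather than a different algorithm, as long as both use the same learning rate.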
@morancium 3 years ago
8:35 he got me there XD
@ibrahimyldrm2427 5 years ago
I have a question about the cost function. At first I really understood what we are doing by running gradient descent on the cost function, but then I got confused. Say we have some cost function: does it always have a unique (w, b) minimizer? In that case, do I even have to use gradient descent if I know the function and can always find the global minimum directly?
@RahulSingh-f7e5z 5 years ago
Yes, it does have unique (w, b) values. You will get different cost function values, and the aim is to minimize the cost so that the error is driven towards zero, i.e. it reaches the global minimum.
@mr.shroom4280 1 year ago
I have a question: when we are doing gradient descent at a higher level of complexity, say with more than 4 weights and biases, does gradient descent have to be applied to each of them individually? How is the derivative of the function with respect to each variable calculated dynamically? I know there must be a way around this; I just don't know what it is, because all the tutorials I'm seeing cover only uni- or bivariate gradient descent.
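Regarding the question above: in practice the per-weight derivatives are not written out one by one; they are computed all at once with vectorized formulas (and, in deeper networks, produced automatically by backpropagation / automatic differentiation). A sketch of the vectorized logistic-regression case in the spirit of the later videos of this course week; the shapes, data, and hyperparameters here are invented for illustration:

```python
import numpy as np

# Vectorized gradient descent for logistic regression with any number of
# weights: every partial derivative comes out of one matrix expression.
# Convention assumed here: X has shape (n_features, m_examples).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, m = 10, 200                        # 10 weights, 200 made-up examples
X = rng.normal(size=(n, m))
Y = (rng.uniform(size=(1, m)) > 0.5).astype(float)

w = np.zeros((n, 1))
b = 0.0
alpha = 0.1

for _ in range(1000):
    A = sigmoid(w.T @ X + b)          # predictions, shape (1, m)
    dZ = A - Y                        # error term
    dw = (X @ dZ.T) / m               # all n partial derivatives at once
    db = np.sum(dZ) / m
    w = w - alpha * dw                # one update touches every weight
    b = b - alpha * db
```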
@sandipansarkar9211 4 years ago
Great explanation.
@Lauschangreifer 4 years ago
I think the symbol is a Cyrillic d, used here for partial derivatives.
@konstantinpluzhnikov4862 3 years ago
Maybe it is a Greek symbol. But it is also, if rarely, used in Russian (Cyrillic) as a lowercase d.
@Lauschangreifer 3 years ago
@konstantinpluzhnikov4862 It is used as the regular Cyrillic small italic d in many of today's fonts as far as I can see, such as Times, Arial, etc. And I learned this in school, some decades ago ;-)
@louerleseigneur4532 3 years ago
Thanks sir
@ayushjangid 3 years ago
Isn't this gradient descent algorithm in one dimension like Newton's method for finding roots?
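For anyone weighing the comparison in the comment above: one-dimensional gradient descent is looking for a point where J'(w) = 0, which is also what Newton's root-finding method does when applied to J'. The difference is the step size, a fixed learning rate versus division by the second derivative (standard formulas, not from the video):

```latex
\text{Gradient descent:}\quad w_{t+1} = w_t - \alpha\, J'(w_t)
\qquad\qquad
\text{Newton on } J'(w)=0:\quad w_{t+1} = w_t - \frac{J'(w_t)}{J''(w_t)}
```

So the gradient descent step coincides with that Newton step only when alpha happens to equal 1/J''(w_t).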
@manuel783 3 years ago
Clarification about the Gradient Descent video: please note that on the second slide there is a missing parenthesis; the negative sign should apply to the entire cost function (both terms in the summation). Also, Andrew mentions: "To make this easier to draw, I'm going to ignore b for now, just to make this a one-dimensional plot instead of a high-dimensional plot." That plot is actually two-dimensional: it shows the cost J(w) against values of w.
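For reference, the corrected formula the comment above describes, with the minus sign applied to both terms of the summation (this is the logistic regression cost from the previous video):

```latex
J(w,b) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log \hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big) \Big]
```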
@youssefelamrani7905 3 years ago
When do we stop with gradient descent? Is it when dJ/dw = 0?
@moein671 2 years ago
As we go further and further, the derivative converges to zero, and once it is multiplied by the learning rate the update becomes almost zero.
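A minimal sketch of how that behaviour translates into a stopping rule: iterate until the update alpha * dJ/dw becomes negligible (or a maximum iteration count is hit). The toy derivative and the tolerance below are invented for illustration; in practice a fixed number of iterations is also common.

```python
# Toy stopping rule: stop once the update alpha * dJ/dw is negligibly small,
# i.e. the derivative has (nearly) vanished near the minimum.

def dJ_dw(w):                  # illustrative derivative of J(w) = (w - 3)**2
    return 2.0 * (w - 3.0)

w, alpha, tol = 10.0, 0.1, 1e-8
for step in range(10_000):     # cap the number of iterations as a safeguard
    update = alpha * dJ_dw(w)
    if abs(update) < tol:      # derivative ~ 0: essentially at the minimum
        break
    w = w - update

print(step, w)                 # w ends up very close to 3.0
```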
@abhijeetpalai381 4 months ago
Can I get all the notes?
@saanvisharma2081 6 years ago
What's the difference between the loss function and the cost function? In the previous lecture he started with the loss function and ended up with the cost function.
@Video-Notes 5 years ago
Well, I think the cost function is basically the mean of the losses over your training set.
@twinaibots5549 5 years ago
Basically, the loss function is used for a single training example, but the cost function is used for the entire training set.
@fazilokuyanus3396 5 years ago
He explained this in the previous video; to understand it better, I suggest you look at it.
@MrRenanwill 5 years ago
The cost function is sometimes called the empirical risk. The name comes from another concept in machine learning, the expected risk, which generalizes the idea of a cost function over "all possible training sets"; that's why I prefer to call it the empirical risk. =) Minimizing the cost over "all possible training sets", i.e. minimizing the expected risk, is what we really aim to achieve, but this is usually impracticable.
@shubhamchandra9258 4 years ago
The cost is the mean loss over the data set.
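Putting the replies above in one place, with the formulas as used in the previous video: the loss is defined for a single example, and the cost averages it over the m training examples.

```latex
\mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big) = -\Big[\, y^{(i)}\log \hat{y}^{(i)} + \big(1-y^{(i)}\big)\log\big(1-\hat{y}^{(i)}\big) \Big],
\qquad
J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\big(\hat{y}^{(i)}, y^{(i)}\big)
```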
@dengpan8086 7 years ago
Thank you (谢谢)
@doubtunites168 5 years ago
You're welcome (不客气)
@chitralalawat8106 5 years ago
Can't it be w + alpha * dJ(w,b)/dw?
@Rocklee46v 4 years ago
It is possible when the slope is negative
@codewebsduh2667 4 years ago
Because the derivative also determines the direction: if the function is increasing, the gradient is positive; if it's decreasing, the gradient is negative. Hence the negative sign in front of the derivative simply flips the step so that we always move downhill.
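A quick way to see why the minus sign is the safe default, using a first-order Taylor expansion (standard calculus, not from the video): for a small learning rate alpha,

```latex
J\!\left(w - \alpha \frac{dJ}{dw}\right) \;\approx\; J(w) - \alpha \left(\frac{dJ}{dw}\right)^{\!2} \;\le\; J(w)
```

so the step w := w - alpha * dJ/dw decreases the cost (to first order) whatever the sign of the slope, whereas w + alpha * dJ/dw only moves downhill when the slope happens to be negative, as the earlier reply notes.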
@lemyul 5 years ago
height
@viswajeetkumar7977 3 years ago
Arre, what is this torture? We came here to learn neural networks, why are you teaching us maths?