So, how many of you came here from Andrew Ng's ML course?
@phungdaoxuan99 (4 years ago)
I'm here
@nasserhussain5698 (4 years ago)
Me
@willgabriel5275 (4 years ago)
I'm here.
@charlottefx7163 (3 years ago)
Me too
@behrad9712 (3 years ago)
yea!
@yemaneabrha6637 (4 years ago)
Simple, clear, and gentle explanation. Thanks. More, please, Prof.
@JulianHarris (9 years ago)
I'm very visual, so I particularly loved the visualisations showing the progressive improvement of the hypothesis as the parameters were refined.
@sagardolas3880 (7 years ago)
This was the simplest explanation, as well as the most beautiful and precise one.
@redfield126 (5 years ago)
Very interesting. Thanks for the clear and visual explanation, which gave me quite a good intuition for the different versions of gradient descent.
@sagarbhat7932 (4 years ago)
Wouldn't online gradient descent cause the problem of overfitting?
@AlexanderIhler (4 years ago)
Overfitting is not really related to the *method* of doing the optimization (online=stochastic GD, versus batch GD, or second order methods like BFGS, etc.) but rather to the complexity of the model, and the *degree* to which the optimization process is allowed to fit the data. So, early stopping (incomplete optimization) can reduce overfitting, for example. Changing optimization methods can appear to change overfitting simply because of stopping rules interacting with optimization efficiency, but they don't really change the fundamental issue.
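To make the early-stopping point concrete, here's a minimal sketch: stochastic (online) gradient descent on invented 1-D data with a simple linear model and squared-error loss, keeping whichever parameters do best on a held-out set. All names and data below are made up for illustration; this is not code from the lecture.

```python
# Sketch: online/stochastic gradient descent with a simple "keep the best
# validation parameters" form of early stopping. Invented data and names.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data, split into training and validation sets
X = rng.uniform(-3, 3, size=200)
y = 2.0 * X + 1.0 + rng.normal(scale=1.0, size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def mse(theta, X, y):
    # J(theta): mean squared error of the hypothesis theta[0] + theta[1]*x
    return np.mean((theta[0] + theta[1] * X - y) ** 2)

theta = np.zeros(2)
alpha = 0.01                                   # learning rate
best_theta, best_val = theta.copy(), np.inf

for epoch in range(100):
    for i in rng.permutation(len(X_tr)):       # online / stochastic updates
        err = theta[0] + theta[1] * X_tr[i] - y_tr[i]
        theta -= alpha * err * np.array([1.0, X_tr[i]])
    val = mse(theta, X_va, y_va)
    if val < best_val:                         # early stopping: remember the
        best_val, best_theta = val, theta.copy()  # best parameters on held-out data

print("best validation MSE:", best_val, "theta:", best_theta)
```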
@AronBordin (9 years ago)
Exactly what I was looking for, thx!
@poltimmer (9 years ago)
Thanks! I'm writing an essay on machine learning, and this really helped me out!
@anthamithya (6 years ago)
First of all, how do we know that the J(θ) curve looks like that? The curve would only be obtained after gradient descent has been run, or after evaluating a thousand or so random θ values...
@sidbhatia4230 (5 years ago)
What modifications can we make to use the L2 norm instead?
@mdfantacherislam4401 (7 years ago)
Thanks for such a helpful lecture.
@UmeshMorankar (8 years ago)
What if we set the learning rate α to too large a value?
@SreeragNairisawesome (8 years ago)
+Umesh Morankar Then it might diverge or overshoot the minimum. For example, suppose the minimum is at 2, the latest value is Θ = 4, and the step α·(gradient) works out to 8; then the update jumps to 4 − 8 = −4, which is far away from 2 on the other side. Whereas if the step were 1 (small), it would reach the minimum in the next 2 iterations (4 → 3 → 2). I haven't run the algorithm; this is just for explanation purposes.
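A tiny numeric sketch of the same idea, on a made-up quadratic J(Θ) = (Θ − 2)² with its minimum at Θ = 2 (not the cost function from the lecture):

```python
# Gradient descent on the invented cost J(theta) = (theta - 2)**2,
# starting from theta = 4, to show how the learning rate alpha changes behavior.
def gradient_descent(alpha, theta=4.0, steps=10):
    for _ in range(steps):
        grad = 2.0 * (theta - 2.0)      # dJ/dtheta
        theta = theta - alpha * grad    # standard update rule
    return theta

print(gradient_descent(alpha=0.1))   # small alpha: moves steadily toward 2
print(gradient_descent(alpha=0.9))   # larger alpha: overshoots each step but still converges
print(gradient_descent(alpha=1.5))   # too large: each overshoot grows, so it diverges
```

Running it shows the first two cases settling near 2 while the last one blows up.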
@EngineersLife-Vlog (6 years ago)
Can I get these slides, please?
@JanisPundurs (10 years ago)
This helped a lot, thanks
@NicoCarosio (8 years ago)
Thanks!
@fyz5689 (8 years ago)
Excellent!
@prithviprakash1110 (6 years ago)
Can someone explain how the derivative of θx^(t) with respect to θ becomes x and not x^(t)?
@patton4786 (6 years ago)
Because it's with reference to θ0: the derivative of θ0·x is 1·θ0^(1−1)·x = 1·1·x = x. (Btw, this is a partial derivative, so all other terms are treated as constants except the θ0·x0 term.)
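A quick symbolic check of this, as a sketch with arbitrary symbol names (not from the lecture):

```python
# Hypothetical sanity check with sympy: differentiate a linear hypothesis
# theta0*x0 + theta1*x1 with respect to theta0; the other term drops out.
import sympy as sp

theta0, theta1, x0, x1 = sp.symbols('theta0 theta1 x0 x1')
h = theta0 * x0 + theta1 * x1      # linear hypothesis for one example

print(sp.diff(h, theta0))          # prints x0: theta1*x1 is constant w.r.t. theta0
print(sp.diff(h, theta1))          # prints x1
```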