Linear regression (2): Gradient descent

90,765 views

Alexander Ihler

Comments: 30
@emmanuel5566 4 years ago
So, how many of you came here from Andrew Ng's ML course?
@phungdaoxuan99 4 years ago
I'm here
@nasserhussain5698 4 years ago
Me
@willgabriel5275 4 years ago
I'm here.
@charlottefx7163 3 years ago
Me too
@behrad9712 3 years ago
yea!
@yemaneabrha6637 4 years ago
Simple, clear, and gentle explanation. Thanks, Prof.
@JulianHarris 9 years ago
I'm very visual, so I particularly loved the visualisations showing the progressive improvement of the hypothesis as the parameters were refined.
@sagardolas3880 7 years ago
This was the simplest explanation, and the most beautiful and precise one.
@redfield126 5 years ago
Very interesting. Thanks for the clear and visual explanation, which gave me a good intuition for the different versions of gradient descent.
@sagarbhat7932 4 years ago
Wouldn't online gradient descent cause the problem of overfitting?
@AlexanderIhler 4 years ago
Overfitting is not really related to the *method* of doing the optimization (online=stochastic GD, versus batch GD, or second order methods like BFGS, etc.) but rather to the complexity of the model, and the *degree* to which the optimization process is allowed to fit the data. So, early stopping (incomplete optimization) can reduce overfitting, for example. Changing optimization methods can appear to change overfitting simply because of stopping rules interacting with optimization efficiency, but they don't really change the fundamental issue.
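
To make that distinction concrete, here is a minimal sketch of early stopping with batch gradient descent; it is not from the lecture, and the dataset, polynomial degree, learning rate, and iteration count are all made-up illustrative choices. An over-complex model is optimized, and the parameters with the best validation error seen so far are kept instead of the fully optimized ones.

```python
# A sketch only: batch gradient descent on an over-complex (degree-9
# polynomial) model, keeping the parameters with the best validation error.
# All numbers here (data, degree, alpha, iterations) are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Small noisy dataset generated from a simple underlying line
x = rng.uniform(-1, 1, size=30)
y = 0.5 * x + 0.3 + 0.2 * rng.standard_normal(30)

def features(x, degree=9):
    # Polynomial features [1, x, x^2, ..., x^degree]
    return np.vander(x, degree + 1, increasing=True)

X_train, y_train = features(x[:20]), y[:20]
X_val,   y_val   = features(x[20:]), y[20:]

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

theta = np.zeros(X_train.shape[1])
alpha = 0.05
best_theta, best_val = theta.copy(), np.inf

for _ in range(20000):
    grad = 2 * X_train.T @ (X_train @ theta - y_train) / len(y_train)
    theta -= alpha * grad
    val = mse(X_val, y_val, theta)
    if val < best_val:                      # remember the best-so-far parameters
        best_val, best_theta = val, theta.copy()

# The fully optimized theta fits the training set at least as well, but the
# early-stopped (incompletely optimized) theta may generalize better.
print("fully optimized: train %.4f  val %.4f"
      % (mse(X_train, y_train, theta), mse(X_val, y_val, theta)))
print("early stopped:   train %.4f  val %.4f"
      % (mse(X_train, y_train, best_theta), mse(X_val, y_val, best_theta)))
```
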
@AronBordin 9 years ago
Exactly what I was looking for, thx!
@poltimmer 9 years ago
Thanks! I'm writing an essay on machine learning, and this really helped me out!
@anthamithya 6 years ago
First of all, how do we know that the J(theta) curve has that shape? The curve can only be obtained after gradient descent has run, or by evaluating J at a thousand or so random theta values...
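
For what it's worth, drawing the curve does not require running gradient descent at all: J(theta) can be evaluated directly at any theta, so a plot like the one in the video can be made by sweeping theta over a grid. A minimal sketch, with a made-up one-parameter model and dataset:

```python
# A sketch only (data and grid are made up): the J(theta) curve is obtained
# simply by evaluating the cost at many candidate theta values; no gradient
# descent is needed to draw it.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])            # roughly y = x

def J(theta):
    # Mean squared error of the one-parameter model y_hat = theta * x
    return np.mean((theta * x - y) ** 2)

thetas = np.linspace(-1.0, 3.0, 41)            # grid of candidate slopes
costs = [J(t) for t in thetas]

# Plotting (thetas, costs) would show the familiar bowl-shaped curve;
# its minimum sits near the best slope (about 1 for this data).
best = int(np.argmin(costs))
print("best grid point: theta = %.2f, J(theta) = %.4f" % (thetas[best], costs[best]))
```
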
@sidbhatia4230 5 years ago
What modifications can we make to use the L2 norm instead?
@mdfantacherislam4401 7 years ago
Thanks for such a helpful lecture.
@UmeshMorankar 8 years ago
What if we set the learning rate α to too large a value?
@SreeragNairisawesome 8 years ago
+Umesh Morankar Then it might diverge or overshoot the minimum. For example, if the minimum is 2, the latest value of Θ is 4, and α = 8 (say), then it would jump to 4 - 8 = -4, which is far from 2, whereas if α = 1 (small) it would reach the minimum in the next 2 iterations. I haven't run the algorithm; this is just for explanation purposes.
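
A minimal sketch of this effect (the numbers are made up, as in the comment above): gradient descent on the one-dimensional cost J(theta) = (theta - 2)^2, whose minimum is at theta = 2 and whose gradient is 2 * (theta - 2).

```python
# A sketch only: small step sizes converge, overly large ones diverge.
def run(alpha, theta=4.0, steps=10):
    for _ in range(steps):
        theta -= alpha * 2 * (theta - 2)   # theta <- theta - alpha * dJ/dtheta
    return theta

print(run(alpha=0.1))   # small step: creeps steadily toward 2
print(run(alpha=0.9))   # larger step: overshoots each time but still converges
print(run(alpha=1.5))   # too large: every step overshoots by more -> diverges
```
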
@EngineersLife-Vlog 6 years ago
Can I get these slides, please?
@JanisPundurs 10 years ago
This helped a lot, thanks
@NicoCarosio 8 years ago
Thanks!
@fyz5689 8 years ago
excellent
@prithviprakash1110 6 years ago
Can someone explain how the derivative of θ·x(t) with respect to θ becomes x and not x(t)?
@patton4786 6 years ago
Because it is the derivative with respect to θ0: the derivative of θ0·x0 with respect to θ0 is 1·θ0^(1-1)·x0 = 1·1·x0 = x0. (This is a partial derivative, so every term other than θ0·x0 is treated as a constant.)
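
A quick numeric sanity check of this point, using made-up vectors: finite differences confirm that the partial derivative of theta . x with respect to each theta_j is the corresponding x_j.

```python
# A quick sanity check with made-up numbers: finite differences confirm that
# the partial derivative of theta . x with respect to theta_j is x_j.
import numpy as np

theta = np.array([0.5, -1.0, 2.0])
x = np.array([3.0, 4.0, 5.0])                  # one training example

f = lambda t: t @ x                            # the linear prediction theta . x
eps = 1e-6
grad = np.array([
    (f(theta + eps * np.eye(3)[j]) - f(theta - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])

print(grad)   # approximately [3. 4. 5.], i.e. equal to x, component by component
```
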
@叶渐师 8 years ago
thx a lot
@宗宝冯 8 years ago
nice