23. Accelerating Gradient Descent (Use Momentum)

52,918 views

MIT OpenCourseWare

Comments: 48
@gigik64 5 years ago
Jesus, man, I remember back before I started college, when I checked out Prof Strang's calculus series. He's aged quite a lot since that series, but he's still sharp as a tack. And I'm astonished that even at his age he knows so much about machine learning; I didn't think it was his field. Huge kudos, Gilbert Strang, huge kudos.
@marsag3118 3 years ago
Impressive indeed. I'd be happy to be 50% as sharp at that age as he is here.
@franzdoe5558 5 years ago
Such a great lecturer, just as in his classic Linear Algebra lecture series. Really nice to see him up and healthy, sharp, and as great a step-by-step explainer as ever.
@georgesadler7830 3 years ago
Professor Strang, thank you for an old-fashioned lecture on Accelerating Gradient Descent. These topics are very theoretical for the average student.
@dengdengkenya 5 years ago
Why are there not more comments for such a great course? MIT is a great university!
@thaddeuspawlicki4707 4 years ago
I'm just speechless.
@marjavanderwind4251 5 years ago
Wow, this old man is so smart. I would love to see more lectures from him and learn much more of this stuff.
@yefetbentili128 5 years ago
Absolutely! This man is a pure treasure.
@mdrasel-gh5yf 4 years ago
Check out his linear algebra course; it is one of MIT's most-liked playlists. kzbin.info/www/bejne/bYatZXZ8h6yXY7c
@nguyenbaodung1603 3 years ago
I'm so happy to see you here. I only trust you when it comes to lectures.
@honprarules 4 years ago
He radiates knowledge. Love the content!
@Arin177 1 year ago
Those who have the sixth edition of Introduction to Linear Algebra can enjoy this course! In my view this course really increases the value of the book.
@yubai6549 4 years ago
Wishing the old gentleman good health. Thank you very much!
@MsVanessasimoes 3 years ago
I loved this amazing lecture. Great professor, and great content. Thanks for sharing it openly on KZbin.
@vaisuliafu3342 3 years ago
Such great lecturing makes me wonder how much of MIT students' success is due to innate ability and how much to superior teaching.
@PrzemyslawSliwinski 2 years ago
In the terms of this very lecture: think of the professor as the gradient, with your ability being the momentum. ;)
@何浩源-r2y 5 years ago
Prof. Boyd is also a very good teacher! I enjoy his lectures very much.
@casual_dancer 1 year ago
Finally, a lecture that explains the magic numbers in momentum! Those shorter video formats are great for an introduction but leave me confused about the math behind them. Love the ground-up approach to explaining. Could anyone tell me what book Professor Strang mentions at 06:53 in the lecture?
@scotts.9460 1 year ago
web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
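
For anyone who wants to play with those "magic numbers": below is a minimal sketch (not from the video) of gradient descent with heavy-ball momentum on the model quadratic f(x) = 0.5 x^T S x, written in the equivalent form x_{k+1} = x_k - s*grad f(x_k) + beta*(x_k - x_{k-1}). The formulas for s and beta are the standard choices expressed through the largest and smallest eigenvalues M and m of S; treat them as an assumption rather than a transcript of the lecture.

import numpy as np

# Minimal sketch (assumption, not from the video): heavy-ball momentum on
# f(x) = 0.5 * x^T S x with S = diag(b, 1) and small b, the model problem
# used in this lecture series.
b = 0.01
S = np.diag([b, 1.0])
m, M = b, 1.0                                   # smallest / largest eigenvalue of S
s    = (2.0 / (np.sqrt(M) + np.sqrt(m)))**2     # "magic" step size
beta = ((np.sqrt(M) - np.sqrt(m)) / (np.sqrt(M) + np.sqrt(m)))**2  # "magic" momentum

x = np.array([1.0, 1.0])                        # starting point
x_prev = x.copy()
for k in range(200):
    grad = S @ x                                # gradient of 0.5 x^T S x
    x, x_prev = x - s * grad + beta * (x - x_prev), x
print("f after 200 momentum steps:", 0.5 * x @ S @ x)

With b = 0.01 the plain gradient-descent rate is roughly (1-b)/(1+b) ≈ 0.98 per step, while the momentum rate is roughly (1-sqrt(b))/(1+sqrt(b)) ≈ 0.82, which is what makes the method practical.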
@vnpikachu4627 3 years ago
At 27:00, why follow the direction of an eigenvector? It just comes out of nowhere.
@ky8920 3 years ago
I think it has something to do with PCA.
@e2DAiPIE 1 year ago
Can anyone provide some clarification here? I think the reason we would like to follow an eigenvector is made clear, but what's not clear to me is why we expected this to work before deriving the result (that f decreases faster). I can see that following an eigenvector reduces the problem from inverting a block matrix containing the original S to inverting a much smaller matrix of scalars. So maybe this strategy was just wishful thinking that paid off? Insight would be very welcome. Thanks.
@Schweini8 11 months ago
@e2DAiPIE Maybe if you can show that the method converges along every eigenvector direction, then it also converges with at least the same rate in every other direction (since any starting vector can be written as a linear combination of the eigenvectors).
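
A short worked step in that direction (my own summary of the standard argument, not a quote from the lecture): expand x_k and z_k in the eigenvectors q_1, ..., q_n of S,

x_k = c_k(1) q_1 + ... + c_k(n) q_n,    z_k = d_k(1) q_1 + ... + d_k(n) q_n.

Because S q_i = lambda_i q_i, the momentum update never mixes different eigenvectors: each pair (c_k(i), d_k(i)) obeys its own 2-by-2 recursion

[c_{k+1}(i); d_{k+1}(i)] = R(lambda_i) [c_k(i); d_k(i)],

with the same s and beta for every i. So the whole iteration is block-diagonal in the eigenbasis, and the convergence rate for an arbitrary start is the largest spectral radius of R(lambda) over lambda in [lambda_min, lambda_max]. That is why it is enough to follow one eigenvector at a time and then take the worst eigenvalue.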
@newbie8051 1 year ago
Tough course to follow, from what I feel (I'm currently in my 4th semester of undergrad). Great lecture from Prof. Gilbert Strang; I feel kinda dumb after listening to it. Will try again.
@brendawilliams8062 3 years ago
It’s nice you got it on a linear line.
@meow75714 4 years ago
Wow, beautiful. Now I see why it oscillates.
@antaresd1 4 years ago
Crystal clear! Thank you very much for sharing it
@Schweini8 11 months ago
Why is it enough to assume x follows an eigenvector to demonstrate the rate of convergence?
@alessandromarialaspina9997 2 years ago
Can this procedure be extended to problems in more dimensions? So a, b, c, and d are not scalars but vectors themselves, representing the inputs x1, x2, x3 to a function f(x1, x2, x3). How would you form R in that case, and would you have different condition numbers for each element of b?
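
A hedged numerical sketch of that generalization (my own example, not from the lecture): in n dimensions the scalar coefficients become vectors with one entry per eigenvector of S, each entry following its own 2-by-2 recursion, but s and beta are still picked once from the extreme eigenvalues, so a single condition number lambda_max/lambda_min governs the whole problem rather than one per component.

import numpy as np

# Hypothetical 3-D example: momentum on f(x) = 0.5 * x^T S x in R^3.
# The matrix S below is just an illustrative symmetric positive definite choice.
S = np.array([[1.0, 0.2, 0.0],
              [0.2, 0.5, 0.1],
              [0.0, 0.1, 0.1]])
lams, Q = np.linalg.eigh(S)                     # eigenvalues (ascending) and eigenvectors
m, M = lams[0], lams[-1]
s    = (2.0 / (np.sqrt(M) + np.sqrt(m)))**2     # one step size for the whole problem
beta = ((np.sqrt(M) - np.sqrt(m)) / (np.sqrt(M) + np.sqrt(m)))**2

x = np.array([1.0, 1.0, 1.0])
x_prev = x.copy()
for k in range(1, 151):
    x, x_prev = x - s * (S @ x) + beta * (x - x_prev), x
    if k % 50 == 0:
        # coefficients of x along each eigenvector: all of them decay toward zero
        print(k, Q.T @ x)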
@RLDacademyGATEeceAndAdvanced 2 years ago
Excellent lecture
@itay4178 5 years ago
Such a great lecturer. Thank you!
@vishalpoddar 4 years ago
Why do we need to make the eigenvector as small as possible?
@samymohammed596 4 years ago
You mean why are we trying to make the eigenvalue as small as possible? I am also wondering the same... if we make the eigenvalues of R small, then R^k --> 0 as k --> infinity and you end up with c_k, d_k --> 0, and what good is that? I am surely missing a few parts of this story...
@0ScarletBlood0 4 years ago
@samymohammed596 1) If, on the contrary, the powers of R were increasing, the new values of c_k, d_k would increase with them, meaning that x_k = c_k*q would never settle at the minimum of the function but diverge from it. 2) You do want the value of d_k to approach zero, since z_k = d_k*q = 0 then makes x_(k+1) = x_k, so the point of convergence is found at the minimum of the function. It's true that R^k --> 0 as k --> infinity, but we are not computing these values that many times! Taking this into account, R^k*[c_k, d_k] is not [0, 0].
@samymohammed596 4 years ago
@0ScarletBlood0 Ah, of course you are right about wanting d_k = 0! :):) Thanks for making that point clear! I certainly see the issue with the powers of R increasing and that causing immediate divergence. Yes, better for the eigenvalues to be < 1 in magnitude, because then at least you don't start off with divergence... But then you might hit zero... I guess you need a little skill to pick the parameters s, beta so that your problem is well defined and you reach convergence (d_k = 0) before the powers of R run away and make the whole thing zero! Just my 2 cents... but thanks very much for your reply!
@ky8920 3 years ago
@samymohammed596 That matrix has full rank as long as β ≠ 0.
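
To make the R^k discussion concrete, here is a small sketch. The entries of R are reconstructed from the momentum update as I understand it from the lecture (x_{k+1} = x_k - s*z_k, z_{k+1} = grad f(x_{k+1}) + beta*z_k), restricted to one eigenvector, so treat the exact form as my assumption. The point is only that with a sensible s and beta the spectral radius of R stays below 1, so R^k --> 0 and (c_k, d_k) --> (0, 0), which is exactly the convergence we want: x_k = c_k*q approaches the minimizer x = 0.

import numpy as np

# Sketch (entries of R are my reconstruction, not copied from the board).
# Along one eigenvector q with S q = lam * q, writing x_k = c_k q and z_k = d_k q
# turns the momentum iteration into (c_{k+1}, d_{k+1}) = R(lam) (c_k, d_k).
m, M = 0.01, 1.0                                # extreme eigenvalues of S
s    = (2.0 / (np.sqrt(M) + np.sqrt(m)))**2
beta = ((np.sqrt(M) - np.sqrt(m)) / (np.sqrt(M) + np.sqrt(m)))**2

def R(lam):
    return np.array([[1.0,            -s],
                     [lam, beta - lam * s]])

for lam in (m, M):
    rho = max(abs(np.linalg.eigvals(R(lam))))   # spectral radius
    print(f"lambda = {lam}: spectral radius of R = {rho:.3f}")  # both below 1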
@brendawilliams8062 3 years ago
All I know is it’s based on symmetry and the remaining 5 will be at the end of the spool.
@ShadowGamer-qy7ls 2 years ago
That guy who is always capturing the photo
@anarbay24 4 years ago
Why is f equal to (1/2) x^T S x? The prof did not explain what S is. Does anyone know what it is?
@sheelaagarwal3392 4 years ago
See Lecture 22 for the definition.
@ky8920 3 years ago
This subchapter is limited to convex functions. Convexity provides a nice property: a local minimum is also the global minimum.
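
A tiny sketch of that setup (the particular S below is an arbitrary symmetric positive definite example of my own, not the one from Lecture 22): f(x) = 0.5 x^T S x is the model quadratic, and its gradient is S x, which is why S drives the whole convergence analysis. The finite-difference check just confirms grad f = S x.

import numpy as np

# Example matrix (my choice): symmetric and positive definite, so f is convex
# with a unique minimum at x = 0.
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
f = lambda x: 0.5 * x @ S @ x

x = np.array([1.0, -2.0])
grad_exact = S @ x                               # gradient of 0.5 x^T S x

eps = 1e-6                                       # central-difference check
grad_numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(2)])
print(grad_exact, grad_numeric)                  # both approximately [0, -5]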
@archibaldgoldking 3 years ago
β is just the momentum :)
@ostrodmit 4 years ago
Would they please stop calling Nesterov's algorithm "descent"? It's not a descent method, as Nesterov himself keeps repeating. Otherwise, a wonderful lecture, and an impressive feat for the lecturer given his age.
@ketan9318 4 years ago
I agree with your point.
@omaribrahim3370 4 years ago
Momentum forsenCD
@naterojas9272 4 years ago
I'm back! 🤓
@murat7456 5 years ago
The boss is 85 years old and his mind is razor-sharp.