Harvard AM205 video 4.9 - Quasi-Newton methods

16,112 views

Chris Rycroft

Comments: 14
@dwyanechou4061 · 2 years ago
Excellent video, it really helped me understand the quasi-Newton method. Thank you very much!
@hyukppen · 3 years ago
Very nice! I have a question that I think you can answer: what is the objective function of SR1? When I solve min ||J_k - J_{k-1}||_F subject to the secant equation (as you explained at 7:17), I get Broyden's solution. But if I solve min ||J_k - J_{k-1}||_F subject to the secant equation plus the constraints that the update matrix is rank one and symmetric, I get a solution, but it is different from SR1.
@chrisrycroft2010 · 3 years ago
The motivation behind the SR1 update formula is given in the following article: J. E. Dennis Jr. and Jorge J. Moré, "Quasi-Newton methods, motivation and theory," SIAM Review 19, 46 (1977), doi.org/10.1137/1019005; see Section 7.1. The SR1 update is the only rank-one choice that satisfies my equation (**) at 11:40 and is symmetric. Because symmetry is enforced, you get a unique solution without having to impose a minimization condition as in the Broyden derivation.
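(A minimal numerical sketch of this point, assuming the standard SR1 update B_{k+1} = B_k + r r^T / (r^T s) with r = y - B_k s; the random test data and variable names below are illustrative, not taken from the video.)

import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
B = (B + B.T) / 2                      # current symmetric approximation B_k
s = rng.standard_normal(n)             # step s_k
y = rng.standard_normal(n)             # change in gradient y_k

r = y - B @ s                          # residual of the secant equation
B_new = B + np.outer(r, r) / (r @ s)   # SR1: a symmetric rank-one correction

print(np.allclose(B_new @ s, y))       # secant equation B_{k+1} s = y holds: True
print(np.allclose(B_new, B_new.T))     # symmetry is preserved: True
print(np.linalg.matrix_rank(B_new - B))  # the correction has rank one: 1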
@hyukppen · 3 years ago
@@chrisrycroft2010 Thank you for the reply. As you cited, the article doesn't say that SR1 minimizes ||J_k - J_{k-1}||_F. I misunderstood because the introduction of the Wikipedia article (en.wikipedia.org/wiki/Symmetric_rank-one) says: "The SR1 formula computes (via an update of rank 1) the symmetric solution that is 'closest' to the current approximate value B_k." Is that statement then not true?
@chrisrycroft2010 · 3 years ago
@@hyukppen Yes, I think that statement on Wikipedia is a little misleading. Any symmetric rank-one update can be written as ±c c^T for some vector c. For a problem in R^n, c has n degrees of freedom, but equation (**) gives you n constraints, so it's not surprising that you get a unique solution. I think the Wikipedia statement is technically true, because the solution you get will be the "closest" by virtue of being the unique one. But it is misleading about the origins of this update formula and should probably be revised. In any case, "closest" is ambiguous, since it depends on the choice of norm (e.g. Frobenius norm, 2-norm).
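(To make the counting argument concrete, here is a short derivation sketch; B_k, s_k, y_k denote the current approximation, the step, and the change in gradient, and r_k := y_k - B_k s_k is assumed to satisfy r_k^T s_k ≠ 0. These symbols are editorial shorthand, not taken from the video.)

\begin{align*}
(B_k + \sigma\, c c^T)\, s_k = y_k,\quad \sigma = \pm 1
&\;\Longrightarrow\; \sigma\, c\,(c^T s_k) = y_k - B_k s_k =: r_k \\
&\;\Longrightarrow\; c = \alpha\, r_k \quad\text{with } \sigma\,\alpha^2\,(r_k^T s_k) = 1 \\
&\;\Longrightarrow\; \sigma\, c c^T = \frac{r_k r_k^T}{r_k^T s_k}.
\end{align*}

So c itself is determined only up to sign, but every solution produces the same matrix, which is the SR1 update.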
@hyukppen · 3 years ago
@@chrisrycroft2010 Thank you for your kind explanation. I agree with you.
@hyukppen · 3 years ago
@@chrisrycroft2010 I'm back. About your comment "For a problem in R^n, c will have n degrees of freedom, but equation (**) gives you n constraints, so it's not surprising that you get a unique solution": in my opinion, if the equations are nonlinear in the n variables, then we can't conclude uniqueness, can we? For example, x^2 = 1 and y^2 = 1 are two equations with four solutions! And (**) is nonlinear, because if u = [x; y; z], then u u^T = [x^2, xy, xz; yx, y^2, yz; zx, zy, z^2].
@AJ-et3vf · a year ago
Great video. Thank you!
@abdjahdoiahdoai · 2 years ago
Well-made videos!
@fabianbirringer2218 · 3 years ago
Very nice video! Keep up the good work :)
@vivy5815 · a year ago
Excellent video! I have a question. You discussed both Broyden's method and BFGS. Broyden's method requires no explicit knowledge of the gradient (∇F), so it looks simpler when the exact gradient is not available and a numerical gradient is not cheap. Does this mean that Broyden's method converges more slowly than BFGS? Or what are its pros and cons compared to BFGS?
@chrisrycroft2010 · a year ago
Thanks for your comment. The Broyden method and BFGS are connected, but they solve two different problems. For the Broyden method, you are trying to solve a nonlinear system of equations F(x)=0, where x is an n-dimensional vector and F is an n-dimensional function (i.e. F: R^n -> R^n). BFGS, on the other hand, solves the optimization problem of minimizing a function g(x), where x is an n-dimensional vector and g is a scalar function (i.e. g: R^n -> R).

To minimize (or maximize) a function we can search for points where grad(g)=0. Thus if we define F(x)=grad(g(x)), then this optimization problem can be viewed as a root-finding problem, for finding F(x)=grad(g(x))=0. The key point is that F in the Broyden method is roughly equivalent to grad(g) in BFGS. Thus it isn't the case that BFGS uses more information by using the gradient, because BFGS is trying to set the gradient to zero, whereas Broyden is trying to set F to zero. Overall, the two methods have similar information available, and they converge similarly for their respective tasks. Some of the other videos in Unit 4 of the course talk a bit further about these connections between root-finding and optimization.
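(A minimal sketch of this correspondence using SciPy; the quadratic test function g below is illustrative, not taken from the video.)

import numpy as np
from scipy.optimize import broyden1, minimize

def g(x):
    # Illustrative scalar objective g: R^2 -> R, with minimizer (1, -0.5).
    return (x[0] - 1.0)**2 + 2.0 * (x[1] + 0.5)**2

def grad_g(x):
    # Gradient of g, viewed as a vector function F: R^2 -> R^2 for root-finding.
    return np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])

x0 = np.zeros(2)

# BFGS minimizes g directly, building up an approximation to the (inverse) Hessian.
res_bfgs = minimize(g, x0, method="BFGS")

# Broyden's method solves F(x) = grad_g(x) = 0, building up an approximate Jacobian.
x_broyden = broyden1(grad_g, x0)

print(res_bfgs.x)   # ~ [1.0, -0.5]
print(x_broyden)    # ~ [1.0, -0.5]: the same point, found as a root of the gradient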
@hongkyulee9724 · a year ago
Thank you for the nice lecture. :D
@georgesms · a year ago
WOW! Finally I get it.