Excellent video, it really helped me understand the quasi-Newton method, thank you very much!
@hyukppen 3 years ago
Very nice! I have a question that I think you can answer: what is the objective function of SR1? When I solve min ||J_k - J_{k-1}||_F (as you explained at 7:17) subject to the secant equation, I get Broyden's solution. But if I solve min ||J_k - J_{k-1}||_F subject to the secant equation plus the constraints that the update matrix be rank one and symmetric, I get a solution, but it is different from SR1.
@chrisrycroft2010 3 years ago
The motivation behind the SR1 update formula is given in the following article: J. E. Dennis Jr. and Jorge J. Moré, "Quasi-Newton methods, motivation and theory", SIAM Review 19, 46 (1977) (doi.org/10.1137/1019005); see Section 7.1. The SR1 update is the only rank-one choice that satisfies my equation (**) at 11:40 and is also symmetric. Because symmetry is enforced, you get a unique solution without having to impose a minimization condition as in the Broyden derivation.
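For concreteness, here is a minimal numerical sketch of the SR1 update (the standard formula from the literature; the variable names and the safeguard threshold are my own choices, not code from the video):

```python
import numpy as np

def sr1_update(B, s, y):
    """SR1 update: given current approximation B, step s, and difference y,
    the symmetric rank-one update satisfying the secant equation
    B_new @ s = y is B_new = B + (r r^T) / (r^T s), where r = y - B @ s."""
    r = y - B @ s
    denom = r @ s
    # Standard safeguard: skip the update when the denominator is tiny,
    # since the formula then becomes numerically unstable.
    if abs(denom) <= 1e-8 * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

# Quick check that the secant equation holds after the update.
rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3)); B = (B + B.T) / 2   # symmetric start
s, y = rng.standard_normal(3), rng.standard_normal(3)
B_new = sr1_update(B, s, y)
print(np.allclose(B_new @ s, y))   # True
```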
@hyukppen 3 years ago
@@chrisrycroft2010 Thank you for the reply. As you cited, the article doesn't say SR1 minimizes ||J_k - J_{k-1}||_F. I was misled because the introduction of the Wikipedia article (en.wikipedia.org/wiki/Symmetric_rank-one) says "The SR1 formula computes (via an update of rank 1) the symmetric solution that is 'closest' to the current approximate value B_k". Is that statement not true, then?
@chrisrycroft2010 3 years ago
@@hyukppen Yes, I think that statement on Wikipedia is a little misleading. Any symmetric rank-one update can be written as c c^T for some vector c. For a problem in R^n, c has n degrees of freedom, but equation (**) gives you n constraints, so it's not surprising that you get a unique solution. I think the statement on Wikipedia is technically true, because the solution you get will be the "closest" by virtue of being the unique one. But it is misleading about the origins of this update formula and should probably be revised. Furthermore, "closest" is ambiguous in any case, since it depends on the choice of norm (e.g. Frobenius norm, 2-norm).
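To make the counting argument concrete, here is a small sketch (my own illustration): equation (**), (B + c c^T) s = y, i.e. (c^T s) c = y - B s, forces c to be parallel to r = y - B s, which determines the update c c^T completely (c itself only up to a sign, which cancels in the outer product):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3)); B = (B + B.T) / 2   # symmetric B_k
s = rng.standard_normal(3)

# Build y consistent with *some* symmetric rank-one update c_true c_true^T.
c_true = rng.standard_normal(3)
y = B @ s + (c_true @ s) * c_true

# Recover the update from (**) alone: c is parallel to r = y - B s,
# with scale fixed by |c^T s| = sqrt(r^T s).
r = y - B @ s
c = r / np.sqrt(r @ s)   # determined up to sign; the sign cancels in c c^T

print(np.allclose(np.outer(c, c), np.outer(c_true, c_true)))  # True: unique update
print(np.allclose((B + np.outer(c, c)) @ s, y))               # True: secant holds
```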
@hyukppen 3 years ago
@@chrisrycroft2010 Thank you for your kind explanation. I agree with you.
@hyukppen 3 years ago
@@chrisrycroft2010 I'm back! About your comment "For a problem in R^n, c has n degrees of freedom, but equation (**) gives you n constraints, so it's not surprising that you get a unique solution": in my opinion, if the equations are nonlinear in the n variables, we can't conclude uniqueness, can we? For example, x^2 = 1 and y^2 = 1 are two equations with four solutions! And (**) is nonlinear, because if u = [x; y; z], then u u^T = [x^2, xy, xz; yx, y^2, yz; zx, zy, z^2].
@AJ-et3vf 1 year ago
Great video. Thank you
@abdjahdoiahdoai 2 years ago
Well-made videos!
@fabianbirringer2218 3 years ago
Very nice video! Keep up the good work :)
@vivy5815 1 year ago
Excellent video! I have a question: you discussed both Broyden's method and BFGS. Broyden's method requires no explicit knowledge of the gradient (∇F), so it looks simpler when the exact gradient is not available and a numerical gradient is not cheap. My question is: does that mean Broyden's method converges more slowly than BFGS? What are its pros and cons compared to BFGS?
@chrisrycroft2010 1 year ago
Thanks for your comment. The Broyden method and BFGS are connected, but they solve two different problems. For the Broyden method, you are trying to solve a nonlinear system of equations F(x)=0, where x is an n-dimensional vector and F is an n-dimensional function (i.e. F: R^n -> R^n). BFGS, on the other hand, solves the optimization problem of minimizing a function g(x), where x is an n-dimensional vector and g is a scalar function (i.e. g: R^n -> R). To minimize (or maximize) a function we can search for points where grad(g)=0. Thus, if we define F(x)=grad(g(x)), then this optimization problem can be viewed as a root-finding problem for F(x)=grad(g(x))=0. The key point is that F in the Broyden method is roughly equivalent to grad(g) in BFGS. Thus, it isn't the case that BFGS uses more information by using the gradient, because BFGS is trying to set the gradient to zero, whereas Broyden is trying to set F to zero. Overall, the two methods have similar information available, and they converge similarly for their respective tasks. Some of the other videos in Unit 4 of the course talk a bit further about these connections between root-finding and optimization.
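As a concrete illustration of this correspondence, here is a short sketch using SciPy (my own example, not code from the course): minimizing a scalar function g with BFGS and applying Broyden's method to the root-finding problem grad(g)=0 should arrive at the same point.

```python
import numpy as np
from scipy import optimize

# Scalar objective g: R^2 -> R (a simple convex bowl).
def g(x):
    return (x[0] - 1)**2 + 2*(x[1] + 0.5)**2 + x[0]**4

# Its gradient, F = grad(g): R^2 -> R^2.
def grad_g(x):
    return np.array([2*(x[0] - 1) + 4*x[0]**3,
                     4*(x[1] + 0.5)])

x0 = np.zeros(2)

# BFGS: minimize g directly (it drives grad(g) to zero internally).
res_bfgs = optimize.minimize(g, x0, method='BFGS')

# Broyden: solve the root-finding problem F(x) = grad(g(x)) = 0.
x_broyden = optimize.broyden1(grad_g, x0)

print(res_bfgs.x, x_broyden)   # both approximate the same minimizer
```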