Harvard AM205 video 1.8 - Nonlinear least squares

10,428 views

Chris Rycroft


Harvard Applied Math 205 is a graduate-level course on scientific computing and numerical methods. This video introduces nonlinear least squares problems, generalizing the methods considered in previous videos to least squares problems with nonlinear dependence on the parameters, and presents the Gauss-Newton and Levenberg-Marquardt methods.
For more information, see the main course website at people.math.wisc.edu/~chr/am205
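A brief sketch of the two methods in generic notation (not copied from the slides): with residual vector r(b), Jacobian J(b) with entries J_{ij} = \partial r_i / \partial b_j, and objective \phi(b) = \|r(b)\|_2^2, Gauss-Newton solves J_k^T J_k \, \delta_k = -J_k^T r(b_k) and sets b_{k+1} = b_k + \delta_k, while Levenberg-Marquardt adds damping and solves (J_k^T J_k + \mu_k I) \, \delta_k = -J_k^T r(b_k) with \mu_k \ge 0.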

Comments: 12
@NedSar85 · a year ago
I came looking for copper and found gold! Thanks.
@grinfacelaxu · a month ago
Great! Thank you!!
@mohamed-mkh15 · 2 years ago
This series is really helpful. Thanks a lot for your efforts, Chris.
@chrisrycroft2010 · 2 years ago
Glad to hear you like the videos!
@EladM8a · 2 years ago
@Chris Rycroft At 3:29, the fact that r(a+b) does not equal r(a) + r(b) is also true for a standard linear LS problem of the form Ax ~ b: the residual is b - Ax, which is an affine (not linear) function of x. Please correct me if I am wrong.
@chrisrycroft2010 · 2 years ago
Yes, this is a good point. The statement in the video is technically correct, but it is not a precise statement of the difference between linear and nonlinear least squares. I think that, in general, linear least squares refers to cases where the function you are fitting is linear in the parameters, i.e. y(x; gamma*a + sigma*b) = gamma*y(x; a) + sigma*y(x; b). Anything of this form will then give you a linear system to solve for the parameters.
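For instance (an illustration, not taken from the video): fitting y(x; a) = a_0 + a_1 x + a_2 x^2 is a linear least squares problem, since y depends linearly on (a_0, a_1, a_2) even though it is nonlinear in x, whereas fitting y(x; a) = a_0 exp(a_1 x) is a nonlinear least squares problem because of the dependence on a_1.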
@MrBelleroche · 5 months ago
Thank you for putting this on the web, Chris, a beautiful publicly available resource! Quick question: is the beacon example really a good example of the LM algorithm? We are feeding scipy.root with a gradient, whereas the preceding material points to using the residuals (r) and the Jacobian of the residuals (J) separately, and using those in a simple damped normal equations problem, i.e. solving (J'J + mu I) delta = -J'r. If you feed the gradient (J'r) directly to scipy.root then that is obviously fine in that it is correct, but I can't see how the LM algorithm works for root-finding problems, i.e. how does it separate out J again to solve a set of normal equations? Perhaps a more appropriate example, given the context, would be to feed r to scipy.least_squares instead of grad_r to scipy.root?
@chrisrycroft2010 · 5 months ago
You're right: this is not an ideal computer demo, and I always found it awkward when I lectured on this. It solves the mathematical problem, but it glosses over the implementation details. This largely arose due to the structure of the lecture-based course: nonlinear least squares belongs mathematically in Unit 1, but I didn't want to get too much into these types of algorithms until Unit 4. If I end up creating online-only material in the future, where the timing of each topic is more flexible, then I will look at revising this.
@MrBelleroche · 5 months ago
@@chrisrycroft2010 Thanks for your quick reply and confirmation. I just hacked together the algorithm in JavaScript as an exercise, so knowing all the nitty-gritty implementation details under the hood became important. I wonder why scipy.root even has an "lm" option and what it does with it.
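A minimal sketch of the alternative suggested in this thread: feed the residual vector directly to scipy.optimize.least_squares with method='lm'. The beacon locations, true transmitter position, noise level, and starting guess are taken from the beacon demo discussed below; the Gaussian noise and random seed are assumptions made only for illustration.

import numpy as np
from scipy.optimize import least_squares

# Beacon locations and the true transmitter position from the beacon demo
x_beac = np.array([0.7984, 0.9430, 0.6837, 0.1321, 0.7227, 0.1104, 0.1175, 0.6407, 0.3288, 0.6538])
y_beac = np.array([0.7491, 0.5832, 0.7400, 0.2348, 0.7350, 0.9706, 0.8669, 0.0862, 0.3664, 0.3692])
b_true = np.array([0.7, 0.37])

# Simulated distance measurements with a little Gaussian noise (illustrative only)
rng = np.random.default_rng(1)
y = np.hypot(b_true[0] - x_beac, b_true[1] - y_beac) + 0.05 * rng.standard_normal(10)

def residuals(b):
    # r_i(b) = distance from candidate location b to beacon i, minus measurement y_i
    return np.hypot(b[0] - x_beac, b[1] - y_beac) - y

# least_squares forms the Jacobian internally (finite differences here) and runs LM on r
sol = least_squares(residuals, x0=np.array([0.4, 0.9]), method='lm')
print("Estimated location:", sol.x)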
@svrusti · 3 years ago
Hi Chris, I have 2 questions. 1 - In the function grad_phi, what is the logic behind calculating f0/f1 (f0 += 2*dx - 2*y[i]*dx*d)? 2 - In the whole code, why are we looping 10 times to calculate dx, dy? Can't we create x_beac/y_beac as np.array and perform the calculation without looping? Thank you.
@chrisrycroft2010 · 3 years ago
1. In this example, phi = ||r(b)||^2_2 and r(b) is a 10-vector with components r_i(b) = sqrt(dx*dx+dy*dy) - y[i], where dx = x[0] - x_beac[i] and dy = x[1] - y_beac[i]. Here b = (b_0, b_1) is written in code as (x[0], x[1]), since Python's root function typically works with an x vector. The function grad_phi evaluates the partial derivatives of phi with respect to the components of b. Hence, as an example, you need to calculate the partial derivative d/db_0 of r_i(b)^2, which is 2 r_i(b) [d/db_0 r_i(b)]. The partial derivative evaluates to d/db_0 r_i(b) = (b_0 - x_beac[i]) / sqrt((b_0 - x_beac[i])^2 + (b_1 - y_beac[i])^2), which is equal to dx/sqrt(dx*dx+dy*dy) in the code definitions. In addition, r_i(b) = sqrt(dx*dx+dy*dy) - y[i] in the code definitions, and therefore 2 r_i(b) [d/db_0 r_i(b)] = 2*dx - 2*y[i]*dx/sqrt(dx*dx+dy*dy) = 2*dx - 2*y[i]*dx*d, where d = 1/sqrt(dx*dx+dy*dy). You can do a similar calculation for d/db_1.

2. It's possible that you could write some parts of this using np.arrays, but since there are some nonlinearities here (e.g. via the factor of 1/sqrt(dx*dx+dy*dy)) it will never be as straightforward as just multiplying matrices together, as you would get for linear least squares. In many situations like this in the AM205 examples, my style is usually to write out calculations explicitly with loops.
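A minimal loop-based sketch of grad_phi following the derivation above, in the explicit-loop style described; it assumes the beacon arrays x_beac, y_beac and the measured distances y are already defined as in the demo.

import numpy as np

def grad_phi(x):
    f0 = f1 = 0.0
    for i in range(10):
        dx = x[0] - x_beac[i]
        dy = x[1] - y_beac[i]
        d = 1.0 / np.sqrt(dx*dx + dy*dy)   # 1 / distance from x to beacon i
        f0 += 2*dx - 2*y[i]*dx*d           # contribution of beacon i to d(phi)/d(b_0)
        f1 += 2*dy - 2*y[i]*dy*d           # contribution of beacon i to d(phi)/d(b_1)
    return np.array([f0, f1])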
@svrusti · 3 years ago
@@chrisrycroft2010 Hi Chris, thank you for the detailed explanation on point 1. I initially did not understand the part where you squared r(b); you explain that in Unit 1, page 63 ("since the minimizing b is the same for r(b) and r(b)^2"). For point 2, as you said, I could do some parts using np.array and it works. I am enjoying the lecture series! Thank you so much for uploading.

# --Code------
import numpy as np
from scipy.optimize import root

# Initialize
bx_t = 0.7                      # Transmitter's true x location
by_t = 0.37                     # Transmitter's true y location
b_init = np.array([0.4, 0.9])   # Initial guess for the iteration
noise_level = 0.05              # Noise value

# Define the beacon locations (randomly located in the unit square)
x_beac = [0.7984, 0.9430, 0.6837, 0.1321, 0.7227, 0.1104, 0.1175, 0.6407, 0.3288, 0.6538]
y_beac = [0.7491, 0.5832, 0.7400, 0.2348, 0.7350, 0.9706, 0.8669, 0.0862, 0.3664, 0.3692]

# Convert x_beac, y_beac to np.array
xb = np.array(x_beac, dtype=np.float64)
yb = np.array(y_beac, dtype=np.float64)

# Generate ground truth data: beacon-to-transmitter distances plus a constant
# offset of noise_level (random noise is commented out)
dx = bx_t - xb
dy = by_t - yb
y = np.sqrt(dx*dx + dy*dy) + noise_level  # *random.random()

# Gradient of phi
def grad_phi(x):
    dx = x[0] - xb
    dy = x[1] - yb
    d = 1 / np.sqrt(dx*dx + dy*dy)
    f0 = sum(2*dx - 2*y*dx*d)
    f1 = sum(2*dy - 2*y*dy*d)
    return np.array([f0, f1])

sol = root(grad_phi, b_init, jac=False, method='lm')
print("Predicted location:", sol.x)
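For completeness, a hand-rolled sketch of the damped normal equations step asked about in the earlier scipy.root thread, reusing np, xb, yb, y, and b_init from the code above. The damping parameter mu is held fixed here for simplicity; a full LM implementation adjusts mu depending on whether the step reduces phi.

def residual_and_jacobian(b):
    dx = b[0] - xb
    dy = b[1] - yb
    dist = np.sqrt(dx*dx + dy*dy)
    r = dist - y                               # residual vector, one entry per beacon
    J = np.column_stack((dx/dist, dy/dist))    # 10x2 Jacobian: d r_i / d b_0, d r_i / d b_1
    return r, J

def lm_step(b, mu):
    # Solve the damped normal equations (J'J + mu I) delta = -J'r and take the step
    r, J = residual_and_jacobian(b)
    delta = np.linalg.solve(J.T @ J + mu*np.eye(2), -J.T @ r)
    return b + delta

b = b_init.copy()
for _ in range(20):
    b = lm_step(b, mu=1e-3)
print("Hand-rolled LM estimate:", b)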