[Figure 2(?) caption, damaged in extraction; the surviving fragments mention the vector p_k, the product J(x_k)p_k, and a right figure showing a plane spanned by (text lost).]
The figure shows the objective function f(x) for the problem in Figure 1 with the relevant points marked. In this case, the function value at the new point is higher than the function value at the current point. The full Gauss-Newton step may fail for many reasons, e.g. the solution is far from the initial approximation x_0, or the problem has a large residual at the solution point. The problem in Figure 1 has a large residual and curvature, and the Gauss-Newton assumption holds only close to the solution.
A common modification of the Gauss-Newton method is therefore to compute the next iterate as

x_{k+1} = x_k + α_k p_k,    (13)

where the step length α_k > 0 is chosen as the first element of the sequence 1, 1/2, 1/4, ... such that the new point satisfies the Armijo condition (see e.g. Dennis and Schnabel)

f(x_k + α_k p_k) < f(x_k) + μ α_k F_k^T W J_k p_k,    (14)
[Figure 3 appears here: the solid curve f(x_k + α p_k) and the Armijo lines for several values of μ, with the α axis marked at 0, 1/16, 1/8, 1/4, 1/2, and 1.]
Figure 3: Illustration of the Armijo condition (14) for multiple values of μ, applied to the function, the point x_k, and the direction p_k in Figure 1. For a given value of μ, a step length α_k is accepted if the function value (solid curve) is smaller than the corresponding straight line. The full step length α = 1 would be rejected for all values of μ > 0. The half step α = 0.5 would be accepted for μ = 0 and μ = 0.1 but not for μ = 0.5, which however would accept α_k = 0.25. The solid straight line corresponding to μ = 1 is the tangent line to f(x_k + α p_k) at α = 0.
for a given constant μ, 0 < μ < 1. In this context, p_k is called a search direction, α_k is called a step length, and the algorithm for selecting α_k is called a line search. The Gauss-Newton method with line search is sometimes called the damped Gauss-Newton method.
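As an illustration, a minimal backtracking line search implementing (13) and (14) might look as follows in Python/NumPy. The code is a sketch, not taken from the paper: the function names are invented, and the form of the objective, f(x) = (1/2) F(x)^T W F(x), is inferred from the gradient ∇f(x) = J(x)^T W F(x) quoted in Section 4.

```python
import numpy as np

def armijo_line_search(F, J, W, x, p, mu=0.1, alpha0=1.0, max_halvings=20):
    """Backtracking line search enforcing the Armijo condition (14).

    F, J -- callables returning the residual vector F(x) and Jacobian J(x)
    W    -- symmetric weight matrix
    p    -- search direction (e.g. the Gauss-Newton step)
    mu   -- Armijo constant, 0 < mu < 1

    Tries alpha = 1, 1/2, 1/4, ... and returns the first step length whose
    new point satisfies (14).
    """
    Fx = F(x)
    fx = 0.5 * Fx @ W @ Fx        # f(x) = (1/2) F^T W F (assumed form)
    slope = Fx @ W @ (J(x) @ p)   # F^T W J p = grad f(x)^T p, negative for descent
    alpha = alpha0
    for _ in range(max_halvings):
        Fn = F(x + alpha * p)
        if 0.5 * Fn @ W @ Fn < fx + mu * alpha * slope:   # condition (14)
            return alpha
        alpha *= 0.5
    raise RuntimeError("no acceptable step length found")
```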
3.2 Geometrical interpretation
The interpretation of the damped Gauss-Newton method is that we search along the search direction p_k to find out how large a step α_k along p_k we should take in order to get to a better point x_{k+1} than x_k. Far from the solution, the line search algorithm works as a "safeguard", making sure that we do not take a long step without getting a corresponding reduction in the size of the residual.
Figure 4: Geometry for a non-optimal point (left) and an optimal point (right). At the optimal point, the negative residual −F_k is orthogonal to the tangent plane and the projected residual J_k p_k vanishes. Near the optimal point, the angle θ_k between the vectors −F_k and J_k p_k approaches π/2 and cos θ_k = ||J_k p_k||/||F_k|| approaches zero. For illustrative purposes, L = I is assumed.
Near the solution, the line search will not interfere unless the problem is ill-behaved in some sense, e.g. if the Jacobian is rank-deficient (the problem is over-parameterized) or ill-conditioned (high uncertainty in some directions) near the solution, or if the problem has a large residual and curvature. In both cases, it is not uncommon that taking full steps α_k = 1 will result in successive search directions p_k and p_{k+1} that are almost opposite, and we get oscillations in one or more of the parameters. With a line search algorithm, such oscillations will be reduced.
The Armijo condition applied to the problem in Figure 1 is
illustrated in Figure 3.
3.3 Computational overhead
If the fitting problem is well-behaved, the line search mod-
ification does not introduce any overhead since the full step
length is always tried first. Additional residual calculations
are only needed if the full step length is not acceptable.
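For instance, counting residual evaluations in the hypothetical sketch above makes the cost pattern visible. Note that in a real implementation f(x_k) is already available from the previous iteration, so an accepted full step adds only the single trial evaluation of F:

```python
import numpy as np

# Hypothetical check of the overhead claim, using the armijo_line_search
# sketch above on a toy linear problem where the full step is exact.
calls = {"n": 0}

def F(x):
    calls["n"] += 1   # count residual evaluations
    return np.array([x[0] - 1.0, 2.0 * (x[1] + 0.5)])

J = lambda x: np.array([[1.0, 0.0], [0.0, 2.0]])
W = np.eye(2)
x = np.array([0.0, 0.0])
p = np.array([1.0, -0.5])   # the Gauss-Newton step for this problem

print(armijo_line_search(F, J, W, x, p))  # 1.0 -- full step accepted first try
print(calls["n"])   # 2 -- f(x_k) plus the single full-step trial
```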
3.4 Other modifications
There exist a number of modifications of the Gauss-Newton method, such as global regularization (Eriksson, 1996; Eriksson et al., 1998); local regularization, i.e. the Levenberg-Marquardt method (see e.g. (Nash and Sofer, 1996, chapters 10 and 13)); methods using second order information, see e.g. (Söderkvist, 1993; Gulliksson and Söderkvist, 1995; Gulliksson et al., 1997); and subspace minimization, see e.g. (Eriksson, 1996).
What is appealing about the line search modification is that it is computationally cheap and retains the statistical properties of the problem. Second order methods and subspace minimization are computationally expensive, and regularization methods introduce a bias-variance trade-off that alters the statistical properties.
4 TERMINATION CRITERIA
It is possible to derive a problem-independent termination criterion for the Gauss-Newton method. At the solution, the gradient ∇f(x) = J(x)^T W F(x) will be zero. If
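A minimal sketch of one such test, built on the scale-free quantity cos θ_k = ||J_k p_k||/||F_k|| from Figure 4 rather than on the (truncated) criterion itself; the function name, the tolerance, and the assumptions W = I, L = I are mine, not the paper's:

```python
import numpy as np

def small_projected_residual(Fx, Jx, p, tol=1e-8):
    """Sketch of a scale-free termination test (tolerance and form assumed):
    stop when cos(theta_k) = ||J_k p_k|| / ||F_k|| is small, i.e. when the
    projected residual J_k p_k is negligible relative to the residual F_k
    (cf. Figure 4, with W = I and L = I).
    """
    nF = np.linalg.norm(Fx)
    if nF == 0.0:   # exact fit: zero residual
        return True
    return np.linalg.norm(Jx @ p) <= tol * nF
```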