increases as we move to the right and decreases as we move to the left. This means
f′(x − ε) < 0 and f′(x + ε) > 0 for small enough ε. In other words, as we move
right, the slope begins to point uphill to the right, and as we move left, the slope
begins to point uphill to the left. Thus, when f′(x) = 0 and f″(x) > 0, we can
conclude that x is a local minimum. Similarly, when f′(x) = 0 and f″(x) < 0, we
can conclude that x is a local maximum. This is known as the second derivative
test. Unfortunately, when f″(x) = 0, the test is inconclusive. In this case x may
be a saddle point or a part of a flat region.
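The test is straightforward to check numerically. The sketch below is illustrative only (the finite-difference step eps and the tolerance tol are arbitrary choices, not values from the text); it approximates f′ and f″ with central differences and reports the conclusion of the second derivative test.

```python
import numpy as np

def classify_critical_point_1d(f, x, eps=1e-5, tol=1e-8):
    """Apply the second derivative test at x using central finite differences."""
    # f'(x) ~ (f(x + eps) - f(x - eps)) / (2 eps)
    first = (f(x + eps) - f(x - eps)) / (2 * eps)
    # f''(x) ~ (f(x + eps) - 2 f(x) + f(x - eps)) / eps^2
    second = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps**2
    if abs(first) > tol:
        return "not a critical point"
    if second > tol:
        return "local minimum"
    if second < -tol:
        return "local maximum"
    return "inconclusive (saddle point or flat region)"

print(classify_critical_point_1d(lambda x: x**2, 0.0))   # local minimum
print(classify_critical_point_1d(lambda x: -x**2, 0.0))  # local maximum
print(classify_critical_point_1d(lambda x: x**3, 0.0))   # inconclusive
```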
In multiple dimensions, we need to examine all the second derivatives of the
function. Using the eigendecomposition of the Hessian matrix, we can generalize
the second derivative test to multiple dimensions. At a critical point, where
∇_x f(x) = 0, we can examine the eigenvalues of the Hessian to determine whether
the critical point is a local maximum, local minimum, or saddle point. When the
Hessian is positive definite (all its eigenvalues are positive), the point is a local
minimum. This can be seen by observing that the directional second derivative
in any direction must be positive, and making reference to the univariate second
derivative test. Likewise, when the Hessian is negative definite (all its eigenvalues
are negative), the point is a local maximum. In multiple dimensions, it is actually
possible to find positive evidence of saddle points in some cases. When at least
one eigenvalue is positive and at least one eigenvalue is negative, we know that
x
is a local maximum on one cross section of
f
but a local minimum on another
cross section. See figure 4.5 for an example. Finally, the multidimensional second
derivative test can be inconclusive, just as the univariate version can. The test
is inconclusive whenever all the nonzero eigenvalues have the same sign but at
least one eigenvalue is zero. This is because the univariate second derivative test is
inconclusive in the cross section corresponding to the zero eigenvalue.
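As a concrete illustration (a sketch, not code from the text), the multidimensional test can be carried out by inspecting the eigenvalues of the Hessian at a critical point; the tolerance tol below is an arbitrary threshold for treating an eigenvalue as numerically zero.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point (where the gradient is zero) from its Hessian."""
    eigvals = np.linalg.eigvalsh(hessian)   # the Hessian is symmetric, so use eigvalsh
    if np.all(eigvals > tol):
        return "local minimum"               # positive definite
    if np.all(eigvals < -tol):
        return "local maximum"               # negative definite
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle point"                # at least one positive and one negative eigenvalue
    return "inconclusive"                    # some eigenvalue is (numerically) zero

# f(x) = x1^2 + x2^2 has Hessian 2I at its critical point (0, 0): a local minimum.
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))
# f(x) = x1^2 - x2^2 has Hessian diag(2, -2) at (0, 0): a saddle point.
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))
```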
In multiple dimensions, there is a different second derivative for each direction
at a single point. The condition number of the Hessian at this point measures
how much the second derivatives differ from each other. When the Hessian has a
poor condition number, gradient descent performs poorly. This is because in one
direction, the derivative increases rapidly, while in another direction, it increases
slowly. Gradient descent is unaware of this change in the derivative, so it does not
know that it needs to explore preferentially in the direction where the derivative
remains negative for longer. A poor condition number also makes it difficult to
choose a good step size. The step size must be small enough to avoid overshooting
the minimum and going uphill in directions with strong positive curvature. This
usually means that the step size is too small to make significant progress in other
directions with less curvature. See figure 4.6 for an example.
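A tiny numerical illustration (hypothetical numbers, not the example of figure 4.6): on a quadratic whose Hessian has condition number 1000, any step size small enough to be stable in the high-curvature direction makes progress in the low-curvature direction very slow.

```python
import numpy as np

# Quadratic bowl f(x) = 1/2 x^T H x with curvatures 1 and 1000, so the Hessian's
# condition number is 1000. (Hypothetical values chosen for illustration.)
H = np.diag([1.0, 1000.0])
grad = lambda x: H @ x          # gradient of the quadratic

x = np.array([10.0, 10.0])
lr = 0.0019                     # any step size above 2/1000 would overshoot and diverge
for _ in range(50):             # 50 gradient descent steps
    x = x - lr * grad(x)

print(x)  # the high-curvature coordinate is near 0; the low-curvature one has barely moved
```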