EXPLANATION FILE FOR PROGRAM STEEPDS
====================================
Optimization by Steepest Descent
--------------------------------
1. Introduction
------------
In this chapter, we will briefly examine how the maxima and minima of many
functions can be found by using the method of steepest descent. The ability
to determine the extremes of functions is essential to the process of
optimization. For example, given a mathematical model for production costs and
sales, the analyst can, in principle, maximize the return on investment by
establishing the trade-off between advertising expenses, production run
length, and so on. Under this criterion (ROI), the operation would be
considered optimized.
Another example is related to the adjustment of the properties of materials.
By establishing how the individual chemical and mechanical properties depend
on a particular additive, and by placing relative values on those properties
(i.e., weighting), an optimum composition for a particular task can be
discovered.
There are several key elements in the optimization procedure. First, an
objective function must be established. This is the mathematical construction
to be maximized or minimized. In many cases, the most difficult part of the
entire problem is the formulation of this expression. How does one include
subjective criteria such as color and safety in an equation that is also meant
to involve strength and cost? In any case, after assuming that such a function
exists, the next step is to choose a method for solution.
If there is only one parameter to be manipulated, then the problem is
reasonably simple. The derivative of the function can be calculated (or
numerically estimated), and one of the root-seeking algorithms applied. If the
function contains several parameters to be optimized across, however, the
problem becomes much more difficult, and is the subject of many texts. In this
chapter, only one of the most elementary algorithms will be presented.
The objective function is z(x,y). The goal is to find xm and ym so that z is
a maximum.
The issue involves two simple questions:
1) Given that we are at position (x,y), which way to the maximum?
2) Given that we know which way, how far should we go?
The first question is usually relatively easy to answer. The answer to the
second question is actually what determines the success of the algorithm. We
will start with the first question--the direction.
Clearly it would not be productive to change positions in such a manner that
the movement was only along a z = constant contour. Rather, the best direction
of travel is that in which the contour values change most quickly. If we
define the unit vectors in the x and y directions as i and j, respectively, it
can be shown in vector notation that the desired direction is

    g = GRAD[z(x,y)] = (dz/dx) i + (dz/dy) j                    (8.1.1)

If we further define the (x,y) position as r = x i + y j, then the suggested
iteration equation is

    r(i+1) = r(i) + c(i) g(i)                                   (8.1.2)
This vector equation simply states that the new position r(i+1) is obtained
by starting at the most recent location r(i), and moving some distance in the
direction of the gradient. The question now is the distance, or the value, of
c. That will be considered in the next section.
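As a concrete illustration of iteration (8.1.2), the following is a minimal
Python sketch (not the STEEPDS source). The objective z(x,y) = -(x-1)² - (y+2)²
is an assumed example with its maximum at (1, -2), and the step coefficient c
is held fixed:

```python
def grad(x, y):
    """Gradient of the assumed objective z(x, y) = -(x - 1)**2 - (y + 2)**2."""
    return (-2.0 * (x - 1.0), -2.0 * (y + 2.0))

# Iterate r(i+1) = r(i) + c * g(i) with a fixed step coefficient c (eq. 8.1.2).
x, y, c = 0.0, 0.0, 0.1
for _ in range(100):
    gx, gy = grad(x, y)
    x, y = x + c * gx, y + c * gy

print(round(x, 4), round(y, 4))   # converges toward the maximum at (1, -2)
```

A fixed c works here only because the example surface is well behaved; choosing
c adaptively is exactly the problem taken up in the next section.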
It is beyond the scope of this chapter to present subroutines for advanced
optimization techniques. However, two of the classic approaches should at
least be mentioned: the generalized Newton-Raphson method, and the Marquardt
composite algorithm. As you will see, they are direct extensions of the
steepest-descent concept.
The generalized Newton-Raphson iteration scheme can be written in vector
notation as

    r(i+1) = r(i) + H GRAD(z) = r(i) + H g                      (8.1.3)

In this notation, H is the inverse of the Hessian matrix of second-order
partial derivatives. For two dimensions, this matrix is

        | d²z/dx²    d²z/dxdy | -1
    H = |                     |
        | d²z/dydx   d²z/dy²  |
Observe that the Hessian matrix automatically answers the question of
distance.
The method of steepest descent can be combined with the generalized Newton-
Raphson form to give Marquardt's algorithm:

    r(i+1) = r(i) + [cI + H] g(i)                               (8.1.5)

In this equation, I is the identity matrix, which in two dimensions is

        | 1   0 |
    I = |       |
        | 0   1 |
The Marquardt procedure combines the directional properties of the steepest-
descent procedure (which are useful far from the optimum) with the rapid
convergence of Newton-Raphson iteration near the peak. However, the Hessian
matrix calculation requires you to provide either analytical forms for the
second-order partial derivatives, or finite difference approximations. In
either case, the procedure can be both slow and subject to considerable
round-off error. We will therefore limit our attention to the more elementary
form of steepest descent.
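To make the Newton-Raphson idea concrete, here is a hypothetical sketch of a
single generalized Newton step in two dimensions. It is written with the
conventional minus sign (r_new = r - H_inv g, the inverse Hessian applied to
the gradient); the quadratic objective z = -x²/2 - y², its gradient, and its
Hessian are all assumed examples, chosen so that one step lands exactly on the
maximum at (0, 0):

```python
def newton_step(r, grad_f, hess_f):
    """One generalized Newton-Raphson step in 2D: r_new = r - H^{-1} g."""
    gx, gy = grad_f(*r)
    (a, b), (c, d) = hess_f(*r)
    det = a * d - b * c
    # Solve (Hessian) * s = g by Cramer's rule, then step against s.
    sx = (d * gx - b * gy) / det
    sy = (a * gy - c * gx) / det
    return (r[0] - sx, r[1] - sy)

# Assumed example: z(x, y) = -x**2/2 - y**2, maximum at (0, 0).
grad_f = lambda x, y: (-x, -2.0 * y)
hess_f = lambda x, y: ((-1.0, 0.0), (0.0, -2.0))
print(newton_step((3.0, -5.0), grad_f, hess_f))  # one step reaches (0.0, 0.0)
```

For a quadratic objective the Hessian is constant and Newton-Raphson converges
in one step; the price, as noted above, is supplying second derivatives.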
2. Steepest Descent with Functional Derivatives
--------------------------------------------
Although the gradient supplies information on the preferred direction of
travel, its magnitude is of no help in determining the length of the step.
Therefore, it is appropriate to normalize the gradient vector and restate the
iteration equation as

    r(i+1) = r(i) + k g / |g|                                   (8.2.1)

In two dimensions, the vector equation is equivalent to

    x(i+1) = x(i) + k (dz/dx) / [(dz/dx)² + (dz/dy)²]^(1/2)

    y(i+1) = y(i) + k (dz/dy) / [(dz/dx)² + (dz/dy)²]^(1/2)

We will now concentrate on a means for choosing a value for k at each step in
the iteration.
Consider the one-dimensional situation: three sequential evaluations using
the same k value were made, and the sequence y1 < y2 < y3 is observed. The
iteration is clearly heading in the right direction. However, the maximum may
be far off, and some form of acceleration should be used in the next step
(e.g., a larger value for k). We will choose the following adjustment:

    If (y3 - y2)/(y2 - y1) > 0 then k --> 1.2 k

However, if y3 < y2, then the maximum has been passed; k is too large. The
reasonable thing to do is to halve k (k --> k/2), not update the positions,
and try again.
The choice of the acceleration (1.2) and deceleration (0.5) parameters is
not completely arbitrary. The 0.5 deceleration factor is based on the
interval-halving algorithm discussed in another chapter. It is possible for
the estimates to bracket the peak, and by reducing k by 50%, the uncertainty
in the location of the optimum is reduced just as with the bisection method.
However, in practical situations this will not occur exclusively, and frequent
increases in k (1.2x) can be expected. Therefore, convergence is usually
slower than with the bisection algorithm. The 1.2 factor is chosen as a
compromise. If it were sqrt(2) or greater, the algorithm could become
unstable. If it were only slightly greater than unity, then the acceleration
process would be inhibited. You may wish to experiment with this choice in the
program Steepds.
The remaining uncertainty is the initial value for k. This is very much
dependent on the specific problem being treated. In general, k should be
roughly a fraction of the size of the error in the initial guess.
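Putting the pieces together, the normalized-gradient iteration (8.2.1) with
the 1.2/0.5 step-size adjustment can be sketched in Python. This is a
re-creation of the idea described above, not the STEEPDS source; the routine
name steep_ascent and its arguments are assumptions. It is demonstrated on
the chapter's example objective y = sin x1 + 2 cos x2 - sin x3:

```python
import math

def steep_ascent(f, grad, x, k=0.5, eps=1e-6, max_steps=10000):
    """Steepest ascent with a normalized gradient step of length k.
    k grows by 1.2 after an improving step and is halved (without
    moving) after an overshoot, as described in the text."""
    y = f(x)
    for _ in range(max_steps):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            break                         # gradient vanished: at an extremum
        trial = [xi + k * gi / norm for xi, gi in zip(x, g)]
        y_trial = f(trial)
        if y_trial > y:                   # still climbing: accept, accelerate
            x, y = trial, y_trial
            k *= 1.2
        else:                             # passed the peak: halve k, stay put
            k *= 0.5
        if k < eps:
            break                         # step size below error criterion
    return x, y

# The demonstration objective: y = sin(x1) + 2 cos(x2) - sin(x3), maximum 4.
f = lambda x: math.sin(x[0]) + 2.0 * math.cos(x[1]) - math.sin(x[2])
grad = lambda x: (math.cos(x[0]), -2.0 * math.sin(x[1]), -math.cos(x[2]))
x, y = steep_ascent(f, grad, [0.5, 0.5, 0.5])
print(round(y, 5))   # close to the true maximum of 4
```

Started near the origin, the iteration homes in on the m1 = m2 = 0, m3 = -1
solution discussed below.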
The above ideas have been incorporated into the subroutine STEEPDS, with a
function evaluation routine for y = sin x1 + 2 cos x2 - sin x3 and its
derivatives, and a demonstration program.
The inputs to STEEPDS are the initial estimates for the parameters, X(Z);
the initial step size, K; the error criterion, E; the number of variables, L;
and the maximum number of steps, M. A sample run of this program is shown. The
object is to optimize y. The maximum value that y can take on is 4. This
occurs at x1 = pi/2 + 2 m1 pi, x2 = 2 m2 pi, and x3 = 3 pi/2 + 2 m3 pi
(m1, m2, m3 = 0, ±1, ±2, ...). The iteration homed in on the values
corresponding to m1 = m2 = 0 and m3 = -1. The calculated results and the
"true" values are shown below:
  Parameter   Calculated Value    True Value    Error
  ---------   ----------------    ----------    -----
     x1          1.5707964         1.5707963     0.0000001
     x2         -0.00062712        0.0          -0.00062712
     x3         -1.5707963        -1.5707963     0.0000000
The calculated and the true values differ because of round-off error. Even
though the procedure is iterative, there are many ways to numerically achieve
the maximum value for y, which is four. This is the same problem as the one we
observed earlier in finding the roots of y = x^5 + 5x^4 + 10x^3 + 10x^2 + 5x
+ 1. There were many values of x near -1 that gave y = 0. In this case, the
values found give 3.9999996 as the maximum, which is very close to the true
value of 4.
There are two ways in which to view this problem. First, if the goal is to
maximize profits, then the result that a small range of xi values lead to the
same optimum is not significant; the maximum value is the goal. Second, if the
objective is to accurately locate the position of the optimum, then a
fundamental difficulty exists. Because the partial derivatives must
necessarily be zero at the optimum, small changes in the parameters have an
even smaller influence on the maximum value calculated for the objective
function. But we are using such changes to locate the optimum! In effect, the
procedure is poorly conditioned near the maximum.
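This poor conditioning is easy to see numerically. For the one-variable
objective sin x (an assumed example) with its maximum at pi/2, a parameter
error of dx changes the function value by only about dx²/2:

```python
import math

x_true = math.pi / 2            # location of the maximum of sin x
for dx in (1e-2, 1e-3, 1e-4):
    drop = math.sin(x_true) - math.sin(x_true + dx)
    print(dx, drop)             # drop is roughly dx**2 / 2
```

A parameter error of 1e-3 changes the objective by only about 5e-7, so the
function value gives almost no leverage for refining the parameter further.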
The low accuracy of the calculated parameters is characteristic of many
optimization procedures (e.g., least-squares fitting of experimental data),
and is usually not a problem in practical situations.
The main precautions with respect to using STEEPDS involve the choices for
the input parameters. Because many functions have several local maxima,
starting the iteration far from the largest maximum can result in convergence
to one of the local extremes. Also, if the initial value for K is too large,
the iteration may show some erratic behavior before closing in on a local
maximum. There is often much trial and error associated with optimization. In
any case, you should be cognizant of the extraneous optima introduced by
round-off error, and of the possibility of local maxima.
STEEPDS naturally seeks the location of the maximum of a function. If the
position of the minimum is desired instead, then a small modification is
required. If the minimum is negative, then STEEPDS should be used with -y
instead of y as the objective function. If the minimum is positive, then the
function to optimize is 1/y. We will examine the latter alteration in another
section.
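As a small illustration of the 1/y transformation (a hypothetical example,
not from the program): the function f(x) = (x - 3)² + 5 has a positive
minimum of 5 at x = 3, so a maximizer applied to g = 1/f locates it:

```python
# f has a positive minimum (5) at x = 3, so we maximize g(x) = 1/f(x).
f = lambda x: (x - 3.0) ** 2 + 5.0
dg = lambda x: -2.0 * (x - 3.0) / f(x) ** 2   # derivative of 1/f

x = 0.0
for _ in range(1000):      # simple fixed-step gradient ascent on 1/f
    x += 1.0 * dg(x)
print(round(x, 4), round(f(x), 4))   # minimum near x = 3, value near 5
```

The -y transformation works the same way: maximizing -f drives the iteration
to the minimum of f regardless of sign, but when the minimum is known to be
positive, 1/y keeps the objective positive as well.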
From [BIBLI 01].
--------------------------------------------
End of file steepds.txt