EXPLANATION FILE OF MODULE LU
=============================
LU Decomposition
----------------
Suppose we are able to write the matrix A as a product of two matrices,
                            L . U = A                               (2.3.1)
where L is lower triangular (has elements only on the diagonal and below)
and U is upper triangular (has elements only on the diagonal and above).
For the case of a 4 x 4 matrix A, for example, equation (2.3.1) would look
like this:
  |a11  0   0   0 |   |b11 b12 b13 b14|   |A11 A12 A13 A14|
  |a21 a22  0   0 | . | 0  b22 b23 b24| = |A21 A22 A23 A24|        (2.3.2)
  |a31 a32 a33  0 |   | 0   0  b33 b34|   |A31 A32 A33 A34|
  |a41 a42 a43 a44|   | 0   0   0  b44|   |A41 A42 A43 A44|
We can use a decomposition such as (2.3.1) to solve the linear set
A . x = (L . U) . x = L . (U . x) = b (2.3.3)
by first solving for the vector y such that
L . y = b (2.3.4)
and then solving
U . x = y (2.3.5)
What is the advantage of breaking up one linear set into two successive
ones? The advantage is that the solution of a triangular set of equations is
quite trivial, as we have already seen before. Thus, equation (2.3.4) can be
solved by forward substitution as follows,
    y_1 = b_1 / a_11
    y_i = (1/a_ii) [ b_i - Sum_{j=1..i-1} a_ij y_j ],   i = 2,3,...,N   (2.3.6)
while (2.3.5) can then be solved by backsubstitution:
    x_N = y_N / b_NN
    x_i = (1/b_ii) [ y_i - Sum_{j=i+1..N} b_ij x_j ],   i = N-1,N-2,...,1   (2.3.7)
Equations (2.3.6) and (2.3.7) total (for each right-hand side b) N^2 execu-
tions of an inner loop containing one multiply and one add. If we have N
right-hand sides which are the unit column vectors (which is the case when we
are inverting a matrix), then taking into account the leading zeros reduces
the total execution count of (2.3.6) from N^3/2 to N^3/6, while (2.3.7) is
unchanged at N^3/2.
Notice that, once we have the LU decomposition of A, we can solve with as
many right-hand sides as we then care to, one at a time. This is a distinct
advantage over the methods seen before.
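The two substitution sweeps can be sketched in a few lines of Python (an
illustrative translation of equations 2.3.6 and 2.3.7, not the module's code;
the function names are made up here):

```python
def forward_subst(L, b):
    # Equation (2.3.6): y_i = (1/L_ii) * (b_i - sum_{j<i} L_ij * y_j)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def back_subst(U, y):
    # Equation (2.3.7): x_i = (1/U_ii) * (y_i - sum_{j>i} U_ij * x_j)
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x
```

Each routine touches every stored element of its triangle exactly once, which
is where the N^2 count per right-hand side comes from.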
Performing the LU Decomposition
-------------------------------
How then can we solve for L and U, given A? First, we write out the (i,j)th
component of equation (2.3.1) or (2.3.2). That component is always a sum
beginning with

    a_i1 b_1j + ... = A_ij          (in the notation of 2.3.2)
The number of terms in the sum depends, however, on whether i or j is the
smaller number. We have, in fact, the three cases,
    i < j:   a_i1 b_1j + a_i2 b_2j + ... + a_ii b_ij = A_ij        (2.3.8)
    i = j:   a_i1 b_1j + a_i2 b_2j + ... + a_ii b_jj = A_ij        (2.3.9)
    i > j:   a_i1 b_1j + a_i2 b_2j + ... + a_ij b_jj = A_ij        (2.3.10)
Equations (2.3.8)-(2.3.10) total N^2 equations for the N^2 + N unknown
a's and b's (the diagonal being represented twice). Since the number of
unknowns is greater than the number of equations, we are invited to specify
N of the unknowns arbitrarily and then try to solve for the others. In fact,
as we shall see, it is always possible to take

    a_ii = 1,     i = 1,...,N                                      (2.3.11)
A surprising procedure, now, is Crout's algorithm, which quite trivially
solves the set of N^2 + N equations (2.3.8)-(2.3.11) for all the a's and b's
by just arranging the equations in a certain order! That order is as follows:
• Set a_ii = 1, i = 1,...,N (equation 2.3.11).
• For each j = 1,2,3,...,N do these two procedures: First, for i =
  1,2,...,j, use (2.3.8), (2.3.9), and (2.3.11) to solve for b_ij, namely

      b_ij = A_ij - Sum_{k=1..i-1} a_ik b_kj                       (2.3.12)

  (When i = 1 in 2.3.12 the summation term is taken to mean zero.) Second,
  for i = j+1, j+2,...,N use (2.3.10) to solve for a_ij, namely

      a_ij = (1/b_jj) [ A_ij - Sum_{k=1..j-1} a_ik b_kj ]          (2.3.13)
Be sure to do both procedures before going on to the next j.
If you work through a few iterations of the above procedure, you will see
that the a's and b's that occur on the right-hand side of equations (2.3.12)
and (2.3.13) are already determined by the time they are needed. You will
also see that every A_ij is used only once and never again. This means that
the corresponding a_ij or b_ij can be stored in the location that the A_ij
used to occupy: the decomposition is "in place." (The diagonal unity elements
a_ii of equation 2.3.11 are not stored at all.) In brief, Crout's method fills in
the combined matrix of a's and b's,
| b11 b12 b13 b14 |
| a21 b22 b23 b24 | (2.3.14)
| a31 a32 b33 b34 |
| a41 a42 a43 b44 |
by columns from left to right, and within each column from top to bottom.
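This fill-in order can be sketched as follows (an illustrative Python version
of equations 2.3.12-2.3.13 without pivoting; `crout` is a hypothetical name,
and a zero pivot b_jj is not guarded against):

```python
def crout(A):
    # Overwrite A, column by column from left to right, with the combined
    # matrix (2.3.14): b's on and above the diagonal, a's below it.
    # The unit diagonal a_ii = 1 of the lower factor is not stored.
    n = len(A)
    for j in range(n):
        for i in range(j + 1):           # b's: top of the column down to the diagonal
            A[i][j] -= sum(A[i][k] * A[k][j] for k in range(i))          # (2.3.12)
        for i in range(j + 1, n):        # a's below the diagonal, divided by b_jj
            A[i][j] = (A[i][j]
                       - sum(A[i][k] * A[k][j] for k in range(j))) / A[j][j]  # (2.3.13)
    return A
```

Every element needed on a right-hand side has indeed already been filled in
by the time it is read, which is the whole trick of Crout's ordering.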
What about pivoting? Pivoting (i.e. selection of a salubrious pivot ele-
ment for the division in equation 2.3.13) is absolutely essential for the sta-
bility of Crout's method. Only partial pivoting (interchange of rows) can be
implemented efficiently. However this is enough to make the method stable.
This means, incidentally, that we don't actually decompose the matrix A into
LU form, but rather we decompose a rowwise permutation of A. (If we keep track
of what that permutation is, this decomposition is just as useful as the
original one would have been.)
Pivoting is slightly subtle in Crout's algorithm. The key point to notice
is that equation (2.3.12) in the case of i = j (its final application) is
exactly the same as equation (2.3.13) except for the division in the latter
equation; in both cases the upper limit of the sum is k = j-1 (= i-1).
This means that we don't have to commit ourselves as to whether the diagonal
element bjj is the one which happens to fall on the diagonal in the first
instance, or whether one of the (undivided) aij's below it in the column,
i = j + 1, ... ,N, is to be "promoted" to become the diagonal b. This can be
decided after all the candidates in the column are in hand. As you should be
able to guess by now, we will choose the largest one as the diagonal b (pivot
element), then do all the divisions by that element en masse. This is Crout's
method with partial pivoting. Our implementation has one additional subtlety:
it initially finds the largest element in each row, and subsequently (when it
is looking for the maximal pivot element) scales the comparison as if we had
initially scaled all the equations to make their maximum coefficient equal to
unity; this is the implicit pivoting mentioned before.
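The whole procedure, with implicit scaling and partial pivoting, might be
sketched like this in Python (illustrative only; `ludcmp` here is a stand-in
for the module's routine, and exactly singular matrices are not handled):

```python
def ludcmp(A):
    # Returns (A, indx, d): A is overwritten by the combined factors (2.3.14)
    # of a rowwise permutation of the input, indx[j] records which row was
    # interchanged with row j, and d = +-1 tracks the parity of the swaps.
    n = len(A)
    scale = [1.0 / max(abs(v) for v in row) for row in A]   # implicit scaling
    indx = [0] * n
    d = 1.0
    for j in range(n):
        for i in range(j):                   # b's strictly above the diagonal (2.3.12)
            A[i][j] -= sum(A[i][k] * A[k][j] for k in range(i))
        big, imax = 0.0, j
        for i in range(j, n):                # pivot candidates: b_jj and undivided a's
            A[i][j] -= sum(A[i][k] * A[k][j] for k in range(j))
            if scale[i] * abs(A[i][j]) >= big:
                big, imax = scale[i] * abs(A[i][j]), i
        if imax != j:                        # promote the best candidate (row swap)
            A[j], A[imax] = A[imax], A[j]
            scale[imax] = scale[j]
            d = -d
        indx[j] = imax
        for i in range(j + 1, n):            # divide by the pivot en masse (2.3.13)
            A[i][j] /= A[j][j]
    return A, indx, d
```

Note that the division by the pivot is deferred until after the whole column
of candidates is in hand, exactly as described above.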
The LU decomposition in routine LUDCMP requires about N^3/3 executions of
the inner loops (each with one multiply and one add). This is thus the ope-
ration count for solving one (or a few) right-hand sides, and is a factor of 3
better than the Gauss-Jordan routine GAUSSJ which was given in §2.1, and a
factor of 1.5 better than a Gauss-Jordan routine which does not compute the
inverse matrix. For inverting a matrix, the total count (including the forward
and backsubstitution as discussed following equation 2.3.7 above) is (1/3 +
1/6 + 1/2) N^3 = N^3, the same as GAUSSJ.
To summarize, this is the preferred way to solve the linear set of equations
A·x = b:
CALL LUDCMP(A,N,NP,INDX,D)
CALL LUBKSB(A,N,NP,INDX,B)
The answer x will be returned in B. Your original matrix A will have been
destroyed.
If you subsequently want to solve a set of equations with the same A but
a different right-hand side b, you repeat only
CALL LUBKSB(A,N,NP,INDX,B)
not, of course, with the original matrix A, but with A and INDX as were
already returned from LUDCMP.
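For comparison, assuming NumPy and SciPy are available, the same two-call
pattern looks like this with SciPy's lu_factor/lu_solve (which likewise
return a packed factorization plus a pivot index array, and by default leave
the original A intact):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
lu, piv = lu_factor(A)            # analogous to CALL LUDCMP(A,N,NP,INDX,D)
b1 = np.array([1.0, 0.0])
x1 = lu_solve((lu, piv), b1)      # analogous to CALL LUBKSB(A,N,NP,INDX,B)
b2 = np.array([0.0, 1.0])
x2 = lu_solve((lu, piv), b2)      # same factorization, new right-hand side
```

As with LUDCMP/LUBKSB, the expensive factorization is done once and each
additional right-hand side costs only the two substitution sweeps.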
From [BIBLI 08].
--------------------------------------------------
End of file lu.txt