EXPLANATION FILE OF PROGRAM REGITER =================================== Iterated Regression ------------------- The design of numerical algorithms is both a mathematical science and an art. The mathematics forms the background for the methodology, but as in most endeavours, it is the execution that decides the quality of the final product. In the file Lstsqr.txt, we considered an elegant and very general matrix approach to least-squares fitting of polynomials. Upon execution, however, we found that round-off error could significantly influence the accuracy of the computed results. This difficulty was largely overcome for one-dimensional re- gression by the Forsythe orthogonal polynomial formulation presented in file Lsqrply.txt. The trick was to choose a mathematical structure that resulted in matrix operations much less susceptible to round-off error. The matrix approach was extended to multidimensional applications in file Lstsqr.txt. Again we encountered round-off error. The objective of this sec- tion is to develop a corresponding algorithm that treats this very important case. As you will see, the concept behind this algorithm is fundamental to the use of computers for numerical ca1culations. This concept is called iteration. We first note that the observed round-off error is mainly associated with the process of subtraction. When two numbers, say A and B, are subtracted, the relative error in their difference is E / (A - Bl, where E is a measure of the combined numerical truncation plus round-off) error in A and B. If A and B are comparable in size, relative error can be quite large. Unfortunately, the ma- trix operations involved in the LEASTSQR subroutine involve many subtractions of comparable numbers. This is particularly true of the matrix inversion step. Knowing the root of the problem, we can devise a strategy to overcome the dif- ficulty. The matrix equation given for the coefficient vector was equation 1.4.4: t -1 t D = (X X) X Y This has the same form as the classic matrix equation D = AY Given Y and A, the result D is mathematically determined. However, what we calculate is D. The residual is r1 = D - D1 = D - AY How can we obtain D wh en we have D1 ? If we knew r1, even approximately, we could improve upon the estimate D1, using D = D1 + r1. Therefore, the problem is reduced to estimating r1. Recall the original purpose for obtaining D. We want to employ it to approximate (in the least-squares sense) Y: Y = XD However, applying D1, we get Y1 = XO1. Therefore, a reasonable estimate for r, is r1 = A(Y - Y1) Round-off error will probably also affect the estimate for r1, so the correc- tion process must be repeated until some error criterion is met: Step 1 D1 = AY Step 2 r1 = A(Y - XO1) Step 3 D2 = D1 + r1 Step 4 r2 = A(Y - XD2) Step 5 D3 = D2 + r2 and so on. We will use the variance as the criterion for deciding when to stop the iteration sequence. If the variance increases upon the next step in the sequence, the round-off error limit has probably been reached. Thus, if t t (Y - XDn) (Y - XOn) < (Y - XOn+1) (y - XOn+1), then Dn is the chosen coefficient vector. This procedure has been implemented in program REGITER. The operation of the regression-iteration subroutine is very simple. The calling program supplies the number of data points (M), the number of dimensions (L), the degree of fit for each dimension [M(I)), and the data pairs [X(1,J),Y(1)]. The program then proceeds to iteratively calcula te the coefficient vector D(I). The returned results are the coefficients [D(I)), the standard deviation (D), and the num- ber of iterations performed (LI). In the example given, the coefficients were found with six-digit or better accuracy: the results are very accurate. The corresponding standard deviation of the fit was thousands of times better than the fit found without iteration. Clearly, the method is effective. REGlTER is a fairly reliable program. The input variable precautions are simple and are the same as discussed in Lstsqr. There is a possibility (although the author has never encountered such a case) that the iteration may not converge. In that case, the iteration is terminated at the point of divergence. ln the worst case, the returned coefficient vector would simply be that calculated on the first pass. REGlTER is a very effective subroutine and should be employed whenever high- accuracy multidimensionalleast-squares curve fitting is desired. The main dis- advantage to using REGlTER is that it is slow. In the next section, we will examine another iterative approach to the least- squares problem. lt does not involve matrix operations and it can be used for problems in which the coefficients appear in nonlinear forms. See file Parafit.txt. From [BIBLI 01]. ------------------------------------------- End of file Regiter.txt