                    EXPLANATION FILE OF PROGRAM CHI-SQ
                    ==================================

  The Chi-Square Distribution Functions
  -------------------------------------

  This subsection was inspired by an observation that statisticians appear
  to be repeatedly thwarted in their work by a lack of table values that
  cover their particular situation. For those who do not have access to a
  large computer, this can be very frustrating. Statistical tables are
  numerous (e.g., see Refs. 21 and 22). However, the number of possible
  situations is infinite, and the tables cover only a few selected cases.

  This problem appears to be particularly acute for the Chi-Square
  statistic. In this subsection, therefore, asymptotic series expansions
  for both the Chi^2 probability density function, p(Chi^2), and the Chi^2
  cumulative distribution function, P(Chi^2), will be presented.

  For those not familiar with the subject, the Chi^2 statistic is used to
  test whether measured results are consistent with a presumed distribution
  function. For example, consider the situation where there are N bins into
  which the results can fall. If Ei is the expected frequency of a result
  falling into bin i, and Oi is the observed frequency, then the Chi^2
  statistic is defined to be

                  N   (Oi - Ei)^2
        Chi^2 =  Sum  -----------                                  (2.3.3)
                 i=1      Ei

  Associated with this statistic is the number of degrees of freedom,
  M = N - R, where R is the number of independent relationships between
  the Oi and Ei. For example, one relation might be

         N          N
        Sum Oi  =  Sum Ei
        i=1        i=1

  In the extreme case, if there are N independent relationships, and if
  the expected frequency conjecture is correct, then Chi^2 = 0 because the
  Oi are totally constrained.

  The basic relationship we will employ to calculate p(x) is

                  -x/2   (M/2 - 1)
                 e      x
        p(x) = ---------------------                               (2.3.4)
                 (M/2)
                2       Gamma(M/2)

  In this notation x = Chi^2, and M represents the number of degrees of
  freedom.
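  As a concrete illustration of equation (2.3.3) and of the degrees of
  freedom M = N - R, the following is a minimal Python sketch. It is not
  part of the CHI-SQ program: the function name chi_square_statistic and
  the die-roll data are hypothetical, chosen only so that the observed and
  expected totals agree (one relationship, R = 1).

```python
def chi_square_statistic(observed, expected):
    """Return Chi^2 = Sum (Oi - Ei)^2 / Ei over all bins, equation (2.3.3)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, N = 6 bins, 10 rolls expected per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6

chi2 = chi_square_statistic(observed, expected)

# One relationship holds (the two totals agree), so R = 1 and M = N - R.
degrees_of_freedom = len(observed) - 1

print("Chi^2 =", chi2, " M =", degrees_of_freedom)
```

  With these made-up counts the statistic is 1.0 (up to rounding) with
  M = 5 degrees of freedom.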
  The approximation that will be used for P(x) is

               2x               inf               x^k
        P(x) = -- p(x) [ 1  +   Sum   --------------------------- ]   (2.3.5)
               M                k=1   (M + 2)(M + 4) ... (M + 2k)

  To evaluate equation (2.3.5) we need p(x), and to obtain that function
  we require Gamma(M/2). Therefore, the starting point involves
  approximating the Gamma function.

  One of the problems in evaluating Gamma(M/2) directly is that there is a
  strong potential for numeric overflow for large values of M. For example,
  if M = 200, then Gamma(M/2) = Gamma(100) = 99!, which is beyond the range
  of many BASIC interpreters. Therefore, it is better to approximate
  ln[Gamma(M/2)]. This can be implemented in equation (2.3.4) by first
  approximating ln[p(x)]:

        ln[p(x)] = -x/2 + (M/2 - 1) ln x - (M/2) ln 2 - ln[Gamma(M/2)]
                                                                   (2.3.6)

  The conversion to p(x) then is simply

        p(x) = exp{ln[p(x)]}                                       (2.3.7)

  We therefore start with an approximation for ln[Gamma(M/2)]. The Gamma
  function is directly related to the generalized factorial:

        Gamma(x + 1) = x!                                          (2.3.8)

  so that ln[Gamma(M/2)] = ln[(M/2 - 1)!]. The problem now is one of
  finding an approximation for ln x!. Since the whole purpose is to avoid
  the numeric overflow associated with large arguments in the Gamma
  function, an asymptotic expansion is appropriate:

                                         1       1         1         1
        ln x! = (x + 1/2) ln x - x  +   --- - ------- + ------- - -------
                                        12x   360x^3    1260x^5   1680x^7

                + 0.918938533205                                   (2.3.9)

  The derivation of this equation will be discussed in the next subsection.
  For the present, we will note that this expansion is accurate to better
  than 12 digits for x > 10.

  Equation (2.3.9) can be implemented in BASIC as shown in program
  chi-sq.bas. As you can see from the example, LN(X!) is very accurate even
  for low values of X.

  We can now use the LN(X!) subroutine in conjunction with equation (2.3.6)
  to provide an approximation to the Chi-Square probability density
  function, p(x). The inputs to CHI-SQR are the number of degrees of
  freedom (M) and the argument (X). The result is returned in Y.
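  The chain of equations (2.3.9), (2.3.6), (2.3.7) and (2.3.5) can be
  sketched in Python as three small functions built one on top of the
  other. This is an illustrative sketch, not a transcription of
  chi-sq.bas: the names ln_factorial, chi2_pdf and chi2_cdf, the default
  tolerance eps, and the rule of stopping the series once a term drops
  below eps are all assumptions made here. The sketch also assumes M > 2
  and x > 0, because the expansion takes ln(M/2 - 1) and divides by it.

```python
import math

HALF_LN_2PI = 0.918938533205  # constant term of equation (2.3.9), 0.5*ln(2*pi)


def ln_factorial(x):
    """Approximate ln(x!) with the asymptotic expansion (2.3.9); needs x > 0."""
    return ((x + 0.5) * math.log(x) - x
            + 1.0 / (12.0 * x)
            - 1.0 / (360.0 * x ** 3)
            + 1.0 / (1260.0 * x ** 5)
            - 1.0 / (1680.0 * x ** 7)
            + HALF_LN_2PI)


def chi2_pdf(m, x):
    """Density p(x) for m degrees of freedom via equations (2.3.6) and (2.3.7)."""
    # Gamma(m/2) = (m/2 - 1)! by equation (2.3.8), so work with its logarithm.
    ln_gamma = ln_factorial(m / 2.0 - 1.0)
    ln_p = (-x / 2.0 + (m / 2.0 - 1.0) * math.log(x)
            - (m / 2.0) * math.log(2.0) - ln_gamma)
    return math.exp(ln_p)


def chi2_cdf(m, x, eps=1e-12):
    """Cumulative P(x) from the series (2.3.5); stops when a term falls below eps."""
    total = 1.0   # the leading 1 inside the brackets
    term = 1.0    # running value of x^k / [(m + 2)(m + 4)...(m + 2k)]
    k = 0
    while term > eps:
        k += 1
        term *= x / (m + 2.0 * k)
        total += term
    return (2.0 * x / m) * chi2_pdf(m, x) * total


print(ln_factorial(99.0))   # ln(99!) stays modest even though 99! is huge
print(chi2_pdf(10, 8.0))
print(chi2_cdf(10, 8.0))
```

  For M = 10 and x = 8, chi2_cdf gives about 0.371, in agreement with the
  closed form that exists for even M. Working in logarithms is what keeps
  M = 200 usable: ln(99!) is only about 359, whereas 99! itself would
  overflow a single-precision number.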
  The intrinsic limit to the accuracy of the calculation is the
  approximation associated with obtaining ln[Gamma(M/2)]. However, that
  approximation is very good, and therefore the number of degrees of
  freedom may be as low as M = 4 (or even lower). As expected, the
  calculated distribution peaks close to the number of degrees of freedom.

  There are a few precautions related to the use of CHI-SQR. Because there
  are no error checks on the inputs, it is the responsibility of the
  calling program to ensure that M is a positive integer and that the
  argument is positive. Otherwise, CHI-SQR is a very reliable subroutine.

  Now that we have a means for accurately evaluating p(x), the cumulative
  distribution function can be approximated using equation (2.3.5). Since
  the series summed in this equation contains terms of the same sign,
  little round-off error is expected in the final result.

  The calculations required to evaluate P(x) are performed by the
  subroutine CHISQ. This subroutine calls CHI-SQR, which in turn calls
  LN(X!). The inputs to CHISQ are the number of degrees of freedom (M),
  the argument (Chi), and a convergence factor for the series summation
  (E). The result is returned in Y.

  The programs presented in this subsection serve two purposes. First,
  they are very useful routines for statistical studies. Second, they
  demonstrate how a set of subroutines can be built one block at a time.
  This is the overall philosophy behind the BASIC Scientific Subroutines
  series.

  From [BIBLI 01].
  ------------------------------------------
  End of file chi-sq.txt.