Kahan summation: Difference between revisions

Line 390:
The expression <code>S = A(1) + A(2) + A(3)</code> was rendered as four operation codes in a row: FLD, FADD, FADD, FSTP - omitting the addressing details. This means that the arithmetic was conducted in the cpu's floating-point unit using eighty-bit floating-point arithmetic (or, REAL*10), however the FSTP that stored the result to S, stored to a 32-bit recipient so that the result was in single precision and would have been rounded to fit.
 
If instead a simple DO-loop is used to calculate the sum, each iteration involved a FLD, FADD, FSTP. This means that at each step, the value of S (single precision) is loaded into the 80-bit register, a (single precision) value is added using 80-bit arithmetic, then the result is stored back to the single-precision variable, S. The third option is to use the supplied facility, <code>S = SUM(A)</code>, but alas, this accumulates the result in the same stepwise manner, using a temporary storage area of the same type as the array: single precision. It makes no attempt to recognise that the summation could be accumulated in a register, still less to do so with 80-bit precision. Similarly, the compiler made no attempt to recognise that on completion of one expression, the next expression used the result of the previous expression. Thus, in the sequence <code>T:=S + Y; C:=(T - S) - Y;</code> the value of T was reloaded for the second statement, even though the previous statement's last step had been to save a result to T. Some compilers and some languages do allow for this possibilityto be recognised (possibly as an option) but on computers where the working register has a different precision from that of a variable in storage, it will not be just the speed of the calculation that changes. Such changes will almost surely wreck the workings of compensated summation! Fortran has strict rules on the order of evaluation, and there is a long history of needing care. For instance, to compute m/h² where m is the mass of an electron and h is Planck's constant, h² risks exponent overflow so (m/h)/h would be safer.
 
For the test data, all methods, including compensated summation, returned exactly the same result, even to the last bit, so one wonders just what is going on. Using decimal numbers when concerned with the exact details of binary arithmetic is troublesome because the conversions are generally inexact, so the plan is to reveal the numbers in base two. Besides F format, later Fortran version offers further options such as B O, and Z formats for binary, octal and hexadecimal (the code H being spoken for another use) however these present the bit-patterns as packed into the storage used for the variable, they do not reveal its ''numerical'' value in binary, etc. So, for 3.14159 in decimal,
1,220

edits