Kahan summation: Difference between revisions

(→‎Third phase: what is being computed?: to the last bit and more.)
Line 308:
All of this means that number size specifications such as REAL*8 are not as straightforward as one might imagine since a word size is not necessarily a multiple of eight bits, which is why REAL and DOUBLE are preferred over REAL*4 and REAL*8, but even so, a computation that worked on one computer may still behave oddly on another. And, even on binary computers where only simple sizes such as 32 or 64 bit numbers are offered there are still problems because with floating-point there is the question of assumed-one leading-bit for normalised numbers. The IBM pc and its followers uses this for 32- and 64-bit floating-point, but ''not'' for 80-bit floating point. And is it truncation or rounding, and what style of rounding at that?
 
===Second phase: workdecimal withprecision via binary floating point===
Decimal fractions are nearly always recurring sequences in binary (i.e. except for powers of two such as three eights, etc) so at once there are approximations and these may combine unhelpfully. A value such as 3.14159 uses up nearly all of the precision of a 32-bit binary floating-point number, and is not represented exactly. This applies in turn to the various intermediate results as the summation proceeds, and one might be unlucky with the demonstration. The TRUNC6 approach alas doesn't work, as both summations come out as 10005·8, but with ROUND6, the desired results are presented...
<lang Fortran>
1,220

edits