Cumulative standard deviation: Difference between revisions

end program stats
</lang>
 
===Old style, two ways===
Early computers loaded the entire programme and its working storage into memory and left it there throughout the run. Uninitialised variables would start with whatever had been left in memory at their address by whatever last used those addresses, though some systems would clear all of memory to zero or possibly some other value before each load. Either way, if a routine was invoked a second time, its variables would have the values left in them by their previous invocation. The DATA statement allows initial values to be specified, and allows repeat counts as well. It is not an executable statement: it is not re-executed on second and subsequent invocations of the containing routine. Thus, it is easy to have a routine employ counters and the like, visible only within the routine and initialised to zero or whatever suits.
 
With more complex operating systems, routines that relied on retaining values across invocations might no longer work: perhaps a fresh copy of the routine would be loaded into memory (perhaps at odd intervals), or, on exit, the working storage would be discarded. There was a half-way scheme, whereby variables that had appeared in DATA statements would be retained while the others would be discarded. This subtle indication has been discarded in favour of the explicit SAVE statement, naming those variables whose values are to be retained between invocations, though compilers might also offer options such as "automatic" (for each invocation, always allocate then discard working memory) and "static" (retain values), possibly introducing non-standard keywords as well. Otherwise, the routines would have to use storage global to them, such as additional parameters, or COMMON storage, or, in later Fortran, the MODULE arrangements for shared items. The persistence of such storage can still be limited, but naming the items in the main line ensures that they last for the life of the run. Other routines with access to such storage could then provide re-initialisation, additional reports, multiple accumulations, etc.
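As a rough illustration of the MODULE alternative, the following sketch keeps a running count and total in a module so that a second routine can re-initialise them. The names TALLY, RESET, MEAN and SHOW are invented for this example and do not appear in the task's code; statements start in column seven and avoid continuation lines so that the text should be acceptable as either fixed or free source form.

<lang Fortran>
      MODULE TALLY              !Hypothetical module holding the state.
       IMPLICIT NONE
       INTEGER :: N = 0         !Count of values. Initialised, so saved.
       REAL :: T = 0.0          !Their total.
       CONTAINS
        SUBROUTINE RESET        !Start a fresh accumulation.
         N = 0
         T = 0.0
        END SUBROUTINE RESET
        REAL FUNCTION MEAN(X)   !Average of the values supplied so far.
         REAL X                 !The latest value.
         N = N + 1
         T = T + X
         MEAN = T/N
        END FUNCTION MEAN
      END MODULE TALLY

      PROGRAM SHOW
       USE TALLY                !Naming it here keeps it for the whole run.
       IMPLICIT NONE
       REAL M
       M = MEAN(2.0)
       M = MEAN(4.0)
       WRITE (6,*) "Mean of 2 and 4:",M !M is 3.0
       CALL RESET               !Possible because the state is shared.
       M = MEAN(10.0)
       WRITE (6,*) "After a reset:",M   !M is 10.0
      END PROGRAM SHOW
</lang>

A routine that hides its state behind SAVE cannot offer such a reset without an extra parameter or entry point, which is the trade-off described above.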
 
Since the standard deviation can be calculated in a single pass through the data, producing values for the standard deviation of all values so far supplied is easily done without re-calculation. Accuracy is quite another matter. Calculations using deviations from a working mean are much better, and capturing the first X as the working mean would be easy: just test on N = 0. The sum and sum-of-squares method (STDDEV below) is quite capable of generating a negative variance when precision is limited, but the second method (STDDEVP below) cannot, because the terms added to V are never negative.
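The working-mean idea might be sketched as follows. This is only an illustration under assumptions: the names STDDEVW, W, ED, ED2 and WMTEST are invented here, and the MAX guard is an addition of this sketch, since a sum-of-squares form can still round to a slightly negative variance even when the deviations are small.

<lang Fortran>
      REAL FUNCTION STDDEVW(X)  !Hypothetical name: working-mean variant.
       REAL X                   !The latest value.
       REAL D                   !Its deviation from the working mean.
       INTEGER N                !Ongoing: count of the values.
       REAL W,ED,ED2            !Ongoing: working mean, sums of D and D**2.
       SAVE N,W,ED,ED2          !Retain values between invocations.
       DATA N,W,ED,ED2/0,3*0.0/ !Initial values, with a repeat count.
       IF (N.EQ.0) W = X        !The first value is the working mean.
       N = N + 1                !Another value arrives.
       D = X - W                !Small, if the data cluster near W.
       ED = D + ED              !Augment the total of deviations.
       ED2 = D**2 + ED2         !And of their squares.
       STDDEVW = SQRT(MAX(ED2/N - (ED/N)**2,0.0)) !Guarded (an addition).
      END FUNCTION STDDEVW

      PROGRAM WMTEST
       REAL STDDEVW             !The external function's type.
       REAL S                   !Receives each result.
       INTEGER I                !A stepper.
       REAL A(8)                !The example data.
       DATA A/2.0,3*4.0,2*5.0,7.0,9.0/
       DO I = 1,8               !Step along the data series.
        S = STDDEVW(A(I))       !Evaluate before the WRITE begins.
        WRITE (6,*) I,A(I),S
       END DO
      END PROGRAM WMTEST
</lang>

Subtracting a constant W from every value leaves the variance unchanged, so the results should agree with the other methods to within rounding, while the sums stay smaller when the data sit far from zero.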
 
Incidentally, Fortran implementations rarely enable reentrancy for the WRITE statement, so, since here the functions are invoked in a WRITE statement, the functions cannot themselves use WRITE statements, say for debugging.
<lang Fortran>
      REAL FUNCTION STDDEV(X)   !Standard deviation for successive values.
       REAL X                   !The latest value.
       REAL V                   !Scratchpad.
       INTEGER N                !Ongoing: count of the values.
       REAL EX,EX2              !Ongoing: sum of X and X**2.
       SAVE N,EX,EX2            !Retain values from one invocation to the next.
       DATA N,EX,EX2/0,0.0,0.0/ !Initial values.
        N = N + 1               !Another value arrives.
        EX = X + EX             !Augment the total.
        EX2 = X**2 + EX2        !Augment the sum of squares.
        STDDEV = SQRT(EX2/N - (EX/N)**2) !SQRT of the variance, which might come out negative!
      END FUNCTION STDDEV       !For the sequence of received X values.

      REAL FUNCTION STDDEVP(X)  !Standard deviation, via the running average.
       REAL X                   !The latest value.
       INTEGER N                !Ongoing: count of the values.
       REAL A,V                 !Ongoing: average, and sum of squared deviations.
       SAVE N,A,V               !Retain values from one invocation to the next.
       DATA N,A,V/0,0.0,0.0/    !Initial values.
        N = N + 1               !Another value arrives.
        V = (N - 1)*(X - A)**2 /N + V !First, as it requires the existing average.
        A = (X - A)/N + A       != [X + (N - 1)*A]/N: recover the total from the average.
        STDDEVP = SQRT(V/N)     !V can never be negative, even with limited precision.
      END FUNCTION STDDEVP      !For the sequence of received X values.

      PROGRAM TEST
       INTEGER I                !A stepper.
       REAL A(8)                !The example data.
       DATA A/2.0,3*4.0,2*5.0,7.0,9.0/ !Alas, another opportunity to use @ passed over.
       WRITE (6,1)
    1  FORMAT ("Progressive calculation of the standard deviation."/
     1 " I, A(I), EX EX2, Av V*N.")
       DO I = 1,8               !Step along the data series,
        WRITE (6,2) I,A(I),STDDEV(A(I)),STDDEVP(A(I)) !Showing progressive values.
    2   FORMAT (I2,F6.1,2F10.6) !Should do for the example.
       END DO                   !On to the next value.
      END
</lang>
 
Output:
 Progressive calculation of the standard deviation.
  I, A(I), EX EX2, Av V*N.
  1   2.0  0.000000  0.000000
  2   4.0  1.000000  1.000000
  3   4.0  0.942809  0.942809
  4   4.0  0.866025  0.866025
  5   5.0  0.979796  0.979796
  6   5.0  1.000000  1.000000
  7   7.0  1.399708  1.399708
  8   9.0  2.000000  2.000000
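
Regarding the earlier remark about WRITE statements not being re-entrant: one common workaround is to evaluate the function into a variable first, so that any WRITE inside the function has finished before the outer WRITE begins. The sketch below assumes a hypothetical function SQ that contains debugging output; neither SQ nor the programme NOREC is part of the example above.

<lang Fortran>
      REAL FUNCTION SQ(X)       !Hypothetical function with debug output.
       REAL X                   !The value to be squared.
       WRITE (6,*) "SQ has",X   !Safe only if no other WRITE is active.
       SQ = X**2
      END FUNCTION SQ

      PROGRAM NOREC
       REAL SQ                  !The external function's type.
       REAL Y                   !Holds the result.
       Y = SQ(3.0)              !Evaluate first: SQ's WRITE completes here.
       WRITE (6,*) "Result",Y   !Only then start the outer WRITE.
      END PROGRAM NOREC
</lang>

Writing WRITE (6,*) "Result",SQ(3.0) instead would start one WRITE while another is in progress on the same unit, which is the situation the remark warns against.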
 
=={{header|Go}}==
Algorithm to reduce rounding errors from WP article.