Cumulative standard deviation

Write a stateful function, class, generator or coroutine that takes a series of floating point numbers, one at a time, and returns the running standard deviation of the series. The task implementation should use the most natural programming style of those listed for the function in the implementation language; the task must state which is being used. Do not apply Bessel's correction; the returned standard deviation should always be computed as if the sample seen so far is the entire population.

Use this to compute the standard deviation of this demonstration set, $\{2,4,4,4,5,5,7,9\}$ , which is $2$ .

Ada

<lang ada> with Ada.Numerics.Elementary_Functions; use Ada.Numerics.Elementary_Functions; with Ada.Text_IO; use Ada.Text_IO;

procedure Test_Deviation is

  type Sample is record
     N       : Natural := 0;
     Mean    : Float := 0.0;
     Squares : Float := 0.0;
  end record;
  procedure Add (Data : in out Sample; Point : Float) is
  begin
     Data.N       := Data.N + 1;
     Data.Mean    := Data.Mean    + Point;
     Data.Squares := Data.Squares + Point ** 2;
  end Add;
  function Deviation (Data : Sample) return Float is
  begin
     return Sqrt (Data.Squares / Float (Data.N) - (Data.Mean / Float (Data.N)) ** 2);
  end Deviation;

  Data : Sample;
  Test : array (1..8) of Float := (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);

begin

  for Item in Test'Range loop
     Add (Data, Test (Item));
  end loop;
  Put_Line ("Deviation" & Float'Image (Deviation (Data)));

end Test_Deviation; </lang> Sample output:

Deviation 2.00000E+00

AutoHotkey

ahk forum: discussion <lang AutoHotkey>std(2),std(4),std(4),std(4),std(5),std(5),std(7) MsgBox % std(9) ; 2

std(x="") {

 Static sum:=0, sqr:=0, n:=0
 If (x="")                    ; blank parameter: reset
    sum := 0, sqr := 0, n := 0
 Else
    sum += x, sqr += x*x, n++ ; update state
 Return sqrt((sqr-sum*sum/n)/n)

}</lang>

C

Of course this function is not thread-safe nor it can be used to compute the standard deviation just for a set of values per time.

<lang c>#include <stdio.h>

include <stdlib.h>
include <math.h>

enum Action { VALUE, STDDEV, MEAN, VAR, COUNT, RESET }; double stat_object(double v, enum Action action) {

 static double sum = 0.0;
 static double sum2 = 0.0;
 static size_t num = 0;

 double m;

 switch(action) {
 case VALUE:
   num++;
   sum += v;
   sum2 += v*v;
   return stat_object(0.0, STDDEV);
 case STDDEV:
   return sqrt(stat_object(0.0, VAR));
 case MEAN:
   return (num>0) ? sum/(double)num : 0.0;
 case VAR:
   m = stat_object(0.0, MEAN);
   return (num>0) ? (sum2/(double)num - m*m) : 0.0;
 case COUNT:
   return num;
 case RESET:
   sum = sum2 = 0.0; num = 0;
   return 0.0;
 }

}</lang>

<lang c>double v[] = { 2,4,4,4,5,5,7,9 };

int main() {

 int i;
 double sd;

 for(i=0; i < sizeof(v)/sizeof(double) ; i++)
   sd = stat_object(v[i], VALUE);

 printf("standard dev = %lf\n", sd);

 return 0;

}</lang>

E

This implementation produces two (function) objects sharing state. It is idiomatic in E to separate input from output (read from write) rather than combining them into one object.

The algorithm is

Translation of: Perl

and the results were checked against #Python.

<lang e>def makeRunningStdDev() {

   var sum := 0.0
   var sumSquares := 0.0
   var count := 0.0
   
   def insert(v) {
       sum += v
       sumSquares += v ** 2
       count += 1
   }
   
   /** Returns the standard deviation of the inputs so far, or null if there
       have been no inputs. */
   def stddev() {
       if (count > 0) {
           def meanSquares := sumSquares/count
           def mean := sum/count
           def variance := meanSquares - mean**2
           return variance.sqrt()
       }
   }
   
   return [insert, stddev]

}</lang>

<lang e>? def [insert, stddev] := makeRunningStdDev()

value: <insert>, <stddev>

? [stddev()]

value: [null]

? for value in [2,4,4,4,5,5,7,9] { > insert(value) > println(stddev()) > } 0.0 1.0 0.9428090415820626 0.8660254037844386 0.9797958971132716 1.0 1.3997084244475297 2.0</lang>

Forth

<lang forth>: f+! ( x addr -- ) dup f@ f+ f! ;

st-count ( stats -- n ) f@ ;

st-sum ( stats -- sum ) float+ f@ ;

st-sumsq ( stats -- sum*sum ) 2 floats + f@ ;

st-add ( fnum stats -- )

   1e dup f+!  float+
 fdup dup f+!  float+
 fdup f*  f+! ;

st-mean ( stats -- mean )

 dup st-sum st-count f/ ;

st-variance ( stats -- var )

 dup st-sumsq
 dup st-mean fdup f* dup st-count f*  f-
 st-count f/ ;

st-stddev ( stats -- stddev )

 st-variance fsqrt ;</lang>

This variation is more numerically stable when there are large numbers of samples or large sample ranges. <lang forth>: st-count ( stats -- n ) f@ ;

st-mean ( stats -- mean ) float+ f@ ;

st-nvar ( stats -- n*var ) 2 floats + f@ ;

st-variance ( stats -- var ) dup st-nvar st-count f/ ;

st-stddev ( stats -- stddev ) st-variance fsqrt ;

st-add ( x stats -- )

 1e dup f+!			\ update count
 fdup dup st-mean f- fswap
 ( delta x )
 fover dup st-count f/
 ( delta x delta/n )
 float+ dup f+!		\ update mean
 ( delta x )
 dup f@ f-  f*  float+ f+! ;	\ update nvar</lang>

<lang forth>create stats 0e f, 0e f, 0e f,

2e stats st-add 4e stats st-add 4e stats st-add 4e stats st-add 5e stats st-add 5e stats st-add 7e stats st-add 9e stats st-add

stats st-stddev f. \ 2.</lang>

Fortran

Translation of: C

Works with: Fortran version 95 and later

This one imitates C and suffers the same problems: the function is not thread-safe and must be used to compute the stddev for one set per time.

<lang fortran>program Test_Stddev

 implicit none

 real, dimension(8) :: v = (/ 2,4,4,4,5,5,7,9 /)
 integer :: i
 real :: sd

 do i = 1, size(v)
    sd = stat_object(v(i))
 end do

 print *, "std dev = ", sd

contains

 recursive function stat_object(a, cmd) result(stddev)
   real :: stddev
   real, intent(in) :: a
   character(len=*), intent(in), optional :: cmd

   real, save :: summa = 0.0, summa2 = 0.0
   integer, save :: num = 0

   real :: m

   if ( .not. present(cmd) ) then
      num = num + 1
      summa = summa + a
      summa2 = summa2 + a*a
      stddev = stat_object(0.0, "stddev")
   else
      select case(cmd)
      case("stddev")
         stddev = sqrt(stat_object(0.0, "variance"))
      case("variance")
         m = stat_object(0.0, "mean")
         if ( num > 0 ) then
            stddev = summa2/real(num) - m*m
         else
            stddev = 0.0
         end if
      case("count")
         stddev = real(num)
      case("mean")
         if ( num > 0 ) then
            stddev = summa/real(num)
         else
            stddev = 0.0
         end if
      case("reset")
         summa = 0.0
         summa2 = 0.0
         num = 0
      case default
         stddev = 0.0
      end select
   end if

 end function stat_object

end program Test_Stddev</lang>

Java

Works with: Java version 1.5+

<lang java5>import java.util.LinkedList; public class StdDev {

   LinkedList<Double> nums;
   public StdDev() {
       nums = new LinkedList<Double>();
   }
   public static void main(String[] args) {
       double[] testData = {2,4,4,4,5,5,7,9};
       StdDev sd = new StdDev();
       
       for (double x : testData) {
           sd.newNum(x);
           System.out.println(sd.getSD());
       }
   }

   public void newNum(double num) {
       nums.add(num);
   }

   public double getAvg() {
       if (nums.isEmpty()) return 0;
       double ret = 0;
       double sum = 0;
       for (double num : nums) {
          sum += num;
       }
       return sum / nums.size();
   }

   public double getSD() {
       double sqDiffs = 0;
       double avg = getAvg();
       for (double num : nums) {
          sqDiffs += (num - avg) * (num - avg);
       }
       return Math.sqrt(sqDiffs / nums.size());
   }

}</lang>

Objective-C

<lang objc>#import <stdio.h>

import <math.h>
import <objc/Object.h>

@interface SDAccum : Object {

 double sum, sum2;
 unsigned int num;

} -(id)init; -(double)value: (double)v; -(unsigned int)count; -(double)mean; -(double)variance; -(double)stddev; @end

@implementation SDAccum -(id)init {

 sum = sum2 = 0.0;
 num = 0;
 return self;

} -(double)value: (double)v {

 sum = sum + v;
 sum2 = sum2 + v*v;
 num++;
 return [self stddev];

} -(unsigned int)count {

 return num;

} -(double)mean {

 return (num>0) ? sum/(double)num : 0.0;

} -(double)variance {

 double m = [self mean];
 return (num>0) ? (sum2/(double)num - m*m) : 0.0;

} -(double)stddev {

 return sqrt([self variance]);

} @end</lang>

<lang objc>double v[] = { 2,4,4,4,5,5,7,9 };

int main() {

 int i;
 double sd;

 id sdacc = [SDAccum new];

 for(i=0; i < sizeof(v)/sizeof(double) ; i++)
   sd = [sdacc value: v[i]];

 printf("std dev = %lf\n", sd);

 return 0;

}</lang>

Perl

<lang perl>{

   package SDAccum;
   sub new {

my $class = shift; my $self = {}; $self->{sum} = 0.0; $self->{sum2} = 0.0; $self->{num} = 0; bless $self, $class; return $self;

   }
   sub count {

my $self = shift; return $self->{num};

   }
   sub mean {

my $self = shift; return ($self->{num}>0) ? $self->{sum}/$self->{num} : 0.0;

   }
   sub variance {

my $self = shift; my $m = $self->mean; return ($self->{num}>0) ? $self->{sum2}/$self->{num} - $m * $m : 0.0;

   }
   sub stddev {

my $self = shift; return sqrt($self->variance);

   }
   sub value {

my $self = shift; my $v = shift; $self->{sum} += $v; $self->{sum2} += $v * $v; $self->{num}++; return $self->stddev;

}</lang>

<lang perl>my $sdacc = SDAccum->new; my $sd;

foreach my $v ( 2,4,4,4,5,5,7,9 ) {

   $sd = $sdacc->value($v);

} print "std dev = $sd\n";</lang>

Python

Using a function with attached properties

The program should work with Python 2.x and 3.x, although the output would not be a tuple in 3.x <lang python>>>> from math import sqrt >>> def sd(x):

   sd.sum  += x
   sd.sum2 += x*x
   sd.n    += 1.0
   sum, sum2, n = sd.sum, sd.sum2, sd.n
   return sqrt(sum2/n - sum*sum/n/n)

>>> sd.sum = sd.sum2 = sd.n = 0 >>> for value in (2,4,4,4,5,5,7,9):

   print (value, sd(value))

(2, 0.0) (4, 1.0) (4, 0.94280904158206258) (4, 0.8660254037844386) (5, 0.97979589711327075) (5, 1.0) (7, 1.3997084244475311) (9, 2.0) >>> </lang>

Using a class instance

<lang python>>>> class SD(object): # Plain () for python 3.x def __init__(self): self.sum, self.sum2, self.n = (0,0,0) def sd(self, x): self.sum += x self.sum2 += x*x self.n += 1.0 sum, sum2, n = self.sum, self.sum2, self.n return sqrt(sum2/n - sum*sum/n/n)

>>> sd_inst = SD() >>> for value in (2,4,4,4,5,5,7,9): print (value, sd_inst.sd(value))</lang>

Using a Closure

Works with: Python version 3.x

<lang python>>>> from math import sqrt >>> def sdcreator(): sum = sum2 = n = 0 def sd(x): nonlocal sum, sum2, n

sum += x sum2 += x*x n += 1.0 return sqrt(sum2/n - sum*sum/n/n) return sd

>>> sd = sdcreator() >>> for value in (2,4,4,4,5,5,7,9): print (value, sd(value))

2 0.0 4 1.0 4 0.942809041582 4 0.866025403784 5 0.979795897113 5 1.0 7 1.39970842445 9 2.0</lang>

R

Built-in Std Dev fn

The built-in standard deviation function applies the Bessel correction. To reverse this, we can apply an uncorrection.
If na.rm is true, missing data points (NA values) are removed.

reverseBesselCorrection <- function(x, na.rm=FALSE)
{
  if(na.rm) x <- x[!is.na(x)]
  len <- length(x)
  if(len < 2) stop("2 or more data points required")
  sqrt((len-1)/len)
}
testdata <- c(2,4,4,4,5,5,7,9)
reverseBesselCorrection(testdata)*sd(testdata) #2

</lang>

From scratch

Again, if na.rm is true, missing data points (NA values) are removed.

uncorrectedsd <- function(x, na.rm=FALSE)
{
  len <- length(x)
  if(len < 2) stop("2 or more data points required")
  mu <- mean(x, na.rm=na.rm)
  ssq <- sum((x - mu)^2, na.rm=na.rm)
  usd <- sqrt(ssq/len)
  usd
}
uncorrectedsd(testdata) #2

</lang>

REXX

Works with: oorexx

This is indeed Object REXX.

<lang orexx>sdacc = .SDAccum~new x = .array~of(2,4,4,4,5,5,7,9) sd = 0 do i = 1 to x~size

  sd = sdacc~value(x[i])

end

say "std dev = "sd

class SDAccum

method sum attribute

method sum2 attribute

method count attribute

method init

 self~sum = 0.0
 self~sum2 = 0.0
 self~count = 0

method value

 expose sum sum2 count
 parse arg x
 sum = sum + x
 sum2 = sum2 + x*x
 count = count + 1
 return self~stddev

method mean

 expose sum count
 return sum/count

method variance

 expose sum2  count
 m = self~mean
 return sum2/count - m*m

method stddev

 return self~sqrt(self~variance)

method sqrt

 arg n
 if n = 0 then return 0
 ans = n / 2
 prev = n
 do until prev = ans
   prev = ans
   ans = ( prev + ( n / prev ) ) / 2
 end
 return ans</lang>

Ruby

Object

Uses an object to keep state.

"Simplification of the formula [...] for standard deviation [...] can be memorized as taking the square root of (the average of the squares less the square of the average)." [1]

<lang ruby>class StdDevAccumulator

 def initialize
   @n, @sum, @sumofsquares = 0, 0.0, 0.0
 end
 
 def <<(num)
   # return self to make this possible:  sd << 1 << 2 << 3 # => 0.816496580927726
   @n += 1
   @sum += num
   @sumofsquares += num**2
   self
 end
 
 def stddev
   Math.sqrt( (@sumofsquares / @n) - (@sum / @n)**2 )
 end
 
 def to_s
   stddev.to_s
 end

end

sd = StdDevAccumulator.new i = 0 [2,4,4,4,5,5,7,9].each {|n| puts "adding #{n}: stddev of #{i+=1} samples is #{sd << n}" }</lang>

adding 2: stddev of 1 samples is 0.0
adding 4: stddev of 2 samples is 1.0
adding 4: stddev of 3 samples is 0.942809041582063
adding 4: stddev of 4 samples is 0.866025403784439
adding 5: stddev of 5 samples is 0.979795897113272
adding 5: stddev of 6 samples is 1.0
adding 7: stddev of 7 samples is 1.39970842444753
adding 9: stddev of 8 samples is 2.0

Closure

<lang ruby>def sdaccum

 n, sum, sum2 = 0, 0.0, 0.0
 lambda do |num|
   n += 1
   sum += num
   sum2 += num**2
   Math.sqrt( (sum2 / n) - (sum / n)**2 )
 end

end

sd = sdaccum [2,4,4,4,5,5,7,9].each {|n| print sd.call(n), ", "}</lang>

0.0, 1.0, 0.942809041582063, 0.866025403784439, 0.979795897113272, 1.0, 1.39970842444753, 2.0,

Smalltalk

Works with: GNU Smalltalk

<lang smalltalk>Object subclass: SDAccum [

   |sum sum2 num|
   SDAccum class >> new [  |o| 
       o := super basicNew.
       ^ o init.
   ]
   init [ sum := 0. sum2 := 0. num := 0 ]
   value: aValue [ 
     sum := sum + aValue.
     sum2 := sum2 + ( aValue * aValue ).
     num := num + 1.
     ^ self stddev
   ]
   count [ ^ num ]
   mean [ num>0 ifTrue: [^ sum / num] ifFalse: [ ^ 0.0 ] ]
   variance [ |m| m := self mean.
              num>0 ifTrue: [^ (sum2/num) - (m*m) ] ifFalse: [ ^ 0.0 ]
            ]
   stddev [ ^ (self variance) sqrt ]

].</lang>

<lang smalltalk>|sdacc sd| sdacc := SDAccum new.

( 2 4 4 4 5 5 7 9 ) do: [ :v | sd := sdacc value: v ].

('std dev = %1' % { sd }) displayNl.</lang>

Tcl

With a Class

Works with: Tcl version 8.6

<lang tcl>oo::class create SDAccum {

   variable sum sum2 num
   constructor {} {
       set sum 0.0
       set sum2 0.0
       set num 0
   }
   method value {x} {
       set sum2 [expr {$sum2 + $x**2}]
       set sum [expr {$sum + $x}]
       incr num
       return [my stddev]
   }
   method count {} {
       return $num
   }
   method mean {} {
       expr {$sum / $num}
   }
   method variance {} {
       expr {$sum2/$num - [my mean]**2}
   }
   method stddev {} {
       expr {sqrt([my variance])}
   }

}

Demonstration

set sdacc [SDAccum new] foreach val {2 4 4 4 5 5 7 9} {

   set sd [$sdacc value $val]

} puts "the standard deviation is: $sd"</lang> which produces the output:

the standard deviation is: 2.0

With a Coroutine

Works with: Tcl version 8.6

<lang tcl># Make a coroutine out of a lambda application coroutine sd apply {{} {

   set sum 0.0
   set sum2 0.0
   set sd {}
   # Keep processing argument values until told not to...
   while {[set val [yield $sd]] ne "stop"} {
       incr n
       set sum [expr {$sum + $val}]
       set sum2 [expr {$sum2 + $val**2}]
       set sd [expr {sqrt($sum2/$n - ($sum/$n)**2)}]
   }

}}

Demonstration

foreach val {2 4 4 4 5 5 7 9} {

   set sd [sd $val]

} sd stop puts "the standard deviation is: $sd"</lang>