Verify distribution uniformity/Naive: Difference between revisions

From Rosetta Code
(→‎Tcl: Added implementation)
(→‎{{header|Tcl}}: Attempt to write down a test that is based on the chi2 test)


Revision as of 14:18, 8 August 2009

Task
Verify distribution uniformity/Naive
You are encouraged to solve this task according to the task description, using any language you may know.

This task is an adjunct to Seven-dice from Five-dice.

Create a function to check that the random integers returned from a small-integer generator function have uniform distribution.

The function should take as arguments:

  • The function producing random integers.
  • The number of times to call the integer generator.
  • A 'delta' value of some sort that indicates how close to a flat distribution is close enough.

The function should produce:

  • Some indication of the distribution achieved.
  • An 'error' if the distribution is not flat enough.

Show the distribution checker working when the produced distribution is flat enough and when it is not. (Use a generator from Seven-dice from Five-dice).

Python

<lang python>from collections import Counter
from pprint import pprint as pp

def distcheck(fn, repeats, delta):
    '''\
    Bin the answers to fn() and check bin counts are within +/- delta %
    of repeats/bincount'''
    bin = Counter(fn() for i in range(repeats))
    target = repeats // len(bin)
    deltacount = int(delta / 100. * target)
    assert all( abs(target - count) < deltacount
                for count in bin.values() ), "Bin distribution skewed from %i +/- %i: %s" % (
                    target, deltacount, [ (key, target - count)
                                          for key, count in sorted(bin.items()) ]
                    )
    pp(dict(bin))</lang>

Sample output:

>>> distcheck(dice5, 1000000, 1)
{1: 200244, 2: 199831, 3: 199548, 4: 199853, 5: 200524}
>>> distcheck(dice5, 1000, 1)
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    distcheck(dice5, 1000, 1)
  File "C://Paddys/rand7fromrand5.py", line 54, in distcheck
    for key, count in sorted(bin.items()) ]
AssertionError: Bin distribution skewed from 200 +/- 2: [(1, 4), (2, -33), (3, 6), (4, 11), (5, 12)]
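
The sample session above calls dice5, which comes from the companion task and is not defined on this page. The following is a minimal sketch, assuming dice5 simply wraps random.randint(1, 5). The name check_uniform is hypothetical and is not part of the solution above. It raises ValueError explicitly instead of using assert, because assert statements are skipped when Python runs with -O, and it returns the counts rather than printing them.

```python
import random
from collections import Counter


def dice5():
    """Stand-in generator (an assumption): uniform integers 1..5."""
    return random.randint(1, 5)


def check_uniform(fn, repeats, delta):
    """Hypothetical variant of distcheck that raises instead of asserting.

    Returns the bin counts as a dict when every bin lies within
    +/- delta percent of repeats / number_of_bins.
    """
    bins = Counter(fn() for _ in range(repeats))
    target = repeats / len(bins)
    allowed = delta / 100.0 * target
    # Collect only the bins whose deviation exceeds the tolerance.
    skewed = {key: count - target
              for key, count in sorted(bins.items())
              if abs(count - target) > allowed}
    if skewed:
        raise ValueError("Bin distribution skewed from %.1f +/- %.1f: %s"
                         % (target, allowed, skewed))
    return dict(bins)


if __name__ == "__main__":
    random.seed(12345)  # fixed seed only so that repeated runs agree
    print(check_uniform(dice5, 1000000, 1))
    try:
        # A generator that returns 1 about 95% of the time is clearly not flat.
        check_uniform(lambda: 1 if random.random() < 0.95 else 2, 100000, 1)
    except ValueError as err:
        print("rejected:", err)
```

Because the tolerance is a percentage of the per-bin target, small values of repeats make rejection likely even for a fair generator, as the 1000-call run in the session above shows.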

Tcl

<lang tcl>proc distcheck {random times {delta 1}} {
    # Tally how often each value is produced by the generator script
    for {set i 0} {$i<$times} {incr i} {incr vals([uplevel 1 $random])}
    set target [expr {$times / [array size vals]}]
    foreach {k v} [array get vals] {
        # The tolerance is delta percent of the total number of calls
        if {abs($v - $target) > $times  * $delta / 100.0} {
            error "distribution potentially skewed for $k: expected around $target, got $v"
        }
    }
    foreach k [lsort -integer [array names vals]] {lappend result $k $vals($k)}
    return $result
}</lang>

Demonstration:

<lang tcl># First, a uniformly distributed random variable
puts [distcheck {expr {int(10*rand())}} 100000]
# Now, one that definitely isn't!
puts [distcheck {expr {rand()>0.95}} 100000]</lang>

Which produces this output (the second line is the error message):

0 10003 1 9851 2 10058 3 10193 4 10126 5 10002 6 9852 7 9964 8 9957 9 9994
distribution potentially skewed for 0: expected around 50000, got 94873

An alternative is to use Pearson's χ² test to test the hypothesis that the data is uniformly distributed.

Works with: Tcl version 8.5
Library: tcllib

<lang tcl>package require math
interp alias {} tcl::mathfunc::lnGamma {} math::ln_Gamma
# Density of the chi-squared distribution with k degrees of freedom, at x
proc tcl::mathfunc::chi2 {k x} {
    set k2 [expr {$k / 2.0}]
    expr {exp(log(0.5)*$k2 + log($x) * ($k2 - 1) - $x/2.0 - lnGamma($k2))}
}

proc isUniform {distribution {significance 0.05}} {
    set count [tcl::mathop::+ {*}[dict values $distribution]]
    set expected [expr {double($count) / [dict size $distribution]}]
    # Pearson's statistic: sum of (observed - expected)^2 / expected
    set X2 0.0
    foreach value [dict values $distribution] {
        set X2 [expr {$X2 + ($value - $expected)**2 / $expected}]
    }
    set freedom [expr {[dict size $distribution] - 1}]
    expr {chi2($freedom, $X2) > $significance}
}</lang>

Computing the distribution to check is trivial (it is already part of distcheck) and so is omitted here for clarity.
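
For comparison, here is a Python sketch of the same idea. It is not a translation of the Tcl code: the chi2 function above evaluates the density of the χ² distribution at the statistic, whereas this sketch compares the upper-tail probability (the p-value) with the significance level, which is the usual form of Pearson's test. The names chi2_sf and is_uniform and both helper functions are hypothetical. The tail probability is computed from the regularized incomplete gamma function using a power series and a continued fraction, so only the standard library is needed.

```python
import math


def _lower_series(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series.

    Intended for x < a + 1, where the series converges quickly.
    """
    denom = a
    term = 1.0 / a
    total = term
    for _ in range(100000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_fraction(a, x):
    """Regularized upper incomplete gamma Q(a, x) via a continued fraction.

    Intended for x >= a + 1; evaluated with the modified Lentz method.
    """
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x, k):
    """Upper-tail probability P(X >= x) for chi-squared with k degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half = k / 2.0, x / 2.0
    if half < a + 1.0:
        return 1.0 - _lower_series(a, half)
    return _upper_fraction(a, half)


def is_uniform(counts, significance=0.05):
    """True if the uniformity hypothesis is not rejected at the given level."""
    total = sum(counts.values())
    expected = total / len(counts)
    # Pearson's statistic: sum of (observed - expected)^2 / expected
    x2 = sum((c - expected) ** 2 / expected for c in counts.values())
    return chi2_sf(x2, len(counts) - 1) > significance


if __name__ == "__main__":
    # Counts taken from the Tcl demonstration output above.
    flat = {0: 10003, 1: 9851, 2: 10058, 3: 10193, 4: 10126,
            5: 10002, 6: 9852, 7: 9964, 8: 9957, 9: 9994}
    # The count for 1 is inferred from the total of 100000 calls.
    skewed = {0: 94873, 1: 5127}
    print(is_uniform(flat))
    print(is_uniform(skewed))
```

The input is a mapping from value to count, the same shape as the dictionary that the Tcl isUniform expects.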