Verify distribution uniformity/Naive

From Rosetta Code
Jump to: navigation, search
Task
Verify distribution uniformity/Naive
You are encouraged to solve this task according to the task description, using any language you may know.

This task is an adjunct to Seven-dice from Five-dice.

Create a function to check that the random integers returned from a small-integer generator function have uniform distribution.

The function should take as arguments:

  • The function (or object) producing random integers.
  • The number of times to call the integer generator.
  • A 'delta' value of some sort that indicates how close to a flat distribution is close enough.

The function should produce:

  • Some indication of the distribution achieved.
  • An 'error' if the distribution is not flat enough.

Show the distribution checker working when the produced distribution is flat enough and when it is not. (Use a generator from Seven-dice from Five-dice).

See also:

Contents

[edit] Ada

with Ada.Numerics.Discrete_Random, Ada.Text_IO;
 
procedure Naive_Random is
 
type M_1000 is mod 1000;
package Rand is new Ada.Numerics.Discrete_Random(M_1000);
Gen: Rand.Generator;
 
procedure Perform(Modulus: Positive; Expected, Margin: Natural;
Passed: out boolean) is
Low: Natural  := (100-Margin) * Expected/100;
High: Natural  := (100+Margin) * Expected/100;
Buckets: array(0 .. Modulus-1) of Natural := (others => 0);
Index: Natural;
begin
for I in 1 .. Expected * Modulus loop
Index := Integer(Rand.Random(Gen)) mod Modulus;
Buckets(Index) := Buckets(Index) + 1;
end loop;
Passed := True;
for I in Buckets'Range loop
Ada.Text_IO.Put("Bucket" & Integer'Image(I+1) & ":" &
Integer'Image(Buckets(I)));
if Buckets(I) < Low or else Buckets(I) > High then
Ada.Text_IO.Put_Line(" (failed)");
Passed := False;
else
Ada.Text_IO.Put_Line(" (passed)");
end if;
end loop;
Ada.Text_IO.New_Line;
end Perform;
 
Number_Of_Buckets: Positive := Natural'Value(Ada.Text_IO.Get_Line);
Expected_Per_Bucket: Natural := Natural'Value(Ada.Text_IO.Get_Line);
Margin_In_Percent: Natural := Natural'Value(Ada.Text_IO.Get_Line);
OK: Boolean;
 
begin
Ada.Text_IO.Put_Line( "Buckets:" & Integer'Image(Number_Of_Buckets) &
", Expected:" & Integer'Image(Expected_Per_Bucket) &
", Margin:" & Integer'Image(Margin_In_Percent));
Rand.Reset(Gen);
 
Perform(Modulus => Number_Of_Buckets,
Expected => Expected_Per_Bucket,
Margin => Margin_In_Percent,
Passed => OK);
 
Ada.Text_IO.Put_Line("Test Passed? (" & Boolean'Image(OK) & ")");
end Naive_Random;
Sample run 1 (all buckets good):
7
1000
3 
Buckets: 7, Expected: 1000, Margin: 3
Bucket 1: 1006 (passed)
Bucket 2: 1030 (passed)
Bucket 3: 997 (passed)
Bucket 4: 985 (passed)
Bucket 5: 976 (passed)
Bucket 6: 1024 (passed)
Bucket 7: 982 (passed)

Test Passed? (TRUE)
Sample run 2 (some buckets too large / to small):
7
1000
3
Buckets: 7, Expected: 1000, Margin: 3
Bucket 1: 1034 (failed)
Bucket 2: 985 (passed)
Bucket 3: 1025 (passed)
Bucket 4: 933 (failed)
Bucket 5: 1000 (passed)
Bucket 6: 1016 (passed)
Bucket 7: 1007 (passed)

Test Passed? (FALSE)

[edit] AutoHotkey

MsgBox, % DistCheck("dice7",10000,3)
 
DistCheck(function, repetitions, delta)
{ Loop, % 7 ; initialize array
{ bucket%A_Index% := 0
}
 
Loop, % repetitions ; populate buckets
{ v := %function%()
bucket%v% += 1
}
 
lbnd := round((repetitions/7)*(100-delta)/100)
ubnd := round((repetitions/7)*(100+delta)/100)
text := "Distribution check:`n`nTotal elements = " repetitions
. "`n`nMargin = " delta "% --> Lbound = " lbnd ", Ubound = " ubnd "`n"
Loop, % 7
{ text := text "`nBucket " A_Index " contains " bucket%A_Index% " elements."
If bucket%A_Index% not between %lbnd% and %ubnd%
text := text " Skewed."
}
Return, text
}
Distribution check:

Total elements = 10000

Margin = 3% --> Lbound = 1386, Ubound = 1471

Bucket 1 contains 1450 elements.
Bucket 2 contains 1374 elements. Skewed.
Bucket 3 contains 1412 elements.
Bucket 4 contains 1465 elements.
Bucket 5 contains 1370 elements. Skewed.
Bucket 6 contains 1485 elements. Skewed.
Bucket 7 contains 1444 elements.

[edit] BBC BASIC

      MAXRND = 7
FOR r% = 2 TO 5
check% = FNdistcheck(FNdice5, 10^r%, 0.05)
PRINT "Over "; 10^r% " runs dice5 ";
IF check% THEN
PRINT "failed distribution check with "; check% " bin(s) out of range"
ELSE
PRINT "passed distribution check"
ENDIF
NEXT
END
 
DEF FNdistcheck(RETURN func%, repet%, delta)
LOCAL i%, m%, r%, s%, bins%()
DIM bins%(MAXRND)
FOR i% = 1 TO repet%
r% = FN(^func%)
bins%(r%) += 1
IF r%>m% m% = r%
NEXT
FOR i% = 1 TO m%
IF bins%(i%)/(repet%/m%) > 1+delta s% += 1
IF bins%(i%)/(repet%/m%) < 1-delta s% += 1
NEXT
= s%
 
DEF FNdice5 = RND(5)

Output:

Over 100 runs dice5 failed distribution check with 3 bin(s) out of range
Over 1000 runs dice5 failed distribution check with 1 bin(s) out of range
Over 10000 runs dice5 passed distribution check
Over 100000 runs dice5 passed distribution check

[edit] C

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
 
inline int rand5()
{
int r, rand_max = RAND_MAX - (RAND_MAX % 5);
while ((r = rand()) >= rand_max);
return r / (rand_max / 5) + 1;
}
 
inline int rand5_7()
{
int r;
while ((r = rand5() * 5 + rand5()) >= 27);
return r / 3 - 1;
}
 
/* assumes gen() returns a value from 1 to n */
int check(int (*gen)(), int n, int cnt, double delta) /* delta is relative */
{
int i = cnt, *bins = calloc(sizeof(int), n);
double ratio;
while (i--) bins[gen() - 1]++;
for (i = 0; i < n; i++) {
ratio = bins[i] * n / (double)cnt - 1;
if (ratio > -delta && ratio < delta) continue;
 
printf("bin %d out of range: %d (%g%% vs %g%%), ",
i + 1, bins[i], ratio * 100, delta * 100);
break;
}
free(bins);
return i == n;
}
 
int main()
{
int cnt = 1;
while ((cnt *= 10) <= 1000000) {
printf("Count = %d: ", cnt);
printf(check(rand5_7, 7, cnt, 0.03) ? "flat\n" : "NOT flat\n");
}
 
return 0;
}
output

Count = 10: bin 1 out of range: 1 (-30% vs 3%), NOT flat Count = 100: bin 1 out of range: 11 (-23% vs 3%), NOT flat Count = 1000: bin 1 out of range: 150 (5% vs 3%), NOT flat Count = 10000: bin 3 out of range: 1477 (3.39% vs 3%), NOT flat Count = 100000: flat Count = 1000000: flat

[edit] C++

#include <map>
#include <iostream>
#include <cmath>
 
template<typename F>
bool test_distribution(F f, int calls, double delta)
{
typedef std::map<int, int> distmap;
distmap dist;
 
for (int i = 0; i < calls; ++i)
++dist[f()];
 
double mean = 1.0/dist.size();
 
bool good = true;
 
for (distmap::iterator i = dist.begin(); i != dist.end(); ++i)
{
if (std::abs((1.0 * i->second)/calls - mean) > delta)
{
std::cout << "Relative frequency " << i->second/(1.0*calls)
<< " of result " << i->first
<< " deviates by more than " << delta
<< " from the expected value " << mean << "\n";
good = false;
}
}
 
return good;
}

[edit] Clojure

The code could be shortened if the verify function did the output itself, but the "Clojure way" is to have functions, whenever possible, avoid side effects (like printing) and just return a result. Then the "application-level" code uses doseq and println to display the output to the user. The built-in (rand-int) function is used for all three runs of the verify function, but first with small N to simulate experimental error in a small sample size, then with larger N to show it working properly on large N.

(defn verify [rand n & [delta]]
(let [rands (frequencies (repeatedly n rand))
avg (/ (reduce + (map val rands)) (count rands))
max-delta (* avg (or delta 1/10))
acceptable? #(<= (- avg max-delta) % (+ avg max-delta))]
(for [[num count] (sort rands)]
[num count (acceptable? count)])))
 
(doseq [n [100 1000 10000]
[num count okay?] (verify #(rand-int 7) n)]
(println "Saw" num count "times:"
(if okay? "that's" " not") "acceptable"))
Saw 1 13 times: that's acceptable
Saw 2 15 times: that's acceptable
Saw 3 11 times:    not acceptable
Saw 4 10 times:    not acceptable
Saw 5 19 times:    not acceptable
Saw 6 17 times:    not acceptable
Saw 0 121 times:    not acceptable
Saw 1 128 times:    not acceptable
Saw 2 161 times:    not acceptable
Saw 3 146 times: that's acceptable
Saw 4 134 times: that's acceptable
Saw 5 170 times:    not acceptable
Saw 6 140 times: that's acceptable
Saw 0 1480 times: that's acceptable
Saw 1 1372 times: that's acceptable
Saw 2 1411 times: that's acceptable
Saw 3 1442 times: that's acceptable
Saw 4 1395 times: that's acceptable
Saw 5 1432 times: that's acceptable
Saw 6 1468 times: that's acceptable

[edit] Common Lisp

Translation of: OCaml
(defun check-distribution (function n &optional (delta 1.0))
(let ((bins (make-hash-table)))
(loop repeat n do (incf (gethash (funcall function) bins 0)))
(loop with target = (/ n (hash-table-count bins))
for key being each hash-key of bins using (hash-value value)
when (> (abs (- value target)) (* 0.01 delta n))
do (format t "~&Distribution potentially skewed for ~w:~
expected around ~w got ~w."
key target value)
finally (return bins))))
> (check-distribution 'd7 1000)
Distribution potentially skewed for 1: expected around 1000/7 got 153.
Distribution potentially skewed for 2: expected around 1000/7 got 119.
Distribution potentially skewed for 3: expected around 1000/7 got 125.
Distribution potentially skewed for 7: expected around 1000/7 got 156.
T
#<EQL Hash Table{7} 200B5A53>

> (check-distribution 'd7 10000)
NIL
#<EQL Hash Table{7} 200CB5BB>

[edit] D

import std.stdio, std.string, std.math, std.algorithm, std.traits;
 
/**
Bin the answers to fn() and check bin counts are within
+/- delta % of repeats/bincount.
*/

void distCheck(TF)(in TF func, in int nRepeats, in double delta)
if (isCallable!TF) {
int[int] counts;
foreach (i; 0 .. nRepeats)
counts[func()]++;
immutable double target = nRepeats / cast(double)counts.length;
immutable int deltaCount = cast(int)(delta / 100.0 * target);
 
foreach (k, count; counts)
if (abs(target - count) >= deltaCount)
throw new Exception(format(
"distribution potentially skewed for '%s': '%d'\n",
k, count));
 
foreach (k; counts.keys.sort())
writeln(k, " ", counts[k]);
writeln();
}
 
version (verify_distribution_uniformity_naive_main) {
void main() {
import std.random;
distCheck(() => uniform(1, 6), 1_000_000, 1);
}
}

If compiled with -version=verify_distribution_uniformity_naive_main:

Output:
1 199389
2 2
4 200016
5 200424

[edit] Euler Math Toolbox

Following the task verbatim.

 
>function checkrandom (frand$, n:index, delta:positive real) ...
$ v=zeros(1,n);
$ loop 1 to n; v{#}=frand$(); end;
$ K=max(v);
$ fr=getfrequencies(v,1:K);
$ return max(fr/n-1/K)<delta/sqrt(n);
$ endfunction
>function dice () := intrandom(1,1,6);
>checkrandom("dice",1000000,1)
1
>wd = 0|((1:6)+[-0.01,0.01,0,0,0,0])/6
[ 0 0.165 0.335 0.5 0.666666666667 0.833333333333 1 ]
>function wrongdice () := find(wd,random())
>checkrandom("wrongdice",1000000,1)
0
 

Checking the dice7 from dice5 generator.

 
>function dice5 () := intrandom(1,1,5);
>function dice7 () ...
$ repeat
$ k=(dice5()-1)*5+dice5();
$ if k<=21 then return ceil(k/3); endif;
$ end;
$ endfunction
>checkrandom("dice7",1000000,1)
1
 

Faster implementation with the matrix language.

 
>function dice5(n) := intrandom(1,n,5)-1;
>function dice7(n) ...
$ v=dice5(2*n)*5+dice5(2*n);
$ return v[nonzeros(v<=21)][1:n];
$ endfunction
>test=dice7(1000000);
>function checkrandom (v, delta=1) ...
$ K=max(v); n=cols(v);
$ fr=getfrequencies(v,1:K);
$ return max(fr/n-1/K)<delta/sqrt(n);
$ endfunction
>checkrandom(test)
1
>wd = 0|((1:6)+[-0.01,0.01,0,0,0,0])/6
[ 0 0.165 0.335 0.5 0.666666666667 0.833333333333 1 ]
>function wrongdice (n) := find(wd,random(1,n))
>checkrandom(wrongdice(1000000))
0
 

[edit] Fortran

Works with: Fortran version 95 and later
subroutine distcheck(randgen, n, delta)
 
interface
function randgen
integer :: randgen
end function randgen
end interface
 
real, intent(in) :: delta
integer, intent(in) :: n
integer :: i, mval, lolim, hilim
integer, allocatable :: buckets(:)
integer, allocatable :: rnums(:)
logical :: skewed = .false.
 
allocate(rnums(n))
 
do i = 1, n
rnums(i) = randgen()
end do
 
mval = maxval(rnums)
allocate(buckets(mval))
buckets = 0
 
do i = 1, n
buckets(rnums(i)) = buckets(rnums(i)) + 1
end do
 
lolim = n/mval - n/mval*delta
hilim = n/mval + n/mval*delta
 
do i = 1, mval
if(buckets(i) < lolim .or. buckets(i) > hilim) then
write(*,"(a,i0,a,i0,a,i0)") "Distribution potentially skewed for bucket ", i, " Expected: ", &
n/mval, " Actual: ", buckets(i)
skewed = .true.
end if
end do
 
if (.not. skewed) write(*,"(a)") "Distribution uniform"
 
deallocate(rnums)
deallocate(buckets)
 
end subroutine

[edit] Go

package main
 
import (
"fmt"
"math"
"math/rand"
"time"
)
 
// "given"
func dice5() int {
return rand.Intn(5) + 1
}
 
// function specified by task "Seven-sided dice from five-sided dice"
func dice7() (i int) {
for {
i = 5*dice5() + dice5()
if i < 27 {
break
}
}
return (i / 3) - 1
}
 
// function specified by task "Verify distribution uniformity/Naive"
//
// Parameter "f" is expected to return a random integer in the range 1..n.
// (Values out of range will cause an unceremonious crash.)
// "Max" is returned as an "indication of distribution achieved."
// It is the maximum delta observed from the count representing a perfectly
// uniform distribution.
// Also returned is a boolean, true if "max" is less than threshold
// parameter "delta."
func distCheck(f func() int, n int,
repeats int, delta float64) (max float64, flatEnough bool) {
count := make([]int, n)
for i := 0; i < repeats; i++ {
count[f()-1]++
}
expected := float64(repeats) / float64(n)
for _, c := range count {
max = math.Max(max, math.Abs(float64(c)-expected))
}
return max, max < delta
}
 
// Driver, produces output satisfying both tasks.
func main() {
rand.Seed(time.Now().UnixNano())
const calls = 1000000
max, flatEnough := distCheck(dice7, 7, calls, 500)
fmt.Println("Max delta:", max, "Flat enough:", flatEnough)
max, flatEnough = distCheck(dice7, 7, calls, 500)
fmt.Println("Max delta:", max, "Flat enough:", flatEnough)
}

Output:

Max delta: 356.1428571428696 Flat enough: true
Max delta: 787.8571428571304 Flat enough: false

[edit] Haskell

import System.Random
import Data.List
import Control.Monad
import Control.Arrow
 
distribCheck :: IO Int -> Int -> Int -> IO [(Int,(Int,Bool))]
distribCheck f n d = do
nrs <- replicateM n f
let group = takeWhile (not.null) $ unfoldr (Just. (partition =<< (==). head)) nrs
avg = (fromIntegral n) / fromIntegral (length group)
ul = round $ (100 + fromIntegral d)/100 * avg
ll = round $ (100 - fromIntegral d)/100 * avg
return $ map (head &&& (id &&& liftM2 (&&) (>ll)(<ul)).length) group

Example:

*Main> mapM_ print .sort =<< distribCheck (randomRIO(1,6)) 100000 3
(1,(16911,True))
(2,(16599,True))
(3,(16670,True))
(4,(16624,True))
(5,(16526,True))
(6,(16670,True))

[edit] Icon and Unicon

This example assumes the random number generator is passed to verify_uniform as a co-expression. The co-expression rnd is prompted for its next value using @rnd. The co-expression is created using create (|?10) where the vertical bar means generate an infinite sequence of what is to its right (in this case, ?10 generates a random integer in the range [1,10]).

# rnd  : a co-expression, which will generate the random numbers
# n  : the number of numbers to test
# delta: tolerance in non-uniformity
# This procedure fails if after the sampling the difference
# in uniformity exceeds delta, a proportion of n / number-of-alternatives
procedure verify_uniform (rnd, n, delta)
# generate a table counting the outcome of the generator
results := table (0)
every (1 to n) do results[@rnd] +:= 1
# retrieve the statistics
smallest := n
largest := 0
every num := key(results) do { # record result and limits
write (num || " " || results[num])
if results[num] < smallest then smallest := results[num]
if results[num] > largest then largest := results[num]
}
# decide if difference is within bounds defined by delta
return largest-smallest < delta * n / *results
end
 
procedure main ()
gen_1 := create (|?10) # uniform integers, 1 to 10
if verify_uniform (gen_1, 1000000, 0.01)
then write ("uniform")
else write ("skewed")
gen_2 := create (|(if ?2 = 1 then 6 else ?10)) # skewed integers, 1 to 10
if verify_uniform (gen_2, 1000000, 0.01)
then write ("uniform")
else write ("skewed")
end

Output:

5 99988
2 99998
10 99894
7 99948
4 100271
1 99917
9 99846
6 99943
3 99824
8 100371
uniform
5 49940
2 50324
10 50243
7 49982
4 50295
1 50162
9 49800
6 549190
3 50137
8 49927
skewed

[edit] J

This solution defines the checker as an adverb. Adverbs combine with the verb immediately to their left to create a new verb. So for a verb generateDistribution and an adverb checkUniform, the new verb might be thought of as checkGeneratedDistributionisUniform.

The delta is given as an optional left argument (x), defaulting to 5%. The right argument (y) is any valid argument to the distribution generating verb.

checkUniform=: adverb define
0.05 u checkUniform y
 :
n=. */y
delta=. x
sample=. u n NB. the "u" refers to the verb to left of adverb
freqtable=. /:~ (~. sample) ,. #/.~ sample
expected=. n % # freqtable
errmsg=. 'Distribution is potentially skewed'
errmsg assert (delta * expected) > | expected - {:"1 freqtable
freqtable
)

It is possible to use tacit expressions within an explicit definition enabling a more functional and concise style:

checkUniformT=: adverb define    
0.05 u checkUniformT y
 :
freqtable=. /:~ (~. ,. #/.~) u n=. */y
errmsg=. 'Distribution is potentially skewed'
errmsg assert ((n % #) (x&*@[ > |@:-) {:"1) freqtable
freqtable
)

Show usage using rollD7t given in Seven-dice from Five-dice:

   0.05 rollD7t checkUniform 1e5
1 14082
2 14337
3 14242
4 14470
5 14067
6 14428
7 14374
0.05 rollD7t checkUniform 1e2
|Distribution is potentially skewed: assert
| errmsg assert(delta*expected)>|expected-{:"1 freqtable

[edit] JavaScript

Translation of: Tcl
function distcheck(random_func, times, opts) {
if (opts === undefined) opts = {}
opts['delta'] = opts['delta'] || 2;
 
var count = {}, vals = [];
for (var i = 0; i < times; i++) {
var val = random_func();
if (! has_property(count, val)) {
count[val] = 1;
vals.push(val);
}
else
count[val] ++;
}
vals.sort(function(a,b) {return a-b});
 
var target = times / vals.length;
var tolerance = target * opts['delta'] / 100;
 
for (var i = 0; i < vals.length; i++) {
var val = vals[i];
if (Math.abs(count[val] - target) > tolerance)
throw "distribution potentially skewed for " + val +
": expected result around " + target + ", got " +count[val];
else
print(val + "\t" + count[val]);
}
}
 
function has_property(obj, propname) {
return typeof(obj[propname]) == "undefined" ? false : true;
}
 
try {
distcheck(function() {return Math.floor(10 * Math.random())}, 100000);
print();
distcheck(function() {return (Math.random() > 0.95 ? 1 : 0)}, 100000);
} catch (e) {
print(e);
}

Output:

0       9945
1       9862
2       9954
3       10104
4       9861
5       10140
6       10066
7       10001
8       10101
9       9966

distribution potentially skewed for 0: expected result around 50000, got 95040

[edit] Mathematica

SetAttributes[CheckDistribution, HoldFirst]
CheckDistribution[function_,number_,delta_] :=(Print["Expected: ", N[number/7], ", Generated :",
Transpose[Tally[Table[function, {number}]]][[2]] // Sort]; If[(Max[#]-Min[#])&
[Transpose[Tally[Table[function, {number}]]][[2]]] < delta*number/700, "Flat", "Skewed"])

Example usage:

CheckDistribution[RandomInteger[{1, 7}], 10000, 5]  

->Expected: 1428.57, Generated :{1372,1420,1429,1431,1433,1450,1465}
->"Skewed"

CheckDistribution[RandomInteger[{1, 7}], 100000, 5]  

->Expected: 14285.7, Generated :{14182,14186,14240,14242,14319,14407,14424}
->"Flat"

[edit] OCaml

let distcheck fn n ?(delta=1.0) () = 
let h = Hashtbl.create 5 in
for i = 1 to n do
let v = fn() in
let n =
try Hashtbl.find h v
with Not_found -> 0
in
Hashtbl.replace h v (n+1)
done;
Hashtbl.iter (fun v n -> Printf.printf "%d => %d\n%!" v n) h;
let target = (float n) /. float (Hashtbl.length h) in
Hashtbl.iter (fun key value ->
if abs_float(float value -. target) > 0.01 *. delta *. (float n)
then (Printf.eprintf
"distribution potentially skewed for '%d': expected around %f, got %d\n%!"
key target value)
) h;
;;

[edit] PARI/GP

This tests the purportedly random 7-sided die with a slightly biased 1000-sided die.

dice5()=random(5)+1;
 
dice7()={
my(t);
while((t=dice5()*5+dice5()) > 21,);
(t+2)\3
};
 
cumChi2(chi2,dof)={
my(g=gamma(dof/2));
incgam(dof/2,chi2/2,g)/g
};
 
test(f,n,alpha=.05)={
v=vector(n,i,f());
my(s,ave,dof,chi2,p);
s=sum(i=1,n,v[i],0.);
ave=s/n;
dof=n-1;
chi2=sum(i=1,n,(v[i]-ave)^2)/ave;
p=cumChi2(chi2,dof);
if(p<alpha,
error("Not flat enough, significance only "p)
,
print("Flat with significance "p);
)
};
 
test(dice7, 10^5)
test(()->if(random(1000),random(1000),1), 10^5)

Output:

Flat with significance 1.000000000000000000

  ###   user error: Not flat enough, significance only 5.391077606003910233 E-3500006

[edit] Perl 6

Since the tested function is rolls of a 7 sided die, the test numbers are magnitudes of 10x bumped up to the closest multiple of 7. This reduces spurious error from there not being an integer expected value.

my $d7 = 1..7;
sub roll7 { $d7.roll };
 
my $threshold = 3;
 
for 14, 105, 1001, 10003, 100002, 1000006 -> $n
{ dist( $n, $threshold, &roll7 ) };
 
 
sub dist ( $n is copy, $threshold, &producer ) {
my @dist;
my $expect = $n / 7;
say "Expect\t",$expect.fmt("%.3f");
 
loop ($_ = $n; $n; --$n) { @dist[&producer()]++; }
 
for @dist.kv -> $i, $v is copy {
next unless $i;
$v //= 0;
my $pct = ($v - $expect)/$expect*100;
printf "%d\t%d\t%+.2f%% %s\n", $i, $v, $pct,
($pct.abs > $threshold ?? '- skewed' !! '');
}
say '';
}

Sample output:

Expect  2.000
1       2       +0.00%
2       3       +50.00% - skewed
3       2       +0.00%
4       2       +0.00%
5       3       +50.00% - skewed
6       0       -100.00% - skewed
7       2       +0.00%

Expect  15.000
1       15      +0.00%
2       17      +13.33% - skewed
3       13      -13.33% - skewed
4       16      +6.67% - skewed
5       14      -6.67% - skewed
6       16      +6.67% - skewed
7       14      -6.67% - skewed

Expect  143.000
1       134     -6.29% - skewed
2       142     -0.70%
3       141     -1.40%
4       137     -4.20% - skewed
5       142     -0.70%
6       170     +18.88% - skewed
7       135     -5.59% - skewed

Expect  1429.000
1       1396    -2.31%
2       1468    +2.73%
3       1405    -1.68%
4       1442    +0.91%
5       1453    +1.68%
6       1417    -0.84%
7       1422    -0.49%

Expect  14286.000
1       14222   -0.45%
2       14320   +0.24%
3       14326   +0.28%
4       14425   +0.97%
5       14140   -1.02%
6       14275   -0.08%
7       14294   +0.06%

Expect  142858.000
1       142510  -0.24%
2       142436  -0.30%
3       142438  -0.29%
4       143152  +0.21%
5       142905  +0.03%
6       143232  +0.26%
7       143333  +0.33%

[edit] PicoLisp

The following function takes a count, and allowed deviation in per mill (one-tenth of a percent), and a 'prg' code body (i.e. an arbitrary number of executable expressions).

(de checkDistribution (Cnt Pm . Prg)
(let Res NIL
(do Cnt (accu 'Res (run Prg 1) 1))
(let
(N (/ Cnt (length Res))
Min (*/ N (- 1000 Pm) 1000)
Max (*/ N (+ 1000 Pm) 1000) )
(for R Res
(prinl (cdr R) " " (if (>= Max (cdr R) Min) "Good" "Bad")) ) ) ) )

Output:

: (checkDistribution 100000 5 (rand 1 7))
14299 Good
14394 Bad
14147 Bad
14418 Bad
14159 Bad
14271 Good
14312 Good

[edit] PureBasic

Prototype RandNum_prt()
 
Procedure.s distcheck(*function.RandNum_prt, repetitions, delta.d)
Protected text.s, maxIndex = 0
Dim bucket(maxIndex) ;array will be resized as needed
 
For i = 1 To repetitions ;populate buckets
v = *function()
If v > maxIndex
maxIndex = v
Redim bucket(maxIndex)
EndIf
bucket(v) + 1
Next
 
 
lbnd = Round((repetitions / maxIndex) * (100 - delta) / 100, #PB_Round_Up)
ubnd = Round((repetitions / maxIndex) * (100 + delta) / 100, #PB_Round_Down)
text = "Distribution check:" + #crlf$ + #crlf$
text + "Total elements = " + Str(repetitions) + #crlf$ + #crlf$
text + "Margin = " + StrF(delta, 2) + "% --> Lbound = " + Str(lbnd) + ", Ubound = " + Str(ubnd) + #crlf$
 
For i = 1 To maxIndex
If bucket(i) < lbnd Or bucket(i) > ubnd
text + #crlf$ + "Bucket " + Str(i) + " contains " + Str(bucket(i)) + " elements. Skewed."
EndIf
Next
ProcedureReturn text
EndProcedure
 
MessageRequester("Results", distcheck(@dice7(), 1000000, 0.20))

A small delta was chosen to increase the chance of a skewed result in the sample output:

Distribution check:

Total elements = 1000000

Margin = 0.20% --> Lbound = 142572, Ubound = 143142

Bucket 1 contains 141977 elements.  Skewed.
Bucket 6 contains 143860 elements.  Skewed.

[edit] Python

Works with: Python version 3.1
from collections import Counter
from pprint import pprint as pp
 
def distcheck(fn, repeats, delta):
'''\
Bin the answers to fn() and check bin counts are within +/- delta %
of repeats/bincount'''

bin = Counter(fn() for i in range(repeats))
target = repeats // len(bin)
deltacount = int(delta / 100. * target)
assert all( abs(target - count) < deltacount
for count in bin.values() ), "Bin distribution skewed from %i +/- %i: %s" % (
target, deltacount, [ (key, target - count)
for key, count in sorted(bin.items()) ]
)
pp(dict(bin))

Sample output:

>>> distcheck(dice5, 1000000, 1)
{1: 200244, 2: 199831, 3: 199548, 4: 199853, 5: 200524}
>>> distcheck(dice5, 1000, 1)
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    distcheck(dice5, 1000, 1)
  File "C://Paddys/rand7fromrand5.py", line 54, in distcheck
    for key, count in sorted(bin.items()) ]
AssertionError: Bin distribution skewed from 200 +/- 2: [(1, 4), (2, -33), (3, 6), (4, 11), (5, 12)]

[edit] R

distcheck <- function(fn, repetitions=1e4, delta=3)
{
if(is.character(fn))
{
fn <- get(fn)
}
if(!is.function(fn))
{
stop("fn is not a function")
}
samp <- fn(n=repetitions)
counts <- table(samp)
expected <- repetitions/length(counts)
lbound <- expected * (1 - 0.01*delta)
ubound <- expected * (1 + 0.01*delta)
status <- ifelse(counts < lbound, "under",
ifelse(counts > ubound, "over", "okay"))
data.frame(value=names(counts), counts=as.vector(counts), status=status)
}
distcheck(dice7.vec)

[edit] Racket

This example is incorrect. arguments don't fit the task description. Please fix the code and remove this message.

The code below implements a chi-squared test in order to determine whether a given source of random numbers produce uniformly distributed numbers.

 
#lang racket
(require
racket/flonum (planet williams/science:4:5/science)
(only-in (planet williams/science/unsafe-ops-utils) real->float))
 
; (chi^2-goodness-of-fit-test observed expected df)
; Given: observed, a sequence of observed frequencies
; expected, a sequence of expected frequencies
; df, the degrees of freedom
; Result: P-value = 1-chi^2cdf(X^2,df) , the p-value
(define (chi^2-goodness-of-fit-test observed expected df)
(define X^2 (for/sum ([o observed] [e expected])
(/ (sqr (- o e)) e)))
(- 1.0 (chi-squared-cdf X^2 df)))
 
 
(define (is-uniform? rand n α)
 ; Use significance level α to test whether
 ; n small random numbers generated by rand
 ; have a uniform distribution.
 
 ; Observed values:
(define o (make-vector 10 0))
 ; generate n random integers from 0 to 9.
(for ([_ (+ n 1)])
(define r (rand 10))
(vector-set! o r (+ (vector-ref o r) 1)))
 ; Expected values:
(define ex (make-vector 10 (/ n 10)))
 
 ; Calculate the P-value:
(define P (chi^2-goodness-of-fit-test o ex (- n 1)))
 
 ; If the P-value is larger than α we accept the
 ; hypothesis that the numbers are distributed uniformly.
(> P α))
 
; Test whether the builtin generator is uniform:
(is-uniform? random 1000 0.05)
; Test whether the constant generator fails:
(is-uniform? (λ(_) 5) 1000 0.05)
 

Output:

 
#t
#f
 

[edit] Ruby

Translation of: Tcl
def distcheck(n, delta=1)
unless block_given?
raise ArgumentError, "pass a block to this method"
end
 
h = Hash.new(0)
n.times {h[ yield ] += 1}
 
target = 1.0 * n / h.length
h.each do |key, value|
if (value - target).abs > 0.01 * delta * n
raise StandardError,
"distribution potentially skewed for '#{key}': expected around #{target}, got #{value}"
end
end
 
h.keys.sort.each {|k| print "#{k} #{h[k]} "}
puts
end
 
if __FILE__ == $0
begin
distcheck(100_000) {rand(10)}
distcheck(100_000) {rand > 0.95}
rescue StandardError => e
p e
end
end

output:

0 9986 1 9826 2 9861 3 10034 4 9876 5 10114 6 10329 7 9924 8 10123 9 9927 
#<StandardError: distribution potentially skewed for 'false': expected around 50000.0, got 94841>


[edit] Run BASIC

s$ = "#########################"
dim num(100)
for i = 1 to 1000
n = (rnd(1) * 10) + 1
num(n) = num(n) + 1
next i
 
for i = 1 to 10
print using("###",i);" "; using("#####",num(i));" ";left$(s$,num(i) / 5)
next i
 1    90 ##################
 2   110 ######################
 3   105 #####################
 4   100 ####################
 5   107 #####################
 6   133 #########################
 7    85 #################
 8    96 ###################
 9    82 ################
10 92 ##################*

[edit] Tcl

proc distcheck {random times {delta 1}} {
for {set i 0} {$i<$times} {incr i} {incr vals([uplevel 1 $random])}
set target [expr {$times / [array size vals]}]
foreach {k v} [array get vals] {
if {abs($v - $target) > $times * $delta / 100.0} {
error "distribution potentially skewed for $k: expected around $target, got $v"
}
}
foreach k [lsort -integer [array names vals]] {lappend result $k $vals($k)}
return $result
}

Demonstration:

# First, a uniformly distributed random variable
puts [distcheck {expr {int(10*rand())}} 100000]
 
# Now, one that definitely isn't!
puts [distcheck {expr {rand()>0.95}} 100000]

Which produces this output (error in red):

0 10003 1 9851 2 10058 3 10193 4 10126 5 10002 6 9852 7 9964 8 9957 9 9994
distribution potentially skewed for 0: expected around 50000, got 94873

[edit] VBScript

Option Explicit
 
sub verifydistribution(calledfunction, samples, delta)
Dim i, n, maxdiff
'We could cheat via Dim d(7), but "7" wasn't mentioned in the Task. Heh.
Dim d : Set d = CreateObject("Scripting.Dictionary")
wscript.echo "Running """ & calledfunction & """ " & samples & " times..."
for i = 1 to samples
Execute "n = " & calledfunction
d(n) = d(n) + 1
next
n = d.Count
maxdiff = 0
wscript.echo "Expected average count is " & Int(samples/n) & " across " & n & " buckets."
for each i in d.Keys
dim diff : diff = abs(1 - d(i) / (samples/n))
if diff > maxdiff then maxdiff = diff
wscript.echo "Bucket " & i & " had " & d(i) & " occurences" _
& vbTab & " difference from expected=" & FormatPercent(diff, 2)
next
wscript.echo "Maximum found variation is " & FormatPercent(maxdiff, 2) _
& ", desired limit is " & FormatPercent(delta, 2) & "."
if maxdiff > delta then wscript.echo "Skewed!" else wscript.echo "Smooth!"
end sub

Demonstration with included Seven-sided dice from five-sided dice#VBScript code:

verifydistribution "dice7", 1000, 0.03
verifydistribution "dice7", 100000, 0.03

Which produces this output:

Running "dice7" 1000 times...
Expected average count is 142 across 7 buckets.
Bucket 2 had 150 occurences      difference from expected=5.00%
Bucket 7 had 147 occurences      difference from expected=2.90%
Bucket 6 had 146 occurences      difference from expected=2.20%
Bucket 5 had 141 occurences      difference from expected=1.30%
Bucket 1 had 152 occurences      difference from expected=6.40%
Bucket 4 had 115 occurences      difference from expected=19.50%
Bucket 3 had 149 occurences      difference from expected=4.30%
Maximum found variation is 19.50%, desired limit is 3.00%.
Skewed!
Running "dice7" 100000 times...
Expected average count is 14285 across 7 buckets.
Bucket 5 had 14420 occurences    difference from expected=0.94%
Bucket 4 had 14298 occurences    difference from expected=0.09%
Bucket 2 had 14202 occurences    difference from expected=0.59%
Bucket 7 had 14201 occurences    difference from expected=0.59%
Bucket 6 had 14237 occurences    difference from expected=0.34%
Bucket 3 had 14263 occurences    difference from expected=0.16%
Bucket 1 had 14379 occurences    difference from expected=0.65%
Maximum found variation is 0.94%, desired limit is 3.00%.
Smooth!
Personal tools
Namespaces

Variants
Actions
Community
Explore
Misc
Toolbox