Verify distribution uniformity/Chi-squared test: Difference between revisions

m
(Racket version)
m (→‎{{header|Wren}}: Minor tidy)
 
(46 intermediate revisions by 25 users not shown)
Line 1:
{{task|Probability and statistics}}
In this task, write a function to verify that a given distribution of values is uniform by using the [[wp:Pearson's chi-square test|<math>\chi^2</math> test]] to see if the distribution has a likelihood of happening of at least the significance level (conventionally 5%). The function should return a boolean that is true if the distribution is one that a uniform distribution (with appropriate number of degrees of freedom) may be expected to produce.
 
;Task:
More information about the <math>\chi^2</math> distribution may be found at
Write a function to determine whether a given set of frequency counts could plausibly have come from a uniform distribution by using the [[wp:Pearson's chi-square test|<math>\chi^2</math> test]] with a significance level of 5%.
[http://mathworld.wolfram.com/Chi-SquaredDistribution.html mathworld].
 
The function should return a boolean that is true if and only if the distribution is one that a uniform distribution (with appropriate number of degrees of freedom) may be expected to produce.
 
Note: normally a two-tailed test would be used for this kind of problem.
 
;Reference:
:* &nbsp; an entry at the MathWorld website: &nbsp; [http://mathworld.wolfram.com/Chi-SquaredDistribution.html chi-squared distribution].
; Related task:
:* [[Statistics/Chi-squared_distribution]]
<br><br>
=={{header|11l}}==
{{trans|Python}}
 
<syntaxhighlight lang="11l">V a = 12
V k1_factrl = 1.0
[Float] c
c.append(sqrt(2.0 * math:pi))
L(k) 1 .< a
c.append(exp(a - k) * (a - k) ^ (k - 0.5) / k1_factrl)
k1_factrl *= -k
 
F gamma_spounge(z)
V accm = :c[0]
L(k) 1 .< :a
accm += :c[k] / (z + k)
accm *= exp(-(z + :a)) * (z + :a) ^ (z + 0.5)
R accm / z
 
F GammaInc_Q(a, x)
V a1 = a - 1
V a2 = a - 2
F f0(t)
R t ^ @a1 * exp(-t)
 
F df0(t)
R (@a1 - t) * t ^ @a2 * exp(-t)
 
V y = a1
L f0(y) * (x - y) > 2.0e-8 & y < x
y += 0.3
I y > x
y = x
 
V h = 3.0e-4
V n = Int(y / h)
h = y / n
V hh = 0.5 * h
V gamax = h * sum(((n - 1 .< -1).step(-1).map(j -> @h * j)).map(t -> @f0(t) + @hh * @df0(t)))
 
R gamax / gamma_spounge(a)
 
F chi2UniformDistance(dataSet)
V expected = sum(dataSet) * 1.0 / dataSet.len
V cntrd = (dataSet.map(d -> d - @expected))
R sum(cntrd.map(x -> x * x)) / expected
 
F chi2Probability(dof, distance)
R 1.0 - GammaInc_Q(0.5 * dof, 0.5 * distance)
 
F chi2IsUniform(dataSet, significance)
V dof = dataSet.len - 1
V dist = chi2UniformDistance(dataSet)
R chi2Probability(dof, dist) > significance
 
V dset1 = [199809, 200665, 199607, 200270, 199649]
V dset2 = [522573, 244456, 139979, 71531, 21461]
 
L(ds) (dset1, dset2)
print(‘Data set: ’ds)
V dof = ds.len - 1
V distance = chi2UniformDistance(ds)
print(‘dof: #. distance: #.4’.format(dof, distance), end' ‘ ’)
V prob = chi2Probability(dof, distance)
print(‘probability: #.4’.format(prob), end' ‘ ’)
print(‘uniform? ’(I chi2IsUniform(ds, 0.05) {‘Yes’} E ‘No’))</syntaxhighlight>
 
{{out}}
<pre>
Data set: [199809, 200665, 199607, 200270, 199649]
dof: 4 distance: 4.1463 probability: 0.3866 uniform? Yes
Data set: [522573, 244456, 139979, 71531, 21461]
dof: 4 distance: 790063.2759 probability: -1.5002e-8 uniform? No
</pre>
 
=={{header|Ada}}==
First, we specifayspecify a simple package to compute the Chi-Square Distance from the uniform distribution:
<langsyntaxhighlight Adalang="ada">package Chi_Square is
type Flt is digits 18;
Line 13 ⟶ 96:
function Distance(Bins: Bins_Type) return Flt;
end Chi_Square;</langsyntaxhighlight>
 
Next, we implement that package:
 
<langsyntaxhighlight Adalang="ada">package body Chi_Square is
function Distance(Bins: Bins_Type) return Flt is
Line 43 ⟶ 126:
end Distance;
end Chi_Square;</langsyntaxhighlight>
 
Finally, we actually implement the Chi-square test. We do not actually compute the Chi-square probability; rather we hardcode a table of values for 5% significance level, which has been picked from Wikipedia [http://en.wikipedia.org/wiki/Chi-squared_distribution]:
<langsyntaxhighlight Adalang="ada">with Ada.Text_IO, Ada.Command_Line, Chi_Square; use Ada.Text_IO;
 
procedure Test_Chi_Square is
Line 73 ⟶ 156:
Put_Line("; (deviates significantly from uniform)");
end if;
end;</langsyntaxhighlight>
 
{{out}}
Line 80 ⟶ 163:
$ ./Test_Chi_Square 522573 244456 139979 71531 21461
Degrees of Freedom: 4, Distance: 790063.28; (deviates significantly from uniform)</pre>
 
 
 
 
=={{header|C}}==
This first sections contains the functions required to compute the Chi-Squared probability.
These are not needed if a library containing the necessary function is availabile (e.g. see [[Numerical Integration]], [[Gamma function]]).
<langsyntaxhighlight lang="c">#include <stdlib.h>
#include <stdio.h>
#include <math.h>
Line 154 ⟶ 234:
 
return 1.0 - Simpson3_8( &f0, 0, y, (int)(y/h))/Gamma_Spouge(a);
}</langsyntaxhighlight>
This section contains the functions specific to the task.
<langsyntaxhighlight lang="c">double chi2UniformDistance( double *ds, int dslen)
{
double expected = 0.0;
Line 183 ⟶ 263:
double dist = chi2UniformDistance( dset, dslen);
return chi2Probability( dof, dist ) > significance;
}</langsyntaxhighlight>
Testing
<langsyntaxhighlight lang="c">int main(int argc, char **argv)
{
double dset1[] = { 199809., 200665., 199607., 200270., 199649. };
Line 209 ⟶ 289:
}
return 0;
}</langsyntaxhighlight>
 
=={{header|C#}}==
{{trans|Go}}
<syntaxhighlight lang="C#">
using System;
 
class Program
{
public delegate double Func(double x);
public static double Simpson38(Func f, double a, double b, int n)
{
double h = (b - a) / n;
double h1 = h / 3;
double sum = f(a) + f(b);
for (int j = 3 * n - 1; j > 0; j--)
{
if (j % 3 == 0)
{
sum += 2 * f(a + h1 * j);
}
else
{
sum += 3 * f(a + h1 * j);
}
}
 
return h * sum / 8;
}
 
// Lanczos Approximation for Gamma Function
private static double SpecialFunctionGamma(double z)
{
double[] p =
{
676.5203681218851,
-1259.1392167224028,
771.32342877765313,
-176.61502916214059,
12.507343278686905,
-0.13857109526572012,
9.9843695780195716e-6,
1.5056327351493116e-7
};
if (z < 0.5)
return Math.PI / (Math.Sin(Math.PI * z) * SpecialFunctionGamma(1 - z));
z -= 1;
double x = 0.99999999999980993;
for (int i = 0; i < p.Length; i++)
{
x += p[i] / (z + i + 1);
}
 
double t = z + p.Length - 0.5;
return Math.Sqrt(2 * Math.PI) * Math.Pow(t, z + 0.5) * Math.Exp(-t) * x;
}
 
public static double GammaIncQ(double a, double x)
{
double aa1 = a - 1;
Func f = t => Math.Pow(t, aa1) * Math.Exp(-t);
double y = aa1;
double h = 1.5e-2;
while (f(y) * (x - y) > 2e-8 && y < x)
{
y += .4;
}
 
if (y > x)
{
y = x;
}
 
return 1 - Simpson38(f, 0, y, (int)(y / h / SpecialFunctionGamma(a)));
}
 
public static double Chi2Ud(int[] ds)
{
double sum = 0, expected = 0;
foreach (var d in ds)
{
expected += d;
}
 
expected /= ds.Length;
foreach (var d in ds)
{
double x = d - expected;
sum += x * x;
}
 
return sum / expected;
}
 
public static double Chi2P(int dof, double distance)
{
return GammaIncQ(.5 * dof, .5 * distance);
}
 
const double SigLevel = .05;
static void Main(string[] args)
{
int[][] datasets = new int[][]
{
new int[]
{
199809,
200665,
199607,
200270,
199649
},
new int[]
{
522573,
244456,
139979,
71531,
21461
},
};
foreach (var dset in datasets)
{
UTest(dset);
}
}
 
static void UTest(int[] dset)
{
Console.WriteLine("Uniform distribution test");
int sum = 0;
foreach (var c in dset)
{
sum += c;
}
 
Console.WriteLine($" dataset: {string.Join(", ", dset)}");
Console.WriteLine($" samples: {sum}");
Console.WriteLine($" categories: {dset.Length}");
int dof = dset.Length - 1;
Console.WriteLine($" degrees of freedom: {dof}");
double dist = Chi2Ud(dset);
Console.WriteLine($" chi square test statistic: {dist}");
double p = Chi2P(dof, dist);
Console.WriteLine($" p-value of test statistic: {p}");
bool sig = p < SigLevel;
Console.WriteLine($" significant at {SigLevel * 100}% level? {sig}");
Console.WriteLine($" uniform? {!sig}\n");
}
}
</syntaxhighlight>
{{out}}
<pre>
Uniform distribution test
dataset: 199809, 200665, 199607, 200270, 199649
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 4.14628
p-value of test statistic: 0.386570833082767
significant at 5% level? False
uniform? True
 
Uniform distribution test
dataset: 522573, 244456, 139979, 71531, 21461
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 790063.27594
p-value of test statistic: 2.35282904270662E-11
significant at 5% level? True
uniform? False
 
 
</pre>
 
=={{header|C++}}==
<syntaxhighlight lang="c++">
 
#include <iostream>
#include <vector>
#include <cmath>
#include <iomanip>
 
void print_vector(const std::vector<int32_t>& list) {
std::cout << "[";
for ( uint64_t i = 0; i < list.size() - 1; ++i ) {
std::cout << list[i] << ", ";
}
std::cout << list.back() << "]" << std::endl;
}
 
bool is_significant(const double p_value, const double significance_level) {
return p_value > significance_level;
}
 
// The normalised lower incomplete gamma function.
double gamma_cdf(const double aX, const double aK) {
double result = 0.0;
for ( uint32_t m = 0; m <= 99; ++m ) {
result += pow(aX, m) / tgamma(aK + m + 1);
}
result *= pow(aX, aK) * exp(-aX);
return std::isnan(result) ? 1.0 : result;
}
 
// The cumulative probability function of the Chi-squared distribution.
double cdf(const double aX, const double aK) {
if ( aX > 1'000 && aK < 100 ) {
return 1.0;
}
return ( aX > 0.0 && aK > 0.0 ) ? gamma_cdf(aX / 2, aK / 2) : 0.0;
}
 
void chi_squared_test(const std::vector<int32_t>& observed) {
double sum = 0.0;
for ( uint64_t i = 0; i < observed.size(); ++i ) {
sum += observed[i];
}
const double expected = sum / observed.size();
 
const int32_t degree_freedom = observed.size() - 1;
 
double test_statistic = 0.0;
for ( uint64_t i = 0; i < observed.size(); ++i ) {
test_statistic += pow(observed[i] - expected, 2) / expected;
}
 
const double p_value = 1.0 - cdf(test_statistic, degree_freedom);
 
std::cout << "\nUniform distribution test" << std::setprecision(6) << std::endl;
std::cout << " observed values : "; print_vector(observed);
std::cout << " expected value : " << expected << std::endl;
std::cout << " degrees of freedom: " << degree_freedom << std::endl;
std::cout << " test statistic : " << test_statistic << std::endl;
std::cout.setf(std::ios::fixed);
std::cout << " p-value : " << p_value << std::endl;
std::cout.unsetf(std::ios::fixed);
std::cout << " is 5% significant?: " << std::boolalpha << is_significant(p_value, 0.05) << std::endl;
}
 
int main() {
const std::vector<std::vector<int32_t>> datasets = { { 199809, 200665, 199607, 200270, 199649 },
{ 522573, 244456, 139979, 71531, 21461 } };
for ( std::vector<int32_t> dataset : datasets ) {
chi_squared_test(dataset);
}
}
</syntaxhighlight>
{{ out }}
<pre>
 
Uniform distribution test
observed values : [199809, 200665, 199607, 200270, 199649]
expected value : 200000
degrees of freedom: 4
test statistic : 4.14628
p-value : 0.386571
is 5% significant?: true
 
Uniform distribution test
observed values : [522573, 244456, 139979, 71531, 21461]
expected value : 200000
degrees of freedom: 4
test statistic : 790063
p-value : 0.000000
is 5% significant?: false
</pre>
 
=={{header|D}}==
<langsyntaxhighlight lang="d">import std.stdio, std.algorithm, std.mathspecial;
 
real x2Dist(T)(in T[] data) pure /*nothrow*/ @safe @nogc {
immutable avg = reduce!q{a + b}(0.0L, data).sum / data.length;
immutable sqs = reduce!((a, b) => a + (b - avg) ^^ 2)(0.0L, data);
return sqs / avg;
}
 
real x2Prob(in real dof, in real distance) /*pure nothrow*/ @safe @nogc {
return gammaIncompleteCompl(dof / 2, distance / 2);
}
 
bool x2IsUniform(T)(in T[] data, in real significance=0.05L)
/*pure nothrow*/ @safe @nogc {
return x2Prob(data.length - 1.0L, x2Dist(data)) > significance;
}
Line 234 ⟶ 581:
writefln(" %4s %12s %12s %8s %s",
"dof", "distance", "probability", "Uniform?", "dataset");
foreach (immutable(int[]) ds; dataSets) {
immutable dof = ds.length - 1;
immutable dist = x2Dist(ds).x2Dist;
immutable prob = x2Prob(dof, dist);
writefln("%4d %12.3f %12.8f %5s %6s",
dof, dist, prob, x2IsUniform(ds).x2IsUniform ? "YES" : "NO", ds);
}
}</langsyntaxhighlight>
{{out}}
<pre> dof distance probability Uniform? dataset
4 4.146 0.38657083 YES [199809, 200665, 199607, 200270, 199649]
4 790063.276 0.00000000 NO [522573, 244456, 139979, 71531, 21461]</pre>
 
=={{header|Elixir}}==
{{trans|Ruby}}
<syntaxhighlight lang="elixir">defmodule Verify do
defp gammaInc_Q(a, x) do
a1 = a-1
f0 = fn t -> :math.pow(t, a1) * :math.exp(-t) end
df0 = fn t -> (a1-t) * :math.pow(t, a-2) * :math.exp(-t) end
y = while_loop(f0, x, a1)
n = trunc(y / 3.0e-4)
h = y / n
hh = 0.5 * h
sum = Enum.reduce(n-1 .. 0, 0, fn j,sum ->
t = h * j
sum + f0.(t) + hh * df0.(t)
end)
h * sum / gamma_spounge(a, make_coef)
end
defp while_loop(f, x, y) do
if f.(y)*(x-y) > 2.0e-8 and y < x, do: while_loop(f, x, y+0.3), else: min(x, y)
end
@a 12
defp make_coef do
coef0 = [:math.sqrt(2.0 * :math.pi)]
{_, coef} = Enum.reduce(1..@a-1, {1.0, coef0}, fn k,{k1_factrl,c} ->
h = :math.exp(@a-k) * :math.pow(@a-k, k-0.5) / k1_factrl
{-k1_factrl*k, [h | c]}
end)
Enum.reverse(coef) |> List.to_tuple
end
defp gamma_spounge(z, coef) do
accm = Enum.reduce(1..@a-1, elem(coef,0), fn k,res -> res + elem(coef,k) / (z+k) end)
accm * :math.exp(-(z+@a)) * :math.pow(z+@a, z+0.5) / z
end
def chi2UniformDistance(dataSet) do
expected = Enum.sum(dataSet) / length(dataSet)
Enum.reduce(dataSet, 0, fn d,sum -> sum + (d-expected)*(d-expected) end) / expected
end
def chi2Probability(dof, distance) do
1.0 - gammaInc_Q(0.5*dof, 0.5*distance)
end
def chi2IsUniform(dataSet, significance\\0.05) do
dof = length(dataSet) - 1
dist = chi2UniformDistance(dataSet)
chi2Probability(dof, dist) > significance
end
end
 
dsets = [ [ 199809, 200665, 199607, 200270, 199649 ],
[ 522573, 244456, 139979, 71531, 21461 ] ]
 
Enum.each(dsets, fn ds ->
IO.puts "Data set:#{inspect ds}"
dof = length(ds) - 1
IO.puts " degrees of freedom: #{dof}"
distance = Verify.chi2UniformDistance(ds)
:io.fwrite " distance: ~.4f~n", [distance]
:io.fwrite " probability: ~.4f~n", [Verify.chi2Probability(dof, distance)]
:io.fwrite " uniform? ~s~n", [(if Verify.chi2IsUniform(ds), do: "Yes", else: "No")]
end)</syntaxhighlight>
 
{{out}}
<pre>
Data set:[199809, 200665, 199607, 200270, 199649]
degrees of freedom: 4
distance: 4.1463
probability: 0.3866
uniform? Yes
Data set:[522573, 244456, 139979, 71531, 21461]
degrees of freedom: 4
distance: 790063.2759
probability: -0.0000
uniform? No
</pre>
 
=={{header|Fortran}}==
Line 252 ⟶ 679:
{{libheader|GNU Scientific Library}}
 
Instead of implementing the incompletechi-squared gamma functiondistribution by ourselves, we bind to GNU Scientific Library; so we need a module to interface to the function we need (<tt>gsl_sf_gamma_incgsl_cdf_chisq_Q</tt>)
 
<langsyntaxhighlight lang="fortran">module GSLMiniBindgsl_mini_bind_m
implicit none
 
use iso_c_binding
interface
implicit none
real(c_double) function gsl_sf_gamma_inc(a, x) bind(C)
private
use iso_c_binding
 
real(c_double), value, intent(in) :: a, x
public :: p_value
end function gsl_sf_gamma_inc
 
end interface
interface
end module GSLMiniBind</lang>
function gsl_cdf_chisq_q(x, nu) bind(c, name='gsl_cdf_chisq_Q')
import
real(c_double), value :: x
real(c_double), value :: nu
real(c_double) :: gsl_cdf_chisq_q
end function gsl_cdf_chisq_q
end interface
 
contains
 
!> Get p-value from chi-square distribution
real function p_value(x, df)
real, intent(in) :: x
integer, intent(in) :: df
 
p_value = real(gsl_cdf_chisq_q(real(x, c_double), real(df, c_double)))
 
end function p_value
 
end module gsl_mini_bind_m</syntaxhighlight>
 
Now we're ready to complete the task.
 
<langsyntaxhighlight lang="fortran">program ChiTestchi2test
use GSLMiniBind
use iso_c_binding
implicit none
 
use gsl_mini_bind_m, only: p_value
real, dimension(5) :: dset1 = (/ 199809., 200665., 199607., 200270., 199649. /)
implicit none
real, dimension(5) :: dset2 = (/ 522573., 244456., 139979., 71531., 21461. /)
 
real :: dset1(5) = [199809., 200665., 199607., 200270., 199649.]
real :: dist, prob
real :: dset2(5) = [522573., 244456., 139979., 71531., 21461.]
integer :: dof
 
real :: dist, prob
print *, "Dataset 1:"
integer :: dof
print *, dset1
dist = chi2UniformDistance(dset1)
dof = size(dset1) - 1
write(*, '(A,I4,A,F12.4)') 'dof: ', dof, ' distance: ', dist
prob = chi2Probability(dof, dist)
write(*, '(A,F9.6)') 'probability: ', prob
write(*, '(A,L)') 'uniform? ', chiIsUniform(dset1, 0.05)
 
write (*, '(A)', advance='no') "Dataset 1:"
! Lazy copy/past :|
write (*, '(5(F12.4,:,1x))') dset1
print *, "Dataset 2:"
 
print *, dset2
dist = chi2UniformDistancechisq(dset2dset1)
dof = size(dset2dset1) - 1
write (*, '(A,I4,A,F12.4)') 'dof: ', dof, ' distancechisq: ', dist
prob = chi2Probabilityp_value(dofdist, distdof)
write (*, '(A,F9F12.64)') 'probability: ', prob
write (*, '(A,L)') 'uniform? ', chiIsUniform(dset2,prob > 0.05)
 
! Lazy copy/past :|
write (*, '(/A)', advance='no') "Dataset 2:"
write (*, '(5(F12.4,:,1x))') dset2
 
dist = chisq(dset2)
dof = size(dset2) - 1
write (*, '(A,I4,A,F12.4)') 'dof: ', dof, ' chisq: ', dist
prob = p_value(dist, dof)
write (*, '(A,F12.4)') 'probability: ', prob
write (*, '(A,L)') 'uniform? ', prob > 0.05
 
contains
 
!> Get chi-square value for a set of data `ds`
function chi2Probability(dof, distance)
real ::function chi2Probabilitychisq(ds)
integer real, intent(in) :: dofds(:)
real, intent(in) :: distance
 
real :: expected, summa
! in order to make this work, we need linking with GSL library
chi2Probability = gsl_sf_gamma_inc(real(0.5*dof, c_double), real(0.5*distance, c_double))
end function chi2Probability
 
expected = sum(ds)/size(ds)
function chiIsUniform(dset, significance)
summa = sum((ds - expected)**2)
logical :: chiIsUniform
chisq = summa/expected
real, dimension(:), intent(in) :: dset
real, intent(in) :: significance
 
end function chisq
integer :: dof
real :: dist
 
end program chi2test</syntaxhighlight>
dof = size(dset) - 1
dist = chi2UniformDistance(dset)
chiIsUniform = chi2Probability(dof, dist) > significance
end function chiIsUniform
function chi2UniformDistance(ds)
real :: chi2UniformDistance
real, dimension(:), intent(in) :: ds
 
Output:
real :: expected, summa = 0.0
<syntaxhighlight lang="txt">Dataset 1: 199809.0000 200665.0000 199607.0000 200270.0000 199649.0000
dof: 4 chisq: 4.1463
probability: 0.3866
uniform? T
 
Dataset 2: 522573.0000 244456.0000 139979.0000 71531.0000 21461.0000
expected = sum(ds) / size(ds)
dof: 4 chisq: 790063.2500
summa = sum( (ds - expected) ** 2 )
probability: 0.0000
uniform? F</syntaxhighlight>
chi2UniformDistance = summa / expected
end function chi2UniformDistance
 
end program ChiTest</lang>
=={{header|Go}}==
{{trans|C}}
Go has a nice gamma function in the library. Otherwise, it's mostly a port from C. Note, this implementation of the incomplete gamma function works for these two test cases, but, I believe, has serious limitations. See talk page.
<langsyntaxhighlight lang="go">package main
 
import (
Line 426 ⟶ 864:
fmt.Printf(" significant at %2.0f%% level? %t\n", sigLevel*100, sig)
fmt.Println(" uniform? ", !sig, "\n")
}</langsyntaxhighlight>
Output:
<pre>
Line 449 ⟶ 887:
uniform? false
</pre>
 
=={{header|Hy}}==
<syntaxhighlight lang="lisp">(import
[scipy.stats [chisquare]]
[collections [Counter]])
 
(defn uniform? [f repeats &optional [alpha .05]]
"Call 'f' 'repeats' times and do a chi-squared test for uniformity
of the resulting discrete distribution. Return false iff the
null hypothesis of uniformity is rejected for the test with
size 'alpha'."
(<= alpha (second (chisquare
(.values (Counter (take repeats (repeatedly f))))))))</syntaxhighlight>
 
Examples of use:
 
<syntaxhighlight lang="lisp">(import [random [randint]])
 
(for [f [
(fn [] (randint 1 10))
(fn [] (if (randint 0 1) (randint 1 9) (randint 1 10)))]]
(print (uniform? f 5000)))</syntaxhighlight>
 
=={{header|J}}==
'''Solution (Tacit):'''
<langsyntaxhighlight lang="j">require 'stats/base'
 
countCats=: #@~. NB. counts the number of unique items
Line 464 ⟶ 924:
NB. y is: distribution to test
NB. x is: optionally specify number of categories possible
isUniform=: (countCats $: ]) : (0.95 > calcDf chisqcdf :: 1: calcX2)</langsyntaxhighlight>
 
'''Solution (Explicit):'''
<langsyntaxhighlight lang="j">require 'stats/base'
 
NB.*isUniformX v Tests (5%) whether y is uniformly distributed
Line 482 ⟶ 942:
degfreedom=. <: x NB. degrees of freedom
signif > degfreedom chisqcdf :: 1: X2
)</langsyntaxhighlight>
 
'''Example Usage:'''
<langsyntaxhighlight lang="j"> FairDistrib=: 1e6 ?@$ 5
UnfairDistrib=: (9.5e5 ?@$ 5) , (5e4 ?@$ 4)
isUniformX FairDistrib
Line 494 ⟶ 954:
1
4 isUniform 4 4 4 5 5 5 5 5 5 5 NB. not uniform if 4 categories possible
0</langsyntaxhighlight>
 
=={{header|Mathematica}}==
=={{header|Java}}==
{{trans|D}}
{{works with|Java|8}}
<syntaxhighlight lang="java">import static java.lang.Math.pow;
import java.util.Arrays;
import static java.util.Arrays.stream;
import org.apache.commons.math3.special.Gamma;
 
public class Test {
 
static double x2Dist(double[] data) {
double avg = stream(data).sum() / data.length;
double sqs = stream(data).reduce(0, (a, b) -> a + pow((b - avg), 2));
return sqs / avg;
}
 
static double x2Prob(double dof, double distance) {
return Gamma.regularizedGammaQ(dof / 2, distance / 2);
}
 
static boolean x2IsUniform(double[] data, double significance) {
return x2Prob(data.length - 1.0, x2Dist(data)) > significance;
}
 
public static void main(String[] a) {
double[][] dataSets = {{199809, 200665, 199607, 200270, 199649},
{522573, 244456, 139979, 71531, 21461}};
 
System.out.printf(" %4s %12s %12s %8s %s%n",
"dof", "distance", "probability", "Uniform?", "dataset");
 
for (double[] ds : dataSets) {
int dof = ds.length - 1;
double dist = x2Dist(ds);
double prob = x2Prob(dof, dist);
System.out.printf("%4d %12.3f %12.8f %5s %6s%n",
dof, dist, prob, x2IsUniform(ds, 0.05) ? "YES" : "NO",
Arrays.toString(ds));
}
}
}</syntaxhighlight>
<pre> dof distance probability Uniform? dataset
4 4,146 0,38657083 YES [199809.0, 200665.0, 199607.0, 200270.0, 199649.0]
4 790063,276 0,00000000 NO [522573.0, 244456.0, 139979.0, 71531.0, 21461.0]</pre>
 
=={{header|jq}}==
{{works with|jq}}
'''Also works with gojq, the Go implementation of jq.'''
 
This entry uses a two-tailed test, as is appropriate for this type of problem as illustrated
by the last example. The test is based on the assumption that the sample size is large
enough for the χ2 distribution to be used.
 
The implementation of `Chi2_cdf` here uses the recursion relation for
the gamma function and should be both fast, accurate and quite robust.
For an industrial-strength algorithm, see
e.g. https://people.sc.fsu.edu/~jburkardt/c_src/asa239/asa239.c
 
'''Generic Functions'''
<syntaxhighlight lang=jq>
def round($dec):
if type == "string" then .
else pow(10;$dec) as $m
| . * $m | floor / $m
end;
 
# sum of squares
def ss(s): reduce s as $x (0; . + ($x * $x));
 
# Cumulative density function of the chi-squared distribution with $k
# degrees of freedom
# The recursion formula for gamma is used for efficiency and robustness.
def Chi2_cdf($x; $k):
if $x == 0 then 0
elif $x > (1e3 * $k) then 1
else 1e-15 as $tol # for example
| { s: 0, m: 0, term: (1 / ((($k/2)+1)|gamma)) }
| until (.term|length < $tol; # length here is abs
.s += .term
| .m += 1
| .term *= (($x/2) / (($k/2) + .m )) )
| .s * ( ((-$x/2) + ($k/2)*(($x/2)|log)) | exp)
end ;
</syntaxhighlight>
'''Tasks'''
<syntaxhighlight lang=jq>
# Input: array of frequencies
def chi2UniformDistance:
(add / length) as $expected
| ss(.[] - $expected) / $expected;
 
# Input: a number
# Output: an indication of the probability of observing this value or higher
# assuming the value is drawn from a chi-squared distribution with $dof degrees
# of freedom
def chi2Probability($dof):
(1 - Chi2_cdf(.; $dof))
| if . < 1e-10 then "< 1e-10"
else .
end;
 
# Input: array of frequencies
# Output: result of a two-tailed test based on the chi-squared statistic
# assuming the sample size is large enough
def chiIsUniform($significance):
(length - 1) as $dof
| chi2UniformDistance
| Chi2_cdf(.; $dof) as $cdf
| if $cdf
then ($significance/2) as $s
| $cdf > $s and $cdf < (1-$s)
else false
end;
def dsets: [
[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461],
[19,14,6,18,7,5,1], # low entropy
[9,11,9,10,15,11,5], # high entropy
[20,20,20] # made-up
];
 
def task:
dsets[]
| "Dataset: \(.)",
( chi2UniformDistance as $dist
| (length - 1) as $dof
| "DOF: \($dof) D (Distance): \($dist)",
" Estimated probability of observing a value >= D: \($dist|chi2Probability($dof)|round(2))",
" Uniform? \( (select(chiIsUniform(0.05)) | "Yes") // "No" )\n" ) ;
 
task
</syntaxhighlight>
<pre>
Dataset: [199809,200665,199607,200270,199649]
DOF: 4 D (Distance): 4.14628
Estimated probability of observing a value >= D: 0.38
Uniform? Yes
 
Dataset: [522573,244456,139979,71531,21461]
DOF: 4 D (Distance): 790063.27594
Estimated probability of observing a value >= D: < 1e-10
Uniform? No
 
Dataset: [19,14,6,18,7,5,1]
DOF: 6 D (Distance): 29.2
Estimated probability of observing a value >= D: 0
Uniform? No
 
Dataset: [9,11,9,10,15,11,5]
DOF: 6 D (Distance): 5.4
Estimated probability of observing a value >= D: 0.49
Uniform? Yes
 
Dataset: [20,20,20]
DOF: 2 D (Distance): 0
Estimated probability of observing a value >= D: 1
Uniform? No
</pre>
 
=={{header|Julia}}==
<syntaxhighlight lang="julia"># v0.6
 
using Distributions
 
function eqdist(data::Vector{T}, α::Float64=0.05)::Bool where T <: Real
if ! (0 ≤ α ≤ 1); error("α must be in [0, 1]") end
exp = mean(data)
chisqval = sum((x - exp) ^ 2 for x in data) / exp
pval = ccdf(Chisq(2), chisqval)
return pval > α
end
 
data1 = [199809, 200665, 199607, 200270, 199649]
data2 = [522573, 244456, 139979, 71531, 21461]
 
for data in (data1, data2)
println("Data:\n$data")
println("Hypothesis test: the original population is ", (eqdist(data) ? "" : "not "), "uniform.\n")
end</syntaxhighlight>
 
{{out}}
<pre>Data:
[199809, 200665, 199607, 200270, 199649]
Hypothesis test: the original population is uniform.
 
Data:
[522573, 244456, 139979, 71531, 21461]
Hypothesis test: the original population is not uniform.
</pre>
 
=={{header|Kotlin}}==
This program reuses Kotlin code from the [[Gamma function]] and [[Numerical Integration]] tasks but otherwise is a translation of the C entry for this task.
<syntaxhighlight lang="scala">// version 1.1.51
 
typealias Func = (Double) -> Double
 
fun gammaLanczos(x: Double): Double {
var xx = x
val p = doubleArrayOf(
0.99999999999980993,
676.5203681218851,
-1259.1392167224028,
771.32342877765313,
-176.61502916214059,
12.507343278686905,
-0.13857109526572012,
9.9843695780195716e-6,
1.5056327351493116e-7
)
val g = 7
if (xx < 0.5) return Math.PI / (Math.sin(Math.PI * xx) * gammaLanczos(1.0 - xx))
xx--
var a = p[0]
val t = xx + g + 0.5
for (i in 1 until p.size) a += p[i] / (xx + i)
return Math.sqrt(2.0 * Math.PI) * Math.pow(t, xx + 0.5) * Math.exp(-t) * a
}
 
fun integrate(a: Double, b: Double, n: Int, f: Func): Double {
val h = (b - a) / n
var sum = 0.0
for (i in 0 until n) {
val x = a + i * h
sum += (f(x) + 4.0 * f(x + h / 2.0) + f(x + h)) / 6.0
}
return sum * h
}
 
fun gammaIncompleteQ(a: Double, x: Double): Double {
val aa1 = a - 1.0
fun f0(t: Double) = Math.pow(t, aa1) * Math.exp(-t)
val h = 1.5e-2
var y = aa1
while ((f0(y) * (x - y) > 2.0e-8) && y < x) y += 0.4
if (y > x) y = x
return 1.0 - integrate(0.0, y, (y / h).toInt(), ::f0) / gammaLanczos(a)
}
 
fun chi2UniformDistance(ds: DoubleArray): Double {
val expected = ds.average()
val sum = ds.map { val x = it - expected; x * x }.sum()
return sum / expected
}
 
fun chi2Probability(dof: Int, distance: Double) =
gammaIncompleteQ(0.5 * dof, 0.5 * distance)
 
fun chiIsUniform(ds: DoubleArray, significance: Double):Boolean {
val dof = ds.size - 1
val dist = chi2UniformDistance(ds)
return chi2Probability(dof, dist) > significance
}
 
fun main(args: Array<String>) {
val dsets = listOf(
doubleArrayOf(199809.0, 200665.0, 199607.0, 200270.0, 199649.0),
doubleArrayOf(522573.0, 244456.0, 139979.0, 71531.0, 21461.0)
)
for (ds in dsets) {
println("Dataset: ${ds.asList()}")
val dist = chi2UniformDistance(ds)
val dof = ds.size - 1
print("DOF: $dof Distance: ${"%.4f".format(dist)}")
val prob = chi2Probability(dof, dist)
print(" Probability: ${"%.6f".format(prob)}")
val uniform = if (chiIsUniform(ds, 0.05)) "Yes" else "No"
println(" Uniform? $uniform\n")
}
}</syntaxhighlight>
 
{{out}}
<pre>
Dataset: [199809.0, 200665.0, 199607.0, 200270.0, 199649.0]
DOF: 4 Distance: 4.1463 Probability: 0.386571 Uniform? Yes
 
Dataset: [522573.0, 244456.0, 139979.0, 71531.0, 21461.0]
DOF: 4 Distance: 790063.2759 Probability: 0.000000 Uniform? No
</pre>
 
=={{header|Mathematica}}/{{header|Wolfram Language}}==
This code explicity assumes a discrete uniform distribution since the chi square test is a poor test choice for continuous distributions and requires Mathematica version 2 or later
<langsyntaxhighlight Mathematicalang="mathematica">discreteUniformDistributionQ[data_, {min_Integer, max_Integer}, confLevel_: .05] :=
If[$VersionNumber >= 8,
confLevel <= PearsonChiSquareTest[data, DiscreteUniformDistribution[{min, max}]],
Line 504 ⟶ 1,245:
GammaRegularized[k/2, 0, v/2] <= 1 - confLevel]]
 
discreteUniformDistributionQ[data_] :=discreteUniformDistributionQ[data, data[[Ordering[data][[{1, -1}]]]]]</langsyntaxhighlight>
code used to create test data requires Mathematica version 6 or later
<langsyntaxhighlight Mathematicalang="mathematica">uniformData = RandomInteger[10, 100];
nonUniformData = Total@RandomInteger[10, {5, 100}];</langsyntaxhighlight>
<syntaxhighlight lang Mathematica="mathematica">{discreteUniformDistributionQ[uniformData],discreteUniformDistributionQ[nonUniformData]}</langsyntaxhighlight>
{{out}}<pre>{True,False}</pre>
 
=={{header|Nim}}==
{{trans|Go}}
We use the gamma function from the “math” module. To simplify the code, we use also the “lenientops” module which provides mixed operations between floats ane integers.
 
<syntaxhighlight lang="nim">import lenientops, math, stats, strformat, sugar
 
func simpson38(f: (float) -> float; a, b: float; n: int): float =
let h = (b - a) / n
let h1 = h / 3
var sum = f(a) + f(b)
for i in countdown(3 * n - 1, 1):
if i mod 3 == 0:
sum += 2 * f(a + h1 * i)
else:
sum += 3 * f(a + h1 * i)
result = h * sum / 8
 
func gammaIncQ(a, x: float): float =
let aa1 = a - 1
func f(t: float): float = pow(t, aa1) * exp(-t)
var y = aa1
let h = 1.5e-2
while f(y) * (x - y) > 2e-8 and y < x:
y += 0.4
if y > x: y = x
result = 1 - simpson38(f, 0, y, (y / h / gamma(a)).toInt)
 
func chi2ud(ds: openArray[int]): float =
let expected = mean(ds)
var s = 0.0
for d in ds:
let x = d.toFloat - expected
s += x * x
result = s / expected
 
func chi2p(dof: int; distance: float): float =
gammaIncQ(0.5 * dof, 0.5 * distance)
 
const SigLevel = 0.05
 
proc utest(dset: openArray[int]) =
 
echo "Uniform distribution test"
let s = sum(dset)
echo " dataset:", dset
echo " samples: ", s
echo " categories: ", dset.len
 
let dof = dset.len - 1
echo " degrees of freedom: ", dof
 
let dist = chi2ud(dset)
echo " chi square test statistic: ", dist
 
let p = chi2p(dof, dist)
echo " p-value of test statistic: ", p
 
let sig = p < SigLevel
echo &" significant at {int(SigLevel * 100)}% level? {sig}"
echo &" uniform? {not sig}\n"
 
 
for dset in [[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461]]:
utest(dset)</syntaxhighlight>
 
{{out}}
<pre>Uniform distribution test
dataset:[199809, 200665, 199607, 200270, 199649]
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 4.14628
p-value of test statistic: 0.3865708330827673
significant at 5% level? false
uniform? true
 
Uniform distribution test
dataset:[522573, 244456, 139979, 71531, 21461]
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 790063.27594
p-value of test statistic: 2.34864350190378e-11
significant at 5% level? true
uniform? false</pre>
 
=={{header|OCaml}}==
This code needs to be compiled with library [http://oandrieu.nerim.net/ocaml/gsl/ gsl.cma].
 
<langsyntaxhighlight lang="ocaml">let sqr x = x *. x
 
let chi2UniformDistance distrib =
Line 547 ⟶ 1,375:
[| 199809; 200665; 199607; 200270; 199649 |];
[| 522573; 244456; 139979; 71531; 21461 |]
]</langsyntaxhighlight>
 
Output
Line 559 ⟶ 1,387:
 
The sample data for the test was taken from [[#Go|Go]].
<langsyntaxhighlight lang="parigp">cumChi2(chi2,dof)={
my(g=gamma(dof/2));
incgam(dof/2,chi2/2,g)/g
Line 575 ⟶ 1,403:
 
test([199809, 200665, 199607, 200270, 199649])
test([522573, 244456, 139979, 71531, 21461])</langsyntaxhighlight>
 
=={{header|Perl 6}}==
{{trans|Raku}}
<syntaxhighlight lang="perl">use List::Util qw(sum reduce);
use constant pi => 3.14159265;
 
sub incomplete_G_series {
For the incomplete gamma function we use a series expansion related to Kummer's confluent hypergeometric function
my($s, $z) = @_;
(see http://en.wikipedia.org/wiki/Incomplete_gamma_function#Evaluation_formulae). The gamma function is calculated
my $n = 10;
in closed form, as we only need its value at integers and half integers.
push @numers, $z**$_ for 1..$n;
 
my @denoms = $s+1;
<lang perl6>sub incomplete-γ-series($s, $z) {
mypush \numers =@denoms, $z X*denoms[-1]*($s+$_) 1for 2..*$n;
my \denoms$M = [\*] $s X+ 1..*;
my $M += 1 + $numers[+$_-1] (numers Z/ $denoms)[$_-1] for 1... * < 1e-6$n;
$z**$s / $s * exp(-$z) * $M;
}
 
sub G_of_half {
sub postfix:<!>(Int $n) { [*] 2..$n }
my($n) = @_;
if ($n % 2) { f(2*$_) / (4**$_ * f($_)) * sqrt(pi) for int ($n-1) / 2 }
sub Γ-of-half(Int $n where * > 0) {
($nelse %% 2) ?? (($_-1)! { given f(($n div /2)-1) }
!! ((2*$_)! / (4**$_ * $_!) * sqrt(pi) given ($n-1) div 2);
}
 
sub f { reduce { $a * $b } 1, 1 .. $_[0] } # factorial
# degrees of freedom constrained due to numerical limitations
 
sub chi-squared-cdf(Int $k where 1..200, $x where * >= 0) {
sub chi_squared_cdf {
my $f = $k < 20 ?? 20 !! 10;
givenmy($k, $x) = {@_;
my $f = $k when< 020 ? 20 : { 0.0 }10;
if ($x == 0) when * < $k + $f*sqrt($k) { incomplete-γ-series($k/2, $x/2) / Γ-of-half($k) { 0.0 }
elsif ($x < $k + $f*sqrt($k)) { incomplete_G_series($k/2, $x/2) / G_of_half($k) }
default { 1.0 }
else { 1.0 }
}
}
sub chi_squared_test {
my(@bins) = @_;
sub chi-squared-test(@bins, :$significance = 0.05) {
my $nsignificance = +@bins0.05;
my $Nn = [+] @bins;
my $N = sum @bins;
my $expected = $N / $n;
my $chi-squaredchi_squared = [+]sum @bins.map: { ($^bin_ - $expected)**2 / $expected } @bins;
my $p-valuep_value = 1 - chi-squared-cdfchi_squared_cdf($n-1, $chi-squaredchi_squared);
return (:$chi-squaredchi_squared, :$p-valuep_value, :uniform($p-valuep_value > $significance)) ? 'True' : 'False';
}
 
for $dataset ([199809, 200665, 199607, 200270, 199649], [522573, 244456, 139979, 71531, 21461]) {
printf "C2 = %10.3f, p-value = %.3f, uniform = %s\n", chi_squared_test(@$dataset);
}</syntaxhighlight>
{{out}}
<pre>C2 = 4.146, p-value = 0.387, uniform = True
C2 = 790063.276, p-value = 0.000, uniform = False</pre>
 
=={{header|Phix}}==
{{trans|Go}}
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #004080;">atom</span> <span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">return</span> <span style="color: #7060A8;">power</span><span style="color: #0000FF;">(</span><span style="color: #000000;">t</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">aa1</span><span style="color: #0000FF;">)</span> <span style="color: #0000FF;">*</span> <span style="color: #7060A8;">exp</span><span style="color: #0000FF;">(-</span><span style="color: #000000;">t</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">simpson38</span><span style="color: #0000FF;">(</span><span style="color: #004080;">atom</span> <span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">a</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">b</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">n</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">h</span> <span style="color: #0000FF;">:=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">b</span><span style="color: #0000FF;">-</span><span style="color: #000000;">a</span><span style="color: #0000FF;">)/</span><span style="color: #000000;">n</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">h1</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">h</span><span style="color: #0000FF;">/</span><span style="color: #000000;">3</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">tot</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">a</span><span style="color: #0000FF;">)</span> <span style="color: #0000FF;">+</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">b</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">3</span><span style="color: #0000FF;">*</span><span style="color: #000000;">n</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">1</span> <span style="color: #008080;">by</span> <span style="color: #0000FF;">-</span><span style="color: #000000;">1</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">tot</span> <span style="color: #0000FF;">+=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">3</span><span style="color: #0000FF;">-(</span><span style="color: #7060A8;">mod</span><span style="color: #0000FF;">(</span><span style="color: #000000;">j</span><span style="color: #0000FF;">,</span><span style="color: #000000;">3</span><span style="color: #0000FF;">)=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">))</span> <span style="color: #0000FF;">*</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">a</span><span style="color: #0000FF;">+</span><span style="color: #000000;">h1</span><span style="color: #0000FF;">*</span><span style="color: #000000;">j</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">h</span><span style="color: #0000FF;">*</span><span style="color: #000000;">tot</span><span style="color: #0000FF;">/</span><span style="color: #000000;">8</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #000080;font-style:italic;">--&lt;copy of gamma from [[Gamma_function#Phix]]&gt;</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">c</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">12</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">gamma</span><span style="color: #0000FF;">(</span><span style="color: #004080;">atom</span> <span style="color: #000000;">z</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">accm</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">accm</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">accm</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sqrt</span><span style="color: #0000FF;">(</span><span style="color: #000000;">2</span><span style="color: #0000FF;">*</span><span style="color: #004600;">PI</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">c</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">accm</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">k1_factrl</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span> <span style="color: #000080;font-style:italic;">-- (k - 1)!*(-1)^k with 0!==1</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #000000;">12</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">c</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">exp</span><span style="color: #0000FF;">(</span><span style="color: #000000;">13</span><span style="color: #0000FF;">-</span><span style="color: #000000;">k</span><span style="color: #0000FF;">)*</span><span style="color: #7060A8;">power</span><span style="color: #0000FF;">(</span><span style="color: #000000;">13</span><span style="color: #0000FF;">-</span><span style="color: #000000;">k</span><span style="color: #0000FF;">,</span><span style="color: #000000;">k</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1.5</span><span style="color: #0000FF;">)/</span><span style="color: #000000;">k1_factrl</span>
<span style="color: #000000;">k1_factrl</span> <span style="color: #0000FF;">*=</span> <span style="color: #0000FF;">-(</span><span style="color: #000000;">k</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #000000;">2</span> <span style="color: #008080;">to</span> <span style="color: #000000;">12</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">accm</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">c</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]/(</span><span style="color: #000000;">z</span><span style="color: #0000FF;">+</span><span style="color: #000000;">k</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #000000;">accm</span> <span style="color: #0000FF;">*=</span> <span style="color: #7060A8;">exp</span><span style="color: #0000FF;">(-(</span><span style="color: #000000;">z</span><span style="color: #0000FF;">+</span><span style="color: #000000;">12</span><span style="color: #0000FF;">))*</span><span style="color: #7060A8;">power</span><span style="color: #0000FF;">(</span><span style="color: #000000;">z</span><span style="color: #0000FF;">+</span><span style="color: #000000;">12</span><span style="color: #0000FF;">,</span><span style="color: #000000;">z</span><span style="color: #0000FF;">+</span><span style="color: #000000;">0.5</span><span style="color: #0000FF;">)</span> <span style="color: #000080;font-style:italic;">-- Gamma(z+1)</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">accm</span><span style="color: #0000FF;">/</span><span style="color: #000000;">z</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #000080;font-style:italic;">--&lt;/copy of gamma&gt;</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">gammaIncQ</span><span style="color: #0000FF;">(</span><span style="color: #004080;">atom</span> <span style="color: #000000;">a</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">x</span><span style="color: #0000FF;">)</span>
for [< 199809 200665 199607 200270 199649 >],
<span style="color: #004080;">atom</span> <span style="color: #000000;">aa1</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">a</span><span style="color: #0000FF;">-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span>
[< 522573 244456 139979 71531 21461 >]
<span style="color: #000000;">y</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span>
-> $dataset
<span style="color: #000000;">h</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">1.5e-2</span>
{
<span style="color: #008080;">while</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">y</span><span style="color: #0000FF;">)*(</span><span style="color: #000000;">x</span><span style="color: #0000FF;">-</span><span style="color: #000000;">y</span><span style="color: #0000FF;">)</span> <span style="color: #0000FF;">></span> <span style="color: #000000;">2e-8</span> <span style="color: #008080;">and</span> <span style="color: #000000;">y</span> <span style="color: #0000FF;"><</span> <span style="color: #000000;">x</span> <span style="color: #008080;">do</span>
my %t = chi-squared-test($dataset);
<span style="color: #000000;">y</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">0.4</span>
say 'data: ', $dataset;
<span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
say "χ² = {%t<chi-squared>}, p-value = {%t<p-value>.fmt('%.4f')}, uniform = {%t<uniform>}";
<span style="color: #008080;">if</span> <span style="color: #000000;">y</span> <span style="color: #0000FF;">></span> <span style="color: #000000;">x</span> <span style="color: #008080;">then</span> <span style="color: #000000;">y</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">x</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
}</lang>
<span style="color: #008080;">return</span> <span style="color: #000000;">1</span> <span style="color: #0000FF;">-</span> <span style="color: #000000;">simpson38</span><span style="color: #0000FF;">(</span><span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">y</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">floor</span><span style="color: #0000FF;">(</span><span style="color: #000000;">y</span><span style="color: #0000FF;">/</span><span style="color: #000000;">h</span><span style="color: #0000FF;">/</span><span style="color: #000000;">gamma</span><span style="color: #0000FF;">(</span><span style="color: #000000;">a</span><span style="color: #0000FF;">)))</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">chi2ud</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">ds</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">expected</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ds</span><span style="color: #0000FF;">)/</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ds</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">tot</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sq_power</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sq_sub</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ds</span><span style="color: #0000FF;">,</span><span style="color: #000000;">expected</span><span style="color: #0000FF;">),</span><span style="color: #000000;">2</span><span style="color: #0000FF;">))</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">tot</span><span style="color: #0000FF;">/</span><span style="color: #000000;">expected</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">chi2p</span><span style="color: #0000FF;">(</span><span style="color: #004080;">integer</span> <span style="color: #000000;">dof</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">atom</span> <span style="color: #000000;">distance</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">gammaIncQ</span><span style="color: #0000FF;">(</span><span style="color: #000000;">0.5</span><span style="color: #0000FF;">*</span><span style="color: #000000;">dof</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0.5</span><span style="color: #0000FF;">*</span><span style="color: #000000;">distance</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">sigLevel</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0.05</span>
<span style="color: #008080;">procedure</span> <span style="color: #000000;">utest</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">dset</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Uniform distribution test\n"</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">tot</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #000000;">dset</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">dof</span> <span style="color: #0000FF;">:=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">dset</span><span style="color: #0000FF;">)-</span><span style="color: #000000;">1</span>
<span style="color: #004080;">atom</span> <span style="color: #000000;">dist</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">chi2ud</span><span style="color: #0000FF;">(</span><span style="color: #000000;">dset</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">p</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">chi2p</span><span style="color: #0000FF;">(</span><span style="color: #000000;">dof</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">dist</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">bool</span> <span style="color: #000000;">sig</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">p</span> <span style="color: #0000FF;"><</span> <span style="color: #000000;">sigLevel</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" dataset: %v\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">dset</span><span style="color: #0000FF;">})</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" samples: %d\n"</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">tot</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" categories: %d\n"</span><span style="color: #0000FF;">,</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">dset</span><span style="color: #0000FF;">))</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" degrees of freedom: %d\n"</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">dof</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" chi square test statistic: %g\n"</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">dist</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" p-value of test statistic: %g\n"</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">p</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" significant at %.0f%% level? %t\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">sigLevel</span><span style="color: #0000FF;">*</span><span style="color: #000000;">100</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">sig</span><span style="color: #0000FF;">})</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">" uniform? %t\n"</span><span style="color: #0000FF;">,</span><span style="color: #008080;">not</span> <span style="color: #000000;">sig</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<span style="color: #000000;">utest</span><span style="color: #0000FF;">({</span><span style="color: #000000;">199809</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">200665</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">199607</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">200270</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">199649</span><span style="color: #0000FF;">})</span>
<span style="color: #000000;">utest</span><span style="color: #0000FF;">({</span><span style="color: #000000;">522573</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">244456</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">139979</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">71531</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">21461</span><span style="color: #0000FF;">})</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
<pre>data: 199809 200665 199607 200270 199649
Uniform distribution test
χ² = 4.14628, p-value = 0.3866, uniform = True
dataset: {199809,200665,199607,200270,199649}
data: 522573 244456 139979 71531 21461
samples: 1000000
χ² = 790063.27594, p-value = 0.0000, uniform = False</pre>
categories: 5
degrees of freedom: 4
chi square test statistic: 4.14628
p-value of test statistic: 0.386571
significant at 5% level? false
uniform? true
Uniform distribution test
dataset: {522573,244456,139979,71531,21461}
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 790063
p-value of test statistic: 2.35282e-11
significant at 5% level? true
uniform? false
</pre>
 
=={{header|Python}}==
 
===Python: Using only standard libraries===
Implements the Chi Square Probability function with an integration. I'm
sure there are better ways to do this. Compare to OCaml implementation.
<langsyntaxhighlight lang="python">import math
import random
 
Line 699 ⟶ 1,630:
prob = chi2Probability( dof, distance)
print "probability: %.4f"%prob,
print "uniform? ", "Yes"if chi2IsUniform(ds,0.05) else "No"</langsyntaxhighlight>
Output:
<pre>Data set: [199809, 200665, 199607, 200270, 199649]
Line 705 ⟶ 1,636:
Data set: [522573, 244456, 139979, 71531, 21461]
dof: 4 distance: 790063.275940 probability: 0.0000 uniform? No</pre>
 
===Python: Using scipy===
This uses the library routine [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html scipy.stats.chisquare].
 
<syntaxhighlight lang="python">from scipy.stats import chisquare
 
 
if __name__ == '__main__':
dataSets = [[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461]]
print(f"{'Distance':^12} {'pvalue':^12} {'Uniform?':^8} {'Dataset'}")
for ds in dataSets:
dist, pvalue = chisquare(ds)
uni = 'YES' if pvalue > 0.05 else 'NO'
print(f"{dist:12.3f} {pvalue:12.8f} {uni:^8} {ds}")</syntaxhighlight>
 
{{out}}
<pre> Distance pvalue Uniform? Dataset
4.146 0.38657083 YES [199809, 200665, 199607, 200270, 199649]
790063.276 0.00000000 NO [522573, 244456, 139979, 71531, 21461]</pre>
 
=={{header|R}}==
R being a statistical computating language, the chi-squared test is built in with the function "chisq.test"
<langsyntaxhighlight lang="tcl">
dset1=c(199809,200665,199607,200270,199649)
dset2=c(522573,244456,139979,71531,21461)
Line 721 ⟶ 1,672:
print(paste("uniform?",chi2IsUniform(ds)))
}
</syntaxhighlight>
</lang>
 
Output:
Line 744 ⟶ 1,695:
 
=={{header|Racket}}==
<langsyntaxhighlight lang="racket">
#lang racket
(require
Line 785 ⟶ 1,736:
; Test whether the constant generator fails:
(is-uniform? (λ(_) 5) 1000 0.05)
</syntaxhighlight>
</lang>
Output:
<langsyntaxhighlight lang="racket">
#t
#f
</syntaxhighlight>
</lang>
 
=={{header|Raku}}==
(formerly Perl 6)
 
For the incomplete gamma function we use a series expansion related to Kummer's confluent hypergeometric function
(see http://en.wikipedia.org/wiki/Incomplete_gamma_function#Evaluation_formulae). The gamma function is calculated
in closed form, as we only need its value at integers and half integers.
 
<syntaxhighlight lang="raku" line>sub incomplete-γ-series($s, $z) {
my \numers = $z X** 1..*;
my \denoms = [\*] $s X+ 1..*;
my $M = 1 + [+] (numers Z/ denoms) ... * < 1e-6;
$z**$s / $s * exp(-$z) * $M;
}
sub postfix:<!>(Int $n) { [*] 2..$n }
sub Γ-of-half(Int $n where * > 0) {
($n %% 2) ?? (($_-1)! given $n div 2)
!! ((2*$_)! / (4**$_ * $_!) * sqrt(pi) given ($n-1) div 2);
}
# degrees of freedom constrained due to numerical limitations
sub chi-squared-cdf(Int $k where 1..200, $x where * >= 0) {
my $f = $k < 20 ?? 20 !! 10;
given $x {
when 0 { 0.0 }
when * < $k + $f*sqrt($k) { incomplete-γ-series($k/2, $x/2) / Γ-of-half($k) }
default { 1.0 }
}
}
sub chi-squared-test(@bins, :$significance = 0.05) {
my $n = +@bins;
my $N = [+] @bins;
my $expected = $N / $n;
my $chi-squared = [+] @bins.map: { ($^bin - $expected)**2 / $expected }
my $p-value = 1 - chi-squared-cdf($n-1, $chi-squared);
return (:$chi-squared, :$p-value, :uniform($p-value > $significance));
}
for [< 199809 200665 199607 200270 199649 >],
[< 522573 244456 139979 71531 21461 >]
-> $dataset
{
my %t = chi-squared-test($dataset);
say 'data: ', $dataset;
say "χ² = {%t<chi-squared>}, p-value = {%t<p-value>.fmt('%.4f')}, uniform = {%t<uniform>}";
}</syntaxhighlight>
{{out}}
<pre>data: 199809 200665 199607 200270 199649
χ² = 4.14628, p-value = 0.3866, uniform = True
data: 522573 244456 139979 71531 21461
χ² = 790063.27594, p-value = 0.0000, uniform = False</pre>
 
=={{header|REXX}}==
{{trans|Go}}
 
Programming notes:
 
The use of the &nbsp; '''pow''' &nbsp; was elided as it can just be replaced with &nbsp; '''t**(a-1)'''.
 
The &nbsp; '''gamma''' &nbsp; was replaced with a simple version. &nbsp; The argument
for &nbsp; '''gamma''' &nbsp; is (in the cases used herein) &nbsp; always
<br>positive, &nbsp; and is
either an integer, &nbsp; or a number which is a multiple of &nbsp; <big>'''<sup>1</sup>/<sub>2</sub>'''</big>, &nbsp; both of these cases can be calculated with
<br>a straight─forward calculation.
<syntaxhighlight lang="rexx">/*REXX program performs a chi─squared test to verify a given distribution is uniform. */
numeric digits length( pi() ) - length(.) /*enough decimal digs for calculations.*/
@.=; @.1= 199809 200665 199607 200270 199649
@.2= 522573 244456 139979 71531 21461
do s=1 while @.s\==''; call uTest @.s /*invoke uTest with a data set of #'s.*/
end /*s*/
exit /*stick a fork in it, we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
!: procedure; parse arg x; p=1; do j=2 to x; p= p*j; end /*j*/; return p
chi2p: procedure; parse arg dof, distance; return gammaI( dof/2, distance/2 )
f: parse arg t; if t=0 then return 0; return t ** (a-1) * exp(-t)
e: e =2.718281828459045235360287471352662497757247093699959574966967627724; return e
pi: pi=3.141592653589793238462643383279502884197169399375105820974944592308; return pi
/*──────────────────────────────────────────────────────────────────────────────────────*/
!!: procedure; parse arg x; if x<2 then return 1; p= x
do k=2+x//2 to x-1 by 2; p= p*k; end /*k*/; return p
/*──────────────────────────────────────────────────────────────────────────────────────*/
chi2ud: procedure: parse arg ds; sum=0; expect= 0
do j=1 for words(ds); expect= expect + word(ds, j)
end /*j*/
expect = expect / words(ds)
do k=1 for words(ds)
sum= sum + (word(ds, k) - expect) **2
end /*k*/
return sum / expect
/*──────────────────────────────────────────────────────────────────────────────────────*/
exp: procedure; parse arg x; ix= x%1; if abs(x-ix)>.5 then ix= ix + sign(x); x= x-ix
z=1; _=1; w=z; do j=1; _= _*x/j; z= (z + _)/1; if z==w then leave; w=z
end /*j*/; if z\==0 then z= z * e()**ix; return z
/*──────────────────────────────────────────────────────────────────────────────────────*/
gamma: procedure; parse arg x; if datatype(x, 'W') then return !(x-1) /*Int? Use fact*/
n= trunc(x) /*at this point, X is pos and a multiple of 1/2.*/
d= !!(n+n - 1) /*compute the double factorial of: 2*n - 1. */
if n//2 then p= -1 /*if N is odd, then use a negative unity. */
else p= 1 /*if N is even, then use a positive unity. */
if x>0 then return p * d * sqrt(pi()) / (2**n)
return p * (2**n) * sqrt(pi()) / d
/*──────────────────────────────────────────────────────────────────────────────────────*/
gammaI: procedure; parse arg a,x; y= a-1; do while f(y)*(x-y) > 2e-8 & y<x; y= y + .4
end /*while*/
y= min(x, y)
return 1 - simp38(0, y, y / 0.015 / gamma(a-1) % 1)
/*──────────────────────────────────────────────────────────────────────────────────────*/
simp38: procedure; parse arg a, b, n; h= (b-a) / n; h1= h / 3
sum= f(a) + f(b)
do j=3*n-1 by -1 while j>0
if j//3 == 0 then sum= sum + 2 * f(a + h1*j)
else sum= sum + 3 * f(a + h1*j)
end /*j*/
return h * sum / 8
/*──────────────────────────────────────────────────────────────────────────────────────*/
sqrt: procedure; parse arg x; if x=0 then return 0; d=digits(); numeric digits; h= d+6
numeric form; m.=9; parse value format(x,2,1,,0) 'E0' with g "E" _ .;g=g *.5'e'_%2
do j=0 while h>9; m.j=h; h=h%2+1; end /*j*/
do k=j+5 to 0 by -1; numeric digits m.k; g=(g+x/g)*.5; end /*k*/; return g
/*──────────────────────────────────────────────────────────────────────────────────────*/
uTest: procedure; parse arg dset; sum= 0; pad= left('', 11); sigLev= 1/20 /*5%*/
say; say ' ' center(" Uniform distribution test ", 75, '═')
#= words(dset); sigPC= sigLev*100/1
do j=1 for #; sum= sum + word(dset, j)
end /*j*/
say pad " dataset: " dset
say pad " samples: " sum
say pad " categories: " #
say pad " degrees of freedom: " # - 1
dist= chi2ud(dset)
P= chi2p(# - 1, dist)
sig = (abs(P) < dist * sigLev)
say pad "significant at " sigPC'% level? ' word('no yes', sig + 1)
say pad " is the dataset uniform? " word('no yes', (\(sig))+ 1)
return</syntaxhighlight>
{{out|output|text=&nbsp; when using the default inputs:}}
<pre>
════════════════════════ Uniform distribution test ════════════════════════
dataset: 199809 200665 199607 200270 199649
samples: 1000000
categories: 5
degrees of freedom: 4
significant at 5% level? no
is the dataset uniform? yes
 
════════════════════════ Uniform distribution test ════════════════════════
dataset: 522573 244456 139979 71531 21461
samples: 1000000
categories: 5
degrees of freedom: 4
significant at 5% level? yes
is the dataset uniform? no
</pre>
 
=={{header|Ruby}}==
{{trans|Python}}
<syntaxhighlight lang="ruby">def gammaInc_Q(a, x)
a1, a2 = a-1, a-2
f0 = lambda {|t| t**a1 * Math.exp(-t)}
df0 = lambda {|t| (a1-t) * t**a2 * Math.exp(-t)}
y = a1
y += 0.3 while f0[y]*(x-y) > 2.0e-8 and y < x
y = x if y > x
h = 3.0e-4
n = (y/h).to_i
h = y/n
hh = 0.5 * h
sum = 0
(n-1).step(0, -1) do |j|
t = h * j
sum += f0[t] + hh * df0[t]
end
h * sum / gamma_spounge(a)
end
 
A = 12
k1_factrl = 1.0
coef = [Math.sqrt(2.0*Math::PI)]
COEF = (1...A).each_with_object(coef) do |k,c|
c << Math.exp(A-k) * (A-k)**(k-0.5) / k1_factrl
k1_factrl *= -k
end
 
def gamma_spounge(z)
accm = (1...A).inject(COEF[0]){|res,k| res += COEF[k] / (z+k)}
accm * Math.exp(-(z+A)) * (z+A)**(z+0.5) / z
end
 
def chi2UniformDistance(dataSet)
expected = dataSet.inject(:+).to_f / dataSet.size
dataSet.map{|d|(d-expected)**2}.inject(:+) / expected
end
 
def chi2Probability(dof, distance)
1.0 - gammaInc_Q(0.5*dof, 0.5*distance)
end
 
def chi2IsUniform(dataSet, significance=0.05)
dof = dataSet.size - 1
dist = chi2UniformDistance(dataSet)
chi2Probability(dof, dist) > significance
end
 
dsets = [ [ 199809, 200665, 199607, 200270, 199649 ],
[ 522573, 244456, 139979, 71531, 21461 ] ]
 
for ds in dsets
puts "Data set:#{ds}"
dof = ds.size - 1
puts " degrees of freedom: %d" % dof
distance = chi2UniformDistance(ds)
puts " distance: %.4f" % distance
puts " probability: %.4f" % chi2Probability(dof, distance)
puts " uniform? %s" % (chi2IsUniform(ds) ? "Yes" : "No")
end</syntaxhighlight>
 
{{out}}
<pre>
Data set:[199809, 200665, 199607, 200270, 199649]
degrees of freedom: 4
distance: 4.1463
probability: 0.3866
uniform? Yes
Data set:[522573, 244456, 139979, 71531, 21461]
degrees of freedom: 4
distance: 790063.2759
probability: -0.0000
uniform? No
</pre>
 
=={{header|Rust}}==
<syntaxhighlight lang="rust">
use statrs::function::gamma::gamma_li;
 
fn chi_distance(dataset: &[u32]) -> f64 {
let expected = f64::from(dataset.iter().sum::<u32>()) / dataset.len() as f64;
dataset
.iter()
.fold(0., |acc, &elt| acc + (elt as f64 - expected).powf(2.))
/ expected
}
 
fn chi2_probability(dof: f64, distance: f64) -> f64 {
1. - gamma_li(dof * 0.5, distance * 0.5)
}
 
fn chi2_uniform(dataset: &[u32], significance: f64) -> bool {
let d = chi_distance(&dataset);
chi2_probability(dataset.len() as f64 - 1., d) > significance
}
 
fn main() {
let dsets = vec![
vec![199809, 200665, 199607, 200270, 199649],
vec![522573, 244456, 139979, 71531, 21461],
];
 
for ds in dsets {
println!("Data set: {:?}", ds);
let d = chi_distance(&ds);
print!("Distance: {:.6} ", d);
print!(
"Chi2 probability: {:.6} ",
chi2_probability(ds.len() as f64 - 1., d)
);
print!("Uniform? {}\n", chi2_uniform(&ds, 0.05));
}
}
 
</syntaxhighlight>
{{out}}
<pre>
Data set: [199809, 200665, 199607, 200270, 199649]
Distance: 4.146280 Chi2 probability: 0.386571 Uniform? true
Data set: [522573, 244456, 139979, 71531, 21461]
Distance: 790063.275940 Chi2 probability: 0.000000 Uniform? false
</pre>
 
=={{header|Scala}}==
{{Out}}See it yourself by running in your browser [https://scastie.scala-lang.org/WUFeFG5WQkq2MZ51kBqaYA Scastie (remote JVM)].
{{libheader|Scala Math Statistic}}
{{libheader|Scastie qualified}}
{{works with|Scala|2.13}}
<syntaxhighlight lang="scala">import org.apache.commons.math3.special.Gamma.regularizedGammaQ
 
object ChiSquare extends App {
private val dataSets: Seq[Seq[Double]] =
Seq(
Seq(199809, 200665, 199607, 200270, 199649),
Seq(522573, 244456, 139979, 71531, 21461)
)
 
private def χ2IsUniform(data: Seq[Double], significance: Double) =
χ2Prob(data.size - 1.0, χ2Dist(data)) > significance
 
private def χ2Dist(data: Seq[Double]) = {
val avg = data.sum / data.size
 
data.reduce((a, b) => a + math.pow(b - avg, 2)) / avg
}
 
private def χ2Prob(dof: Double, distance: Double) =
regularizedGammaQ(dof / 2, distance / 2)
 
printf(" %4s %10s %12s %8s %s%n",
"d.f.", "χ²distance", "χ²probability", "Uniform?", "dataset")
dataSets.foreach { ds =>
val (dist, dof) = (χ2Dist(ds), ds.size - 1)
 
printf("%4d %11.3f %13.8f %5s %6s%n",
dof, dist, χ2Prob(dof.toDouble, dist), if (χ2IsUniform(ds, 0.05)) "YES" else "NO", ds.mkString(", "))
}
}</syntaxhighlight>
 
=={{header|Sidef}}==
<syntaxhighlight lang="ruby"># Confluent hypergeometric function of the first kind F_1(a;b;z)
func F1(a, b, z, limit=100) {
sum(0..limit, {|k|
rising_factorial(a, k) / rising_factorial(b, k) * z**k / k!
})
}
 
func γ(a,x) { # lower incomplete gamma function γ(a,x)
#a**(-1) * x**a * F1(a, a+1, -x) # simpler formula
a**(-1) * x**a * exp(-x) * F1(1, a+1, x) # slightly better convergence
}
 
func P(a,z) { # regularized gamma function P(a,z)
γ(a,z) / Γ(a)
}
 
func chi_squared_cdf (k, x) {
var f = (k<20 ? 20 : 10)
given(x) {
when (0) { 0 }
case (. < (k + f*sqrt(k))) { P(k/2, x/2) }
else { 1 }
}
}
 
func chi_squared_test(arr, significance = 0.05) {
var n = arr.len
var N = arr.sum
var expected = N/n
var χ_squared = arr.sum_by {|v| (v-expected)**2 / expected }
var p_value = (1 - chi_squared_cdf(n-1, χ_squared))
[χ_squared, p_value, p_value > significance]
}
 
[
%n< 199809 200665 199607 200270 199649 >,
%n< 522573 244456 139979 71531 21461 >,
].each {|dataset|
var r = chi_squared_test(dataset)
say "data: #{dataset}"
say "χ² = #{r[0]}, p-value = #{r[1].round(-4)}, uniform = #{r[2]}\n"
}</syntaxhighlight>
{{out}}
<pre>
data: [199809, 200665, 199607, 200270, 199649]
χ² = 4.14628, p-value = 0.3866, uniform = true
 
data: [522573, 244456, 139979, 71531, 21461]
χ² = 790063.27594, p-value = 0, uniform = false
</pre>
 
=={{header|Tcl}}==
{{works with|Tcl|8.5}}
{{tcllib|math::statistics}}
<langsyntaxhighlight lang="tcl">package require Tcl 8.5
package require math::statistics
 
Line 809 ⟶ 2,130:
[expr {$degreesOfFreedom / 2.0}] [expr {$X2 / 2.0}]]
expr {$likelihoodOfRandom > $significance}
}</langsyntaxhighlight>
Testing:
<langsyntaxhighlight lang="tcl">proc makeDistribution {operation {count 1000000}} {
for {set i 0} {$i<$count} {incr i} {incr distribution([uplevel 1 $operation])}
return [array get distribution]
Line 819 ⟶ 2,140:
puts "distribution \"$distFair\" assessed as [expr [isUniform $distFair]?{fair}:{unfair}]"
set distUnfair [makeDistribution {expr int(rand()*rand()*5)}]
puts "distribution \"$distUnfair\" assessed as [expr [isUniform $distUnfair]?{fair}:{unfair}]"</langsyntaxhighlight>
Output:
<pre>distribution "0 199809 4 199649 1 200665 2 199607 3 200270" assessed as fair
distribution "4 21461 0 522573 1 244456 2 139979 3 71531" assessed as unfair</pre>
 
=={{header|VBA}}==
The built in worksheetfunction ChiSq_Dist of Excel VBA is used. Output formatted like R.
<syntaxhighlight lang="vb">Private Function Test4DiscreteUniformDistribution(ObservationFrequencies() As Variant, Significance As Single) As Boolean
'Returns true if the observed frequencies pass the Pearson Chi-squared test at the required significance level.
Dim Total As Long, Ei As Long, i As Integer
Dim ChiSquared As Double, DegreesOfFreedom As Integer, p_value As Double
Debug.Print "[1] ""Data set:"" ";
For i = LBound(ObservationFrequencies) To UBound(ObservationFrequencies)
Total = Total + ObservationFrequencies(i)
Debug.Print ObservationFrequencies(i); " ";
Next i
DegreesOfFreedom = UBound(ObservationFrequencies) - LBound(ObservationFrequencies)
'This is exactly the number of different categories minus 1
Ei = Total / (DegreesOfFreedom + 1)
For i = LBound(ObservationFrequencies) To UBound(ObservationFrequencies)
ChiSquared = ChiSquared + (ObservationFrequencies(i) - Ei) ^ 2 / Ei
Next i
p_value = 1 - WorksheetFunction.ChiSq_Dist(ChiSquared, DegreesOfFreedom, True)
Debug.Print
Debug.Print " Chi-squared test for given frequencies"
Debug.Print "X-squared ="; ChiSquared; ", ";
Debug.Print "df ="; DegreesOfFreedom; ", ";
Debug.Print "p-value = "; Format(p_value, "0.0000")
Test4DiscreteUniformDistribution = p_value > Significance
End Function
Public Sub test()
Dim O() As Variant
O = [{199809,200665,199607,200270,199649}]
Debug.Print "[1] ""Uniform? "; Test4DiscreteUniformDistribution(O, 0.05); """"
O = [{522573,244456,139979,71531,21461}]
Debug.Print "[1] ""Uniform? "; Test4DiscreteUniformDistribution(O, 0.05); """"
End Sub</syntaxhighlight>
{{out}<pre>[1] "Data set:" 199809 200665 199607 200270 199649
Chi-squared test for given frequencies
X-squared = 4.14628 , df = 4 , p-value = 0.3866
[1] "Uniform? True"
[1] "Data set:" 522573 244456 139979 71531 21461
Chi-squared test for given frequencies
X-squared = 790063.27594 , df = 4 , p-value = 0.0000
[1] "Uniform? False"</pre>
 
=={{header|V (Vlang)}}==
{{trans|Go}}
<syntaxhighlight lang="v (vlang)">import math
 
type Ifctn = fn(f64) f64
fn simpson38(f Ifctn, a f64, b f64, n int) f64 {
h := (b - a) / f64(n)
h1 := h / 3
mut sum := f(a) + f(b)
for j := 3*n - 1; j > 0; j-- {
if j%3 == 0 {
sum += 2 * f(a+h1*f64(j))
} else {
sum += 3 * f(a+h1*f64(j))
}
}
return h * sum / 8
}
fn gamma_inc_q(a f64, x f64) f64 {
aa1 := a - 1
f := Ifctn(fn[aa1](t f64) f64 {
return math.pow(t, aa1) * math.exp(-t)
})
mut y := aa1
h := 1.5e-2
for f(y)*(x-y) > 2e-8 && y < x {
y += .4
}
if y > x {
y = x
}
return 1 - simpson38(f, 0, y, int(y/h/math.gamma(a)))
}
fn chi2ud(ds []int) f64 {
mut sum, mut expected := 0.0,0.0
for d in ds {
expected += f64(d)
}
expected /= f64(ds.len)
for d in ds {
x := f64(d) - expected
sum += x * x
}
return sum / expected
}
fn chi2p(dof int, distance f64) f64 {
return gamma_inc_q(.5*f64(dof), .5*distance)
}
const sig_level = .05
fn main() {
for dset in [
[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461],
] {
utest(dset)
}
}
fn utest(dset []int) {
println("Uniform distribution test")
mut sum := 0
for c in dset {
sum += c
}
println(" dataset: $dset")
println(" samples: $sum")
println(" categories: $dset.len")
dof := dset.len - 1
println(" degrees of freedom: $dof")
dist := chi2ud(dset)
println(" chi square test statistic: $dist")
p := chi2p(dof, dist)
println(" p-value of test statistic: $p")
sig := p < sig_level
println(" significant at ${sig_level*100:2.0f}% level? $sig")
println(" uniform? ${!sig}\n")
}</syntaxhighlight>
{{out}}
<pre>
Uniform distribution test
dataset: [199809 200665 199607 200270 199649]
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 4.14628
p-value of test statistic: 0.3865708330827673
significant at 5% level? false
uniform? true
 
Uniform distribution test
dataset: [522573 244456 139979 71531 21461]
samples: 1000000
categories: 5
degrees of freedom: 4
chi square test statistic: 790063.27594
p-value of test statistic: 2.3528290427066167e-11
significant at 5% level? true
uniform? false
</pre>
 
=={{header|Wren}}==
{{trans|Kotlin}}
{{libheader|Wren-math}}
{{libheader|Wren-fmt}}
<syntaxhighlight lang="wren">import "./math" for Math, Nums
import "./fmt" for Fmt
 
var integrate = Fn.new { |a, b, n, f|
var h = (b - a) / n
var sum = 0
for (i in 0...n) {
var x = a + i*h
sum = sum + (f.call(x) + 4 * f.call(x + h/2) + f.call(x + h)) / 6
}
return sum * h
}
 
var gammaIncomplete = Fn.new { |a, x|
var am1 = a - 1
var f0 = Fn.new { |t| t.pow(am1) * (-t).exp }
var h = 1.5e-2
var y = am1
while ((f0.call(y) * (x - y) > 2e-8) && y < x) y = y + 0.4
if (y > x) y = x
return 1 - integrate.call(0, y, (y/h).truncate, f0) / Math.gamma(a)
}
 
var chi2UniformDistance = Fn.new { |ds|
var expected = Nums.mean(ds)
var sum = Nums.sum(ds.map { |d| (d - expected).pow(2) }.toList)
return sum / expected
}
 
var chi2Probability = Fn.new { |dof, dist| gammaIncomplete.call(0.5*dof, 0.5*dist) }
 
var chiIsUniform = Fn.new { |ds, significance|
var dof = ds.count - 1
var dist = chi2UniformDistance.call(ds)
return chi2Probability.call(dof, dist) > significance
}
 
var dsets = [
[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461]
]
for (ds in dsets) {
System.print("Dataset: %(ds)")
var dist = chi2UniformDistance.call(ds)
var dof = ds.count - 1
Fmt.write("DOF: $d Distance: $.4f", dof, dist)
var prob = chi2Probability.call(dof, dist)
Fmt.write(" Probability: $.6f", prob)
var uniform = chiIsUniform.call(ds, 0.05) ? "Yes" : "No"
System.print(" Uniform? %(uniform)\n")
}</syntaxhighlight>
 
{{out}}
<pre>
Dataset: [199809, 200665, 199607, 200270, 199649]
DOF: 4 Distance: 4.1463 Probability: 0.386571 Uniform? Yes
 
Dataset: [522573, 244456, 139979, 71531, 21461]
DOF: 4 Distance: 790063.2759 Probability: 0.000000 Uniform? No
</pre>
 
=={{header|zkl}}==
{{trans|C}}
{{trans|D}}
<syntaxhighlight lang="zkl">/* Numerical integration method */
fcn Simpson3_8(f,a,b,N){ // fcn,double,double,Int --> double
h,h1:=(b - a)/N, h/3.0;
h*[1..3*N - 1].reduce('wrap(sum,j){
l1:=(if(j%3) 3.0 else 2.0);
sum + l1*f(a + h1*j);
},f(a) + f(b))/8.0;
}
const A=12;
fcn Gamma_Spouge(z){ // double --> double
var coefs=fcn{ // this runs only once, at construction time
a,coefs:=A.toFloat(),(A).pump(List(),0.0);
k1_factrl:=1.0;
coefs[0]=(2.0*(0.0).pi).sqrt();
foreach k in ([1.0..A-1]){
coefs[k]=(a - k).exp() * (a - k).pow(k - 0.5) / k1_factrl;
k1_factrl*=-k;
}
coefs
}();
 
( [1..A-1].reduce('wrap(accum,k){ accum + coefs[k]/(z + k) },coefs[0])
* (-(z + A)).exp()*(z + A).pow(z + 0.5) )
/ z;
}
 
fcn f0(t,aa1){ t.pow(aa1)*(-t).exp() }
fcn GammaIncomplete_Q(a,x){ // double,double --> double
h:=1.5e-2; /* approximate integration step size */
/* this cuts off the tail of the integration to speed things up */
y:=a - 1; f:=f0.fp1(y);
while((f(y)*(x - y)>2.0e-8) and (y<x)){ y+=0.4; }
if(y>x) y=x;
1.0 - Simpson3_8(f,0.0,y,(y/h).toInt())/Gamma_Spouge(a);
}</syntaxhighlight>
<syntaxhighlight lang="zkl">fcn chi2UniformDistance(ds){ // --> double
dslen :=ds.len();
expected:=dslen.reduce('wrap(sum,k){ sum + ds[k] },0.0)/dslen;
sum := dslen.reduce('wrap(sum,k){ x:=ds[k] - expected; sum + x*x },0.0);
sum/expected
}
fcn chi2Probability(dof,distance){ GammaIncomplete_Q(0.5*dof, 0.5*distance) }
fcn chiIsUniform(dset,significance=0.05){
significance < chi2Probability(-1.0 + dset.len(),chi2UniformDistance(dset))
}</syntaxhighlight>
<syntaxhighlight lang="zkl">datasets:=T( T(199809.0, 200665.0, 199607.0, 200270.0, 199649.0),
T(522573.0, 244456.0, 139979.0, 71531.0, 21461.0) );
println(" %4s %12s %12s %8s %s".fmt(
"dof", "distance", "probability", "Uniform?", "dataset"));
foreach ds in (datasets){
dof :=ds.len() - 1;
dist:=chi2UniformDistance(ds);
prob:=chi2Probability(dof,dist);
println("%4d %12.3f %12.8f %5s %6s".fmt(
dof, dist, prob, chiIsUniform(ds) and "YES" or "NO",
ds.concat(",")));
}</syntaxhighlight>
{{out}}
<pre>
dof distance probability Uniform? dataset
4 4.146 0.38657083 YES 199809,200665,199607,200270,199649
4 790063.276 0.00000000 NO 522573,244456,139979,71531,21461
</pre>
 
{{omit from|GUISS}}
9,476

edits