Verify distribution uniformity/Chi-squared test: Difference between revisions

Line 14:
{{trans|Python}}
 
<syntaxhighlight lang="11l">V a = 12
V k1_factrl = 1.0
[Float] c
Line 75:
V prob = chi2Probability(dof, distance)
print(‘probability: #.4’.format(prob), end' ‘ ’)
print(‘uniform? ’(I chi2IsUniform(ds, 0.05) {‘Yes’} E ‘No’))</syntaxhighlight>
 
{{out}}
Line 87:
=={{header|Ada}}==
First, we specify a simple package to compute the Chi-Square Distance from the uniform distribution:
<syntaxhighlight lang="ada">package Chi_Square is
type Flt is digits 18;
Line 94:
function Distance(Bins: Bins_Type) return Flt;
end Chi_Square;</syntaxhighlight>
 
Next, we implement that package:
 
<syntaxhighlight lang="ada">package body Chi_Square is
function Distance(Bins: Bins_Type) return Flt is
Line 124:
end Distance;
end Chi_Square;</syntaxhighlight>
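
For readers who do not use Ada, the distance computed by this package can be sketched in Python. This is a minimal sketch rather than a translation of the Ada code: the function name <code>chi2_uniform_distance</code> is an assumption made here, and the expected count per bin is taken to be the mean of the observed counts.

```python
def chi2_uniform_distance(bins):
    """Chi-square distance of observed bin counts from a uniform distribution."""
    # Under the uniform hypothesis every bin is expected to hold the mean count.
    expected = sum(bins) / len(bins)
    return sum((count - expected) ** 2 for count in bins) / expected


# First test dataset used throughout this page.
print(round(chi2_uniform_distance([199809, 200665, 199607, 200270, 199649]), 4))
```

A perfectly even set of bins gives a distance of zero; the larger the value, the stronger the evidence against uniformity.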
 
Finally, we implement the Chi-square test itself. We do not compute the Chi-square probability; instead we hardcode a table of critical values for the 5% significance level, taken from Wikipedia [http://en.wikipedia.org/wiki/Chi-squared_distribution]:
<syntaxhighlight lang="ada">with Ada.Text_IO, Ada.Command_Line, Chi_Square; use Ada.Text_IO;

procedure Test_Chi_Square is
Line 154:
Put_Line("; (deviates significantly from uniform)");
end if;
end;</syntaxhighlight>
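
The table-lookup idea described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names <code>CRITICAL_5_PERCENT</code> and <code>looks_uniform</code> are invented here, the table holds the standard upper 5% critical values for 1 to 10 degrees of freedom, and the two statistics passed in are the chi-squared values reported for the two test datasets elsewhere on this page.

```python
# Upper 5% critical values of the chi-squared distribution, keyed by
# degrees of freedom (standard table values, rounded to three decimals).
CRITICAL_5_PERCENT = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}


def looks_uniform(chi2, dof):
    """True when the statistic stays below the 5% critical value for dof."""
    if dof not in CRITICAL_5_PERCENT:
        raise ValueError("no critical value tabulated for dof=%r" % dof)
    return chi2 < CRITICAL_5_PERCENT[dof]


print(looks_uniform(4.1463, 4))     # True: consistent with uniformity
print(looks_uniform(790063.25, 4))  # False: deviates significantly
```

The lookup avoids evaluating the incomplete gamma function at all, at the price of supporting only one significance level and a bounded number of bins.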
 
{{out}}
Line 165:
This first section contains the functions required to compute the Chi-Squared probability.
These are not needed if a library containing the necessary functions is available (e.g. see [[Numerical Integration]], [[Gamma function]]).
<syntaxhighlight lang="c">#include <stdlib.h>
#include <stdio.h>
#include <math.h>
Line 232:

return 1.0 - Simpson3_8( &f0, 0, y, (int)(y/h))/Gamma_Spouge(a);
}</syntaxhighlight>
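
The same idea, one minus a numerically integrated lower incomplete gamma function divided by the gamma function, can be sketched in Python. This is a hedged sketch and not a port of the C code: it uses <code>math.gamma</code> in place of a Spouge approximation, and the names <code>simpson38</code>, <code>gamma_inc_q</code>, <code>chi2_probability</code> as well as the <code>step</code> and <code>cutoff</code> parameters are assumptions made here. It also assumes <code>a >= 1</code> (two or more degrees of freedom), so the integrand is finite at zero.

```python
import math


def simpson38(f, a, b, n):
    """Composite Simpson 3/8 rule on [a, b]; n is rounded up to a multiple of 3."""
    n = max(3, n + (-n) % 3)
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points get weight 2 at multiples of 3 and weight 3 elsewhere.
        total += (2.0 if i % 3 == 0 else 3.0) * f(a + i * h)
    return 3.0 * h / 8.0 * total


def gamma_inc_q(a, x, step=1e-3, cutoff=100.0):
    """Upper regularized incomplete gamma Q(a, x); assumes a >= 1."""
    if x <= 0.0:
        return 1.0
    # Assumption: beyond a + cutoff the integrand t^(a-1) e^(-t) is negligible
    # for moderate a, so the integration range is capped there.
    upper = min(x, a + cutoff)
    n = max(3, int(upper / step))
    lower = simpson38(lambda t: t ** (a - 1.0) * math.exp(-t), 0.0, upper, n)
    return max(0.0, 1.0 - lower / math.gamma(a))


def chi2_probability(dof, distance):
    """p-value of a chi-squared statistic with dof degrees of freedom."""
    return gamma_inc_q(0.5 * dof, 0.5 * distance)


print("%.4f" % chi2_probability(4, 4.14628))
```

A fixed step size keeps the sketch short; a library routine for the regularized incomplete gamma function is the more robust choice when one is available.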
This section contains the functions specific to the task.
<syntaxhighlight lang="c">double chi2UniformDistance( double *ds, int dslen)
{
double expected = 0.0;
Line 261:
double dist = chi2UniformDistance( dset, dslen);
return chi2Probability( dof, dist ) > significance;
}</syntaxhighlight>
Testing
<syntaxhighlight lang="c">int main(int argc, char **argv)
{
double dset1[] = { 199809., 200665., 199607., 200270., 199649. };
Line 287:
}
return 0;
}</syntaxhighlight>
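
For integer degrees of freedom, as in both test datasets, the p-value can also be cross-checked without numerical integration, because the chi-squared survival function then has a closed form. The Python sketch below uses that closed form instead of the Simpson and Spouge routines; the function name <code>chi2_sf</code> is an assumption made here.

```python
import math


def chi2_sf(chi2, dof):
    """P(X > chi2) for a chi-squared variable with positive integer dof."""
    x = chi2 / 2.0
    if x <= 0.0:
        return 1.0
    if dof % 2 == 0:
        # Even dof = 2m: exp(-x) * sum_{j=0}^{m-1} x^j / j!
        term = math.exp(-x)
        total = term
        for j in range(1, dof // 2):
            term *= x / j
            total += term
        return total
    # Odd dof = 2m + 1: erfc(sqrt(x)) + exp(-x) * sum_{j=1}^{m} x^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(x))
    term = math.exp(-x) * math.sqrt(x) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (j + 0.5)
    return total


for distance in (4.14628, 790063.25):
    p = chi2_sf(distance, 4)
    print("distance %.3f  probability %.8f  uniform? %s"
          % (distance, p, "YES" if p > 0.05 else "NO"))
```

Because every term carries the factor <code>exp(-x)</code>, very large statistics simply underflow to a probability of zero rather than causing overflow.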
 
=={{header|D}}==
<syntaxhighlight lang="d">import std.stdio, std.algorithm, std.mathspecial;

real x2Dist(T)(in T[] data) pure nothrow @safe @nogc {
Line 319:
dof, dist, prob, ds.x2IsUniform ? "YES" : "NO", ds);
}
}</syntaxhighlight>
{{out}}
<pre> dof distance probability Uniform? dataset
Line 327:
=={{header|Elixir}}==
{{trans|Ruby}}
<syntaxhighlight lang="elixir">defmodule Verify do
defp gammaInc_Q(a, x) do
a1 = a-1
Line 389:
:io.fwrite " probability: ~.4f~n", [Verify.chi2Probability(dof, distance)]
:io.fwrite " uniform? ~s~n", [(if Verify.chi2IsUniform(ds), do: "Yes", else: "No")]
end)</syntaxhighlight>
 
{{out}}
Line 412:
Instead of implementing the chi-squared distribution ourselves, we bind to the GNU Scientific Library, so we need a module that interfaces to the one function we need (<tt>gsl_cdf_chisq_Q</tt>).
 
<syntaxhighlight lang="fortran">module gsl_mini_bind_m

use iso_c_binding
Line 440:
end function p_value

end module gsl_mini_bind_m</syntaxhighlight>
 
Now we're ready to complete the task.
 
<syntaxhighlight lang="fortran">program chi2test

use gsl_mini_bind_m, only: p_value
Line 490:
end function chisq

end program chi2test</syntaxhighlight>
 
Output:
<syntaxhighlight lang="txt">Dataset 1: 199809.0000 200665.0000 199607.0000 200270.0000 199649.0000
dof: 4 chisq: 4.1463
probability: 0.3866
Line 501:
dof: 4 chisq: 790063.2500
probability: 0.0000
uniform? F</syntaxhighlight>
 
=={{header|Go}}==
{{trans|C}}
Go has a nice gamma function in the library. Otherwise, it's mostly a port from C. Note, this implementation of the incomplete gamma function works for these two test cases, but, I believe, has serious limitations. See talk page.
<syntaxhighlight lang="go">package main

import (
Line 595:
fmt.Printf(" significant at %2.0f%% level? %t\n", sigLevel*100, sig)
fmt.Println(" uniform? ", !sig, "\n")
}</syntaxhighlight>
Output:
<pre>
Line 620:
 
=={{header|Hy}}==
<syntaxhighlight lang="lisp">(import
[scipy.stats [chisquare]]
[collections [Counter]])
Line 630:
size 'alpha'."
(<= alpha (second (chisquare
(.values (Counter (take repeats (repeatedly f))))))))</syntaxhighlight>
 
Examples of use:
 
<syntaxhighlight lang="lisp">(import [random [randint]])

(for [f [
(fn [] (randint 1 10))
(fn [] (if (randint 0 1) (randint 1 9) (randint 1 10)))]]
(print (uniform? f 5000)))</syntaxhighlight>
 
=={{header|J}}==
'''Solution (Tacit):'''
<syntaxhighlight lang="j">require 'stats/base'

countCats=: #@~. NB. counts the number of unique items
Line 655:
NB. y is: distribution to test
NB. x is: optionally specify number of categories possible
isUniform=: (countCats $: ]) : (0.95 > calcDf chisqcdf :: 1: calcX2)</syntaxhighlight>
 
'''Solution (Explicit):'''
<syntaxhighlight lang="j">require 'stats/base'

NB.*isUniformX v Tests (5%) whether y is uniformly distributed
Line 673:
degfreedom=. <: x NB. degrees of freedom
signif > degfreedom chisqcdf :: 1: X2
)</syntaxhighlight>
 
'''Example Usage:'''
<syntaxhighlight lang="j"> FairDistrib=: 1e6 ?@$ 5
UnfairDistrib=: (9.5e5 ?@$ 5) , (5e4 ?@$ 4)
isUniformX FairDistrib
Line 685:
1
4 isUniform 4 4 4 5 5 5 5 5 5 5 NB. not uniform if 4 categories possible
0</syntaxhighlight>
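
The effect of stating how many categories are possible can be illustrated with a short Python sketch. It is not a translation of the J verbs: the helper names <code>bin_counts</code> and <code>chi2_statistic</code> are assumptions made here, and categories that never occur are simply padded in as zero counts.

```python
from collections import Counter


def bin_counts(samples, categories=None):
    """Observed count per category, padding unseen categories with zeros."""
    counts = list(Counter(samples).values())
    if categories is None:
        categories = len(counts)  # default: only the values actually seen
    if categories < len(counts):
        raise ValueError("more distinct values than declared categories")
    return counts + [0] * (categories - len(counts))


def chi2_statistic(counts):
    """Chi-squared distance of the counts from a uniform distribution."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


samples = [4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
for categories in (None, 4):
    counts = bin_counts(samples, categories)
    print("dof:", len(counts) - 1, "chi-squared:", chi2_statistic(counts))
```

With only the two observed categories the statistic is 1.6 on 1 degree of freedom, below the 5% critical value of 3.841. Declaring four possible categories raises it to 13.2 on 3 degrees of freedom, above the critical value of 7.815, which agrees with the J result of 0 for that case.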
 
=={{header|Java}}==
{{trans|D}}
{{works with|Java|8}}
<syntaxhighlight lang="java">import static java.lang.Math.pow;
import java.util.Arrays;
import static java.util.Arrays.stream;
Line 727:
}
}
}</syntaxhighlight>
<pre> dof distance probability Uniform? dataset
4 4,146 0,38657083 YES [199809.0, 200665.0, 199607.0, 200270.0, 199649.0]
Line 733:
 
=={{header|Julia}}==
<syntaxhighlight lang="julia"># v0.6

using Distributions
Line 751:
println("Data:\n$data")
println("Hypothesis test: the original population is ", (eqdist(data) ? "" : "not "), "uniform.\n")
end</syntaxhighlight>
 
{{out}}
Line 765:
=={{header|Kotlin}}==
This program reuses Kotlin code from the [[Gamma function]] and [[Numerical Integration]] tasks but otherwise is a translation of the C entry for this task.
<syntaxhighlight lang="scala">// version 1.1.51

typealias Func = (Double) -> Double
Line 841:
println(" Uniform? $uniform\n")
}
}</syntaxhighlight>
 
{{out}}
Line 854:
=={{header|Mathematica}}/{{header|Wolfram Language}}==
This code explicitly assumes a discrete uniform distribution, since the chi-square test is a poor choice for continuous distributions. It requires Mathematica version 2 or later.
<syntaxhighlight lang="mathematica">discreteUniformDistributionQ[data_, {min_Integer, max_Integer}, confLevel_: .05] :=
If[$VersionNumber >= 8,
confLevel <= PearsonChiSquareTest[data, DiscreteUniformDistribution[{min, max}]],
Line 861:
GammaRegularized[k/2, 0, v/2] <= 1 - confLevel]]

discreteUniformDistributionQ[data_] :=discreteUniformDistributionQ[data, data[[Ordering[data][[{1, -1}]]]]]</syntaxhighlight>
The code used to create the test data requires Mathematica version 6 or later:
<syntaxhighlight lang="mathematica">uniformData = RandomInteger[10, 100];
nonUniformData = Total@RandomInteger[10, {5, 100}];</syntaxhighlight>
<syntaxhighlight lang="mathematica">{discreteUniformDistributionQ[uniformData],discreteUniformDistributionQ[nonUniformData]}</syntaxhighlight>
{{out}}<pre>{True,False}</pre>
 
Line 872:
We use the gamma function from the “math” module. To simplify the code, we also use the “lenientops” module, which provides mixed operations between floats and integers.

<syntaxhighlight lang="nim">import lenientops, math, stats, strformat, sugar

func simpson38(f: (float) -> float; a, b: float; n: int): float =
Line 932:
for dset in [[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461]]:
utest(dset)</syntaxhighlight>
 
{{out}}
Line 958:
This code needs to be compiled with library [http://oandrieu.nerim.net/ocaml/gsl/ gsl.cma].
 
<syntaxhighlight lang="ocaml">let sqr x = x *. x

let chi2UniformDistance distrib =
Line 991:
[| 199809; 200665; 199607; 200270; 199649 |];
[| 522573; 244456; 139979; 71531; 21461 |]
]</syntaxhighlight>
 
Output
Line 1,003:
 
The sample data for the test was taken from [[#Go|Go]].
<syntaxhighlight lang="parigp">cumChi2(chi2,dof)={
my(g=gamma(dof/2));
incgam(dof/2,chi2/2,g)/g
Line 1,019:

test([199809, 200665, 199607, 200270, 199649])
test([522573, 244456, 139979, 71531, 21461])</syntaxhighlight>
 
=={{header|Perl}}==
{{trans|Raku}}
<syntaxhighlight lang="perl">use List::Util qw(sum reduce);
use constant pi => 3.14159265;

Line 1,065:
for $dataset ([199809, 200665, 199607, 200270, 199649], [522573, 244456, 139979, 71531, 21461]) {
printf "C2 = %10.3f, p-value = %.3f, uniform = %s\n", chi_squared_test(@$dataset);
}</syntaxhighlight>
{{out}}
<pre>C2 = 4.146, p-value = 0.387, uniform = True
Line 1,072:
=={{header|Phix}}==
{{trans|Go}}
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #004080;">atom</span> <span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">)</span>
Line 1,152:
<span style="color: #000000;">utest</span><span style="color: #0000FF;">({</span><span style="color: #000000;">199809</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">200665</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">199607</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">200270</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">199649</span><span style="color: #0000FF;">})</span>
<span style="color: #000000;">utest</span><span style="color: #0000FF;">({</span><span style="color: #000000;">522573</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">244456</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">139979</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">71531</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">21461</span><span style="color: #0000FF;">})</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 1,180:
Implements the Chi Square Probability function with an integration. I'm
sure there are better ways to do this. Compare to OCaml implementation.
<syntaxhighlight lang="python">import math
import random

Line 1,246:
prob = chi2Probability( dof, distance)
print "probability: %.4f"%prob,
print "uniform? ", "Yes"if chi2IsUniform(ds,0.05) else "No"</syntaxhighlight>
Output:
<pre>Data set: [199809, 200665, 199607, 200270, 199649]
Line 1,256:
This uses the library routine [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html scipy.stats.chisquare].
 
<syntaxhighlight lang="python">from scipy.stats import chisquare


Line 1,266:
dist, pvalue = chisquare(ds)
uni = 'YES' if pvalue > 0.05 else 'NO'
print(f"{dist:12.3f} {pvalue:12.8f} {uni:^8} {ds}")</syntaxhighlight>
 
{{out}}
Line 1,275:
=={{header|R}}==
Since R is a statistical computing language, the chi-squared test is built in as the function "chisq.test".
<syntaxhighlight lang="r">
dset1=c(199809,200665,199607,200270,199649)
dset2=c(522573,244456,139979,71531,21461)
Line 1,288:
print(paste("uniform?",chi2IsUniform(ds)))
}
</syntaxhighlight>
 
Output:
Line 1,311:
 
=={{header|Racket}}==
<syntaxhighlight lang="racket">
#lang racket
(require
Line 1,352:
; Test whether the constant generator fails:
(is-uniform? (λ(_) 5) 1000 0.05)
</syntaxhighlight>
Output:
<syntaxhighlight lang="racket">
#t
#f
</syntaxhighlight>
 
=={{header|Raku}}==
Line 1,366:
in closed form, as we only need its value at integers and half integers.
 
<syntaxhighlight lang="raku" line>sub incomplete-γ-series($s, $z) {
my \numers = $z X** 1..*;
my \denoms = [\*] $s X+ 1..*;
Line 1,406:
say 'data: ', $dataset;
say "χ² = {%t<chi-squared>}, p-value = {%t<p-value>.fmt('%.4f')}, uniform = {%t<uniform>}";
}</syntaxhighlight>
{{out}}
<pre>data: 199809 200665 199607 200270 199649
Line 1,425:
either an integer, &nbsp; or a number which is a multiple of &nbsp; <big>'''<sup>1</sup>/<sub>2</sub>'''</big>, &nbsp; both of these cases can be calculated with
<br>a straight─forward calculation.
<syntaxhighlight lang="rexx">/*REXX program performs a chi─squared test to verify a given distribution is uniform. */
numeric digits length( pi() ) - length(.) /*enough decimal digs for calculations.*/
@.=; @.1= 199809 200665 199607 200270 199649
Line 1,495:
say pad "significant at " sigPC'% level? ' word('no yes', sig + 1)
say pad " is the dataset uniform? " word('no yes', (\(sig))+ 1)
return</syntaxhighlight>
{{out|output|text=&nbsp; when using the default inputs:}}
<pre>
Line 1,517:
=={{header|Ruby}}==
{{trans|Python}}
<syntaxhighlight lang="ruby">def gammaInc_Q(a, x)
a1, a2 = a-1, a-2
f0 = lambda {|t| t**a1 * Math.exp(-t)}
Line 1,577:
puts " probability: %.4f" % chi2Probability(dof, distance)
puts " uniform? %s" % (chi2IsUniform(ds) ? "Yes" : "No")
end</syntaxhighlight>
 
{{out}}
Line 1,594:
 
=={{header|Rust}}==
<syntaxhighlight lang="rust">
use statrs::function::gamma::gamma_li;

Line 1,632:
}

</syntaxhighlight>
{{out}}
<pre>
Line 1,646:
{{libheader|Scastie qualified}}
{{works with|Scala|2.13}}
<syntaxhighlight lang="scala">import org.apache.commons.math3.special.Gamma.regularizedGammaQ

object ChiSquare extends App {
Line 1,675:
dof, dist, χ2Prob(dof.toDouble, dist), if (χ2IsUniform(ds, 0.05)) "YES" else "NO", ds.mkString(", "))
}
}</syntaxhighlight>
 
=={{header|Sidef}}==
<syntaxhighlight lang="ruby"># Confluent hypergeometric function of the first kind F_1(a;b;z)
func F1(a, b, z, limit=100) {
sum(0..limit, {|k|
Line 1,719:
say "data: #{dataset}"
say "χ² = #{r[0]}, p-value = #{r[1].round(-4)}, uniform = #{r[2]}\n"
}</syntaxhighlight>
{{out}}
<pre>
Line 1,732:
{{works with|Tcl|8.5}}
{{tcllib|math::statistics}}
<syntaxhighlight lang="tcl">package require Tcl 8.5
package require math::statistics

Line 1,746:
[expr {$degreesOfFreedom / 2.0}] [expr {$X2 / 2.0}]]
expr {$likelihoodOfRandom > $significance}
}</syntaxhighlight>
Testing:
<syntaxhighlight lang="tcl">proc makeDistribution {operation {count 1000000}} {
for {set i 0} {$i<$count} {incr i} {incr distribution([uplevel 1 $operation])}
return [array get distribution]
Line 1,756:
puts "distribution \"$distFair\" assessed as [expr [isUniform $distFair]?{fair}:{unfair}]"
set distUnfair [makeDistribution {expr int(rand()*rand()*5)}]
puts "distribution \"$distUnfair\" assessed as [expr [isUniform $distUnfair]?{fair}:{unfair}]"</syntaxhighlight>
Output:
<pre>distribution "0 199809 4 199649 1 200665 2 199607 3 200270" assessed as fair
Line 1,763:
=={{header|VBA}}==
The built-in WorksheetFunction ChiSq_Dist of Excel VBA is used. The output is formatted like that of R.
<syntaxhighlight lang="vb">Private Function Test4DiscreteUniformDistribution(ObservationFrequencies() As Variant, Significance As Single) As Boolean
'Returns true if the observed frequencies pass the Pearson Chi-squared test at the required significance level.
Dim Total As Long, Ei As Long, i As Integer
Line 1,792:
O = [{522573,244456,139979,71531,21461}]
Debug.Print "[1] ""Uniform? "; Test4DiscreteUniformDistribution(O, 0.05); """"
End Sub</syntaxhighlight>
{{out}}<pre>[1] "Data set:" 199809 200665 199607 200270 199649
Chi-squared test for given frequencies
Line 1,804:
=={{header|Vlang}}==
{{trans|Go}}
<syntaxhighlight lang="vlang">import math

type Ifctn = fn(f64) f64
Line 1,888:
println(" significant at ${sig_level*100:2.0f}% level? $sig")
println(" uniform? ${!sig}\n")
}</syntaxhighlight>
{{out}}
<pre>
Line 1,916:
{{libheader|Wren-math}}
{{libheader|Wren-fmt}}
<syntaxhighlight lang="ecmascript">import "/math" for Math, Nums
import "/fmt" for Fmt

Line 1,966:
var uniform = chiIsUniform.call(ds, 0.05) ? "Yes" : "No"
System.print(" Uniform? %(uniform)\n")
}</syntaxhighlight>
 
{{out}}
Line 1,980:
{{trans|C}}
{{trans|D}}
<syntaxhighlight lang="zkl">/* Numerical integration method */
fcn Simpson3_8(f,a,b,N){ // fcn,double,double,Int --> double
h,h1:=(b - a)/N, h/3.0;
Line 2,016:
if(y>x) y=x;
1.0 - Simpson3_8(f,0.0,y,(y/h).toInt())/Gamma_Spouge(a);
}</syntaxhighlight>
<syntaxhighlight lang="zkl">fcn chi2UniformDistance(ds){ // --> double
dslen :=ds.len();
expected:=dslen.reduce('wrap(sum,k){ sum + ds[k] },0.0)/dslen;
Line 2,028:
fcn chiIsUniform(dset,significance=0.05){
significance < chi2Probability(-1.0 + dset.len(),chi2UniformDistance(dset))
}</syntaxhighlight>
<syntaxhighlight lang="zkl">datasets:=T( T(199809.0, 200665.0, 199607.0, 200270.0, 199649.0),
T(522573.0, 244456.0, 139979.0, 71531.0, 21461.0) );
println(" %4s %12s %12s %8s %s".fmt(
Line 2,040:
dof, dist, prob, chiIsUniform(ds) and "YES" or "NO",
ds.concat(",")));
}</syntaxhighlight>
{{out}}
<pre>