Verify distribution uniformity/Chi-squared test: Difference between revisions
Revision as of 20:15, 28 August 2022
Thundergnat (talk | contribs) m (syntax highlighting fixup automation) |
Line 14:
{{trans|Python}}
<
V k1_factrl = 1.0
[Float] c
Line 75:
V prob = chi2Probability(dof, distance)
print(‘probability: #.4’.format(prob), end' ‘ ’)
print(‘uniform? ’(I chi2IsUniform(ds, 0.05) {‘Yes’} E ‘No’))</
{{out}}
Line 87:
=={{header|Ada}}==
First, we specify a simple package to compute the Chi-Square Distance from the uniform distribution:
<
type Flt is digits 18;
Line 94:
function Distance(Bins: Bins_Type) return Flt;
end Chi_Square;</
Next, we implement that package:
<
function Distance(Bins: Bins_Type) return Flt is
Line 124:
end Distance;
end Chi_Square;</
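The distance computed by this package can be sketched in Python as follows. This is a minimal illustration, not a translation of the Ada code; the function name <code>chi2_uniform_distance</code> is hypothetical, and the data set is the first test set used by the other entries on this page.

```python
def chi2_uniform_distance(bins):
    """Chi-squared distance of observed bin counts from a uniform distribution."""
    if not bins:
        raise ValueError("at least one bin is required")
    # Under the uniformity hypothesis every bin expects the same share.
    expected = sum(bins) / len(bins)
    return sum((observed - expected) ** 2 for observed in bins) / expected


# First test data set from this page; the distance is about 4.146.
print(chi2_uniform_distance([199809, 200665, 199607, 200270, 199649]))
```

The degrees of freedom for the subsequent test are one less than the number of bins.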
Finally, we implement the Chi-square test itself. Rather than computing the Chi-square probability, we hardcode a table of critical values for the 5% significance level, taken from Wikipedia [http://en.wikipedia.org/wiki/Chi-squared_distribution]:
<
procedure Test_Chi_Square is
Line 154:
Put_Line("; (deviates significantly from uniform)");
end if;
end;</
{{out}}
Line 165:
This first section contains the functions required to compute the Chi-Squared probability.
These are not needed if a library containing the necessary function is available (e.g. see [[Numerical Integration]], [[Gamma function]]).
<
#include <stdio.h>
#include <math.h>
Line 232:
return 1.0 - Simpson3_8( &f0, 0, y, (int)(y/h))/Gamma_Spouge(a);
}</
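The approach above (integrate the gamma density with a Simpson 3/8 rule, then divide by the gamma function) can be sketched in Python. This is a sketch under stated assumptions: it uses <code>math.gamma</code> from the standard library instead of a Spouge approximation, it requires at least two degrees of freedom so that the integrand is finite at zero, and the names and the default step count are illustrative only.

```python
import math


def simpson38(f, a, b, n):
    """Composite Simpson 3/8 rule on [a, b]; n is rounded up to a multiple of 3."""
    n = max(3, n + (-n) % 3)
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior nodes at multiples of 3 get weight 2, all others weight 3.
        total += (2 if i % 3 == 0 else 3) * f(a + i * h)
    return 3 * h / 8 * total


def chi2_probability(dof, distance, steps=3000):
    """Upper-tail probability 1 - gamma(a, x) / Gamma(a), a = dof/2, x = distance/2."""
    if dof < 2:
        raise ValueError("this sketch assumes dof >= 2")
    a, x = dof / 2, distance / 2
    if x <= 0:
        return 1.0
    lower = simpson38(lambda t: t ** (a - 1) * math.exp(-t), 0.0, x, steps)
    return 1.0 - lower / math.gamma(a)


print(round(chi2_probability(4, 4.146), 4))
```

With a fixed step count the accuracy degrades for very large distances, so a production version would need to cap or adapt the integration range.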
This section contains the functions specific to the task.
<
{
double expected = 0.0;
Line 261:
double dist = chi2UniformDistance( dset, dslen);
return chi2Probability( dof, dist ) > significance;
}</
Testing
<
{
double dset1[] = { 199809., 200665., 199607., 200270., 199649. };
Line 287:
}
return 0;
}</
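For the two five-bin test sets the degrees of freedom are even, and in that case the upper-tail probability has a finite closed form: for dof = 2m it equals exp(-x/2) times the sum of (x/2)^i / i! for i from 0 to m-1. The Python sketch below uses that identity instead of numerical integration; it is an alternative illustration, not the method of the C entry, and the function names are hypothetical.

```python
import math


def chi2_probability_even(dof, distance):
    """Upper-tail chi-squared probability; valid only for positive even dof."""
    if dof <= 0 or dof % 2:
        raise ValueError("this closed form needs a positive even dof")
    x = distance / 2
    term, total = 1.0, 1.0          # i = 0 term of the sum
    for i in range(1, dof // 2):    # remaining terms x^i / i!
        term *= x / i
        total += term
    return math.exp(-x) * total


def is_uniform(counts, significance=0.05):
    """True when the data do not deviate significantly from uniform."""
    dof = len(counts) - 1
    expected = sum(counts) / len(counts)
    distance = sum((c - expected) ** 2 for c in counts) / expected
    return chi2_probability_even(dof, distance) > significance


for dset in ([199809, 200665, 199607, 200270, 199649],
             [522573, 244456, 139979, 71531, 21461]):
    print(dset, "uniform" if is_uniform(dset) else "not uniform")
```

The first data set is accepted as uniform at the 5% level and the second is rejected, in line with the outputs shown by the other entries.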
=={{header|D}}==
<
real x2Dist(T)(in T[] data) pure nothrow @safe @nogc {
Line 319:
dof, dist, prob, ds.x2IsUniform ? "YES" : "NO", ds);
}
}</
{{out}}
<pre> dof distance probability Uniform? dataset
Line 327:
=={{header|Elixir}}==
{{trans|Ruby}}
<
defp gammaInc_Q(a, x) do
a1 = a-1
Line 389:
:io.fwrite " probability: ~.4f~n", [Verify.chi2Probability(dof, distance)]
:io.fwrite " uniform? ~s~n", [(if Verify.chi2IsUniform(ds), do: "Yes", else: "No")]
end)</
{{out}}
Line 412:
Instead of implementing the chi-squared distribution ourselves, we bind to the GNU Scientific Library, so we need a module that interfaces to the one function we need (<tt>gsl_cdf_chisq_Q</tt>).
<
use iso_c_binding
Line 440:
end function p_value
end module gsl_mini_bind_m</
Now we're ready to complete the task.
<
use gsl_mini_bind_m, only: p_value
Line 490:
end function chisq
end program chi2test</
Output:
<
dof: 4 chisq: 4.1463
probability: 0.3866
Line 501:
dof: 4 chisq: 790063.2500
probability: 0.0000
uniform? F</
=={{header|Go}}==
{{trans|C}}
Go's standard library provides a gamma function; otherwise, this is mostly a port of the C entry. Note that this implementation of the incomplete gamma function works for these two test cases but, I believe, has serious limitations. See the talk page.
<
import (
Line 595:
fmt.Printf(" significant at %2.0f%% level? %t\n", sigLevel*100, sig)
fmt.Println(" uniform? ", !sig, "\n")
}</
Output:
<pre>
Line 620:
=={{header|Hy}}==
<
[scipy.stats [chisquare]]
[collections [Counter]])
Line 630:
size 'alpha'."
(<= alpha (second (chisquare
(.values (Counter (take repeats (repeatedly f))))))))</
Examples of use:
<
(for [f [
(fn [] (randint 1 10))
(fn [] (if (randint 0 1) (randint 1 9) (randint 1 10)))]]
(print (uniform? f 5000)))</
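Testing a generator rather than a ready-made table of counts can be sketched in Python without SciPy. The sketch below makes several assumptions: the helper names are hypothetical, the two generators are deterministic stand-ins chosen so the result is reproducible, and the decision uses the tabulated 5% critical value for 9 degrees of freedom (16.919) instead of a computed p-value. A <code>Counter</code> only records values that were actually drawn, so the list of possible categories is passed explicitly.

```python
from collections import Counter
from itertools import cycle

CRITICAL_5PCT_DOF9 = 16.919  # chi-squared critical value, 5% level, 9 dof


def sample_counts(generator, repeats, categories):
    """Call generator() repeatedly and count the hits for every category."""
    tally = Counter(generator() for _ in range(repeats))
    return [tally.get(c, 0) for c in categories]


def chi2_distance(counts):
    """Chi-squared distance of the counts from a uniform distribution."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / expected


faces = range(1, 11)
wheel = cycle(faces)                                    # perfectly even stand-in
fair = sample_counts(lambda: next(wheel), 5000, faces)
stuck = sample_counts(lambda: 5, 5000, faces)           # constant generator

print(chi2_distance(fair) <= CRITICAL_5PCT_DOF9)    # True
print(chi2_distance(stuck) <= CRITICAL_5PCT_DOF9)   # False
```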
=={{header|J}}==
'''Solution (Tacit):'''
<
countCats=: #@~. NB. counts the number of unique items
Line 655:
NB. y is: distribution to test
NB. x is: optionally specify number of categories possible
isUniform=: (countCats $: ]) : (0.95 > calcDf chisqcdf :: 1: calcX2)</
'''Solution (Explicit):'''
<
NB.*isUniformX v Tests (5%) whether y is uniformly distributed
Line 673:
degfreedom=. <: x NB. degrees of freedom
signif > degfreedom chisqcdf :: 1: X2
)</
'''Example Usage:'''
<
UnfairDistrib=: (9.5e5 ?@$ 5) , (5e4 ?@$ 4)
isUniformX FairDistrib
Line 685:
1
4 isUniform 4 4 4 5 5 5 5 5 5 5 NB. not uniform if 4 categories possible
0</
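The effect of the optional category count can be illustrated with a short Python sketch. It is not a translation of the J code: it assumes a small hardcoded table of 5% critical values instead of a chi-squared CDF, and the function name and signature are hypothetical.

```python
from collections import Counter

# 5% critical values of the chi-squared distribution, keyed by degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def is_uniform(samples, categories=None):
    """Test samples for uniformity at the 5% level.

    categories is the number of possible categories; by default only the
    categories that actually occur in the samples are counted.
    """
    tally = Counter(samples)
    k = len(tally) if categories is None else categories
    if k < len(tally):
        raise ValueError("more categories observed than declared")
    # Categories that never occurred contribute a count of zero.
    counts = list(tally.values()) + [0] * (k - len(tally))
    dof = k - 1
    if dof not in CRITICAL_5PCT:
        raise ValueError("no critical value tabulated for this dof")
    expected = len(samples) / k
    distance = sum((c - expected) ** 2 for c in counts) / expected
    return distance <= CRITICAL_5PCT[dof]


data = [4, 4, 4, 5, 5, 5, 5, 5, 5, 5]
print(is_uniform(data))      # True: distance 1.6 with 1 degree of freedom
print(is_uniform(data, 4))   # False: distance 13.2 with 3 degrees of freedom
```

Declaring four possible categories adds two empty bins, which raises the distance enough to reject uniformity, consistent with the last J result above.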
=={{header|Java}}==
{{trans|D}}
{{works with|Java|8}}
<
import java.util.Arrays;
import static java.util.Arrays.stream;
Line 727:
}
}
}</
<pre> dof distance probability Uniform? dataset
4 4,146 0,38657083 YES [199809.0, 200665.0, 199607.0, 200270.0, 199649.0]
Line 733:
=={{header|Julia}}==
<
using Distributions
Line 751:
println("Data:\n$data")
println("Hypothesis test: the original population is ", (eqdist(data) ? "" : "not "), "uniform.\n")
end</
{{out}}
Line 765:
=={{header|Kotlin}}==
This program reuses Kotlin code from the [[Gamma function]] and [[Numerical Integration]] tasks but otherwise is a translation of the C entry for this task.
<
typealias Func = (Double) -> Double
Line 841:
println(" Uniform? $uniform\n")
}
}</
{{out}}
Line 854:
=={{header|Mathematica}}/{{header|Wolfram Language}}==
This code explicitly assumes a discrete uniform distribution, since the chi-square test is a poor choice for continuous distributions. It requires Mathematica version 2 or later.
<
If[$VersionNumber >= 8,
confLevel <= PearsonChiSquareTest[data, DiscreteUniformDistribution[{min, max}]],
Line 861:
GammaRegularized[k/2, 0, v/2] <= 1 - confLevel]]
discreteUniformDistributionQ[data_] :=discreteUniformDistributionQ[data, data[[Ordering[data][[{1, -1}]]]]]</
The code used to create the test data requires Mathematica version 6 or later.
<
nonUniformData = Total@RandomInteger[10, {5, 100}];</
<syntaxhighlight lang
{{out}}<pre>{True,False}</pre>
Line 872:
We use the gamma function from the “math” module. To simplify the code, we also use the “lenientops” module, which provides mixed operations between floats and integers.
<
func simpson38(f: (float) -> float; a, b: float; n: int): float =
Line 932:
for dset in [[199809, 200665, 199607, 200270, 199649],
[522573, 244456, 139979, 71531, 21461]]:
utest(dset)</
{{out}}
Line 958:
This code needs to be compiled with library [http://oandrieu.nerim.net/ocaml/gsl/ gsl.cma].
<
let chi2UniformDistance distrib =
Line 991:
[| 199809; 200665; 199607; 200270; 199649 |];
[| 522573; 244456; 139979; 71531; 21461 |]
]</
Output
Line 1,003:
The sample data for the test was taken from [[#Go|Go]].
<
my(g=gamma(dof/2));
incgam(dof/2,chi2/2,g)/g
Line 1,019:
test([199809, 200665, 199607, 200270, 199649])
test([522573, 244456, 139979, 71531, 21461])</
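The quantity computed above is the regularized upper incomplete gamma function Q(a, x) = Γ(a, x)/Γ(a), evaluated at a = dof/2 and x = chi2/2. Where no built-in incomplete gamma function is available, one option is the power series for the lower incomplete gamma function. The Python sketch below is an assumption-laden illustration: the names, tolerance and term limit are hypothetical, and it only handles moderate values of x.

```python
import math


def gamma_q(a, x, eps=1e-15, max_terms=10_000):
    """Regularized upper incomplete gamma Q(a, x), via the lower series."""
    if a <= 0 or x < 0:
        raise ValueError("requires a > 0 and x >= 0")
    if x == 0:
        return 1.0
    # gamma(a, x) = x^a e^-x * sum over n >= 0 of x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < eps * total:
            break
    else:
        raise ArithmeticError("series did not converge; x is too large for this sketch")
    # Scale in log space to avoid overflow in x^a and Gamma(a).
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower


def chi2_probability(dof, chi2):
    """Upper-tail probability of a chi-squared statistic."""
    return gamma_q(dof / 2, chi2 / 2)


print(round(chi2_probability(4, 4.146), 4))
```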
=={{header|Perl}}==
{{trans|Raku}}
<
use constant pi => 3.14159265;
Line 1,065:
for $dataset ([199809, 200665, 199607, 200270, 199649], [522573, 244456, 139979, 71531, 21461]) {
printf "C2 = %10.3f, p-value = %.3f, uniform = %s\n", chi_squared_test(@$dataset);
}</
{{out}}
<pre>C2 = 4.146, p-value = 0.387, uniform = True
Line 1,072:
=={{header|Phix}}==
{{trans|Go}}
<!--<
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">f</span><span style="color: #0000FF;">(</span><span style="color: #004080;">atom</span> <span style="color: #000000;">aa1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">)</span>
Line 1,152:
<span style="color: #000000;">utest</span><span style="color: #0000FF;">({</span><span style="color: #000000;">199809</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">200665</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">199607</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">200270</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">199649</span><span style="color: #0000FF;">})</span>
<span style="color: #000000;">utest</span><span style="color: #0000FF;">({</span><span style="color: #000000;">522573</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">244456</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">139979</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">71531</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">21461</span><span style="color: #0000FF;">})</span>
<!--</
{{out}}
<pre>
Line 1,180:
Implements the Chi Square Probability function by numerical integration. I'm
sure there are better ways to do this. Compare to the OCaml implementation.
<
import random
Line 1,246:
prob = chi2Probability( dof, distance)
print "probability: %.4f"%prob,
print "uniform? ", "Yes"if chi2IsUniform(ds,0.05) else "No"</
Output:
<pre>Data set: [199809, 200665, 199607, 200270, 199649]
Line 1,256:
This uses the library routine [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html scipy.stats.chisquare].
<
Line 1,266:
dist, pvalue = chisquare(ds)
uni = 'YES' if pvalue > 0.05 else 'NO'
print(f"{dist:12.3f} {pvalue:12.8f} {uni:^8} {ds}")</
{{out}}
Line 1,275:
=={{header|R}}==
R being a statistical computing language, the chi-squared test is built in as the function "chisq.test".
<syntaxhighlight lang="r">
dset1=c(199809,200665,199607,200270,199649)
dset2=c(522573,244456,139979,71531,21461)
Line 1,288:
print(paste("uniform?",chi2IsUniform(ds)))
}
</syntaxhighlight>
Output:
Line 1,311:
=={{header|Racket}}==
<syntaxhighlight lang="racket">
#lang racket
(require
Line 1,352:
; Test whether the constant generator fails:
(is-uniform? (λ(_) 5) 1000 0.05)
</syntaxhighlight>
Output:
<syntaxhighlight lang="racket">
#t
#f
</syntaxhighlight>
=={{header|Raku}}==
Line 1,366:
in closed form, as we only need its value at integers and half integers.
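Since the first argument is always dof/2, the gamma function is only ever needed at integers and half-integers, where the recurrence Γ(x+1) = x·Γ(x) with Γ(1) = 1 and Γ(1/2) = √π gives exact products. A minimal Python sketch of that idea follows; the function name and the choice to pass twice the argument are illustrative assumptions, not part of the Raku code.

```python
import math


def gamma_half_integer(twice_x):
    """Gamma(twice_x / 2) for a positive integer twice_x, by recurrence."""
    if twice_x <= 0 or twice_x != int(twice_x):
        raise ValueError("twice_x must be a positive integer")
    # Start from Gamma(1) = 1 for integers or Gamma(1/2) = sqrt(pi) otherwise.
    if twice_x % 2 == 0:
        x, value = 1.0, 1.0
    else:
        x, value = 0.5, math.sqrt(math.pi)
    target = twice_x / 2
    while x < target:
        value *= x      # Gamma(x + 1) = x * Gamma(x)
        x += 1.0
    return value


# Gamma(dof / 2) for dof = 4, which is Gamma(2) = 1.
print(gamma_half_integer(4))
```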
<syntaxhighlight lang="raku">
my \numers = $z X** 1..*;
my \denoms = [\*] $s X+ 1..*;
Line 1,406:
say 'data: ', $dataset;
say "χ² = {%t<chi-squared>}, p-value = {%t<p-value>.fmt('%.4f')}, uniform = {%t<uniform>}";
}</
{{out}}
<pre>data: 199809 200665 199607 200270 199649
Line 1,425:
either an integer, or a number which is a multiple of <big>'''<sup>1</sup>/<sub>2</sub>'''</big>; both of these cases can be calculated with
<br>a straightforward calculation.
<
numeric digits length( pi() ) - length(.) /*enough decimal digs for calculations.*/
@.=; @.1= 199809 200665 199607 200270 199649
Line 1,495:
say pad "significant at " sigPC'% level? ' word('no yes', sig + 1)
say pad " is the dataset uniform? " word('no yes', (\(sig))+ 1)
return</
{{out|output|text= when using the default inputs:}}
<pre>
Line 1,517:
=={{header|Ruby}}==
{{trans|Python}}
<
a1, a2 = a-1, a-2
f0 = lambda {|t| t**a1 * Math.exp(-t)}
Line 1,577:
puts " probability: %.4f" % chi2Probability(dof, distance)
puts " uniform? %s" % (chi2IsUniform(ds) ? "Yes" : "No")
end</
{{out}}
Line 1,594:
=={{header|Rust}}==
<syntaxhighlight lang="rust">
use statrs::function::gamma::gamma_li;
Line 1,632:
}
</syntaxhighlight>
{{out}}
<pre>
Line 1,646:
{{libheader|Scastie qualified}}
{{works with|Scala|2.13}}
<
object ChiSquare extends App {
Line 1,675:
dof, dist, χ2Prob(dof.toDouble, dist), if (χ2IsUniform(ds, 0.05)) "YES" else "NO", ds.mkString(", "))
}
}</
=={{header|Sidef}}==
<
func F1(a, b, z, limit=100) {
sum(0..limit, {|k|
Line 1,719:
say "data: #{dataset}"
say "χ² = #{r[0]}, p-value = #{r[1].round(-4)}, uniform = #{r[2]}\n"
}</
{{out}}
<pre>
Line 1,732:
{{works with|Tcl|8.5}}
{{tcllib|math::statistics}}
<
package require math::statistics
Line 1,746:
[expr {$degreesOfFreedom / 2.0}] [expr {$X2 / 2.0}]]
expr {$likelihoodOfRandom > $significance}
}</
Testing:
<
for {set i 0} {$i<$count} {incr i} {incr distribution([uplevel 1 $operation])}
return [array get distribution]
Line 1,756:
puts "distribution \"$distFair\" assessed as [expr [isUniform $distFair]?{fair}:{unfair}]"
set distUnfair [makeDistribution {expr int(rand()*rand()*5)}]
puts "distribution \"$distUnfair\" assessed as [expr [isUniform $distUnfair]?{fair}:{unfair}]"</
Output:
<pre>distribution "0 199809 4 199649 1 200665 2 199607 3 200270" assessed as fair
Line 1,763:
=={{header|VBA}}==
The built-in worksheet function ChiSq_Dist of Excel VBA is used. The output is formatted like R's.
<
'Returns true if the observed frequencies pass the Pearson Chi-squared test at the required significance level.
Dim Total As Long, Ei As Long, i As Integer
Line 1,792:
O = [{522573,244456,139979,71531,21461}]
Debug.Print "[1] ""Uniform? "; Test4DiscreteUniformDistribution(O, 0.05); """"
End Sub</
{{out}}<pre>[1] "Data set:" 199809 200665 199607 200270 199649
Chi-squared test for given frequencies
Line 1,804:
=={{header|Vlang}}==
{{trans|Go}}
<
type Ifctn = fn(f64) f64
Line 1,888:
println(" significant at ${sig_level*100:2.0f}% level? $sig")
println(" uniform? ${!sig}\n")
}</
{{out}}
<pre>
Line 1,916:
{{libheader|Wren-math}}
{{libheader|Wren-fmt}}
<
import "/fmt" for Fmt
Line 1,966:
var uniform = chiIsUniform.call(ds, 0.05) ? "Yes" : "No"
System.print(" Uniform? %(uniform)\n")
}</
{{out}}
Line 1,980:
{{trans|C}}
{{trans|D}}
<
fcn Simpson3_8(f,a,b,N){ // fcn,double,double,Int --> double
h,h1:=(b - a)/N, h/3.0;
Line 2,016:
if(y>x) y=x;
1.0 - Simpson3_8(f,0.0,y,(y/h).toInt())/Gamma_Spouge(a);
}</
<
dslen :=ds.len();
expected:=dslen.reduce('wrap(sum,k){ sum + ds[k] },0.0)/dslen;
Line 2,028:
fcn chiIsUniform(dset,significance=0.05){
significance < chi2Probability(-1.0 + dset.len(),chi2UniformDistance(dset))
}</
<
T(522573.0, 244456.0, 139979.0, 71531.0, 21461.0) );
println(" %4s %12s %12s %8s %s".fmt(
Line 2,040:
dof, dist, prob, chiIsUniform(ds) and "YES" or "NO",
ds.concat(",")));
}</
{{out}}
<pre>