Welch's t-test: Difference between revisions

← Older edit

Welch's t-test (view source)

Revision as of 21:58, 21 April 2024

4,084 bytes added , 27 days ago

Added FreeBASIC

Jjuanhdez

2,122

edits

Revision as of 06:26, 18 May 2022 (view source) rosettacode>Zoziha m (→‎Using GSL: Rm blank lines) ← Older edit		Latest revision as of 21:58, 21 April 2024 (view source) Jjuanhdez (talk \| contribs) (Added FreeBASIC)
(4 intermediate revisions by 4 users not shown)
Line 69: {{trans\|Python}} <~~lang~~syntaxhighlight lang="11l">F betain(x, p, q) I p <= 0 \| q <= 0 \| x < 0 \| x > 1 X.throw ValueError(0) I x == 0 \| x == 1 Line 123: V n2 = a2.len I n1 <= 1 \| n2 <= 1 X.throw ValueError(0) V mean1 = sum(a1) / n1 Line 139: V a1 = [Float(3), 4, 1, 2.1] V a2 = [Float(490.2), 340, 433.9] print(welch_ttest(a1, a2))</~~lang~~syntaxhighlight> {{out}} Line 161: This shows how pvalue can be calculated from any two arrays, using Welch's 2-sided t-test, which doesn't assume equal variance. This is the equivalent of R's<code>t.test(vector1,vector2, alternative="two.sided", var.equal=FALSE)</code> and as such, it is compared against R's pvalues with the same vectors/arrays to show that the differences are very small (here 10^-14). <~~lang~~syntaxhighlight Clang="c">#include <stdio.h> #include <math.h> #include <stdlib.h> Line 428: return 0; } </syntaxhighlight> ~~</lang>~~ {{out}} Line 443: '''If''' your computer does not have <code>lgammal</code>, add the following function before <code>main</code> and replace <code>lgammal</code> with <code>lngammal</code> in the <code>calculate_Pvalue</code> function: <~~lang~~syntaxhighlight Clang="c">#include <stdio.h> #include <math.h> Line 463: } </syntaxhighlight> ~~</lang>~~ =={{header\|Fortran}}== Line 470: Alternatively, the program shows the p-value computed using the IMSL '''BETAI''' function. <~~lang~~syntaxhighlight lang="fortran">subroutine welch_ttest(n1, x1, n2, x2, t, df, p) use tdf_int implicit none Line 495: print , t, df, p print , betai(df / (t*2 + df), 0.5d0 df, 0.5d0) end program</~~lang~~syntaxhighlight> '''Output''' Line 505: With Absoft Pro Fortran, compile with <code>af90 -m64 pvalue.f90 %SLATEC_LINK%</code>. <~~lang~~syntaxhighlight lang="fortran">subroutine welch_ttest(n1, x1, n2, x2, t, df, p) implicit none integer :: n1, n2 Line 530: call welch_ttest(4, x, 3, y, t, df, p) print , t, df, p end program</~~lang~~syntaxhighlight> '''Output''' Line 542: Instead of implementing the t-distribution by ourselves, we bind to GNU Scientific Library: <~~lang~~syntaxhighlight lang="fortran">module t_test_m use, intrinsic :: iso_c_binding Line 596: print , t, df, p end program main</~~lang~~syntaxhighlight> '''Output''' <pre> -9.5594977219326580 2.0008523488562844 1.0751561149784494E-002</pre> =={{header\|FreeBASIC}}== ===Using Betain=== {{trans\|11l}} <syntaxhighlight lang="vbnet">#include "crt\math.bi" Function betain(x As Double, p As Double, q As Double) As Double If p <= 0 Or q <= 0 Or x < 0 Or x > 1 Then Print "ValueError" End End If If x = 0 Or x = 1 Then Return x Dim As Double acu = 1e-15 'Dim As Double lnbeta = LogGamma(p) + LogGamma(q) - LogGamma(p + q) Dim As Double lnbeta = lGamma(p) + lGamma(q) - lGamma(p + q) Dim As Double xx = x Dim As Double cx = 1 - x Dim As Double pp = p Dim As Double qq = q Dim As Integer indx = 0 Dim As Double psq = p + q If p < psq * x Then xx = 1 - x cx = x pp = q qq = p indx = 1 End If Dim As Double term = 1.0 Dim As Double ai = 1.0 Dim As Double value = 1.0 Dim As Integer ns = Int(qq + cx * psq) Dim As Double rx = xx / cx Dim As Double temp = qq - ai If ns = 0 Then rx = xx Do term = temp rx / (pp + ai) value += term temp = Abs(term) If temp <= acu And temp <= acu * value Then value = Exp(pp Log(xx) + (qq - 1) * Log(cx) - lnbeta) / pp Return Iif(indx, 1 - value, value) End If ai += 1 If ns > 0 Then ns -= 1 temp = qq - ai If ns = 0 Then rx = xx Else temp = psq psq += 1 End If End If Loop End Function Sub welch_ttest(a1() As Double, a2() As Double, Byref t As Double, Byref df As Double, Byref p As Double) Dim As Integer n1 = Ubound(a1) + 1 Dim As Integer n2 = Ubound(a2) + 1 If n1 <= 1 Or n2 <= 1 Then Print "ValueError" End End If Dim As Double mean1 = 0 For i As Integer = 0 To n1 - 1 mean1 += a1(i) Next i mean1 /= n1 Dim As Double mean2 = 0 For i As Integer = 0 To n2 - 1 mean2 += a2(i) Next i mean2 /= n2 Dim As Double var1 = 0 For i As Integer = 0 To n1 - 1 var1 += (a1(i) - mean1) ^ 2 Next i var1 /= (n1 - 1) Dim As Double var2 = 0 For i As Integer = 0 To n2 - 1 var2 += (a2(i) - mean2) ^ 2 Next i var2 /= (n2 - 1) t = (mean1 - mean2) / Sqr(var1 / n1 + var2 / n2) df = (var1 / n1 + var2 / n2) ^ 2 / (var1 ^ 2 / (n1 ^ 2 * (n1 - 1)) + var2 ^ 2 / (n2 ^ 2 * (n2 - 1))) p = betain(df / (t ^ 2 + df), df / 2, 1 / 2) End Sub Dim As Double a1(3) = {3, 4, 1, 2.1} Dim As Double a2(2) = {490.2, 340, 433.9} Dim As Double t, df, p welch_ttest(a1(), a2(), t, df, p) Print " t: "; t Print "df: "; df Print " p: "; p Sleep</syntaxhighlight> {{out}} <pre> t: -9.559497721932658 df: 2.000852348856284 p: 0.01075155600241868</pre> =={{header\|Go}}== <~~lang~~syntaxhighlight lang="go">package main import ( Line 694 ⟶ 808: func(r float64) float64 { return math.Pow(r, ν/2-1) / math.Sqrt(1-r) }) / math.Exp(g1+g2-g3) }</~~lang~~syntaxhighlight> {{out}} <pre> Line 708 ⟶ 822: Implementation: <~~lang~~syntaxhighlight Jlang="j">integrate=: adverb define 'a b steps'=. 3{.y,128 size=. (b - a)%steps Line 731 ⟶ 845: hi=. v%(t^2)+v (F f. simpson integrate lo,hi) % 0.5 B v%2 )</~~lang~~syntaxhighlight> <code>integrate</code> and <code>simpson</code> are from the [[Numerical_integration#J\|Numerical integration]] task. Line 748 ⟶ 862: Data for task examples: <~~lang~~syntaxhighlight Jlang="j">d1=: 27.5 21 19 23.6 17 17.9 16.9 20.1 21.9 22.6 23.1 19.6 19 21.7 21.4 d2=: 27.1 22 20.8 23.4 23.4 23.5 25.8 22 24.8 20.2 21.9 22.1 22.9 20.5 24.4 d3=: 17.2 20.9 22.6 18.1 21.7 21.4 23.5 24.2 14.7 21.8 Line 757 ⟶ 871: d8=: 29.89 29.93 29.72 29.98 30.02 29.98 d9=: 3 4 1 2.1 da=: 490.2 340 433.9</~~lang~~syntaxhighlight> Task examples: <~~lang~~syntaxhighlight Jlang="j"> d1 p2_tail d2 0.021378 d3 p2_tail d4 Line 769 ⟶ 883: 0.0907733 d9 p2_tail da 0.0107377</~~lang~~syntaxhighlight> =={{header\|Java}}== Using the '''[http://commons.apache.org/proper/commons-math/ Apache Commons Mathematics Library]'''. <~~lang~~syntaxhighlight lang="java">import org.apache.commons.math3.distribution.TDistribution; public class WelchTTest { Line 822 ⟶ 936: System.out.println("p = " + res[2]); } }</~~lang~~syntaxhighlight> '''Result''' Line 840 ⟶ 954: jq's `lgamma` returns the natural logarithm of the absolute value of the gamma function of x. <~~lang~~syntaxhighlight lang="jq">def mean: add / length; # Sample variance using division by (length-1) Line 900 ⟶ 1,014: pValue(d5; d6), pValue(d7; d8), pValue(x; y)</~~lang~~syntaxhighlight> {{out}} <pre> Line 914 ⟶ 1,028: {{works with\|Julia\|0.6}} <~~lang~~syntaxhighlight lang="julia">using HypothesisTests d1 = [27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6, 23.1, 19.6, 19.0, 21.7, 21.4] Line 934 ⟶ 1,048: ttest = UnequalVarianceTTest(y1, y2) println("\nData:\n y1 = $y1\n y2 = $y2\nP-value for unequal variance TTest: ", round(pvalue(ttest), 4)) end</~~lang~~syntaxhighlight> {{out}} Line 966 ⟶ 1,080: =={{header\|Kotlin}}== This program brings in code from other tasks for gamma functions and integration by Simpson's rule as Kotlin doesn't have these built-in: <~~lang~~syntaxhighlight lang="scala">// version 1.1.4-3 typealias Func = (Double) -> Double Line 1,071 ⟶ 1,185: println(f.format(p2Tail(da7, da8))) println(f.format(p2Tail(x, y))) }</~~lang~~syntaxhighlight> {{out}} Line 1,084 ⟶ 1,198: =={{header\|Nim}}== {{trans\|Kotlin}} <~~lang~~syntaxhighlight ~~Nim~~lang="nim">import math, stats, strutils, sugar func sqr(f: float): float = f * f Line 1,141 ⟶ 1,255: echo p2Tail(Da5, Da6).formatFloat(ffDecimal, 6) echo p2Tail(Da7, Da8).formatFloat(ffDecimal, 6) echo p2Tail(X, Y).formatFloat(ffDecimal, 6)</~~lang~~syntaxhighlight> {{out}} Line 1,152 ⟶ 1,266: =={{header\|Maple}}== <~~lang~~syntaxhighlight lang="maple">WelschTTest:=proc(x::list(numeric),y::list(numeric)) uses Statistics; local n1:=nops(x),n2:=nops(y), Line 1,167 ⟶ 1,281: y:=[490.2,340,433.9]: WelschTTest(x,y); # -9.55949772193266, 2.00085234885628, 0.0107515611497845</~~lang~~syntaxhighlight> =={{header\|Octave}}== {{trans\|Stata}} <~~lang~~syntaxhighlight lang="octave">x = [3.0,4.0,1.0,2.1]; y = [490.2,340.0,433.9]; n1 = length(x); Line 1,184 ⟶ 1,298: ans = -9.559498 2.000852 0.010752</~~lang~~syntaxhighlight> =={{header\|PARI/GP}}== <~~lang~~syntaxhighlight lang="parigp">B2(x,y)=exp(lngamma(x)+lngamma(y)-lngamma(x+y)) B3(x,a,b)=a--;b--;intnum(r=0,x,r^a(1-r)^b) Welch2(u,v)=my(m1=vecsum(u)/#u, m2=vecsum(v)/#v, v1=var(u,m1), v2=var(v,m2), s=v1/#u+v2/#v, t=(m1-m2)/sqrt(s), nu=s^2/(v1^2/#u^2/(#u-1)+v2^2/#v^2/(#v-1))); B3(nu/(t^2+nu),nu/2,1/2)/B2(nu/2,1/2); Welch2([3,4,1,2.1], [490.2,340,433.9])</~~lang~~syntaxhighlight> {{out}} <pre>%1 = 0.010751561149784496723954539777213062928</pre> Line 1,198 ⟶ 1,312: Uses Math::AnyNum for gamma and pi. It is possible to use some other modules (e.g. Math::Cephes) if Math::AnyNum has problematic dependencies. {{trans\|Sidef}} <~~lang~~syntaxhighlight lang="perl">use utf8; use List::Util qw(sum); use Math::AnyNum qw(gamma pi); sub p_value :prototype($$) { my ($A, $B) = @_; Line 1,258 ⟶ 1,372: my ($left, $right) = splice(@tests, 0, 2); print p_value($left, $right), "\n"; }</~~lang~~syntaxhighlight> {{out}} <pre> Line 1,271 ⟶ 1,385: We use a slightly more accurate lgamma than the C code. Note that Perl can be compiled with different underlying floating point representations -- double, long double, or quad double. {{trans\|C}} <~~lang~~syntaxhighlight lang="perl">use strict; use warnings; use List::Util 'sum'; Line 1,398 ⟶ 1,512: printf("p-value = %.14g\n",$pvalue); } printf("cumulative error is %g\n", $error);</~~lang~~syntaxhighlight> {{out}} <pre>p-value = 0.021378001462867 Line 1,413 ⟶ 1,527: {{trans\|Go}} {{trans\|Kotlin}} <!--<~~lang~~syntaxhighlight ~~Phix~~lang="phix">(phixonline)--> <span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> <span style="color: #008080;">function</span> <span style="color: #000000;">mean</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">a</span><span style="color: #0000FF;">)</span> Line 1,521 ⟶ 1,635: <span style="color: #0000FF;">?</span><span style="color: #000000;">pValue</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tests</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <!--</~~lang~~syntaxhighlight>--> {{out}} <pre> Line 1,533 ⟶ 1,647: The above was a bit off on the fifth test, so I also tried this.<br> using gamma() from [[Gamma_function#Phix]] (the one from above is probably also fine, but I didn't test that) <!--<~~lang~~syntaxhighlight ~~Phix~~lang="phix">(phixonline)--> <span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span> <span style="color: #000080;font-style:italic;">--<copy of gamma from Gamma_function#Phix></span> Line 1,650 ⟶ 1,764: <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <span style="color: #0000FF;">?{</span><span style="color: #008000;">"cumulative error"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">cerr</span><span style="color: #0000FF;">}</span> <!--</~~lang~~syntaxhighlight>--> {{out}} <pre> Line 1,668 ⟶ 1,782: === Using NumPy & SciPy === <~~lang~~syntaxhighlight lang="python">import numpy as np import scipy as sp import scipy.stats Line 1,685 ⟶ 1,799: welch_ttest(np.array([3.0, 4.0, 1.0, 2.1]), np.array([490.2, 340.0, 433.9])) (-9.559497721932658, 2.0008523488562844, 0.01075156114978449)</~~lang~~syntaxhighlight> === Using betain from AS 63 === First, the implementation of betain (translated from the Stata program in the discussion page). The original Fortran code is under copyrighted by the Royal Statistical Society. The C translation is under GPL, written by John Burkardt. The exact statement of the RSS license is unclear. <~~lang~~syntaxhighlight lang="python">import math def betain(x, p, q): Line 1,739 ⟶ 1,853: else: temp = psq psq += 1</~~lang~~syntaxhighlight> The Python code is then straightforward: <~~lang~~syntaxhighlight lang="python">import math def welch_ttest(a1, a2): Line 1,761 ⟶ 1,875: p = betain(df / (t2 + df), df / 2, 1 / 2) return t, df, p</~~lang~~syntaxhighlight> '''Example''' <~~lang~~syntaxhighlight lang="python">a1 = [3, 4, 1, 2.1] a2 = [490.2, 340, 433.9] print(welch_ttest(a1, a2))</~~lang~~syntaxhighlight> '''Output''' Line 1,773 ⟶ 1,887: =={{header\|R}}== <~~lang~~syntaxhighlight Rlang="r">#!/usr/bin/R printf <- function(...) cat(sprintf(...)) Line 1,810 ⟶ 1,924: results <- t.test(z1,z2, alternative="two.sided", var.equal=FALSE) printf("%.15g\n", results$p.value); </syntaxhighlight> ~~</lang>~~ The output here is used to compare against C's output above. Line 1,827 ⟶ 1,941: {{trans\|C}} <~~lang~~syntaxhighlight lang="racket">#lang racket (require math/statistics math/special-functions) Line 1,876 ⟶ 1,990: (p-value (list 3.0 4.0 1.0 2.1) (list 490.2 340.0 433.9))))</~~lang~~syntaxhighlight> {{out}} Line 1,890 ⟶ 2,004: Perhaps "inspired by C example" may be more accurate. Gamma subroutine from [[Gamma_function#Raku\|Gamma function task]]. <syntaxhighlight lang="raku" ~~perl6~~line>sub Γ(\z) { constant g = 9; z < .5 ?? π / sin(π × z) / Γ(1 - z) !! Line 1,955 ⟶ 2,069: [<3.0 4.0 1.0 2.1>], [<490.2 340.0 433.9>] ) -> @left, @right { say p-value @left, @right }</~~lang~~syntaxhighlight> {{out}} <pre>0.0213780014628669 Line 1,971 ⟶ 2,085: This uses the Soper reduction formula to evaluate the integral, which converges much more quickly than Simpson's formula. <syntaxhighlight lang="raku" ~~perl6~~line>sub lgamma ( Num(Real) \n --> Num ){ use NativeCall; sub lgamma (num64 --> num64) is native {} Line 2,075 ⟶ 2,189: $error += abs($p-value - shift @answers); } printf("cumulative error is %g\n", $error);</~~lang~~syntaxhighlight> {{out}} <pre>p-value = 0.021378001462867 Line 2,089 ⟶ 2,203: =={{header\|Ruby}}== {{trans\|Perl}} <~~lang~~syntaxhighlight ~~Ruby~~lang="ruby">def calculate_p_value(array1, array2) return 1.0 if array1.size <= 1 return 1.0 if array2.size <= 1 Line 2,214 ⟶ 2,328: printf("the cumulative error is %g\n", error) </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 2,230 ⟶ 2,344: =={{header\|SAS}}== {{trans\|Stata}} <syntaxhighlight lang="text">data tbl; input value group @@; cards; Line 2,240 ⟶ 2,354: class group; var value; run;</~~lang~~syntaxhighlight> '''Output''' Line 2,401 ⟶ 2,515: Implementation in IML: <~~lang~~syntaxhighlight lang="sas">proc iml; use tbl; read all var {value} into x where(group=1); Line 2,414 ⟶ 2,528: p = 2probt(-abs(t), df); print t df p; quit;</~~lang~~syntaxhighlight> '''Output''' Line 2,421 ⟶ 2,535: =={{header\|Scala}}== <~~lang~~syntaxhighlight ~~Scala~~lang="scala">import org.apache.commons.math3.distribution.TDistribution object WelchTTest extends App { Line 2,454 ⟶ 2,568: println(s"\nSuccessfully completed without errors. [total ${scala.compat.Platform.currentTime - executionStart} ms]") }</~~lang~~syntaxhighlight> =={{header\|Scilab}}== Line 2,461 ⟶ 2,575: Scilab will print a warning because the number of degrees of freedom is not an integer. However, the underlying implementation makes use of the [http://www.netlib.org/random/ dcdflib] Fortran library, which happily accepts a noninteger df. <syntaxhighlight lang="text">x = [3.0,4.0,1.0,2.1]; y = [490.2,340.0,433.9]; n1 = length(x); Line 2,470 ⟶ 2,584: df = (v1/n1+v2/n2)^2/(v1^2/(n1^2(n1-1))+v2^2/(n2^2(n2-1))); [p, q] = cdft("PQ", -abs(t), df); [t df 2p]</~~lang~~syntaxhighlight> '''Output''' Line 2,480 ⟶ 2,594: =={{header\|Sidef}}== {{trans\|Raku}} <~~lang~~syntaxhighlight lang="ruby">func p_value (A, B) { [A.len, B.len].all { _ > 1 } \|\| return 1 Line 2,532 ⟶ 2,646: tests.each_slice(2, {\|left, right\| say p_value(left, right) })</~~lang~~syntaxhighlight> {{out}} <pre> Line 2,547 ⟶ 2,661: Notice that here we use the option '''unequal''' of the '''ttest''' command, and not '''welch''', so that Stata uses the Welch-Satterthwaite approximation. <~~lang~~syntaxhighlight lang="stata">mat a=(3,4,1,2.1,490.2,340,433.9\1,1,1,1,2,2,2)' clear svmat double a Line 2,574 ⟶ 2,688: di r(p) .01075156</~~lang~~syntaxhighlight> The computation can easily be implemented in Mata. Here is how to compute the t statistic (t), the approximate degrees of freedom (df) and the p-value (p). <~~lang~~syntaxhighlight lang="stata">st_view(a=., ., .) x = select(a[., 1], a[., 2] :== 1) y = select(a[., 1], a[., 2] :== 2) Line 2,592 ⟶ 2,706: +----------------------------------------------+ 1 \| -9.559497722 2.000852349 .0107515611 \| +----------------------------------------------+</~~lang~~syntaxhighlight> =={{header\|Tcl}}== Line 2,602 ⟶ 2,716: This is not particularly idiomatic Tcl, but perhaps illustrates some of the language's relationship with the Lisp family. <~~lang~~syntaxhighlight ~~Tcl~~lang="tcl">#!/usr/bin/tclsh package require math::statistics Line 2,677 ⟶ 2,791: puts [pValue $left $right] } </syntaxhighlight> ~~</lang>~~ {{out}} Line 2,691 ⟶ 2,805: {{libheader\|Wren-math}} {{libheader\|Wren-fmt}} <~~lang~~syntaxhighlight ~~ecmascript~~lang="wren">import "./math" for Math, Nums import "./fmt" for Fmt var welch = Fn.new { \|a, b\| Line 2,729 ⟶ 2,843: var g3 = Math.gamma(nu/2 + 0.5).log var f = Fn.new { \|r\| r.pow(nu/2-1) / (1 - r).sqrt } return simpson0.call(2000, nu/(tt + nu), f) / ~~Math.exp~~(g1 + g2 - g3).exp } Line 2,748 ⟶ 2,862: Fmt.print("$0.6f", pValue.call(d5, d6)) Fmt.print("$0.6f", pValue.call(d7, d8)) Fmt.print("$0.6f", pValue.call(x, y))</~~lang~~syntaxhighlight> {{out}} Line 2,761 ⟶ 2,875: =={{header\|zkl}}== {{trans\|C}} <~~lang~~syntaxhighlight lang="zkl">fcn calculate_Pvalue(array1,array2){ if (array1.len()<=1 or array2.len()<=1) return(1.0); Line 2,811 ⟶ 2,925: foreach x in (cof){ ser+=(x/(y+=1)); } return((2.5066282746310005 * ser / x).log() - tmp); }</~~lang~~syntaxhighlight> <~~lang~~syntaxhighlight lang="zkl">testSets:=T( T(T(27.5,21.0,19.0,23.6,17.0,17.9,16.9,20.1,21.9,22.6,23.1,19.6,19.0,21.7,21.4), T(27.1,22.0,20.8,23.4,23.4,23.5,25.8,22.0,24.8,20.2,21.9,22.1,22.9,20.5,24.4)), Line 2,824 ⟶ 2,938: foreach x,y in (testSets) { println("Test set 1 p-value = %f".fmt(calculate_Pvalue(x,y))); }</~~lang~~syntaxhighlight> {{out}} <pre>