Selection bias in clinical sciences
In epidemiology, retrospective analyses have well-known limitations compared to prospective studies.
One such limitation is the occurrence of selection bias in the choice of subjects between treated and untreated groups about whom the data is collected. For example, a treatment may have only been given to persons who were less severely ill, which would bias the results in favor of such subjects appearing to have done better because of the treatment when the biased group is then compared to those who who did not receive the study treatment. Or, in a retrospective study, there may a choice to place subjects in a particular study group using a method which is inadvertently biased by the outcome being measured. Creating a programming example of a simulation of such selection bias in the design of a retrospective study is the topic of this task.
The genuine, historical example (only partially approximated in this task) is of a study done of persons who, over a course of 180 days, may or may not have become infected with Covid-19. Prior to becoming ill, these subjects may or may not have taken an available medication, which was actually taken on a particular schedule not used here, but is approximated by stating the medication was taken in doses of 3, 6, or 9 mg daily. The historical study then divided its subjects into three groups based on their cumulative dosage of the study medication:
- Group UNTREATED were those who did not take the study medication at all before they got Covid-19, including those who exited the study period without Covid-19 and having never taken the study medication.
- Group IRREGULAR is those who took the study medication but whose cumulative dose was less than a certain amount (approximated for our purposes as 100 mg) before they either came down with Covid-19 during the study or the study period ended.
- Group REGULAR is those who took (our approximation is >= 100 mg) of the study medication either before they came down with Covid-19 or took >= 100 mg by the end of the study and never became infected during the study.
- Assumptions for the study simulation programming task
- Daily risk of getting Covid-19 infection for each subject was 0.1% per day, or 18% over the 180 cumulative days of the study.
- The probability of starting treatment medication for anyone not already taking it was 0.5% per day. For those who started medication, the chance of continuing the treatment was increased 50-fold to 25% each day, since most who started the medication continued to take it to some extent.
- Study dose per day is random between the approximation for the simulation of 3, 6 and 9 mg. The daily cumulative dosage is used to determine the group the subject is in, unless a subject develops Covid-19. If a subject was diagnosed with Covid-19, their group at the time of that diagnosis is used in the statistical analysis of that group.
- Task
- Create a simulation of the subjects, keeping track of their medication dosages, group membership, and Covid-19 status during the study.
- Use at least 1000 subjects in the simulation over the 180 days (historically, the study size was 80,000).
- Statistics used are to be the Kruskal statistic for the analysis of multiple groups, with the boolean study outcome variable whether the subject got Covid-19 during the study period, analyzed versus category. If the programming language does not have a statistical package for such a test, calculation of the Kruskal H statistic may be used, since any values of H over about 50 are considered very highly indicative of significant group difference. A description of that test is found at Kruskal-Wallis test.
- You should get a statistical result highly favoring the REGULAR group.
- Stretch task
- Show monthly outcomes.
A note regarding outcome: Note that by simulation design all subjects must have an IDENTICAL risk, that is 0.1 per cent or p = 0.001 per day, of developing Covid-19. Because of the design, any statistical differences between the groups CANNOT come from an influence of the treatment on that risk, but must come from some other feature of the study design.
- See also
Julia
using HypothesisTests
@enum TreatmentClass Untreated Irregular Regular
mutable struct Subject
cum_dose::Float64
treatment_class::TreatmentClass
had_covid::Bool
end
function update!(subjects::Vector{Subject}, pcovid = 0.001, pstart = 0.005, pdosing = 0.25, drange = 3:3:9)
for subj in subjects
if subj.had_covid
continue
elseif rand() < pcovid
subj.had_covid = true
elseif subj.cum_dose > 0 && rand() <= pdosing || subj.cum_dose == 0 && rand() <= pstart
subj.cum_dose += rand(drange)
subj.treatment_class =
subj.cum_dose == 0 ? Untreated : subj.cum_dose >= 100 ? Regular : Irregular
end
end
end
function run_study(N = 10_000, duration = 180)
population = [Subject(0.0, Untreated, false) for _ in 1:N]
unt, unt_covid, irr, irr_covid, reg, reg_covid = 0, 0, 0, 0, 0, 0
println("Population size $N, daily infection risk 0.1%")
for day in 1:duration
update!(population)
if day % 30 == 0
println("\nDay $day:")
unt = count(s -> s.treatment_class == Untreated, population)
unt_covid = count(s -> (s.treatment_class == Untreated) && s.had_covid, population)
println("Untreated: N = $unt, with infection = $unt_covid")
irr = count(s -> s.treatment_class == Irregular, population)
irr_covid = count(s -> (s.treatment_class == Irregular) && s.had_covid, population)
println("Irregular Use: N = $irr, with infection = $irr_covid")
reg = count(s -> s.treatment_class == Regular, population)
reg_covid = count(s -> (s.treatment_class == Regular) && s.had_covid, population)
println("Regular Use: N = $reg, with infection = $reg_covid")
end
if day == 90
println("\nAt midpoint, Infection case percentages are:")
println(" Untreated : ", Float16(100 * unt_covid / unt))
println(" Irregulars: ", Float16(100 * irr_covid / irr))
println(" Regulars : ", Float16(100 * reg_covid / reg))
end
end
println("\nAt study end, Infection case percentages are:")
println(" Untreated : ", Float16(100 * unt_covid / unt), " of group size of $unt")
println(" Irregulars: ", Float16(100 * irr_covid / irr), " of group size of $irr")
println(" Regulars : ", Float16(100 * reg_covid / reg), " of group size of $reg")
untreated = [s.had_covid for s in population if s.treatment_class == Untreated]
irregular = [s.had_covid for s in population if s.treatment_class == Irregular]
regular = [s.had_covid for s in population if s.treatment_class == Regular]
println("\n\n Final statistics:\n")
@show KruskalWallisTest(untreated, irregular, regular)
end
run_study()
- Output:
Population size 10000, daily infection risk 0.1% Day 30: Untreated: N = 8633, with infection = 288 Irregular Use: N = 1367, with infection = 21 Regular Use: N = 0, with infection = 0 Day 60: Untreated: N = 7513, with infection = 519 Irregular Use: N = 2325, with infection = 79 Regular Use: N = 162, with infection = 2 Day 90: Untreated: N = 6559, with infection = 692 Irregular Use: N = 2362, with infection = 159 Regular Use: N = 1079, with infection = 24 At midpoint, Infection case percentages are: Untreated : 10.55 Irregulars: 6.73 Regulars : 2.225 Day 120: Untreated: N = 5794, with infection = 845 Irregular Use: N = 2071, with infection = 221 Regular Use: N = 2135, with infection = 72 Day 150: Untreated: N = 5115, with infection = 987 Irregular Use: N = 1835, with infection = 266 Regular Use: N = 3050, with infection = 156 Day 180: Untreated: N = 4538, with infection = 1106 Irregular Use: N = 1654, with infection = 302 Regular Use: N = 3808, with infection = 263 At study end, Infection case percentages are: Untreated : 24.38 of group size of 4538 Irregulars: 18.27 of group size of 1654 Regulars : 6.906 of group size of 3808 Final statistics: KruskalWallisTest(untreated, irregular, regular) = Kruskal-Wallis rank sum test (chi-square approximation) ------------------------------------------------------- Population details: parameter of interest: Location parameters value under h_0: "all equal" point estimate: NaN Test summary: outcome with 95% confidence: reject h_0 one-sided p-value: <1e-99 Details: number of observation in each group: [4538, 1654, 3808] χ²-statistic: 457.179 rank sums: [2.44308e7, 8.39891e6, 1.71753e7] degrees of freedom: 2 adjustment for ties: 0.417533 Kruskal-Wallis rank sum test (chi-square approximation) ------------------------------------------------------- Population details: parameter of interest: Location parameters value under h_0: "all equal" point estimate: NaN Test summary: outcome with 95% confidence: reject h_0 one-sided p-value: <1e-99 Details: number of observation in each group: [4538, 1654, 3808] χ²-statistic: 457.179 rank sums: [2.44308e7, 8.39891e6, 1.71753e7] degrees of freedom: 2 adjustment for ties: 0.417533
Nim
import std/[algorithm, math, random, sequtils, strformat, sugar]
type
TreatmentClass {.pure.} = enum Untreated, Irregular, Regular
Subject = object
cumDose: float
category: TreatmentClass
hadCovid: bool
const DoseForRegular = 100
func categorize(subject: var Subject) =
## Set treatment category based on cumulative treatment taken.
subject.category = if subject.cumDose == 0: Untreated
elif subject.cumDose >= DoseForRegular: Regular
else: Irregular
proc update(subject: var Subject; pCovid = 0.001;
pStart = 0.005; pDosing = 0.25; doses = @[3, 6, 9]) =
## Daily update on the subject to check for infection and randomly dose.
if not subject.hadCovid:
if rand(1.0) < pCovid:
subject.hadCovid = true
elif (subject.cumDose == 0 and rand(1.0) < pStart) or
(subject.cumDose > 0 and rand(1.0) < pDosing):
subject.cumDose += sample(doses).toFloat
subject.categorize()
func kruskalWallis(a, b, c: seq[bool]): float =
# Aggregate and sort.
let s = sorted(a & b & c)
# Find rank of first occurrence of "true".
let ix = s.find(true) + 1
# Calculate average ranks for "false" and "true".
let n = s.len
let arf = ix / 2
let art = (ix + n) / 2
# Calculate sum of ranks for each list.
let sra = sum(a.mapIt(if it: art else: arf))
let srb = sum(b.mapIt(if it: art else: arf))
let src = sum(c.mapIt(if it: art else: arf))
# Calculate H.
result = 12 / (n * (n + 1)) * (sra * sra / a.len.toFloat + srb * srb / b.len.toFloat +
src * src / c.len.toFloat) - 3 * float(n + 1)
proc runStudy(numSubjects = 1000; duration = 180; interval= 30) =
## Run the study using the population of size "numSubjects" for "duration" days.
var population = newSeqWith(numSubjects, Subject())
var unt, untCovid, irr, irrCovid, reg, regCovid = 0
echo &"Total subjects: {num_subjects}"
for day in 1..duration:
for subj in population.mitems:
subj.update()
if day mod interval == 0:
echo &"\nDay {day}:"
unt = population.countIt(it.category == Untreated)
untCovid = population.countIt(it.category == Untreated and it.hadCovid)
echo &"Untreated: N = {unt}, with infection = {untCovid}"
irr = population.countIt(it.category == IRREGULAR)
irrCovid = population.countIt(it.category == IRREGULAR and it.hadCovid)
echo &"Irregular Use: N = {irr}, with infection = {irrCovid}"
reg = population.countIt(it.category == REGULAR)
regCovid = population.countIt(it.category == REGULAR and it.hadCovid)
echo &"Regular Use: N = {reg}, with infection = {reg_covid}"
if day == duration div 2:
echo "\nAt midpoint, infection case percentages are:"
echo " Untreated : ", 100 * untCovid / unt
echo " Irregulars: ", 100 * irrCovid / irr
echo " Regulars : ", 100 * regCovid / reg
echo "\nAt study end, infection case percentages are:"
echo &" Untreated : {100 * untCovid / unt} of group size of {unt}"
echo &" Irregulars: {100 * irrCovid / irr} of group size of {irr}"
echo &" Regulars : {100 * regCovid / reg} of group size of {reg}"
let untreated = collect:
for s in population:
if s.category == Untreated:
s.hadCovid
let irregular = collect:
for s in population:
if s.category == Irregular:
s.hadCovid
let regular = collect:
for s in population:
if s.category == Regular:
s.hadCovid
echo "\nFinal statistics: ", kruskalWallis(untreated, irregular, regular)
randomize()
runStudy(10_000)
- Output:
Total subjects: 10000 Day 30: Untreated: N = 8619, with infection = 283 Irregular Use: N = 1379, with infection = 27 Regular Use: N = 2, with infection = 0 Day 60: Untreated: N = 7480, with infection = 498 Irregular Use: N = 2362, with infection = 77 Regular Use: N = 158, with infection = 3 Day 90: Untreated: N = 6479, with infection = 712 Irregular Use: N = 2440, with infection = 151 Regular Use: N = 1081, with infection = 24 At midpoint, infection case percentages are: Untreated : 10.98935020836549 Irregulars: 6.188524590163935 Regulars : 2.220166512488436 Day 120: Untreated: N = 5655, with infection = 873 Irregular Use: N = 2199, with infection = 210 Regular Use: N = 2146, with infection = 70 Day 150: Untreated: N = 4998, with infection = 1007 Irregular Use: N = 1910, with infection = 257 Regular Use: N = 3092, with infection = 135 Day 180: Untreated: N = 4479, with infection = 1122 Irregular Use: N = 1575, with infection = 302 Regular Use: N = 3946, with infection = 232 At study end, infection case percentages are: Untreated : 25.05023442732753 of group size of 4479 Irregulars: 19.17460317460317 of group size of 1575 Regulars : 5.879371515458693 of group size of 3946 Final statistics: 235.1089104422972
Perl
use v5.036;
use List::AllUtils <sum0 indexes firstidx>;
use enum qw<False True UNTREATED REGULAR IRREGULAR>;
use constant DOSE_FOR_REGULAR => 100;
my ($nSubjects,$duration,$interval) = (10000, 180, 30);
my (@dosage) = (0) x $nSubjects;
my (@category) = (UNTREATED) x $nSubjects;
my (@hadcovid) = (False) x $nSubjects;
sub update {
my $pCovid = 1e-3;
my $pStartTreatment = 5e-3;
my $pRedose = 1/4;
my @dRange = <3 6 9>;
for my $i (0 .. $nSubjects-1) {
unless ($hadcovid[$i]) {
if (rand() < $pCovid) {
$hadcovid[$i] = True
} else {
my $dose = $dosage[$i];
if ($dose==0 && rand() < $pStartTreatment or $dose > 0 && rand() < $pRedose) {
$dosage[$i] = $dose += $dRange[3*rand()];
$category[$i] = ($dose > DOSE_FOR_REGULAR) ? REGULAR : IRREGULAR
}
}
}
}
}
sub kruskal (@sets) {
my @ranked = sort @{$sets[0]}, @{$sets[1]}, @{$sets[2]};
my $n = @ranked;
my @sr = (0) x @sets;
my $ix = firstidx { $_ == 1 } @ranked;
my ($arf,$art) = ($ix/2, ($ix+$n)/2);
for my $i (0..2) {
$sr[$i] += $_ ? $art : $arf for @{$sets[$i]}
}
my $H = sum0 map { my $s = $sr[$_]; $s**2 / @{$sets[$_]} } 0..$#sr;
12/($n*($n+1)) * $H - 3 * ($n + 1)
}
my($unt,$irr,$reg,$hunt,$hirr,$hreg,@sunt,@sirr,@sreg);
my $midpoint = int $duration / 2;
say "Total subjects: $nSubjects\n";
for my $day (1 .. $duration) {
update();
if (0 == $day % $interval or $day == $duration or $day == $midpoint) {
@sunt = @hadcovid[ indexes { $_ == UNTREATED } @category];
@sirr = @hadcovid[ indexes { $_ == IRREGULAR } @category];
@sreg = @hadcovid[ indexes { $_ == REGULAR } @category];
( $unt, $irr, $reg) = (scalar(@sunt), scalar(@sirr), scalar(@sreg));
($hunt,$hirr,$hreg) = ( sum0(@sunt), sum0(@sirr), sum0(@sreg));
}
if (0 == $day % $interval) {
printf "Day %d:\n", $day;
printf "Untreated: N = %4d, with infection = %4d\n", $unt,$hunt;
printf "Irregular Use: N = %4d, with infection = %4d\n", $irr,$hirr;
printf "Regular Use: N = %4d, with infection = %4d\n\n",$reg,$hreg;
}
if ($day == $midpoint or $day == $duration) {
my $stage = $day == $midpoint ? 'midpoint' : 'study end';
printf "At $stage, Infection case percentages are:\n";
printf " Untreated : %6.2f\n", 100*$hunt/$unt;
printf " Irregulars: %6.2f\n", 100*$hirr/$irr;
printf " Regulars : %6.2f\n\n", 100*$hreg/$reg;
}
}
printf "Final statistics: H = %.2f\n", kruskal ( \@sunt, \@sirr, \@sreg );
- Output:
Total subjects: 10000 Day 30: Untreated: N = 8640, with infection = 283 Irregular Use: N = 1360, with infection = 18 Regular Use: N = 0, with infection = 0 Day 60: Untreated: N = 7499, with infection = 532 Irregular Use: N = 2338, with infection = 75 Regular Use: N = 163, with infection = 1 Day 90: Untreated: N = 6537, with infection = 709 Irregular Use: N = 2382, with infection = 144 Regular Use: N = 1081, with infection = 12 At midpoint, Infection case percentages are: Untreated : 10.85 Irregulars: 6.05 Regulars : 1.11 Day 120: Untreated: N = 5739, with infection = 855 Irregular Use: N = 2102, with infection = 216 Regular Use: N = 2159, with infection = 52 Day 150: Untreated: N = 5091, with infection = 1012 Irregular Use: N = 1792, with infection = 270 Regular Use: N = 3117, with infection = 134 Day 180: Untreated: N = 4468, with infection = 1126 Irregular Use: N = 1669, with infection = 315 Regular Use: N = 3863, with infection = 243 At study end, Infection case percentages are: Untreated : 25.20 Irregulars: 18.87 Regulars : 6.29 Final statistics: H = 218.74
Phix
with javascript_semantics sequence dosage, category, hadcovid enum UNTREATED, REGULAR, IRREGULAR constant DOSE_FOR_REGULAR = 100 procedure update(atom pCovid=0.001, pStartTreatment=0.005, pRedose=0.25, sequence dRange={3,6,9}) for i=1 to length(dosage) do if not hadcovid[i] then if rnd()<pCovid then hadcovid[i] = true else atom dose = dosage[i] if (dose=0 and rnd()<pStartTreatment) or (dose>0 and rnd()<pRedose) then dose += dRange[rand(length(dRange))] dosage[i] = dose category[i] = iff(dose>DOSE_FOR_REGULAR?REGULAR:IRREGULAR) end if end if end if end for end procedure function kruskal(sequence sets) sequence ranked = sort(flatten(sets)), sr = repeat(0,length(sets)) integer ix = find(true,ranked), n = length(ranked) atom arf = ix/2, art = (ix+n)/2 for i,set in sets do for b in set do sr[i] += iff(b?art:arf) end for end for atom H = 0 for i,s in sr do H += s*s/length(sets[i]) end for H = 12/(n*(n+1)) * H - 3 * (n + 1) return H end function procedure run_study(integer nSubjects=10000, duration=180, interval=30) dosage = repeat(0,nSubjects) category = repeat(UNTREATED,nSubjects) hadcovid = repeat(false,nSubjects) printf(1,"Total subjects: %d\n\n",nSubjects) sequence sunt,sirr,sreg integer unt,irr,reg,hunt,hirr,hreg,midpoint=floor(duration/2) for day=1 to duration do update() if rmdr(day,interval)=0 or day=duration or day=midpoint then sunt = extract(hadcovid,find_all(UNTREATED,category)) sirr = extract(hadcovid,find_all(IRREGULAR,category)) sreg = extract(hadcovid,find_all(REGULAR,category)) unt = length(sunt); hunt = sum(sunt) irr = length(sirr); hirr = sum(sirr) reg = length(sreg); hreg = sum(sreg) end if if rmdr(day,interval)=0 then printf(1,"Day %d:\n",day) printf(1,"Untreated: N = %d, with infection = %d\n",{unt,hunt}) printf(1,"Irregular Use: N = %d, with infection = %d\n",{irr,hirr}) printf(1,"Regular Use: N = %d, with infection = %d\n\n",{reg,hreg}) end if if day=midpoint then printf(1,"At midpoint, Infection case percentages are:\n") printf(1," Untreated : %f\n",100*hunt/unt) printf(1," Irregulars: %f\n",100*hirr/irr) printf(1," Regulars : %f\n\n",100*hreg/reg) end if end for printf(1,"At study end, Infection case percentages are:\n") printf(1," Untreated : %f\n",100*hunt/unt) printf(1," Irregulars: %f\n",100*hirr/irr) printf(1," Regulars : %f\n\n",100*hreg/reg) printf(1,"Final statistics: H = %f\n",kruskal({sunt, sirr, sreg})) end procedure run_study()
Aside: the best explanation of the Kruskal-Wallis Test I found was http://www.statisticslectures.com/topics/kruskalwallis/ but I didn't implement anything that I found there.
- Output:
Total subjects: 10000 Day 30: Untreated: N = 8628, with infection = 285 Irregular Use: N = 1372, with infection = 23 Regular Use: N = 0, with infection = 0 Day 60: Untreated: N = 7460, with infection = 518 Irregular Use: N = 2366, with infection = 72 Regular Use: N = 174, with infection = 0 Day 90: Untreated: N = 6497, with infection = 730 Irregular Use: N = 2399, with infection = 151 Regular Use: N = 1104, with infection = 17 At midpoint, Infection case percentages are: Untreated : 11.235955 Irregulars: 6.294289 Regulars : 1.539855 Day 120: Untreated: N = 5709, with infection = 879 Irregular Use: N = 2102, with infection = 220 Regular Use: N = 2189, with infection = 66 Day 150: Untreated: N = 5058, with infection = 1005 Irregular Use: N = 1846, with infection = 273 Regular Use: N = 3096, with infection = 152 Day 180: Untreated: N = 4486, with infection = 1109 Irregular Use: N = 1624, with infection = 315 Regular Use: N = 3890, with infection = 253 At study end, Infection case percentages are: Untreated : 24.721355 Irregulars: 19.396552 Regulars : 6.503856 Final statistics: H = 211.421331
Python
''' Rosetta code rosettacode.org/wiki/Study_Bias_in_Clinical_Sciences '''
from random import randrange
from numpy.random import rand
from scipy.stats import kruskal
UNTREATED = 0
IRREGULAR = 1
REGULAR = 2
DOSE_FOR_REGULAR = 100
class Subject:
''' A subject for the study '''
def __init__(self):
self.cum_dose = 0.0
self.category = UNTREATED
self.had_covid = False
self.update_count = 0
def update(self, p_covid=0.001, p_starting_treatment=0.005, p_redose=0.25, drange=(3, 10, 3)):
''' daily update on the subject to check for infection and randomly dose. '''
if not self.had_covid:
if rand() < p_covid:
self.had_covid = True
elif (self.cum_dose == 0 and rand() < p_starting_treatment) or\
(self.cum_dose > 0 and rand() < p_redose):
self.cum_dose += randrange(*drange)
self.categorize()
self.update_count += 1
def categorize(self):
''' Set treatment category based on cumulative treatment taken. '''
self.category = UNTREATED if self.cum_dose == 0 else REGULAR if\
self.cum_dose >= DOSE_FOR_REGULAR else IRREGULAR
return self.category
def run_study(num_subjects=1000, duration=180, interval=30):
''' Run the study using the population of size `N` for `duration` days. '''
population = [Subject() for _ in range(num_subjects)]
unt, unt_covid, irr, irr_covid, reg, reg_covid = 0, 0, 0, 0, 0, 0
print(f'Total subjects: {num_subjects:,}')
for day in range(duration):
for subj in population:
subj.update()
if (day + 1) % interval == 0:
print(f'\nDay {day + 1}:')
unt = sum(s.category == UNTREATED for s in population)
unt_covid = sum(s.category ==
UNTREATED and s.had_covid for s in population)
print(f'Untreated: N = {unt}, with infection = {unt_covid}')
irr = sum(s.category == IRREGULAR for s in population)
irr_covid = sum(s.category ==
IRREGULAR and s.had_covid for s in population)
print(f'Irregular Use: N = {irr}, with infection = {irr_covid}')
reg = sum(s.category == REGULAR for s in population)
reg_covid = sum(s.category ==
REGULAR and s.had_covid for s in population)
print(f'Regular Use: N = {reg}, with infection = {reg_covid}')
if day == duration // 2 - 1:
print('\nAt midpoint, Infection case percentages are:')
print(' Untreated : ', 100 * unt_covid / unt)
print(' Irregulars: ', 100 * irr_covid / irr)
print(' Regulars : ', 100 * reg_covid / reg)
print('\nAt study end, Infection case percentages are:')
print(f' Untreated : {100 * unt_covid / unt} of group size of {unt}')
print(f' Irregulars: {100 * irr_covid / irr} of group size of {irr}')
print(f' Regulars : {100 * reg_covid / reg} of group size of {reg}')
untreated = [
s.had_covid for s in population if s.category == UNTREATED]
irregular = [
s.had_covid for s in population if s.category == IRREGULAR]
regular = [s.had_covid for s in population if s.category == REGULAR]
print('\nFinal statistics: ', kruskal(untreated, irregular, regular))
run_study()
- Output:
Total subjects: 1,000 Day 30: Untreated: N = 872, with infection = 25 Irregular Use: N = 128, with infection = 2 Regular Use: N = 0, with infection = 0 Day 60: Untreated: N = 755, with infection = 55 Irregular Use: N = 222, with infection = 8 Regular Use: N = 23, with infection = 1 Day 90: Untreated: N = 671, with infection = 70 Irregular Use: N = 219, with infection = 13 Regular Use: N = 110, with infection = 4 At midpoint, Infection case percentages are: Untreated : 10.432190760059612 Irregulars: 5.936073059360731 Regulars : 3.6363636363636362 Day 120: Untreated: N = 600, with infection = 88 Irregular Use: N = 189, with infection = 17 Regular Use: N = 211, with infection = 8 Day 150: Untreated: N = 514, with infection = 108 Irregular Use: N = 194, with infection = 21 Regular Use: N = 292, with infection = 16 Day 180: Untreated: N = 447, with infection = 119 Irregular Use: N = 189, with infection = 26 Regular Use: N = 364, with infection = 26 At study end, Infection case percentages are: Untreated : 26.62192393736018 of group size of 447 Irregulars: 13.756613756613756 of group size of 189 Regulars : 7.142857142857143 of group size of 364 Final statistics: KruskalResult(statistic=55.48204323818349, pvalue=8.95833684545873e-13)
Raku
# 20221025 Raku programming solution
enum <UNTREATED REGULAR IRREGULAR>;
my \DOSE_FOR_REGULAR = 100;
my ($nSubjects,$duration,$interval) = 10000, 180, 30;
my (@dosage,@category,@hadcovid) := (0,UNTREATED,False)>>.&{($_ xx $nSubjects).Array};
sub update($pCovid=1e-3, $pStartTreatment=5e-3, $pRedose=¼, @dRange=<3 6 9>) {
for 0 ..^ @dosage.elems -> \i {
unless @hadcovid[i] {
if rand < $pCovid {
@hadcovid[i] = True
} else {
my $dose = @dosage[i];
if $dose==0 && rand < $pStartTreatment or $dose > 0 && rand < $pRedose {
@dosage[i] = $dose += @dRange.roll;
@category[i] = ($dose > DOSE_FOR_REGULAR) ?? REGULAR !! IRREGULAR
}
}
}
}
}
sub kruskal (@sets) {
my $n = ( my @ranked = @sets>>.List.flat.sort ).elems;
my @sr = 0 xx @sets.elems;
my $ix = (@ranked.first: * == True, :k)+1,
my ($arf,$art) = ($ix, $ix+$n) >>/>> 2;
for @sets.kv -> \i,@set { for @set -> $b { @sr[i] += $b ?? $art !! $arf } }
my $H = [+] @sr.kv.map: -> \i,\s { s*s/@sets[i].elems }
return 12/($n*($n+1)) * $H - 3 * ($n + 1)
}
say "Total subjects: $nSubjects\n";
my ($midpoint,$unt,$irr,$reg,$hunt,$hirr,$hreg,@sunt,@sirr,@sreg)=$duration div 2;
for 1 .. $duration -> \day {
update();
if day %% $interval or day == $duration or day == $midpoint {
@sunt = @hadcovid[ @category.grep: UNTREATED,:k ];
@sirr = @hadcovid[ @category.grep: IRREGULAR,:k ];
@sreg = @hadcovid[ @category.grep: REGULAR, :k ];
($unt,$hunt,$irr,$hirr,$reg,$hreg)=(@sunt,@sirr,@sreg).map:{|(.elems,.sum)}
}
if day %% $interval {
printf "Day %d:\n",day;
printf "Untreated: N = %4d, with infection = %4d\n", $unt,$hunt;
printf "Irregular Use: N = %4d, with infection = %4d\n", $irr,$hirr;
printf "Regular Use: N = %4d, with infection = %4d\n\n",$reg,$hreg
}
if day == $midpoint | $duration {
my $stage = day == $midpoint ?? 'midpoint' !! 'study end';
printf "At $stage, Infection case percentages are:\n";
printf " Untreated : %f\n", 100*$hunt/$unt;
printf " Irregulars: %f\n", 100*$hirr/$irr;
printf " Regulars : %f\n\n",100*$hreg/$reg
}
}
printf "Final statistics: H = %f\n", kruskal ( @sunt, @sirr, @sreg )
- Output:
Total subjects: 10000 Day 30: Untreated: N = 8616, with infection = 314 Irregular Use: N = 1383, with infection = 27 Regular Use: N = 1, with infection = 0 Day 60: Untreated: N = 7511, with infection = 561 Irregular Use: N = 2308, with infection = 87 Regular Use: N = 181, with infection = 1 Day 90: Untreated: N = 6571, with infection = 760 Irregular Use: N = 2374, with infection = 141 Regular Use: N = 1055, with infection = 19 At midpoint, Infection case percentages are: Untreated : 11.565972 Irregulars: 5.939343 Regulars : 1.800948 Day 120: Untreated: N = 5757, with infection = 909 Irregular Use: N = 2097, with infection = 189 Regular Use: N = 2146, with infection = 61 Day 150: Untreated: N = 5092, with infection = 1054 Irregular Use: N = 1832, with infection = 238 Regular Use: N = 3076, with infection = 132 Day 180: Untreated: N = 4530, with infection = 1172 Irregular Use: N = 1615, with infection = 282 Regular Use: N = 3855, with infection = 229 At study end, Infection case percentages are: Untreated : 25.871965 Irregulars: 17.461300 Regulars : 5.940337 Final statistics: H = 248.419454
Wren
import "random" for Random
import "./math" for Nums
var Rand = Random.new()
var UNTREATED = 0
var IRREGULAR = 1
var REGULAR = 2
var DOSE_FOR_REGULAR = 100
/* A subject for the study. */
class Subject {
construct new() {
_cumDose = 0
_category = UNTREATED
_hadCovid = false
_updateCount = 0
}
cumDose { _cumDose }
category { _category }
hadCovid { _hadCovid }
updateCount { _updateCount }
cumDose(d) { _cumDose = d }
category(c) { _category = c }
hadCovid(h) { _hadCovid = h }
updateCount(u) { _updateCount = u }
// Daily update on the subject to check for infection and randomly dose.
update(pCovid, pStartingTreatment, pRedose, dRange) {
if (!_hadCovid) {
if (Rand.float() < pCovid) {
_hadCovid = true
} else if ((_cumDose == 0 && Rand.float() < pStartingTreatment) ||
(_cumDose > 0 && Rand.float() < pRedose)) {
_cumDose = _cumDose + Rand.sample(dRange)
categorize()
}
}
_updateCount = _updateCount + 1
}
// Update using default parameters.
update() { update(0.001, 0.005, 0.25, [3, 6, 9]) }
// Set treatment category based on cumulative treatment taken.
categorize() {
_category = (_cumDose == 0) ? UNTREATED :
(_cumDose >= DOSE_FOR_REGULAR) ? REGULAR : IRREGULAR
return _category
}
}
// a, b and c are assumed to be lists of boolean values.
var kruskal = Fn.new { |a, b, c|
// map the bool values to 1 (true) or 0 (false).
var aa = a.map { |e| e ? 1: 0 }.toList
var bb = b.map { |e| e ? 1: 0 }.toList
var cc = b.map { |e| e ? 1: 0 }.toList
// aggregate and sort them
var ss = (aa + bb + cc).sort()
// find rank of first occurrence of 1
var ix = ss.indexOf(1) + 1
// calculate average ranks for 0 and 1
var arf = (1 + ix - 1) / 2
var n = ss.count
var art = (ix + n) / 2
// calculate sum of ranks for each list
var sra = Nums.sum(a.map { |e| e ? art : arf })
var srb = Nums.sum(b.map { |e| e ? art : arf })
var src = Nums.sum(c.map { |e| e ? art : arf })
// calculate H
var H = 12/(n*(n+1)) * (sra*sra/a.count + srb*srb/b.count + src*src/c.count) - 3 * (n + 1)
return H
}
// Run the study using the population of size 'N' for 'duration' days.
var runStudy = Fn.new { |numSubjects, duration, interval|
var population = List.filled(numSubjects, null)
for (i in 0...numSubjects) population[i] = Subject.new()
var unt = 0
var untCovid = 0
var irr = 0
var irrCovid = 0
var reg = 0
var regCovid = 0
System.print("Total subjects: %(numSubjects)")
for (day in 0...duration) {
for (subj in population) subj.update()
if ((day + 1) % interval == 0) {
System.print("\nDay %(day + 1):")
unt = population.count { |s| s.category == UNTREATED }
untCovid = population.count { |s| s.category == UNTREATED && s.hadCovid }
System.print("Untreated: N = %(unt), with infection = %(untCovid)")
irr = population.count { |s| s.category == IRREGULAR }
irrCovid = population.count { |s| s.category == IRREGULAR && s.hadCovid }
reg = population.count { |s| s.category == REGULAR }
regCovid = population.count { |s| s.category == REGULAR && s.hadCovid }
System.print("Regular Use: N = %(reg), with infection = %(regCovid)")
}
if (day == (duration/2).floor - 1) {
System.print("\nAt midpoint, Infection case percentages are:")
System.print(" Untreated : %(100 * untCovid / unt)")
System.print(" Irregulars: %(100 * irrCovid / irr)")
System.print(" Regulars : %(100 * regCovid / reg)")
}
}
System.print("\nAt study end, Infection case percentages are:")
System.print(" Untreated : %(100 * untCovid / unt) of group size of %(unt)")
System.print(" Irregulars: %(100 * irrCovid / irr) of group size of %(irr)")
System.print(" Regulars : %(100 * regCovid / reg) of group size of %(reg)")
var untreated = population.where { |s| s.category == UNTREATED }.map { |s| s.hadCovid }.toList
var irregular = population.where { |s| s.category == IRREGULAR }.map { |s| s.hadCovid }.toList
var regular = population.where { |s| s.category == REGULAR }.map { |s| s.hadCovid }.toList
System.print("\nFinal statistics: H = %(kruskal.call(untreated, irregular, regular))")
}
runStudy.call(1000, 180, 30)
- Output:
Sample run.
Total subjects: 1000 Day 30: Untreated: N = 882, with infection = 32 Regular Use: N = 0, with infection = 0 Day 60: Untreated: N = 783, with infection = 58 Regular Use: N = 13, with infection = 0 Day 90: Untreated: N = 679, with infection = 82 Regular Use: N = 91, with infection = 0 At midpoint, Infection case percentages are: Untreated : 12.076583210604 Irregulars: 4.3478260869565 Regulars : 0 Day 120: Untreated: N = 592, with infection = 95 Regular Use: N = 186, with infection = 5 Day 150: Untreated: N = 517, with infection = 106 Regular Use: N = 286, with infection = 19 Day 180: Untreated: N = 459, with infection = 118 Regular Use: N = 375, with infection = 26 At study end, Infection case percentages are: Untreated : 25.708061002179 of group size of 459 Irregulars: 15.060240963855 of group size of 166 Regulars : 6.9333333333333 of group size of 375 Final statistics: H = 395.09607330604