Selection bias in clinical sciences

In epidemiology, retrospective analyses have well-known limitations compared to prospective studies.

One such limitation is the occurrence of selection bias in the choice of subjects between treated and untreated groups about whom the data is collected. For example, a treatment may have only been given to persons who were less severely ill, which would bias the results in favor of such subjects appearing to have done better because of the treatment when the biased group is then compared to those who who did not receive the study treatment. Or, in a retrospective study, there may a choice to place subjects in a particular study group using a method which is inadvertently biased by the outcome being measured. Creating a programming example of a simulation of such selection bias in the design of a retrospective study is the topic of this task.

The genuine, historical example (only partially approximated in this task) is of a study done of persons who, over a course of 180 days, may or may not have become infected with Covid-19. Prior to becoming ill, these subjects may or may not have taken an available medication, which was actually taken on a particular schedule not used here, but is approximated by stating the medication was taken in doses of 3, 6, or 9 mg daily. The historical study then divided its subjects into three groups based on their cumulative dosage of the study medication:

Group UNTREATED were those who did not take the study medication at all before they got Covid-19, including those who exited the study period without Covid-19 and having never taken the study medication.

Group IRREGULAR is those who took the study medication but whose cumulative dose was less than a certain amount (approximated for our purposes as 100 mg) before they either came down with Covid-19 during the study or the study period ended.

Group REGULAR is those who took (our approximation is >= 100 mg) of the study medication either before they came down with Covid-19 or took >= 100 mg by the end of the study and never became infected during the study.

Assumptions for the study simulation programming task

Daily risk of getting Covid-19 infection for each subject was 0.1% per day, or 18% over the 180 cumulative days of the study.

The probability of starting treatment medication for anyone not already taking it was 0.5% per day. For those who started medication, the chance of continuing the treatment was increased 50-fold to 25% each day, since most who started the medication continued to take it to some extent.

Study dose per day is random between the approximation for the simulation of 3, 6 and 9 mg. The daily cumulative dosage is used to determine the group the subject is in, unless a subject develops Covid-19. If a subject was diagnosed with Covid-19, their group at the time of that diagnosis is used in the statistical analysis of that group.

Task

Create a simulation of the subjects, keeping track of their medication dosages, group membership, and Covid-19 status during the study.

Use at least 1000 subjects in the simulation over the 180 days (historically, the study size was 80,000).

Statistics used are to be the Kruskal statistic for the analysis of multiple groups, with the boolean study outcome variable whether the subject got Covid-19 during the study period, analyzed versus category. If the programming language does not have a statistical package for such a test, calculation of the Kruskal H statistic may be used, since any values of H over about 50 are considered very highly indicative of significant group difference. A description of that test is found at Kruskal-Wallis test.

You should get a statistical result highly favoring the REGULAR group.

Stretch task

Show monthly outcomes.

A note regarding outcome: Note that by simulation design all subjects must have an IDENTICAL risk, that is 0.1 per cent or p = 0.001 per day, of developing Covid-19. Because of the design, any statistical differences between the groups CANNOT come from an influence of the treatment on that risk, but must come from some other feature of the study design.

See also

Wikipedia - Selection Bias Wikipedia - Retrospective studies UCDavis (pdf) Challenges in Retrospective studies

FreeBASIC

Translation of: Wren

Const UNTREATED = 0
Const IRREGULAR = 1
Const REGULAR   = 2
Const DOSE_FOR_REGULAR = 100

' A subject for the study
Type Subject
    cumDose As Double
    category As Integer
    hadCovid As Boolean
    updateCount As Integer
End Type

'Set treatment category based on cumulative treatment taken.
Sub categorize (Byref s As Subject)
    If s.cumDose = 0 Then
        s.category = UNTREATED
    Elseif s.cumDose >= DOSE_FOR_REGULAR Then
        s.category = REGULAR
    Else
        s.category = IRREGULAR
    End If
End Sub

'Daily update on the subject to check for infection and randomly dose.
Sub update (Byref s As Subject, pCovid As Double, pStartingTreatment As Double, pRedose As Double, dRange As Double)
    If Not s.hadCovid Then
        If Rnd < pCovid Then
            s.hadCovid = True
        Elseif (s.cumDose = 0 And Rnd < pStartingTreatment) Or (s.cumDose > 0 And Rnd < pRedose) Then
            s.cumDose += Int(Rnd * dRange)
            categorize(s)
        End If
    End If
    s.updateCount += 1
End Sub

Function sumRanks(a() As Integer) As Integer
    Dim As Integer sum = 0
    For i As Integer = Lbound(a) To Ubound(a)
        sum += a(i)
    Next i
    Return sum
End Function

Function kruskal(a() As Integer, b() As Integer, c() As Integer) As Double
    Dim As Integer n = Ubound(a) + Ubound(b) + Ubound(c) + 3
    Dim As Double sra = sumRanks(a())
    Dim As Double srb = sumRanks(b())
    Dim As Double src = sumRanks(c())
    Dim As Double H = 12/(n*(n+1)) * (sra*sra/Ubound(a) + srb*srb/Ubound(b) + src*src/Ubound(c)) - 3*(n+1)
    Return H
End Function

'Run the study using the population of size 'N' for 'duration' days
Sub runStudy(numSubjects As Integer, duration As Integer, interval As Integer)
    Dim As Integer i
    Dim As Subject population(numSubjects - 1)    
    For i = 0 To numSubjects - 1
        population(i).cumDose = 0
        population(i).category = UNTREATED
        population(i).hadCovid = False
        population(i).updateCount = 0
    Next i
    
    Dim As Integer unt, untCovid, irr, irrCovid, reg, regCovid, dia
    Print "Total subjects:"; numSubjects
    For dia = 0 To duration - 1
        For i = 0 To numSubjects - 1
            update(population(i), 0.001, 0.005, 0.25, Int(Rnd * 9) + 3)
        Next i
        If (dia + 1) Mod interval = 0 Then
            Print !"\nDay"; dia + 1
            unt = 0
            untCovid = 0
            irr = 0
            irrCovid = 0
            reg = 0
            regCovid = 0
            For i = 0 To numSubjects - 1
                If population(i).category = UNTREATED Then
                    unt += 1
                    If population(i).hadCovid Then untCovid += 1
                Elseif population(i).category = IRREGULAR Then
                    irr += 1
                    If population(i).hadCovid Then irrCovid += 1
                Elseif population(i).category = REGULAR Then
                    reg += 1
                    If population(i).hadCovid Then regCovid += 1
                End If
            Next i
            Print "    Untreated: N ="; unt; ", with infection ="; untCovid
            Print "Irregular Use: N ="; irr; ", with infection ="; irrCovid
            Print "  Regular Use: N ="; reg; ", with infection ="; regCovid
        End If

        If dia = duration \ 2 - 1 Then
            Print !"\nAt midpoint, Infection case percentages are:"
            Print "  Untreated :"; 100 * untCovid / unt
            Print "  Irregulars:"; 100 * irrCovid / irr
            Print "  Regulars  :"; 100 * regCovid / reg
        End If
    Next dia
    
    Print !"\nAt study end, Infection case percentages are:"
    Print "  Untreated :"; 100 * untCovid / unt; " of group size of"; unt
    Print "  Irregulars:"; 100 * irrCovid / irr; " of group size of"; irr
    Print "  Regulars  :"; 100 * regCovid / reg; " of group size of"; reg
    
    Dim As Integer sinTratar(numSubjects - 1)
    Dim As Integer esIrregular(numSubjects - 1)
    Dim As Integer esRegular(numSubjects - 1)
    For i = 0 To numSubjects - 1
        If population(i).category = UNTREATED Then sinTratar(i) = population(i).hadCovid
        If population(i).category = IRREGULAR Then esIrregular(i) = population(i).hadCovid
        If population(i).category = REGULAR Then esRegular(i) = population(i).hadCovid
    Next i
    
    Print !"\nFinal statistics: H = "; kruskal(sinTratar(), esIrregular(), esRegular())
End Sub

Randomize Timer
runStudy(1000, 180, 30)

Sleep

Output:

Total subjects: 1000

Day 30
    Untreated: N = 908, with infection = 41
Irregular Use: N = 92, with infection = 1
  Regular Use: N = 0, with infection = 0

Day 60
    Untreated: N = 794, with infection = 73
Irregular Use: N = 206, with infection = 2
  Regular Use: N = 0, with infection = 0

Day 90
    Untreated: N = 718, with infection = 92
Irregular Use: N = 281, with infection = 6
  Regular Use: N = 1, with infection = 0

At midpoint, Infection case percentages are:
  Untreated : 12.81337047353761
  Irregulars: 2.135231316725978
  Regulars  : 0

Day 120
    Untreated: N = 640, with infection = 101
Irregular Use: N = 347, with infection = 11
  Regular Use: N = 13, with infection = 0

Day 150
    Untreated: N = 589, with infection = 116
Irregular Use: N = 349, with infection = 27
  Regular Use: N = 62, with infection = 0

Day 180
    Untreated: N = 528, with infection = 131
Irregular Use: N = 331, with infection = 36
  Regular Use: N = 141, with infection = 4

At study end, Infection case percentages are:
  Untreated : 24.81060606060606 of group size of 528
  Irregulars: 10.8761329305136 of group size of 331
  Regulars  : 2.836879432624114 of group size of 141

Final statistics: H = -9002.999975352894

jq

Adapted from Wren

Works with jq, the C implementation of jq

Works with gojq, the Go implementation of jq

Since jq does not include a PRN generator, an external source of entropy such as /dev/urandom is assumed. A suitable invocation of jq would be along the lines of:

< /dev/urandom tr -cd '0-9' | fold -w 1 | $jq -Rcnr --rawfile text alice_oz.txt -f mc.jq

### Generic utilities
def lpad($len): tostring | ($len - length) as $l | (" " * $l) + .;

def sum(s): reduce s as $x (0; . + $x);

# For readability:
def count(s): reduce s as $x (0; . + 1);

### Kruskal-Wallis statistic

# The arguments are assumed to be arrays of boolean values.
def kruskal($a; $b; $c):
  def tobits: map(if . then 1 else 0 end);
  # map the boolean values to 1 (true) or 0 (false).
  ($a|tobits) as $aa
  | ($b|tobits) as $bb
  | ($c|tobits) as $cc
  # aggregate and sort them
  | (($aa + $bb + $cc)|sort) as $ss
  # find rank of first occurrence of 1
  | (($ss|index(1)) + 1) as $ix
  # calculate average ranks for 0 and 1
  | ($ix / 2) as $arf
  | ($ss|length) as $n
  | (($ix + $n) / 2) as $art
  # calculate sum of ranks for each list
  | if any($a,$b,$c; length == 0) then null
  else
    sum($a[] | if . then $art else $arf end) as $sra
  | sum($b[] | if . then $art else $arf end) as $srb
  | sum($c[] | if . then $art else $arf end) as $src
  # calculate H
  | (12/($n*($n+1))) * ($sra*$sra/($a|length) + $srb*$srb/($b|length) + $src*$src/($c|length)) - 3 * ($n + 1)
    end;

### Pseuo-random numbers

# Output: a prn in range(0;$n) where $n is `.`
def prn:
  if . == 1 then 0
  else . as $n
  | ([1, (($n-1)|tostring|length)]|max) as $w
  | [limit($w; inputs)] | join("") | tonumber
  | if . < $n then . else ($n | prn) end
  end;

def sample:
  if length == 0 # e.g. null or []
  then null
  else .[length|prn]
  end;

def randFloat:
  (1000|prn) / 1000;

### Simulation

def UNTREATED: 0;
def IRREGULAR: 1;
def REGULAR:   2;
def DOSE_FOR_REGULAR: 100;

def Subject:
  {cumDose:  0,
   category: UNTREATED,
   hadCovid: false,
   updateCount: 0 };

def Subject($cumDose; $category; $hadCovid; $updateCount):
  {$cumDose, $category, $hadCovid, $updateCount};

# Set treatment category based on cumulative treatment taken.
def categorize:
  if .cumDose == 0 then UNTREATED
  elif .cumDose >= DOSE_FOR_REGULAR then REGULAR
  else IRREGULAR
  end;

# Daily update on the input Subject to check for infection and randomly dose.
def update($pCovid; $pStartingTreatment; $pRedose; $dRange):
  if (.hadCovid|not)
  then if (randFloat < $pCovid) 
       then .hadCovid = true
       else if (.cumDose == 0 and randFloat < $pStartingTreatment) or 
               (.cumDose  > 0 and randFloat < $pRedose)
            then .cumDose += ($dRange|sample)
            | .category = categorize
            end
       end
  | .updateCount += 1
  end;

# Update using default parameters.
def update: update(0.001; 0.005; 0.25; [3, 6, 9]);

# Run the study using the population of size $numSubjects for $duration days.
def runStudy($numSubjects; $duration; $interval):
  def r: . * 1000 | round / 1000;
  def div($i; $j): if $j == 0 then 0 else $i / $j | r | lpad(6) end;
  
  { population: [range(0; $numSubjects) | Subject],
    unt: 0,
    untCovid: 0,
    irr: 0,
    irrCovid: 0,
    reg:  0,
    regCovid: 0
  }
  | "Total subjects: \($numSubjects)",
    (foreach range(0; $duration) as $day (.;
       reduce range(0; $numSubjects) as $i (.; .population[$i] |= update) ;
       
       if ( ($day + 1) % $interval == 0 ) or $day == (($duration/2)|floor) - 1 or $day == $duration-1
       then
           .unt      = count(.population[] | select(.category == UNTREATED))
         | .untCovid = count(.population[] | select( .category == UNTREATED and .hadCovid ))
         | .irr      = count(.population[] | select(.category == IRREGULAR ))
         | .irrCovid = count(.population[] | select(.category == IRREGULAR and .hadCovid))
         | .reg      = count(.population[] | select (.category == REGULAR ))
         | .regCovid = count(.population[] | select(.category == REGULAR and .hadCovid))
       | (select( ($day + 1) % $interval == 0 )
          | "Day \($day + 1):",
            "Untreated: N = \(.unt), with infection = \(.untCovid)",
            "Regular Use: N = \(.reg), with infection = \(.regCovid)" ),

         (select($day == (($duration/2)|floor) - 1)
          | "\nAt midpoint, Infection case percentages are:",
            "  Untreated : \(div(100 * .untCovid ; .unt))",
            "  Irregulars: \(div(100 * .irrCovid ; .irr))",
            "  Regulars  : \(div(100 * .regCovid ; .reg))" ),
 
         (select($day == $duration-1)
          | "\nAt study end, infection case percentages are:",
            "  Untreated : \(div(100 * .untCovid ; .unt)) of group size \(.unt)",
            "  Irregulars: \(div(100 * .irrCovid ; .irr)) of group size \(.irr)",
            "  Regulars  : \(div(100 * .regCovid ; .reg)) of group size \(.reg)",
            ( (.population | map(select(.category == UNTREATED)) | map(.hadCovid)) as $untreated
            | (.population | map(select(.category == IRREGULAR)) | map(.hadCovid)) as $irregular
            | (.population | map(select(.category == REGULAR))   | map(.hadCovid)) as $regular
            | "\nFinal statistics: H = \(kruskal($untreated; $irregular; $regular)|r)"      ) )
       else empty
       end ) );

runStudy(10000; 180; 60)

Output:

Total subjects: 10000

Day 60:
Untreated: N = 7471, with infection = 508
Regular Use: N = 185, with infection = 0

At midpoint, Infection case percentages are:
  Untreated :  10.74
  Irregulars:    6.2
  Regulars  :   1.43

Day 120:
Untreated: N = 5780, with infection = 880
Regular Use: N = 2139, with infection = 75

Day 180:
Untreated: N = 4583, with infection = 1127
Regular Use: N = 3823, with infection = 249

At study end, infection case percentages are:
  Untreated : 24.591 of group size 4583
  Irregulars: 18.068 of group size 1594
  Regulars  :  6.513 of group size 3823

Final statistics: H = 205.487

Julia

using HypothesisTests

@enum TreatmentClass Untreated Irregular Regular

mutable struct Subject
    cum_dose::Float64
    treatment_class::TreatmentClass
    had_covid::Bool
end

function update!(subjects::Vector{Subject}, pcovid = 0.001, pstart = 0.005, pdosing = 0.25, drange = 3:3:9)
    for subj in subjects
        if subj.had_covid
            continue
        elseif rand() < pcovid
            subj.had_covid = true
        elseif subj.cum_dose > 0 && rand() <= pdosing || subj.cum_dose == 0 && rand() <= pstart
            subj.cum_dose += rand(drange)
            subj.treatment_class =
               subj.cum_dose == 0 ? Untreated : subj.cum_dose >= 100 ? Regular : Irregular
        end
    end
end

function run_study(N = 10_000, duration = 180)
    population = [Subject(0.0, Untreated, false) for _ in 1:N]
    unt, unt_covid, irr, irr_covid, reg, reg_covid = 0, 0, 0, 0, 0, 0
    println("Population size $N, daily infection risk 0.1%")
    for day in 1:duration
        update!(population)
        if day % 30 == 0
            println("\nDay $day:")
            unt = count(s -> s.treatment_class == Untreated, population)
            unt_covid = count(s -> (s.treatment_class == Untreated) && s.had_covid, population)
            println("Untreated: N = $unt, with infection = $unt_covid")
            irr = count(s -> s.treatment_class == Irregular, population)
            irr_covid = count(s -> (s.treatment_class == Irregular) && s.had_covid, population)
            println("Irregular Use: N = $irr, with infection = $irr_covid")
            reg = count(s -> s.treatment_class == Regular, population)
            reg_covid = count(s -> (s.treatment_class == Regular) && s.had_covid, population)
            println("Regular Use: N = $reg, with infection = $reg_covid")
        end
        if day == 90
            println("\nAt midpoint, Infection case percentages are:")
            println("  Untreated : ", Float16(100 * unt_covid / unt))
            println("  Irregulars: ", Float16(100 * irr_covid / irr))
            println("  Regulars  : ", Float16(100 * reg_covid / reg))
        end
    end
    println("\nAt study end, Infection case percentages are:")
    println("  Untreated : ", Float16(100 * unt_covid / unt), " of group size of $unt")
    println("  Irregulars: ", Float16(100 * irr_covid / irr), " of group size of $irr")
    println("  Regulars  : ", Float16(100 * reg_covid / reg), " of group size of $reg")
    untreated = [s.had_covid for s in population if s.treatment_class == Untreated]
    irregular = [s.had_covid for s in population if s.treatment_class == Irregular]
    regular = [s.had_covid for s in population if s.treatment_class == Regular]
    println("\n\n   Final statistics:\n")
    @show KruskalWallisTest(untreated, irregular, regular)
end

run_study()

Output:

Population size 10000, daily infection risk 0.1%

Day 30:
Untreated: N = 8633, with infection = 288
Irregular Use: N = 1367, with infection = 21
Regular Use: N = 0, with infection = 0

Day 60:
Untreated: N = 7513, with infection = 519
Irregular Use: N = 2325, with infection = 79
Regular Use: N = 162, with infection = 2

Day 90:
Untreated: N = 6559, with infection = 692
Irregular Use: N = 2362, with infection = 159
Regular Use: N = 1079, with infection = 24

At midpoint, Infection case percentages are:
  Untreated : 10.55
  Irregulars: 6.73
  Regulars  : 2.225

Day 120:
Untreated: N = 5794, with infection = 845
Irregular Use: N = 2071, with infection = 221
Regular Use: N = 2135, with infection = 72

Day 150:
Untreated: N = 5115, with infection = 987
Irregular Use: N = 1835, with infection = 266
Regular Use: N = 3050, with infection = 156

Day 180:
Untreated: N = 4538, with infection = 1106
Irregular Use: N = 1654, with infection = 302
Regular Use: N = 3808, with infection = 263

At study end, Infection case percentages are:
  Untreated : 24.38 of group size of 4538
  Irregulars: 18.27 of group size of 1654
  Regulars  : 6.906 of group size of 3808


   Final statistics:

KruskalWallisTest(untreated, irregular, regular) = Kruskal-Wallis rank sum test (chi-square approximation)
-------------------------------------------------------
Population details:
    parameter of interest:   Location parameters
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-99

Details:
    number of observation in each group: [4538, 1654, 3808]
    χ²-statistic:                        457.179
    rank sums:                           [2.44308e7, 8.39891e6, 1.71753e7]
    degrees of freedom:                  2
    adjustment for ties:                 0.417533

Kruskal-Wallis rank sum test (chi-square approximation)
-------------------------------------------------------
Population details:
    parameter of interest:   Location parameters
    value under h_0:         "all equal"
    point estimate:          NaN

Test summary:
    outcome with 95% confidence: reject h_0
    one-sided p-value:           <1e-99

Details:
    number of observation in each group: [4538, 1654, 3808]
    χ²-statistic:                        457.179
    rank sums:                           [2.44308e7, 8.39891e6, 1.71753e7]
    degrees of freedom:                  2
    adjustment for ties:                 0.417533

Nim

Translation of: Python

Translation of: Wren

import std/[algorithm, math, random, sequtils, strformat, sugar]

type

  TreatmentClass {.pure.} = enum Untreated, Irregular, Regular

  Subject = object
    cumDose: float
    category: TreatmentClass
    hadCovid: bool

const DoseForRegular = 100

func categorize(subject: var Subject) =
  ## Set treatment category based on cumulative treatment taken.
  subject.category = if subject.cumDose == 0: Untreated
                     elif subject.cumDose >= DoseForRegular: Regular
                     else: Irregular

proc update(subject: var Subject; pCovid = 0.001;
            pStart = 0.005; pDosing = 0.25; doses = @[3, 6, 9]) =
  ## Daily update on the subject to check for infection and randomly dose.
  if not subject.hadCovid:
    if rand(1.0) < pCovid:
      subject.hadCovid = true
    elif (subject.cumDose == 0 and rand(1.0) < pStart) or
         (subject.cumDose > 0 and rand(1.0) < pDosing):
      subject.cumDose += sample(doses).toFloat
      subject.categorize()

func kruskalWallis(a, b, c: seq[bool]): float =

  # Aggregate and sort.
  let s = sorted(a & b & c)
  # Find rank of first occurrence of "true".
  let ix = s.find(true) + 1
  # Calculate average ranks for "false" and "true".
  let n = s.len
  let arf = ix / 2
  let art = (ix + n) / 2
  # Calculate sum of ranks for each list.
  let sra = sum(a.mapIt(if it: art else: arf))
  let srb = sum(b.mapIt(if it: art else: arf))
  let src = sum(c.mapIt(if it: art else: arf))
  # Calculate H.
  result = 12 / (n * (n + 1)) * (sra * sra / a.len.toFloat + srb * srb / b.len.toFloat +
                                 src * src / c.len.toFloat) - 3 * float(n + 1)

proc runStudy(numSubjects = 1000; duration = 180; interval= 30) =
  ## Run the study using the population of size "numSubjects" for "duration" days.
  var population = newSeqWith(numSubjects, Subject())
  var unt, untCovid, irr, irrCovid, reg, regCovid = 0
  echo &"Total subjects: {num_subjects}"

  for day in 1..duration:
    for subj in population.mitems:
      subj.update()

    if day mod interval == 0:
      echo &"\nDay {day}:"
      unt = population.countIt(it.category == Untreated)
      untCovid = population.countIt(it.category == Untreated and it.hadCovid)
      echo &"Untreated: N = {unt}, with infection = {untCovid}"
      irr = population.countIt(it.category == IRREGULAR)
      irrCovid = population.countIt(it.category == IRREGULAR and it.hadCovid)
      echo &"Irregular Use: N = {irr}, with infection = {irrCovid}"
      reg = population.countIt(it.category == REGULAR)
      regCovid = population.countIt(it.category == REGULAR and it.hadCovid)
      echo &"Regular Use: N = {reg}, with infection = {reg_covid}"

    if day == duration div 2:
      echo "\nAt midpoint, infection case percentages are:"
      echo "  Untreated : ", 100 * untCovid / unt
      echo "  Irregulars: ", 100 * irrCovid / irr
      echo "  Regulars  : ", 100 * regCovid / reg

  echo "\nAt study end, infection case percentages are:"
  echo &"  Untreated : {100 * untCovid / unt} of group size of {unt}"
  echo &"  Irregulars: {100 * irrCovid / irr} of group size of {irr}"
  echo &"  Regulars  : {100 * regCovid / reg} of group size of {reg}"
  let untreated = collect:
                    for s in population:
                      if s.category == Untreated:
                        s.hadCovid
  let irregular = collect:
                    for s in population:
                      if s.category == Irregular:
                        s.hadCovid
  let regular = collect:
                  for s in population:
                    if s.category == Regular:
                      s.hadCovid
  echo "\nFinal statistics: ", kruskalWallis(untreated, irregular, regular)

randomize()
runStudy(10_000)

Output:

Total subjects: 10000

Day 30:
Untreated: N = 8619, with infection = 283
Irregular Use: N = 1379, with infection = 27
Regular Use: N = 2, with infection = 0

Day 60:
Untreated: N = 7480, with infection = 498
Irregular Use: N = 2362, with infection = 77
Regular Use: N = 158, with infection = 3

Day 90:
Untreated: N = 6479, with infection = 712
Irregular Use: N = 2440, with infection = 151
Regular Use: N = 1081, with infection = 24

At midpoint, infection case percentages are:
  Untreated : 10.98935020836549
  Irregulars: 6.188524590163935
  Regulars  : 2.220166512488436

Day 120:
Untreated: N = 5655, with infection = 873
Irregular Use: N = 2199, with infection = 210
Regular Use: N = 2146, with infection = 70

Day 150:
Untreated: N = 4998, with infection = 1007
Irregular Use: N = 1910, with infection = 257
Regular Use: N = 3092, with infection = 135

Day 180:
Untreated: N = 4479, with infection = 1122
Irregular Use: N = 1575, with infection = 302
Regular Use: N = 3946, with infection = 232

At study end, infection case percentages are:
  Untreated : 25.05023442732753 of group size of 4479
  Irregulars: 19.17460317460317 of group size of 1575
  Regulars  : 5.879371515458693 of group size of 3946

Final statistics: 235.1089104422972

Perl

Translation of: Raku

use v5.036;
use List::AllUtils <sum0 indexes firstidx>;
use enum qw<False True UNTREATED REGULAR IRREGULAR>;
use constant DOSE_FOR_REGULAR => 100;

my ($nSubjects,$duration,$interval) = (10000, 180, 30);
my (@dosage)   = (0)         x $nSubjects;
my (@category) = (UNTREATED) x $nSubjects;
my (@hadcovid) = (False)     x $nSubjects;

sub update {
    my $pCovid          = 1e-3;
    my $pStartTreatment = 5e-3;
    my $pRedose         = 1/4;
    my @dRange          = <3 6 9>;
    for my $i (0 .. $nSubjects-1) {
        unless ($hadcovid[$i]) {
            if (rand() < $pCovid) {
                $hadcovid[$i] = True
            } else {
                my $dose = $dosage[$i];
                if ($dose==0 && rand() < $pStartTreatment or $dose > 0 && rand() < $pRedose) {
                    $dosage[$i]   = $dose += $dRange[3*rand()];
                    $category[$i] = ($dose > DOSE_FOR_REGULAR) ? REGULAR : IRREGULAR
                }
            }
        }
    }
}

sub kruskal (@sets) {
     my @ranked     = sort @{$sets[0]}, @{$sets[1]}, @{$sets[2]};
     my $n          = @ranked;
     my @sr         = (0) x @sets;
     my $ix         = firstidx { $_ == 1 } @ranked;
     my ($arf,$art) = ($ix/2, ($ix+$n)/2);

    for my $i (0..2) {
        $sr[$i] += $_ ? $art : $arf for @{$sets[$i]}
    }

    my $H = sum0 map { my $s = $sr[$_]; $s**2 / @{$sets[$_]} } 0..$#sr;
    12/($n*($n+1)) * $H - 3 * ($n + 1)
}

my($unt,$irr,$reg,$hunt,$hirr,$hreg,@sunt,@sirr,@sreg);
my $midpoint = int $duration / 2;

say "Total subjects: $nSubjects\n";

for my $day (1 .. $duration) {
    update();
    if (0 == $day % $interval or $day == $duration or $day == $midpoint) {
        @sunt = @hadcovid[ indexes { $_ == UNTREATED } @category];
        @sirr = @hadcovid[ indexes { $_ == IRREGULAR } @category];
        @sreg = @hadcovid[ indexes { $_ ==   REGULAR } @category];
        ( $unt, $irr, $reg) = (scalar(@sunt), scalar(@sirr), scalar(@sreg));
        ($hunt,$hirr,$hreg) = (  sum0(@sunt),   sum0(@sirr),   sum0(@sreg));
    }

    if (0 == $day % $interval) {
        printf "Day %d:\n", $day;
        printf "Untreated:     N = %4d, with infection = %4d\n",  $unt,$hunt;
        printf "Irregular Use: N = %4d, with infection = %4d\n",  $irr,$hirr;
        printf "Regular Use:   N = %4d, with infection = %4d\n\n",$reg,$hreg;
    }

    if ($day == $midpoint or $day == $duration) {
        my $stage = $day == $midpoint ? 'midpoint' : 'study end';
        printf "At $stage, Infection case percentages are:\n";
        printf "  Untreated : %6.2f\n",   100*$hunt/$unt;
        printf "  Irregulars: %6.2f\n",   100*$hirr/$irr;
        printf "  Regulars  : %6.2f\n\n", 100*$hreg/$reg;
    }
}

printf "Final statistics: H = %.2f\n", kruskal ( \@sunt, \@sirr, \@sreg );

Output:

Total subjects: 10000

Day 30:
Untreated:     N = 8640, with infection =  283
Irregular Use: N = 1360, with infection =   18
Regular Use:   N =    0, with infection =    0

Day 60:
Untreated:     N = 7499, with infection =  532
Irregular Use: N = 2338, with infection =   75
Regular Use:   N =  163, with infection =    1

Day 90:
Untreated:     N = 6537, with infection =  709
Irregular Use: N = 2382, with infection =  144
Regular Use:   N = 1081, with infection =   12

At midpoint, Infection case percentages are:
  Untreated :  10.85
  Irregulars:   6.05
  Regulars  :   1.11

Day 120:
Untreated:     N = 5739, with infection =  855
Irregular Use: N = 2102, with infection =  216
Regular Use:   N = 2159, with infection =   52

Day 150:
Untreated:     N = 5091, with infection = 1012
Irregular Use: N = 1792, with infection =  270
Regular Use:   N = 3117, with infection =  134

Day 180:
Untreated:     N = 4468, with infection = 1126
Irregular Use: N = 1669, with infection =  315
Regular Use:   N = 3863, with infection =  243

At study end, Infection case percentages are:
  Untreated :  25.20
  Irregulars:  18.87
  Regulars  :   6.29

Final statistics: H = 218.74

Phix

Translation of: Wren

with javascript_semantics
sequence dosage, category, hadcovid
enum UNTREATED, REGULAR, IRREGULAR
constant DOSE_FOR_REGULAR = 100

procedure update(atom pCovid=0.001, pStartTreatment=0.005, pRedose=0.25, sequence dRange={3,6,9})
    for i=1 to length(dosage) do
        if not hadcovid[i] then
            if rnd()<pCovid then
                hadcovid[i] = true
            else
                atom dose = dosage[i]
                if (dose=0 and rnd()<pStartTreatment)
                or (dose>0 and rnd()<pRedose) then
                    dose += dRange[rand(length(dRange))]
                    dosage[i] = dose
                    category[i] = iff(dose>DOSE_FOR_REGULAR?REGULAR:IRREGULAR)
                end if
            end if
        end if
    end for
end procedure

function kruskal(sequence sets)
    sequence ranked = sort(flatten(sets)),
             sr = repeat(0,length(sets))
    integer ix = find(true,ranked),
            n = length(ranked)
    atom arf = ix/2, art = (ix+n)/2
    for i,set in sets do
        for b in set do
            sr[i] += iff(b?art:arf)
        end for
    end for
    atom H = 0
    for i,s in sr do
        H += s*s/length(sets[i])
    end for
    H = 12/(n*(n+1)) * H - 3 * (n + 1)
    return H
end function

procedure run_study(integer nSubjects=10000, duration=180, interval=30)
    dosage = repeat(0,nSubjects)
    category = repeat(UNTREATED,nSubjects)
    hadcovid = repeat(false,nSubjects)
    printf(1,"Total subjects: %d\n\n",nSubjects)
    sequence sunt,sirr,sreg
    integer unt,irr,reg,hunt,hirr,hreg,midpoint=floor(duration/2)
    for day=1 to duration do
        update()
        if rmdr(day,interval)=0 or day=duration or day=midpoint then
            sunt = extract(hadcovid,find_all(UNTREATED,category))
            sirr = extract(hadcovid,find_all(IRREGULAR,category))
            sreg = extract(hadcovid,find_all(REGULAR,category))
            unt = length(sunt); hunt = sum(sunt)
            irr = length(sirr); hirr = sum(sirr)
            reg = length(sreg); hreg = sum(sreg)
        end if
        if rmdr(day,interval)=0 then
            printf(1,"Day %d:\n",day)
            printf(1,"Untreated: N = %d, with infection = %d\n",{unt,hunt})
            printf(1,"Irregular Use: N = %d, with infection = %d\n",{irr,hirr})
            printf(1,"Regular Use: N = %d, with infection = %d\n\n",{reg,hreg})
        end if
        if day=midpoint then
            printf(1,"At midpoint, Infection case percentages are:\n")
            printf(1,"  Untreated : %f\n",100*hunt/unt)
            printf(1,"  Irregulars: %f\n",100*hirr/irr)
            printf(1,"  Regulars  : %f\n\n",100*hreg/reg)
        end if
    end for
    printf(1,"At study end, Infection case percentages are:\n")
    printf(1,"  Untreated : %f\n",100*hunt/unt)
    printf(1,"  Irregulars: %f\n",100*hirr/irr)
    printf(1,"  Regulars  : %f\n\n",100*hreg/reg)
    printf(1,"Final statistics: H = %f\n",kruskal({sunt, sirr, sreg}))
end procedure

run_study()

Aside: the best explanation of the Kruskal-Wallis Test I found was http://www.statisticslectures.com/topics/kruskalwallis/ but I didn't implement anything that I found there.

Output:

Total subjects: 10000

Day 30:
Untreated: N = 8628, with infection = 285
Irregular Use: N = 1372, with infection = 23
Regular Use: N = 0, with infection = 0

Day 60:
Untreated: N = 7460, with infection = 518
Irregular Use: N = 2366, with infection = 72
Regular Use: N = 174, with infection = 0

Day 90:
Untreated: N = 6497, with infection = 730
Irregular Use: N = 2399, with infection = 151
Regular Use: N = 1104, with infection = 17

At midpoint, Infection case percentages are:
  Untreated : 11.235955
  Irregulars: 6.294289
  Regulars  : 1.539855

Day 120:
Untreated: N = 5709, with infection = 879
Irregular Use: N = 2102, with infection = 220
Regular Use: N = 2189, with infection = 66

Day 150:
Untreated: N = 5058, with infection = 1005
Irregular Use: N = 1846, with infection = 273
Regular Use: N = 3096, with infection = 152

Day 180:
Untreated: N = 4486, with infection = 1109
Irregular Use: N = 1624, with infection = 315
Regular Use: N = 3890, with infection = 253

At study end, Infection case percentages are:
  Untreated : 24.721355
  Irregulars: 19.396552
  Regulars  : 6.503856

Final statistics: H = 211.421331

Python

''' Rosetta code rosettacode.org/wiki/Study_Bias_in_Clinical_Sciences '''


from random import randrange
from numpy.random import rand
from scipy.stats import kruskal

UNTREATED = 0
IRREGULAR = 1
REGULAR = 2
DOSE_FOR_REGULAR = 100


class Subject:
    ''' A subject for the study '''

    def __init__(self):
        self.cum_dose = 0.0
        self.category = UNTREATED
        self.had_covid = False
        self.update_count = 0

    def update(self, p_covid=0.001, p_starting_treatment=0.005, p_redose=0.25, drange=(3, 10, 3)):
        ''' daily update on the subject to check for infection and randomly dose. '''
        if not self.had_covid:
            if rand() < p_covid:
                self.had_covid = True
            elif (self.cum_dose == 0 and rand() < p_starting_treatment) or\
                 (self.cum_dose > 0 and rand() < p_redose):
                self.cum_dose += randrange(*drange)
                self.categorize()
        self.update_count += 1

    def categorize(self):
        ''' Set treatment category based on cumulative treatment taken. '''
        self.category = UNTREATED if self.cum_dose == 0 else REGULAR if\
            self.cum_dose >= DOSE_FOR_REGULAR else IRREGULAR
        return self.category


def run_study(num_subjects=1000, duration=180, interval=30):
    ''' Run the study using the population of size `N` for `duration` days. '''
    population = [Subject() for _ in range(num_subjects)]
    unt, unt_covid, irr, irr_covid, reg, reg_covid = 0, 0, 0, 0, 0, 0
    print(f'Total subjects: {num_subjects:,}')
    for day in range(duration):
        for subj in population:
            subj.update()

        if (day + 1) % interval == 0:
            print(f'\nDay {day + 1}:')
            unt = sum(s.category == UNTREATED for s in population)
            unt_covid = sum(s.category ==
                            UNTREATED and s.had_covid for s in population)
            print(f'Untreated: N = {unt}, with infection = {unt_covid}')
            irr = sum(s.category == IRREGULAR for s in population)
            irr_covid = sum(s.category ==
                            IRREGULAR and s.had_covid for s in population)
            print(f'Irregular Use: N = {irr}, with infection = {irr_covid}')
            reg = sum(s.category == REGULAR for s in population)
            reg_covid = sum(s.category ==
                            REGULAR and s.had_covid for s in population)
            print(f'Regular Use: N = {reg}, with infection = {reg_covid}')

        if day == duration // 2 - 1:
            print('\nAt midpoint, Infection case percentages are:')
            print('  Untreated : ', 100 * unt_covid / unt)
            print('  Irregulars: ', 100 * irr_covid / irr)
            print('  Regulars  : ', 100 * reg_covid / reg)

    print('\nAt study end, Infection case percentages are:')
    print(f'  Untreated : {100 * unt_covid / unt} of group size of {unt}')
    print(f'  Irregulars: {100 * irr_covid / irr} of group size of {irr}')
    print(f'  Regulars  : {100 * reg_covid / reg} of group size of {reg}')
    untreated = [
        s.had_covid for s in population if s.category == UNTREATED]
    irregular = [
        s.had_covid for s in population if s.category == IRREGULAR]
    regular = [s.had_covid for s in population if s.category == REGULAR]
    print('\nFinal statistics: ', kruskal(untreated, irregular, regular))


run_study()

Output:

Total subjects: 1,000

Day 30:
Untreated: N = 872, with infection = 25
Irregular Use: N = 128, with infection = 2
Regular Use: N = 0, with infection = 0

Day 60:
Untreated: N = 755, with infection = 55
Irregular Use: N = 222, with infection = 8
Regular Use: N = 23, with infection = 1

Day 90:
Untreated: N = 671, with infection = 70
Irregular Use: N = 219, with infection = 13
Regular Use: N = 110, with infection = 4

At midpoint, Infection case percentages are:
  Untreated :  10.432190760059612
  Irregulars:  5.936073059360731
  Regulars  :  3.6363636363636362

Day 120:
Untreated: N = 600, with infection = 88
Irregular Use: N = 189, with infection = 17
Regular Use: N = 211, with infection = 8

Day 150:
Untreated: N = 514, with infection = 108
Irregular Use: N = 194, with infection = 21
Regular Use: N = 292, with infection = 16

Day 180:
Untreated: N = 447, with infection = 119
Irregular Use: N = 189, with infection = 26
Regular Use: N = 364, with infection = 26

At study end, Infection case percentages are:
  Untreated : 26.62192393736018 of group size of 447
  Irregulars: 13.756613756613756 of group size of 189
  Regulars  : 7.142857142857143 of group size of 364

Final statistics:  KruskalResult(statistic=55.48204323818349, pvalue=8.95833684545873e-13)

Raku

Translation of: Phix

# 20221025 Raku programming solution

enum <UNTREATED REGULAR IRREGULAR>;
my \DOSE_FOR_REGULAR = 100;
my ($nSubjects,$duration,$interval) = 10000, 180, 30;
my (@dosage,@category,@hadcovid) := (0,UNTREATED,False)>>.&{($_ xx $nSubjects).Array};

sub update($pCovid=1e-3, $pStartTreatment=5e-3, $pRedose=¼, @dRange=<3 6 9>) {
   for 0 ..^ @dosage.elems -> \i { 
      unless @hadcovid[i] { 
         if rand < $pCovid { 
            @hadcovid[i] = True
         } else {
            my $dose = @dosage[i];
            if $dose==0 && rand < $pStartTreatment or $dose > 0 && rand < $pRedose {               
	           @dosage[i]   = $dose += @dRange.roll;
               @category[i] = ($dose > DOSE_FOR_REGULAR) ?? REGULAR !! IRREGULAR
            } 
         } 
      } 
   } 
}

sub kruskal (@sets) {
   my $n          = ( my @ranked = @sets>>.List.flat.sort ).elems;
   my @sr         = 0 xx @sets.elems;
   my $ix         = (@ranked.first: * == True, :k)+1, 
   my ($arf,$art) = ($ix, $ix+$n) >>/>> 2;

   for @sets.kv -> \i,@set { for @set -> $b { @sr[i] += $b ?? $art !! $arf } } 

   my $H = [+] @sr.kv.map: -> \i,\s { s*s/@sets[i].elems }
   return 12/($n*($n+1)) * $H - 3 * ($n + 1)
}

say "Total subjects: $nSubjects\n";

my ($midpoint,$unt,$irr,$reg,$hunt,$hirr,$hreg,@sunt,@sirr,@sreg)=$duration div 2;

for 1 .. $duration -> \day { 
   update();
   if day %% $interval or day == $duration or day == $midpoint { 
      @sunt = @hadcovid[ @category.grep: UNTREATED,:k ];
      @sirr = @hadcovid[ @category.grep: IRREGULAR,:k ]; 
      @sreg = @hadcovid[ @category.grep: REGULAR,  :k ]; 
      ($unt,$hunt,$irr,$hirr,$reg,$hreg)=(@sunt,@sirr,@sreg).map:{|(.elems,.sum)}
   } 
   if day %% $interval {
      printf "Day %d:\n",day;
      printf "Untreated:     N = %4d, with infection = %4d\n",  $unt,$hunt;
      printf "Irregular Use: N = %4d, with infection = %4d\n",  $irr,$hirr;
      printf "Regular Use:   N = %4d, with infection = %4d\n\n",$reg,$hreg
   } 
   if day == $midpoint | $duration { 
      my $stage = day == $midpoint ?? 'midpoint' !! 'study end';
      printf "At $stage, Infection case percentages are:\n";
      printf "  Untreated : %f\n",  100*$hunt/$unt;
      printf "  Irregulars: %f\n",  100*$hirr/$irr;
      printf "  Regulars  : %f\n\n",100*$hreg/$reg
   }    
}

printf "Final statistics: H = %f\n", kruskal ( @sunt, @sirr, @sreg )

Output:

Total subjects: 10000

Day 30:
Untreated:     N = 8616, with infection =  314
Irregular Use: N = 1383, with infection =   27
Regular Use:   N =    1, with infection =    0

Day 60:
Untreated:     N = 7511, with infection =  561
Irregular Use: N = 2308, with infection =   87
Regular Use:   N =  181, with infection =    1

Day 90:
Untreated:     N = 6571, with infection =  760
Irregular Use: N = 2374, with infection =  141
Regular Use:   N = 1055, with infection =   19

At midpoint, Infection case percentages are:
  Untreated : 11.565972
  Irregulars: 5.939343
  Regulars  : 1.800948

Day 120:
Untreated:     N = 5757, with infection =  909
Irregular Use: N = 2097, with infection =  189
Regular Use:   N = 2146, with infection =   61

Day 150:
Untreated:     N = 5092, with infection = 1054
Irregular Use: N = 1832, with infection =  238
Regular Use:   N = 3076, with infection =  132

Day 180:
Untreated:     N = 4530, with infection = 1172
Irregular Use: N = 1615, with infection =  282
Regular Use:   N = 3855, with infection =  229

At study end, Infection case percentages are:
  Untreated : 25.871965
  Irregulars: 17.461300
  Regulars  : 5.940337

Final statistics: H = 248.419454

Wren

Translation of: Python

Library: Wren-math

import "random" for Random
import "./math" for Nums

var Rand = Random.new()

var UNTREATED = 0
var IRREGULAR = 1
var REGULAR   = 2
var DOSE_FOR_REGULAR = 100

/* A subject for the study. */
class Subject {
    construct new() {
        _cumDose = 0
        _category = UNTREATED
        _hadCovid = false
        _updateCount = 0
    }

    cumDose     { _cumDose }
    category    { _category }
    hadCovid    { _hadCovid }
    updateCount { _updateCount }

    cumDose(d)     { _cumDose = d }
    category(c)    { _category = c }
    hadCovid(h)    { _hadCovid = h }
    updateCount(u) { _updateCount = u }

    // Daily update on the subject to check for infection and randomly dose.
    update(pCovid, pStartingTreatment, pRedose, dRange) {
        if (!_hadCovid) {
            if (Rand.float() < pCovid) {
                _hadCovid = true
            } else if ((_cumDose == 0 && Rand.float() < pStartingTreatment) ||
                       (_cumDose > 0 && Rand.float() < pRedose)) {
                _cumDose = _cumDose + Rand.sample(dRange)
                categorize()
            }
        }
        _updateCount = _updateCount + 1
    }

    // Update using default parameters.
    update() { update(0.001, 0.005, 0.25, [3, 6, 9]) }

    // Set treatment category based on cumulative treatment taken.
    categorize() {
        _category = (_cumDose == 0) ? UNTREATED :
                    (_cumDose >= DOSE_FOR_REGULAR) ? REGULAR : IRREGULAR
        return _category
    }
}

// a, b and c are assumed to be lists of boolean values.
var kruskal = Fn.new { |a, b, c|
    // map the bool values to 1 (true) or 0 (false).
    var aa = a.map { |e| e ? 1: 0 }.toList
    var bb = b.map { |e| e ? 1: 0 }.toList
    var cc = b.map { |e| e ? 1: 0 }.toList
    // aggregate and sort them
    var ss = (aa + bb + cc).sort()
    // find rank of first occurrence of 1
    var ix = ss.indexOf(1) + 1
    // calculate average ranks for 0 and 1
    var arf = (1 + ix - 1) / 2
    var n = ss.count
    var art = (ix + n) / 2
    // calculate sum of ranks for each list
    var sra = Nums.sum(a.map { |e| e ? art : arf })
    var srb = Nums.sum(b.map { |e| e ? art : arf })
    var src = Nums.sum(c.map { |e| e ? art : arf })
    // calculate H
    var H = 12/(n*(n+1)) * (sra*sra/a.count + srb*srb/b.count + src*src/c.count) - 3 * (n + 1)
    return H
}

// Run the study using the population of size 'N' for 'duration' days.
var runStudy = Fn.new { |numSubjects, duration, interval|
    var population = List.filled(numSubjects, null)
    for (i in 0...numSubjects) population[i] = Subject.new()
    var unt = 0
    var untCovid = 0
    var irr = 0
    var irrCovid = 0
    var reg = 0
    var regCovid = 0
    System.print("Total subjects: %(numSubjects)")
    for (day in 0...duration) {
        for (subj in population) subj.update()
        if ((day + 1) % interval == 0) {
            System.print("\nDay %(day + 1):")
            unt = population.count { |s| s.category == UNTREATED }
            untCovid = population.count { |s| s.category == UNTREATED && s.hadCovid }
            System.print("Untreated: N = %(unt), with infection = %(untCovid)")
            irr = population.count { |s| s.category == IRREGULAR }
            irrCovid = population.count { |s| s.category == IRREGULAR && s.hadCovid }
            reg = population.count { |s| s.category == REGULAR }
            regCovid = population.count { |s| s.category == REGULAR && s.hadCovid }
            System.print("Regular Use: N = %(reg), with infection = %(regCovid)")
        }
        if (day == (duration/2).floor - 1) {
            System.print("\nAt midpoint, Infection case percentages are:")
            System.print("  Untreated : %(100 * untCovid / unt)")
            System.print("  Irregulars: %(100 * irrCovid / irr)")
            System.print("  Regulars  : %(100 * regCovid / reg)")
        }
    }
    System.print("\nAt study end, Infection case percentages are:")
    System.print("  Untreated : %(100 * untCovid / unt) of group size of %(unt)")
    System.print("  Irregulars: %(100 * irrCovid / irr) of group size of %(irr)")
    System.print("  Regulars  : %(100 * regCovid / reg) of group size of %(reg)")
    var untreated = population.where { |s| s.category == UNTREATED }.map { |s| s.hadCovid }.toList
    var irregular = population.where { |s| s.category == IRREGULAR }.map { |s| s.hadCovid }.toList
    var regular   = population.where { |s| s.category == REGULAR   }.map { |s| s.hadCovid }.toList
    System.print("\nFinal statistics: H = %(kruskal.call(untreated, irregular, regular))")
}

runStudy.call(1000, 180, 30)

Output:

Sample run.

Total subjects: 1000

Day 30:
Untreated: N = 882, with infection = 32
Regular Use: N = 0, with infection = 0

Day 60:
Untreated: N = 783, with infection = 58
Regular Use: N = 13, with infection = 0

Day 90:
Untreated: N = 679, with infection = 82
Regular Use: N = 91, with infection = 0

At midpoint, Infection case percentages are:
  Untreated : 12.076583210604
  Irregulars: 4.3478260869565
  Regulars  : 0

Day 120:
Untreated: N = 592, with infection = 95
Regular Use: N = 186, with infection = 5

Day 150:
Untreated: N = 517, with infection = 106
Regular Use: N = 286, with infection = 19

Day 180:
Untreated: N = 459, with infection = 118
Regular Use: N = 375, with infection = 26

At study end, Infection case percentages are:
  Untreated : 25.708061002179 of group size of 459
  Irregulars: 15.060240963855 of group size of 166
  Regulars  : 6.9333333333333 of group size of 375

Final statistics: H = 395.09607330604