# Logistic curve fitting in epidemiology

Logistic curve fitting in epidemiology is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The least-squares method (see references below) in statistics is used to fit data to the best of a family of similar curves by finding the parameters for a curve which minimizes the total of the distances from each data point to the curve.

Often, the curve used is a straight line, in which case the method is also called linear regression. If a curve which uses logarithmic growth is fit, the method can be called logistic regression.

A commonly used family of functions used in statistical studies of populations, including the growth of epidemics, are curves akin to the logistic curve:

```   f(x) = L / (1 + e-k(x-x0))
```

Though predictions based on fitting to such curves may err, especially if used to extrapolate from incomplete data, curves similar to the logistic curve have had good fits in population studies, including modeling the growth of past epidemics.

• Given the following daily world totals since December 31, 2019 for persons

who have become infected with the novel coronavirus Covid-19:

```Daily totals:
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820,
4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273,
31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103,
69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339,
80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077,
95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497,
133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364,
271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652
```
• Use the following variant of the logistic curve as a formula:
```   f(t) = n0 e(r t) / ((1 + n0 (e(r t) - 1) / K)
```

Where r is the rate of growth of the infection in the population.

The R0 of an infection (different from r above) is a measure of how many new individuals will become infected for every individual currently infected. It is an important measure of how quickly an infectious disease may spread.

R0 is related to the logistic curve's r parameter by the formula

```   r ≈ ln(R0) / G
```

where G, the generation time, is roughly the sum of the incubation time, perhaps 5 days, and the mean contagion period, perhaps 7 days, so, for covid-19, roughly we have

```   R0 ≈ e12r
```

Assume the following constants hold in the formula above:

• K is the world population, about 7.8 billion
• n0 is 27, the number of cases found in China at the start of the pandemic.
• Demonstrate code that finds a least-squares fits of the curve to the data.
• Show the calculated r for the logistic curve.
• Show the final R0 parameter you calculate from the logistic curve r value parameter.

## Go

Library: lm

This uses the Levenberg-Marquardt method. <lang go>package main

import (

```   "fmt"
"github.com/maorshutman/lm"
"log"
"math"
```

)

const (

```   K  = 7_800_000_000 // approx world population
n0 = 27            // number of cases at day 0
```

)

var y = []float64{

```   27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652,
```

}

func f(dst, p []float64) {

```   for i := 0; i < len(y); i++ {
t := float64(i)
dst[i] = (n0*math.Exp(p[0]*t))/(1+n0*(math.Exp(p[0]*t)-1)/K) - y[i]
}
```

}

func main() {

```   j := lm.NumJac{Func: f}
prob := lm.LMProblem{
Dim:        1,
Size:       len(y),
Func:       f,
Jac:        j.Jac,
InitParams: []float64{0.5},
Tau:        1e-6,
Eps1:       1e-8,
Eps2:       1e-8,
}
res, err := lm.LM(prob, &lm.Settings{Iterations: 100, ObjectiveTol: 1e-16})
if err != nil {
log.Fatal(err)
}
r := res.X[0]
fmt.Printf("The logistic curve r for the world data is %.8f\n", r)
fmt.Printf("R0 is then approximately equal to %.7f\n", math.Exp(12*r))
```

}</lang>

Output:
```The logistic curve r for the world data is 0.11230218
R0 is then approximately equal to 3.8482793
```

## Julia

<lang julia>using LsqFit

const K = 7_800_000_000 # approximate world population const n0 = 27 # starting at day 0 with 27 Chinese cases

""" The model for logistic regression with a given r """ @. model(t, r) = (n0 * exp(r * t)) / (( 1 + n0 * (exp(r * t) - 1) / K))

1. Daily world totals of covid cases, all countries

ydata = [ 27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652, ] tdata = collect(LinRange(0.0, 96, 97))

1. starting approximation for r of 1/2

rparam = [0.5]

fit = curve_fit(model, tdata, ydata, rparam)

1. Our answer for r given the world data and simplistic model

r = fit.param println("The logistic curve r for the world data is: ", r) println("The confidence interval at 5% significance is: ",

```   confidence_interval(fit, 0.05))
```

println("Since R0 ≈ exp(G * r), and G ≈ 12 days, R0 ≈ ", exp(12r[1]))

</lang>

Output:
```The logistic curve r for the world data is: [0.11230217572265622]
The confidence interval at 5% significance is: [(0.11199074156706985, 0.11261360987824258)]
Since R0 ≈ exp(G * r), and G ≈ 12 days, R0 ≈ 3.8482792820761063
```

## Perl

Translation of: Phix

<lang perl>use strict; use warnings;

my \$K = 7_800_000_000; # population my \$n0 = 27; # cases @ day 0

my @y = (

```   27,     27,     27,     44,     44,     59,     59,     59,    59,      59,     59,     59,     59,
60,     60,     61,     61,     66,     83,    219,    239,    392,    534,    631,    897,   1350,
2023,   2820,   4587,   6067,   7823,   9826,  11946,  14554,  17372,  20615,  24522,  28273,  31491,
34933,  37552,  40540,  43105,  45177,  60328,  64543,  67103,  69265,  71332,  73327,  75191,  75723,
76719,  77804,  78812,  79339,  80132,  80995,  82101,  83365,  85203,  87024,  89068,  90664,  93077,
95316,  98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096,
```

194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353, 777796, 851308, 928436,1000249,1082054,1174652 );

sub logistic_func {

```   my(\$r) = @_;
my \$sq = 0;
for my \$i (0 .. @y-1) {
my \$eri = exp(\$r * \$i);
my \$dst = (\$n0 * \$eri) / (1 + \$n0 * (\$eri-1) / \$K) - \$y[\$i];
\$sq = \$sq + \$dst**2;
}
\$sq
```

}

sub solve {

```   my(\$fn, \$guess, \$epsilon) = @_;
my(\$nfm,\$nfp);
my \$f0 = &\$fn(\$guess);
my \$delta = \$guess;
my \$factor = 2;
while (\$delta > \$epsilon) {
(\$nfm = &\$fn(\$guess - \$delta)) < \$f0 ?
(\$f0 = \$nfm, \$guess -= \$delta, \$delta *= \$factor)
:
(\$nfp = &\$fn(\$guess + \$delta)) < \$f0 ?
(\$f0 = \$nfp, \$guess += \$delta, \$delta *= \$factor)
:
\$delta /= \$factor
}
\$guess
```

}

my \$r = solve(\&logistic_func, 0.5, 0); my \$R0 = exp(12 * \$r); printf "r = %%(%.3f), R0 = %%(%.3f)\n", \$r, \$R0;</lang>

Output:
`r = %(0.112), R0 = %(3.848)`

## Phix

Simplified, my interpretation of shift-cutting (from wp:Non-linear_least_squares) <lang Phix>-- demo\rosetta\Curve_fit.exw constant K = 7_800_000_000, -- approx world population

```        n0 = 27                -- number of cases at day 0

```

sequence actual = {

```   27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652,
```

}

function f(atom r)

```   atom sq = 0
for i=1 to length(actual) do
atom eri = exp(r*(i-1)),
guess = (n0*eri)/(1+n0*(eri-1)/K),
diff = guess-actual[i]
sq += diff*diff
end for
return sq
```

end function

function solve(integer f, atom guess=0.5, epsilon=0)

```   atom f0 = f(guess),
delta = iff(guess?guess:1),
factor = 2 -- double until f0 best, then
-- halve until delta<=epsilon
--  or we hit precision limit
while delta>epsilon             -- (predefined limit)
and guess!=guess-delta do     -- (precision limit)
atom nf = f(guess-delta)
if nf<f0 then
f0 = nf
guess -= delta
else
nf = f(guess+delta)
if nf<f0 then
f0 = nf
guess += delta
else
factor = 0.5
end if
end if
delta *= factor
end while
return guess
```

end function

atom r = solve(f),

```    R0 = exp(12 * r)
```

printf(1,"r = %.10f, R0 = %.8f\n",{r,R0})</lang>

Output:
```r = 0.1123021757, R0 = 3.84827928
```

## Python

Uses NumPy/SciPy's optimize package. <lang python>import numpy as np import scipy.optimize as opt

n0, K = 27, 7_800_000_000

def f(t, r):

```   return (n0 * np.exp(r * t)) / (( 1 + n0 * (np.exp(r * t) - 1) / K))
```

y = [ 27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652, ] x = np.linspace(0.0, 96, 97)

r, cov = opt.curve_fit(f, x, y, [0.5])

1. Our answer for r given the world data and simplistic model

print("The r for the world Covid-19 data is:", r,

```   ", with covariance of", cov)
```

print("The calculated R0 is then", np.exp(12 * r))

</lang>

Output:
```The r for the world Covid-19 data is: [0.11230218] , with covariance of [[2.46164331e-08]]
The calculated R0 is then [3.8482793]
```

## R

<lang R> library(minpack.lm)

K <- 7800000000 # approximate world population n0 <- 27 # starting at day 0 with 27 Chinese cases

x <- seq(from=0.0, 96.0, length=97)

1. The model for logistic regression with a given r0

getPred <- function(par, xx) (n0 * exp(par\$r * xx)) / (( 1 + n0 * (exp(par\$r * xx) - 1) / K))

residFun <- function(p, observed, xx) observed - getPred(p, xx)

parStart <- list(r=0.5)

1. Daily world totals of covid cases, all countries

observed <- c(

``` 27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652)
```

nls.out <- nls.lm(par=parStart, fn=residFun, observed=observed, xx = x)

cat("The r for the model is: ", coef(nls.out), "\n")

cat("Therefore, R0 is approximately: ", exp(12 * coef(nls.out)))

</lang>

Output:
```The r for the model is:  0.1123022
Therefore, R0 is approximately:  3.848279
```

## Wren

Translation of: Phix

<lang ecmascript>var K = 7800000000 // approx world population var n0 = 27 // number of cases at day 0

var y = [

```   27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652
```

]

var exp = Fn.new { |x|

```   var e = 2.718281828459045
return e.pow(x)
```

}

var f = Fn.new { |r|

```   var sq = 0
for (i in 0...y.count) {
var eri = exp.call(r*i)
var dst = (n0*eri)/(1+n0*(eri-1)/K) - y[i]
sq = sq + dst * dst
}
return sq
```

}

var solve = Fn.new { |f, guess, epsilon|

```   var f0 = f.call(guess)
var delta = guess
var factor = 2 // double until f0 best then halve until delta <= epsilon
while (delta > epsilon) {
var nf = f.call(guess - delta)
if (nf < f0) {
f0 = nf
guess = guess - delta
} else {
nf = f.call(guess + delta)
if (nf < f0) {
f0 = nf
guess = guess + delta
} else {
factor = 0.5
}
}
delta = delta * factor
}
return guess
```

}

var r = (solve.call(f, 0.5, 0) * 1e10).round / 1e10 var R0 = (exp.call(12 * r) * 1e8).round / 1e8 System.print("r = %(r), R0 = %(R0)")</lang>

Output:
```r = 0.1123021757, R0 = 3.84827928
```