I'm working on modernizing Rosetta Code's infrastructure. Starting with communications. Please accept this time-limited open invite to RC's Slack.. --Michael Mol (talk) 20:59, 30 May 2020 (UTC)

Logistic Curve Fitting in Epidemiology

From Rosetta Code
Logistic Curve Fitting in Epidemiology is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

The least-squares method (see references below) in statistics is used to fit data to the best of a family of similar curves by finding the parameters for a curve which minimizes the total of the distances from each data point to the curve.

Often, the curve used is a straight line, in which case the method is also called linear regression. If a curve which uses logarithmic growth is fit, the method can be called logistic regression.

A commonly used family of functions used in statistical studies of populations, including the growth of epidemics, are curves akin to the logistic curve:

   f(x) = L / (1 + e-k(x-x0))

Though predictions based on fitting to such curves may err, especially if used to extrapolate from incomplete data, curves similar to the logistic curve have had good fits in population studies, including modeling the growth of past epidemics.

The task:

Task
  • Given the following daily world totals since December 31, 2019 for persons

who have become infected with the novel coronavirus Covid-19:

Daily totals:
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023, 2820,
4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273,
31491, 34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103,
69265, 71332, 73327, 75191, 75723, 76719, 77804, 78812, 79339, 
80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077, 
95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 
133852, 143227, 151367, 167418, 180096, 194836, 213150, 242364, 
271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 
656866, 715353, 777796, 851308, 928436, 1000249, 1082054, 1174652
  • Use the following variant of the logistic curve as a formula:
   f(t) = n0 e(r t) / ((1 + n0 (e(r t) - 1) / K)

Where r is the rate of growth of the infection in the population.

The R0 of an infection (different from r above) is a measure of how many new individuals will become infected for every individual currently infected. It is an important measure of how quickly an infectious disease may spread.

R0 is related to the logistic curve's r parameter by the formula

   r ≈ ln(R0) / G

where G, the generation time, is roughly the sum of the incubation time, perhaps 5 days, and the mean contagion period, perhaps 7 days, so, for covid-19, roughly we have

   R0 ≈ e12r

Assume the following constants hold in the formula above:

  • K is the world population, about 7.8 billion
  • n0 is 27, the number of cases found in China at the start of the pandemic.
Task
  • Demonstrate code that finds a least-squares fits of the curve to the data.
  • Show the calculated r for the logistic curve.
  • Show the final R0 parameter you calculate from the logistic curve r value parameter.
See also


Go[edit]

Library: lm

This uses the Levenberg-Marquardt method.

package main
 
import (
"fmt"
"github.com/maorshutman/lm"
"log"
"math"
)
 
const (
K = 7_800_000_000 // approx world population
n0 = 27 // number of cases at day 0
)
 
var y = []float64{
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652,
}
 
func f(dst, p []float64) {
for i := 0; i < len(y); i++ {
t := float64(i)
dst[i] = (n0*math.Exp(p[0]*t))/(1+n0*(math.Exp(p[0]*t)-1)/K) - y[i]
}
}
 
func main() {
j := lm.NumJac{Func: f}
prob := lm.LMProblem{
Dim: 1,
Size: len(y),
Func: f,
Jac: j.Jac,
InitParams: []float64{0.5},
Tau: 1e-6,
Eps1: 1e-8,
Eps2: 1e-8,
}
res, err := lm.LM(prob, &lm.Settings{Iterations: 100, ObjectiveTol: 1e-16})
if err != nil {
log.Fatal(err)
}
r := res.X[0]
fmt.Printf("The logistic curve r for the world data is %.8f\n", r)
fmt.Printf("R0 is then approximately equal to %.7f\n", math.Exp(12*r))
}
Output:
The logistic curve r for the world data is 0.11230218
R0 is then approximately equal to 3.8482793

Julia[edit]

using LsqFit
 
const K = 7_800_000_000 # approximate world population
const n0 = 27 # starting at day 0 with 27 Chinese cases
 
""" The model for logistic regression with a given r """
@. model(t, r) = (n0 * exp(r * t)) / (( 1 + n0 * (exp(r * t) - 1) / K))
 
# Daily world totals of covid cases, all countries
ydata = [
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652,
]
tdata = collect(LinRange(0.0, 96, 97))
 
# starting approximation for r of 1/2
rparam = [0.5]
 
fit = curve_fit(model, tdata, ydata, rparam)
 
# Our answer for r given the world data and simplistic model
r = fit.param
println("The logistic curve r for the world data is: ", r)
println("The confidence interval at 5% significance is: ",
confidence_interval(fit, 0.05))
println("Since R0 ≈ exp(G * r), and G ≈ 12 days, R0 ≈ ", exp(12r[1]))
 
Output:
The logistic curve r for the world data is: [0.11230217572265622]
The confidence interval at 5% significance is: [(0.11199074156706985, 0.11261360987824258)]
Since R0 ≈ exp(G * r), and G ≈ 12 days, R0 ≈ 3.8482792820761063

Perl[edit]

Translation of: Phix
use strict;
use warnings;
 
my $K = 7_800_000_000; # population
my $n0 = 27; # cases @ day 0
 
my @y = (
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59,
60, 60, 61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350,
2023, 2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615, 24522, 28273, 31491,
34933, 37552, 40540, 43105, 45177, 60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365, 85203, 87024, 89068, 90664, 93077,
95316, 98172, 102133, 105824, 109695, 114232, 118610, 125497, 133852, 143227, 151367, 167418, 180096,
194836, 213150, 242364, 271106, 305117, 338133, 377918, 416845, 468049, 527767, 591704, 656866, 715353,
777796, 851308, 928436,1000249,1082054,1174652
);
 
sub logistic_func {
my($r) = @_;
my $sq = 0;
for my $i (0 .. @y-1) {
my $eri = exp($r * $i);
my $dst = ($n0 * $eri) / (1 + $n0 * ($eri-1) / $K) - $y[$i];
$sq = $sq + $dst**2;
}
$sq
}
 
sub solve {
my($fn, $guess, $epsilon) = @_;
my($nfm,$nfp);
my $f0 = &$fn($guess);
my $delta = $guess;
my $factor = 2;
while ($delta > $epsilon) {
($nfm = &$fn($guess - $delta)) < $f0 ?
($f0 = $nfm, $guess -= $delta, $delta *= $factor)
:
($nfp = &$fn($guess + $delta)) < $f0 ?
($f0 = $nfp, $guess += $delta, $delta *= $factor)
:
$delta /= $factor
}
$guess
}
 
my $r = solve(\&logistic_func, 0.5, 0);
my $R0 = exp(12 * $r);
printf "r = %%(%.3f), R0 = %%(%.3f)\n", $r, $R0;
Output:
r = %(0.112), R0 = %(3.848)

Phix[edit]

Simplified, my interpretation of shift-cutting (from wp:Non-linear_least_squares)

-- demo\rosetta\Curve_fit.exw
constant K = 7_800_000_000, -- approx world population
n0 = 27, -- number of cases at day 0
actual = {
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652 }
 
function f(atom r)
atom sq = 0
for i=1 to length(actual) do
atom eri = exp(r*(i-1)),
guess = (n0*eri)/(1+n0*(eri-1)/K),
diff = guess-actual[i]
sq += diff*diff
end for
return sq
end function
 
function solve(integer f, atom guess=0.5, epsilon=0)
atom f0 = f(guess),
delta = iff(guess?guess:1),
factor = 2 -- double until f0 best, then
-- halve until delta<=epsilon
-- or we hit precision limit
while delta>epsilon -- (predefined limit)
and guess!=guess-delta do -- (precision limit)
atom nf = f(guess-delta)
if nf<f0 then
f0 = nf
guess -= delta
else
nf = f(guess+delta)
if nf<f0 then
f0 = nf
guess += delta
else
factor = 0.5
end if
end if
delta *= factor
end while
return guess
end function
 
atom r = solve(f),
R0 = exp(12 * r)
printf(1,"r = %.10f, R0 = %.8f\n",{r,R0})
Output:
r = 0.1123021757, R0 = 3.84827928

Python[edit]

Uses NumPy/SciPy's optimize package.

import numpy as np
import scipy.optimize as opt
 
n0, K = 27, 7_800_000_000
 
def f(t, r):
return (n0 * np.exp(r * t)) / (( 1 + n0 * (np.exp(r * t) - 1) / K))
 
y = [
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652,
]
x = np.linspace(0.0, 96, 97)
 
r, cov = opt.curve_fit(f, x, y, [0.5])
 
# Our answer for r given the world data and simplistic model
print("The r for the world Covid-19 data is:", r,
", with covariance of", cov)
print("The calculated R0 is then", np.exp(12 * r))
 
Output:
The r for the world Covid-19 data is: [0.11230218] , with covariance of [[2.46164331e-08]]
The calculated R0 is then [3.8482793]

R[edit]

 
library(minpack.lm)
 
K <- 7800000000 # approximate world population
n0 <- 27 # starting at day 0 with 27 Chinese cases
 
x <- seq(from=0.0, 96.0, length=97)
 
# The model for logistic regression with a given r0
getPred <- function(par, xx) (n0 * exp(par$r * xx)) / (( 1 + n0 * (exp(par$r * xx) - 1) / K))
 
residFun <- function(p, observed, xx) observed - getPred(p, xx)
 
parStart <- list(r=0.5)
 
# Daily world totals of covid cases, all countries
observed <- c(
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652)
 
nls.out <- nls.lm(par=parStart, fn=residFun, observed=observed, xx = x)
 
cat("The r for the model is: ", coef(nls.out), "\n")
 
cat("Therefore, R0 is approximately: ", exp(12 * coef(nls.out)))
 
Output:
The r for the model is:  0.1123022 
Therefore, R0 is approximately:  3.848279

Raku[edit]

Works with: Rakudo version 2020.05
Translation of: Perl

Original task numbers of cases per day as of April 05 plus updated, as of today, May 11 numbers.

my $K  = 7_800_000_000; #  population
my $n0 = 27; # cases @ day 0
 
my @Apr05 = <
27 27 27 44 44 59 59 59 59 59
59 59 59 60 60 61 61 66 83 219
239 392 534 631 897 1350 2023 2820 4587 6067
7823 9826 11946 14554 17372 20615 24522 28273 31491 34933
37552 40540 43105 45177 60328 64543 67103 69265 71332 73327
75191 75723 76719 77804 78812 79339 80132 80995 82101 83365
85203 87024 89068 90664 93077 95316 98172 102133 105824 109695
114232 118610 125497 133852 143227 151367 167418 180096 194836 213150
242364 271106 305117 338133 377918 416845 468049 527767 591704 656866
715353 777796 851308 928436 1000249 1082054 1174652
>;
 
my @May11 = <
27 27 27 44 44 59 59 59 59 59
59 59 59 60 60 61 61 66 83 219
239 392 534 631 897 1350 2023 2820 4587 6067
7823 9826 11946 14554 17372 20615 24522 28273 31491 34933
37552 40540 43105 45177 60328 64543 67103 69265 71332 73327
75191 75723 76719 77804 78812 79339 80132 80995 82101 83365
85203 87026 89068 90865 93077 95316 98172 102133 105824 109695
114235 118613 125497 133853 143259 154774 167418 180094 194843 213149
242374 271116 305235 338235 377968 416881 468092 527839 591796 656834
715377 777187 851587 928491 1006063 1087822 1174306 1245601 1316988 1391881
1476792 1563819 1653160 1734868 1807256 1873639 1953786 2033745 2117297 2199460
2278484 2350993 2427353 2513399 2579823 2657910 2731217 2832750 2915977 2981542
3054404 3131487 3216467 3308341 3389459 3468047 3545486 3624789 3714816 3809262
3899379 3986931 4063525
>;
 
sub logistic-func ($rate, @y) {
my $sq = 0;
for ^@y -> $time {
my $ert = exp($rate * $time);
my $δt = ($n0 * $ert) / (1 + $n0 * ($ert-1) / $K) - @y[$time];
$sq += $δt²;
}
$sq
}
 
sub solve (&f, $guess,, @y) {
my $f-minus;
my $f-plus;
my $rate = $guess;
my $f= f $rate, @y;
my= $rate;
my $factor = 2;
while> ε {
($f-minus = f $rate -, @y) < $f??
do {
$f= $f-minus;
$rate -=;
*= $factor;
} !!
($f-plus = f $rate +, @y) < $f??
do {
$f= $f-plus;
$rate +=;
*= $factor;
} !!
/= $factor
}
$rate
}
 
for @Apr05, 'Dec 31 - Apr 5',
@May11, 'Dec 31 - May 11' -> @y, $period {
my $rate = solve(&logistic-func, 0.5, 0, @y);
my $R0 = exp(12 * $rate);
say "\nFor period: $period";
say "Instantaneous rate of growth: r = " , $rate.fmt('%08f');
say "Reproductive rate: R0 = ", $R0.fmt('%08f');
}
 
Output:
For period: Dec 31 - Apr 5
Instantaneous rate of growth: r = 0.112302
Reproductive rate: R0 = 3.848279

For period: Dec 31 - May 11
Instantaneous rate of growth: r = 0.093328
Reproductive rate: R0 = 3.064672

Wren[edit]

Translation of: Phix
Library: Wren-math
import "/math" for Math
 
var K = 7800000000 // approx world population
var n0 = 27 // number of cases at day 0
 
var y = [
27, 27, 27, 44, 44, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60,
61, 61, 66, 83, 219, 239, 392, 534, 631, 897, 1350, 2023,
2820, 4587, 6067, 7823, 9826, 11946, 14554, 17372, 20615,
24522, 28273, 31491, 34933, 37552, 40540, 43105, 45177,
60328, 64543, 67103, 69265, 71332, 73327, 75191, 75723,
76719, 77804, 78812, 79339, 80132, 80995, 82101, 83365,
85203, 87024, 89068, 90664, 93077, 95316, 98172, 102133,
105824, 109695, 114232, 118610, 125497, 133852, 143227,
151367, 167418, 180096, 194836, 213150, 242364, 271106,
305117, 338133, 377918, 416845, 468049, 527767, 591704,
656866, 715353, 777796, 851308, 928436, 1000249, 1082054,
1174652
]
 
var f = Fn.new { |r|
var sq = 0
for (i in 0...y.count) {
var eri = Math.exp(r*i)
var dst = (n0*eri)/(1+n0*(eri-1)/K) - y[i]
sq = sq + dst * dst
}
return sq
}
 
var solve = Fn.new { |f, guess, epsilon|
var f0 = f.call(guess)
var delta = guess
var factor = 2 // double until f0 best then halve until delta <= epsilon
while (delta > epsilon) {
var nf = f.call(guess - delta)
if (nf < f0) {
f0 = nf
guess = guess - delta
} else {
nf = f.call(guess + delta)
if (nf < f0) {
f0 = nf
guess = guess + delta
} else {
factor = 0.5
}
}
delta = delta * factor
}
return guess
}
 
var r = (solve.call(f, 0.5, 0) * 1e10).round / 1e10
var R0 = (Math.exp(12 * r) * 1e8).round / 1e8
System.print("r = %(r), R0 = %(R0)")
Output:
r = 0.1123021757, R0 = 3.84827928