Kahan summation: Difference between revisions
Content added Content deleted
m (→{{header|R}}) |
(reworked the sample code to make it (hopefully) more understandable) |
||
Line 367: | Line 367: | ||
=={{header|R}}== |
=={{header|R}}== |
||
{{incorrect|R|Not as clear and understandable as it should be. See talk page.}} |
|||
<lang r> |
|||
# Try using the task rules |
|||
R can only limit the number of digits being displayed, not the ones used in calculation. In fact one of the base types is '''numeric''', implemented as standard floating point number. |
|||
# 1. Do all arithmetic in decimal to a precision of six digits. |
|||
Therefore, trying to use the main rules will not work as we can see in the code below |
|||
# R can only limit the number of digits being displayed, not the ones used in calculation, all operations are floating point |
|||
<lang r> |
|||
# 2. Write a function/method/procedure/subroutine/... to perform Kahan summation |
|||
# an implementation of the Kahan summation algorithm |
|||
# on an ordered collection of numbers, (such as a list of numbers). |
|||
kahansum <- function(x) { |
kahansum <- function(x) { |
||
ks <- 0 |
ks <- 0 |
||
Line 389: | Line 386: | ||
} |
} |
||
# |
# The three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 respectively. |
||
a <- 10000.0 |
a <- 10000.0 |
||
b <- 3.14159 |
b <- 3.14159 |
||
c <- 2.71828 |
c <- 2.71828 |
||
# just to make sure, let's look at the classes of these three numbers |
|||
# 4. Show that the simple left-to-right summation, equivalent to (a + b) + c gives an answer of 10005.8 |
|||
sapply(c(a, b, c), FUN=Class) |
|||
# [1] "numeric" "numeric" "numeric" |
|||
# The simple left-to-right summation: (a + b) + c |
|||
(a + b) + c |
(a + b) + c |
||
# [1] 10005.86 |
# [1] 10005.86 |
||
# |
# The Kahan summation applied to the values a, b, c |
||
input <- c(a, b, c) |
input <- c(a, b, c) |
||
kahansum(input) |
kahansum(input) |
||
# [1] 10005.86 |
# [1] 10005.86 |
||
</lang> |
|||
Apparently there is no difference between the two approaches. So let's try the alternative steps given in the task. |
|||
We first calculate '''epsilon''' |
|||
<lang r> |
|||
epsilon = 1.0 |
epsilon = 1.0 |
||
while ((1.0 + epsilon) != 1.0) { |
while ((1.0 + epsilon) != 1.0) { |
||
Line 410: | Line 415: | ||
epsilon |
epsilon |
||
# [1] 1.11022e-16 |
# [1] 1.11022e-16 |
||
</lang> |
|||
and use it to test the left-to-right summation and the Kahan function |
|||
<lang r> |
|||
a <- 1.0 |
a <- 1.0 |
||
b <- epsilon |
b <- epsilon |
||
Line 422: | Line 431: | ||
kahansum(c(a, b, c)) |
kahansum(c(a, b, c)) |
||
# [1] 1 |
# [1] 1 |
||
</lang> |
|||
It might seem that again there is no difference, but let's explore a bit more and see if both results are really the same |
|||
# but, are they really the same or is an artifact? |
|||
<lang r> |
|||
(a + b) + c == kahansum(c(a, b, c)) |
(a + b) + c == kahansum(c(a, b, c)) |
||
# FALSE |
# FALSE |
||
Line 430: | Line 442: | ||
((a + b) + c) - kahansum(c(a, b, c)) |
((a + b) + c) - kahansum(c(a, b, c)) |
||
# [1] -1.110223e-16 |
# [1] -1.110223e-16 |
||
</lang> |
|||
The absolute value of the difference, is very close to the value we obtained for '''epsilon''' |
|||
Just to make sure we are not fooling ourselves, let's see if the value of epsilon is stable using different divisors for the generating function |
|||
<lang r> |
|||
# machine epsilon |
|||
mepsilon <- function(divisor) { |
|||
guess <- 1.0 |
|||
while ((1.0 + guess) != 1.0) { |
|||
guess <- guess / divisor |
|||
} |
|||
guess |
|||
} |
|||
# let's try from 2 to 1000 |
|||
epsilons <- sapply(2:1000, FUN=mepsilon) |
|||
summary(epsilons) |
|||
# Min. 1st Qu. Median Mean 3rd Qu. Max. |
|||
# 2.439e-19 1.905e-18 5.939e-18 1.774e-17 2.238e-17 1.110e-16 |
|||
</lang> |
|||
It would seem that it makes a differences what divisor we are using, let's look at the distribution using a text plotting library |
|||
<lang r> |
|||
library(txtplot) |
|||
txtboxplot(epsilons) |
|||
# 0 1e-17 2e-17 3e-17 4e-17 5e-17 |
|||
# |---+-----------+----------+-----------+-----------+-----------+--------| |
|||
# +----+------------------+ |
|||
# --| | |------------------------------------- |
|||
# +----+------------------+ |
|||
txtdensity(epsilons) |
|||
# 6e+16 +--+---------+----------+----------+---------+----------+----------+ |
|||
# | ** | |
|||
# | *** | |
|||
# 5e+16 + * ** + |
|||
# | * | |
|||
# 4e+16 + * + |
|||
# | * | |
|||
# | * | |
|||
# 3e+16 + ** + |
|||
# | * | |
|||
# 2e+16 + ** + |
|||
# | ** | |
|||
# | *** | |
|||
# 1e+16 + **** + |
|||
# | ************* | |
|||
# 0 + ************************************ + |
|||
# +--+---------+----------+----------+---------+----------+----------+ |
|||
# 0 2e-17 4e-17 6e-17 8e-17 1e-16 |
|||
</lang> |
|||
Definitely the epsilon generating function is not stable and gives a positively skewed (right-tailed) distribution. |
|||
We could consider using the median value of the series of epsilons we have estimated, but because the precision for the base numeric type (class) in R is ~16 decimals, using that value will be in practice indistinguishable from using zero. |
|||
<lang r> |
|||
epsilon <- median(epsilons) |
|||
epsilon |
|||
# [1] 5.939024e-18 |
|||
a <- 1.0 |
|||
b <- epsilon |
|||
c <- -epsilon |
|||
# left-to-right summation |
|||
(a + b) + c |
|||
# [1] 1 |
|||
# kahan summation |
|||
kahansum(c(a, b, c)) |
|||
# [1] 1 |
|||
# are they the same? |
|||
(a + b) + c == kahansum(c(a, b, c)) |
|||
# TRUE |
|||
</lang> |
</lang> |