Kahan summation: Difference between revisions
Content added Content deleted
m (→{{header|R}}) |
(reworked the sample code to make it (hopefully) more understandable) |
||
Line 367:
=={{header|R}}==
R can only limit the number of digits being displayed, not the ones used in calculation. In fact one of the base types is '''numeric''', implemented as standard floating point number.
Therefore, trying to use the main rules will not work as we can see in the code below
<lang r>
# an implementation of the Kahan summation algorithm
kahansum <- function(x) {
ks <- 0
Line 389 ⟶ 386:
}
#
a <- 10000.0
b <- 3.14159
c <- 2.71828
# just to make sure, let's look at the classes of these three numbers
sapply(c(a, b, c), FUN=Class)
# [1] "numeric" "numeric" "numeric"
# The simple left-to-right summation: (a + b) + c
(a + b) + c
# [1] 10005.86
#
input <- c(a, b, c)
kahansum(input)
# [1] 10005.86
</lang>
We first calculate '''epsilon'''
<lang r>
epsilon = 1.0
while ((1.0 + epsilon) != 1.0) {
Line 410 ⟶ 415:
epsilon
# [1] 1.11022e-16
</lang>
and use it to test the left-to-right summation and the Kahan function
<lang r>
a <- 1.0
b <- epsilon
Line 422 ⟶ 431:
kahansum(c(a, b, c))
# [1] 1
</lang>
It might seem that again there is no difference, but let's explore a bit more and see if both results are really the same
<lang r>
(a + b) + c == kahansum(c(a, b, c))
# FALSE
Line 430 ⟶ 442:
((a + b) + c) - kahansum(c(a, b, c))
# [1] -1.110223e-16
</lang>
The absolute value of the difference, is very close to the value we obtained for '''epsilon'''
Just to make sure we are not fooling ourselves, let's see if the value of epsilon is stable using different divisors for the generating function
<lang r>
# machine epsilon
mepsilon <- function(divisor) {
guess <- 1.0
while ((1.0 + guess) != 1.0) {
guess <- guess / divisor
}
guess
}
# let's try from 2 to 1000
epsilons <- sapply(2:1000, FUN=mepsilon)
summary(epsilons)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 2.439e-19 1.905e-18 5.939e-18 1.774e-17 2.238e-17 1.110e-16
</lang>
It would seem that it makes a differences what divisor we are using, let's look at the distribution using a text plotting library
<lang r>
library(txtplot)
txtboxplot(epsilons)
# 0 1e-17 2e-17 3e-17 4e-17 5e-17
# |---+-----------+----------+-----------+-----------+-----------+--------|
# +----+------------------+
# --| | |-------------------------------------
# +----+------------------+
txtdensity(epsilons)
# 6e+16 +--+---------+----------+----------+---------+----------+----------+
# | ** |
# | *** |
# 5e+16 + * ** +
# | * |
# 4e+16 + * +
# | * |
# | * |
# 3e+16 + ** +
# | * |
# 2e+16 + ** +
# | ** |
# | *** |
# 1e+16 + **** +
# | ************* |
# 0 + ************************************ +
# +--+---------+----------+----------+---------+----------+----------+
# 0 2e-17 4e-17 6e-17 8e-17 1e-16
</lang>
Definitely the epsilon generating function is not stable and gives a positively skewed (right-tailed) distribution.
We could consider using the median value of the series of epsilons we have estimated, but because the precision for the base numeric type (class) in R is ~16 decimals, using that value will be in practice indistinguishable from using zero.
<lang r>
epsilon <- median(epsilons)
epsilon
# [1] 5.939024e-18
a <- 1.0
b <- epsilon
c <- -epsilon
# left-to-right summation
(a + b) + c
# [1] 1
# kahan summation
kahansum(c(a, b, c))
# [1] 1
# are they the same?
(a + b) + c == kahansum(c(a, b, c))
# TRUE
</lang>
|