Jump to content

Kahan summation: Difference between revisions

reworked the sample code to make it (hopefully) more understandable
(reworked the sample code to make it (hopefully) more understandable)
Line 367:
 
=={{header|R}}==
{{incorrect|R|Not as clear and understandable as it should be. See talk page.}}
<lang r>
# Try using the task rules
 
R can only limit the number of digits being displayed, not the ones used in calculation. In fact one of the base types is '''numeric''', implemented as standard floating point number.
# 1. Do all arithmetic in decimal to a precision of six digits.
 
Therefore, trying to use the main rules will not work as we can see in the code below
# R can only limit the number of digits being displayed, not the ones used in calculation, all operations are floating point
 
<lang r>
# 2. Write a function/method/procedure/subroutine/... to perform Kahan summation
# an implementation of the Kahan summation algorithm
# on an ordered collection of numbers, (such as a list of numbers).
kahansum <- function(x) {
ks <- 0
Line 389 ⟶ 386:
}
 
# 3. Create theThe three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 respectively.
a <- 10000.0
b <- 3.14159
c <- 2.71828
 
# just to make sure, let's look at the classes of these three numbers
# 4. Show that the simple left-to-right summation, equivalent to (a + b) + c gives an answer of 10005.8
sapply(c(a, b, c), FUN=Class)
# [1] "numeric" "numeric" "numeric"
 
# The simple left-to-right summation: (a + b) + c
(a + b) + c
# [1] 10005.86
 
# 5. Show that theThe Kahan functionsummation applied to the sequence of values a, b, c results in the more precise answer of 10005.9
input <- c(a, b, c)
kahansum(input)
# [1] 10005.86
</lang>
 
# asApparently there is no difference between the two approaches. So let's try the subtaskalternative steps given in the task.
 
We first calculate '''epsilon'''
<lang r>
epsilon = 1.0
while ((1.0 + epsilon) != 1.0) {
Line 410 ⟶ 415:
epsilon
# [1] 1.11022e-16
</lang>
 
and use it to test the left-to-right summation and the Kahan function
 
<lang r>
a <- 1.0
b <- epsilon
Line 422 ⟶ 431:
kahansum(c(a, b, c))
# [1] 1
</lang>
 
It might seem that again there is no difference, but let's explore a bit more and see if both results are really the same
# but, are they really the same or is an artifact?
 
<lang r>
(a + b) + c == kahansum(c(a, b, c))
# FALSE
Line 430 ⟶ 442:
((a + b) + c) - kahansum(c(a, b, c))
# [1] -1.110223e-16
</lang>
 
The absolute value of the difference, is very close to the value we obtained for '''epsilon'''
 
Just to make sure we are not fooling ourselves, let's see if the value of epsilon is stable using different divisors for the generating function
 
 
<lang r>
# machine epsilon
mepsilon <- function(divisor) {
guess <- 1.0
while ((1.0 + guess) != 1.0) {
guess <- guess / divisor
}
guess
}
 
# let's try from 2 to 1000
epsilons <- sapply(2:1000, FUN=mepsilon)
summary(epsilons)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 2.439e-19 1.905e-18 5.939e-18 1.774e-17 2.238e-17 1.110e-16
</lang>
 
It would seem that it makes a differences what divisor we are using, let's look at the distribution using a text plotting library
 
<lang r>
library(txtplot)
 
txtboxplot(epsilons)
# 0 1e-17 2e-17 3e-17 4e-17 5e-17
# |---+-----------+----------+-----------+-----------+-----------+--------|
# +----+------------------+
# --| | |-------------------------------------
# +----+------------------+
 
txtdensity(epsilons)
# 6e+16 +--+---------+----------+----------+---------+----------+----------+
# | ** |
# | *** |
# 5e+16 + * ** +
# | * |
# 4e+16 + * +
# | * |
# | * |
# 3e+16 + ** +
# | * |
# 2e+16 + ** +
# | ** |
# | *** |
# 1e+16 + **** +
# | ************* |
# 0 + ************************************ +
# +--+---------+----------+----------+---------+----------+----------+
# 0 2e-17 4e-17 6e-17 8e-17 1e-16
</lang>
 
Definitely the epsilon generating function is not stable and gives a positively skewed (right-tailed) distribution.
 
We could consider using the median value of the series of epsilons we have estimated, but because the precision for the base numeric type (class) in R is ~16 decimals, using that value will be in practice indistinguishable from using zero.
 
<lang r>
epsilon <- median(epsilons)
epsilon
# [1] 5.939024e-18
 
a <- 1.0
b <- epsilon
c <- -epsilon
 
# left-to-right summation
(a + b) + c
# [1] 1
 
# kahan summation
kahansum(c(a, b, c))
# [1] 1
 
# are they the same?
(a + b) + c == kahansum(c(a, b, c))
# TRUE
</lang>
Cookies help us deliver our services. By using our services, you agree to our use of cookies.