Kahan summation: Difference between revisions

Content added Content deleted

Inline

@@ Line 367: / Line 367: @@
 =={{header|R}}==
-{{incorrect|R|Not as clear and understandable as it should be. See talk page.}}
-<lang r>
-# Try using the task rules
+R can only limit the number of digits being displayed, not the ones used in calculation. In fact one of the base types is '''numeric''', implemented as standard floating point number.
-# 1. Do all arithmetic in decimal to a precision of six digits.
+Therefore, trying to use the main rules will not work as we can see in the code below
-# R can only limit the number of digits being displayed, not the ones used in calculation, all operations are floating point
+<lang r>
-# 2. Write a function/method/procedure/subroutine/... to perform Kahan summation
+# an implementation of the Kahan summation algorithm
-# on an ordered collection of numbers, (such as a list of numbers).
 kahansum <- function(x) {
     ks <- 0
@@ Line 389: / Line 386: @@
 }
-# 3. Create the three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 respectively.
+# The three numbers a, b, c equal to 10000.0, 3.14159, 2.71828 respectively.
 a <- 10000.0
 b <- 3.14159
 c <- 2.71828
+# just to make sure, let's look at the classes of these three numbers
-# 4. Show that the simple left-to-right summation, equivalent to (a + b) + c gives an answer of 10005.8
+sapply(c(a, b, c), FUN=Class)
+# [1] "numeric" "numeric" "numeric"
+# The simple left-to-right summation: (a + b) + c
 (a + b) + c
 # [1] 10005.86
-# 5. Show that the Kahan function applied to the sequence of values a, b, c results in the more precise answer of 10005.9
+# The Kahan summation applied to the values a, b, c
 input <- c(a, b, c)
 kahansum(input)
 # [1] 10005.86
+</lang>
-# as there is no difference let's try the subtask
+Apparently there is no difference between the two approaches. So let's try the alternative steps given in the task.
+We first calculate '''epsilon'''
+<lang r>
 epsilon = 1.0
 while ((1.0 + epsilon) != 1.0) {
@@ Line 410: / Line 415: @@
 epsilon
 # [1] 1.11022e-16
+</lang>
+and use it to test the left-to-right summation and the Kahan function
+<lang r>
 a <- 1.0
 b <- epsilon
@@ Line 422: / Line 431: @@
 kahansum(c(a, b, c))
 # [1] 1
+</lang>
+It might seem that again there is no difference, but let's explore a bit more and see if both results are really the same
-# but, are they really the same or is an artifact?
+<lang r>
 (a + b) + c == kahansum(c(a, b, c))
 # FALSE
@@ Line 430: / Line 442: @@
 ((a + b) + c) - kahansum(c(a, b, c))
 # [1] -1.110223e-16
+</lang>
+The absolute value of the difference, is very close to the value we obtained for '''epsilon'''
+Just to make sure we are not fooling ourselves, let's see if the value of epsilon is stable using different divisors for the generating function
+<lang r>
+# machine epsilon
+mepsilon <- function(divisor) {
+  guess <- 1.0
+  while ((1.0 + guess) != 1.0) {
+    guess <- guess / divisor
+  }
+  guess
+}
+# let's try from 2 to 1000
+epsilons <- sapply(2:1000, FUN=mepsilon)
+summary(epsilons)
+#      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
+# 2.439e-19 1.905e-18 5.939e-18 1.774e-17 2.238e-17 1.110e-16
+</lang>
+It would seem that it makes a differences what divisor we are using, let's look at the distribution using a text plotting library
+<lang r>
+library(txtplot)
+txtboxplot(epsilons)
+#     0         1e-17      2e-17       3e-17       4e-17       5e-17
+# |---+-----------+----------+-----------+-----------+-----------+--------|
+#       +----+------------------+
+#     --|    |                  |-------------------------------------
+#       +----+------------------+
+txtdensity(epsilons)
+# 6e+16 +--+---------+----------+----------+---------+----------+----------+
+#       |   **                                                             |
+#       |  ***                                                             |
+# 5e+16 +  * **                                                            +
+#       |     *                                                            |
+# 4e+16 +     *                                                            +
+#       |      *                                                           |
+#       |      *                                                           |
+# 3e+16 +      **                                                          +
+#       |       *                                                          |
+# 2e+16 +       **                                                         +
+#       |        **                                                        |
+#       |         ***                                                      |
+# 1e+16 +           ****                                                   +
+#       |              *************                                       |
+#     0 +                          ************************************    +
+#       +--+---------+----------+----------+---------+----------+----------+
+#          0       2e-17      4e-17      6e-17     8e-17      1e-16
+</lang>
+Definitely the epsilon generating function is not stable and gives a positively skewed (right-tailed) distribution.
+We could consider using the median value of the series of epsilons we have estimated, but because the precision for the base numeric type (class) in R is ~16 decimals, using that value will be in practice indistinguishable from using zero.
+<lang r>
+epsilon <- median(epsilons)
+epsilon
+# [1] 5.939024e-18
+a <- 1.0
+b <- epsilon
+c <- -epsilon
+# left-to-right summation
+(a + b) + c
+# [1] 1
+# kahan summation
+kahansum(c(a, b, c))
+# [1] 1
+# are they the same?
+(a + b) + c == kahansum(c(a, b, c))
+# TRUE
 </lang>