Talk:Kahan summation: Difference between revisions

</pre>
::&mdash;[[User:Sonia|Sonia]] ([[User talk:Sonia|talk]]) 01:53, 21 December 2014 (UTC)
 
 
==Epsilon computation==
The "Epsilon computation around 1" subtask should be a task in its own right.
<lang python>epsilon = 1.0
while 1.0 + epsilon != 1.0:
    epsilon = epsilon / 2.0</lang>
The "Epsilon computation around 1" subtask is a good indicator of whether, and how, a given language implementation (a specific compiler) uses IEEE 754 floating point. Most languages do not state explicitly what precision their floating-point data types require.
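As a minimal sketch of how the loop count relates to the format in use, the version below also counts the iterations and compares the result with what CPython reports in <code>sys.float_info</code>. The function name <code>halving_epsilon</code> is made up for this illustration, and the comments assume that the interpreter's float is IEEE 754 double precision.

```python
import sys


def halving_epsilon():
    """Halve epsilon until adding it to 1.0 no longer changes the sum.

    Returns (loop count, final epsilon). Hypothetical helper for illustration.
    """
    epsilon = 1.0
    n = 0  # number of halvings performed
    while 1.0 + epsilon != 1.0:
        epsilon = epsilon / 2.0
        n += 1
    return n, epsilon


n, epsilon = halving_epsilon()
print("loop count:", n, "epsilon:", epsilon)

# Values reported by the interpreter itself, for comparison.
# Assumption: float is IEEE 754 binary64, as on common CPython builds.
print("significand bits:", sys.float_info.mant_dig)
print("machine epsilon: ", sys.float_info.epsilon)
```

The loop stops one halving past the conventional machine epsilon (the gap between 1.0 and the next larger float), so on a double precision float it ends at half of <code>sys.float_info.epsilon</code>, and the loop count equals the number of significand bits.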
===IEEE 754===
IEEE 754 (1985 & 2008) floating point has 4 major data types:
<pre>
Precision name      Bits  Mantissa   Decimal precision
------------------  ----  ---------  -----------------
single precision      32  24 bits    5.9604E-8
double precision      64  53 bits    1.1102E-16
extended precision    80  64 bits    5.4210E-20
decimal128           128  34 digits  1.0000E-34
</pre>
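The three binary rows of the table can be checked with a few lines of Python: for a format with p significand bits and round-to-nearest-even, the halving loop stops at 2<sup>-p</sup> after p iterations. This is only a sketch; the names <code>SIGNIFICAND_BITS</code> and <code>halving_limit</code> are hypothetical, and the bit widths are simply copied from the table.

```python
# Significand widths in bits, copied from the table above.
# The decimal128 row is left out because it is not a binary format.
SIGNIFICAND_BITS = {
    "single": 24,
    "double": 53,
    "extended": 64,
}


def halving_limit(bits):
    """Value at which the halving loop stops for a binary format
    with the given number of significand bits: 2**-bits."""
    return 2.0 ** -bits


for name, bits in SIGNIFICAND_BITS.items():
    print(f"{name:9s} {bits:2d} bits  {halving_limit(bits):.4E}")
```

All three values are exact powers of two, so they are representable in a Python float even when they describe a wider format such as extended precision.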
C99, Fortran 77 and other languages have data types corresponding to the IEEE 754 formats:
<pre>
IEEE 754 name       GNU C        Intel C      Visual C  Fortran 77  Fortran 95              VB .Net
------------------  -----------  -----------  --------  ----------  ----------------------  ----------
single precision    float        float        float     real*4      SELECTED_REAL_KIND(8)   Single
double precision    double       double       double    real*8      SELECTED_REAL_KIND(16)  Double
extended precision  long double  long double  n/a       real*10     SELECTED_REAL_KIND(20)  n/a
decimal128          __float128   __Quad       n/a       real*16     SELECTED_REAL_KIND(34)  n/a
</pre>
In Microsoft Visual C, long double is treated as double.
===Compilers===
Running the epsilon computation with different language implementations and different data types gives the following results:
<pre>
Language    Compiler    Declaration    N  Epsilon                   IEEE-754
----------  ----------  ------------  --  ------------------------  ---------
C++         VC++ 6.0    float         53  1.110223E-16              Double
C++         VC++ 6.0    double        53  1.110223E-16              Double
C++         VC++ 6.0    long double   53  1.110223E-16              Double
Fortran     Plato       real*4        64  5.421011E-20              Extended
Fortran     Plato       real*8        64  5.421010862428E-20        Extended
Fortran     Plato       real*10       64  5.42101086242752217E-20   Extended
Fortran     Plato       real*16       64  5.42101086242752217E-20   Extended
Pascal      Free        real          64  5.42101086242752E-020     Extended
Pascal      Free        double        64  5.42101086242752E-020     Extended
Pascal      Free        extended      64  5.4210108624275222E-0020  Extended
Perl        Strawberry                53  1.11022302462516e-016     Double
Python      v335                      53  1.1102230246251565e-16    Double
SmallBasic  1.2                       94  1.0e-28                   Fixed128*
VB6         VB 6.0      Single        24  5.960464E-08              Single
VB6         VB 6.0      Double        53  1.11022302462516E-16      Double
VBA         VBA 7.1     Single        24  5.960464E-08              Single
VBA         VBA 7.1     Double        53  1.11022302462516E-16      Double
VBScript    Win 10                    53  1.110223E-16              Double
VB.Net      VS 2013     Single        53  1.110223E-16              Double
VB.Net      VS 2013     Double        53  1.11022302462516E-16      Double
VB.Net      VS 2013     Decimal       94  1.0e-28                   Fixed128*
N is the loop count.<br>
Fixed128*: use of IEEE Decimal128 floating point to emulate fixed-point arithmetic.<br>
 
It is interesting to see that several compilers do not use the different IEEE 754 precisions to implement their different data types.
This looks like a trade-off between compiler simplicity and runtime efficiency: why bother with several floating-point precisions and all the conversion routines they imply, when everything could be computed in the highest precision available?<br>
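One possible reading of rows where a narrow declaration reports a wider epsilon is that the sum <code>1.0 + epsilon</code> is evaluated or compared in a wider format than the declared type. This is an assumption, not something verified for any of the compilers listed. The sketch below only illustrates the effect in Python: it emulates single precision by rounding through <code>struct</code>, and the helper names <code>to_single</code> and <code>epsilon_loop</code> are hypothetical.

```python
import struct


def to_single(x):
    """Round a Python float (assumed binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def epsilon_loop(round_sum):
    """Run the halving loop, passing each sum through round_sum
    before it is compared with 1.0. Returns (loop count, epsilon)."""
    epsilon = 1.0
    n = 0
    while round_sum(1.0 + epsilon) != 1.0:
        epsilon /= 2.0
        n += 1
    return n, epsilon


# Sum rounded to single precision before the comparison.
print("stored as single:", epsilon_loop(to_single))

# Sum left in the wider working precision (no rounding step).
print("kept in double:  ", epsilon_loop(lambda x: x))
```

With the rounding step the loop behaves like a true single precision computation; without it, the declared type has no influence on the result and the loop reports the precision of the working format.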
--[[User:PatGarrett|PatGarrett]] ([[User talk:PatGarrett|talk]]) 16:43, 16 February 2019 (UTC)