Floating point

From Rosetta Code

Floating point is a numeric system for approximating real numbers. Each floating-point number stores a sign (either 1 or -1), some digits, and an exponent, and takes the form:

value = sign × digits × RADIX^exponent

This design uses a constant RADIX and limits the maximum number of digits. Calculations are fast but inexact, because the limit on digits causes round-off errors. With an appropriate exponent, however, a floating-point number can represent a substantial range of integers exactly (though a smaller range than a “pure” integer occupying the same space could hold).
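
Both effects can be observed directly. The following is a minimal sketch, assuming that Python's float is a 64-bit binary floating-point number with 53 significant binary digits (true on virtually all current platforms, but an assumption here). The variable names are illustrative only.

```python
# Round-off: 0.1, 0.2 and 0.3 cannot be stored exactly in binary,
# so the computed sum differs slightly from the stored 0.3.
total = 0.1 + 0.2
sum_is_exact = (total == 0.3)

# Exact integers: with 53 binary digits, every integer up to 2**53
# is representable, but 2**53 + 1 is not and rounds to a neighbour.
limit = 2 ** 53
below_limit_exact = (float(limit - 1) == limit - 1)
above_limit_collides = (float(limit + 1) == float(limit))

print("0.1 + 0.2 == 0.3 ?", sum_is_exact)
print("2**53 - 1 stored exactly ?", below_limit_exact)
print("2**53 + 1 collides with 2**53 ?", above_limit_collides)
```

For comparison, a 64-bit signed integer covers every integer up to 2^63 - 1, which is the sense in which the floating-point range of exact integers is smaller.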

The most common floating-point formats in modern practice are those based on the IEEE 754 standard. In particular, the RADIX is 2, and the digits and exponent each occupy a fixed number of binary digits that fit (together with the sign) in 32 bits (4 bytes, float) or 64 bits (8 bytes, double) of memory.
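
As an illustration, the 64-bit layout can be taken apart and reassembled with the formula given earlier. This is only a sketch: the field widths (1 sign bit, 11 exponent bits, 52 stored digit bits) and the exponent bias of 1023 come from the IEEE 754 binary64 format rather than from this page, and the helper names decompose_double and rebuild_normal are hypothetical.

```python
import struct

def decompose_double(x: float):
    """Split a 64-bit double into its sign, exponent and fraction fields."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit: 0 means +, 1 means -
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)    # 52 stored bits of the digits
    return sign, exponent, fraction

def rebuild_normal(sign: int, exponent: int, fraction: int) -> float:
    """Apply value = sign * digits * RADIX^exponent with RADIX = 2.

    Only valid for normal numbers (0 < exponent < 2047); zeros,
    subnormals, infinities and NaNs are not handled in this sketch.
    """
    digits = 1 + fraction / 2 ** 52      # implicit leading 1 plus stored bits
    return (-1) ** sign * digits * 2.0 ** (exponent - 1023)

fields = decompose_double(-6.5)
print(fields)
print(rebuild_normal(*fields))
```

The 32-bit format follows the same pattern with narrower fields; packing with the struct format code "f" instead of "d" exposes it in the same way.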
