Coding with float

Some people trust computer to a degree that they will accept its computation as correct when algorithm is right and program has no bug. Some mistrust it to an extent that they wont trust any calculation returned by it.

There are serious limitations. Not only one can not represent 10/3 accurately as finite decimal representation, but one can’t represent some finite decimal representation such as 0.1 correctly in binary format [1]. See [2] for a detailed commentary on issues involved with computation with real numbers.

These issues have led many to explore alternatives of computation [3]. Till people come up with a nice model to compute real numbers, we can find ways to deal with them on computer.

On Windows machine, try computing \sqrt{4} - 2 or 2- \sqrt{4} in its inbuilt calculator. If you find its answer surprising and think of it as bug like this guy you should read [1].

Reference [1] explains why and when x=x+1.0 is true and when and why x=x is false when dealing with floating point numbers. Its a great reference for electrical engineering students who wants to implement a processor with floating point units in some HDL.

A rule of thumb is that when adding two floating point numbers, their relative values make a huge difference and one should be prepared for surprises. One can start his research into this by this wiki article. Ordering matters in such cases. Adding a float a into float b can be very different from adding b into a. Here is Haskell code which proves it. It computes the function f(x) = (2- x) - (2^3 - \frac{x^3}{12}) . One can translate it into other language also. This also serve as a beauty of Haskell syntax.

-- This is type signature of function. It says that function calc takes an
-- argument of type Float and returns a Float 
calc :: Float -> Float

-- This is the definition of function. Quite readable.
calc x = (x - (x ^ 3) / 12)

-- Haskell can infer the type by itself. So we can choose not to write a type
-- declaration. This function subtract f(x) from f(2).
calc2 x = (calc 2) - (calc x)

-- This function takes a list of Float and apply function calc2 on all of its
-- elements and add them up. In python, equivalent is
-- def calcList1(l):
--    newlist = []
--    for a in l:
--       newlist.append(calc2(a))
--    return (0.0 + sum(newlist))
calcList1 :: [Float] -> Float
calcList1 l = foldl (+) 0.0 (map calc2 l)

-- This is like the previous function but sum is done in reverse order. This is
-- not entirely correct. we are using foldr (+) and foldl (+) function to sum a
-- list. While the foldr adds first to the last, foldl adds from last to first.
-- Again this is not entirely correct. 

calcList2 :: [Float] -> Float
calcList2 l = foldr (+) 0.0 (map calc2 l)

-- Here we are applying the same function on the element of list but adding them
-- in different order. The input step construct a list l where each element is
-- step size away from the previos one.
-- e.g. [0.0,0.5..2.0] is [0.0, 0.5, 1.0, 1.5, 2.0]

test1 :: Float -> Float
test1 step = (calcList1 l) - (calcList2 l) 
        l = [0.0,step..2.0]

Let’s see now what happens when we run this script filename.hs in ghci filename.hs

*Main> test1 0.1
*Main> test1 0.01
*Main> test1 0.001
*Main> test1 0.0001

One expects 0.0 but this is not what is happening here. Life with floating point numbers is not easy, and it can be terribly confusing. If one is writing simulator, it pays to be aware of potential traps.


  1. What every computer scientist should know about floating-point arithmatic
  2. How are real number specified in computation
  3. Continuum Computing – The Next Step in the Theory of Computing