Why does accuracy matter?

Contrary to popular belief, computers are not perfect calculators. The usual laws of real-number arithmetic do not hold exactly inside your CPU, and rounding errors pile up.

Computers represent floating point numbers in a binary format defined by the IEEE 754 standard, which also specifies the rules of floating point arithmetic.

This format is flexible and compact, and can represent very large numbers (up to about 3.4 × 10^38 for the 32 bit format) as well as very small ones (down to about 1.2 × 10^-38 for normalized values in the 32 bit format). However, this flexibility has a price: rounding errors. Distinct real numbers may be rounded to the same binary floating point representation, and many real numbers have no exact equivalent in the world of floating point numbers. For example, the value 0.1 is, perhaps unexpectedly, stored in the 32 bit format as approximately 0.1000000015.
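The value actually stored for 0.1 can be inspected directly. The following is a minimal sketch using only the Python standard library; the helper name `to_float32` is illustrative, and it relies on `struct` rounding a 64 bit Python float to the nearest 32 bit value.

```python
import struct
from decimal import Decimal

def to_float32(x: float) -> float:
    """Round a 64 bit Python float to the nearest 32 bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

x32 = to_float32(0.1)

# Decimal(float) shows the exact value held by the binary representation.
print(Decimal(x32))   # 0.100000001490116119384765625
print(Decimal(0.1))   # 64 bit format: closer to 0.1, but still not exact

# Two distinct real numbers that round to the same 32 bit representation.
print(to_float32(0.1) == to_float32(0.100000001))   # True
```

Neither format stores 0.1 exactly; the 64 bit format only pushes the error further to the right.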

Floating point arithmetic introduces rounding errors not only when values are stored, but also in every calculation performed on them.

These rounding errors accumulate throughout the execution of a calculation and eventually produce a discrepancy between the expected mathematical result and the result computed by the machine. In some cases this discrepancy can lead to incorrect behavior of the software and to an industrial accident. The bug that caused the failure of the Patriot missile in 1991 originated precisely from this behavior.
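A short illustration of both effects, written as a sketch in Python with its default 64 bit floats. The sum is accumulated in an explicit loop because recent Python versions make the built-in `sum` more accurate for floats, which would hide the effect.

```python
import math

# A single operation already rounds: 0.1 + 0.2 is not exactly 0.3.
print(0.1 + 0.2)             # 0.30000000000000004
print(0.1 + 0.2 == 0.3)      # False

# Repeating an operation lets the per-step rounding errors accumulate.
total = 0.0
for _ in range(10):
    total += 0.1
print(total)                 # 0.9999999999999999
print(total == 1.0)          # False

# For comparison, math.fsum keeps track of the lost low-order bits.
print(math.fsum([0.1] * 10) == 1.0)   # True
```

Ten additions are enough to miss the mathematically expected result of 1.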

The classic case of the Patriot missile (source)

On February 25, 1991, during the Gulf War, an American Patriot Missile battery in Dhahran, Saudi Arabia, failed to track and intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks, killing 28 soldiers and injuring around 100 other people. A report of the General Accounting Office, GAO/IMTEC-92-26, entitled Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia reported on the cause of the failure.

It turns out that the cause was an inaccurate calculation of the time since boot due to computer arithmetic errors. Specifically, the time in tenths of a second as measured by the system's internal clock was multiplied by 1/10 to produce the time in seconds. This calculation was performed using a 24 bit fixed point register. In particular, the value 1/10, which has a non-terminating binary expansion, was chopped at 24 bits after the radix point. The small chopping error, when multiplied by the large number giving the time in tenths of a second, led to a significant error. Indeed, the Patriot battery had been up around 100 hours, and an easy calculation shows that the resulting time error due to the magnified chopping error was about 0.34 seconds.

The number 1/10 equals 1/2^4 + 1/2^5 + 1/2^8 + 1/2^9 + 1/2^12 + 1/2^13 + ...
In other words, the binary expansion of 1/10 is 0.0001100110011001100110011001100... The 24 bit register in the Patriot instead stored 0.00011001100110011001100, introducing an error of 0.0000000000000000000000011001100... binary, or about 0.000000095 decimal. Multiplying by the number of tenths of a second in 100 hours gives 0.000000095 × 100 × 60 × 60 × 10 = 0.34 seconds.
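The arithmetic above can be reproduced exactly with rational numbers. This is only a sketch of the calculation, not the Patriot's actual code: the constant `FRACTION_BITS = 23` is an assumption read off the stored value quoted above, which shows 23 binary digits after the radix point, and all names are illustrative.

```python
from fractions import Fraction

# Assumption: number of binary digits after the radix point in the stored value.
FRACTION_BITS = 23

exact = Fraction(1, 10)

# Chop (truncate, not round) the binary expansion of 1/10 after FRACTION_BITS bits.
stored_int = int(exact * 2**FRACTION_BITS)
stored = Fraction(stored_int, 2**FRACTION_BITS)
chop_error = exact - stored

# Number of clock ticks (tenths of a second) in 100 hours of uptime.
tenths_in_100_hours = 100 * 60 * 60 * 10
drift_seconds = chop_error * tenths_in_100_hours

print(format(stored_int, f"0{FRACTION_BITS}b"))   # 00011001100110011001100
print(float(chop_error))                          # about 0.000000095
print(float(drift_seconds))                       # about 0.34
```

Using `Fraction` keeps the demonstration itself free of floating point error; the values are converted to floats only for printing.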

A Scud travels at about 1,676 meters per second, and so travels more than half a kilometer in this time. This was far enough that the incoming Scud was outside the "range gate" that the Patriot tracked. Ironically, the fact that the bad time calculation had been improved in some parts of the code, but not all, contributed to the problem, since it meant that the inaccuracies did not cancel.
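The distance figure follows from multiplying the time error by the Scud's speed. The sketch below repeats that calculation for a few uptimes; only the 100 hour case comes from the text, the other uptimes are hypothetical values chosen to show that the error grows linearly with time since boot. The function name and constants are illustrative.

```python
from fractions import Fraction

SCUD_SPEED_M_PER_S = 1676   # speed quoted in the text

# Error made on each tenth of a second by the chopped value of 1/10
# (23 binary digits after the radix point, as in the stored value quoted above).
CHOP_ERROR = Fraction(1, 10) - Fraction(838860, 2**23)

def distance_travelled_m(uptime_hours: int) -> float:
    """Distance a Scud covers during the accumulated time error."""
    tenths = uptime_hours * 60 * 60 * 10
    time_error_s = CHOP_ERROR * tenths
    return float(time_error_s * SCUD_SPEED_M_PER_S)

# Only 100 hours is taken from the text; the other uptimes are hypothetical.
for hours in (1, 8, 24, 100):
    print(f"{hours:>3} h uptime: {distance_travelled_m(hours):6.1f} m")
```

For 100 hours this gives roughly 575 meters, consistent with "more than half a kilometer".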