New Math: Don’t let real numbers cause the loss of real lives or money

Have you ever written a program where you were sure your math was right, but the results came out wrong? Often, the math itself is not what is wrong. This is a programmer error, not a flaw in how the language implements rounding: it is up to the programmer to choose a representation of real numbers that is precise enough for the calculation. If you are doing calculations where lives or money are at stake, check how you represent real numbers in repeated (iterative) calculations, or you could compute wrong values that lead to serious problems. Historically, the COBOL programming language offered unusual control over the precision of non-whole numbers.

In Marianne Bellotti's article "Is COBOL holding you hostage with Math?", significant math discrepancies occur in three cases. This type of error contributed to a Patriot missile battery's failure to intercept an incoming Scud missile in 1991, an attack that killed 28 people, and to $700M in damage during the construction of an offshore oil rig. The article states that the way real numbers are implemented is also one of the things preventing the IRS from moving off COBOL to high-performance computing. Other affected use cases include machine learning solutions that don't scale and a smart contract that pays out 20 times what it should. This weakness was recently added to the CWE corpus as CWE-1339: Insufficient Precision or Accuracy of a Real Number.

To confirm this issue was real, I wrote a program in Rust on Windows 10. The program calculates Muller's Recurrence, a simple formula whose values should converge to 5 as the number of iterations increases. First, I tried a 64-bit floating-point number. After 11 iterations, the values stopped converging toward 5. Next, I tried several 128-bit number types with a fixed number of bits for the fractional part (fixed-point numbers). Even with 112 fractional bits, I could not go past 25 iterations without getting a number that overshoots 5.
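The 64-bit floating-point experiment can be sketched as follows. This is not the author's program: it assumes the commonly cited form of Muller's Recurrence, x_{n+1} = 108 - (815 - 1500/x_{n-1})/x_n with x_0 = 4 and x_1 = 4.25, and the function name `muller_f64` is hypothetical.

```rust
/// Hypothetical helper: computes x_0..=x_n of Muller's Recurrence in f64,
/// assuming x_{k+1} = 108 - (815 - 1500 / x_{k-1}) / x_k, x_0 = 4, x_1 = 4.25.
fn muller_f64(n: usize) -> Vec<f64> {
    let mut xs = vec![4.0_f64, 4.25];
    while xs.len() <= n {
        let k = xs.len();
        // Each step reuses the two previous (already rounded) values,
        // so rounding errors feed back into every later term.
        let next = 108.0 - (815.0 - 1500.0 / xs[k - 2]) / xs[k - 1];
        xs.push(next);
    }
    xs
}

fn main() {
    let xs = muller_f64(30);
    for (k, x) in xs.iter().enumerate() {
        println!("x_{:2} = {:.15}", k, x);
    }
}
```

In IEEE 754 double precision, the values climb toward 5 for roughly ten steps, then drift away and settle at 100, which is also a fixed point of the recurrence. Nothing in the output flags the change; the program simply prints a confident wrong answer.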

Another way to store a real number is as a fraction of two integers. With 32-bit integers, the math breaks at 12 iterations, jumping to a value over 10. The maximum value of a signed 32-bit integer is 2,147,483,647. With 64-bit integers, the math breaks at 26 iterations. The maximum value of a signed 64-bit integer is 9,223,372,036,854,775,807. The most disturbing part of these ratios is that when they go bad, they do not settle on an obviously bad number but jump around.
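A fraction of two integers can be sketched with the standard library alone. The hypothetical `Ratio64` type below keeps a reduced fraction in two `i64` values and uses checked arithmetic, so an overflow ends the run instead of producing a wrong value (in release builds, Rust's plain integer arithmetic wraps silently on overflow by default). It assumes the same form of Muller's Recurrence as above and is a sketch, not the author's implementation.

```rust
/// Hypothetical exact fraction num/den held in two i64 values,
/// always in lowest terms with a positive denominator.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ratio64 {
    num: i64,
    den: i64,
}

/// Greatest common divisor (Euclid's algorithm) on magnitudes.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

impl Ratio64 {
    /// Builds a reduced fraction; None on a zero denominator or overflow.
    fn new(num: i64, den: i64) -> Option<Ratio64> {
        if den == 0 {
            return None;
        }
        let g = gcd(num.unsigned_abs(), den.unsigned_abs());
        if g > i64::MAX as u64 {
            return None;
        }
        let g = g as i64;
        let (mut n, mut d) = (num / g, den / g);
        if d < 0 {
            n = n.checked_neg()?;
            d = d.checked_neg()?;
        }
        Some(Ratio64 { num: n, den: d })
    }

    /// a/b - c/d = (a*d - c*b) / (b*d), with every step overflow-checked.
    fn checked_sub(self, o: Ratio64) -> Option<Ratio64> {
        let n = self.num.checked_mul(o.den)?.checked_sub(o.num.checked_mul(self.den)?)?;
        Ratio64::new(n, self.den.checked_mul(o.den)?)
    }

    /// (a/b) / (c/d) = (a*d) / (b*c), with every step overflow-checked.
    fn checked_div(self, o: Ratio64) -> Option<Ratio64> {
        Ratio64::new(self.num.checked_mul(o.den)?, self.den.checked_mul(o.num)?)
    }

    fn to_f64(self) -> f64 {
        self.num as f64 / self.den as f64
    }
}

/// One step: 108 - (815 - 1500 / prev) / cur, or None on overflow or division by zero.
fn muller_step(prev: Ratio64, cur: Ratio64) -> Option<Ratio64> {
    let int = |v: i64| Ratio64::new(v, 1);
    let inner = int(815)?.checked_sub(int(1500)?.checked_div(prev)?)?;
    int(108)?.checked_sub(inner.checked_div(cur)?)
}

/// Exact terms x_0, x_1, ... up to `max_terms`, stopping early at the
/// first i64 overflow instead of wrapping around.
fn muller_ratio64(max_terms: usize) -> Vec<Ratio64> {
    let mut xs = vec![Ratio64::new(4, 1).unwrap(), Ratio64::new(17, 4).unwrap()];
    while xs.len() < max_terms {
        let k = xs.len();
        match muller_step(xs[k - 2], xs[k - 1]) {
            Some(next) => xs.push(next),
            None => break, // report the limit rather than return a wrong value
        }
    }
    xs
}

fn main() {
    let xs = muller_ratio64(100);
    for (k, x) in xs.iter().enumerate() {
        println!("x_{} = {}/{} ~ {:.15}", k, x.num, x.den, x.to_f64());
    }
    if xs.len() < 100 {
        println!("{} exact terms computed before the first i64 overflow", xs.len());
    }
}
```

Every value this sketch returns is exact, because a checked operation either succeeds exactly or stops the run. Replacing `i64` with `i32` throughout gives the 32-bit variant, which hits its limit sooner.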

Then I pulled out the big guns and used a BigInt for the integers in the ratio. A BigInt in Rust (typically from the num-bigint crate, since the standard library has none) is represented as a vector of fixed-size digits, for example base 2³² digits. Unlike a fixed-width integer, it has no fixed maximum: it grows as needed, limited only by available memory. This implementation worked all the way to 800 iterations (only 80 iterations are shown in the graph below).
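A short derivation helps explain why the fixed-width ratios fail and the BigInt ratio does not. It is a standard analysis of Muller's Recurrence and assumes the commonly cited form with x_0 = 4 and x_1 = 4.25; the symbols y_n, α, β, γ are introduced here for illustration.

```latex
x_{n+1} = 108 - \frac{815 - 1500 / x_{n-1}}{x_n}, \qquad x_0 = 4,\; x_1 = 4.25

% Substituting x_n = y_{n+1} / y_n turns it into a linear recurrence:
y_{n+2} = 108\,y_{n+1} - 815\,y_n + 1500\,y_{n-1},
\qquad r^3 - 108r^2 + 815r - 1500 = (r-3)(r-5)(r-100)

% General solution, then the exact solution for x_0 = 4, x_1 = 4.25 (\alpha = \beta, \gamma = 0):
x_n = \frac{\alpha\,3^{n+1} + \beta\,5^{n+1} + \gamma\,100^{n+1}}{\alpha\,3^n + \beta\,5^n + \gamma\,100^n}
\quad\Longrightarrow\quad
x_n = \frac{3^{n+1} + 5^{n+1}}{3^n + 5^n} \;\longrightarrow\; 5
```

Any rounding error effectively makes γ slightly nonzero, and because 100ⁿ grows fastest, an inexact run is eventually pulled toward 100 instead of 5. In lowest terms the exact value has denominator (3ⁿ + 5ⁿ)/2, so the numerator and denominator grow roughly like 5ⁿ: they outgrow a signed 64-bit integer before n = 30, and at n = 800 the denominator has more than 550 decimal digits. A fixed-width integer must overflow eventually, while a BigInt simply grows.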

Graph: This recurrence should converge to a value of 5.

Notice in the graph that, while the correct value should converge to 5, some implementations fail quickly in an obvious way, while others produce answers that stay close and then occasionally fail in spectacular ways. This means that what looks like a simple recurring math equation might fail in ways that are hard to detect in test cases.

As Bellotti highlights in her article, these types of errors could cause critical systems to malfunction and potentially lead to loss of life. They can also make results off by a factor of 20, an amount that could have a major impact in the finance world. When writing code that needs to be accurate, make sure you know the limits of the problem you are trying to solve, and test up to those limits.
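One way to test up to those limits is to compare an implementation against an independent, exact reference and record where they disagree. This sketch assumes the commonly cited form of Muller's Recurrence (x_0 = 4, x_1 = 4.25) and uses its closed-form exact solution x_n = (3^(n+1) + 5^(n+1)) / (3^n + 5^n), which can be checked by substitution, as the reference. The function names, the 1e-6 relative tolerance, and the 100-step limit are hypothetical choices.

```rust
/// Exact value of x_n from the closed form (3^(n+1) + 5^(n+1)) / (3^n + 5^n).
/// Dividing top and bottom by 5^n (r = 0.6^n) keeps the f64 evaluation
/// accurate and overflow-free.
fn exact_reference(n: usize) -> f64 {
    let r = 0.6_f64.powi(n as i32);
    (3.0 * r + 5.0) / (r + 1.0)
}

/// Runs the recurrence in f64 and returns the first index n whose relative
/// error against the exact reference exceeds `tol`, or None if every term
/// through `max_n` stays within tolerance.
fn first_divergence(tol: f64, max_n: usize) -> Option<usize> {
    let (mut prev, mut cur) = (4.0_f64, 4.25_f64);
    for n in 2..=max_n {
        let next = 108.0 - (815.0 - 1500.0 / prev) / cur;
        let expected = exact_reference(n);
        if ((next - expected) / expected).abs() > tol {
            return Some(n);
        }
        prev = cur;
        cur = next;
    }
    None
}

fn main() {
    // Hypothetical tolerance and iteration limit; choose values that match
    // the accuracy and run length the real application needs.
    match first_divergence(1e-6, 100) {
        Some(n) => println!("f64 leaves the 1e-6 tolerance at x_{}", n),
        None => println!("f64 stays within tolerance through x_100"),
    }
}
```

Because the reference is computed in a well-conditioned form, any disagreement comes from the recurrence, not from the reference. Running the check through the largest iteration count the application will ever use is what turns "the math looks right" into a tested limit.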

Lead author: Steve Battista. Support: Steve Christey Coley, Adam Chaudry, Marisa Harriston, and Alec Summers.