Another brilliant solution to an obsolete problem
I have a weak spot for ingenious ideas that didn’t make it. This time we’ll talk about the numbers inside computers, the “floating point numbers” or just “floats”. This is not going to be about numeric accuracy (interesting topic, but there’s enough literature on this already), but about the difference between these and the floating point numbers in your calculator.
In this article, I’ll address the miracle of IEEE 754 packed decimal floating point encoding, explain why Excel can sometimes give very strange results, and as a bonus describe the connection between the game Quake III Arena and the hexadecimal constant 0x5F3759DF.
Floating point numbers are literally used billions of times per second on your device (when playing a game or watching a video, for example). It sounds like they should be getting boring by now, don’t you think? Well, just to show you that they aren’t that boring, let’s start with some trivial computations (using Python here, but that isn’t the point):
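```python
>>> 0.1 + 0.2
0.30000000000000004
>>> 1.1 + 2.2
3.3000000000000003
```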
As the example shows, the standard floating point numbers in your computer (“double precision”, in this case) appear to be imprecise. This has nothing to do with Python: that is simply how binary floating point works. Without going into details: if you write your numbers in decimal but calculate in binary, the conversion between the two causes trouble, because many decimal fractions (such as 0.1) have no exact binary representation.
In the early days, electronic calculators worked in decimal. It started with the incredible HP9100A from 1968 (this device contained only transistors, no chips, but was still a fully programmable calculator!). It represented numbers as decimal floating point with a mantissa of 10 digits and an exponent of two digits, the same as many calculators today. Decimal was chosen because the hardware of the time couldn’t afford the conversion between decimal and binary. I have personal experience of how hard this is: when I wrote a set of binary floating point calculation routines for the Z80 in 1982, I found the conversion to decimal harder than any of the other routines, including the trigonometric functions!
Later, when computers got faster and programs got bigger, most numbers were no longer visible on the outside. It then became attractive to work in binary, which is more accurate and easier to implement, and to convert to decimal only at the end of the computation. This saves time and increases the accuracy of the computation.
The case for decimal
But there is another reason decimal numbers are still used in calculators today: it avoids surprises such as the examples above, because with decimal numbers, roundoff occurs when and how you expect it. For the same reason, Python has a nice built-in library for computations in decimal:
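```python
>>> from decimal import Decimal, getcontext
>>> Decimal('0.1') + Decimal('0.2')
Decimal('0.3')
>>> getcontext().prec = 50
>>> Decimal(1) / Decimal(7)
Decimal('0.14285714285714285714285714285714285714285714285714')
```

Some properties of this library: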
- clearly, there is no precision loss when representing numbers;
- the precision is user configurable;
- as an extra, the library allows you to use millions of digits with surprising speed, because its computation routines are optimized for this.
The library is very useful for calculating with monetary amounts, but I have also used it to calculate the digits of large numbers. This can be faster than Python’s built-in numbers, since the conversion routines that turn those into decimal digits are not very sophisticated:
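Here is a sketch of such an experiment (the numbers and the timing scaffolding are mine; the relative speeds depend on your Python version):

```python
import sys
import time
from decimal import Decimal, getcontext

# Python 3.11+ refuses huge int-to-str conversions by default; lift the limit.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(2_000_000)

getcontext().prec = 400_000           # room for 2**1_000_000 (301,030 digits)

t0 = time.perf_counter()
s_dec = str(Decimal(2) ** 1_000_000)  # computed and kept in decimal
t1 = time.perf_counter()
s_int = str(2 ** 1_000_000)           # computed in binary, converted at the end
t2 = time.perf_counter()

assert s_dec == s_int
print(f"decimal: {t1 - t0:.3f} s, binary: {t2 - t1:.3f} s")
```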
How floating point numbers are stored
The storage and processing of floating point numbers are very precisely defined in the IEEE 754 standard. This standard is interesting reading, because it shows how hard it is to do arithmetic right. For example, since not all numbers can be represented as floating point, you have to round to the nearest representable floating point number. But in the rare case that the result of your calculation is exactly between two representable numbers, you have to pick the number that ends in a zero (this is called “round to even”). That implies that, when performing the calculation, you have to calculate with enough precision to know for sure whether this is the case. And yes, this is what your phone can do in a billionth of a second.
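Python’s built-in round() follows the same tie-breaking rule, so you can see “round to even” in action at the prompt:

```python
>>> [round(x) for x in (0.5, 1.5, 2.5, 3.5)]
[0, 2, 2, 4]
```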
This wasn’t the case in the past: many early computers weren’t so accurate in their calculations, and the rounding errors could accumulate and make them frustrating to use. This even includes relatively modern, famous machines like the Cray supercomputers.
Getting to the point: storage of floating point numbers
After this long introduction, I can finally make my point. Of course, IEEE 754 has a specification for storing floating point numbers. The format is quite complicated, and it is notoriously hard to decode by hand (just an example: the number pi is represented as hexadecimal 40490FDB in single precision).
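You can check this in Python by packing pi as a single precision float and looking at the bytes:

```python
>>> import struct, math
>>> struct.pack('>f', math.pi).hex()
'40490fdb'
```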
But this standard also supports decimal! I was positively surprised to see that decimal is not only supported, but supported in a very sophisticated way. For example, IEEE 754 single precision decimal numbers are stored in four bytes (just like regular single precision floating point numbers) with the following properties:
- 7 digits mantissa
- exponent from -95 to +96
Precision and range are comparable to those of binary single precision, and, just like the binary floating point numbers, there is also support for special values such as NaN and infinities.
To fit so much information in just four bytes, an encoding scheme is defined that is compact but can still be decoded into digits with very little hardware. The 32 bits have the following meaning:
- a sign bit
- five bits encoding the first mantissa digit together with the first two exponent bits (see below for details)
- six more exponent bits
- two groups of 10 bits, each encoding three mantissa digits (instead of the 12 bits that would normally be needed)
The group of five bits (offering 32 combinations) efficiently encodes the ten possible values of the first mantissa digit together with the three possible values of the two leading exponent bits: 10 × 3 = 30 combinations. The remaining two combinations are used for NaN and infinity.
Each group of ten bits (that’s 1024 combinations) efficiently encodes the thousand different values you can make with three digits. For this, densely packed decimal is used, an encoding that allows the decimal digits to be extracted using only a few logic gates. (This encoding can also pack two digits in seven bits or one digit in four bits with the same hardware!)
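To give an impression of how simple the decoding is, here is a sketch of a decoder for one 10-bit group in Python, following the published decoding table (the bit names and structure are mine):

```python
def dpd_decode(declet):
    """Decode one 10-bit densely packed decimal group into three digits."""
    p, q, r, s, t, u, v, w, x, y = ((declet >> i) & 1 for i in range(9, -1, -1))
    if v == 0:                    # all three digits small (0-7)
        return 4*p + 2*q + r, 4*s + 2*t + u, 4*w + 2*x + y
    if (w, x) == (0, 0):          # only the last digit large (8-9)
        return 4*p + 2*q + r, 4*s + 2*t + u, 8 + y
    if (w, x) == (0, 1):          # only the middle digit large
        return 4*p + 2*q + r, 8 + u, 4*s + 2*t + y
    if (w, x) == (1, 0):          # only the first digit large
        return 8 + r, 4*s + 2*t + u, 4*p + 2*q + y
    # (w, x) == (1, 1): two or three large digits, selected by s and t
    if (s, t) == (0, 0):
        return 8 + r, 8 + u, 4*p + 2*q + y
    if (s, t) == (0, 1):
        return 8 + r, 4*p + 2*q + u, 8 + y
    if (s, t) == (1, 0):
        return 4*p + 2*q + r, 8 + u, 8 + y
    return 8 + r, 8 + u, 8 + y    # all three digits large

# Every one of the 1000 three-digit combinations can be decoded:
assert {dpd_decode(n) for n in range(1024)} == \
       {(a, b, c) for a in range(10) for b in range(10) for c in range(10)}
```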
And this beautiful design is used almost nowhere. It feels like a waste: these geniuses defined a way to work efficiently with decimal floating point on a modern computer, even taking into account how to do conversions efficiently, and yet it is barely used at all.
Fortunately, there is an exception. The incredible SwissMicros DM42 calculator emulates the classic HP-42S calculator using decimal floating point (quadruple precision: 34 digits!). And fortunately, this calculator’s software is also available on your computer, since the software Free42 is open source and has been ported to many platforms. The real nitpickers among my readers will know that this implementation actually uses the alternative IEEE 754 encoding called BID, which stores the mantissa as a plain binary integer instead of densely packed decimal, but oh well, you can’t have everything.
How Excel uses floating point
And now for another place where decimal and binary meet: Excel. The designers of Excel had an interesting design challenge: most of its users do simple calculations and expect the results to match those of a decimal calculator (especially since spreadsheets were originally designed for financial calculations). But because it would be too hard (and too slow) to faithfully use decimal numbers in the calculations, regular double precision binary numbers are used. And as we saw above, this may give results that casual Excel users wouldn’t expect.
To solve this, they chose to cheat: first calculate in binary, and then present the result as if it had been calculated with 15 significant digits, even though more precision is available.
This can for example result in two different numbers that give zero when subtracted:
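In Python terms, the effect looks like this (the 15-digit display is mimicked here with the .15g format; the final snap of the subtraction to zero is Excel’s behaviour, not Python’s):

```python
>>> a = 0.1 + 0.2
>>> b = 0.3
>>> a == b                        # two different doubles...
False
>>> (f"{a:.15g}", f"{b:.15g}")    # ...that display identically at 15 digits
('0.3', '0.3')
>>> a - b                         # Excel displays this difference as 0
5.551115123125783e-17
```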
More examples of Excel’s strange behaviour can be found on Wikipedia.
Historical example of floating point hacking
As a bonus, a story from floating point history that doesn’t have much to do with the preceding text, but is too weird not to mention. Here is the story of a small piece of code in the game Quake III Arena from the late 1990s. Like its predecessors, the game drew 3D graphics in real time, and as such it had to do millions of calculations per second to update the screen.
One of the most frequently occurring computations was the function 1/√x (the inverse square root), which you use for normalizing vectors. Because it was used so much, it had to be as fast as possible. On the computers of that time, square roots were very slow, and division was pretty slow as well.
To do this calculation, a clever programmer managed to compute this value without doing any division or square root, using the code below. Don’t worry about the computer code, just read the comments.
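Here it is, as it appears in the released Quake III Arena source code (the comments are the original author’s; a second Newton iteration was left in, commented out):

```c
float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

	return y;
}
```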
The calculation takes the floating point value of x, puts its bits directly in an integer, then shifts and subtracts to create a first estimate of the inverse square root, and finally uses three multiplications and a subtraction (one iteration of Newton’s method) to bring the estimate to sufficient precision. It is clear that the author really understood IEEE 754.
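If you want to experiment with it, the trick can be re-created in Python (a sketch for illustration only, and of course not fast in Python):

```python
import struct

def fast_inv_sqrt(x):
    """Re-creation of the Quake trick: estimate 1/sqrt(x) without dividing."""
    i = struct.unpack('>I', struct.pack('>f', x))[0]  # float bits as an integer
    i = 0x5F3759DF - (i >> 1)                         # magic shift and subtract
    y = struct.unpack('>f', struct.pack('>I', i))[0]  # bits back to a float
    return y * (1.5 - 0.5 * x * y * y)                # one Newton iteration

print(fast_inv_sqrt(2.0))  # ~0.70693, close to 1/sqrt(2) = 0.70710...
```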
If you want all the details, the full story is here. Nowadays, this trick isn’t useful anymore: since 1999, Intel processors have had fast instructions for both division and square root.
I studied this code to see if a comparable trick can be used to estimate the number of bits in an integer (it can, but it is of no use). But the work had one interesting result: the deep dive into IEEE 754 resulted in the article you just read.