
Having Some Fun With Floating-Point Numbers

Subhajit Paul
Sep 13, 2020 · 4 min read

For all the years I have spent coding in C, I have avoided comparing floating-point numbers, because, unlike integer comparisons, they are known to produce unpredictable results, owing to the way floating-point numbers are stored in memory. For example, if I run the code below, what do you think the output will be?

float n_0 = 1.0 / 2.0;
float n_1 = 1.0 / 200.0;
float n_2 = n_1 * 100.0;
printf(n_0 == n_2 ? "You got lucky!" : "Not so accurate, are you?");

On almost every computer under the sun, you will end up “getting lucky”, and I would bet a fortune on it. But if I am so certain of the outcome, why do I avoid using this logic in real applications?

To understand this, let’s take a look at how floating-point numbers are stored in memory. Any floating-point number consists of three parts that need to be stored: the sign, the integer part, and the fractional part. For example, the number 2.5 can be stored as (sign = +, integer = 2, fractional = 0.5). Another method is to store the sign, a non-zero single-digit integer coefficient, a mantissa, and an exponent, a scheme commonly known as scientific notation. In this method, 2.5 can be represented as (sign = +, coefficient = 2, mantissa = 0.5, exponent = 0). Let’s take a look at some other numbers represented in scientific notation.

250.00, +2.5000000e+002
0.0025, +2.5000000e-002
-12.34, -1.2340000e+001

Computers store floating-point numbers using the second method, and use a fixed number of cells to store each component. In the examples above, the sign takes the first cell, the coefficient takes the next cell, seven cells are reserved for the mantissa, and four cells are used for the exponent, which is in turn split into one cell for its sign and three cells for its value.

Now, we go on to see how floating-point arithmetic is performed on two numbers stored in this format. As an example, we will try to multiply two numbers, say 1/2 and 12. We know that 1/2 is represented as +5.0000000e-001, and 12 is represented as +1.2000000e+001. So the multiplication can be done as:

+5.0000000e-001 × +1.2000000e+001 = +6.0000000e+000

As you can see, because the mantissas of both numbers have plenty of trailing zeros, the result of the multiplication is unambiguously 6. That is exactly what was happening in the little code we ran before. As long as we operate on floating-point numbers that can be represented in our format without any loss of precision, we can be sure about the outcome of the computation.

Just to illustrate my point, let’s look at another example, this time using a number that must be truncated to fit the given cells, thereby losing precision. The example below shows the multiplication of 1/3 and 12.

+3.3333333e-001 × +1.2000000e+001 = +3.99999996e+000 ≈ +4.0000000e+000

The extra digit produced by the multiplication, which cannot fit in the provided cells, leads to rounding of the remaining digits in the coefficient and mantissa. In this case, since the dropped least significant digit 6 is greater than 5, the result rounds up to +4.0000000e+000, which is exactly equal to the actual answer 4.

But does the rounding always produce the correct result? How about multiplying 1/3 and 18?

+3.3333333e-001 × +1.8000000e+001 = +5.99999994e+000 ≈ +5.9999999e+000, not the exact answer 6

This is exactly why we cannot trust the results of floating-point computations in comparisons. If you run the code below, a slight modification of the one we started with, you will most probably see this for yourself.

float n_0 = 1.0 / 7.0;
float n_1 = 1.0 / 700.0;
float n_2 = n_1 * 100.0;
printf(n_0 == n_2 ? "You got lucky!" : "Not so accurate, are you?");

Although I did not mention bits anywhere above, and talked mostly in decimal, this model easily extends to our binary computers. The computer stores the sign as a single bit; the coefficient is not stored at all (the only non-zero digit in binary is 1, so there is no point wasting a bit on it); and the exponent and the mantissa are stored using a fixed number of bits depending on the desired precision. You can look up the IEEE 754 standard for the exact details.

Also, at this point, if you are curious about what the result of multiplying 1/3 and 15 would be, I would suggest you take a look at the rounding rules of IEEE 754.
