Java Floating Point Round Off Error and Solution

Fasrin Aleem
Published in Nerd For Tech
6 min read · Oct 5, 2021

Depending on your application’s mathematical needs, you may need to take extra care to ensure that your operations provide the results you want in terms of floating point (double) precision.

In computation, floating point numbers are an important data type that is widely used. However, many users do not know the standard which is used in almost all computer hardware to store and process these.

"The IEEE Standard for Floating-Point Arithmetic (IEEE 754) is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations that made them difficult to use reliably and portably. Many hardware floating-point units use the IEEE 754 standard." (Wikipedia)

There are a few different ways to represent floating point numbers, but IEEE 754 is by far the most widely used, and it is the format behind Java's float and double types.

The three basic components of IEEE 754 are as follows:

  1. The sign of the mantissa: It's as straightforward as the name suggests. A positive number is represented by 0 and a negative number by 1.
  2. The biased exponent: Both positive and negative exponents must be representable in the exponent field, so a bias is added to the actual exponent to get the stored exponent.
  3. The normalized mantissa: The mantissa is the part of a number in scientific notation, or of a floating-point number, that holds its significant digits. In binary there are only two digits, 0 and 1, so a normalized mantissa is one with exactly one 1 to the left of the binary point.

IEEE 754 Format

Single Precision

  • 32 bits
  • 1 bit: sign
  • 8 bits: exponent
  • 23 bits: fraction (mantissa)

Double Precision

  • 64 bits
  • 1 bit: sign
  • 11 bits: exponent
  • 52 bits: fraction (mantissa)

Long Double Precision (x87 extended precision)

  • 80 bits
  • 1 bit: sign
  • 15 bits: exponent
  • 64 bits: significand (63 fraction bits plus an explicit integer bit)

IEEE 754 floating-point representation
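To make the single-precision layout concrete, here is a minimal sketch that splits a float into its three fields using the standard `Float.floatToIntBits` method. The class name `Ieee754Fields` and its helper methods are made up for this illustration.

```java
final class Ieee754Fields {

    /** Sign bit of a float: 0 for positive, 1 for negative. */
    static int sign(float value) {
        return Float.floatToIntBits(value) >>> 31;
    }

    /** Stored (biased) 8-bit exponent field. */
    static int biasedExponent(float value) {
        return (Float.floatToIntBits(value) >>> 23) & 0xFF;
    }

    /** 23-bit fraction field, without the implicit leading 1. */
    static int fraction(float value) {
        return Float.floatToIntBits(value) & 0x7FFFFF;
    }

    /** Renders the three fields as "sign exponent fraction" in binary. */
    static String layout(float value) {
        return sign(value) + " "
                + pad(Integer.toBinaryString(biasedExponent(value)), 8) + " "
                + pad(Integer.toBinaryString(fraction(value)), 23);
    }

    /** Left-pads a binary string with zeros to the given width. */
    private static String pad(String bits, int width) {
        return String.format("%" + width + "s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(layout(9.1f));
        System.out.println(layout(-0.2f));
    }
}
```

For a double the same idea applies with `Double.doubleToLongBits`, an 11-bit exponent mask and a 52-bit fraction mask.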

Here is an example of the loss of precision using double:

public class DoubleForCurrency {

    public static void main(String[] args) {
        // Start at 0.2 and add 0.2 another 100 times: mathematically 101 * 0.2 = 20.2.
        double total = 0.2;
        for (int i = 0; i < 100; i++) {
            total += 0.2;
        }
        System.out.println("total = " + total);
    }
}

What is the expected output?

Expected output: total = 20.20

Now simply compile and run the code and see what the actual output is.

Actual Output: total = 20.19999999999996

The result should have been 20.20; however, because of how floating point computation works, it came out as 20.19999999999996. The small rounding error in each addition accumulates, and this is the loss of precision (or loss of significance).
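One way to see where the error comes from is to print the exact value a double holds for 0.2. The sketch below relies on the `BigDecimal(double)` constructor, which converts the binary value exactly; the class name `ExactDoubleValue` and the helper `exact` are made up for this illustration.

```java
import java.math.BigDecimal;

final class ExactDoubleValue {

    /** Exact decimal expansion of the binary value a double really holds. */
    static BigDecimal exact(double value) {
        return new BigDecimal(value);
    }

    public static void main(String[] args) {
        // The literal 0.2 is already slightly off before any addition happens.
        System.out.println("0.2 is stored as " + exact(0.2));
        System.out.println("difference from 0.2: "
                + exact(0.2).subtract(new BigDecimal("0.2")));
    }
}
```

The stored value is slightly larger than 0.2, so every `total += 0.2` adds a value that is not exactly 0.2 and then rounds the sum again.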

Let's take another example and see exactly how this representation works.

Question: Convert 9.1 into binary!

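Step 01: First we separate the whole part (9) from the fractional part (0.1).

Step 02: Convert the whole part (9) into binary.

Converting 9 into binary: 9 = 1001

Step 03: Convert the fractional part (0.1) into binary by repeatedly multiplying by 2 and taking the integer digit each time.

Conversion of 0.1 decimal to binary: 0.1 = 0.000110011001100..., with the group 0011 repeating forever

The final binary representation of 9.1 is as below:

Binary representation of 9.1: 1001.000110011001100110011...

Step 04: Scientific notation of the binary representation

Scientific representation of 9.1: 1.001000110011001100110011... × 2³

We move the binary point three places to the left, so that it sits just after the leading 1.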
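Steps 01 to 03 can be sketched in a few lines of Java. The class `DecimalToBinary` and its methods are made up for this illustration; the fractional part is held in a `BigDecimal` so that the decimal input 0.1 stays exact while it is repeatedly doubled.

```java
import java.math.BigDecimal;

final class DecimalToBinary {

    /** Binary digits of a non-negative whole number, e.g. 9 -> "1001". */
    static String wholePart(int n) {
        return Integer.toBinaryString(n);
    }

    /**
     * First `bits` binary digits of a decimal fraction in [0, 1), found by
     * repeatedly doubling the fraction and peeling off the integer digit.
     */
    static String fractionPart(String decimalFraction, int bits) {
        BigDecimal two = BigDecimal.valueOf(2);
        BigDecimal f = new BigDecimal(decimalFraction);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bits; i++) {
            f = f.multiply(two);
            if (f.compareTo(BigDecimal.ONE) >= 0) {
                out.append('1');
                f = f.subtract(BigDecimal.ONE);
            } else {
                out.append('0');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Whole part and the first 24 fractional bits of 9.1.
        System.out.println(wholePart(9) + "." + fractionPart("0.1", 24) + "...");
    }
}
```

Because the pattern 0011 never terminates, any fixed number of bits can only approximate 0.1.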

Step 05: Convert it to IEEE 754 standard

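First comes the sign bit. If the value is negative, this bit is 1; if the value is positive, it is 0. Since 9.1 is positive, the sign bit is 0.

Next comes the exponent.
We have 8 bits for the exponent field, so it can hold 2⁸ = 256 different values (0 to 255). To cover both negative and positive exponents without a separate sign bit, the standard stores the actual exponent plus a fixed offset of 127, called the "exponent bias". For normal numbers the actual exponent ranges from -126 to 127; the stored values 0 and 255 are reserved for special cases such as zero, subnormal numbers, infinity and NaN.

In other words, whatever the exponent in the scientific notation is (here, 2 to the power 3), we add it to 127 and store the result in binary.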

Exponent value of scientific representation

In our case,
we have 2 to the power 3, so our exponent value is 3. We add that value (3) to the bias.

127 + 3 = 130

Binary version of 130

Exponent value of 130 is 10000010
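The bias arithmetic can be checked against what the JVM actually stores. This is a small sketch using the standard `Math.getExponent` and `Float.floatToIntBits` methods; the class name `ExponentBias`, the constant `FLOAT_BIAS` and the helper `stored` are made up for this illustration.

```java
final class ExponentBias {

    /** Bias for the 8-bit single-precision exponent field. */
    static final int FLOAT_BIAS = 127;

    /** Stored exponent field for a given actual (unbiased) exponent. */
    static int stored(int actualExponent) {
        return actualExponent + FLOAT_BIAS;
    }

    public static void main(String[] args) {
        int actual = Math.getExponent(9.1f);   // unbiased exponent of 9.1f
        int biased = stored(actual);
        System.out.println(actual + " + " + FLOAT_BIAS + " = " + biased);
        System.out.println("binary: " + Integer.toBinaryString(biased));

        // Cross-check against the exponent field in the real bit pattern.
        int field = (Float.floatToIntBits(9.1f) >>> 23) & 0xFF;
        System.out.println("matches stored field: " + (field == biased));
    }
}
```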

Mantissa: The mantissa comes from 1.001000110011001100110011... × 2³. We take exactly 23 bits after the binary point. We can skip the first bit (the digit before the binary point) because it is always 1 for normalized numbers, so it does not need to be stored.

The mantissa is 00100011001100110011001, with further bits (1001...) left over.

So, the IEEE 754 form of 9.1, worked out by hand, is as follows.

Binary form of 9.1: 0 10000010 00100011001100110011001

So far there have been no issues, but let's get the binary representation of 9.1 from an IEEE 754 calculator.

Comparison of the hand-computed value and the calculator value:

By hand:    0 10000010 00100011001100110011001
Calculator: 0 10000010 00100011001100110011010

You can see that the two representations differ in the last 2 bits. This is the floating-point rounding problem. It can give very unintended results in our programs if we do not use appropriate data types.

Why do the calculator value and the hand-computed value differ?

IEEE 754 defines rounding rules. When the mantissa needs more than 23 bits, the bits that do not fit are not simply dropped; by default the value is rounded to the nearest representable number (with exact ties going to the neighbor whose last bit is 0). In practice we look at the 24th bit: if it is 0, the extra bits are discarded, and if it is 1 and any later bit is also 1, we add 1 to the 23rd bit. In the example above the leftover bits are 1001..., so the 24th bit is 1 and we must add 1 to the 23rd bit, which turns the ending ...001 into ...010.

Rounding the Mantissa
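The effect of the rounding step can be checked directly. The sketch below compares the truncated pattern with the rounded one and asks which is closer to 9.1; the class `MantissaRounding` and its members are made up for this illustration, while `Float.intBitsToFloat` and `Float.floatToIntBits` are standard methods.

```java
import java.math.BigDecimal;

final class MantissaRounding {

    // Hand-computed pattern with the mantissa cut off after 23 bits.
    static final int TRUNCATED = 0b0_10000010_00100011001100110011001;
    // Same pattern after adding 1 to the 23rd mantissa bit.
    static final int ROUNDED = TRUNCATED + 1;

    /** Absolute distance between the float encoded by `bits` and the decimal 9.1. */
    static BigDecimal errorFrom9_1(int bits) {
        BigDecimal stored = new BigDecimal(Float.intBitsToFloat(bits));
        return stored.subtract(new BigDecimal("9.1")).abs();
    }

    public static void main(String[] args) {
        System.out.println("truncated error: " + errorFrom9_1(TRUNCATED));
        System.out.println("rounded error:   " + errorFrom9_1(ROUNDED));
        System.out.println("JVM uses rounded pattern: "
                + (Float.floatToIntBits(9.1f) == ROUNDED));
    }
}
```

The rounded pattern is the closer of the two neighbors, which is why the calculator (and the JVM) choose it.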

Let's convert the binary value back into a decimal floating-point value and see if we get 9.1!

Decimal Representation

As a result, when we store 9.1 in single-precision IEEE 754 format, the computer actually holds a slightly different value, approximately 9.10000038147, because of the rounding that occurs in the IEEE 754 format. A double has 52 mantissa bits instead of 23, so its error is much smaller, but the same mechanism explains the unexpected result in the earlier DoubleForCurrency example. We can therefore conclude that we should not use float or double for calculations on sensitive data such as currency.
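The conversion back to decimal follows the formula (-1)^sign × (1 + fraction / 2²³) × 2^(exponent - 127). Here is a minimal sketch of that decoding for normalized single-precision values; the class `DecodeFloat` and the method `decode` are made up for this illustration, and special cases are deliberately left out.

```java
final class DecodeFloat {

    /**
     * Rebuilds the value of a normalized single-precision bit pattern.
     * Zero, subnormals, infinity and NaN are not handled in this sketch.
     */
    static double decode(int bits) {
        int sign = bits >>> 31;
        int exponent = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x7FFFFF;
        // Restore the implicit leading 1 in front of the 23 stored bits.
        double significand = 1.0 + fraction / (double) (1 << 23);
        double magnitude = significand * Math.pow(2, exponent - 127);
        return sign == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(9.1f);
        System.out.println(decode(bits));
    }
}
```

The decoded value equals `(double) 9.1f`, and it is not equal to the decimal 9.1.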

Solution?

How can we do floating-point calculations accurately?

To avoid the floating-point rounding issue in precise calculations such as currency, Java provides the BigDecimal class (java.math.BigDecimal). We can use BigDecimal instead of float or double. The class supports arithmetic, comparison, hashing, rounding, scale manipulation, and format conversion. By specifying the scale and the rounding mode, we control exactly how a result is rounded. Because it stores decimal digits with arbitrary precision, it can represent both extremely small and very large decimal values exactly.

Now let's consider how we can solve the above-mentioned problems using BigDecimal!

Solution 1 using BigDecimal

Output: 0.3

Solution 2 using BigDecimal

Output: 10,562.20825
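As a further sketch, here is the earlier DoubleForCurrency loop rewritten with BigDecimal. This is not one of the two solutions above; the class name `BigDecimalForCurrency` and the helper `accumulate` are made up for this illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class BigDecimalForCurrency {

    /** Starts at `step` and adds `step` another `times` times, in exact decimal arithmetic. */
    static BigDecimal accumulate(String step, int times) {
        // Build from a String: new BigDecimal(0.2) would copy the binary error of the double.
        BigDecimal increment = new BigDecimal(step);
        BigDecimal total = increment;
        for (int i = 0; i < times; i++) {
            total = total.add(increment); // BigDecimal is immutable, so keep the result
        }
        return total;
    }

    public static void main(String[] args) {
        BigDecimal total = accumulate("0.2", 100);
        System.out.println("total = " + total);  // prints "total = 20.2"
        // Fix two decimal places for display, with an explicit rounding mode.
        System.out.println("total = " + total.setScale(2, RoundingMode.HALF_UP));  // prints "total = 20.20"
    }
}
```

Two details matter here: the values are created from strings rather than from double literals, and the result of `add` is assigned back because BigDecimal objects never change in place.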

In conclusion, when designing an application we must pay close attention to the data types we assign to variables. Depending on your needs, choose an appropriate data type such as int, long, float, double, or BigDecimal.

