Specific computation of floating point values

Hi all! My professor wrote some simple numbers on the board and asked the class to calculate the result at home.

When I got home I opened my laptop and started writing. After finishing, I realized that sum1 and sum2 were different.

We all know the math rule that regrouping the terms does not change a sum, so
(a + b) + c = a + (b + c),
and it seems that this is not always true in programming.

It really confused me, so I decided to understand why it happens and calculate step by step.

(a + b) + c = 1 seems pretty clear

a + (b + c) = 0. Looks like magic …
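Here is a minimal sketch that reproduces the two sums. The values a = 1e30, b = -1e30 and c = 1.0 are my assumption, inferred from the results above; the names sum1 and sum2 follow the text.

```swift
// Assumed values: they reproduce the results quoted above,
// but the original numbers from the board are not shown in this post.
let a = 1e30
let b = -1e30
let c = 1.0

let sum1 = (a + b) + c   // a + b is exactly 0, then 0 + 1
let sum2 = a + (b + c)   // b + c rounds back to -1e30, then 1e30 - 1e30

print(sum1)          // 1.0
print(sum2)          // 0.0
print(sum1 == sum2)  // false
```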

Why does -1e+30 + 1.0 = -1e+30?

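The answer is simple. The computer allocates memory depending on the type we use. In our case we are working with double, which means a 64-bit block is allocated for each number: 1 sign bit, 11 exponent bits and 52 mantissa bits. That gives a range of roughly 2.2e-308 < |x| < 1.8e308 for normal values and about 15 to 17 significant decimal digits. Near 1e30 the gap between neighbouring representable values is about 1.4e14, so when the computer performs -1e+30 + 1.0 the exact result would need around 31 significant digits and is rounded back to -1e+30. The 1 is simply lost for lack of bits.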
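To see the lack of bit space directly, here is a small sketch using properties that Swift's Double exposes (ulp, significandBitCount, leastNormalMagnitude, greatestFiniteMagnitude). The name big is just an illustrative variable of mine.

```swift
let big = 1e30

// Gap between 1e30 and the next representable Double (unit in the last place).
print(big.ulp)                     // about 1.4e14
print(Double.significandBitCount)  // 52 explicitly stored mantissa bits

// 1.0 is far smaller than half that gap, so the sum rounds back to -1e30.
print(-big + 1.0 == -big)          // true

// Adding a whole gap does move the value to the neighbouring Double.
print(-big + big.ulp == -big)      // false

// Approximate limits of Double.
print(Double.leastNormalMagnitude)     // about 2.2e-308
print(Double.greatestFiniteMagnitude)  // about 1.8e308
```

In other words, near 1e30 anything much smaller than about 1e14 disappears when added.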

That explains our different results.

Such errors are related to the binary representation of floating-point numbers in memory. If you need to avoid them, use a decimal type such as BigDecimal (in Swift, Foundation's Decimal is the closest built-in counterpart). But be careful: do not use BigDecimal without need, because it costs extra memory and is slower.
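As a rough sketch of the decimal approach in Swift, the same experiment can be repeated with Foundation's Decimal, which keeps up to 38 decimal digits. The values are the same assumed ones (1e30, -1e30 and 1), and the names da, db, dc are only for illustration.

```swift
import Foundation

// Same assumed values as in the Double experiment: 1e30, -1e30 and 1.
let da = Decimal(sign: .plus, exponent: 30, significand: 1)
let db = -da
let dc = Decimal(1)

let dsum1 = (da + db) + dc
let dsum2 = da + (db + dc)   // db + dc needs 30 digits, which Decimal can hold

print(dsum1 == dsum2)        // true
print(dsum1 == Decimal(1))   // true
```

This works here because 30 digits fit into Decimal; with numbers that need more than 38 digits, rounding would show up again.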

Additionally, you can always inspect the sign, mantissa and exponent with this code, or get more info from this documentation

// Representation of a floating-point value according to IEEE 754
let lbits = Double(-0.06)
print(lbits.sign)        // sign: minus
print(lbits.exponent)    // exponent: -5, because 0.06 = 1.92 * 2^-5
print(lbits.significand) // mantissa: 1.92
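To check that these three parts really describe the whole number, here is a small sketch that rebuilds the value from them and prints the raw 64 bits. The names value, rebuilt, bits and padded are my own illustrative choices.

```swift
let value = -0.06

// Rebuild the number from its three parts: sign, exponent, significand.
let rebuilt = Double(sign: value.sign,
                     exponent: value.exponent,
                     significand: value.significand)
print(rebuilt == value)   // true

// The same information as a raw 64-bit pattern:
// 1 sign bit, then 11 exponent bits, then 52 mantissa bits.
let bits = String(value.bitPattern, radix: 2)
let padded = String(repeating: "0", count: 64 - bits.count) + bits
print(padded)
```

The first character of the printed pattern is the sign bit, which is 1 for a negative number.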

To read more about the representation of floating-point values, you can go here

So please try it yourself, here is the link!