# How is a floating-point number stored in memory?

• The line above contains 2–3 abstract terms, and you will probably be unable to understand it until you read further.

# Floating point number memory layout

```
+-+--------+-----------------------+
| |        |                       |
+-+--------+-----------------------+
 ^    ^                ^
 |    |                |
 |    |                +-- significand (width: 23 bits)
 |    |
 |    +------------------- exponent (width: 8 bits)
 |
 +------------------------ sign bit (width: 1 bit)
```
1. sign
2. exponent
3. significand (AKA mantissa)

## Sign

• The high-order bit indicates the sign.
• `0` indicates a positive value, `1` indicates negative.

## Exponent

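• The next 8 bits are used for the exponent, which can be positive or negative. Instead of reserving another sign bit, the exponent is stored with a bias of `127`: the stored value is the actual exponent plus `127`. So `0111 1111` represents `0`, `1000 0000` represents `1`, `0000 0001` represents `-126` and `1111 1110` represents `127`. The patterns `0000 0000` and `1111 1111` are reserved for special values (zero and subnormal numbers, infinity and NaN).
• How does this encoding work? Look up exponent bias, or see it applied in the worked example in the next section.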

## Significand

• The remaining 23 bits are used for the significand (AKA mantissa). Each bit represents a negative power of 2, counting from the left, so:
```
01101 = 0 * 2^-1 + 1 * 2^-2 + 1 * 2^-3 + 0 * 2^-4 + 1 * 2^-5
      = 0.25 + 0.125 + 0.03125
      = 0.40625
```

# Let’s understand practically

• Let's take a very famous float value as the example: `3.14` (an approximation of PI).
• Sign: zero here, as `3.14` is positive!

## Exponent calculation

• `3` is easy: `0011` in binary
• The rest, `0.14`, is converted by repeatedly multiplying by 2; the integer part of each result is the next bit:
```
0.14 x 2 = 0.28, 0
0.28 x 2 = 0.56, 00
0.56 x 2 = 1.12, 001
0.12 x 2 = 0.24, 0010
0.24 x 2 = 0.48, 00100
0.48 x 2 = 0.96, 001000
0.96 x 2 = 1.92, 0010001
0.92 x 2 = 1.84, 00100011
0.84 x 2 = 1.68, 001000111
And so on . . .
```
• So, `0.14 = 0.001000111...` in binary. If you don't know how to convert a decimal number to binary, refer to a float-to-binary conversion guide.
• Add `3`: `11.001000111...` with exponent 0 (`3.14 * 2^0`).
• Now shift it (normalize it) and adjust the exponent accordingly: `1.1001000111...` with exponent +1 (`1.57 * 2^1`).
• Now you only have to add the bias of `127` to the exponent `1` and store it (i.e. `128` = `1000 0000`): `0 1000 0000 1100 1000 111...`
• Drop the leading `1` of the mantissa (it is always `1`, except for some special values, so it is not stored), and you get: `0 1000 0000 1001 0001 111...`
• So our value of `3.14` would be represented as something like:
```
0 10000000 10010001111010111000011
^     ^               ^
|     |               |
|     |               +--- significand = 1.57 (approx., leading 1 implied)
|     |
|     +------------------- exponent = 1 (stored as 128)
|
+------------------------- sign = 0 (positive)
```
• The number of bits in the exponent determines the range (the minimum and maximum values you can represent).

## Summing up significand

• If you add up all the bits in the significand (together with the implied leading `1`), they don't total exactly `1.57`, which is what `3.14 / 2` should be. They come out to about `1.5700000525`, so the value actually stored is about `3.1400001`.
• There aren't quite enough bits to store the value exactly; we can only store an approximation.
• The number of bits in the significand determines the precision.
• 23 bits give us roughly 6 decimal digits of precision. 64-bit floating-point types give roughly 15 digits of precision.
• Some values cannot be represented exactly no matter how many bits you use. Just as values like 1/3 cannot be represented in a finite number of decimal digits, values like 1/10 cannot be represented in a finite number of bits.
• Since values are approximate, calculations with them are also approximate, and rounding errors accumulate.

# Let’s see things working

```c
#include <stdio.h>
#include <string.h>

/* Print binary stored in plain 32 bit block */
void intToBinary(unsigned int n)
{
        int c, k;

        for (c = 31; c >= 0; c--)
        {
                k = n >> c;
                if (k & 1)  printf("1");
                else        printf("0");
        }
        printf("\n");
}

int main(void)
{
        unsigned int m;
        float f = 3.14;

        /* See hex representation */
        printf("f = %a\n", f);

        /* Copy memory representation of float to plain 32 bit block */
        memcpy(&m, &f, sizeof (m));
        intToBinary(m);

        return 0;
}
```
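• This C code prints the hexadecimal and binary representation of the float on the console. The exact text produced by `%a` depends on the C library; some print the same value normalized as `0x1.91eb86p+1`.
```
f = 0x3.23d70cp+0
01000000010010001111010111000011
```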

# Where is the decimal point stored?

• The decimal point is not explicitly stored anywhere. Its position is implied by the exponent.

--

## Vishal Chovatiya

🔗 http://www.vishalchovatiya.com, Software Developer⌨, Fitness Freak🏋, Geek🤓, Hipster🕴, Productivity Hacker⌚, Always a Student👨‍🎓 & Learning Junkie📚.