Mathematics and Physics for Programmers

Fractions in Binary notation.

Edozié Izegbu
Learning to code
7 min read · Nov 16, 2015


The Radix Point stands between being a whole and being a half.

So far we have spoken about binary, decimal and hexadecimal integers. But how does binary notation change for fractions? We saw in the beginning that fractions are quotients of two integers, and that any integer in a base system is essentially a sum of multiplications: each digit times a power of the base. This means that computers are just “counting up” when they are discussing larger and larger numbers: “1000” == 8, “100” == 4, “10” == 2, “1” == 1. But what happens when we introduce a marker, or as it is technically called, a radix point?

A radix point is the general name for that dot you see in all decimal representations of fractions, ie “3/2” = 1.5. In decimal the radix point is… you guessed it, the decimal point. After the radix point, each slot assigned to a digit is worth that digit times the reciprocal of the base raised to the nth power, where n counts the slots from 1.

10.345

1 × 10¹ + 0 × 10⁰ + 3 × 10⁻¹ + 4 × 10⁻² + 5 × 10⁻³

In binary this functions in the same way, except of course the slots are worth reciprocal powers of 2.

1.10101 Binary

1 × 2⁰ + 1 × (1/2) + 0 × (1/4) + 1 × (1/8) + 0 × (1/16) + 1 × (1/32)
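To make the slot-by-slot expansion concrete, here is a small Python sketch (the function name is my own) that evaluates a binary string with a radix point exactly as described above:

```python
def binary_fraction_value(s):
    """Evaluate a binary string with a radix point, e.g. '1.10101'."""
    whole, _, frac = s.partition(".")
    value = int(whole, 2) if whole else 0
    for n, digit in enumerate(frac, start=1):
        value += int(digit) * 2 ** -n  # the nth slot after the point is worth 1/2**n
    return value

print(binary_fraction_value("1.10101"))  # 1 + 1/2 + 1/8 + 1/32 = 1.65625
```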

Now there are disadvantages to this way of representing fractions, one of which appears when trying to show fractions with no termination point. In decimal notation it is impossible to completely represent the fraction 1/3 in a finite number of digits. This is because the digits are continuously trying to get as close as possible to 1/3 using multiples of 10⁻ⁿ. You can see these fractions as limits of infinite series, where the infinite series is 3/10 + 3/100 + 3/1000 + 3/10000 + 3/100,000… and its limit is 1/3 (the value the sequence is approaching but never truly reaches). This fraction is represented in decimal notation as 0.33333333…

The Quotient of 823 and 110 has no termination point. Computer says me no like.

An example in binary is the quotient of 1 and 5: because, once again, the value of 1/5 must be built out of multiples of 2⁻ⁿ, it becomes very difficult for computers to provide a precise answer. The binary notation for 1/5 is 0.001100110011… (the block 0011 repeats forever), which will give your computer a headache as it has no termination point.
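You can watch this happen in any language with IEEE doubles; a quick Python check, using nothing beyond the built-in float type:

```python
# 1/5 has no terminating binary expansion, so the stored double is only close
fifth = 1 / 5
print(f"{fifth:.20f}")  # prints a value near, but not exactly, 0.20000000000000000000
print(0.2 * 3 == 0.6)   # False: the rounding error surfaces in arithmetic
```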

There is a way around this though: what can be done is to represent the numerator and denominator of a fraction as integers and forget about the radix point. We would then be able to represent the fractions exactly, without any of the recurring digits. But there are downsides to this approach too. Take the arithmetic 1/2 + 1/3, which equals 5/6. Then to get closer to 1 we add 1/7, giving 41/42, and afterwards 41/42 + 1/11 = 493/462.

Well, what is going on here, you say? 1/2 + 1/3 + 1/7 + 1/11 = 493/462. This is what happens when a computer gets handed fractions whose denominators share no common factor: to add them it has to put everything over a common denominator, and with no factors to cancel, the denominators simply multiply together and grow quickly.
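Python’s standard library happens to ship exactly this integer-pair scheme as the `fractions` module, so the sum above can be replayed directly:

```python
from fractions import Fraction

total = Fraction(1, 2) + Fraction(1, 3)
print(total)             # 5/6
total += Fraction(1, 7)  # 6 and 7 share no factor, so the denominator jumps to 42
print(total)             # 41/42
total += Fraction(1, 11)  # and then to 42 * 11 = 462
print(total)             # 493/462
```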

Doing this makes the numerators and denominators balloon, and you can easily go out of the range of your CPU’s ALU width. Even while the numbers still fit within your CPU’s ALU width, the arithmetic is slow in comparison and will bog your program down.

Another problem is representing irrational numbers, which are numbers that cannot be found by taking the quotient of two integers. These can only be stored by estimating and rounding off to the nearest digit.

There are, however, also disadvantages to fixed point representation of fractions. Take a fixed point representation of a number with 8 bits and a radix point: the more accurate a number tries to be on one side of the radix point, the less space will be given to the other.
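A tiny sketch of that trade-off (the helper and its bit split are my own invention, not a standard format):

```python
def to_fixed(x, frac_bits, total_bits=8):
    """Round x onto an 8-bit unsigned fixed-point grid with frac_bits after the radix point."""
    scaled = round(x * 2 ** frac_bits)
    if not 0 <= scaled < 2 ** total_bits:
        raise OverflowError("out of range for this split")
    return scaled / 2 ** frac_bits

print(to_fixed(10.8, 3))  # 10.75   -- 5 integer bits, 3 fraction bits
print(to_fixed(10.8, 4))  # 10.8125 -- more precision, but the max value halves
```

With 4 fraction bits the largest representable value shrinks to 15.9375, so gaining precision on one side of the point really does cost range on the other.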

A solution for this is to use floating point representation, similar to scientific notation, where a number is represented by its first significant (non-zero) digit, with everything after that falling after the radix point, together with an exponent. So the base-10 number 3432.53 would be represented as a floating point pair like (3.43253, e3). The 3 here represents the power you raise the base to, then multiply by the crunched number, to get the original number back. So 0.0432 would be represented in floating point as (4.32, e-2): the base is 10, -2 is the exponent, and 4.32 is the first significant figure with everything else after the radix point. Bear in mind that this is all in base 10; things operate the same way in binary, they obviously just look a bit different.
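In Python the same (significand, exponent) split can be sketched for base 10 like so (the helper name is mine):

```python
import math

def to_scientific(x):
    """Split x into (significand, base-10 exponent): 3432.53 -> roughly (3.43253, 3)."""
    exponent = math.floor(math.log10(abs(x)))
    return x / 10 ** exponent, exponent

print(to_scientific(3432.53))  # approximately (3.43253, 3)
print(to_scientific(0.0432))   # approximately (4.32, -2)
```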

The hands behind your hardware.

Standards in computing

The IEEE, which is the Institute of Electrical and Electronics Engineers, has been around for a long time, I mean a LONG time: its oldest predecessor society dates back to the 19th century. So these guys are the ones that say what should be done inside of hardware. The standard set for a 32-bit float is that 1 bit denotes whether a number is negative or not, 8 bits are devoted to the exponent (the size), and the last 23 bits are devoted to the mantissa, which holds the actual digits of the number. The stored exponent is moderated by a bias: 127 is added to it, so that exponents from -126 to +127 can be represented (the two remaining bit patterns are reserved for special values). The actual digits of a floating point representation are known as the significand; they are the digits of the number being represented, and they determine how precise the number is.
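The three fields can be pulled straight out of a real 32-bit float with the standard `struct` module:

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, 23-bit significand field) of x as a 32-bit float."""
    bits, = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

sign, biased_exp, significand = float32_fields(10.8)
print(sign, biased_exp, biased_exp - 127)  # 0 130 3
```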

We have been speaking a lot about base 10, so let’s look at how the number 10.8 is shown in binary.

  1. First the number 10.8 is converted into binary, which looks like so… 1010.110011001100… (the fractional part repeats the block 1100 forever).
  2. Then the number is normalized → 1.01011001100110011001101 e3 (rounded to 24 significant bits).
  3. Then the exponent 3 is added on to the bias of 127, so the total value of the stored exponent is 130. Which looks like this in binary → 10000010
  4. The significand now fills the 24 available bits of precision → 101011001100110011001101
  5. The first 1 is then dropped and assumed, because the first significant figure of a normalized number is always 1 → 01011001100110011001101. It is thereby reduced from 24 bits to 23 bits.
  6. Then one bit is set aside to denote whether the number is positive or negative; since 10.8 is positive, the bit is 0.
  7. Assembled in the order sign | exponent | significand, the total number looks like this in floating point representation → 0 10000010 01011001100110011001101
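You can check the seven steps above against what your machine actually stores by reading the bits back with `struct`:

```python
import struct

# Pack 10.8 as a 32-bit float and print the raw bit pattern, split into fields
bits, = struct.unpack(">I", struct.pack(">f", 10.8))
word = f"{bits:032b}"
print(word[0], word[1:9], word[9:])  # sign, exponent, significand
```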

Next we look at how the number -1 e-35 looks in binary.

  1. First we convert the number into binary, which looks something like this → -1.10101001010110100101101… e-117
  2. Afterwards the exponent is converted into binary: applying the bias to the negative exponent gives -117 + 127 = 10. So it looks like so in binary → 00001010
  3. The significand is flattened out to the mantissa length (24 bits) → 110101001010110100101101
  4. The first 1 is dropped, because as we know, the machine can assume the first significant bit of a normalized number is always 1 → 10101001010110100101101
  5. Then the sign bit is given a 1 because the number is negative. Assembled sign | exponent | significand → 1 00001010 10101001010110100101101
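The same `struct` check works for -1 × 10⁻³⁵:

```python
import struct

# Pack -1e-35 as a 32-bit float and extract the sign and biased exponent
bits, = struct.unpack(">I", struct.pack(">f", -1e-35))
sign = bits >> 31
biased_exp = (bits >> 23) & 0xFF
print(sign, biased_exp)  # 1 10: negative, and -117 + 127 = 10
```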

So it seems convenient that we can ignore the 1 in the beginning to gain one more bit of detail in the significand. However, what if we want to represent exactly 0? Well, there is a way for that. We can set the exponent field and the significand to all zeros, and your CPU will recognize that you are trying to represent 0; the sign bit can be either positive or negative, giving both +0 and -0.
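Both signed zeros really exist, and Python can tell them apart even though they compare equal:

```python
import math

pos_zero, neg_zero = 0.0, -0.0
print(pos_zero == neg_zero)          # True: +0 and -0 compare equal
print(math.copysign(1.0, neg_zero))  # -1.0: but the sign bit survives
```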

There are also special values that can be returned, like NaN and infinity, along with overflow and underflow; these represent certain cases where the computer is having trouble calculating your number. Ie NaN, which stands for Not a Number, will come up when you are trying to show sqrt(-1). Overflow happens when a number is too large to fit in the exponent range, and the result becomes infinity.
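A quick look at these special values (note that Python’s `math.sqrt` raises an exception for negative input instead of returning NaN, so we build a NaN directly):

```python
import math

nan = float("nan")
print(nan == nan)       # False: NaN is not equal even to itself
print(math.isnan(nan))  # True

overflowed = 1e308 * 10  # too large for a double's exponent range
print(overflowed)        # inf
```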

Lastly we will investigate the very small numbers, ie 2 e-40, which is represented in binary as 1.00010110110000100110001 e-132. That is outside of the 127 bias range, so what is actually done is that the digits are shifted to the right until the exponent is back in range, filling in the corresponding number of zeros, to get 0.000001000101101100001001… e-126. The exponent field is then stored as all zeros to flag this “subnormal” case, and the whole word, sign | exponent | significand, looks like → 0 00000000 00000100010110110000101
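That all-zero exponent field is easy to spot in the stored bits:

```python
import struct

# Pack the subnormal 2e-40 as a 32-bit float and inspect its fields
bits, = struct.unpack(">I", struct.pack(">f", 2e-40))
exponent_field = (bits >> 23) & 0xFF
significand = bits & 0x7FFFFF
print(exponent_field)    # 0: the all-zero exponent flags a subnormal
print(significand != 0)  # True: the digits live entirely in the significand
```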

Double Precision Floating Point Representation.

Now say you want a number that is more precise than a 23-bit significand in a 32-bit float. Well, it is possible: as explained in my cpp article, IEEE also has another type of number representation called a double, which spends 64 bits (1 sign bit, 11 exponent bits and 52 significand bits) to create a much more precise number, and a long double can take 80 or more bits of space depending on the platform. You may not be needing this much space initially, but this could be useful for scientific modeling and representations of real life where you want to get as close as possible to the real thing. In my next article I am going to talk about basic functions that are performed in binary working with floating point numbers and touch briefly on some arithmetic.
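The difference in precision is easy to see by round-tripping the same number through both widths:

```python
import struct

# 32 bits keep only ~7 decimal digits of 10.8; 64 bits hold it to full double precision
f32 = struct.unpack(">f", struct.pack(">f", 10.8))[0]
f64 = struct.unpack(">d", struct.pack(">d", 10.8))[0]
print(f32 == 10.8)  # False
print(f64 == 10.8)  # True
```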
