We write programs all the time in high-level programming languages like Java and C. With these programs we can instruct the computer to do something useful. But have you ever wondered how exactly the computer is able to execute these instructions? The idea of this post is not to teach you how to program in a low-level language or, in the worst case, in raw binary, but to show you what happens under the hood.
As you might already know, computers work with bits. A bit is something that represents one of two states: on or off, 1 or 0, true or false, plus or minus, and so on. We can use any two things to represent these two states, but in general we use 1 and 0 to say the bit is on and off respectively. We know that computers can store words, pictures, sounds and more. But in reality all of these are nothing but bits that we humans have grouped together and given some meaning (a code). For example, let’s take a byte, which is 8 bits. How many patterns can we make with 8 bits? With 1 bit there are only two possibilities, 1 or 0. With 2 bits there are 4 patterns. Each extra bit doubles the count, so with 8 bits we have 2^8 = 256 possible patterns. So what did people do? They got together, held some meetings, agreed on a code and made it a standard. Take ASCII, for example: the bit pattern ‘01000001’ represents the letter ‘A’. We just gave some meaning to a bit pattern. But keep in mind that how a pattern is interpreted depends on the context in which it is used.
At a high level, when you run a program, it is first loaded into RAM and then the processor starts executing its instructions. An instruction is simply a sequence of one or more bytes, and different processors follow different Instruction Set Architectures (ISAs) in how they encode instructions. (On Linux, you can check from the command line which ISA your machine uses. In my case it is x86_64.)
Let’s take a simple program written in C that prints “Hello World!” and see what exactly happens under the hood when we run it.
First we write our program in a high-level language (I have written it in C) as ASCII text and save it in a file. Next we use a special program called a compiler to translate this text file into assembly language statements. Assembly language is the symbolic, human-readable form of the machine instructions that the hardware understands. The binary form is called machine language, and a program called an assembler converts the former into the latter.
When I run the following command, gcc compiles and assembles the program without linking it (that is what -c does) and generates a binary file called ‘hello.o’, a relocatable object file. If we open it in a text editor, it looks like gibberish.
gcc -Og -c hello.c
To view the contents of the generated machine code (hello.o), Linux has a program called ‘objdump’, which can disassemble the file and show the generated instructions in assembly format.
To generate the actual executable, we need to run a linker on the generated object file (hello.o). Running gcc on an object file invokes the linker for us.
gcc -o output hello.o
Now if we disassemble the generated executable file (output) with objdump again, we can see what the linker has produced.
If you look at the C program that we wrote, it uses the printf function, which is provided by the standard C library. Conceptually, the linker has merged hello.o with the pre-compiled printf.o and produced an executable file called ‘output’. (That is literally what happens with static linking. With the dynamic linking that gcc uses by default on most Linux systems, the call instead goes through a small stub that reaches the shared C library at run time.) The main difference you can see in the assembly code, compared with that of the relocatable object file, is that the linker has filled in the proper address for the callq instruction.
Now we are ready to load our executable file (output) into main memory and let the processor execute it.