How C Source Code Becomes an Executable Program

Isaac Wong
Jan 18, 2018


Photo by Jonathan Hollin

It is well known that computers process binary, and many people also know that few programmers actually instruct a computer by entering 0s and 1s by hand. Instead, programmers use programming languages to make the job of instructing the computer faster and easier.

One language that has stood the test of time is C. Developers choose it for its utility, and instructors choose it as a teaching tool, because using C requires an understanding of how computers work. Even when a developer performs the routine task of compiling her code, that is, translating it into machine code, the instruction set the computer actually executes, she needs a basic knowledge of how the computer operates.

So how does C code like the following:

Hello, World! The traditional first C program.

get turned into machine code?
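At its core, the traditional program is a single call to printf made from main. The sketch below assumes that usual form; to keep it testable, the call is wrapped in a helper named print_greeting, a hypothetical name that is not from the original, which a complete program would call from main.

```c
#include <stdio.h>

/* Prints the traditional greeting. printf returns the number of
   characters written (14 here), or a negative value on error. */
int print_greeting(void)
{
    return printf("Hello, World!\n");
}

/* A complete program would call the helper from main:
 *
 *     int main(void)
 *     {
 *         print_greeting();
 *         return 0;
 *     }
 */
```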

The answer is a compiler: a program that translates code written in one programming language into a target language. C is a compiled language, so a program must be translated before it can run; in this case, from C source code into binary machine code.

So how does a C compiler turn C source code into code the computer can execute?

It is a four-step process made up of the following stages:

1. Preprocessing
2. Compilation
3. Assembly
4. Linking
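With the GCC toolchain (an assumption, since the article does not name a compiler), the build can be stopped after each stage and the intermediate file inspected: -E stops after preprocessing, -S after compilation, and -c after assembly, while a plain gcc command runs all four. The sketch below records that as a lookup table in C; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* One entry per stage: its name, the GCC option that stops the build
   after it, and the conventional suffix of the file it produces. */
struct build_stage {
    const char *name;
    const char *gcc_option;   /* "" means no option: run every stage */
    const char *output_ext;   /* "" for the final executable */
};

static const struct build_stage stages[] = {
    { "preprocessing", "-E", ".i" },   /* preprocessed C source */
    { "compilation",   "-S", ".s" },   /* assembly code */
    { "assembly",      "-c", ".o" },   /* object file holding machine code */
    { "linking",       "",   ""   },   /* executable, a.out by default on Unix-like systems */
};

/* Returns the stage at position i (0 to 3), or NULL if i is out of range. */
const struct build_stage *stage_at(size_t i)
{
    return i < sizeof stages / sizeof stages[0] ? &stages[i] : NULL;
}
```

For example, gcc -S hello.c leaves the assembly in hello.s, and gcc -c hello.c leaves the object file hello.o. gcc -E writes the preprocessed source to standard output; .i is the conventional suffix when it is saved to a file.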

Preprocessing

The first step is preprocessing. In this stage, the preprocessor combines the source file with the files it includes into a single stream of C code and strips out comments, which are notes for human readers rather than code. It also carries out preprocessor directives, lines beginning with #, such as includes, which pull in code from other files, and macros, which are named pieces of code substituted wherever the name appears. In the "Hello, World!" example above, the line:

#include <stdio.h>

references a header file, a file of declarations that the programmer wants to include in the program. Here it is stdio.h, the standard header for basic input and output, which declares printf among other functions. The preprocessor replaces the #include line with the full contents of that file. The "Hello, World!" program does not use macros, but many C programs do: a line such as #define PI 3.14159 tells the preprocessor to replace every later occurrence of PI with 3.14159 before compilation.
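A small sketch of what the preprocessor does with macros and comments. GREETING, SQUARE and both function names are hypothetical. Running gcc -E on a file like this shows the result: the comments are gone, #include <stdio.h> has been replaced by the header's contents, and each macro use has been replaced by its expansion, as noted in the comments.

```c
#include <stdio.h>   /* replaced by the full text of the stdio.h header */

/* Object-like macro: each later use of GREETING is replaced by the
   string literal before the compiler proper ever sees the code. */
#define GREETING "Hello, World!\n"

/* Function-like macro: SQUARE(n) is replaced by ((n) * (n)).
   The extra parentheses keep an argument such as a + b grouped. */
#define SQUARE(x) ((x) * (x))

int print_greeting_macro(void)
{
    /* After preprocessing: return printf("Hello, World!\n"); */
    return printf(GREETING);
}

int square_of_sum(int a, int b)
{
    /* After preprocessing: return ((a + b) * (a + b)); */
    return SQUARE(a + b);
}
```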

A snippet of the C file after preprocessing
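Most of the preprocessed file is the text of stdio.h, which consists largely of declarations. The C standard allows a program to declare a library function itself when the declaration needs no types defined in a header, so the sketch below mimics the preprocessed result by keeping only the one declaration this program needs. The function name is hypothetical.

```c
/* Stand-in for everything #include <stdio.h> expands to: only the
   declaration of printf is kept, matching the standard's prototype.
   The definition of printf is not here; the linker supplies it later
   from the C standard library. */
int printf(const char *restrict format, ...);

int print_greeting_preprocessed(void)
{
    return printf("Hello, World!\n");
}
```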

Compilation

The next step is compilation. The compiler does not translate C directly into binary. Instead, it translates the preprocessed source into assembly code, which is saved in a file with a .s extension.

Hello, World! — After compilation into Assembly

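Assembly language is generally regarded as the lowest-level language that humans can still read comfortably: each instruction corresponds closely to a single machine instruction, but it is written as a short mnemonic such as mov or add rather than as binary.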
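To see how closely assembly tracks the source, consider a one-line function. The comment shows the x86-64 assembly GCC typically emits for it with gcc -O2 -S, in the AT&T syntax used in .s files. Treat it as illustrative: the exact output depends on the compiler version, optimization flags and target processor. The function name add is hypothetical.

```c
/* Typical x86-64 output of gcc -O2 -S for this function (assumed,
   simplified by omitting assembler directives such as .globl):
 *
 *     add:
 *         leal    (%rdi,%rsi), %eax   # eax = a + b
 *         ret                         # return; result is in eax
 */
int add(int a, int b)
{
    return a + b;
}
```

Under the System V x86-64 calling convention used on Linux, a and b arrive in the registers %edi and %esi and the result is returned in %eax, so the whole function becomes one addition and a return.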

Assembly

Somewhat confusingly, this next step is also called assembly. Once the assembly code has been generated, a program called the assembler translates it into machine code, the binary instructions the processor executes. Machine code is usually displayed in hexadecimal, a shorter way of writing binary in which each hex digit stands for four bits. The result is stored in an object file, identified by its .o extension.

A snippet of “Hello, World!” in machine code
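For example, a function that returns a + b for two ints typically compiles on x86-64 to the instructions leal (%rdi,%rsi), %eax and ret, which an assembler encodes as the bytes 8d 04 37 and c3. The sketch below stores those bytes (x86-64 specific and shown only for illustration; the array and function names are hypothetical) and prints each in hex and binary, showing why hex is the usual shorthand: every hex digit stands for exactly four binary digits.

```c
#include <stddef.h>
#include <stdio.h>

/* Machine code for: leal (%rdi,%rsi), %eax  ->  8d 04 37
                     ret                     ->  c3          */
static const unsigned char add_code[] = { 0x8d, 0x04, 0x37, 0xc3 };

/* Writes byte b as 8 binary digits plus a terminating NUL into out. */
void byte_to_binary(unsigned char b, char out[9])
{
    for (int i = 7; i >= 0; i--)
        out[7 - i] = ((b >> i) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* Prints each byte as two hex digits next to its eight binary digits. */
void dump_code(const unsigned char *code, size_t n)
{
    char bits[9];
    for (size_t i = 0; i < n; i++) {
        byte_to_binary(code[i], bits);
        printf("%02x = %s\n", (unsigned)code[i], bits);
    }
}
```

Calling dump_code(add_code, sizeof add_code) prints one line per byte, for instance c3 = 11000011 for the ret instruction.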

Linking

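The final step is linking. Although the code is now machine code, an object file is not yet a runnable program. It may refer to functions defined elsewhere, such as printf, which lives in the C standard library, and larger programs are usually split across several object files. The linker combines these object files and libraries, resolves each reference to the definition it points to, and produces a single executable program.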
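A sketch of the kind of reference the linker resolves. It is shown as one listing so that it compiles on its own, but the comments mark how it would be split across hypothetical files; all names here are hypothetical. With GCC, each .c file would be compiled separately with gcc -c into its own .o file, and a final gcc command listing the object files would link them together, along with the C standard library. If twice.o were left out, GNU tools would report an undefined reference to twice at link time.

```c
/* twice.h (hypothetical): only a declaration. The compiler needs nothing
   more to emit a call; the address of twice stays unresolved in the
   caller's object file until link time. */
int twice(int x);

/* caller.c (hypothetical): caller.o refers to twice but does not define it. */
int quadruple(int x)
{
    return twice(twice(x));
}

/* twice.c (hypothetical): twice.o defines the symbol twice, and the
   linker matches the reference in caller.o to this definition. */
int twice(int x)
{
    return 2 * x;
}
```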

The final result! A program that displays, “Hello, World!”

With each of these stages so visible to the programmer, it is easy to understand why many computer science educators adopt C as a teaching tool. It gives students rapid exposure to much of what goes on under the hood of a computer. Hopefully, these examples have also given you a deeper appreciation of the process a C program goes through on its way to becoming an executable.
