How the ‘gcc’ Command Compiles C Files
In today's TED Talk, I would like to talk about how the gcc command compiles C files. If you’re a complete beginner, you probably have no idea what that means, and I’ll do my best to break it down.
All a computer knows is ones and zeros. It would be extremely difficult for us (humans) to write commands and scripts with only two numbers. This is why very smart people invented programming languages such as C.
Our machines do not know how to read and run this type of language, and that’s why compilers exist. A compiler is a program that converts instructions into machine code so they can be read and executed by a processor. C files specifically go through multiple steps of conversion in order to become machine code.
Preprocessing
The first step is ‘preprocessing’. As its name suggests, this step prepares the code before it is actually compiled. A few things happen to the code in this step: all header files are pulled in, and all macros (shortcuts that allow you to re-use code) get expanded to their actual code. A line such as #include <stdio.h> would be replaced by the hundreds of lines of code in the stdio.h header file. This step also strips all the comments from the code, as comments are not necessary for the processor to execute files.
To look at the different files produced during the compilation process, we use the gcc command with flags. To output the preprocessed file, we use gcc -E file.c (where file.c is the file that contains our C code). This prints the result to the command line. If we want to send this output to a file instead, we can add -o preFile.i to the command, and then a file named “preFile.i” will be created.
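To make the preprocessing step easier to picture, here is a minimal sketch of a file you could run gcc -E on. The macro SQUARE and the function square_of are hypothetical names made up for this example; they are not part of file.c above.

```c
#include <stdio.h>

/* Hypothetical macro, used only for this illustration. */
#define SQUARE(x) ((x) * (x))

/**
 * square_of - hypothetical helper that uses the macro
 * @n: the number to square
 *
 * Return: n multiplied by itself
 */
int square_of(int n)
{
    /* The preprocessor removes this comment. */
    return (SQUARE(n));
}
```

In the preprocessed output, the #include line should be replaced by the contents of stdio.h, the comments should be gone, and the return line should read roughly return (((n) * (n))); because the macro has been expanded. The file has no main function, which is fine for this step since nothing is being linked yet.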
Compilation
In this step, the preprocessed file gets converted to code in Assembly. This language is one step closer to machine understandable code! On x86 machines, this code can be written in either AT&T syntax or Intel syntax. In order to look at how these files are written, we can use the command gcc -S file.c -o compFile.s. This will create a new file “compFile.s” that is full of converted assembly language code (like in the picture). Assembly is a low-level language, which means it’s closer to machine code than C or other high-level languages.
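If you want something small to experiment with, a tiny function is easier to follow in assembly than a whole program. The function add_one below is a hypothetical example, not taken from the original file.

```c
/**
 * add_one - hypothetical function, small enough to read as assembly
 * @n: the number to increment
 *
 * Return: n plus one
 */
int add_one(int n)
{
    return (n + 1);
}
```

Running gcc -S on a file containing this function should produce a short .s file. On x86, gcc writes AT&T syntax by default, and adding the -masm=intel flag asks for Intel syntax instead. The exact instructions you see will depend on your machine, your gcc version and the optimization level, so treat any output as one possible result.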
Assembly
This next step converts the Assembly code into an object file (machine-readable code). This code is not human-readable: opened in a text editor, it shows up as many characters that aren’t necessarily letters or numbers. In order to see what this kind of file looks like, we can run the command gcc -c file.c -o objFile.o. This will create a file “objFile.o”. For further understanding, let’s say file.c contains this code:
#include <stdio.h>

/**
 * main - Entry point
 *
 * Return: Always 0 (Success)
 */
int main(void)
{
    return (0);
}
This C code does not print or do anything; it just includes a header file, a few comments, and a main function. When the program runs, the entry point is the main function, which is why it must always exist in your code. After running the gcc command to create an object file, the file objFile.o is created and looks something like:
^ELF^B^A^A^@^@^@^@^@^@^@^@^@^B^@>^@^A^@^@^@^@^D@^@^@^@^@^@@^@^@^@^@^@^@^@\210^Q^@^@^@^@^@^@^@^@^@^@@^@8^@ ^@@^@^\\^@^[^@^F^@^@^@^E^@^@^@@^@^@^@^@^@^@^@@^@@^@^@^@^@^@@^@@^@^@^@^@^@\370^A^@^@^@^@^@^@\370^A^@^@^@^@^@^@^H^@^@^@^@^@^@^@^C\^@^@^@^D^@^@^@8^B^@^@^@^@^@^@8^B@^@^@^@^@^@8^B@^@^@^@^@^@^\^@^@^@^@^@^@^@^\^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^E^@^@\^@^@^@^@^@^@^@^@^@^@^@@^@^@^@^@^@^@^@@^@^@^@^@^@\254^F^@^@^@^@^@^@\254^F^@^@^@^@^@^@^@^@ ^@^@^@^@^@^A^@^@^@^F^@^@^@^P^N\^@^@^@^@^@^@^P^N`^@^@^@^@^@^P^N`^@^@^@^@^@(^B^@^@^@^@^@^@0^B^@^@^@^@^@^@^@^@ ^@^@^@^@^@^B^@^@^@^F^@^@
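The one readable piece of that dump is the “ELF” at the very start. On Linux and similar systems, object files and executables use the ELF format, and every ELF file begins with the same four bytes: 0x7f followed by the letters E, L and F. As a rough sketch, a check for that signature could look like the function below. The name has_elf_magic is made up for this example, and the sketch assumes the first bytes of the file have already been read into a buffer.

```c
#include <stddef.h>

/**
 * has_elf_magic - hypothetical check for the ELF signature
 * @buf: the first bytes of a file, already read into memory
 * @len: number of bytes available in buf
 *
 * Return: 1 if buf starts with 0x7f 'E' 'L' 'F', 0 otherwise
 */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    return (len >= 4 &&
            buf[0] == 0x7f && buf[1] == 'E' &&
            buf[2] == 'L' && buf[3] == 'F');
}
```

This only looks at the signature. It does not tell an object file apart from a finished executable, since both start with the same four bytes.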
Linking
The last and final step is linking. In this step, the object file is converted into an executable file. Any functions or variables that are defined in other files or libraries get connected to their real code. For instance, if you call a function that has been declared but whose definition cannot be found in any of the object files or libraries being linked, the linker will fail with an “undefined reference” error and no executable is created. Understanding the difference between the compilation and the linking steps can help you debug your code when you run into errors. If the code is missing a semicolon or a parenthesis, then it won’t even pass compilation. If you use printf() but forget to include stdio.h, that is also reported during compilation (as a warning or an error, depending on the gcc version), because the header file only holds the function’s declaration; its actual code lives in the C standard library, which gcc links in by default. However, if every file compiles but a function’s definition is missing, you will get an error after compilation, at the linking step. Thus you can differentiate errors, and understand better how to fix your code.
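The split between a declaration (what the compiler needs) and a definition (what the linker needs) is easier to see in code. The functions add and twice_sum below are hypothetical names chosen for this sketch.

```c
/* Declaration only: enough for the compiler to accept the call below. */
int add(int a, int b);

/**
 * twice_sum - hypothetical function that calls add
 * @a: first number
 * @b: second number
 *
 * Return: two times the sum of a and b
 */
int twice_sum(int a, int b)
{
    return (2 * add(a, b));
}

/* Definition: the real code that the call to add gets connected to. */
int add(int a, int b)
{
    return (a + b);
}
```

If the definition of add at the bottom were deleted, gcc -c would still produce an object file, because the declaration is all the compiler asks for. The failure would only show up at the linking step, as an undefined reference to add.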
Finally, in order to see what our C program does, we have to run the final executable file. In the picture, I have created a file main.c that contains a simple C program that prints the string “Hello World”. Using all the commands shown above, different compilation files have been created.
Now, using the gcc command without any of those flags (except -o to choose the name of the output file), we completed the final step of creating an executable file called “execFile”. After going through this insane compilation process and many conversions of different files in a fraction of a second, we can see that running our executable file prints the string “Hello World!”.
Amazing.