Everything you want to know about GCC

megha mohan
5 min read · Feb 11, 2017


As newcomers to the world of programming languages and computers, we should be aware that computers cannot ‘understand’ any of the programming languages we work with. A computer can only execute machine language (zeros and ones). This is where a compiler comes to the rescue: it ‘translates’ a programming language into machine language. To put it another way, it converts our source code into an executable file of instructions for the computer.

GCC stands for “GNU Compiler Collection”. It is an integrated distribution of compilers for several major programming languages. At the time of writing, these languages include C, C++, Objective-C, Objective-C++, Java, Fortran, and Ada (the Java front end, GCJ, was removed in GCC 7).

Let’s go through the compilation process of a simple C program and understand what’s happening behind the scenes.

First of all, write a simple C program that prints “Hello World!”. I open the vim editor (which ships with most Linux distributions), write the program, and save it as “HelloWorld.c”.

We now have a file named “HelloWorld.c”, written in C. The goal is to have the system translate the C code into machine language that it can execute. The code passes through four stages to reach the final output: preprocessing, compilation, assembly, and linking.

Preprocessing

This step does the following: removal of comments, expansion of macros, and expansion of the included files.

The lines in our code that begin with the “#” character are preprocessor directives. In our “HelloWorld.c” program, the first preprocessor directive (#include <stdio.h>) requests that a standard header file, stdio.h, be included in our source file. If you use macros in your program, this is the stage where each one gets replaced by its corresponding value. In our program, #define PRINTTHIS "Hello World\n" defines the macro, and every occurrence of PRINTTHIS is replaced with its value, here the string "Hello World\n".

So in the preprocessor stage, the included header files and the defined macros are expanded and merged into the source file to produce a transitory source file.

Using gcc’s “-E” flag, we can run the preprocessing step on its own.

[bash]$ gcc -E HelloWorld.c -o HelloWorld.i
Preprocessed file (here, the contents of HelloWorld.i)

Note from the preprocessed output above that our program requested the stdio.h header, which in turn requested a whole bunch of other header files.

Compilation

The next step takes the preprocessed file as input, compiles it, and produces an intermediate compiled output. The output of this stage is assembly code, which is machine dependent.

Using the “-S” flag with gcc, we can convert the preprocessed C source code into assembly language without creating an object file:

[bash]$ gcc -S HelloWorld.i -o HelloWorld.s
Compiled File

I am not much into assembly-level programming, but a quick look shows that this output consists of instructions that the assembler can understand and convert into machine-level language.

Assembly

As we all know, machines can understand only binary language, so we now need an assembler to convert the assembly code in the “HelloWorld.s” file into binary code.

The assembler was one of the first software tools developed after the invention of the digital computer.

If there are any calls to external functions in the assembly code, the Assembler leaves the addresses of the external functions undefined, to be filled in later by the Linker.

The assembler can be invoked as shown below. Using the “-c” flag, gcc converts the assembly code into machine-level code:

[bash]$ gcc -c HelloWorld.s -o HelloWorld.o
HelloWorld.o

The only thing we can make out by looking at the HelloWorld.o file is the string ELF in the first line. ELF stands for Executable and Linkable Format.

Object files and executable files come in several formats, such as ELF (Executable and Linkable Format) and COFF (Common Object File Format). For example, ELF is used on Linux systems, while a COFF-based format is used on Windows systems.

ELF is a relatively new format for the machine-level object files and executables produced by gcc. Prior to this, a format known as a.out was used. ELF is said to be a more sophisticated format than a.out (we might dig deeper into the ELF format in a future article).

If you compile your code without specifying the name of the output file, the output file is named ‘a.out’, but its format is now ELF. Only the default executable file name has remained the same.

Linking

This is the final phase, in which all function calls are linked with their definitions. Until this stage, GCC does not know where functions like printf() are defined. The assembler has left the addresses of the external functions unresolved, and the linker, which knows where all these functions are implemented, fills in those addresses. The linker also does a few additional tasks for us: it combines our program with some standard routines that are needed to make it run. So the final executable is far larger than the input file!

The entire linking process is handled by gcc and invoked as follows:

[bash]$ gcc -o Output HelloWorld.c

The above command compiles the file “HelloWorld.c” and produces the final executable file “Output”.

Listing all the files using ls -l

As you can see, the ‘Output’ file is executable by default, with permissions -rwxrwxr-x. This means that all users (owner, group, and others) have execute permission. If you run this executable by typing ‘./Output’, you get the final output of our program!

Output of the executable file

So now we know how a C program gets converted into an executable. We will dive a little deeper into C programming in the coming articles. Till then, happy learning! :)
