A second course on “Hello World!” (1)
The first program people write in a new programming language is usually “Hello World!”. It is normally viewed as a trivial one. In C it looks like this:
#include <stdio.h>
int main ()
{
    printf("Hello World!\n");
    return 0;
}
One is then instructed to save it as something like hello.c and invoke the compiler with
# gcc hello.c
which then produces a file called a.out. If one now enters “./a.out”, the program prints its greeting:
Hello World!
What’s the magic that makes this happen? In this post I will try my best to explain.
Note: I work on a Macbook Pro running Mac OS X 10.9, but most of the commands listed in this post should work on both Mac and Linux. By “Linux”, I mean the Linux machine I worked on. Things will be a bit different if I ever get into the details of the standard library and system calls.
First of all, the C program “hello.c” was compiled to the executable “a.out”, which is machine code. In our case above the compiler is called GCC, which stands for “GNU Compiler Collection”.
The first interesting observation is that the same C program can be compiled on different computers to produce different executables. An executable made for one computer (say a Mac running OS X) simply won’t execute on another (say a Linux machine). Here is what’s likely to happen if you try:
-bash: ./a.out: cannot execute binary file: Exec format error
Stages of Compilation
A lot is going on during compilation: the file “hello.c” is only 77 bytes while the executable “a.out” turns out to be 8496 bytes on my machine.
According to the man page of GCC(1),
Compilation can involve up to four stages: preprocessing, compilation proper, assembly and linking, always in that order.
Preprocessing handles the first line of “hello.c” (#include) as well as macros, if present. One can check the result of this stage with
# gcc -E hello.c -o hello.i
(This preprocesses the file “hello.c” and writes the output to “hello.i”.) The preprocessor basically removes the line that starts with “#include” and replaces it with the contents of the file <stdio.h> (located at /usr/include/stdio.h). “hello.i” is a lot longer than “hello.c”, but most of it consists of forward declarations that won’t be used, so it is effectively equivalent to this “hello.c without stdio.h”:
int printf(const char *, ...);
int main ()
{
    printf("Hello World!\n");
    return 0;
}
The second stage of compilation, “compilation proper”, can be inspected with
# gcc -S hello.i
which will produce a file called “hello.s”. The content of “hello.s” might go over some people’s heads, but one can guess what it does: push things onto and pop things off the stack, move things from one register to another, and so on.
The third stage, assembly, uses the assembler to handle “hello.s”. On OS X, it can be done manually by the following:
# clang -cc1as -filetype obj -o hello.o hello.s
and on my Linux box it can be done by
# as --64 -o hello.o hello.s
Now, with the object file “hello.o”, we can finally link and produce the executable. One might ask what is being linked against, if there is only one object file. Answer: the definition of “printf”, which has only been forward declared so far. On OS X, printf is defined in the system library, so the linking stage can be done manually with
# ld -demangle -macosx_version_min 10.9.0 hello.o -lSystem
while on Linux it is defined in libc, so the corresponding command is
# ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o hello.o -lc /usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o /usr/lib/x86_64-linux-gnu/crtn.o
Whoo! Now there is the long-awaited “a.out”. One can now appreciate a little of the hard work that gcc has done. It won’t reveal any of this unless ‘-v’ is added! Let’s take a break here. There is more to say.