The GNU Toolchain Explained

Aditya Rastogi
Published in The Startup
Mar 17, 2020 · 9 min read

The GNU Toolchain is a set of programming tools, widely used on Linux systems, that programmers use to build and compile code into programs or libraries. The toolchain contains GNU m4, GNU Make, GNU Bison, GCC, GNU Binutils, the GNU Debugger and the GNU build system. Let’s go through these tools one by one.

GNU m4

GNU m4 is a macro processor. We can define our own macros in the input file itself (there are some built-in macros as well), and m4 copies the input to the output, expanding macros wherever they occur.

The m4 program can be run from the command line by typing m4. Macros are defined with the built-in define. For example, after define(`Hi', `Hello'), m4 replaces every later occurrence of Hi in the input with Hello.

Note that m4 quotes strings with the back-tick (`) as the opening delimiter and the single quote character (') as the closing delimiter.

m4 also has an if-else construct, ifelse. It checks its first and second arguments for equality; if they are equal it expands to the third argument, otherwise to the fourth. For instance, ifelse(`a', `a', `same', `different') expands to same.

Macros can also take arguments, which lets them act like functions. $n is the n-th argument to the macro. So, by defining concat as $1$2, we’re essentially concatenating the first and second arguments passed to concat. For instance, after define(`concat', `$1$2'), the call concat(`New ', `York') produces New York.

There is a lot more to m4 than this. For now, let’s proceed to GNU Make.

GNU Make

Suppose you’re compiling a single C file. The Linux command for this is straightforward: run gcc with the name of the source file. But what if we want to compile a large piece of software, with a large number of files and many dependencies between them? This is where GNU Make helps. Make works with files known as makefiles. Let’s see what a makefile looks like with a few examples.

Perhaps the simplest thing we can do with a makefile is to create an empty file.

When we type make, GNU Make looks for a makefile in the current directory, trying the names GNUmakefile, makefile and Makefile in that order. By default, it runs the commands of the first target in that file. We can specify dependencies after the target name, separated from it by a colon. Before making a target, Make first makes its dependencies.

The most common use of make is to compile software made of multiple source files with dependencies between them. An advantage of using make is that, if we modify some files after a build and need to rebuild, it can work out from the dependencies and the file modification times the minimal subset of files that need to be remade. This can save a lot of compilation time for large software.

We can use make for other purposes too. For example, make can be used to convert multiple image files in a single directory from jpg to png and vice versa.

GNU Bison

Let’s recall what a lexer and a parser are, as this will help us understand the role of GNU Bison.

Lexer

The lexer is also known as a lexical analyzer or a tokenizer. So, what is its role in the compilation process? The lexer reads the source code of a program and outputs a stream of tokens. To do so, it first groups the source code characters into lexemes. A lexeme is a sequence of characters that forms one identifiable unit, such as int, 42 or count. Each lexeme is then classified into a token category, such as keyword (int, char, void, etc.), literal, identifier or punctuator. You can think of a token category as a class and a lexeme as an instance of it.

A commonly used lexer generator is flex. Note that flex is not part of the GNU project. Let’s see some flex code to better understand it.

This flex code identifies strings (without spaces), numbers and semicolons, and passes this stream of tokens to the parser generated from the bisoncode file. Lines 1 to 4 are the part where header files are included. Here, bisoncode.tab.h would be generated later from the bisoncode (parser) file. Lines 6 and 7 define what a STRING and a NUMBER are, through the use of regular expressions. Lines 10 to 13 match occurrences of the expressions on the left side and execute the block of code on the right. The returned values are sent to the parser.

Parser

The sequence of tokens generated by the lexer is sent to the parser for syntactic analysis. The main roles of the parser are to check whether the grammar of the language allows the generated sequence of tokens, to report any syntax errors and to recover from errors where possible. GNU Bison is a parser generator. Now, let’s see the bisoncode file which goes with the above flex code.

Lines 14 to 40 are the most important part of this bison code. They define the grammar of our language, which in this case says that a program is a sequence of statements, where each statement must end with a semicolon. Lines 4 and 5 are standard includes for the lexer.

Now let’s run the flex and bison programs on these files. For this, let’s see the following makefile.

Line 2 runs flex on the .l file we wrote above. Running this creates the lex.yy.c file. Line 3 runs bison on the bisoncode.y file; here the -d option produces a header file. After running line 3, the bisoncode.tab.h and bisoncode.tab.c files are created. Line 4 compiles the generated lexer and parser code into the executable file.

Running the executable with the following input generates the following output.


GNU Compiler Collection (GCC)

The GNU Compiler Collection (GCC) is a set of compilers used in Linux systems. “gcc” is the compiler for C, and “g++” is the compiler for C++. Other compilers in this set include gccgo for Go, gfortran for Fortran and GNAT for Ada. GCC also included gcj for Java until it was removed in GCC 7.

Normally, GCC does preprocessing, compilation, assembly, and linking. But we can stop the process at an intermediate stage using its options. For example, when the -c option is used, then the linker is not run. The output then consists of an object file (.o) for each source file. The -S option stops the process after compilation and we get assembly code (.s) as the output.

GCC Optimization Levels

GCC also supports a number of optimization levels. To choose one, we pass the option -OLevel when compiling, where Level varies from 0 to 3, for example -O2. Note that this is a capital letter O; the lowercase -o option names the output file instead.

By default, GCC doesn’t perform any optimization; this is level 0. Level 1 turns on simple optimizations that don’t need any speed-space tradeoffs, so the resulting executable is usually both faster and smaller. Level 2 adds further optimizations such as instruction scheduling. Speed-space tradeoffs are still not used at this level, so the executable doesn’t increase in size, but the extra work during compilation increases compilation time and memory use. Level 3 turns on more expensive optimizations like function inlining, in which a function call is replaced by the body of the called function, removing the overhead of the call and return. With this, both the executable’s speed and its size can increase.

Generally, people use level 0 for debugging, and level 2 for development and deployment.

GNU Binutils

The GNU Binutils are a set of tools for working with files such as object files, assembly code and libraries. The most important tools included in binutils are the assembler and the linker.

Assembler

The assembler processes assembly language programs to produce relocatable machine code. For a particular processor, there’s a one-to-one mapping between assembly code and machine code.

Linker

During the compilation of a program, it often happens that the compilation is done in pieces. Library files and other object files have to be linked together with the relocatable machine code produced by the assembler. The role of the linker is to link these multiple files together.

Commands included in binutils are as (assembler), ld (linker), gprof (profiler), objcopy, objdump, addr2line, etc.

GNU Debugger (GDB)

GNU Debugger (GDB) is the debugging tool that comes with the GNU toolchain.

Let’s debug the following short C program using GDB in order to understand some of its functionalities.

The above program’s goal is to print the sum of 10 random bits. But we get stuck in an infinite loop when we try to run it. From the code, we can see that it gets stuck because we’re not updating the variable i inside the while loop. But let’s use GDB to figure that out.

We need to compile our program with gcc’s -g option, which adds debugging information, and then type gdb followed by our executable’s name to open gdb.

The command “layout next” opens a display of the source code, in which we can follow the program’s execution once we run it.

Let’s run our code using the run command. We see that it gets into an infinite loop. We press Ctrl + C to interrupt the execution and we see that it is stuck in the while loop.

We then press n (for next line) a few times and print the value of i (using print i). The value of i never changes, so we learn that we should add an i++ in the loop.

Using GDB, we can start our program with input arguments and also stop it when specified conditions are met. After our program has stopped, we can examine what has happened. One way of doing this is to print the values of variables, as we did in the above example. We can also change things in our program using GDB and experiment with bug corrections.

GNU build system

The GNU build system, also known as the Autotools, includes Autoconf, Autoheader, Automake, Libtool, and Gnulib. These are used to produce configure scripts, portable makefiles, and libraries. The Autotools are designed to help make source code packages portable to many Unix-like systems.

References

  1. https://www.linux.org/threads/gnu-toolchain-explained.10570/
  2. https://www.youtube.com/watch?v=Lyp36ku7D0A for GNU Make
  3. https://www.youtube.com/watch?v=POjnw0xEVas for flex and bison
  4. https://box.matto.nl/m4.html for m4
  5. http://web.mit.edu/gnu/doc/html/m4_1.html for m4
  6. https://www.linuxtopia.org/online_books/an_introduction_to_gcc/gccintro_49.html for GCC optimization levels
  7. https://www.youtube.com/watch?v=bWH-nL7v5F4 for GDB

