An introduction to Debugging (in C and lldb), Part I
What is Debugging?
How the word “bug” (and hence the word “debugging”) took on the meaning that we computer programmers associate with it is not entirely clear. It could be thanks to the legendary moth in the Mark II computer at Harvard, or it could have been coined long before that. Whatever the origin, as programmers we all have stories to share about debugging. Sometimes they are sad, sometimes they describe several grueling hours, but most of the time they have a happy ending. One thing we all have in common is that we run into debugging almost daily. So, before starting this little tour of debugging, let us define what it is.
I propose the definition of debugging as follows—
Debugging is the process in which a developer (who may or may not be the author of the code being debugged), using the various tools available, tries to change part of previously written code in order to get rid of an error that was not caught at compile (or syntax checking) time; in other words, an error that surfaces at run time under certain conditions.
Does it sound cool? Or intimidating? If it is the latter, then you should read on. Maybe this little post will help you after all.
Different types of debugging
There are many different ways people perform debugging. But before listing them, we need to understand that any debugging process is mostly composed of two major activities: first, investigating the values of different variables at different stages of program execution, and second, examining the way control flows (Are we entering the “if” block? Is the “break” statement inside the “while” loop ever reached? …)
- Print statements: You use “print” (or “printf” or “echo” or whatever the equivalent is) to output some values while running the code. This is possibly the easiest way to start debugging anything. It is also the clumsiest and most time-consuming way. It is clumsy because we can very easily leave all (or some) of those print statements behind in the code once we are done, leaving the code cluttered (and, on some very rare occasions, breaking it in production). It is time-consuming because we have to write all the print statements and then monitor them while the program is being executed.
- Using logging: This method is also text-based, like printing, but with many goodies. Instead of having only “stdout” as the output mechanism (as with printing), we can forward structured messages from our program to log management platforms and then use those tools to search over the collected text and gather the necessary info. It is a very effective way, and logging is a must for any serious production app. But it has its own limitations as well. We still have to write all the log statements inside our code by hand (and remove them later if they are not needed), and in most cases we cannot peek under the hood of program execution.
- Interactive (Symbolic) debugger: If you have ever used something like “gdb” (or “ipdb” or “pydb” or “lldb” or anything like that) then you know what I am talking about. For the rest: there are tools available on your own development machine that give you all-powerful access to the program at run time in an interactive mode and let you perform very low level (and high level) investigation on a running piece of code. The advantage of this kind of debugging is that you do not need to change the code at all in order to start debugging it. You just fire up the debugger and point it at the executable, and voila! You are in the middle of a running program, with all the power to investigate (and even change, if needed) it in real time. Cool!
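To make the first option a bit more concrete, here is a minimal sketch of print-style debugging in C that tries to avoid the "left behind in production" problem. The macro name DEBUG_PRINT and the function sum_array are my own illustrative names, not something from a library; the idea is simply that the trace lines only exist when the code is compiled with -DDEBUG.

```c
#include <stdio.h>

/* DEBUG_PRINT is a hypothetical macro name chosen for this sketch.
   Compile with -DDEBUG to enable the trace output; without that flag
   every DEBUG_PRINT(...) call compiles down to nothing. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Sums the first n elements of values, tracing each step when DEBUG is defined. */
int sum_array(const int *values, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += values[i];
        DEBUG_PRINT("i=%d value=%d total=%d\n", i, values[i], total);
    }
    return total;
}
```

Calling sum_array from your own main and building once with -DDEBUG and once without shows the difference: the result is the same, only the trace on stderr appears or disappears. We still had to write the trace line by hand, which is exactly the limitation an interactive debugger removes.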
A few words about Virtual Memory
Before we can do any serious debugging using a debugger, we need to understand how the OS assigns memory to a program. There is an excellent (probably the best) introduction to interactive debugging by Peter Salzman, and my post is just a small summary of its first few pages (with a little extra for lldb), but I will still take the time to say a few words about VM. If you are feeling curious, please follow Peter’s guide (and the links he mentions there).
When a program starts, the OS gives the process its own virtual memory (VM) space: a range of addresses that the OS and the MMU (Memory Management Unit) of the CPU map onto the primary memory (RAM). It is a complicated and advanced mechanism, and we do not need to go into great detail about it. But we do need to know that inside the VM space there are two regions the process uses for its data: the Stack and the Heap (apart from some other blocks reserved for other things, such as the program's code). It looks like the following:
Any dynamic memory, allocated by us or by the libraries we are using, is located inside the Heap. The Stack is composed of “Frames”. Each frame represents one function call. When a new function call is made, a new frame is pushed onto the top of the Stack, and when the function returns, its frame is removed. If you imagine the allocated VM as a vertical block of memory, then the stack typically grows downward and the heap grows upward as the process proceeds (as shown in the figure above).
We will need the concept of frames when we debug.
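If you want to see frames and heap memory with your own eyes before touching the debugger, the following is a small sketch of one way to do it. All three function names are hypothetical, made up for this illustration. The recursive helper stores the address of one local variable per call, so each recorded address belongs to a different frame; the last helper returns memory from the heap, which stays alive after the frame that allocated it is gone.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: records the address of one local variable per call,
   so each entry in addrs comes from a different stack frame.
   addrs must have room for max_depth entries. */
void record_frames(int depth, int max_depth, uintptr_t *addrs)
{
    int local = depth;                  /* lives in this call's frame */
    addrs[depth] = (uintptr_t)&local;   /* stored as a number, never dereferenced */
    if (depth + 1 < max_depth) {
        record_frames(depth + 1, max_depth, addrs);
    }
}

/* Returns 1 if every recorded frame address is distinct, 0 otherwise. */
int frames_are_distinct(const uintptr_t *addrs, int count)
{
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            if (addrs[i] == addrs[j]) {
                return 0;
            }
        }
    }
    return 1;
}

/* Heap memory outlives the frame that allocated it; the caller must free it.
   Returns NULL if the allocation fails. */
int *make_heap_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}
```

If you print the recorded addresses from your own main, you will most likely see them decrease with each nested call, which is the "stack grows downward" behaviour described above. I am only assuming that here: the direction depends on the platform, so the sketch checks nothing more than that every call got a frame of its own.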
Debugging in Action
I will be showing commands that are relevant to lldb, executed on a recent MacBook Pro (which uses Apple's LLVM toolchain). The versions of cc and lldb are as follows:
Apple LLVM version 9.1.0 (clang-902.0.39.2) (result of cc --version)
lldb-902.0.79.7 (result of lldb --version)
Let’s first write the simplest C program you can imagine. I am going to use “vim” as the editor. You can use anything you want.
vim hello.c
Once inside the editor, please copy-paste (or type) the following code —
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}
Save and exit the editor and then do the following
cc -g --std=c99 -Wall hello.c -o hello
Here are some descriptions of the flags we used for the compiler
- -g: tells the compiler to embed debugging info inside the output file (the symbol table, for example)
- --std=c99: the standard of the C language that we are using
- -Wall: turns on a large set of commonly useful warnings (despite the name, not every warning the compiler has)
To see that what we did actually runs we can try the following
./hello
Hello World!
So far so good! Let’s fire up the debugger now. Use the following command in order to do so
lldb hello
If everything goes right then it will show something like the following —
This is the prompt of the interactive debugger. In our case it is lldb, so the prompt reads (lldb).
The commands in gdb were, to quote the lldb documentation, “free form”. lldb, however, has made an effort to standardize its commands, which follow this pattern:
<noun> <verb> [-options [option-value]] [argument [argument...]]
Here “noun” represents the object you are trying to work on, such as “thread” or “frame”, and “verb” is the actual command. So, unlike in gdb, to see all the frames of the current stack you have to enter “thread backtrace” (in gdb it was simply “backtrace”). If you enter this command at the very beginning, you will get “error: invalid process”. This is because, although we started the debugger with the name of the executable (“hello” in our case), we have not actually asked it to run. So here goes the first command that you should run: “run”. It will produce an output like the following:
We have successfully run the executable inside the debugger, BUT we did not have any control over it. It ran, succeeded, and finally exited with status 0. No magic here. So, let’s create some. We are going to use the “breakpoint set” command. A breakpoint marks a specific line of code in the source file where the debugger will suspend execution and give us an interactive prompt to mingle with the running process :) (the “-l 10” option gives the line in my source file where I want to suspend execution; for you the line number can be different)
Notice the magic here. We set the breakpoint using a line number from the original source file, but we are loading and running the executable, and yet the debugger somehow knows how to connect the two and suspends at the right line of code. This works because of the debugging info that the -g flag embedded in the executable. At this stage we can examine the different variables in the present frame and their values. To examine the variables, we do this:
Voila! We have all the variables related to the present frame, along with their types and values. And now it is time for “thread backtrace”:
As we can see, our function call “main” is at the top of the stack. But it is not the only frame in the stack. There are some other frames, related to function calls that we did not make ourselves. They belong to startup code in dynamic libraries that are loaded at run time, and that code is what calls “main” for us.
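With hello.c there is only one frame of our own to look at, so here is a small sketch that gives the debugger a little more to show. The functions square and sum_of_squares are hypothetical names I made up for this illustration. If you call sum_of_squares from your own main, compile with -g, and set a breakpoint inside square (for example with “breakpoint set --name square”), then “thread backtrace” should list square above sum_of_squares above main, and “frame variable” should show x and result.

```c
/* Hypothetical example functions for practising with frames. */

/* Innermost call: a breakpoint here stops with three of our frames on the stack. */
int square(int x)
{
    int result = x * x;
    return result;
}

/* Calls square once per loop iteration and accumulates the results. */
int sum_of_squares(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += square(i);
    }
    return total;
}
```

While stopped inside square, “frame select 1” moves the debugger's view to the caller's frame, where “frame variable” should now show n, total and i instead. That is the frame concept from the Virtual Memory section in action: every function call in progress has its own set of variables.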
Finally, as we do not have anything further to investigate (we are running very simple code), we need to know how to resume the process from the point where it is suspended. How do we move forward? Well, there are two commands: “step” (or “s” for short) and “continue” (or “c” for short). The difference between them is that “step” executes just the next line of source code and suspends again. It is a kind of one-step-at-a-time journey through the running process, whereas “c” executes everything until it hits the next breakpoint or the process ends. Here we do not want to step through the process, so we use “c”, and as there are no more breakpoints the process will finish running.
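The difference between the two commands is easier to feel on code that has a loop. The following is a minimal sketch for practising; count_vowels is a hypothetical function of my own, and it only looks at lowercase letters to keep things short.

```c
#include <stddef.h>

/* Hypothetical example function: counts the lowercase vowels
   in a NUL-terminated string. */
int count_vowels(const char *s)
{
    int count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            count++;    /* a breakpoint on this line is hit once per vowel */
        }
    }
    return count;
}
```

Call it from your own main with a string such as "hello world", compile with -g, and set a breakpoint on the count++ line. Each “c” should then run the loop until the next vowel is found, while “s” walks through the loop one source line at a time, letting you watch i, c and count change along the way.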
This is the end of Part-I. We learnt about the basics of debugging here. But we have not performed any real debugging yet. We will be seeing that in Part-II.
If you like this little write up then please clap as many times as you feel. This will encourage me to write more like this. Also please share and comment. They are immensely helpful :)
See you in Part-II. Happy debugging!