Scientific Debugging, Part 1

Engineering Insights

Talin
Published in Machine Words
Jan 11, 2019 · 5 min read

This is part one of a three-part series on scientific debugging — using the scientific method to track down difficult software bugs. It is continued in part two and part three.

The ultimate goal of scientific debugging, or any debugging for that matter, is to discover the source of the bug: its underlying cause, its location in the code (if applicable), and hopefully a remedy for it.

In many cases, the cause of a bug is obvious, and there’s no need for a complex deductive investigation. But frequently bugs can be significantly more challenging; finding the error in a large code base may feel like looking for a needle in a scrapyard.

Let’s start with a quick review of the scientific method, in simplified form:

  1. Observe. Gather data about the behavior of the system under study.
  2. Hypothesize. Come up with a testable theory that narrows the space of possibilities.
  3. Experiment. Perform tests which can either confirm or falsify your theory.
  4. Go back to step 1; repeat until the answer is found.

How does this process apply to software debugging?

In the Observe step, we look at the behavior of the program. What are its outputs? What information is it displaying? How is it responding to user input?

In the Hypothesize step, we try to divide the set of possible causes into multiple independent groups. For example, in a client/server app, a potential hypothesis might be “the bug is in the client”, which is either true or false; if false, then presumably the bug is on the server. A good hypothesis should be falsifiable, which means that you ought to be able to invent some test which can disprove the proposition.

Finally, in the Experiment step, we run the test — that is, we execute the program in a way that allows us to either verify or invalidate the hypothesis. In many cases, this will involve writing additional code that is not a normal part of the program; you can think of this code as your “experimental apparatus”.

We then return to the Observe step to gather the results of the experiment, and the cycle repeats.
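As a rough illustration, the loop can be sketched in Python. This is only a sketch under simplifying assumptions: the function name narrow and the lambda standing in for a real test are hypothetical, and the code assumes exactly one component is at fault.

```python
def narrow(suspects, experiments):
    """Shrink a set of suspect components, one experiment at a time.

    Each experiment is a (hypothesis, run) pair. `hypothesis` is the set of
    components the theory blames; `run()` returns True if the test confirmed
    the theory and False if it falsified it.
    Assumption: exactly one component is at fault.
    """
    remaining = set(suspects)
    for hypothesis, run in experiments:
        if len(remaining) <= 1:
            break  # nothing left to narrow down
        if run():
            remaining &= hypothesis   # confirmed: the bug is inside this group
        else:
            remaining -= hypothesis   # falsified: the bug is outside this group
    return remaining


# Hypothesis: "the bug is in the client". The lambda is a stand-in for a
# real experiment; here we pretend the test falsified the hypothesis.
experiments = [({"client"}, lambda: False)]
result = narrow({"client", "server"}, experiments)
print(result)
```

With the hypothesis falsified, the only remaining suspect is the server, which matches the client/server reasoning above.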

A basic example

Here’s a simple example to demonstrate how scientific debugging works. Imagine you have a web application that stores names and addresses in a database. We’ll assume that the application has a client-side component written in JavaScript, and a server component written in Python.

Let’s further imagine that there’s a bug, such that last names are appearing in upper case; so if you enter the name “Jones” it displays as “JONES”, which is not what you wanted.

Where do we start? A good place is to determine at what point the string is getting converted into uppercase. There are several possibilities:

  • The string is actually stored in upper case in the database, in which case it was probably written incorrectly.
  • The server code is converting the string to upper case before sending it to the client.
  • The client code is receiving the correct string from the server, and converting it to upper case before displaying it.
  • The Cascading Style Sheet (CSS) for the web site is telling the browser to display the text in upper case (using the text-transform property).

We can narrow these possibilities with a simple set of experiments:

  • To determine whether the style sheet is responsible, we can simply inspect the raw HTML DOM for the page, using the browser’s debugger, or we can print out the HTML on the terminal. If the string in the HTML is not uppercase, then we know that it must be the stylesheet that is causing it to be displayed in uppercase.
  • To determine whether the client-side JavaScript is responsible, we can use a network trace utility or the browser debugger to examine the response data from the server. Again, if we see that the data is not uppercase, then we know that it must have been transformed later in the process.
  • To determine whether the server code is responsible, we could add a console debugging statement to print out the value returned from the database. However, we have to be careful here; this will only tell us what the value was at the time it was printed out, not what was actually stored in the database, because the code to load the data from the database might be at fault.
  • Finally we can check the value in the database to see whether it is uppercase or not.
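The last two experiments can be sketched in Python using an in-memory SQLite database. Everything here is an assumption made for illustration: the contacts table, the load_last_name function, and the deliberately planted bug are not from the real application. The point is only to show that a print statement in the server code and a direct query of the database can disagree.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("debug-experiment")

QUERY = "SELECT last_name FROM contacts WHERE id = ?"


def load_last_name(conn, contact_id):
    """Hypothetical server-side loader with a planted bug for demonstration."""
    row = conn.execute(QUERY, (contact_id,)).fetchone()
    value = row[0].upper()  # planted bug: the loader itself upper-cases the value
    # The "experimental apparatus": log the exact value the loader returns.
    log.debug("load_last_name(%s) returned %r", contact_id, value)
    return value


def stored_last_name(conn, contact_id):
    """Read the raw stored value directly, bypassing the loader."""
    return conn.execute(QUERY, (contact_id,)).fetchone()[0]


# Hypothetical schema and data, standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.execute("INSERT INTO contacts (id, last_name) VALUES (1, 'Jones')")

loaded = load_last_name(conn, 1)    # what the debug statement reports
stored = stored_last_name(conn, 1)  # what is actually in the database
```

In this sketch the logged value is uppercase while the stored value is not, so the print statement alone would have pointed at the wrong place.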

Deductive Logic

We can generalize this set of logical tests by recognizing the following: at any point, if we see that the value has not been converted into uppercase, then we know that the conversion must have occurred at a later stage.
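This rule can be written down as a small Python sketch. The checkpoint names and observed values below are made up for illustration; in practice they would come from the inspections described above.

```python
def locate_conversion(observations):
    """Find where the value first appears in uppercase.

    `observations` is a list of (checkpoint, value) pairs ordered from the
    database to the screen. Returns (last_clean, first_upper): the last
    checkpoint where the value was still correct and the first one where it
    was uppercase. A conversion happened somewhere between the two.
    """
    last_clean = None
    for checkpoint, value in observations:
        if value.isupper():
            return last_clean, checkpoint
        last_clean = checkpoint
    return last_clean, None  # never seen in uppercase


# Hypothetical observations gathered from each experiment.
observations = [
    ("database row", "Jones"),
    ("server response", "Jones"),
    ("DOM text", "JONES"),
    ("rendered text", "JONES"),
]
boundary = locate_conversion(observations)
print(boundary)
```

Here the value is still correct in the server response and already uppercase in the DOM, so a conversion took place in the client-side code.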

However, if the value is uppercase, the converse does not hold: you cannot assume that the string was not also converted later. The reason is that converting a string to uppercase is idempotent: converting it a second time changes nothing. Thus, you have to account for the possibility that there is more than one bug; if you see the string in uppercase at a given point in the code, you know it was transformed earlier, but it may have been transformed again later as well.
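A short Python sketch shows how idempotence lets two bugs mask each other. The server_format and client_format functions are hypothetical stand-ins for the two components, each with a switchable bug.

```python
def server_format(name, buggy):
    """Hypothetical server step; upper-cases the name when buggy."""
    return name.upper() if buggy else name


def client_format(name, buggy):
    """Hypothetical client step; upper-cases the name when buggy."""
    return name.upper() if buggy else name


def displayed(name, server_buggy, client_buggy):
    """The value the user sees after both steps have run."""
    return client_format(server_format(name, server_buggy), client_buggy)


both_bugs = displayed("Jones", server_buggy=True, client_buggy=True)
server_fixed = displayed("Jones", server_buggy=False, client_buggy=True)
client_fixed = displayed("Jones", server_buggy=True, client_buggy=False)
both_fixed = displayed("Jones", server_buggy=False, client_buggy=False)

# Applying upper() twice gives the same result as applying it once.
print("JONES".upper() == "JONES")
```

Fixing either bug alone leaves the displayed value unchanged, which could easily be misread as "that component was not the problem".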

Multiple overlapping bugs are uncommon, but not as uncommon as you might hope, and you can waste a lot of time trying to reconcile apparently contradictory experimental results because you didn’t consider this possibility.

Many hypotheses also have this asymmetrical quality, where proving a proposition false may yield much more information than proving it true. In fact, it’s not at all uncommon for a false result to eliminate a lot of possibilities while a true result eliminates none at all, or vice versa. Sometimes you may have to run multiple experiments to really nail down the truth or falsity of a proposition.

All of the example “experiments” given so far are relatively simple. Most of them involved nothing more than direct inspection of the value in a debugger or inspector. In only one case did we actually need to modify the program (adding a print statement) to produce additional output to test our theory.

In addition, the structure of the example program was such that it was easily divided into several obvious sections, where the data traveled upwards from server to client in a linear fashion. However, in a real app, data flows are much more complex, and the boundary between components may not be so clear cut. Part of your task is to imagine all of the various ways that your program can be divided into parts or segments, such that your experiment will reveal the part in which the bug resides.

Also, locating which major program component contains the problem is not the end of the process; we want to narrow it down much further, which will require additional rounds of experimentation, until we finally isolate the exact line of code where the bug resides.

In part two of this article, we’ll examine a more complex set of examples and more advanced techniques.
