Apples and Kumquats: The Perils of Multi-threaded Polymorphism

For the last several months I’ve been working on a pet project: object-oriented C++ wrappers around a popular real-time operating system. After releasing the first few versions, I moved on to adding more functionality, and as part of that work I started using my own wrapper classes. While eating my own dog food this way, I stumbled across a rather nasty bug that I thought I’d share here. It is especially interesting because it shows how two dissimilar parts of a software system can interact in very unexpected ways.

A fundamental class in my wrapper library is a Thread. In the operating system itself the idea of an execution context is called a task. The Thread class encapsulates this idea of a task and allows a C++ program to interact with the OS in a more object-oriented manner. The class is also an abstract base class; to make use of it a programmer derives their own class from the Thread class, overrides its Run method and away you go.

Another goal of this project was to keep the interface clean and simple. So, in the spirit of RAII, I created the underlying operating system task within the base class constructor. On the surface this seems like a great idea: you’re creating a Thread, so create the backing task that’s going to actually execute that Thread. This hides the underlying implementation from the application programmer, which is exactly what we want. However, it’s exactly this pattern that crashes the program.

The title of this article, “Apples and Kumquats”, was a favorite saying of a former manager of mine. Rather than saying two things were like “apples and oranges”, he’d say they were like “apples and kumquats”, to emphasize just how dissimilar they were. In this case, the apples and kumquats are C++ and multithreading. C++ is a statically compiled language, and until C++11 its standard had no concept of threads whatsoever. Multithreading, by contrast, is a run-time behavior of a program, and it can play out nondeterministically, differently from one run to the next. For most of C++’s history these were completely orthogonal concepts: apples and kumquats.

For the C++ and multithreading experts out there, you may already have guessed what the bug was. For everyone else, me included: sometimes when I ran my code, the program crashed by calling a pure virtual method. This is supposed to be nearly impossible in C++, because the compiler won’t let you instantiate an abstract class, so a pure virtual method should never be reachable through a live object. You could make it happen by forcibly circumventing the compiler with a typecast, but I wasn’t doing that. With multithreading, however, it turns out to be incredibly easy. Let’s see how.

Remember that the Thread class is an abstract base class. Let’s simplify it and say it has only a constructor and a pure virtual method called Run. The base constructor creates and schedules an operating system task for you that will call your Run method. To use a Thread, you derive from it and implement your own real Run method.

Quickly checking our C++ textbooks, we’re reminded that the constructors in an inheritance chain run in order: C++ guarantees that a base class constructor runs before the derived class constructor. If you think about it, that makes sense. The problem is that while we were still in our base class constructor, we turned on multithreading. At that very moment the operating system can decide to start running the new task, and now one object has code from two different tasks executing on it. Even worse, one of those paths of execution is our construction chain. It’s a hidden race condition. Looking back at our C++ textbook again, we find that virtual methods are usually implemented through a vtable, a table of function pointers. In typical implementations each class has its own vtable and each object carries a hidden pointer to one. As each constructor in the chain runs, it points the object at its own class’s vtable, so until the derived constructor starts, every virtual call on the object goes through the base class’s table.

(Figure 1: simplified buggy code)

If the operating system preempts the constructor call chain before the derived class constructor has had a chance to run and point the object at the derived class’s vtable, then when the new task calls Run it will not find the derived implementation. The object still refers to the base class’s table, whose Run entry is only a runtime stub that reports a pure virtual call, because the task preempted the very constructor chain that was going to replace it. (Strictly speaking, the C++ standard calls this undefined behavior; in practice it crashes.) There is nothing wrong with the C++ compiler; it’s working exactly as it’s supposed to. There is nothing wrong with the operating system or with multithreading either; they’re working exactly as they’re supposed to. The problem was with me, for not realizing how the two orthogonal systems might interact with each other.

After much work and experimentation, it became clear that the only general solution was two-stage initialization. The base constructor simply cannot schedule the object to run as a task. Instead, there needs to be an additional method that lets the derived class signal that it has completed its construction and is ready for its Run method to actually be run. If you have any experience programming Java, you’ll recognize this pattern already: constructing a java.lang.Thread does not start it; you have to call start(). Now you and I both know why it’s there, even if I had to learn it the hard way.

(Figure 2: Corrected code)

Another interesting side note: because I was working with a hard real-time operating system, I never actually encountered this bug in the first version of the code, despite all of the unit tests I ran. In a hard real-time operating system, tasks have explicit priorities. In all of my unit tests, the task priorities guaranteed that newly created Threads could not run until the creating Thread had finished or blocked, long after construction was complete, so the bug never showed up. On the other hand, if the Threads you create always have a higher priority than the creating Thread, you hit the bug every time. Why do I mention this? Because in general-purpose operating systems like Linux and Windows, thread priorities usually aren’t specified explicitly and may change dynamically at run time, and on a multi-core machine the new thread may start on another core immediately. On my hard real-time operating system this was an “always or never” bug. On a general-purpose operating system, it would be a fantastic heisenbug.

(Special thanks to my former manager Dave for the quote.)

(Originally posted at LinkedIn.com.)