The Dining Philosophers
Edsger Dijkstra develops an intuitive way to juggle multiple processes at the same time
When reading the history of their subject, it might seem that computer scientists develop a solution to a problem only to find that their solution opens up a whole new field (and a whole new group of problems). In some of these situations, a computer scientist can be forgiven for not anticipating the complications that result from their work. But multi-tasking must have seemed a pretty safe bet as a harbinger of chaos. It was almost certain that allowing multiple users concurrent access to a system’s resources might raise a problem or twelve. As multi-tasking, multi-user computers went mainstream in the 1960s, those problems quickly revealed themselves and they turned out to be particularly nasty, verging on evil. But, since concurrency allows computers to be used so much more effectively, these problems were worth tackling head on.
Why are they so bad? Well, understanding and predicting the behaviour of even a single program can be difficult. That difficulty comes from having to address every eventuality when running through the program step-by-step. A totally linear program (i.e., one with no decisions or iterations at all) is simpler to understand because there is only one path through the program. Each time you add another iteration or decision, you increase the number of ways the program might behave when executed, thus adding to your mental load. Still, however complex it becomes, a program running alone and uninterrupted on a computer is theoretically deterministic. That’s to say, how it will behave can always be predicted if you know all the input.
It’s like the old clockwork universe theory of physics which posits that the outcome of any event in the universe (like throwing a ball, or a planet beginning a new orbit) is totally predictable as long as we have the right formulae and enough precise information. But like the clockwork universe theory, it’s not that simple in reality. In physics, chaos theory showed just how unpredictable the universe really is. It proved that a physicist’s prediction, no matter how accurate the inputs, won’t necessarily turn out exactly as expected. This means two supposedly “identical” events could have different consequences. Chaos theory smashed the delicate inner workings of the clockwork view.
Computer science has an equivalent chaos theory in the form of non-determinism.
Holed up in their abstract world, safe from the vagaries of matter, you might suppose computer scientists had the luxury of laughing at the physicists, but not so. Computer science has an equivalent chaos theory in the form of non-determinism. Like chaos theory, it tells us that under some conditions and without preventative measures, it’s possible that a program won’t behave predictably even though we might know all the inputs to a program. In other words, that program can yield different results every time you run it with the exact same input.
One way to introduce non-determinism into your programs is to run them concurrently on the same computer. Not only does concurrency have hazardous effects on the predictability of programs (which is bad enough), preventing those hazards is often extremely difficult. Concurrently-running programs have to share access to certain computer resources and operate under the direct control of the operating system. This is the root cause of non-determinism. A program on a multi-tasking computer is exposed to all manner of dangers. For example, a CPU that interrupts its execution without warning; unruly neighbouring programs that break the operating system’s rules, or shared information which is unexpectedly altered by other processes. It’s no longer just the number of paths through your program that makes things complicated. With concurrency, forces beyond your control can change important factors at any time and screw up your program regardless of the path taken.
I’ll come to the specifics of these chaotic problems in a moment. But first, I’ll introduce you to the computer scientist’s favourite way of modelling these problems. Edsger Dijkstra was a well-regarded computer scientist who contributed to the fight against the chaos of concurrency. He brought his brand of mathematical discipline to bear on the whole mess and emerged with an elegant framework that describes both the problems and solutions in a concise way. In addition to the actual work, Dijkstra is also recognised for his method of modelling concurrency problems, which, curiously enough, involves a group of philosophers eating spaghetti.
The scenario is as follows: Five philosophers are sitting at a round table and they each have a plate of fresh spaghetti in front of them. There’s also one fork between each plate (five forks). However, while our philosophers are intellectually adept, their motor skills leave something to be desired. None of them is able to eat spaghetti without using two forks at the same time. They won’t even attempt to eat without holding two forks. The philosophers therefore can’t all start eating simultaneously, because there aren’t enough forks on the table. Fortunately, there are a couple of characteristics about philosophers that will help us. Dijkstra’s philosophers are very simple creatures. They are only ever in one of two states: thinking or eating. Their one-track minds mean they cannot eat and think at the same time. So a philosopher stuffing his face will at some point let his mind wander off to ponder the eternal mysteries of existence, at which point he’ll place both his forks on the table. A while later, he’ll grow hungry again and want to eat. If both forks are available he can start munching immediately, but if either of his neighbours have started eating in the meantime, he’ll have to wait until the forks become available again.
The whole setup is analogous to a set of processes on a computer. A single-tasking computer is like a solitary philosopher with his own pair of forks. When processes don’t interfere with each other in any way, they can do what they like and you’re free to think of them as deterministic. But when they do interfere with each other, in this case because they are fighting for access to a shared resource (the forks), they become non-deterministic and vulnerable to serious problems. A thinking philosopher is analogous to a process doing stuff which doesn’t affect other processes. An eating philosopher represents a process that is dealing with a shared resource and thus impacts on other processes.
Yes, it’s weird, but don’t blame me, I didn’t make it up. Nevertheless, by using our spaghetti-eating philosophers, it’s possible to explain the problems we face in concurrent computing in a more understandable way. There are three main problems that one has to combat when implementing concurrency in computers.
Deadlock is the first problem to be discussed because it is the mother of all concurrency problems. Not only can the conditions leading to it be extremely subtle, but its occurrence will certainly kill a program and perhaps even take the whole system with it.
For example, when writing a program that needs access to a shared resource, you might develop a program that checks if the resource is available, pauses if it’s not and checks again after waiting a while. Seems sensible. But what if other programs were also designed like this? Let’s use an example of file handling. A file is shared resource, so we can only ever allow a file to be written to by one process at a time. Let’s say process P1 is currently writing to the accounts file. P1 therefore has exclusive access to it at that moment. P1 then decides it also needs to write to the personnel file, but it finds that a different process, P2, is currently using the personnel file.
“OK,” says P1. “I’ll pause and wait until P2 gives up access to the personnel file.”
At the same time, P2 decides it wants to use the accounts file, but finds that P1 is using it.
“No problem,” says P2. “I’ll just pause and wait until P1 gives up access to the accounts file.”
Each process is waiting for the other to make a move…like two extremely shy people in love.
Can you see the problem? Each process is waiting for the other to make a move, but the condition for either process making a move is that the other process makes a move! Like two painfully shy people in love, each is waiting interminably for the other and the whole situation proceeds no further. Not only have both processes deadlocked themselves, they’ve also locked up the resources they’re holding (files X and Y) so that no other process can use them.
The possibility of deadlock clearly needs preventing, but it’s a tough problem to crack. To see how subtle it can be, let’s look at a perfectly reasonable policy we might write in order to control access to shared resources. Our philosophers will kindly demonstrate the policy:
- Keep thinking until your left fork becomes available, then pick it up.
- If your right fork is available pick it up, otherwise keep thinking until it’s available and then take it.
- Fill your face with spaghetti.
- Put down the left fork.
- Put down the right fork.
- Sit back and think again and go back to step 1.
Seems a reasonable, cooperative solution, yes? As long as philosophers are alternating between thinking and eating, what could possible go wrong? But what happens if every philosopher has picked up their left fork? No more forks are available, but each philosopher is stuck at step 2, waiting for a fork to become available. This will continue until they all starve to death. Even worse, all the spaghetti will go cold.
Attempted “solutions” for deadlock often end up like this. They appear to work perfectly well under most conceivable conditions, but some obscure corner case ends up ruining everything. Unfortunately, there’s no ultimate solution for deadlock. However, there are several strategies for dealing with it, including detection techniques (which allow deadlocks but hunt them down), avoidance strategies (which imperfectly prevent the conditions necessary for deadlock, e.g., by allowing processes access to a maximum of one shared resource at a time), or hoping they don’t occur (I kid you not…these are called ostrich algorithms).
Among our philosophers, we have Baruch Spinoza. He isn’t much of an eater and he’s quite a slow thinker, spending a lot of time lost in thought. Both his immediate neighbours at the table, Aristotle and Cicero, are quite the opposite. They have quick minds and big appetites, spending little time in thought before turning back to the spaghetti feast. Because Spinoza’s particularly slow, Aristotle and Cicero can even put their forks down, think for a while, and pick them up again before Spinoza even realises. Consequently, every time Spinoza comes out of thought and tries to eat, he finds that his neighbours are already eating. Thus, poor Spinoza never gets a chance to eat.
This problem is called — appropriately enough — starvation. Starvation occurs when a process never gets a chance to access the shared resource because of the scheduling design. It’s not deadlocked, because the process can still do something, but all it’s doing is repeatedly checking the resource.
Starvation is dealt with by improving the scheduling approach. Operating systems can use a number of techniques to guarantee that every process gets some minimum level of attention. One such technique is called ageing. Using this approach, the operating system looks out for processes that have gone unrun for a long time and assigns such processes a higher priority. This lets them “jump the queue” and get dealt with ahead of the competing processes that are starving it.
Perhaps the dining philosophers could implement a similar system: Any philosopher whose stomach starts to rumble is then allowed to force their neighbours to give up their forks. That should prevent Spinoza from going hungry on account of his greedy Mediterranean neighbours.
While Spinoza is having his problems, there’s also trouble brewing on the opposite side of the table. Immanuel Kant and Bertrand Russell are both strong-willed and not particularly enamoured of one another. They decide it’s best to avoid each other and both philosophers get on with having a jolly good think. When Kant finishes, he looks to see if a fork is available. Barely a millisecond later, Russell looks at the exact same unused fork. Both reach for it but Kant manages to snatch it first. The now-enraged Russell has become the victim of a race condition: he observed the state of a shared resource, but someone else changed that state before Russell could act upon his observation.
Some of us might have been the victim of a race condition in the real world. Booking systems are particularly vulnerable to them. To illustrate this, let’s put more of our characters into a situation and see what happens. This time, Alan Turing and Edsger Dijkstra are making travel arrangements for an upcoming computer science conference. At the exact same moment, Turing and Dijkstra phone the same airline and get through to different clerks. There’s only one seat left on the plane they want: B23. Here’s how the whole thing proceeds in real-time:
Dijkstra’s clerk checks the airline’s computer system for a seat and finds B23 free. “You’re in luck, Mr. Dijkstra. B23 is the last seat left. Would you like me to book it for you?”
Meanwhile, Turing’s clerk also checks the system and finds B23 free. “Not a moment too soon, Mr. Turing. All we have left is B23. Would you like me to book it for you?”
Dijkstra replies to his clerk. “Yes please.”
Turing replies to his clerk. “You bet.”
Dijkstra’s clerk instructs the system to enter “Dijkstra” into seat B23.
Turing’s clerk instructs the system to enter “Turing” into seat B23…and Dijkstra’s name is unknowingly overwritten by Turing’s.
Two weeks later, Dijkstra turns up at the airport to check in only to find that B23, the seat he booked, is reported by the system as booked out to some guy named “Turing”. Enraged, Dijkstra berates the check-in clerk before being escorted out by airport security. He goes home and the experience makes him resolve to fix computer-based concurrency problems once and for all, including inventing some weird analogy involving spaghetti-eating philosophers. The rest is history.
All right, so it wasn’t really an infuriating double-booking incident that stung Dijkstra into action, but he was still responsible for a good share of the work on concurrent programming. As well as the spaghetti-eating philosophers analogy, he also developed was the principle of mutual exclusion.
Mutual exclusion makes sure that no processes can access shared data in a way that causes inconsistency problems. To do this, all parts of a program that access shared data are isolated in something called a critical section. When a running process enters its critical section, no other processes on the system are allowed to enter their own critical sections. The other processes are free to do whatever else they like, but are expressly forbidden from running any part of their critical sections until the original process exits its own. It’s as though the data were housed in a locked, windowless room and only one person at a time is allowed inside.
No perfect solution exists… If there were a one-size-fits-all solution, then we would always be using it.
That’s the idea anyway. As with the other concurrency issues, no perfect solution exists, which makes it a particularly rich field within computer science. If there were a one-size-fits-all solution, then we would always be using it and there would be no need for further research. But without a perfect solution, computer scientists have had to come up with a host of concurrent computing tools. Every system designer needs to take into account the ways in which the system must work and, using the available tools, tailor a solution.
As an example, mutual exclusion actually comes in two flavours: optimistic and pessimistic. The safer way, pessimistic, has already been described — all processes are locked out of a critical section while another is executing its own in order to prevent inconsistencies. But if the likelihood of an inconsistency occurring is low, then pessimistic locking can cause unnecessary performance problems because many processes might be blocked needlessly. In other words, this approach would be overly cautious and an optimistic locking approach might make more sense. This process is free to enter its critical section at any time, but it must do two additional things. First, when it enters its critical section, it must check the initial state of the shared data. Second, when it is just about to make a permanent change to the shared data, the process must check that data once again. If the data is the same as before then no other process altered it in the meantime, and so it may make its changes. If the data is different than before, it means the data was altered in the intervening time. The process must therefore throw away the work it did and start again from the critical section’s beginning. Not ideal, but if this kind of collision occurs only rarely, it’s probably worth tolerating for the benefit of overall system performance.
If the airline decided to deploy a pessimistic solution to the airport double-booking problem, our original situation with Turing and Dijkstra would run a little differently. The critical section for a flight-booking system would include any code that accesses information about the seats. Let’s look again as they call the airport simultaneously to book a ticket.
Dijkstra’s clerk checks the seat availability, thereby entering a critical section because it is shared data.
“B23 is the last seat available,” says Dijkstra’s clerk. “Would you like me to book it for you?”
A second later, Turing’s clerk also checks the seat availability. Because one process in the system is already registered as being inside a critical section, the clerk’s computer puts up the “Please wait” sign.
“Just one moment, Mr. Turing,” says the clerk. “Checking availability now.”
Meanwhile, Dijkstra replies to his clerk. “B23? Sounds charming.”
Dijkstra’s clerk uses the computer to book him into seat B23. The process stores “Dijkstra” in the database and, having completed the transaction, leaves the critical section. This automatically alerts the program, signalling that it’s now safe for other programs to enter their critical sections.
Back to Turing’s clerk. The process is signalled that its now free to enter its critical section. Upon doing so, it checks the roster and finds that all seats have been booked.
“Sorry, Mr. Turing,” the clerk says. “We just sold the last seat literally a few seconds ago.”
Poor old Turing. I’m sure he would have taken comfort from the fact that it was mathematical correctness and rigour that beat him to that ticket.
Like several other areas of computing, concurrency is an area where computers clearly cannot do anything they please. As far as we can make out, there are limits to what we can achieve with concurrently running programs and some hazards remain. This forces some of the problem onto the programmers who must put this knowledge into practice and craft an imperfect solution, dealing with the inevitable trade-offs in the process. But again, because we can’t prove that real solutions to the hazards don’t exist, the research continues.