Prisoner’s Dilemma’s Dilemma

Some math applies to real life, some of the time

Stuart Ferguson
Jan 14, 2023

Young people always complain that they are never going to need the abstract math they are being taught in school. Of course there are many professions that require quite advanced math, not just science and engineering but business, finance and public policy. Many crafts are surprisingly math-adjacent, finding their foundations in linear algebra and geometry. Despite that, the sentiment isn’t entirely false. After all, what can math tell us about getting along with one another? That has to be learned in the school of hard knocks.

Actually math does have a field that purports to be relevant to human behavior — game theory. Game theory is the study of various kinds of formal games and the strategies that reliably solve them. Games are often used to model more complex relationships in the hope of finding principles that can be generalized. The mother of all of those game theory problems is prisoner’s dilemma.

The setup is simple. Two people have been arrested for a crime. They are separated and are each independently offered the chance to defect — to implicate the other person in the crime. If both cooperate and refuse to defect, they both go free; if one defects that person goes free while the other goes to prison; if both defect then they both go to prison.

Ideally the prisoners cooperate with each other, no one gets implicated and no one goes to prison. The problem is that neither prisoner knows what the other one is doing. If you add up all the possible outcomes the best decision is to defect. Since both of the prisoners use the same logic to arrive at the same decision, both go to prison. Even though acting together they could both succeed, the fact that they are acting independently means they both fail.

This is the crux of the dilemma.


Now if you paid close attention you may have noticed that, no, the setup I described implies no such thing. If we just want to avoid jail, then cooperating and defecting are equally likely to succeed depending on what the other person does. There’s no obvious reason why defecting leads to the better outcome. In fact game theorists have to stack the game to capture the behavior they want: for example, they might score going to prison alone as 0, going to prison together as 2, going free together as 3, and going free alone as 5. Thus cooperation gets you 0 or 3, while defection gets you 2 or 5, giving defection the better payoff if the other person acts randomly.
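Those numbers can be sketched as a small payoff table. This is my own sketch using the article’s example scores (the move names are mine, not standard notation), and it shows the stacked-deck result: whatever the other player does, defecting replies with the higher payoff.

```python
# The payoff table described above. Each entry maps
# (my_move, their_move) to my score.
PAYOFF = {
    ("cooperate", "cooperate"): 3,  # we both go free
    ("cooperate", "defect"):    0,  # I go to prison alone
    ("defect",    "cooperate"): 5,  # I go free alone
    ("defect",    "defect"):    2,  # we both go to prison
}

def best_response(their_move):
    """My highest-scoring move, given what the other player does."""
    return max(("cooperate", "defect"),
               key=lambda move: PAYOFF[(move, their_move)])

# Defection is the better reply no matter what the other player chooses:
print(best_response("cooperate"))  # defect (5 beats 3)
print(best_response("defect"))     # defect (2 beats 0)
```

Because defection scores higher against both possible moves, a game theorist would call it a dominant strategy under this payoff schedule.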

It’s hard to quite see how the narrative can support those numbers. How is going to prison together that much better than going to prison alone?

Likewise in messy reality there are social factors to consider. Many communities look down on anyone who helps the authorities. From gangs and organized crime, to marginalized people, to the lawless wasteland of the schoolyard, everyone knows that snitches get stitches. In that circumstance the social pressure against defection could easily tip the balance to successful cooperation. People also have a reputation they are concerned about, which could be tainted by a defection here. Even a simple commitment to justice and fairness should leave a person outraged enough by this blatant abuse of police power to override the math of self-interest.


The inequalities in the payoff matrix really define prisoner’s dilemma, not the strained metaphor. Purely as a game where two players secretly pick their move and then earn a payout based on the table described above, it does manage to model an interesting situation. It’s a bit like a collective action problem, where parties need to work together for all to succeed, but there are substantial rewards — and no penalties — for one party to back out of the arrangement. Thus everyone betrays each other, like the tragic ending of a heist movie.

This result is formalized as the Nash equilibrium: any outcome in which none of the parties can improve their position by making a unilateral change, and which is therefore stable. “Both defect” is the Nash equilibrium for prisoner’s dilemma. In all other outcomes a cooperating player can always improve their position by defecting.
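That claim is easy to check by brute force. Here is a quick sketch, using the same example payoff numbers as above, that tests every outcome against the definition: an outcome is a Nash equilibrium if neither player gains by switching their own move alone.

```python
# Same example payoff table as before: (my_move, their_move) -> my score.
PAYOFF = {
    ("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
    ("defect",    "cooperate"): 5, ("defect",    "defect"): 2,
}
MOVES = ("cooperate", "defect")

def is_nash(a, b):
    """Neither player can do better by unilaterally changing their move."""
    a_stays = all(PAYOFF[(a, b)] >= PAYOFF[(alt, b)] for alt in MOVES)
    b_stays = all(PAYOFF[(b, a)] >= PAYOFF[(alt, a)] for alt in MOVES)
    return a_stays and b_stays

equilibria = [(a, b) for a in MOVES for b in MOVES if is_nash(a, b)]
print(equilibria)  # [('defect', 'defect')]
```

Every outcome involving cooperation fails the test, because the cooperating player could switch to defection and score more.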

To game theorists this is considered the rational solution.

It’s easy to generalize. We could conclude from this well-studied example that any collective action is impossible. Perhaps this illustrates that we cannot rely on individual action to maintain common goods. Any compact founded on trust will be bound to fail because individuals will find exploits that benefit themselves at the expense of those who followed the compact. Is prisoner’s dilemma a mathematical proof of the tragedy of the commons? There are social sanctions that disapprove of shirking one’s social duties, but what do those really mean anymore? How strong can they be against the promise of personal enrichment?


Fortunately there’s more to the story than just the Nash equilibrium. Human interactions are rarely one-time events, and normal social behavior may be better modeled by what’s called iterated prisoner’s dilemma. Instead of a single game, the same two players play a series of games where they can adjust their strategy as they go. Purely logical analysis would conclude that “always defect” is still the strongest strategy, because it’s the logical single-game strategy so it should dominate in iterated games as well. Indeed a strong strategy includes defecting when necessary, but there’s a complication.

In iterated prisoner’s dilemma what counts is the cumulative score. Two players who always defect come away from the contest each with middling scores, because the “both defect” outcome has marginal returns. To do better a player could play a predatory game, trying to trick the other player into cooperating while they intend to defect. This isn’t sustainable: a player who keeps getting burned will simply stop cooperating. The other way to get higher scores is for two players to cooperate. Not just once or twice but many times.
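A short simulation makes the arithmetic concrete. This is my own sketch using the article’s example payoffs: over 100 rounds, two unconditional defectors each collect the middling “both defect” score every round, while two players who manage to sustain cooperation each collect half again as much.

```python
# Same example payoff table: (my_move, their_move) -> my score.
PAYOFF = {
    ("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
    ("defect",    "cooperate"): 5, ("defect",    "defect"): 2,
}

def play(strategy_a, strategy_b, rounds=100):
    """Cumulative scores for two strategies over repeated games.
    A strategy is a function of the opponent's move history."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

always_defect = lambda opp: "defect"
always_cooperate = lambda opp: "cooperate"

print(play(always_defect, always_defect))        # (200, 200) -- middling
print(play(always_cooperate, always_cooperate))  # (300, 300) -- both do better
```

The gap only widens with more rounds, which is why sustained cooperation, not one-shot betrayal, wins the long game.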

The only way to consistently beat the default score is to learn to recognize opponents who will reliably cooperate.


It may seem counterintuitive — after all we’ve been informed by game theory that defection is the rational choice — but this result is borne out by experiments. There’s no closed-form solution, but strategies that did better than the baseline in iterated prisoner’s dilemma tended to have these features:

  • Nice — start with cooperation and don’t be the first to defect
  • Retaliation — meet defection with defection
  • Forgiving — after retaliation allow a return to cooperation
  • Non-envious — don’t try to score more than the other player

Cooperation doesn’t mean being a pushover. Strategies that find a way to cooperate do better in the long run because there are objective rewards for cooperating. Even given the miserly payoff schedule of prisoner’s dilemma, collective action is worthwhile. The cost is that sometimes the other player does better than you.
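The best-known strategy with all four features is “tit for tat,” famous from Robert Axelrod’s tournaments: open nicely, then mirror the opponent’s last move, so a defection earns exactly one retaliation and cooperation is restored as soon as the other player offers it. A minimal sketch, again using the article’s example payoffs (the helper names are mine):

```python
# Same example payoff table: (my_move, their_move) -> my score.
PAYOFF = {
    ("cooperate", "cooperate"): 3, ("cooperate", "defect"): 0,
    ("defect",    "cooperate"): 5, ("defect",    "defect"): 2,
}

def tit_for_tat(opp_history):
    # Nice: open with cooperation. Retaliatory and forgiving: copy the
    # opponent's last move, so one defection costs one defection, no grudges.
    return "cooperate" if not opp_history else opp_history[-1]

def duel(a, b, rounds=10):
    """Cumulative scores over repeated rounds between two strategies."""
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        ma, mb = a(hb), b(ha)
        sa += PAYOFF[(ma, mb)]
        sb += PAYOFF[(mb, ma)]
        ha.append(ma)
        hb.append(mb)
    return sa, sb

print(duel(tit_for_tat, lambda h: "defect"))     # (18, 23): loses the matchup...
print(duel(tit_for_tat, lambda h: "cooperate"))  # (30, 30): ...but thrives with cooperators
```

Note the non-envy in action: tit for tat never outscores its opponent in any single matchup, yet across a population of strategies its willingness to cooperate earns it high cumulative totals.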

It makes you wonder, is the Nash equilibrium actually even the rational solution to this kind of game? Turns out the answer is no, but that’s a story for another time.

