A single trip through the policy deck

Designing the policy deck in Secret Hitler

Last essay I talked a bit about why I think the policy deck is a good contribution to the Social Deduction genre: it introduced a new method of hiding information about the actions of the Bad Team that allowed for more deduction than other mechanics. Any time a fascist policy is enacted, members of the government can plausibly claim that it was the deck that caused the fascist policy to emerge, not the players.

We built Secret Hitler around lots of interactions between pairs of people. Players always learn about someone specific, but it’s always a small piece of the overall puzzle. As a result, different players quickly accumulate different sets of information, which in turn produces very different expectations.

If the game is well-balanced, the liberals can win by making their information public, painting a cohesive picture and using social cues to sort out information from misinformation quickly. The fascists can win by creating clever misinformation that suggests an alternate version of events.

Here I want to give you some more details about how we designed the policy deck to make Secret Hitler so intense; I hope to answer the question, “Okay sure but were you successful in creating a puzzle-like social deduction game?” We emphasized playtesting throughout the process, and only used (very basic) number-crunching to make sure that player intuitions were roughly tracking the actual game state.

Deck Imbalance

The policy deck is intended to give fascists cover for enacting fascist policy. It should force some fascist actions from liberal players (which complicates liberals’ ability to find each other) without forcing so many that the fascists never have to take risks.

We found over months of playtesting that to strike the right balance, fascist policies had to outweigh liberal policies about 2-to-1. That’s a number we pretty carefully calibrated: the earliest versions of the deck were much more evenly split, and we continued to add fascist policies to the deck until we got to almost 2.5 fascist policies for every 1 liberal policy. Then, we scaled it back to our final balance: at the outset of the game the policy deck contains 11 fascist policies and 6 liberal policies.

Before we got up to 2-to-1, the puzzle was too easy: fascists couldn’t create believable misinformation. Liberals drew at least one liberal policy so often that they could safely exclude anyone who claimed to have drawn three fascist policies. Even if that occasionally kept an honest liberal out of government, getting most of the liberals right was enough for the liberals to win.

Any higher than 2-to-1, and the game felt less like a puzzle and more like a roulette wheel. Liberals couldn’t meaningfully separate signal from noise. Fascists’ odds were so good, it was almost always plausible the President drew 3 fascist policies, and the liberals were helpless. From a game design perspective, this was the worst-case scenario: players’ allegiance had almost no correlation to their revealed actions. We had games where every liberal policy was passed by a fascist attempting to go undercover, and most fascist policies were passed by deck-forced liberals. Just a nightmare, really.
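To put rough numbers on those two failure modes, here is a minimal sketch of the chance that a President draws three fascist policies from a fresh deck. Only the 11-and-6 deck comes from this essay; the near-even and fascist-heavy decks are hypothetical stand-ins for our earlier versions, chosen only so that all three decks have 17 cards.

```python
from math import comb

def p_three_fascist(fascist: int, liberal: int) -> float:
    """Chance that a 3-card draw from a fresh deck is all fascist."""
    return comb(fascist, 3) / comb(fascist + liberal, 3)

# Only 11F/6L is the real deck; the other two are illustrative assumptions.
decks = {
    "near-even (9F/8L, hypothetical)": (9, 8),
    "final (11F/6L)": (11, 6),
    "fascist-heavy (12F/5L, hypothetical)": (12, 5),
}

for label, (fascist, liberal) in decks.items():
    p = p_three_fascist(fascist, liberal)
    print(f"{label}: P(FFF) = {p:.1%}, P(at least one liberal) = {1 - p:.1%}")
```

Under these assumptions, a three-fascist claim goes from roughly a one-in-eight event in the near-even deck to roughly one-in-three in the heavy deck, with the final deck sitting at about one-in-four.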

Getting it Right

Around 2-to-1, we found it: neither team can rely on the deck to protect them completely. Liberals must use social cues to separate false positives (liberals who were deck-forced) from the actual fascists at the table, and fascists must take risks and be clever about the misinformation they create.

In short, it’s plausible to blame the deck for your fascist-looking action, but not to blame it for every fascist-looking action.

It’s not a surprise that 2-to-1 ended up being the right ratio: 2-to-1 draws are the most interesting, because they give liberals a chance but allow one fascist on the government to force a fascist policy. It makes sense that the deck would reflect that ratio too. The odds of that draw typically stay around 50% as the deck composition changes. Some deviation makes sense, but too much deviation should prompt questions from the players.

Probability of drawing two fascist cards and one liberal card for any deck composition
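A table like this can be rebuilt with a few lines of code. The sketch below is a plain hypergeometric calculation, not the spreadsheet we actually used; the function and table layout are made up for illustration. Rows are fascist policies left in the deck, columns are liberal policies left.

```python
from math import comb

def p_two_fascist_one_liberal(fascist: int, liberal: int) -> float:
    """Chance that a 3-card draw is exactly two fascist and one liberal."""
    total = fascist + liberal
    if total < 3:
        return 0.0  # not enough cards left to draw a full hand
    return comb(fascist, 2) * comb(liberal, 1) / comb(total, 3)

def print_table(max_fascist: int = 11, max_liberal: int = 6) -> None:
    """Print one row per fascist count, one column per liberal count."""
    liberal_counts = range(1, max_liberal + 1)
    print("F\\L" + "".join(f"{lib:>6}" for lib in liberal_counts))
    for fas in range(max_fascist, 1, -1):
        cells = "".join(
            f"{p_two_fascist_one_liberal(fas, lib):>6.0%}" for lib in liberal_counts
        )
        print(f"{fas:>3}" + cells)

print_table()
```

For the starting deck, `p_two_fascist_one_liberal(11, 6)` works out to 330/680, a little under 49%.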

I’ve built some very basic models to show how quickly players accumulate different sets of information, and the results mirror what our playtests tell us: small differences in information have a significant impact on expectations. We obviously don’t expect players to compute Bayesian probabilities on the fly — these were more a learning exercise for me personally than a game design tool we relied on — and we didn’t find that knowing probabilities in advance created a significant advantage. In fact, if these models show anything, it’s that most players do a good job intuitively of tracking what’s missing and why it matters. Still, don’t let me be the one to discourage it. If you’re interested, I’ve put some more information about those models at the end.

Draw/Discard Piles

Balancing the deck was an incremental process, but I think the single biggest breakthrough on making the policy deck really interesting to interact with came when, about halfway through development, we stopped shuffling discarded policies back into the deck and started a separate discard pile. That did two things I’m extremely happy about:

First, it uprooted one of the last remaining bits of “grounded” information: the contents of the deck. While before players could figure rough probabilities about what was left in the deck, now players must rely on the President and Chancellor to report what they saw.

Second, it created more continuity between rounds. Any government that drew three fascist policies now cleared all three out of the deck, making it less likely that a later liberal government would draw three fascist policies. On the other hand, fascists who discarded liberal policies discarded them semi-permanently: liberals would miss out on that policy until the deck was exhausted and reshuffled.

By allowing players to clear cards out, we made it much less likely that outcomes would cluster around each other, especially streaks of three-fascist draws. Early rounds have a much bigger impact on rounds that follow. Giving players more control over the deck and then forcing players to rely on each other to accurately report the deck state brings the social deduction and puzzle-solving aspects of the game even closer together.
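Here is a small sketch of that effect, comparing the two rules after a first-round government draws three fascist policies from the starting deck. It assumes that under the older rule the enacted policy stayed on the board and only the two discards were shuffled back; the function names are invented for this example.

```python
from math import comb

def p_three_fascist(fascist: int, liberal: int) -> float:
    """Chance that the next 3-card draw is all fascist."""
    return comb(fascist, 3) / comb(fascist + liberal, 3)

def deck_after_round(fascist, liberal, drawn, enacted, discard_pile):
    """Return (fascist, liberal) counts left in the deck after one government.

    drawn is a 3-letter string such as "FFL"; enacted is "F" or "L".
    """
    if discard_pile:
        # Current rule: all three drawn cards leave the deck.
        return fascist - drawn.count("F"), liberal - drawn.count("L")
    # Assumed older rule: only the enacted card leaves; discards are shuffled back.
    return fascist - int(enacted == "F"), liberal - int(enacted == "L")

for rule, uses_pile in (("shuffle discards back", False), ("discard pile", True)):
    deck = deck_after_round(11, 6, "FFF", "F", discard_pile=uses_pile)
    print(f"{rule}: deck {deck}, P(next draw is FFF) = {p_three_fascist(*deck):.1%}")
```

With the discard pile, the follow-up three-fascist draw drops from about 21% to about 15%, which is the anti-streak effect described above.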

All Together Now

Once we added the discard pile, we settled on 11 fascist policies and 6 liberal policies because it gives us 17 total cards. Each round, three cards are drawn; after five rounds, 15 cards have been drawn and there are two policies left over. Those two “trash cards” are shuffled in without being revealed, which prevents anyone from getting perfect information.
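The arithmetic is simple enough to check in one line. This sketch only covers the first trip through the deck, and the function name is made up for the example.

```python
def first_trip(deck_size: int, hand_size: int = 3):
    """Return (full rounds, unseen leftover cards) for one trip through the deck."""
    return divmod(deck_size, hand_size)

for size in (17, 18):
    rounds, leftover = first_trip(size)
    print(f"{size} cards: {rounds} rounds, {leftover} unseen cards left over")
```

A 17-card deck leaves two unseen cards after five rounds; an 18-card deck leaves none after six, so every card is eventually seen by someone.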

(We made this decision after an excruciating game with a twelve-and-six deck. A friend of mine who routinely counts cards was able to unravel the entire game by comparing the last three cards in the deck to what previous governments reported should be there, and we had to sit there for half an hour while he worked it out. Had there been two extra cards he didn’t see, we could have said “maybe the liberal policies you’re expecting are the trash cards.” My thanks to him for ruining that game entirely.)

During a recent playtest, someone leaned over to me and said, “There are one or two things that if I could just figure those out, I’d have it.” That’s a feeling we’ve created very intentionally — we spent months of playtesting finding and eliminating those one or two things. Players can have a strong sense of the connections, but without a foundation, a solid starting point, they have to combine social information with their deductive prowess.

Addendum: Models

I built models to calculate P(The President drew 3 fascist policies|The Chancellor was handed two fascist policies) and P(The President drew 3 fascist policies|A fascist policy was returned) for any deck composition. That doesn’t come even close to a full modeling of the game, but it was an interesting exercise for me:

That reflects the gap between the Chancellor and the entire rest of the table. This model is most useful for the upper left, which reflects the initial state of the deck. It shows how much more trustworthy the President is from the Chancellor’s perspective than from the other players’, and that gap is driven by only two additional facts: the Chancellor’s loyalty and the second policy card, both of which the Chancellor knows but the other players don’t.

In addition to the deck composition, there are variables for the probability that a fascist (or Hitler) on government will discard a liberal given the chance. I assume that all players are equally likely to be President (since roles are randomly distributed and the Presidency starts at a random location, that’s a safe assumption up front that becomes less reliable as the game proceeds) and that all non-Presidential players are equally likely to be Chancellor.

It ignores:

  • How players become more or less likely to get elected based on past actions
  • How likely fascist Presidents might be to nominate other fascists as Chancellor

And since it uses actual deck composition instead of expected deck composition, it’s not very useful for projecting several rounds ahead, where there might be significant variance in expected deck vs. actual deck.
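For anyone who wants to play with the idea, here is a minimal sketch of the two conditional probabilities. It is a reconstruction under stated assumptions, not the original spreadsheet: it uses a single discard probability q for every fascist (Hitler included), assumes liberals always discard a fascist policy when they can, takes the viewpoint of a liberal Chancellor or a liberal bystander, and uses a 7-player game with 3 fascists as the example table. All function and parameter names are invented for illustration.

```python
from fractions import Fraction
from math import comb

def draw_probs(fascist, liberal):
    """Map k -> P(a 3-card draw contains exactly k fascist policies)."""
    total = comb(fascist + liberal, 3)
    return {k: Fraction(comb(fascist, k) * comb(liberal, 3 - k), total)
            for k in range(4)}

def p_fff_given_handed_ff(fascist, liberal, players, fascists, q):
    """Liberal Chancellor's view: P(President drew FFF | handed two fascists)."""
    d = draw_probs(fascist, liberal)
    p_pres_fascist = Fraction(fascists, players - 1)
    # A liberal President hands over FF only after FFF. A fascist President
    # also does so after FFL, when they discard the liberal (probability q).
    return d[3] / (d[3] + p_pres_fascist * q * d[2])

def p_fff_given_fascist_enacted(fascist, liberal, players, fascists, q):
    """Liberal bystander's view: P(President drew FFF | fascist policy enacted)."""
    d = draw_probs(fascist, liberal)
    others = players - 1  # the bystander is liberal and not in government
    p_pf = Fraction(fascists, others)                  # President is fascist
    p_cf_if_pf = Fraction(fascists - 1, others - 1)    # Chancellor fascist | fascist President
    p_cf_if_pl = Fraction(fascists, others - 1)        # Chancellor fascist | liberal President
    # FFL: a liberal President passes FL; a fascist one passes FF (prob q) or FL.
    after_ffl = (1 - p_pf) * p_cf_if_pl * q + p_pf * (q + (1 - q) * p_cf_if_pf * q)
    # FLL: only a fascist President passing FL to a fascist Chancellor enacts F.
    after_fll = p_pf * q * p_cf_if_pf * q
    enacted = d[3] + d[2] * after_ffl + d[1] * after_fll
    return d[3] / enacted

chancellor = p_fff_given_handed_ff(11, 6, players=7, fascists=3, q=1)
bystander = p_fff_given_fascist_enacted(11, 6, players=7, fascists=3, q=1)
print(f"Chancellor: {float(chancellor):.1%}, bystander: {float(bystander):.1%}")
```

Under these particular assumptions the Chancellor lands at 1/2 and the bystander at 5/14, so the Chancellor has noticeably more reason to believe a President who claims three fascist policies. Setting q to 0 makes both probabilities 1, since then only a three-fascist draw can produce a fascist policy.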

In the above tables I’ve assumed that any fascist President, including Hitler, will play as aggressively as possible in the first round, always discarding a liberal policy if they’re given the chance. More typically, fascists (especially Hitler) attempt to buy trust by passing a liberal policy this early, but adjusting how aggressively the fascists play doesn’t affect the gap between the Chancellor and the rest of the table.

I didn’t spend time building a more complex model because we weren’t using these spreadsheets to optimize the game; I think that would be a bad idea. Playtesting was squarely at the center of development. Once we had a rudimentary sense that players reliably tracked the game without using or needing rigorous probabilities, we were sold. In fact, whether players do appreciably better or worse than the models is a function of social cues, and designing around just information or around models would have missed the point entirely.