Dice Roll Distributions: Statistics, and the Importance of Runtime Efficiency

Nerds love tabletop games because of statistics; change my mind

Griffin Poole
The Startup
12 min read · Aug 29, 2020


Around the start of the COVID-19 outbreak earlier this year, my friends and I were working out ways to adjust to a life on lockdown. In addition to the physical separation of the quarantine, I had recently accepted a job out-of-state that would require me to move in the coming year, and so it was important for us to find a way to stay in touch over the time and distance.

As an answer to this, I decided to start running a Dungeons and Dragons campaign for my friends. For a varied collection of nerds like us, this would feed into our collective hobbyist, RPG-enthusiast, and geeky natures, and the idea was very positively received by everyone involved.

Setting up the game and running it came with some challenges, which led me to a very interesting mathematical and coding problem, but first, I’ll explain a little about the game.

Dungeons and Dragons, a Dice-Rolling RPG

For those of you unfamiliar with D&D, Dungeons and Dragons is a tabletop fantasy role-playing game. Its hallmark is a group of friends gathering around a table, placing themselves in the shoes of warriors and wizards to go on adventures, undergo quests, and explore dungeons. The game is run and organized by a single person, the Game Master (GM), while the “players” create characters that they use for role-play, combat, exploration, and so on.

An example of the classic dungeons and dragons dice set consisting of d4, d6, d8, d10, d12, d20, and d100 dice.

The game is very open in how it’s played, but one factor is consistent across the majority of rulesets: outcomes and results in the game are decided by dice rolls, written in the notation nds, with n being the number of dice to roll and s being the number of sides on each die.

For example, in combat the success or failure of an attack might be determined by a roll of 1d20, or one twenty-sided die. The higher the roll, the more likely you are to successfully hit something. Similarly, multiple dice can be used. For the damage of a certain spell, the rules may ask you to roll 3d8, which simply means roll three eight-sided dice; the damage you do is the sum of the results.
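
The nds notation can be simulated in a couple of lines of Python. A minimal sketch (the roll helper here is my own illustration, not part of any D&D toolkit):

```python
import random

def roll(n, s):
    """Roll n fair s-sided dice and return the summed result (the 'nds' roll)."""
    return sum(random.randint(1, s) for _ in range(n))

# e.g. damage for a 3d8 spell: some value between 3 and 24
damage = roll(3, 8)
```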

As the GM of our game sessions, the responsibility of building the world that my players would explore fell to me. That includes writing the story, creating the non-player characters, and, perhaps most importantly, designing and orchestrating the combat encounters. As part of this, it is important for me as the GM to create encounters that challenge my players without sending something at them that just kills them outright in a turn or two.

Above: An example of something that may very well kill my players outright.

There is a problem with this, however. If you follow the rulesets properly, more difficult encounters with higher-level monsters invariably carry a higher risk of accidentally killing off the party. Stronger creatures mean more dice and higher damage rolls, and I can’t control the outcome of those rolls without fudging the rules a bit (something you can technically do, but that is generally avoided if at all possible).

So I started looking into the nature of dice rolls, and the way that their outcomes are distributed across various combinations. If I wanted to avoid falsifying dice rolls, I instead had to work with the likelihood of die outcomes and design encounters that maybe looked more difficult than they actually were. To do this, let’s talk about dice roll outcomes and statistics.

Distributions of Die Outcomes

Let’s say I know that my most delicate player has a max health of 22. Through various manipulations, I can balance an encounter around having creatures with only a 5% chance of taking them out in one shot, while also consistently being able to do around 60–80% of their health in a single hit. In this way, I guarantee that my players will feel challenged and most likely terrified by what I throw at them, while also avoiding the potentially un-fun outcome of being taken out in the first turn and not being able to contribute to an encounter at all.
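
Odds like that 5% one-shot chance can be computed exactly by enumerating every possible roll. A small sketch (the function name and the 3d8-versus-22-health pairing are my own illustration):

```python
from itertools import product

def chance_at_least(num, sides, threshold):
    """Exact probability that a roll of `num` dice with `sides` sides
    totals `threshold` or more, by enumerating every ordered outcome."""
    outcomes = list(product(range(1, sides + 1), repeat=num))
    hits = sum(1 for o in outcomes if sum(o) >= threshold)
    return hits / len(outcomes)

# Odds that a 3d8 hit finishes off a character with 22 max health:
chance_at_least(3, 8, 22)  # 10/512, just under 2%
```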

To understand this, let’s look at die distributions. As common sense would dictate, rolling any fair die many times will result in each side coming up approximately the same number of times (randomness doesn’t actually work that way necessarily, but that’s a different article). Below is the probability distribution for one twenty-sided die:

Top: Percent chance of each die outcome, Bottom: Cumulative chance of rolling the given value or lower

If we think about this distribution in terms of damage, a player could take anywhere between 1 and 20 damage, with an average of 10.5. But the variance here is as high as a single roll allows. On any given roll, a player is just as likely to take 20 damage as they are to take 10 damage, or even 1 damage. This inconsistency might be sought after in some cases, but there are ways to cover similar ranges with more consistent results.

Top: Probability distribution, Bottom: Cumulative distribution

The distribution above shows a slightly different range of 2–20 (two ten-sided dice), but the outcomes hover much more consistently around the average. In this scenario, the odds of rolling the max damage of 20 or the min damage of 2 have dropped from 5% to just 1%. Furthermore, the odds of hitting the approximate average of 11 have gone from 5% to 10%, double the likelihood. Similarly, the values flanking the average have also seen their odds jump.
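
These exact probabilities can be recovered by enumerating every outcome. A minimal sketch (the helper name is my own):

```python
from collections import Counter
from itertools import product

def distribution(num, sides):
    """Exact probability of each total for `num` fair `sides`-sided dice,
    by enumerating every ordered outcome (fine for small rolls like 2d10)."""
    counts = Counter(sum(r) for r in product(range(1, sides + 1), repeat=num))
    total = sides ** num
    return {t: c / total for t, c in sorted(counts.items())}

dist = distribution(2, 10)  # the 2-20 range discussed above
# dist[2] == dist[20] == 0.01, and dist[11] == 0.10
```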

As it turns out, in dice rolling, the greater the number of dice that you roll compared to the number of sides on the die, the more closely the distribution approximates a normal distribution with discrete values. To show this, observe what happens when we maintain the number of sides on the die, but repeatedly increase the number rolled:

Similarly, there seems to be a direct relationship between this ratio and the standard deviation of the resulting distribution. A higher number of dice shrinks the standard deviation relative to the range of possible totals, so the outcomes cluster more tightly around the average. On the other hand, increasing the number of sides on the die widens the spread and makes the extremes more likely.
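
This relationship can be made precise: a single fair s-sided die has variance (s² − 1)/12, and variances of independent dice add, so the standard deviation of an nds roll is the square root of n(s² − 1)/12. A quick sketch (my own helper, stated under those standard assumptions):

```python
import math

def dice_std(n, s):
    """Standard deviation of the total of n fair s-sided dice.
    One die has variance (s**2 - 1) / 12; independent variances add."""
    return math.sqrt(n * (s**2 - 1) / 12)

# The spread grows only with the square root of the dice count, while the
# range of totals grows linearly, so more dice cluster results more tightly:
# dice_std(1, 20) is about 5.77, while dice_std(4, 6) is about 3.42.
```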

Understanding these concepts makes it surprisingly easy to design encounters that run much more consistently with what you as the engineer have in mind, while still providing the illusion of randomness and the “just-good-enough” roll that feels so satisfying to play. Players can take near-fatal blows that just miss killing them, without realizing that your manipulations make the odds of a near-fatal blow about 90%, while the odds of an actually fatal blow lie below 2%.

This understanding greatly helped me improve the quality of my games, as I could design encounters that felt threatening and kept my players on their toes: encounters that challenged them to think and plan, with real danger in the mix. At the same time, I was able to avoid a game-breaking snowball, where one of my players dies at the outset and a fight planned for five people is suddenly only four strong.

With this new understanding at hand, I decided to generate a toolkit to help me produce these distributions. I wanted a way to plug in these die rolls and print a distribution output so I could easily see how likely I was to kill my players.

As it turns out, this problem was more difficult than I originally thought.

Generating Die Outcomes, and Doing it Fast

I chose to approach this problem in the same language I do most of my quick scratch-work: Python. Python has the benefit of a large ecosystem of math libraries, and given how easy it is to code in, I figured it was a good place to start.

I started this approach the way a lot of people start a lot of problems: the good-ol’ brute-force method! With the help of my good friend Kyle Silver (who is substantially better at abstract math than I am), we made an algorithm to generate all possible outcomes of n rolls of an s-sided die through a clever mathematical trick.

The code for our initial base number method

The way this algorithm works is as follows:

  1. First, we figure out how many possible outcomes of die rolls there are and iterate through them in a loop (for n in range(0,sides**num))
  2. Next, we write each number in a base equal to the number of sides. For example, in base 4, 10 would be written as 22.
  3. Last, for each number, we peel off a digit num times and add 1 to each digit. Each digit represents the outcome of a single die in the roll.
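
Pieced together, the steps above look roughly like this (a reconstruction with my own names, since the original listing appears only as an image):

```python
from collections import Counter

def brute_force_distribution(num, sides):
    """Count every possible roll of `num` `sides`-sided dice by interpreting
    each integer in 0..sides**num - 1 as a base-`sides` numeral whose digits
    (plus one) are the individual die faces."""
    counts = Counter()
    for n in range(sides ** num):
        total, m = 0, n
        for _ in range(num):
            m, digit = divmod(m, sides)   # peel off one base-`sides` digit
            total += digit + 1            # digits 0..sides-1 map to faces 1..sides
        counts[total] += 1
    return counts

# brute_force_distribution(2, 3) counts the nine 2d3 outcomes:
# sums 2..6 appear 1, 2, 3, 2, 1 times respectively.
```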

This produces the effect of generating all outcomes of n rolls by counting from 0 up to the number of expected outcomes, written in a base equal to the number of sides on the die. This is probably better explained with an example…

Say I roll 2d3, or two three-sided dice. The number of possible outcomes for this combination is 3², or 9 possible outcomes. To generate these outcomes, we count from 0 to 8 in base 3 with two digits, which looks like this:

  • 0 = 00 => (1,1)
  • 1 = 01 => (1,2)
  • 2 = 02 => (1,3)
  • 3 = 10 => (2,1)
  • 4 = 11 => (2,2)
  • 5 = 12 => (2,3)
  • 6 = 20 => (3,1)
  • 7 = 21 => (3,2)
  • 8 = 22 => (3,3)

As you can see above, with this counting method, we can mathematically generate all possible outcomes of the set of die rolls consistently across any combination of dice. By taking the sums of these resulting digit outcomes, we can generate our distribution. But this algorithm comes with a problem:

The first algorithm in runtime (14d6)

This method is sloooooooooooooow. The example I show above is the algorithm running through a 14d6 calculation, which is the damage for a 9th-level Fireball in D&D, if anyone is curious. I didn’t let it run to completion; after tracking its pace for about an hour, I projected that the full 14d6 operation would take approximately two weeks of runtime. I decided not to wait that long, killed the program, and talked to my buddy Kyle. In his own words: “Hmmm, it could probably use some optimization.”

So I pored over the problem for an evening, did some research, and finally came to a realization. By generating every combination of the numbers from 1 to [#sides of the die], allowing for duplicates, I could produce a list of all number combinations that could appear in a series of rolls of a die with that many sides. For example, in a 3d4 setting, I could generate all combinations as follows:

All combinations of numbers allowing for duplicates in a 3d4 setting

With these combinations, I could generate all possible die roll outcomes as permutations of each set. For example, 1,2,4 could be 2,1,4 or 4,2,1, etc. All of these are valid rolls, and the benefit of this method is that the sum of each combination (and thus of all its permutations) is the same, so I’d only need to calculate it once per combination.

The downside was twofold. For starters, generating all of the permutations from the combinations was part of what made the first algorithm slow in the first place. Second, many of the combinations contain duplicates, and permutations that merely swap duplicate values (e.g. 1,1,2 becoming 1,1,2 by ‘swapping’ the 1s, despite the outcome being identical) would have to be detected and removed, slowing things down further. But as it turns out, I didn’t need to generate the permutations of each combination at all in order to produce the distribution. All I needed to know was how many unique permutations each combination has, so I could increment the corresponding sum bin by that count. And there’s a formula for exactly this:

  • (factorial of the length of the set) / (product of the factorials of the duplicate counts)

What this means is that if I have the following combination: (1,1,2,3,4,4,4,5), I have a set of length 8, one duplicate set of length 2 (the 1s), and one duplicate set of length 3 (the 4s). Thus, the total number of unique permutations for this set can be calculated as follows:

8!/(2!3!) = 40,320/12 = 3,360

Essentially, you take the factorial of the length, which is the standard way of counting the orderings of a set, and divide out the internal orderings of each group of duplicates. Thanks to this operation, I no longer had to generate every possible outcome; I could just generate every combination and perform one mathematical operation on each. The code came out something like this:

Final processing code for the combination-method of generating the distribution
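
Since that listing is likewise shown only as an image, here’s a sketch of how the combination method might look (names and structure are my own reconstruction):

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial

def combo_distribution(num, sides):
    """Distribution of `num` `sides`-sided dice via unique combinations.
    Each multiset of faces is visited once and weighted by its number of
    distinct orderings: num! divided by the factorial of each face's
    multiplicity, exactly the formula described above."""
    counts = Counter()
    for combo in combinations_with_replacement(range(1, sides + 1), num):
        perms = factorial(num)
        for multiplicity in Counter(combo).values():
            perms //= factorial(multiplicity)
        counts[sum(combo)] += perms
    return counts

# For 14d6 this visits only 11,628 combinations instead of the
# 6**14 (roughly 7.8e10) ordered outcomes, which is where the speedup lives.
```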

After constructing this method, I wanted to test it on the same set I faced a problem with in the previous algorithm. So I ran it for 14d6…

The second algorithm in runtime (14d6)

And it goes without saying: the second method is, unexaggeratedly, blindingly fast compared to the first, completing the 14d6 distribution in less than a second. In fact, I went on to test this method for higher, more absurd dice counts, and it performs ‘reasonably’ even for 20d10 or 100d5 (and by ‘reasonably’, I mean it finishes within 10–20 minutes, which is really quite impressive for the latter).

I haven’t done a formal analysis of the big-O complexity of either method, but they struck me as a very easy-to-recognize example of just how important big-O efficiency is. Even here, a totally plausible and useful real-world task, generating the distribution of 14d6, had a runtime difference between one second and two weeks depending on the algorithm we used, and the faster algorithm allowed us to query sets that would have been completely unreasonable or impossible to run with the original build.

I still haven’t been able to find a faster way to generate the distribution of die outcomes for an arbitrary number of arbitrarily sided die, despite my research. I’m strongly convinced that there is something out there to help, but I just haven’t known where to look. Regardless, this was a fun side-project to work on that I’ve been able to get a good deal of personal use out of, an exciting math problem to approach in my own time, and a solid reminder of the importance of runtime efficiency. Last but not least, this project has given me a cool, cathartic screen saver for a number-nerd like myself, and there’s always a way to push the algorithm to last as long as I want.

100d5, the longest distribution I had the patience to sit through (7.8886*10^69 possible die outcomes)

Below I’ve included another article analyzing a somewhat related dice problem. I came across it after I had done the work on this project, and if you liked this write-up, you will certainly enjoy that one as well. I hope you’ve enjoyed this project, and thanks for reading!

Griffin Poole
Software Engineer, Web Developer, Neuroscience BA