Challenge Rating: Is it the only thing a Monster is made of?

Andrew Ingalls
10 min read · Jun 11, 2022

Introduction

While I’ve been playing Dungeons and Dragons for over 15 years now, there is one thing I’ve always seen as a constant source of struggle: creating homebrew monsters.

Yes, you can reskin another monster from the manual. Sure, you can use the Challenge Rating (CR) equation that the Dungeon Master’s Guide provides. Hell, you can straight up do whatever…you’re the DM, after all.

Photo by Ian Fajardo on Unsplash

The only problem is…when it comes time for your players to face that monster, it doesn’t feel very good if they wipe the floor with it…or worse you give ’em the ol’ TPK.

Now a lot of the DM’s job is built on the fly. There isn’t a DM out there who’s run an entire campaign without thinking “Yup, time to add another 100 HP to that monster”. That being said, one of my favorite activities has always been creating new, challenging encounters.

Whether that’s a new big bad villain, a scary dragon, or a surprisingly devious kobold, I’m always trying to find new ways to make these monsters fun and interesting.

Because after 15 years, I can recite many of the monster stat blocks from memory.

Photo by Clint Bustrillos on Unsplash

But that’s why Dungeons and Dragons is so amazing. After 15 years, I can still play the same game. All it takes is a bit of tweaking and my own imagination.

Dungeons and Dragons is about the freedom to play and create a story with your friends. It’s not about winning or losing.

With all that being said, I am still curious if there is an easier way for Dungeon Masters to create a monster stat block on the fly.

While it may not be perfect, it will allow us to create something quickly that we can then spend time tweaking and turning into a beautiful encounter, rather than spending the time trying to create the monster itself.

To that end, this project focused on four major questions:

  1. How different are the Challenge Ratings of Monster Manual Monsters when calculated from the CR equation?
  2. How do the monster’s stats correlate to the Challenge Rating System and themselves?
  3. How do the monster’s non-stat-oriented categories (type, environment, size, alignment) impact its stats?
  4. Can we predict a monster stat block for inexperienced DMs that resembles SRD monsters?

Part 1: Challenge Rating Equation

You may not know this, but Wizards of the Coast has provided a series of charts and a general equation to create your very own homebrew monster. I’ll try not to get into the debate about Challenge Rating itself.

There are DMs who are pulling their hair out at this very moment as their perfect encounter turns completely upside down.

What I wanted to know was how accurate this equation is. If I used it on the monsters from the Monster Manual, would I get the same CR that each of them is stamped with?

So, I took three monsters, from all ends of the CR spectrum, and plugged them into the equation. When I performed the calculations, they actually came pretty close.

Ancient Red Dragon

Hit Points: 546 which equals a 24 CR
Legendary Resistance increases Hit Points by 90

Hit Points: 636, which equals 26 CR

Armor Class: 22
Immunities: 1
Saving Throws: 4, this increases AC by 2 = 24

Damage Per Round: 215 which equals a 25 CR
Attack Bonus: +17

Defensive CR: a CR 26 monster (by Hit Points) is expected to have an AC of 19. The dragon's effective AC of 24 is 5 points higher, increasing CR by 2.5 = 28.5 CR

Offensive CR: a CR 25 monster (by Damage Per Round) is expected to have an Attack Bonus of +12. The dragon's actual Attack Bonus of +17 is 5 points higher, increasing CR by 2.5 = 27.5 CR

Giving us an average CR of 28. The Monster Manual's CR for an Ancient Red Dragon is 24.
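
If you want to check the arithmetic yourself, here is a minimal Python sketch of the adjustment step used above. It assumes, as the worked numbers imply, that CR shifts by half a point for every point of difference between the actual and expected value. The function name `adjusted_cr` is my own and the base CRs are simply the ones read off the charts in the walkthrough.

```python
def adjusted_cr(base_cr, actual, expected):
    """Shift a base CR by 0.5 for every point the actual value differs from the expected one."""
    return base_cr + (actual - expected) / 2


# Numbers taken from the Ancient Red Dragon walkthrough above.
defensive_cr = adjusted_cr(base_cr=26, actual=24, expected=19)  # effective AC 24 vs. expected AC 19
offensive_cr = adjusted_cr(base_cr=25, actual=17, expected=12)  # attack bonus +17 vs. expected +12
average_cr = (defensive_cr + offensive_cr) / 2

print(defensive_cr, offensive_cr, average_cr)  # 28.5 27.5 28.0
```

The sketch only covers the final adjustment. Looking up the base CR from Hit Points or Damage Per Round still needs the charts themselves.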

But pretty close could mean life or death for your PCs. Which got me thinking about a better, easier way.

So, I got to work. I took over 300 monsters from the System Reference Document (SRD) of Dungeons and Dragons and began to explore the data. Lucky for us, Wizards of the Coast has made the information in this document free to use and explore for our own benefit. Thanks, WotC!

If you are curious how I scraped this data using Selenium, comment below. I will write another blog post if enough people find it beneficial.

When they say 90% of a Data Scientist’s life is spent cleaning data, they aren’t kidding. There was a lot of mud to trudge through, but I was able to get some amazing insights as well.

Here are all the features I managed to scrape/create:

[‘Monster Name’, ‘Size’, ‘Type’, ‘Alignment’, ‘Traits’, ‘Reactions’, ‘Armor Class’, ‘Hit Points’, ‘Speed’, ‘Challenge’, ‘Proficiency Bonus’, ‘STR’, ‘DEX’, ‘CON’, ‘INT’, ‘WIS’, ‘CHA’, ‘Actions’, ‘Legendary Actions’, ‘Environment’, ‘Attack_Bonus’, ‘Spell_Bonus’, ‘Spell_Save_DC’, ‘WIS_SV’, ‘INT_SV’, ‘CHA_SV’, ‘STR_SV’, ‘DEX_SV’, ‘CON_SV’, ‘Arctic’, ‘Coastal’, ‘Desert’, ‘Forest’, ‘Grassland’, ‘Hill’, ‘Mountain’, ‘NA’, ‘Swamp’, ‘Underdark’, ‘Underwater’, ‘Urban’, ‘Average_Damage_per_Round’, ‘Damage Resistances’, ‘Damage Immunities’, ‘Condition Immunities’, ‘Damage Vulnerabilities’, ‘Spellcaster’, ‘Magic Resistance’, ‘Legendary Resistance’, ‘Regeneration’, ‘Undead Fortitude’, ‘Pack Tactics’, ‘Damage Transfer’, ‘Angelic Weapons’, ‘Charge’]

Part 2: Does CR relate to our monsters’ stats?

Like any good investigation, I started at the foundation of exploratory data analysis: distributions. What does our dataset look like for these stats? It turns out we have a lot of right-skewed information. This makes sense considering how right-skewed Challenge Rating is (most of our monsters are lower leveled).
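
One way to put a number on "right-skewed" is the sample skewness: positive values mean a long tail to the right. The sketch below is a plain-Python illustration, not the code behind my plots, and the `toy_crs` list is made up for the example rather than taken from the scraped data.

```python
def skewness(values):
    """Moment-based sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical Challenge Ratings: many weak monsters, a few very strong ones.
toy_crs = [0, 0.25, 0.5, 1, 1, 2, 2, 3, 5, 8, 13, 24]

print(skewness(toy_crs) > 0)  # a positive result means right-skewed
```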

I’m actually very surprised to see that most of the standard stats (Strength, Dexterity, Wisdom, Intelligence, and Charisma) have distributions ranging from almost 0 to 30. Constitution, however, barely has any monsters below 10. This has to do with the fact that a Constitution of 0 equals death.

When I plotted the joint distributions of the stats, I started to see a clearer picture of correlation.

Proficiency Bonus was correlated to Challenge Rating, which makes sense considering the Proficiency Bonus is set based on Challenge Rating. This means it’s not worth using in our model since it won’t provide us with any more information.
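
To show why Proficiency Bonus adds nothing new, here is a small sketch that derives it from Challenge Rating alone. The formula is my own compact reading of the 5e progression (+2 for CR 0 to 4, then +1 for every four CRs after that), so treat it as an assumption and check it against the official table before relying on it.

```python
import math


def proficiency_bonus(cr):
    """Proficiency bonus implied by a Challenge Rating (assumed: +2 up to CR 4, +1 per 4 CR after)."""
    return 2 + max(0, math.ceil(cr) - 1) // 4


for cr in (0.25, 4, 5, 17, 24):
    print(cr, proficiency_bonus(cr))
```

Because the bonus is a pure function of CR, feeding both into a model just duplicates the same signal.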

I was surprised by other correlated stats like Attack Bonus and Constitution. To that end, I decided to create a heatmap of correlations.

One of the biggest surprises here was how little correlation Dexterity has with any of the other stats. Wisdom is the closest and that’s only 0.33.
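
For anyone who has not built a correlation heatmap before, the numbers behind it are just pairwise Pearson correlations. The sketch below computes them in plain Python on a handful of invented stat values (the `stats` dictionary is hypothetical, not my dataset), so it only illustrates the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values for five monsters, not rows from the scraped data.
stats = {
    "STR": [8, 12, 16, 20, 27],
    "CON": [10, 12, 15, 19, 25],
    "DEX": [14, 10, 16, 9, 12],
}

names = list(stats)
matrix = {a: {b: round(pearson(stats[a], stats[b]), 2) for b in names} for a in names}
for name in names:
    print(name, matrix[name])
```

A heatmap is this matrix with colors instead of numbers.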

I do see some very strong correlations among many of these stats, however, which suggests they are related to one another.

Unlike Proficiency Bonus, which can be calculated straight from our Challenge Rating, having these stats correlated means we want a model that predicts them together, so related outputs can help shape each other. This is where a neural network comes in strong.

I wanted to explore the main stats of all monsters a bit further, so I used a simple box plot and ANOVA/pairwise analysis to discover some insights.

It turns out that only strength and constitution are not significantly different from one another. All other stats have a statistically significant difference in mean.
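
As a rough illustration of what the ANOVA step measures, here is a plain-Python one-way F statistic, where each group would be one ability score across all monsters. The sample lists are invented for the example. A p-value needs the F distribution as well, which a library routine such as `scipy.stats.f_oneway` provides.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group variance over within-group variance."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical samples: two stats with the same mean, and one that is clearly higher.
similar_a = [10, 12, 14]
similar_b = [11, 12, 13]
higher = [20, 22, 24]

print(one_way_anova_f([similar_a, similar_b]))  # 0.0, the group means are identical
print(one_way_anova_f([similar_a, higher]))     # 37.5, the group means are far apart
```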

There is a lot to unpack here, but essentially monster stats are all over the place, which is really cool. It also makes sense. It’s probably very rare for any one monster to have all high or low stats. The other insight here is: strength and constitution go hand in hand. If you are strong, you are tough, if you are weak, you are brittle. Again, it makes sense.

But what I really needed to understand is how these stats compare to the Challenge Rating. It turns out that, outside of Dexterity and Dexterity saving throws, they are all pretty well correlated.

I was very interested to see a clear lack of correlation between the number of immunities and resistances with the challenge rating of a monster. I can’t believe there are so many weaker monsters with immunities.

Looks like I don’t know the monster stats as well as I thought :D

Part 3: Monster Type, Environment, Size, and Alignment

Now that I’ve compared Challenge Rating to the stats, we can be pretty certain it will be a strong input for our model, but what about the categorical features: Type, Environment, Size, and Alignment?

Do they have any impact on our stat blocks? Can they be used to help fine-tune our predictions?

I’m not going to lie, radar charts are my all-time favorite type of chart. It probably has something to do with Pokémon, but they are just so satisfying. So I thought, what better way to compare our categorical data to the stats than through these amazing graphs!

Radar Chart for Types
Radar Charts for Environment

I was so surprised/disappointed to see how little an impact environment had on monster stats. I know I hinted at this earlier, but I was really hoping for some interesting patterns here. Something like how the monster-type charts turned out.

That being said, we can see two very distinct overall shapes in the Environment charts: one with high Strength/Constitution and another that is more well-rounded but lower in Intelligence.

This is great news because it suggests there is some relationship between environment and stats.

These distinct patterns will help shape our predictions. What did surprise me about the environments was how low the average stats were, specifically Intelligence. I know many beasts aren’t smart, but after seeing the monster-type charts, I would have figured on a bit more range.

Part 4: Creating a model to predict a Monster Stat Block

Now that I have an understanding of how our inputs and outputs relate, I can begin to look at what type of model will best predict the values. Based on the problem, I know we are looking for a regression model: we need specific numeric values, not class labels.

However, after trying several basic algorithms from scikit-learn (linear regression, k-nearest neighbors, decision trees, random forest), I wasn’t getting any accuracy or loss function results I was satisfied with.

I decided to choose a more robust algorithm that could learn from the weaker inputs and use the outputs that are correlated to build a better model.

Enter the Keras API, running on TensorFlow. I built and trained a sequential model using three layers: input, output, and one hidden layer.
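
For readers who have not used Keras, a model with that shape can be sketched roughly as follows. This is not my exact model: the input and output widths, the hidden layer size, the ReLU activation, and the Adam learning rate are all placeholder assumptions chosen for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_INPUTS = 20   # assumed width of one encoded input row (CR plus one-hot categories)
N_OUTPUTS = 10  # assumed number of stats predicted at once (AC, HP, STR, ...)


def build_model(n_inputs=N_INPUTS, n_outputs=N_OUTPUTS, hidden_units=64, learning_rate=1e-3):
    """Input layer, one hidden layer, and a linear output layer for multi-output regression."""
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        layers.Dense(hidden_units, activation="relu"),
        layers.Dense(n_outputs),  # no activation: raw numeric predictions
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
        metrics=["mae"],
    )
    return model


model = build_model()
model.summary()
```

Predicting every stat from one shared hidden layer is what lets correlated outputs inform each other, which was the reason for moving past the scikit-learn baselines.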

Optimization took about ten runs, changing the activation function, batch size, epochs, and learning rate. Eventually, I achieved a test accuracy of 85.64% with a low Mean Squared Error of 59.7. For a minimum viable product, I felt this was good enough to run and test.

I am very keen on getting this model into the hands of other DMs, as together they will be able to judge its worth and drawbacks better than I can alone.

Photo by Alex Chambers on Unsplash

The end product will need to be more accurate if this will provide Dungeon Masters the comfort and consistency of a straightforward equation.

However, we need a front end if this is going to work. I can’t have people input random 1s and 0s like they understand what is going on behind the scenes. I needed something usable, interactive, and quick. I needed Dash by Plotly. Dash is an amazing tool that I highly recommend others use.

I spun up a quick dashboard and before you know it, BOOM, I was testing on my local computer.
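
The "1s and 0s" are the one-hot columns for the categorical features, and hiding them is most of what the front end does. Here is a minimal sketch of that translation, using the environment names from the feature list above. The helper names and the row layout (Challenge Rating first, then the environment columns) are assumptions for illustration, not the app's actual input format.

```python
ENVIRONMENTS = ["Arctic", "Coastal", "Desert", "Forest", "Grassland", "Hill",
                "Mountain", "NA", "Swamp", "Underdark", "Underwater", "Urban"]


def one_hot(choice, categories):
    """Return a list of 0s with a single 1 at the position of the chosen category."""
    if choice not in categories:
        raise ValueError(f"Unknown category: {choice!r}")
    return [1 if category == choice else 0 for category in categories]


def build_input_row(challenge_rating, environment):
    """Turn two dropdown-style selections into one numeric row (hypothetical layout)."""
    return [float(challenge_rating)] + one_hot(environment, ENVIRONMENTS)


row = build_input_row(5, "Swamp")
print(row)
```

A DM picks "Swamp" from a dropdown, and the model receives the row of numbers it was trained on.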

Bonus: Deploy a Dash App on AWS!

While local is good, to get this into the hands of others, I needed to deploy it on the cloud. I recommend AWS Lightsail for anyone wanting to quickly push an app to the internet.

Test out the App here.

Conclusion

In this article, we took a look at how to predict a monster stat block using a Keras Sequential model.

  1. We found that the challenge rating equation provided by WotC actually results in different CRs when used to calculate the Monster Manual monsters!
  2. We then looked at all the monster stats and how well they correlated to CR as well as to each other. Turns out Dexterity doesn’t! Pretty much every other stat was correlated though!
  3. Our final exploration was with our categorical data. We saw that environment didn’t have a huge impact on differentiating our stat blocks. Type and size, however, play a huge role.
  4. We then looked at several different regression algorithms and, after optimization, decided to build and train using the Keras API and TensorFlow. We created an 85.6% accurate model and a Dash UI to manipulate it.
  5. Finally, we saw how easy it was to deploy our Dash app using AWS Lightsail, to get it into the hands of DMs for testing.

Thank you all so much for reading.

If you liked what you read, please give me a clap, interact via comments, or check out my GitHub to see more about this analysis.

Have a great day!

The views of the blog are my own and do not reflect that of my employer.