410 trillion permutations: the art of knowing what NOT to test

Published in Outfit7 · 10 min read · Feb 13, 2023

TL;DR: When there are too many scenarios to test, the question stops being “how can we test every single one?” and becomes “what can we skip testing without causing an issue?”. With every addition to Mythic Legends, the number of possible formations and battles grew exponentially. Understanding what needed to be tested and what didn’t was a key component of testing gameplay.

My name is Marko Drvarič and I’m a Quality Engineer at Outfit7. Since joining the company, I’ve spent most of my time working on new games, outside the umbrella of the Talking Tom brand. Just over two years ago, I joined the Mythic Legends team.

Being a competitive gamer in my free time, I was very excited to work on a game that falls, more or less, into that category. I knew that testing a game like Mythic Legends would be a challenge, but I didn’t quite realize the extent to which some things would be “untestable.”

Mythic… what?

Mythic Legends is an asynchronous game, based on the auto-chess genre. The game presents you with a range of champions and heroes that you can collect, upgrade and deploy to fight other players. You compete in tournaments, where you play with your collected army of champions and heroes. During a tournament, you can also get temporary boosts for the duration of the tournament, called artifacts.

The asynchronous part means you’re not actually playing against opponents in real-time, but rather playing against what we call a “snapshot” of a player’s game that we store. When you complete a tournament (between five and 20 battles), we store information about your moves and then use those to create the snapshot for other players to play against during their tournaments.

During the battle, you have no input. The champions battle on their own, meaning that the game is entirely strategic and not at all mechanics-based. From a gaming perspective, this means that the decisions of which champions and heroes you use are key to your success in the game.

From a development and testing perspective, this meant it was up to our team to make sure your decisions would have a logical and mostly predictable outcome. This might seem obvious, but was actually one of the main pillars of the game and its testing. More on that later. For now, let’s take a look at how it all began.

*Setting up a formation and sending your champions into battle.*

Initial testing and realization

Since I was involved with the project from a very early stage, I was tasked with testing the very first champions and heroes. I went into it with a pretty simple objective: I wanted the champions to look and behave as intended a) on their own and b) in their interactions with other champions.

Champions and heroes are units that you use in the battles (think of “champions” as ground soldiers and “heroes” as generals). In any given battle, you use just one hero, but multiple champions. Each champion and hero is unique in their design and the role they play strategically in battle.

What I was actually testing was essentially a big bag of all the stats linked to that particular champion. These stats were:

Health: a champion’s total hit points
Attack: how much damage the champion deals with automatic attacks
Attack speed: the rate at which the automatic attacks are used
Critical strike chance: the percentage of attacks that deal double damage
Defense: how much damage from other champions’ attacks is mitigated
Attack range: the range at which the champion can attack other champions
Ability: every champion’s unique skill, triggered upon reaching full mana
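
As a rough illustration of that "bag of stats", here is a minimal Python sketch. The class name, field names and units are assumptions made for this example, not the game's actual data model. The one calculation it includes follows directly from the definitions above: if a critical strike deals double damage, the average automatic attack deals `attack * (1 + crit_chance)` before the target's defense is applied. The defense formula itself is not given in this article, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChampionStats:
    """Hypothetical container for the stats listed above (names are assumptions)."""
    health: float        # total hit points
    attack: float        # damage dealt by one automatic attack
    attack_speed: float  # automatic attacks per second (assumed unit)
    crit_chance: float   # fraction of attacks that deal double damage, 0.0 to 1.0
    defense: float       # mitigation of incoming damage (formula not given here)
    attack_range: int    # range in board spots (assumed unit)
    ability: str         # unique skill triggered on reaching full mana


def expected_raw_dps(stats: ChampionStats) -> float:
    """Average auto-attack damage per second, before the target's defense."""
    average_hit = stats.attack * (1.0 + stats.crit_chance)
    return average_hit * stats.attack_speed


# Made-up numbers, purely for illustration.
example = ChampionStats(
    health=1000, attack=50, attack_speed=2.0, crit_chance=0.25,
    defense=10, attack_range=1, ability="Example Ability",
)
print(expected_raw_dps(example))
```

A hand calculation like this gives a reference value to compare against what the battle log reports.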

*One of my favorite champions with all of his stats.*

To conduct a test, I would place a champion on the board, set up an opposing champion and see what happened. Nearly all of the early champions worked as intended when they were alone on the board. Their health was correctly decreased based on the opponent’s champion, their own health and defense. They would attack at their intended range. I only found a couple of bugs here and there, mostly related to abilities, some requiring extra definitions from the game designers, but nothing too crazy. But, as you progress through the game, you unlock new champions and new heroes, leading to new interactions between them. And this is where it got more complicated.

The game’s development mirrored progression through the game itself. It started simply enough, with eight champions and two heroes. Initially, I wanted to test every maximum-size formation on my side of the board: eight possible champions, each of which can go on any of the 20 spots on the board, plus two heroes that don’t go on the board but are selected for their abilities.

A short trip down memory lane back to the math classroom tells us that’s 2 * 20 * 8 * 19 * 7 * 18 * 6 * 17 * 5 * 16 * 4 * 15 * 3 * 14 * 2 * 13 possible permutations. In case you aren’t a human calculator, the answer is 409,579,462,656,000.

Almost 410 trillion possible variations on my side of the battle. That’s a lot of possibilities! And that’s IF we ignore that you could have anywhere from one to eight champions on the board. To account for all possible battles, you’ll need to square that number. And, obviously, this grows exponentially with the number of champions and heroes.
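
For anyone who would rather not multiply sixteen numbers by hand, the arithmetic can be checked with a few lines of Python. This sketch multiplies the terms exactly as they are written in the product for one side of the board (two heroes, eight champions, 20 spots), confirms that the same product can be written as 2 × P(20, 8) × 8!, and then squares the result for both sides of the battle. The constant and variable names are made up for this example.

```python
import math

HEROES = 2     # heroes to choose from (one is used per battle)
CHAMPIONS = 8  # champions in a maximum-size formation
SPOTS = 20     # spots on one side of the board

# Multiply the terms as written in the text:
# 2 * (20 * 8) * (19 * 7) * ... * (13 * 1)
one_side = HEROES
for i in range(CHAMPIONS):
    one_side *= (SPOTS - i) * (CHAMPIONS - i)

# The same product in closed form: 2 * P(20, 8) * 8!
closed_form = HEROES * math.perm(SPOTS, CHAMPIONS) * math.factorial(CHAMPIONS)

# Every formation on one side against every formation on the other.
both_sides = one_side ** 2

print(f"one side:   {one_side:,}")
print(f"both sides: {both_sides:.3e}")
```

Note that `math.perm` requires Python 3.8 or newer.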

Just eight champions and two heroes gave us 410 trillion permutations. The current state of the game? 54 champions and nine heroes. And a bigger board. I know. We’re all thinking it: that’s more possibilities than there are atoms in the universe.

I quickly realized that testing was never going to be about testing every possible formation and scenario. Heck, it wasn’t even going to be about testing most scenarios. Nope. It was going to be about understanding the game itself and having the imagination to foresee what could prove to be problematic, testing for those things and nothing else. But before I was ready to embark on this quest for everything that I wouldn’t test, I had to make sure that this type of testing was possible and that it made sense.

What makes testing possible and worthwhile (AKA a developer is — sometimes — a QA’s best friend)

Even with my better understanding of how to approach testing, it wouldn’t really have been possible had it not been for a couple of good decisions on the part of the developers, most of which were made before I even started to test.

The first was a relatively basic concept that I mentioned earlier: determinism. Simply put: If I have the same champions, in the same positions, with the same random seed, the same artifacts and the same opponent formations, I should always get the same results.

Early on in development, a good amount of time and effort was spent on ensuring that this was true for our game, and it was what made testing possible. With the game being totally deterministic, any bug that popped up, whether visual or logical, was always reproducible (and as a QA there’s nothing sweeter than knowing that something is reproducible). This allowed me to focus on the ideas I wanted to test, without worrying about something being a one-off random bug.
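
To make the idea concrete, here is a toy Python sketch of what "deterministic" means from a tester's point of view. It is not the game's battle code: `simulate_battle`, its rules and its numbers are invented for illustration. The key point is that all randomness comes from a generator created from the seed, so the same inputs always produce the same event log.

```python
import random


def simulate_battle(seed: int, left: list[str], right: list[str]) -> list[str]:
    """Toy battle: the returned event log depends only on the arguments.

    Unit names must be unique across both sides in this simplified model.
    """
    rng = random.Random(seed)  # private generator, never the shared global one
    hp = {name: 100 for name in left + right}
    log = []
    # Keep fighting while both sides still have a living unit.
    while any(hp[n] > 0 for n in left) and any(hp[n] > 0 for n in right):
        for attackers, defenders in ((left, right), (right, left)):
            alive_attackers = [n for n in attackers if hp[n] > 0]
            alive_defenders = [n for n in defenders if hp[n] > 0]
            if not alive_attackers or not alive_defenders:
                break
            attacker = rng.choice(alive_attackers)
            target = rng.choice(alive_defenders)
            damage = rng.randint(10, 30)
            hp[target] -= damage
            log.append(f"{attacker} hits {target} for {damage}")
    return log


# Same seed and same formations: the "replay" must match the first run exactly.
first = simulate_battle(42, ["knight", "archer"], ["priest", "mage"])
replay = simulate_battle(42, ["knight", "archer"], ["priest", "mage"])
print(first == replay)
```

If a check like this ever fails, some hidden source of randomness or state has leaked into the simulation, and bugs stop being reliably reproducible.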

The second thing was an extensive, interactive, built-to-my-liking debug (DBG) menu. This was an iterative process, as I only realized what I needed through testing. But the willingness of our developers to implement requests made the testing so much more efficient, even with a seemingly impossible number of variables involved. I ended up being able to see every statistic of the battle in real-time, logs of every attack, logs of every bit of VFX being played, and had the ability to slow down the battle to a frame every couple of seconds, as well as the ability to save replays within the game and replay the exact same scenarios.

If it hadn’t been for the effort of our developers to make sure both of these things happened, hundreds upon hundreds of hours of QA effort would never have been enough. Hours of developer effort were equivalent to weeks of QA effort.

The actual testing

Now, equipped with the knowledge of what to expect from testing and all the tools I needed to actually do it, it was time to dive in and start to test new champions and heroes, one at a time.

With all this talk of what not to test, you might be wondering “well, what did you actually test, then?” The answer is not quite as easy to explain, but remains, to this day, one of the most fun approaches to testing I’ve had the pleasure of experiencing.

I would say that most of the testing wasn’t done on a device. Or rather, it was done on a device, but the actual clicking and tapping was just a short check for an idea that took a much longer time to develop in my mind. I would often sit around and just think about what could possibly go wrong for much longer than it took to actually test it. So, while I still did a lot of button smashing and using devices like a toddler, the whole thing felt like a puzzle I was trying to solve, with the solution being a bug, a crash, a weird VFX, an unexpected interaction or something similar.

An example of this would be the first time we added a champion of the priest class. Their signature move is always somehow connected to healing. In this case, it was an Undead Priest that healed based on the damage he was dealing to others. Good idea. Nice concept. Except nobody in the definition phase thought about exactly who a priest can and cannot heal. So what could happen if you had an Undead Priest on one side and a knight on the other? The knight would deal very little damage while also taking very little damage, while on the other side of the board, the priest would have no targets to heal except for himself. The result? An endless battle. And if the knight on the other side was also Undead, both champions just kept endlessly healing themselves. This is a perfect example of something that would have come up in extensive testing eventually, but thinking about what could go wrong highlighted the issue far more quickly.
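
The shape of that bug is easy to reproduce in a toy model. The Python sketch below is purely illustrative: the `Unit` fields, the damage rule (`attack - defense`, minimum 1), the `lifesteal` parameter and all the numbers are assumptions, not the game's real formulas. It covers the variant where both champions heal from the damage they deal, and it uses a tick limit as a simple way to flag a battle that never ends.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    max_health: int
    attack: int
    defense: int
    lifesteal: float = 0.0  # hypothetical: fraction of damage dealt healed back
    health: int = 0

    def __post_init__(self) -> None:
        self.health = self.max_health  # every unit starts at full health


def fight(a: Unit, b: Unit, max_ticks: int = 10_000) -> str:
    """Return the winner's name, or "stalemate" if nobody dies in max_ticks."""
    for _ in range(max_ticks):
        for attacker, target in ((a, b), (b, a)):
            dealt = max(1, attacker.attack - target.defense)
            target.health -= dealt
            if target.health <= 0:
                return attacker.name
            # Heal the attacker from the damage dealt, capped at full health.
            healed = int(dealt * attacker.lifesteal)
            attacker.health = min(attacker.max_health, attacker.health + healed)
    return "stalemate"


def make_priest() -> Unit:
    return Unit("priest", max_health=500, attack=60, defense=5, lifesteal=1.0)


def make_knight(undead: bool) -> Unit:
    return Unit("knight", max_health=800, attack=15, defense=50,
                lifesteal=1.0 if undead else 0.0)


print(fight(make_priest(), make_knight(undead=True)))   # both heal every tick
print(fight(make_priest(), make_knight(undead=False)))  # only the priest heals
```

In this model, each champion heals back exactly what it loses per tick when both have lifesteal, so the first fight hits the tick limit. Reasoning about the numbers on paper reveals the same loop without running a single battle, which is the point of the story above.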

There are countless stories and examples just like this one for nearly every single champion that was added. Of course, some of them were a lot more technical and boring, with me calculating by hand how much damage should be done in one frame and then comparing it to the actual damage being done or just checking if the intended VFX and SFX are playing, but it was a diverse and fun time to be a QA.

Something that might be counterintuitive is the time spent testing new additions to the game. It would seem that, with the number of possibilities exponentially growing, the time needed to test new champions would also be exponentially higher. But not only was it not exponentially higher, it wasn’t even higher at all. Quite the opposite. I went from using nearly a week for testing the initial champions, to just one or two days for the last ones. Turns out that once you get to trillions of possibilities, it doesn’t really matter if it’s 400 or 15,000 trillion.

Cool story, but did it work?

Obviously, this all sounds fun and it’s a cool concept, but surely it’s all useless unless there were actually fruitful results. So, more than a year after the final champions I tested were added to the game, how did my testing hold up?

I would say reasonably well. :)

I wish I could tell you we didn’t discover any bugs after going live, or that users never reported unintended interactions, or that this part of the gameplay never crashed due to an interaction, but I can’t. It happened, as it always does, but I would say that considering how many things could potentially go wrong, not many did.

Over time, we would get reports from users that had me thinking everything from, “Wow. How did I miss that?” to “Well, I’m glad you found that because never in my wildest dreams would I try that!” In a couple of instances, we had issues that required bigger changes, but none of them were game-breaking, widespread bugs that would force us to resubmit the game immediately or anything like that. For nearly all issues, our response was to thank the user for the report and fix it in the next update, whenever that came around.

Trying to transfer my knowledge to coworkers who took over testing was a challenge. As you can imagine, someone telling you to “just think about it” or to “kinda get a feeling for what might go wrong” isn’t very helpful. That said, it worked out just fine and we now have countless changes and new additions to the gameplay months after I passed the testing (and NOT testing) torch to someone else.

Final thoughts

I often get asked how I can play the same game every day at work and still find new things to test, or how I don’t get bored of the game. It’s a valid question, but whenever I’m presented with it, I always think of the testing I did for Mythic Legends. It didn’t seem that I was playing the game over and over again. Instead, it was like being presented with a fresh puzzle every day that kept me going and made the whole thing interesting. Even in the face of trillions upon trillions of possibilities, it was fun and my strategy proved useful. And, after all, it wasn’t the testing I did, but the testing I didn’t do that really mattered.