Tic Tac Toe and Generative Testing

I’ve been working on implementing an unbeatable tic tac toe game in Clojure for the past week or so, and generally things have been going well. Unlike the first time I did this in Ruby, I’ve made a concerted effort to TDD my way through the whole process — and mostly succeeded.

Although I understand and appreciate the benefits of practicing TDD, it’s not as though every test is easy to write (at least not yet, though I doubt it will ever be so). Simple ideas in tic tac toe like “create a new board”, “mark the board at a given space”, and “is the board full yet?” are correspondingly simple to test. But how do you test this? — “Given a tic tac toe board, what is the optimal move that the next player should take to maximize their chances of winning?”

The unbeatable part of an unbeatable tic tac toe game boils down to implementing a solution to that question. And even once you’ve gotten that solution, how are you sure that it’s always correct?

The problem: how do I automate testing that my tic tac toe game is truly unbeatable?

For my Ruby version, I took the easy way out and manually tested the game dozens of times, ensuring that I could never beat my computer player. When TDDing my Clojure version, I started off by considering some simple cases for my get-move function, like:

  • Given a board with only one open space, it returns that space
  • Given a board with an imminent win, it returns the space that results in a win
  • Given a board with an imminent win for the opponent, it returns the space that blocks the opponent’s win

Beyond those cases that only require thinking a move or two ahead, what alternative is there to manual testing?

Generative Testing

Generative testing is one strategy that can be effective for tackling testing scenarios with a huge range of potential inputs (like all the possible game states for a tic tac toe board).

In a nutshell, generative testing allows you to separate how to generate your input domain from the process of validating properties of the output. In the context of tic tac toe, we would have something like this:

  • Input domain: any valid board state (from empty board up to but not including a board in an end state)
  • Output: a board in an end state (someone won or the board is full)
  • Property to test: The winner for a board is never the human player

This may or may not seem straightforward, but either way it glosses over one important point. There is a difference between this tic tac toe scenario and pretty much every example of generative testing that you can easily find by googling, which is:

What we want to test is not what happens when “get-move” is called a single time, but what happens when the computer plays an entire game against a human player and calls “get-move” for each of its turns until the game reaches a terminal state.

In addition to the steps listed above, we need to

  • simulate the human player’s moves
  • continue looping until the game is finished

My first “generative testing inspired” solution

My approach to solving this problem was to write a test that played through every possible scenario for a tic tac toe game (starting with an empty board) and verify that the human player never won.

The problem can be further broken down into these constituent parts

  • simulate the computer player’s moves
  • simulate the human player’s moves
  • play through all possible games

Simulating the computer player’s moves is probably the easiest problem to tackle, since (theoretically) we can already figure out what move the computer can make given a specific board. (For each board, the computer player will make only one move and therefore generate 1 possible game state).

Simulating the human player’s moves requires a little bit more thinking, but not much. In order to ensure that we are testing all possible game scenarios, we need to create a function that given a board will return a collection of boards, on each of which the human player has played on a different space. (For a board with n empty spaces, the simulated human player will create n possible game states).

Simulating playing all possible games starts with an empty board then alternates between simulating the two players’ moves, passing along an ever-growing collection of potential game states. We keep looping through this process until all of the boards have reached a terminal state, at which point we check that none of the boards has the human player as a winner.

Here’s an example of what I created:

It’s probably easier to look here

For a 3x3 tic tac to board, this takes about 42 seconds to run on my Macbook Air. Actually it’s ~42 seconds for this and the 59 other specs that are currently part of the suite, but the other 59 only take about 1/2oth of a second to run.

Although 42 seconds is probably too long to have a tight feedback loop for TDD, it’s honestly not as terrible as I might have imagined. I mention this because one generally sees that generative tests are run for a specific number of iterations, since there is a direct relationship between the time it takes to run the specs and coverage of your input domain. In other words, testing the whole input domain may be impractical or impossible because the tests would take too long, so we compromise by testing a random(ish) subset of the input domain.

This tic tac toe testing scenario feels different to me for two reasons:

  1. testing the whole input domain for a 3x3 board is feasible, and
  2. the cost of (time spent) running a single input board through the test is not necessarily negligible

If we consider our input domain, which again was any valid board state from the beginning of the game (an empty board) up to but not including an end state (a board that’s one step before a cat’s game or a win for one of the players), we can think through this second point.

If we start with a board that’s nearly finished, one iteration of the test will take only a fraction of a second (fewer potential board scenarios will need to be generated and played out to completion). If we start with the “worst case scenario” from a time perspective, the empty board, how long will it take? 42 seconds, like I mentioned earlier. So there’s a wide range of times for testing our input domain.

Another interesting consideration — since the test as is plays through all possible game scenarios, that also means that all of the input domain can be derived from an empty board. In fact, that’s exactly what’s happening inside of the test right now. Knowing that, would there really be much value in building out a generator that produces possible game states and running the test (or a modified test) against it?

Right now, I don’t think so. That’s probably a sign that I’m on the right track with this OR that there’s something very wrong with the code I’ve written — I suppose time will tell which it is.