Solving the 15-Puzzle
The 15-Puzzle is a simple puzzle you’ve likely encountered mixed with other worthless knick-knacks. It consists of a 4 x 4 grid with tiles numbered 1 through 15, the last tile omitted (call this the blank tile). The goal of the puzzle is to go from a scrambled position to the solved position, which is to arrange the numbers in ascending order from left to right, top to bottom. You can only move tiles adjacent to the blank tile, and you do this by “swapping” positions with the blank tile. In other words, just move the tiles around the board until they’re in order (see the gif). After a little time to play with it, it’s easy for humans — the real fun is trying to get a computer to solve it.
Every state on the board has an optimal solution between 0 and 80 moves (A. Brüngger et al., 1999). With an average branching factor of 3, this gives ~10³⁸ states to explore. Only counting unique board states, it’s cut down to about 10¹³. This is still far too many for an exhaustive search.
Enter the heuristic function
If we could somehow quantify what a good board state looks like, we wouldn’t have to arduously traverse meaningless paths. This is the idea behind a heuristic function. We define a domain-specific function to tell us which paths are worth exploring, and which aren’t.
In order to find a short path to the solved state, we explore the states that have the lowest estimated number of steps to the solved state (i.e., the states with the lowest heuristic value). We do this under the guiding principle that the solutions will be near states with low heuristic values — though, you can get stuck in local optima. This is why it’s important to use a heuristic function that approximates as closely as possible the actual number of steps. This algorithm is called greedy best-first search, or pure heuristic search. Watch this video to see how it works visually.
For the 15-Puzzle, one of the simplest ways of quantifying how bad a state is is to count the number of tiles that are in the incorrect location. We would travel down the paths that have fewer tiles in the incorrect position in the hopes of finding a state with no tiles in the incorrect solution, i.e., a solution. Another good heuristic function is to sum up the Manhattan distances (absolute horizontal distance + absolute vertical distance) of every tile to its correct location. Both of these heuristic functions provide a lower bound on the number of steps required to find a solution, which means they are admissible heuristics.
Now, let’s look at how well our two heuristics perform when using a pure heuristic search. To evaluate them, I shuffled the board by taking 10 (mostly) random actions so that the optimal solution would be no more than 10 moves. I then use the greedy best-first search algorithm with each heuristic on the same 100 scrambled boards to find a solution for each board.
As you can see, the Manhattan distance heuristic is much better at finding the best solution. If we look at the number of states it explores before arriving at a solution, we can see it is also much more efficient:
The number wrong heuristic had a couple outliers there where it had to search ~17,000 states before it found a solution, and the solutions were about 95 actions long. It also had a few more that were over 1000. The Manhattan distance heuristic is a lot more consistent.
Even so, using the pure heuristic search with the Manhattan distance heuristic still takes a while to compute, and it hardly ever finds the optimal solution when the length of the optimal solution is past ~20. So let’s see what else we can do here.
A* search
A* search is the same as the pure heuristic search, but with one small, intuitive alteration that is surprisingly useful. The difference is that instead of using just the heuristic function, h(s), from some state to estimate the number of steps to a solution, you count the number of steps to that state, denoted by g(s), and then add it to the heuristic of that state. So you use f(s) = g(s) + h(s), instead of just h(s). It’s super simple. I scrambled the puzzle 25 times, the pure heuristic using the Manhattan distance took on average 13 seconds to find a solution (averaged over 100 iterations), while the A* search with the same heuristic took on average 7 seconds on the same boards. That cuts the time in half, and look at these statistics:
Basically the A* search algorithm is way better than the pure heuristic search. It’s more consistent, more efficient, and finds better solutions. In fact, as long as the heuristic function is admissible, A* guarantees an optimal solution. However, I still want to knock a couple orders of magnitude of speed off this thing, because when it tries to find a solution of length greater than 25, it takes longer than I care to take the average of.
What can we do? Well, the A* algorithm is a great way to search through possible solutions. The bottleneck here seems to be the Manhattan distance heuristic, since it explores many states unnecessarily. If we can find a heuristic function to map the board state closer to the optimal number of moves, we can minimize the number of states explored.
Neural heuristic
I don’t really have any ideas myself on how to count more accurately the number of remaining steps in a given state — but this sounds like the perfect thing for a neural network to figure out.
To set up my neural net, I first transformed my board from a 4x4 grid to 16 one hot vectors of length 16, so that each vector signified each of the tiles’ position on the board. I then passed this through a few hidden layers and had it output a single neuron. Simple enough. But how do we train this thing? I could collect a bunch of samples using the Manhattan distance heuristic, and then have the neural network learn the mapping from each board state in the solution to the number of remaining steps in that solution. This would actually work just fine, but as I said earlier, it is quite computationally demanding to find the solution of really scrambled boards using the Manhattan distance heuristic.
To get around this, I scrambled the board just one move from the solution a bunch of times, and had the neural network figure out how to solve that. This was easy, since it’s using the A* search. Then, I scrambled it two moves away from the solution a bunch of times, used the A* search to find the solution, and then retrained the neural network on all the board states it had seen thus far, including from the previous iteration. I kept doing this until the optimal solutions were 35 moves long. I had to train this using Google Cloud, but that’s fine since it only cost me a few bucks.
To test how well it did, I scrambled 100 boards such that their optimal solution would be about 25, but no more than 25. Here are the statistics for the neural heuristic compared to the Manhattan heuristic.
What can be drawn from this set of data? The Manhattan heuristic varies a whole lot more than the neural heuristic. The Manhattan average time was 7, while the median was 0.7 — an entire order of magnitude difference. It’s just getting hung up on some particularly difficult board states. Even then, though, the 50th percentile is still about 6 times greater for the Manhattan heuristic than the neural heuristic, and the neural heuristic doesn’t get stuck on anything nearly as much as the Manhattan heuristic.
Looking at the number of states explored, we can see that the neural heuristic still fares much better. It explores on average over an order of magnitude fewer states, and is much more consistent.
We started out using a pure heuristic search with some hand-crafted heuristics to slowly find some solution to the 15-puzzle. Then we used the A* search with these same heuristics to find optimal solutions in about half the speed. Then, using a neural network, we were able to speed it up on average by another factor of ~40, while making it much more consistent.
But there’s no reason to believe this is all that can be done. For example, using a convolutional neural network to better learn the spatial relationship between tiles could probably increase performance quite a bit. However, this is a project for another time.
Check out the GitHub repository here.