Solving “Letter Boxed” in Python

Wordle is for babies

Phil McLaughlin
8 min read · Feb 9, 2022

My family has always loved word puzzles. A clan of journalists, librarians, teachers, and editors, it’s no surprise that we ended up crowded around a game of Boggle or Scrabble at every holiday.

My mother and I still play Scrabble every time I visit, and we like to send word games back and forth to each other when we’re apart. We are, frankly, bored to death by Wordle. Though it’s far from new, we recently became enamored with The New York Times’ Letter Boxed, a refreshingly different and fairly challenging puzzle.

A puzzle: sma-ilp-tne-ocz

First, the rules of the puzzle. Letter Boxed presents 12 letters, arranged as 4 sides of a square with 3 letters each. Starting anywhere, the player must form valid words of 3 or more letters, with each letter taken from a different side than the one before it. Once a word is completed, the next word must begin with the last letter of the prior word. The puzzle is solved when all 12 letters have been used at least once.

A solution: Pace — Emotions — Sleaze

We found ourselves speculating on the actual difficulty: are solutions rare, or are they just hard to find? The difficulty bar is usually set at 4- or 5-word solutions. How impressive is it to find a 3-word solution? 2 words? As the lone software engineer of the family, I set to finding some answers as a fun weekend project.

I never intended to turn it into an article, but the results were interesting, and the solution showcases a lesser-known data structure that I have a personal fondness for, so here we are. Without further ado, I’d like to walk you through the design of my solver.

Based on the rules of the puzzle, I came up with a relatively short list of what’s needed to solve the problem:

  • An official dictionary of valid words
  • An efficient method of finding and trying solutions
  • Fast checking of word validity

For the official dictionary, I googled around and found a few .txt files on GitHub purporting to be lists of all valid English words. After some testing, I concluded that Letter Boxed is using something like the Scrabble dictionary, and that this is a good copy of it.*

*It is worth noting that this file only contains words up to 8 letters, as Scrabble requires unusual board states for words longer than 8. A superior dictionary could potentially expose more few-word solutions, even 1-word solutions (!)

As for searching for solutions, this sounds like a classic use case for good ol’ recursive backtracking. Ideally, this concept is at least somewhat familiar. If not, don’t worry, this should be a good example to learn from. The usual concerns apply: in order to solve quickly, we’ll need to reduce the size and dimensionality of the problem space as much as we can.

Now the interesting part: Fast checking of word validity. To simply check “is this string a valid word?” is trivial, since we already have a list of valid words. Just put them in a set or some other O(1) lookup structure and you’re good to go. There are also some fascinating fuzzy solutions like Bloom Filters that can be extremely fast.

But only having a “word or not?” check leaves us in a tricky situation. We could keep joining strings of letters and checking if the result is valid, but even if that check is very fast, how would we know when to stop? Exploring letters blindly gets pretty hairy, pretty quickly. Exhausting just the 8-letter combinations in a 12-letter puzzle would take 12⁸ guesses (~430 million), and we could need to find a word as long as 12 letters. Even if we wrangled that number down a few orders of magnitude, we’d still end up going through all the guesses for “obviously wrong” starts, like ‘qzd-’.

So what we need is a data structure that goes beyond “Is this string of letters a valid word?”, and can further answer “Are there any valid words that start with this string?”

This is what a trie, a.k.a. a prefix tree, is for. A trie allows us to search through a set of valid words in a natural and efficient way. At its core, recursive backtracking thrives in situations where the question “is it a waste of time to explore further from here?” is easily asked and answered, so this is a great fit.

A trie that encodes the words ‘pot’, ‘pass’, ‘past’, ‘par’, and ‘part’.

More formally, a trie is a k-ary search tree where parent nodes recursively represent the prefixes of their children, and nodes have a boolean flag marking them as valid/invalid, with at least all leaf nodes being valid. They can be used for anything with an appropriate notion of ‘prefix’ (numbers, strings, lists, etc.), but we’ll be sticking to strings here.

Alright, enough exposition, time to code.

I always like to get I/O, file loading, etc. done first. Hopefully, everything going on here is clear:

Standards set here: the puzzle input format is ‘abc-def-ghi-jkl’, and the solver works exclusively in lowercase.
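The complete version is in the repo linked at the end; a minimal sketch of this step, with a placeholder words.txt path and simplified helper names, looks something like this:

```python
# Sketch only: the file name and helper names here are illustrative placeholders.

def load_words(path: str = "words.txt") -> list[str]:
    """Read the word list, one word per line, lowercased."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]

def parse_puzzle(puzzle: str) -> list[str]:
    """Split 'abc-def-ghi-jkl' into its four sides, lowercased."""
    sides = puzzle.lower().split("-")
    assert len(sides) == 4 and all(len(s) == 3 for s in sides), "expected 4 sides of 3 letters"
    return sides

words = load_words()
sides = parse_puzzle("sma-ilp-tne-ocz")
```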

Building the trie comes next. First, a simple class for the nodes. The use of a dictionary for the children speeds up the search; in many k-ary trees there is no inherent distinction between children, but here we will know specifically which one we want (by the next letter).

A class that optionally takes an instance of itself as an argument requires some interesting type syntax.
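In sketch form (attribute names simplified here; the full version is in the repo linked at the end), a node might look like this. Note the quoted "TrieNode" annotations: a class can’t refer to itself normally until its definition is complete, so the type hint goes in as a forward-reference string.

```python
from typing import Optional

class TrieNode:
    """One node of the trie: a letter, a validity flag, and a dict of children."""
    def __init__(self, letter: str = "", parent: Optional["TrieNode"] = None):
        self.letter = letter                        # the letter this node adds to the prefix
        self.parent = parent                        # None only for the root
        self.children: dict[str, "TrieNode"] = {}   # next letter -> child node
        self.is_word = False                        # True if the path from the root spells a word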

Now to build the trie from our dictionary of words:
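Sketched roughly, with a small build_trie wrapper around the add_word routine discussed below:

```python
def add_word(root: TrieNode, word: str) -> None:
    """Walk down from the root, creating any missing nodes, and mark the last one valid."""
    node = root
    for letter in word:
        if letter not in node.children:
            node.children[letter] = TrieNode(letter, parent=node)
        node = node.children[letter]
    node.is_word = True

def build_trie(words: list[str]) -> TrieNode:
    """Build the full trie from the dictionary."""
    root = TrieNode()
    for word in words:
        add_word(root, word)
    return root

trie = build_trie(words)
```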

It’s worthwhile to step through adding a few words to get a feel for this. Starting with an empty trie and calling add_word('ton'), a ‘t’ child node is created with root as its parent, an ‘o’ node under that, and an ‘n’ node under that. The ‘n’ node is marked valid.

Later calling add_word('top'), the ‘t’ and ‘o’ nodes are found, and a new, valid ‘p’ is made a child of ‘o’. Further calling add_word('to'): find ‘t’, find ‘o’, set ‘o’ valid. The commonality between adding and searching is clear: adding is just searching with permission to create missing steps. Both take time proportional to the length of the word; since a dictionary of n distinct words needs words roughly log(n) characters long to tell them apart, that works out to about O(log(n)) per word, and building the whole trie is roughly O(n log(n)).
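Running that exact sequence against the sketch above makes it concrete:

```python
root = TrieNode()
add_word(root, "ton")   # creates t -> o -> n, marks 'n' valid
add_word(root, "top")   # reuses t -> o, creates 'p', marks it valid
add_word(root, "to")    # nothing new to create; just marks the existing 'o' valid

print(sorted(root.children["t"].children["o"].children))   # ['n', 'p']
print(root.children["t"].children["o"].is_word)            # True
```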

A brief, more advanced aside: sorting a list is also generally O(n log(n)). See the similarities between a trie and a sorted list? Given that we started with a sorted list, did we actually need to build the trie at all? As with any tree built on an ordering, the answer is no; binary search and some simple rules allow us to “treat the sorted list as the tree”. But that would be a bit opaque for this article, whose intent is to highlight this particular type of tree. Implementing this solver without the explicit tree structure is left as an exercise for the reader. (I’ve always wanted to say that.)

We’ve already built a pretty useful thing. It’s a map of the English language with some handy properties. You could use it to do a basic implementation of spellcheck, for example. It’s also fairly similar to how T9 text prediction worked. Our plan is to use it to quickly find all the words available in this specific puzzle. This will be another significant reduction in the size of the problem space. We’ve already gone from “all strings of letters”, to “all valid English words”. Now we can further reduce it to “valid English words playable in this puzzle”.
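A sketch of that word search, reusing the trie and sides from the earlier sketches (find_words and its inner explore are simplified names):

```python
def find_words(trie: TrieNode, sides: list[str]) -> list[str]:
    """All dictionary words playable in this puzzle: 3+ letters, sides alternating."""
    found = []

    def explore(node: TrieNode, prefix: str, last_side: int) -> None:
        if node.is_word and len(prefix) >= 3:
            found.append(prefix)
        for side_idx, side in enumerate(sides):
            if side_idx == last_side:          # the next letter must come from a different side
                continue
            for letter in side:
                child = node.children.get(letter)
                if child is not None:          # only descend if some word continues this way
                    explore(child, prefix + letter, side_idx)

    for side_idx, side in enumerate(sides):    # try every letter of every side as a start
        for letter in side:
            child = trie.children.get(letter)
            if child is not None:
                explore(child, letter, side_idx)

    return found

playable = find_words(trie, sides)
print(sorted(playable))
```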

This is a standard recursive depth-first search. The outer loop iterates over each starting letter, and the inner recursive function explores all the possible sequences of added letters, leveraging the trie to avoid exploring fruitless directions.

Try running the code we’ve written so far. You should get a pretty enormous printout. We’ve got all the words! Now for the final step, one more search, through this reduced search space. In the same way that words have been built out of sequences of letters, solutions are built out of sequences of words.
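A sketch of that final search follows. The len_threshold argument is the real one mentioned below; the other names are simplified, and starts is there so the search can be split across processes later.

```python
from collections import defaultdict
from typing import Optional

def find_solutions(playable: list[str], sides: list[str], len_threshold: int = 3,
                   starts: Optional[list[str]] = None) -> list[list[str]]:
    """All solutions of at most len_threshold words, where every word adds a new letter."""
    all_letters = set("".join(sides))
    words_by_start = defaultdict(list)          # first letter -> words beginning with it
    for w in playable:
        words_by_start[w[0]].append(w)

    solutions: list[list[str]] = []

    def extend(path: list[str], used: set[str]) -> None:
        if used == all_letters:                 # every letter covered: record it and stop
            solutions.append(path.copy())
            return
        if len(path) == len_threshold:          # don't search deeper than we care about
            return
        for w in words_by_start[path[-1][-1]]:  # next word starts with the last letter
            new_letters = set(w) - used
            if not new_letters:                 # skip "frivolous" words that add no new letters
                continue
            path.append(w)
            extend(path, used | new_letters)
            path.pop()

    for w in (starts if starts is not None else playable):
        extend([w], set(w))

    return solutions

solutions = find_solutions(playable, sides, len_threshold=3)
print(len(solutions), "solutions of up to 3 words")
```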

And there it is! This search is a little more complex than the prior one, but it is fundamentally the same: iterating through all the branches, being smart about not wasting our time, and accumulating answers.

The only decision of note here is that I have chosen not to count “frivolous” solution extensions, i.e. adding a 3rd word to a 2-word solution is not a 3-word solution, nor is inserting a pointless (no new letters) word in the middle.

I was absolutely floored by how many solutions there are. This solver is fast, but it’s made slower by just how many solutions it has to find. I thought this puzzle was hard! For the default puzzle and dictionary I’ve included, there are:

  • 11 2-word solutions
  • 19,234 3-word solutions
  • 9,743,848 4-word solutions

On my CPU (i7-10700K), building the trie takes 0.25 seconds, and finding all the words in the puzzle takes 0.004 seconds. These sets of solutions take 0.06, 5.4, and 377.5 seconds to find, respectively. Fair warning: this seems to be a relatively fast puzzle. Other puzzles may take considerably longer. To help prevent absurd runtimes, I have included a len_threshold argument in the final code.

Clearly, this is a difficult problem to scale with. It’s a fundamentally combinatorial process. The worst-case complexity is O(nᵏ), where n is the number of words in the input dictionary, and k is the longest solution. Based on the rules of the puzzle and our choice not to explore steps that add no new letters, kₘₐₓ=11.

Wall-clock performance could be improved by multiprocessing, using numba to speed up core routines, or porting to a compiled language. As each solution is independent, and data structures do not change during search, it should parallelize very well. But fundamentally, the rough scaling is inescapable.
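As a rough sketch of the multiprocessing route, assuming the find_solutions sketch above with its starts argument, one task per starting word:

```python
from itertools import chain
from multiprocessing import Pool

def solve_start(word: str) -> list[list[str]]:
    """Find only the solutions whose first word is `word`."""
    return find_solutions(playable, sides, len_threshold=3, starts=[word])

if __name__ == "__main__":
    with Pool() as pool:                          # workers only read shared structures, no locking needed
        chunks = pool.map(solve_start, playable)
    all_solutions = list(chain.from_iterable(chunks))
    print(len(all_solutions), "solutions found")
```

In a real run, the module-level setup (loading the words, building the trie) would also want to sit behind the __main__ guard so that spawn-based platforms don’t repeat it per worker, but the shape of the split is the point here.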

There is one more fun trick to speed things up a bit: our search currently spends time exploring functionally identical branches. Say we need to start with f, and we can make either fares or fears.

Those both contain the same set of unique letters, and leave us in the same spot (next word starts with s), so from the perspective of the puzzle, they are the same. We can gain an advantage by understanding this redundancy, finding solutions in the form of a path of “letter edges”, and then expanding those paths into all the valid words. The implementation looks like this:
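A sketch of that version: group the playable words into (first letter, letter set, last letter) classes, search over the classes, and only expand back into concrete words at the end. Names are again simplified.

```python
from collections import defaultdict
from itertools import product

def find_solutions_grouped(playable: list[str], sides: list[str],
                           len_threshold: int = 3) -> list[list[str]]:
    """Same search as before, but over equivalence classes of interchangeable words."""
    all_letters = frozenset("".join(sides))

    # words with the same start letter, same letter set, and same end letter are interchangeable
    classes: dict[tuple, list[str]] = defaultdict(list)
    for w in playable:
        classes[(w[0], frozenset(w), w[-1])].append(w)

    keys_by_start = defaultdict(list)           # first letter -> class keys starting with it
    for key in classes:
        keys_by_start[key[0]].append(key)

    class_paths: list[list[tuple]] = []

    def extend(path: list[tuple], used: frozenset) -> None:
        if used == all_letters:
            class_paths.append(path.copy())
            return
        if len(path) == len_threshold:
            return
        for key in keys_by_start[path[-1][2]]:  # next class starts with the previous class's end letter
            new = key[1] - used
            if not new:                         # still skip classes that add no new letters
                continue
            path.append(key)
            extend(path, used | new)
            path.pop()

    for key in classes:
        extend([key], key[1])

    # expand each class path into every concrete word sequence it represents
    solutions = []
    for path in class_paths:
        for combo in product(*(classes[key] for key in path)):
            solutions.append(list(combo))
    return solutions
```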

In addition to reducing the number of iterations, this code spends a lot less time calling set().

These improvements yield an almost exactly 2x speedup but do not change the fundamental complexity.

I think this puzzle provides a great illustration of how subjective puzzle difficulty can have very little to do with how many total solutions there are. I’ve spent hours trying to find a single 3-word solution to many of these. It certainly doesn’t feel like there are tens of thousands of them.

I hope that this has been fun. Please find the complete code available here on GitHub.


Phil McLaughlin

I’m a software engineer, working in real-time computer vision. I have a non-traditional education and a passion for making things faster than they need to be.