Solving Wordle with Python
Wordle is inescapable these days. Can we use Python to help us win every single time? Let’s find out!
Let’s first load some useful packages.
The first thing we’re going to need is a list of English words. Initially I downloaded a generic one, but it turns out you can find the list that is actually used in the game. This is where I got it. One thing to note is that the game actually uses two word lists: there’s a more complete one for all the words you’re allowed to guess (with many rare words you’ve probably never heard of), and there’s a shorter, curated list containing more common words for the possible solutions.
Now that we have our words, it would be useful to have a function that is able to tell us, given a guess and the solution, which letters are not present, which ones are present but incorrectly placed and which ones are present and correctly placed.
I’m going to use a Wordle game as an example where I start with the word RAISE (because I heard it was a good starting word, but more on this later) and, spoiler, the solution is GORGE.
I chose to represent the results as strings, with a 0 representing a letter that is not present, a 1 representing a letter present but in the wrong spot and a 2 a letter present and in the right spot. Here it is in Python:
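A minimal sketch of such a function might look like the following. The name `get_pattern` and the way repeated letters are handled (a letter is only flagged as misplaced while unmatched copies of it remain in the solution) are assumptions, not necessarily what the notebook does.

```python
from collections import Counter

def get_pattern(guess: str, solution: str) -> str:
    """Return a string of 0/1/2 digits describing how `guess` scores against `solution`."""
    guess, solution = guess.lower(), solution.lower()
    result = ["0"] * len(guess)
    # Solution letters that were not matched exactly stay available for "wrong spot" hits
    remaining = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            remaining[s] += 1
    # Second pass: flag misplaced letters, consuming one available copy each time
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)

print(get_pattern("raise", "gorge"))  # 10002: R is misplaced, E is in the right spot
```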
Now that we have this, we can create a function that filters the words to keep only the ones that match the pattern, giving us the list of words that are still possible solutions.
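One way to sketch this filter: a word is still a possible solution if guessing our word against it would have produced exactly the pattern we observed. The names `filter_words` and `get_pattern`, and the tiny sample word list, are illustrative assumptions rather than the notebook's own code.

```python
from collections import Counter

def get_pattern(guess, solution):
    """0 = absent, 1 = present but misplaced, 2 = correctly placed (assumed helper)."""
    result = ["0"] * len(guess)
    remaining = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            remaining[s] += 1
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)

def filter_words(words, guess, pattern):
    """Keep the words that would have produced `pattern` had they been the solution."""
    return [w for w in words if get_pattern(guess, w) == pattern]

# Hypothetical sample list; the real game list is much longer
sample = ["gorge", "trope", "borne", "cloud", "stare", "raise"]
print(filter_words(sample, "raise", "10002"))  # ['gorge', 'trope', 'borne']
```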
Now a question arises: which of the remaining words is the best one to play? It would be nice if we could assign each of them a score telling us how beneficial playing that word will be. A first idea for a scoring function is this: a word that contains common letters is better to play, because it is more likely to match letters from the solution. So let’s look at the list of letters ranked by frequency of use in English words.
A score for a word can then be, for example, constructed like so:
where rank(letter) is the rank of the letter in the frequency list.
Here it is in Python:
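As a rough illustration of a rank-based score, here is a sketch under two loud assumptions: the letter ranks are derived from the word list itself rather than from a published English frequency table, and each distinct letter contributes `26 - rank(letter)`. That formula is a stand-in chosen for this example, not necessarily the one used above. The function names are hypothetical too.

```python
from collections import Counter

def letter_ranks(words):
    """Rank letters from most common (rank 1) to least common, counting each word once per letter."""
    counts = Counter(letter for word in words for letter in set(word))
    # Ties are broken alphabetically so the ranking is deterministic
    ordered = sorted(counts, key=lambda letter: (-counts[letter], letter))
    return {letter: rank for rank, letter in enumerate(ordered, start=1)}

def frequency_score(word, ranks):
    """ASSUMED formula: each distinct letter adds 26 - rank, so common letters add more."""
    return sum(26 - ranks.get(letter, 26) for letter in set(word))

# Hypothetical sample of remaining candidates
candidates = ["gorge", "trope", "borne"]
ranks = letter_ranks(candidates)
ranked = sorted(candidates, key=lambda w: frequency_score(w, ranks), reverse=True)
print(ranked)
```

Counting only distinct letters means a word that repeats a letter, like GORGE, earns less than a word with five different letters, which matches the intuition that repeated letters test fewer possibilities.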
Now that we have this, we can rank all our possibilities by score and choose the best one.
We will then choose TROPE as the next word. Let’s continue our example game.
And we can once again filter the possible words and choose the best candidate.
The next word we’re going to play is then BORNE. If we go on to the end, we get this game:
That was close, but we won. We may have gotten a bit unlucky with this one. Now we can test our solver over all the possible solution words, to see how we do on average.
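Such a test can be sketched as a loop that plays one full game per solution word and averages the number of guesses. Everything named here (`solve`, `get_pattern`, `distinct_letters`, the sample list) is an assumption for illustration. In particular, `distinct_letters` is only a placeholder for whichever word score is being evaluated.

```python
from collections import Counter
from statistics import mean

def get_pattern(guess, solution):
    """0 = absent, 1 = present but misplaced, 2 = correctly placed (assumed helper)."""
    result = ["0"] * len(guess)
    remaining = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            remaining[s] += 1
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)

def solve(solution, words, score, first_guess):
    """Return how many guesses are needed; `solution` is assumed to be in `words`."""
    candidates = list(words)
    guess = first_guess
    turns = 1
    while guess != solution:
        pattern = get_pattern(guess, solution)
        # Keep only the words that would have produced the same feedback
        candidates = [w for w in candidates if get_pattern(guess, w) == pattern]
        guess = max(candidates, key=score)  # play the highest-scoring remaining word
        turns += 1
    return turns

def distinct_letters(word):
    """Placeholder score: prefer words with more distinct letters."""
    return len(set(word))

# Hypothetical sample list standing in for the real solution list
words = ["gorge", "trope", "borne", "raise", "cloud"]
results = {w: solve(w, words, distinct_letters, "raise") for w in words}
average = mean(results.values())
failures = [w for w, turns in results.items() if turns > 6]
print(results, average, failures)
```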
We find the word in about 3.82 tries on average, which is not bad! However, there are 28 words for which we don’t find the answer within 6 tries.
In a tweet, Tim Urban of WaitButWhy gave this idea for Wordle scoring:
So we are already under par. However, being -5 after 22 days would put Tim at an average of about 3.77 (taking par as 4 guesses, that is 4 - 5/22 ≈ 3.77). Damn, Tim is pretty good! Can we beat him?
We can try to improve the way we score our candidate words. A more direct way to optimize word choice is to take into account all the possible outcomes of playing a candidate word. For every possible resulting pattern, we can compute the number of words that would match it. The best candidate is then the word that leaves the fewest possibilities when averaged over all the possible results.
Here we show the number of words matching each possible resulting pattern after playing the word RAISE as the first word:
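Counting the words behind each pattern can be sketched with a `Counter`. The helper names and the small sample list below are assumptions for illustration. With the real solution list, the same call would give the full breakdown for RAISE.

```python
from collections import Counter

def get_pattern(guess, solution):
    """0 = absent, 1 = present but misplaced, 2 = correctly placed (assumed helper)."""
    result = ["0"] * len(guess)
    remaining = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            remaining[s] += 1
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)

def pattern_counts(guess, candidates):
    """Map each feedback pattern to the number of candidates that would produce it."""
    return Counter(get_pattern(guess, word) for word in candidates)

# Hypothetical sample of possible solutions
sample = ["gorge", "trope", "borne", "cloud", "stare"]
counts = pattern_counts("raise", sample)
for pattern, n in counts.most_common():
    print(pattern, n)
```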
We can then define a new score function in the following way:
This time we want to choose the word that minimizes this score. By the way, this idea is linked to the notion of entropy in information theory.
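One reading of "fewest possibilities on average" is the following sketch: if every candidate is equally likely to be the solution, a pattern matched by n of the N candidates occurs with probability n/N and leaves n words, so the expected number remaining is the sum of n² divided by N. This is an assumed formulation, shown next to the Shannon entropy of the pattern distribution to make the link with information theory concrete. The function names are hypothetical.

```python
from collections import Counter
from math import log2

def get_pattern(guess, solution):
    """0 = absent, 1 = present but misplaced, 2 = correctly placed (assumed helper)."""
    result = ["0"] * len(guess)
    remaining = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            remaining[s] += 1
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)

def expected_remaining(guess, candidates):
    """Average number of candidates left after playing `guess` (lower is better)."""
    counts = Counter(get_pattern(guess, word) for word in candidates)
    return sum(n * n for n in counts.values()) / len(candidates)

def pattern_entropy(guess, candidates):
    """Entropy in bits of the feedback pattern (higher means a more informative guess)."""
    counts = Counter(get_pattern(guess, word) for word in candidates)
    total = len(candidates)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Hypothetical sample of possible solutions
sample = ["gorge", "trope", "borne", "cloud", "stare"]
best = min(sample, key=lambda w: expected_remaining(w, sample))
print(expected_remaining("raise", sample), pattern_entropy("raise", sample), best)
```

A guess that splits the candidates into many small groups scores low on expected remaining words and high on entropy, so the two criteria tend to favor the same kind of word even though they are not identical.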
We can also use this to actually compute the scores for every starting word.
Apparently, ROATE (whatever that means) is the best starting word, but RAISE was indeed pretty good and appears as the second best choice.
We can then create a new solver, simply by swapping the old word scoring function with the new one.
And we can again test this solver on all the possible solutions to see how it does on average. This time we also start with ROATE instead of RAISE.
We get a nice improvement and we’re able to beat Tim Urban’s average! However, we’re still failing to win on a few words.
Can we do even better? Well, so far we have limited our guesses to words that could be the solution. But in some cases, it might be better to play a word which we know cannot be the solution, but which would give us more information. Then, when the number of possibilities gets small (here I chose 3), we go back to only trying words which could be a solution.
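This switch between the two guess pools can be sketched as follows. The function names, the tie-break in favor of words that could still be the solution, and the choice to switch at "3 or fewer" rather than "fewer than 3" are all assumptions. The word lists are made-up samples.

```python
from collections import Counter

def get_pattern(guess, solution):
    """0 = absent, 1 = present but misplaced, 2 = correctly placed (assumed helper)."""
    result = ["0"] * len(guess)
    remaining = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            remaining[s] += 1
    for i, g in enumerate(guess):
        if result[i] == "0" and remaining[g] > 0:
            result[i] = "1"
            remaining[g] -= 1
    return "".join(result)

def expected_remaining(guess, candidates):
    """Average number of candidates left after playing `guess` (lower is better)."""
    counts = Counter(get_pattern(guess, word) for word in candidates)
    return sum(n * n for n in counts.values()) / len(candidates)

def choose_guess(candidates, allowed, threshold=3):
    """Search all allowed guesses while many candidates remain, else only the candidates."""
    pool = candidates if len(candidates) <= threshold else allowed
    candidate_set = set(candidates)
    # On equal scores, prefer a word that could itself be the solution
    return min(pool, key=lambda w: (expected_remaining(w, candidates), w not in candidate_set))

# Hypothetical situation: four look-alike candidates and one extra allowed guess
candidates = ["batch", "catch", "hatch", "match"]
allowed = candidates + ["climb"]
print(choose_guess(candidates, allowed))      # climb: it separates all four candidates
print(choose_guess(candidates[:3], allowed))  # only 3 left, so a candidate is played
```

In this sample, any of the four candidates would leave 2.5 words on average, while CLIMB cannot be the answer but gives a different pattern for each candidate, so the next guess is guaranteed to win.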
Let’s again test this strategy on all possible words.
We get another improvement, and this time we win the game for every possible word, which is quite satisfying! A 3.50 average is the best I was able to get, but I wouldn’t be surprised if better scores were yet possible…
Here’s how our improved solver plays the game from earlier:
As we can see, this time the solver is not afraid to play the word PRONG, which doesn’t contain the E but yields more information.
I also made an interactive version using the ipywidgets library, which allows you to play (cheat) more easily.
And that’s how you can use Python to win at Wordle!
You can get the notebook there.