An Informed First Wordle

Mark Scherschel II
6 min read · Jan 9, 2022

Despite my enthusiasm for Wordle and the simple pleasure of sitting next to my partner in the mornings as we alternated cries of joy and despair with each row attempted, the engineer in me couldn’t stop thinking about how to optimize my strategy. Assuming that many others who had been captured by this phenomenon would have had the same idea, I investigated the top 10 or so search results for ‘best wordle starting word’. I was disappointed, to say the least. EVERY ONE of the articles I saw took a greedy approach to finding their first word (they tried to eliminate as many words as possible in the first attempt). They suggest words like “SOARE”, “ROATE”, and “REACT”, but they focus on basic information like how frequently each letter appears (hence every one of the words I listed having ‘A’, ‘E’, and ‘R’). Fortunately, this is exactly the kind of problem that I thought I would be able to apply my halfway-through-a-Master’s-in-Analytics-from-Georgia-Tech skills towards.

TL;DR

  • Eliminating words has been the focus of other solutions
  • Considering the additional information gained from looking at the letters and their position could be useful
  • The closer a word matches the target, both in letters used and their position, the easier it is to get to the target
  • I wrote a script to calculate this ‘match distance’ for each pair of words
  • The word with the smallest total match distance to every other word was selected as the best

Let’s get into it!

If you would like to follow along with the code (it is in a Jupyter notebook and could be complete garbage, but it shows what I did), it can be found here: https://github.com/schersch/wordle

The word list I used came from: https://www.wordgamedictionary.com/word-lists/5-letter-words/

First, I’d like to point out that the greedy approach isn’t necessarily a bad thing. Getting rid of as many possible wrong answers in the first line is a really nice way to start things off. It helps you focus your attention on a smaller subset of possible solutions. But that solution space is still huge. THERE ARE STILL MANY MANY POSSIBILITIES! It’s extraordinarily unlikely that going from 7k, 10k, or 12k+ words down to even 500 is going to make you suddenly think of just the right one.

So how are people solving these at all? Doesn’t the success of the people who have used those starting words suggest that maybe they’ve found the right ones? …Possibly… It could be that all my frustration is completely unfounded. It certainly wouldn’t be the first time. However, I suspect that the success of those words isn’t entirely rooted in the elimination of words that don’t work. I suspect that the success is a combination of that and the additional information that each letter and its position provide.

After all, Wordle doesn’t just stop at telling you if your word is correct or not; it also tells you whether each individual letter is correct. Even better, it tells you if you’ve chosen a letter that is in the word but placed in the wrong position. That’s a lot of information left unused by the ‘most used letters’, ‘most frequent letters per position’, and ‘most words eliminated’ greedy starting words. So how do we take advantage of that extra information?
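To make the green/yellow/gray information concrete, here is a minimal sketch of how Wordle's per-letter feedback can be computed for a guess against a target. The function name `feedback` and the `G`/`Y`/`-` encoding are my own choices for illustration, not taken from the notebook. It assumes the usual handling of repeated letters: greens are assigned first, then yellows are handed out only while unmatched copies of that letter remain in the target.

```python
from collections import Counter


def feedback(guess: str, target: str) -> str:
    """Return one character per letter: G (green), Y (yellow), or - (gray)."""
    result = ["-"] * len(guess)

    # Count the target letters that were NOT matched exactly; only these
    # can still earn a yellow.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)

    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            result[i] = "G"
        elif remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1  # use up one unmatched copy of this letter

    return "".join(result)


print(feedback("soare", "tares"))  # Y-YYY
print(feedback("eerie", "tares"))  # Y-G--
```

The second call shows why the counting matters: the target has only one E, so only the first E in the guess is marked yellow.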

Greedy: eliminate as many wrong answers as possible. If your starting word uses the letter S and the target word doesn’t have one, you can quickly eliminate almost half of the possibilities.

First, let’s think about eliminating words. (Didn’t you just say that’s not the right approach!? Sure, but it could be a good component of the right approach!) If the starting word you choose has a letter that is only in half the words, then either way you will be able to eliminate half the possible choices.
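As a rough illustration of that halving idea, the sketch below splits a word list by whether each word contains a given letter and reports the larger side, which is the worst case you could be left with. `TOY_WORDS` is a tiny made-up sample, not the word list used in the article, and the function names are hypothetical.

```python
# Tiny illustrative sample; the real list has thousands of words.
TOY_WORDS = ["tares", "cares", "bares", "soare", "react"]


def split_by_letter(letter, words):
    """Split words into those containing the letter and those without it."""
    with_letter = [w for w in words if letter in w]
    without_letter = [w for w in words if letter not in w]
    return with_letter, without_letter


def worst_case_remaining(letter, words):
    """Size of the larger group: what is left if the feedback is unlucky."""
    with_letter, without_letter = split_by_letter(letter, words)
    return max(len(with_letter), len(without_letter))


for letter in "sto":
    print(letter, worst_case_remaining(letter, TOY_WORDS))
```

In this sample, T appears in two of five words, so it comes closest to an even split and leaves at most three candidates, while S and O each leave up to four.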

Letter-based: try to get as many letters in the right position as possible. S is much more likely to be used at the end of a word than any other letter. If your starting word has an S at the end and it is correct, then only a third of the possible words remain; otherwise two-thirds remain (which is still pretty good).

Second, think about letters that are in the right spot. If you get that beautiful green box on the first word, you are well on your way to a quick solution. So, choosing the most frequent letter in each position (or as close as possible while still creating a word) can tell you a lot.
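One way to look at per-position frequency is to build a counter for each of the five slots and then score a word by how common each of its letters is in its own slot. This is only a sketch on a made-up sample (`TOY_WORDS`), and `position_counts` and `position_score` are hypothetical names rather than anything from the notebook.

```python
from collections import Counter

# Tiny illustrative sample; the real list has thousands of words.
TOY_WORDS = ["tares", "cares", "bares", "soare", "react"]


def position_counts(words):
    """Return one Counter per position, counting the letters seen there."""
    length = len(words[0])
    return [Counter(w[i] for w in words) for i in range(length)]


def position_score(word, counts):
    """Sum, over positions, how many words share this letter in this slot."""
    return sum(counts[i][ch] for i, ch in enumerate(word))


counts = position_counts(TOY_WORDS)
print(counts[4].most_common(1))         # [('s', 3)]
print(position_score("tares", counts))  # 13
print(position_score("soare", counts))  # 6
```

Note that each word counts itself here, so the scores are only meaningful relative to one another.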

Position-based: try to get as many of the correct letters in your starting word as you can even if they aren’t in the right position. S, E, A, O, and R are the most used letters in my 5-letter English word list.

Next, those letters that are in the word but incorrectly placed are worth considering. Sure, it’s no green box, but that yellow box is still really nice. Not only do you know that the letter is in the word, but you also know that it can only go in a max of 4 other places (or fewer if you also got some green boxes). If you take a greedy approach to position-based information then SOARE is perfect.
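A simple way to find the most used letters, and to score a word by the letters it covers regardless of position, might look like the following. Again, `TOY_WORDS` is a made-up sample and the function names are hypothetical; counting each distinct letter of the guess only once is my assumption, so that a repeated letter earns no extra credit.

```python
from collections import Counter

# Tiny illustrative sample; the real list has thousands of words.
TOY_WORDS = ["tares", "cares", "bares", "soare", "react"]


def letter_counts(words):
    """Count every letter occurrence across the whole word list."""
    return Counter("".join(words))


def coverage_score(word, counts):
    """Sum the list-wide counts of the distinct letters in the word."""
    return sum(counts[ch] for ch in set(word))


counts = letter_counts(TOY_WORDS)
print(counts["a"], counts["s"], counts["o"])  # 5 4 1
print(coverage_score("tares", counts))        # 21
print(coverage_score("soare", counts))        # 20
```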

Frequency-based: get rid of letters that aren’t in the target. This is very useful once you’ve already used your starting word: finding letters that get grayed out will help you focus on the unused letters.

Finally, we can ignore the letters that are completely wrong. Wait, that’s not right. The gray box isn’t ideal, but when used wisely it can send you down the right track.
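Using gray letters to narrow the field can be sketched as a simple filter. `drop_words_with` is a hypothetical helper on a made-up sample, and it deliberately ignores one subtlety: in the real game a repeated letter can show gray even though one copy of it is in the target, so this version is only safe for letters that came back gray everywhere in the guess.

```python
# Tiny illustrative sample; the real list has thousands of words.
TOY_WORDS = ["tares", "cares", "bares", "soare", "react"]


def drop_words_with(gray_letters, words):
    """Keep only the words that contain none of the grayed-out letters."""
    gray = set(gray_letters)
    return [w for w in words if not (set(w) & gray)]


print(drop_words_with("tc", TOY_WORDS))  # ['bares', 'soare']
```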

Information-based: Combine all of the above into a more complete strategy.

So how do we combine all these pieces of information into a single metric for how good the starting word may be? I wanted to find a word that was as close as possible to all the others: a word where the letters used were as close as possible to the right letters in the target word, while also having each letter be either in the right position or with as few alternatives as possible.

The metric I chose was a measure of ‘distance’. If a letter is in the same position in both words, the distance for that letter is zero. If a word has 3 letters in the correct place and two letters that are swapped with each other, the distance to the solution is also zero (because you know all the letters and you know that all you have to do is swap the two yellows). If the letter doesn’t appear in the word at all, the distance is very large.

To get the total distance between each pair of words, I added up the distances created by each letter pair. I then added up the distances from each word to every other word. Using that compiled distance measurement, I was able to find a single word that had the shortest combined distance to all other words. A word that not only took advantage of eliminating a lot of words, but also considered the position of each letter.
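The sketch below is one possible reading of that description, not the code from the notebook. The specific numbers are assumptions of mine: a green costs 0, a yellow costs the number of wrong places it could still be moved to (open positions minus two, so two swapped letters cost 0), and an absent letter costs a flat `ABSENT_PENALTY` of 10. It also ignores repeated-letter subtleties. `TOY_WORDS` is a made-up sample.

```python
ABSENT_PENALTY = 10  # assumed value standing in for "very large"

# Tiny illustrative sample; the real list has thousands of words.
TOY_WORDS = ["tares", "cares", "bares", "soare", "react"]


def match_distance(guess, target):
    """Sum a per-letter cost for guess against target (assumed scoring)."""
    greens = {i for i, (g, t) in enumerate(zip(guess, target)) if g == t}
    open_positions = len(guess) - len(greens)
    total = 0
    for i, g in enumerate(guess):
        if i in greens:
            continue  # right letter, right place: no cost
        if g in target:
            # Yellow: it can move to any open slot except its current one,
            # and exactly one of those is right; the rest are the cost.
            total += max(open_positions - 2, 0)
        else:
            total += ABSENT_PENALTY
    return total


def total_distance(candidate, words):
    """Add up the distance from the candidate to every other word."""
    return sum(match_distance(candidate, w) for w in words if w != candidate)


def best_starting_word(words):
    """Pick the word with the smallest combined distance to all others."""
    return min(words, key=lambda w: total_distance(w, words))


print(match_distance("tares", "taser"))     # 0: three greens, two swapped
print(match_distance("tares", "pilot"))     # 43: one yellow, four absent
print(total_distance("tares", TOY_WORDS))   # 64
```

Comparing every word against every other word is quadratic in the size of the list, which is slow but still feasible for a list of around ten thousand five-letter words.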

TARES arranged in boxes like on the Wordle site

TARES takes advantage of both the most common letters and their most common positions. Hopefully, it helps you go from barely finding solutions to having half your attempts be unnecessary.

Barely Surviving: image of a Wordle solution grid showing that all rows were required to find a solution
Clearly Thriving: image of a Wordle solution grid showing that only two rows were required

Now, is this absolutely the best possible approach? Probably not! Perhaps I didn’t use the best word list. Perhaps I made a critical logic error above or in my code. Perhaps I just overlooked some very obvious improvement (or even a very clever one). There are lots of ways to attempt this problem, but since I have already spent a lot of one of my rare free Saturdays coding a solution and writing about it, those will be left to others.

Mark Scherschel II

AI Developer, Roboticist, Children's Book Publisher, Tinkerer, Wizard, Seattleite