How to properly play Wordle using Dataflow and BigQuery.

Iñigo San Jose Visiers
Google Cloud - Community
Feb 1, 2022

EDIT: I talked about this during Beam Summit 2022. The talk covers pretty much the same material as this article, with some extra optimizations. In case you want to check the code, it’s on GitHub.

If you have been on social media in the past few weeks, you probably know what Wordle is. If not, allow me to explain why all these random people are sharing weird boxes all around Twitter and WhatsApp.

Wordle is a game where you have 6 attempts to guess a 5-letter word. Each word you try tells you which of its letters are in the final word (yellow) and which are in the right position (green). For example, if the final word is “hello” and you try “below”, the substring “el” would be green (since those letters are in the right position) and the “o” would be yellow (since it’s in the final word, but in the wrong position).
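To make the rule concrete, here is a minimal sketch of the coloring logic (a hypothetical helper, and one that ignores the duplicate-letter edge cases the real game handles):

def score_guess(guess, answer):
    # One color per letter: 'G' green, 'Y' yellow, '.' gray.
    colors = []
    for i, letter in enumerate(guess):
        if letter == answer[i]:
            colors.append("G")
        elif letter in answer:
            colors.append("Y")
        else:
            colors.append(".")
    return "".join(colors)

print(score_guess("below", "hello"))  # .GGY. -> "el" green, "o" yellow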

Talking with my friends, I came up with a strategy to optimize the win rate: start with 3 fixed words that contain the most common letters, and then use the information they reveal to guess the final word. As expected, my friends said my strategy was stupid and that it’s better to guess using the knowledge each new word gives you. Being as stubborn as I am, I decided to prove them wrong. Where there’s a will there’s a way, and that way was to use my favorite Big Data tools: Cloud Dataflow and BigQuery.

There have been other articles about the best starting word (they normally point to “soare”), and even solution finders using code, written both by coders and by linguists. My goal here is to find the best set of words that can be used regardless of the final word and will give you (on average) the best outcome. Note that this approach doesn’t try to minimize rounds, but to maximize win rate.

The natural question here is: which combination of three words is the best? This may seem like a simple question, but the number of combinations is huge, so calculating the actual best is not that simple. The Wordle dictionary contains around 10,000 words, so if we naively test every single possibility, we end up with 10¹² combinations. We need to be smarter than that.

Optimizations and design

There are a few optimizations we can make:

  1. We don’t want to combine two words that share a letter; we already gathered the information about that letter with the first word.
  2. Since there’s no intersection between words, the end result is the same regardless of the order you play them in (more on this later). Since we are combining 3 words, there are 3! = 6 possible orderings and we only care about 1 of them, so we can reduce the total number of combinations by a factor of 6.
  3. We can remove words that have a repeated letter, using similar logic to (1). It’s true that the final word may have repeated letters, but again, our goal is to gather as much information as possible with 3 words, regardless of the final word.

The first optimization takes us from 10¹² combinations to a bit less than 8 * 10⁹ (for reference, around 230 GB). Applying the second optimization too reduces that to around 1.3 billion combinations (one sixth of the previous step). Lastly, removing the words with repeated letters and building the combinations from what’s left results in fewer than 65 million combinations, or 1.8 GB of data. “Más vale maña que fuerza” (better brains than brawn), as we say in Spanish.
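A quick back-of-the-envelope check of those numbers (the exact counts depend on the real word lists, so these are the rough figures from the text):

from math import factorial

n = 10_000                  # rough size of the Wordle allowed list
naive = n ** 3              # ~1e12: every ordered triple of words
no_intersection = 8e9       # triples whose words share no letters
deduped = no_intersection / factorial(3)  # order doesn't matter: divide by 3! = 6
print(f"{naive:.0e} -> {no_intersection:.0e} -> {deduped:.1e}")  # 1e+12 -> 8e+09 -> 1.3e+09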

OK, that’s all good, but how did I make the combinations, and what about the best words? This is a highly parallelizable task, so Apache Beam and Dataflow are a great fit. Given that I am error-prone and this is a heavy task, I split it into multiple Dataflow jobs. This not only gave me checkpoints against the mistakes I made, but also makes this article and the code more readable. Of course, this could all be done in a single pipeline, but here I present the individual ones. Once we have the results, we evaluate them using BigQuery.

The tasks I split the work into were:

  1. Filter out words with repeated letters.
  2. Combine the words with each other, first only two and then three. As mentioned, I only combine words if there is no intersection, and then deduplicate permutations.
  3. Check the word combinations against the possible answers, counting yellows and greens.
  4. Store and evaluate the results using BigQuery.

Removing words with repeated letters

Filtering out words is rather easy: the data size is quite small and the task is not heavy, so this could be done with a standard Python script without Beam or Dataflow. For consistency, I did it with Apache Beam using the DirectRunner. Here’s how:

import apache_beam as beam
from apache_beam import DoFn, Distinct, Filter, FlatMap, Map, ParDo
from apache_beam.io import ReadFromText, WriteToText
from apache_beam.io.gcp.bigquery import WriteToBigQuery, BigQueryDisposition

def single_letter(word):
    # Keep only words where no letter appears more than once.
    for i, l in enumerate(word):
        if l in word[i + 1:]:
            return False
    return True

# p is a beam.Pipeline (DirectRunner for this small job).
(p | ReadFromText(f"gs://{bucket}/allowed.txt")
   | Filter(single_letter)
   | WriteToText(f"gs://{bucket}/no-repeated",
                 file_name_suffix=".txt",
                 shard_name_template='',
                 num_shards=1)
)

Filtering permutations

OK, we have the words we need to combine with each other; now we have to remove combinations that are permutations of each other (e.g., “hello,world” = “world,hello”). To do that, we are going to use the Distinct PTransform, but Distinct only removes identical elements, so we sort the words within each combination first, so that the same two words produce the same output string regardless of order.

def combine_words_dedup(main, side_words, size=3):
    def _letter_intersection(word1, word2):
        # True if the two words share any letter.
        for l in word1:
            if l in word2:
                return True
        return False

    for side in side_words:
        if not _letter_intersection(main, side):
            # For pairs, sort the two words so permutations collapse in Distinct.
            if size == 2 and main > side:
                yield f"{side},{main}"
            else:
                yield f"{main},{side}"

side_words = p | "Side Words" >> ReadFromText(f"gs://{bucket}/no-repeated.txt")

words = (p | "Main Words" >> ReadFromText(f"gs://{bucket}/no-repeated.txt")
           | "Combine words" >> FlatMap(combine_words_dedup,
                                        side_words=beam.pvalue.AsList(side_words),
                                        size=2)
           | Distinct()
           | WriteToText(f"gs://{bucket}/two-words", file_name_suffix=".txt")
)

For combinations of two words, we can just sort them in place as shown above. For combinations of three words, we need to split the string and then sort, since the newly joined word may not simply go at the end once sorted (e.g., “hello,world” + “green” should be “green,hello,world”, not “hello,world,green”). So we need to add a Sort step to our pipeline:

words = (p | "Main Words" >> ReadFromText(f"gs://{bucket}/two-words*")
           | "Combine words" >> FlatMap(combine_words_dedup,
                                        side_words=beam.pvalue.AsList(side_words))
           | "Sort" >> Map(lambda x: ",".join(sorted(x.split(","))))  # Only needed for three-word combinations
           | Distinct()
           | WriteToText(f"gs://{bucket}/three-words", file_name_suffix=".txt")
)

Comparing against possible answers

Great, now we have the combinations of two and three words, with no repeated letters and no duplicates from permutations. It’s time to actually compute how good those combinations are.

We are comparing the combinations against the possible final answers Wordle has: a bit more than 2,300 final words against 63M combinations, meaning around 1.45 * 10¹¹ comparisons. I am so glad I am doing this with Dataflow.
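Put in code, that’s:

answers = 2_300         # possible final answers
combos = 63_000_000     # three-word combinations to score
print(f"{answers * combos:.2e}")  # ~1.45e+11 comparisons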

First we need a way to score each combination, so I simply count how many yellows and greens it gets against every possible answer; I’ll store that in BigQuery and compute the final score there. This is the DoFn I am using (which relies on the fact that there are no repeated letters):

class WordleRow(DoFn):
    def process(self, words, answers, num_words=None):
        words_letters = [list(w) for w in words.split(",")]
        greens, yellows, total = 0, 0, 0
        for answer in answers:
            total += 1
            for word_letters in words_letters:
                for i, el in enumerate(word_letters):
                    if el == answer[i]:
                        greens += 1   # Right letter, right position.
                    elif el in answer:
                        yellows += 1  # Right letter, wrong position.
        yield self._format_result(words, yellows, greens, total, num_words)

    def _format_result(self, words, yellows, greens, total, num_words):
        if not num_words:
            num_words = len(words.split(","))

        return {
            "words": words,
            "yellows": yellows,
            "greens": greens,
            "total": total,
            "yellows_avg": yellows / total,
            "greens_avg": greens / total,
            "amount_words": num_words
        }

As you can see, I also store the total number of answers each combination was compared against, as well as the per-answer averages, just for completeness.

The full pipeline looks like this:

answers = (p | "Answers" >> ReadFromText(f"gs://{bucket}/answers.txt")
             | Map(list))

# `table` and `schema` are defined elsewhere (the BigQuery destination and its schema).
words = (p | "Combinations" >> ReadFromText(f"gs://{bucket}/three-words*")
           | ParDo(WordleRow(), answers=beam.pvalue.AsList(answers), num_words=3)
           | WriteToBigQuery(
                 table,
                 schema=schema,
                 write_disposition=BigQueryDisposition.WRITE_APPEND,
                 create_disposition=BigQueryDisposition.CREATE_IF_NEEDED)
)

I did the same with the two-word combinations and the original filtered words, so that we have that info in BigQuery too. You’ll see why this comes in handy later.

Calculating the score

We now have a table containing the results of all the combinations, but we still need to find out which combinations are better.

This is not as straightforward a question as you may think, because how do we score a combination? It’s clear that green letters should be more valuable than yellow ones, but by how much? Is this constant, or are green letters more valuable early on?

To answer this question, I checked the data itself and talked with some people (including the friends who didn’t like my strategy and thought this was overkill). The top 3-word combinations get around 2.3 yellows and 1.5 greens, so I can’t overvalue green letters: having two yellows and two greens already makes the final word pretty much obvious, and one more green or yellow doesn’t make much of a difference at that point. So I decided to value greens at 1.75 and yellows at 1.

Of course, getting 2 greens in the first attempt is not the same as getting them in the third, and I also want to know in what order I should play the combination of words. So I took the top 50 combinations using the 1.75 green weight, generated the two-word pairs that are part of each combination, and ran the query again, filtering by those words. In this case I valued greens at 2.25, since it’s clearly better to have greens early on.

Being able to run these calculations fast was the main reason I chose BigQuery to store the results. I knew I would need to investigate the data and pull in some extra context to find the best possible combinations, and BigQuery was the right choice for that.
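As an illustration, the ranking is a simple weighted aggregation over the table the pipeline wrote. A minimal sketch from Python, assuming a placeholder table id (the column names match the WordleRow schema above, but the exact query I ran may have differed):

from google.cloud import bigquery

client = bigquery.Client()

# Weighted ranking: a green is worth 1.75 yellows. Table id is a placeholder.
query = """
SELECT words, greens_avg * 1.75 + yellows_avg AS score, greens_avg, yellows_avg
FROM `my-project.wordle.results`
WHERE amount_words = 3
ORDER BY score DESC
LIMIT 50
"""

for row in client.query(query).result():
    print(row.words, round(row.score, 3))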

But “you still haven’t told us the best combination, Iñigo”, you may be thinking. Well, you’d be right, I haven’t, and the reason is that I wanted to explain how I got to the answer, since the definition of “best” isn’t as clear-cut as some may want. But wait no more, here it comes:

The best combination is (in this order): PRATE, SOILY, DUNCH

Here’s the reasoning: among 3-word combinations, the “magic combination” ranked 14th, with a score (greens at 1.75) of 5.01, averaging 1.49 greens and 2.4 yellows. The top spot went to “dhole,pricy,saunt” with 2.365 yellows, 1.53 greens and a weighted score of 5.04, a very, very small improvement. But when looking at the two-word combinations created from each triple, “prate,soily” is better by a long margin: the best pair from “dhole,pricy,saunt” has a weighted score 0.4 lower and 0.16 fewer greens.

Top 16 words using weight 1.75 for greens

Having a strong 2-word combination gives you quite the advantage, and you may not even need a third word. “prate,soily” ranks 6th using a green weight of 2.25 and has the second-most greens, but it’s the best among pairs coming from the top 3-word combinations.

Top 10 2-word-combinations generated from the top 3-word-combinations

Now we only need to check if “prate” is better than “soily”, which it is.

For completeness and for the curious, here’s some trivia for other metrics:

  • Most greens:

(3 words) brant,coude,shily with 1.5632 greens

(2 words) brane,soily with 1.205

(single word) saine with 0.666

  • Most yellows:

(3 words) estro,unhip,yclad with 3.382 yellows

(2 words) estro,inula with 2.652

(single word) estro with 1.486

  • Best weighted:

(3 words, 1.75 value) dhole,pricy,saunt with 5.046

(2 words, 2.25 value) clint,soare with 4.465

(single word, 2.75 value) soare with 2.923

  • Best yellows + greens:

(3 words) dhole,pricy,saunt with 3.897 (many ties here, using greens as the tie breaker); “dunch,prate,soily” gets the same sum, but fewer greens

(2 words) irone,sault with 3.05 (again, many ties)

(single word) roate with 1.789

You may notice that “clint,soare” is quite a strong 2-word combination and that “soare” is the best weighted single word, but the best 3-word combination built from those two words (“soare,clint,humpy”) ranks outside the top 400 3-word combinations. It may still be a good combination if your goal is to reduce the number of rounds.

Notes and other improvements

Of course, all these calculations rely on some assumptions I made, like requiring that the combination has no repeated letters, and the weights I assigned to greens vs. yellows.

Also, there may be nuances in the English language that make some letters more valuable than others. For example, in Spanish, if you get a “Q”, you know there’s going to be a “U” after it. In theory this should already be accounted for by the averages, but there may be outliers.

My assumptions may mean the results are not optimal, but the answer I provided should be a pretty good approximation of the optimum.

As mentioned before, there are many improvements that could make the pipelines run faster, as well as merging everything into a single pipeline instead of multiple. One of the biggest improvements is to calculate the greens and yellows for single words first and then combine them by simply adding the values when there’s no intersection (see the sketch below), which avoids scoring the same single word many, many times. The reason I didn’t do that from the start is that it makes the code harder to read and the reasoning behind the design harder to follow. Also, this improvement would only have saved me around 30 minutes of waiting, which I didn’t really care about.
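For the curious, here’s a minimal sketch of that optimization in plain Python (hypothetical helpers, not the actual pipeline code; it assumes the word list is already filtered to words with no repeated letters):

from itertools import combinations

def score_single(word, answers):
    # Score one word against every answer, exactly once.
    greens, yellows = 0, 0
    for answer in answers:
        for i, letter in enumerate(word):
            if letter == answer[i]:
                greens += 1
            elif letter in answer:
                yellows += 1
    return greens, yellows

def score_triples(words, answers):
    scores = {w: score_single(w, answers) for w in words}
    for combo in combinations(words, 3):
        # 3 disjoint words of 5 distinct letters each -> 15 unique letters.
        if len(set("".join(combo))) == 15:
            greens = sum(scores[w][0] for w in combo)
            yellows = sum(scores[w][1] for w in combo)
            yield combo, greens, yellows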

Lastly, some may think that computing the top single words and building combinations from them in order, as long as there’s no intersection, would return the best possible combination of words. Unfortunately, this is not true: that approach outputs words way outside the top 100 combinations.
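That greedy approach would look something like this (again a hypothetical sketch; `ranked_words` is assumed sorted by single-word score):

def greedy_combo(ranked_words, size=3):
    # Pick the best-scoring words in order, skipping any that share
    # letters with what we already picked.
    picked = []
    for word in ranked_words:
        if all(set(word).isdisjoint(p) for p in picked):
            picked.append(word)
            if len(picked) == size:
                break
    return picked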

Learnings

While making these calculations I wanted to “fall” into most of the problems a naive approach would have (except allowing words to intersect), so I ran all the combinations without the improvements (deduplication and filtering words with repeated letters), just to know how much time I saved. My estimate is that the naive approach would have taken around 100 times longer (and cost 100 times more).

It’s clear that when dealing with these types of problems, sometimes we need to take a step back and use the knowledge we have about the data to shape our code.

The “bad thing” about this article is that it has destroyed the joy of playing Wordle for me. I have been playing to double-check that this approach works, and I have been averaging around 4 rounds to get the right answer, which I think is pretty good (again, note that the goal of this article is to help get the word right, regardless of the rounds it takes).

(SPOILER ALERT)

Here you have my strategy put to the test with today’s Wordle:

Wordle 227 4/6

Hope you learned something, happy Wordling!
