WORDLE, Python, and crowd-sourcing in real-time

A computer science example even my teenagers like

Deephaven Data Labs
Geek Culture
5 min read · Feb 13, 2022

By Pete Goddard

Source: image by author

Wordle is a clever game, but that’s not why I like it. Instead, I’m fascinated that a game I play in total isolation can feel so social.

My teens play. Their friends play. Sometimes they even want to talk to me — the lame dad — about it. This is unusual, something to cherish.

Such banter usually includes jabs like “5 tries, dad?… Do better.” Or phrases of solidarity, like “Really, Wordle… ‘Humph’… really?” However, on two occasions, with a handful of grazing high schoolers in my kitchen, they’ve let me move the conversation toward math and computer science.

The first time was a total fail. I tried to stoke chatter about the most efficient way to play the game, even encouraging them to check out 3Blue1Brown’s well-done YouTube video about it. However, it quickly became clear that analyzing the optimal playing strategy was a yawner. Each kid had their favorite set of starting words and style of play. Passions ran deep. Regardless, none of them were warm to my notion that this was “fun computer science” — an oxymoron to these teens (despite that shockingly awesome mobile computer in their hands).

But last night, between slices of frozen pizza with three freshman girls and two junior boys, I said, “You guys should check out the video my friend Colin made. He can guess the Wordle answer on the first try every day…. And all he looks at are the blank colored squares on Twitter. Not a single letter as a prompt, and he can guess the word.”

“Bull sh**, Mr. Goddard.” Young-J, one of my son’s friends, sometimes loses his tongue. (He quickly apologized.)

It’s real, Young-J. Here’s how you can predict the Wordle solution of the day just by ‘watching’ empty squares on Twitter for a couple of minutes….

A Simple Model

[Beware: the example below spoils Wordle #235, from Feb 9.]

Colin was inspired by a Kaggle post by Ben Hamner. He riffed a little — working with the full universe of possibilities rather than simulating games, and using table operations instead of loops (but I don’t want to lose Young-J by writing about dictionary cross-joins).

The premise is quite straightforward.

Every day Twitter is littered with pictures like this:

Source: Image by author

Ignoring the self-evident all-green row, the key is to reflect on the information in any given row.

Source: image by author

Even though there are no letters, by simply looking at the color pattern in that row you know something. For my typing ease, indulge me in using yes-no-maybe shorthand (“YNM”) to represent the red box above as NYYNM.

Without any knowledge about the actual letters, when we see a NYYNM box, we know:

  1. The Wordle could be “HUMOR” — yesterday’s Wordle. (Yes, there’s the spoiler.)
  2. The Wordle could also be “ALARM” or “EPOXY” or 1813 other 5-letter words.
  3. But the Wordle cannot be “ABBEY” or “ENVOY” or 497 other possible Wordle solutions.

In HUMOR’s case, guesses like JUMBO or BUMPH (a new word for me, meaning “useless or tedious information”) would produce this pattern.

For ALARM, guesses of PLAZA or GLAIR would yield the NYYNM pattern. For EPOXY, guesses of SPORE and APODE (“a group of soft-finned fish” — who knew?) work.

But, as noted in #3 above, there is no guess that would produce a NYYNM pattern for ABBEY or ENVOY.

Perhaps in reading the above you actually checked my work. In doing so, you performed the quite simple task Colin asked a computer to do: “For every possible 5-letter guess, calculate the 5-box-YNM pattern for every possible solution.”

It’s about 30 million combinations (roughly 13,000 allowed guesses times 2,315 possible solutions). Simple stuff for a computer processor.

Let’s call the resulting lookup the “Result-Mapping-Table”.
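
As a rough sketch of how such a table could be built, the Python below scores every guess against every solution and records which solutions can produce each pattern. It is not Colin's code: it uses plain loops rather than table joins, and the function name `ynm_pattern`, the variable `result_map`, and the tiny word lists (only the words mentioned in this post) are assumptions for illustration. A full table would use Wordle's complete guess and solution lists.

```python
from collections import Counter, defaultdict

def ynm_pattern(guess, solution):
    """Return the Y/N/M pattern Wordle would show for `guess` against `solution`."""
    pattern = ["Y" if g == s else "N" for g, s in zip(guess, solution)]
    # Solution letters not matched in place are the only ones that can earn an "M".
    leftover = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, g in enumerate(guess):
        if pattern[i] == "N" and leftover[g] > 0:
            pattern[i] = "M"
            leftover[g] -= 1  # each solution letter can be claimed only once
    return "".join(pattern)

# Toy word lists (assumption): only the words mentioned in this post.
solutions = ["HUMOR", "ALARM", "EPOXY", "ABBEY", "ENVOY"]
guesses = ["JUMBO", "BUMPH", "PLAZA", "GLAIR", "SPORE", "APODE"] + solutions

# pattern -> set of solutions for which at least one guess yields that pattern
result_map = defaultdict(set)
for solution in solutions:
    for guess in guesses:
        result_map[ynm_pattern(guess, solution)].add(solution)

print(sorted(result_map["NYYNM"]))  # ['ALARM', 'EPOXY', 'HUMOR']
```

With these toy lists, NYYNM maps to ALARM, EPOXY, and HUMOR, and not to ABBEY or ENVOY, which agrees with the three observations listed earlier.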

The Power of Crowds

Equipped with the Result-Mapping-Table, you only need one more thing to convincingly predict the Wordle-of-the-day: A lot of guesses.

Enter Twitter.

Social sharing of these Wordle results creates a bounty of guesses to observe in real-time. After listening to the Twitter feed for a few minutes, only two steps remain:

  1. For each grid-picture, equipped with the YNM patterns it provides, determine from the Result-Mapping-Table all the possible answers that could have satisfactorily yielded a picture like that. Let’s call each of those a “Very Qualified Guess”.
  2. Do a count across the Twitter universe of all the Very Qualified Guesses.

The Very Qualified Guess with the highest count is the Wordle of the day.

Actually, if Twitter never produced garbage, then by definition the winning Wordle of the day would have a count that is exactly the same as the number of Tweets you listen to.
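
The two counting steps can be sketched in a few lines of Python. This is a simplified stand-in, not the Deephaven table operations Colin used; the helper names, the toy word lists, and the three hand-written "tweets" (each a list of YNM rows, written as if the answer were HUMOR) are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def ynm_pattern(guess, solution):
    """Return the Y/N/M pattern Wordle would show for `guess` against `solution`."""
    pattern = ["Y" if g == s else "N" for g, s in zip(guess, solution)]
    leftover = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, g in enumerate(guess):
        if pattern[i] == "N" and leftover[g] > 0:
            pattern[i] = "M"
            leftover[g] -= 1
    return "".join(pattern)

# Toy word lists (assumption): only the words mentioned in this post.
solutions = ["HUMOR", "ALARM", "EPOXY", "ABBEY", "ENVOY"]
guesses = ["JUMBO", "BUMPH", "PLAZA", "GLAIR", "SPORE", "APODE"] + solutions

# Toy Result-Mapping-Table: pattern -> solutions that some guess can produce it for.
result_map = defaultdict(set)
for solution in solutions:
    for guess in guesses:
        result_map[ynm_pattern(guess, solution)].add(solution)

# Hypothetical tweets, already reduced to YNM rows (no letters anywhere).
tweets = [
    ["NNNNY", "NYYNM", "YYYYY"],
    ["NNMNN", "YYYYY"],
    ["NNNNN", "NYYNM", "YYYYY"],
]

def very_qualified_guesses(rows):
    """Solutions that could have produced every row of one shared grid."""
    candidates = set(solutions)
    for row in rows:
        candidates &= result_map.get(row, set())
    return candidates

# Count how many tweets each candidate solution is consistent with.
counts = Counter()
for rows in tweets:
    counts.update(very_qualified_guesses(rows))

best_word, best_count = counts.most_common(1)[0]
print(best_word, best_count, "of", len(tweets), "tweets")
```

In this toy run only HUMOR is consistent with all three grids, so its count equals the number of tweets, while the other words fall short.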

Source: Image by author

Open, Easy-to-Use Machinery

Listening to Tweets in real-time to win Wordle on the first go requires a few bits of gear:

  1. Code to scrape Twitter’s API with that day’s search term: ‘Wordle 235’ (in yesterday’s case).
  2. Logic to turn HTML-colored-squares into simple indicators and remove the aforementioned garbage (like entries in Turkish and German).
  3. A data system that can make this run in real-time.
  4. A user experience to deliver code and see results.
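
Item 2 might look something like the sketch below. It assumes the tweet text carries Wordle's standard English share format (a header such as "Wordle 235 4/6" followed by rows of emoji squares); the function name `tweet_to_rows` and the sample tweet are made up for illustration, and a header check is only one simple way to drop garbage.

```python
import re

# Emoji squares used in Wordle share text, mapped to the YNM shorthand.
SQUARE_TO_YNM = {
    "\U0001F7E9": "Y",  # green square: right letter, right spot
    "\U0001F7E7": "Y",  # orange square: same meaning in high-contrast mode
    "\U0001F7E8": "M",  # yellow square: right letter, wrong spot
    "\U0001F7E6": "M",  # blue square: same meaning in high-contrast mode
    "\u2B1B": "N",      # black square: letter not in the word (dark theme)
    "\u2B1C": "N",      # white square: letter not in the word (light theme)
}
ROW_RE = re.compile("[" + "".join(SQUARE_TO_YNM) + "]{5}")

def tweet_to_rows(text, puzzle_number):
    """Return the YNM rows in a tweet, or [] if it does not look like this puzzle."""
    # Require the English share header, e.g. "Wordle 235 4/6" or "Wordle 235 X/6".
    if not re.search(rf"\bWordle {puzzle_number} [1-6X]/6", text):
        return []
    rows = []
    # Drop emoji variation selectors so a line is exactly five squares.
    for line in text.replace("\ufe0f", "").splitlines():
        line = line.strip()
        if ROW_RE.fullmatch(line):
            rows.append("".join(SQUARE_TO_YNM[ch] for ch in line))
    # A real game has between one and six rows; anything else is treated as garbage.
    return rows if 1 <= len(rows) <= 6 else []

# Made-up sample tweet: three rows of squares under the share header.
G, Y, B = "\U0001F7E9", "\U0001F7E8", "\u2B1B"
sample_tweet = "Wordle 235 3/6\n\n" + "\n".join([B * 4 + G, B + G * 2 + B + Y, G * 5])
print(tweet_to_rows(sample_tweet, 235))  # ['NNNNY', 'NYYNM', 'YYYYY']
```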

Colin used Deephaven and Python for all of the above. Deephaven is uniquely capable with real-time table data, and is specifically engineered for data-driven applications and analytics that combine embedded code (like the web scraper and HTML parser listed above) with real-time table operations (like those needed to match guesses against the Result-Mapping-Table).

The Code… and Colin at Work

I encourage you to watch the 10-minute video of Colin describing his code and the calculations in detail. I find watching him deploy a web scraper and inherit all the Python calculations in real-time quite satisfying. (And educational.) Maybe it’s his calm Minnesota tone.

Alternatively, all of the code is available in the Deephaven examples repository on GitHub.

Enjoy the game. It’s fun. And your friends. They’re probably fun, too.

Let us know what you think by joining our Slack community.

Deephaven is a high-performance time-series query engine. Its full suite of APIs and intuitive UI make data analysis easy. Check out deephaven.io.