Wordle — A Frequency Analysis Approach
The frequency distribution of letters across words and character positions may reveal extra information that helps narrow the search for the correct answer.
Like most of you, I recently found my social media feed filled with strange-looking green, black, and yellow squares alongside Wordle scores. I had no idea what they were at first, but the continual flood of posts made me curious enough to find out what the hype was all about.
After learning about the game, I realised it is a simple twist on Mastermind, which we used to play as kids. Following my habit of losing friends by codifying lazy solutions to games (you might recall I have already spoilt Sudoku for some), I decided to analyse whether there was an efficient way to guess based on statistical measures.
In this game, the search space comprises English words that are five characters long. You might already know that some sites have dug into the Wordle source code and found the word list it uses. For my approach, I kept to a generic English word list from NLTK. The only drawback is that some high-likelihood words may be rejected by the game, and you then have to pick another from the list of recommended words.
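To make the search space concrete, here is a minimal sketch of how such a list could be built and counted by character position. The function names (`five_letter_words`, `positional_frequencies`, `load_nltk_words`) are my own illustrative choices, and the filtering rules (lowercase, ASCII letters only) are assumptions rather than a record of the exact steps used. The NLTK loader assumes `nltk` is installed and fetches the `words` corpus if it is missing.

```python
from collections import Counter

WORD_LENGTH = 5  # Wordle answers are five letters long


def five_letter_words(candidates):
    """Return the sorted, de-duplicated, lowercase five-letter words.

    Assumption: entries containing hyphens, digits, or non-ASCII
    characters are dropped, since they cannot be Wordle guesses.
    """
    return sorted({
        w.lower()
        for w in candidates
        if len(w) == WORD_LENGTH and w.isascii() and w.isalpha()
    })


def positional_frequencies(words):
    """Return one Counter per position, counting the letters seen there."""
    counts = [Counter() for _ in range(WORD_LENGTH)]
    for word in words:
        for position, letter in enumerate(word):
            counts[position][letter] += 1
    return counts


def load_nltk_words():
    """Load NLTK's generic English word list (requires nltk to be installed)."""
    import nltk
    from nltk.corpus import words

    nltk.download("words", quiet=True)  # fetches the corpus if not yet present
    return words.words()
```

Calling `positional_frequencies(five_letter_words(load_nltk_words()))` gives a list of five counters, and `counts[0].most_common(5)` would then show the most frequent opening letters. Because the list is generic rather than Wordle's own, some of the resulting words will not be accepted by the game.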