Using AI to Solve Connections

Paul Prescod
7 min read · Sep 14, 2024


If a time traveller had asked me, in early 2022, to write a computer program that could reliably solve a word association puzzle like the New York Times game Connections, I would have asked: “What is Connections? Don’t you mean Wordle? The New York Times is all about Wordle! They just bought it!”

Then the chrononaut might explain the basic concept of Connections. It is a highly abstract game of word association, and it will be very popular in 2023 and 2024 (second only to the still-dominant Wordle).

I would have said that I was far from having the skills to build a Connections-solver. But I could totally build a Wordle-solver. Why don’t I do that instead? In fact, not only will I build one, I’ll use my Wordle-solver to build a Wordle-variant game called Cludl.

But no, the time jumper might insist, Connections is coming soon, and we need to be ready. “You have two years to build a Connections-solver.”

“Okay,” I would answer, “I’ll need a team of PhDs working for two years, and we might be able to build it.”

Making abstract associations between word groups designed to be tricky for humans would have seemed far beyond the AI available back then, even GPT-3.

If the temporal wanderer had set me a deadline of early 2024, my PhDs would almost certainly have failed the task. We know that in hindsight because a team of PhDs did fail. A team from NYU tried a host of Natural Language Processing and Large Language Model techniques and tools, including GPT-4, and they came up short. As Newsweek reported:

All in all, the AI struggled to complete the Connections puzzles. GPT-4, the most successful of the models, only solved about 29 percent of the puzzles, struggling more with the “tricky” associations, just like humans. When GPT-4 was given step-by-step prompts to guide it through the reasoning of the puzzle, its performance was slightly better, at 39 percent.

But…if the time navigator had set the deadline for September 13, 2024, then the project would take a single person an hour to complete. We know that in hindsight, because I did it.

That’s how rapidly language AI technology is advancing. The project that was impossible for a team of PhDs in May (or even in early September) took me an hour yesterday. A few months ago, Newsweek was reporting that this task was still beyond computers; today I solved it for fun on my laptop.

This is all because of the release of a new AI model from OpenAI called o1-preview: “1” as in the first of a new series, “preview” as in not the real thing yet. It is a preview of their “reasoning” model o1.

The o1 series of models are trained, like a math student, to go away, break problems down into steps, take their time, “think it through,” and answer when ready. This is in contrast to traditional LLMs like GPT-4, which rush to generate tokens and typically cannot correct themselves if they go down the wrong path. o1 is trained to back up and try again a few times.

Purists will argue that o1 does not “really” think: it just emulates thinking. And they will point out that in many domains it is far, far from human-level thought. Thank goodness! But when it comes to Connections, it is eerily similar to human thought and it seems roughly equivalent in skill to a human player. I’ve only tried a few puzzles, because running them through is a bit expensive, but it gets all of them in one or two turns.

Connecting with Connections

Connections is a word association game. You put 16 words into four groups of four words. You are allowed four mistakes before you lose. It looks like this:

And your goal is to sort the words into groups of four to get to this:

Connecting with AI

The heart of my solver program is a prompt to the AI. Here is what my prompt looks like:

Find groups of four items that share something in common.

Category Examples
FISH: Bass, Flounder, Salmon, Trout
FIRE ___: Ant, Drill, Island, Opal

Categories will always be more specific than
"5-LETTER-WORDS," "NAMES" or "VERBS."

Each puzzle has exactly one solution. Every item fits in
exactly one category.

Watch out for words that seem to belong to multiple categories!

Order your answers in terms of your confidence level, high confidence first.

Here are the items:

Tidbit,Amethyst,Lavender,Eggplant,Mosquito,Bundle,Dinosaur,
Fortune,Pearl,Nugget,Paleontologist,Pile,Mint,Kernel,Grimace,Amber

Return your guess as ONLY JSON like this:

{"groups":
[
{"items": ["item1a", "item2a", "item3a", "item4a"],
"reason": "…"},
{"items": ["item2a", "item2b", "item3b", "item4b"],
"reason": "…"},
]}
No other text.
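Sending this prompt is only a few lines of code with the official `openai` Python SDK. Here is a sketch (the helper names are illustrative, not my solver’s actual code; note that o1-series models accept only plain user messages, with no system prompt or `response_format`):

```python
import json

# The rules portion of the prompt, as shown above.
RULES = """\
Find groups of four items that share something in common.

Category Examples
FISH: Bass, Flounder, Salmon, Trout
FIRE ___: Ant, Drill, Island, Opal

Categories will always be more specific than
"5-LETTER-WORDS," "NAMES" or "VERBS."

Each puzzle has exactly one solution. Every item fits in
exactly one category.

Watch out for words that seem to belong to multiple categories!

Order your answers in terms of your confidence level, high confidence first.
"""

def build_prompt(items):
    """Assemble the rules, the 16 items, and the JSON output instructions."""
    return (
        RULES
        + "\nHere are the items:\n\n"
        + ",".join(items)
        + "\n\nReturn your guess as ONLY JSON like this:\n\n"
        + '{"groups": [{"items": [...], "reason": "..."}, ...]}\n'
        + "No other text."
    )

def ask_model(items, model="o1-mini"):
    """Send the prompt and parse the JSON reply."""
    from openai import OpenAI  # requires the `openai` package and an API key

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(items)}],
    )
    return json.loads(resp.choices[0].message.content)
```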

After some time “thinking”, the AI responds with a guess like this:

{
"groups": [
{
"items": ["Amethyst", "Lavender", "Mint", "Eggplant"],
"reason": "All are color names."
},
{
"items": ["Nugget", "Bundle", "Mint", "Pile"],
"reason": "All relate to money or wealth."
},
{
"items": ["Dinosaur", "Paleontologist", "Amber", "Mosquito"],
"reason": "All related to paleontology and fossils."
},
{
"items": ["Pearl", "Fortune", "Tidbit", "Kernel"],
"reason": "All represent small components or units."
}
]
}

This response is from the o1-mini model, the cheaper of the two new “reasoning” models from OpenAI. These are mostly good guesses, but ‘Fortune’ as a small component is a poor fit, which throws off the “relates to money” category, which in turn messes up another category, and so forth.
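Grading a guessed group is just a set comparison against the known answer. A sketch (the function name is illustrative; `SOLUTION` is this puzzle’s actual answer, shown later in the article):

```python
# Grading a guess is a set comparison against the known answer.
SOLUTION = [
    frozenset(["Dinosaur", "Paleontologist", "Mosquito", "Amber"]),
    frozenset(["Lavender", "Eggplant", "Grimace", "Amethyst"]),
    frozenset(["Fortune", "Bundle", "Pile", "Mint"]),
    frozenset(["Tidbit", "Nugget", "Kernel", "Pearl"]),
]

def grade(group):
    """A guessed group is correct iff it exactly matches one category."""
    return frozenset(group) in SOLUTION

grade(["Amethyst", "Lavender", "Mint", "Eggplant"])   # False: Mint isn't purple
grade(["Dinosaur", "Paleontologist", "Amber", "Mosquito"])  # True
```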

To mimic the real game, I deal with the guessed groups one at a time. The first one is wrong, so I send a variant of the prompt that tells the AI what it got wrong and gives it another try. Like the real game, I give the AI four chances.

The AI has no memory, so I need to teach it what the game is again in each prompt.

Find groups of four items that share something in common.
… etc….
Here are the items:
Amethyst,Mosquito,Amber,Mint,Pearl,Nugget,Dinosaur,Kernel,Paleontologist,Grimace,Fortune,Tidbit,Bundle,Pile,Eggplant,Lavender
You previously guessed
"Amethyst","Lavender","Mint","Eggplant"
Those answers are not correct. Do not repeat them.

It guesses again, and if necessary keeps going until it runs out of turns.
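Put together, the whole game loop is only a few lines. Here is a sketch with illustrative names, not my solver’s exact code; `ask_model` stands in for the API call, and the stub at the bottom lets the loop run without one:

```python
# Sketch of the guess/feedback loop. `ask_model` stands in for the real
# API call; `solution` is a set of frozensets, one per category.

def play(items, solution, ask_model, max_mistakes=4):
    """Keep prompting until the puzzle is solved or four groups were wrong."""
    solved, wrong = set(), []
    while len(solved) < len(solution) and len(wrong) < max_mistakes:
        guess = ask_model(items, wrong)      # each prompt restates the rules
        for group in guess["groups"]:        # taken in confidence order
            key = frozenset(group["items"])
            if key in solution:
                solved.add(key)
            else:
                wrong.append(sorted(key))    # fed back as "do not repeat"
                break                        # re-prompt after the first mistake
    return len(solved) == len(solution)

# Toy demo: a stubbed "model" that is wrong once, then right.
SOLUTION = {frozenset("ABCD"), frozenset("EFGH")}
replies = iter([
    {"groups": [{"items": list("ABCE")}]},                           # wrong
    {"groups": [{"items": list("ABCD")}, {"items": list("EFGH")}]},  # right
])
result = play(list("ABCDEFGH"), SOLUTION, lambda items, wrong: next(replies))
```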

That’s essentially it; the rest of the program is boring. And the prompt is also pretty boring, insofar as it is mostly just explaining the rules of Connections. How did something so mind-blowing become boring?

When I tried this particular puzzle on the new o1-preview model, it got the entire answer in one try:

{
"groups": [
{
"items": ["Dinosaur", "Paleontologist", "Mosquito", "Amber"],
"reason": "All related to Jurassic Park; Amber contains Mosquito with Dinosaur DNA studied by Paleontologists"
},
{
"items": ["Lavender", "Eggplant", "Grimace", "Amethyst"],
"reason": "All are shades of purple"
},
{
"items": ["Fortune", "Bundle", "Pile", "Mint"],
"reason": "Words meaning 'a large amount of money'"
},
{
"items": ["Tidbit", "Nugget", "Kernel", "Pearl"],
"reason": "Words for small bits of information or wisdom"
}
]
}

It costs about $0.40 per turn on o1-preview, so it’s a good thing that it usually does not take more than one or two turns to solve.

According to Lech Mazur, who tracks AI performance on Connections, o1-preview does not get 100%, but it is more than twice as good as the next best model, which was an ensemble of other models.

Connecting this to the real world

Despite continual claims that AI progress is hitting a “wall” or “plateau”, these AI companies continue to pull new rabbits out of hats. Now that OpenAI has proven that one can train an AI to construct fairly reliable chains of thought, we should expect this capability to show up in competitors next year.

This is unlike any trend in my 25 years in this industry. Other technologies, such as the Internet, mobile computing, and virtual reality, unfolded incrementally over decades. AI makes leaps in months.

Where this ends up, I have no idea. But AI continues to get more and more capable and we do not know whether there exists an upper bound on the scaling of known techniques — to say nothing of future innovations.

I’m fortunate enough to be paid to keep an eye on this stuff in my job at Elation Health, but that does not mean that I am entirely sanguine about the fact that AI seems to be on a path to one day exceed human capacity. A future where we are no longer the most intelligent species in the solar system will be strange and disorienting at best and dangerous at worst. It could be utopian, dystopian or the usual mix of both.

The only thing we can say for certain is that it will not be boring. The code may end up being boring, but the societal impacts certainly will not.

Thanks to Elation Health for sponsoring the $10 in compute cost that it took for me to build this app. I use caching to keep my compute costs down!
