The Parlor Game that Learned to Parlez

Tom Zielund
Thomson Reuters Labs
13 min read · Jun 27, 2024
Conceptual game-box design for The Shannon Game — generated by Pixlr.

Early in the 1950s, newlyweds Claude and Betty Shannon invented a new kind of game in their parlor. At least, that’s how I like to think about it. People in the 50s still had parlors, right?

The game itself is similar to Hangman. It goes like this: one player finds a passage of text — perhaps a sentence or two — and writes down dashes to represent the hidden letters, including the spaces between words. The guessing player then tries to guess the first letter of the passage. For each incorrect guess, a tally is made below the dash, until the correct letter is guessed and filled in. Then the guesser proceeds to the next letter and continues guessing.

Let’s walk through an example game to see how it goes. To start with, consider the opening play. Here, we see nothing but 37 dashes:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

If you were the guesser, what would you guess first? Maybe ‘T’, thinking the first word could be “The”? Sorry! That isn’t it. Then maybe ‘E’, which is just a common letter. Wrong again. Finally, after several wrong guesses, you would stumble on the correct letter, ‘C’. Now the board looks like this:

C _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The next guess is much easier because we know it has to follow a ‘C’. It’s likely to be an ‘H’ or an ‘R’, or a vowel. So maybe you guess ‘H’ and you’re right. Now the board looks like this:

Ch _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

You may think the next letter is likely to be a vowel. You’ll probably find the correct ‘E’ in only one or two guesses. You continue in this manner for two more letters – both happen to be ‘S’. Now the board looks like this:

Chess _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Here you know that “Chess” is a word, so you can finally guess a space, and that is correct. After that you’re back at the start of a word, and guessing letters gets harder again, but knowing that the first word is “Chess” makes it easier to guess the next word, and so on.

Eventually, you will have guessed almost everything and the board will look like this:

Chess is a board game for two player _

The final ‘S’ is a trivial guess at this point. The game is over. You’ve won!

For the guesser, the goal of the game is to reveal the full message with the fewest bad guesses. For the player who selects the passage, on the other hand, the goal is to collect statistics on the number of guesses, use these stats to compile averages, and then write “Prediction and Entropy of Printed English,” as Claude Shannon did for the Bell System Technical Journal in 1951.

It’s good clean fun for all ages!

Games in Artificial Intelligence Research

Ok, I’ll admit, this sounds like a pretty boring game. And well… it is – at least for people. This game may not have monopolized the hearts and minds of human game lovers, but it became very influential in Artificial Intelligence (AI) research, and has come to be known as the Shannon Game.

Games have always been closely associated with the development of AI. Games are valuable in part because they are intellectual activities which can be measured, ranked, or pitted head-to-head. In addition, their limited scope and strongly enforced rules make them great for computer simulation and large-scale testing. Chess… Go… Jeopardy… Even the Turing Test is described as a kind of game. See the trivia inset below for a quiz on some of the best known game-playing computer systems.

The Shannons’ interest in games was not limited to this one letter guessing activity. Claude, and his often unnamed but ever-present collaborator, Betty, are also known for early contributions to several playful ideas bordering games and AI. These include computer chess, a maze-solving robot mouse, and a “mind reading” machine.

Claude Shannon admitted that computer chess was “perhaps of no practical importance” but that studying chess playing would help to understand “other problems of similar nature.” Many other early AI researchers felt similarly, and a lot of attention was paid to the chess problem.

In contrast, the Shannon Game was not originally intended as an exercise for AI. But in the years since its publication, the challenge implied by the game — that of guessing the next letter in sequence — has become the key to the development of language models, one of the most important developments in AI.

Trivia: AI Gamers

Match each of the following game playing computer systems to their claim to fame and year (Answers at the end of the article):

Language Models

Photo by Glen Carrie on Unsplash

Formally, a language model is a mathematical model that can be used to estimate the probability of a sequence of language elements called “tokens”. Tokens can be letters and spaces, as in the original game, or they could be whole words.

Less formally, a language model is simply a computer program capable of playing the Shannon Game. The better it is at playing, the better it is at modeling language. Improvements in language models over the years started with N-grams, then moved to neural networks with a variety of network architectures. The latest networks are based on the Transformer network architecture. These models are used in a wide variety of applications, from speech recognition to machine translation to text generation.
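To make the formal definition concrete, here is a minimal Python sketch of how any language model can score a whole sequence by multiplying conditional next-token probabilities together. The next_token_prob function is a hypothetical stand-in for whatever model supplies the estimates:

```python
def sequence_probability(tokens, next_token_prob):
    """Chain rule: P(t1, ..., tn) = P(t1) * P(t2 | t1) * ... * P(tn | t1..tn-1).

    next_token_prob(context, token) is a hypothetical callable returning the
    model's estimate of how likely `token` is to follow `context`.
    """
    prob = 1.0
    for i, token in enumerate(tokens):
        prob *= next_token_prob(tuple(tokens[:i]), token)
    return prob
```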

N-gram statistical models

An N-gram language model is an accountant-like approach that estimates the probability of a sequence of words based on how many times those words have been seen previously. The estimates are gathered by splitting a training corpus into overlapping sequences of N words each and counting how often each one occurs. For example, a 3-gram or trigram model splits the corpus into three-word sequences.
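Here is a rough sketch of that bookkeeping in Python — a toy version of the counting described above, using a tiny made-up corpus:

```python
from collections import Counter, defaultdict

def build_trigram_counts(words):
    """Count how often each third word follows each pair of context words."""
    counts = defaultdict(Counter)
    for i in range(len(words) - 2):
        context = (words[i], words[i + 1])
        counts[context][words[i + 2]] += 1
    return counts

# Tiny toy corpus; a real model would be trained on a whole book or more
corpus = "chess is a board game for two players and chess is a game of skill".split()
trigram_counts = build_trigram_counts(corpus)
print(trigram_counts[("chess", "is")])  # prints: Counter({'a': 2})
```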

IBM began developing Shannon’s work into the first fully formed language models in the 1980s. The initial motivation was to find and correct errors from their speech recognition systems. Since speech sounds are ambiguous, the recognizer would often output a list of possible words. The language model would then be used to pick the most likely word based on the surrounding context.

How to play the Shannon Game with an N-gram model

N-gram models are capable of playing the Shannon Game. For example, to guess the next word in a message using a 3-gram word model:

  1. Identify the last two context words
  2. Find all the 3-grams that start with those two words
  3. Find the most common 3-gram among those
  4. The third word in this 3-gram is the most likely next word in the hidden message
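A minimal sketch of those four steps, reusing the toy trigram_counts structure from the example above:

```python
def guess_next_word(trigram_counts, context):
    """Return the most likely next word given the trailing context words."""
    last_two = tuple(context[-2:])                # 1. the last two context words
    continuations = trigram_counts.get(last_two)  # 2. all 3-grams starting with them
    if not continuations:
        return None                               # unseen context: the model has no guess
    return continuations.most_common(1)[0][0]     # 3./4. most common continuation

print(guess_next_word(trigram_counts, ["chess", "is"]))  # prints: a
```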

You can also use the model to generate text. For example, you can start with a random word, then use the model to guess the next word, then the next, and so on. What you end up with is a word collage built on tiny overlapping quotes. The results may seem somewhat coherent but also often nonsensical. There’s a logical relationship between neighboring words, but not necessarily within the sentences as a whole.
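A rough sketch of that generation loop, again built on the toy trigram counts above (seeded with two words, since a trigram model needs a two-word context):

```python
import random

def generate(trigram_counts, seed, length=20):
    """Start from a two-word seed and repeatedly append a sampled next word."""
    words = list(seed)
    for _ in range(length):
        continuations = trigram_counts.get(tuple(words[-2:]))
        if not continuations:
            break  # dead end: this two-word context never appeared in training
        choices, weights = zip(*continuations.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate(trigram_counts, ("chess", "is")))
```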

To demonstrate this, I trained a simple 3-gram model using Aesop’s Fables. Here’s an example of random text generated by this model:

The Eagle built her nest at the point of drowning. A Countryman in the Western Patriarch, whose interests he henceforth advocated with so much by the ear, but conquer to your cost. your it, till another is told.

Compare this with the output of another 3-gram model built using Hoyle’s Games Modernized, a book on card games:

Every system has its Waterloo – it is always easy to remember the disadvantage that follows leading from a {282} centre, or lead one of the punters exceed the amount originally placed in actual contact with the queen third in hand.

You can see how the training material is reflected in the random output. The model captures the vocabulary and style of the training text, giving us a clue about its origin. But it’s clear it doesn’t understand the meaning of the words. Words that are close together seem to go together well, but a few words later the meaning can be lost.

As successful as the N-gram technique was for its time, there were issues. The main one was that larger N’s are required to get better models, but a language model with a large N would grow larger than time, storage, and available training data could accommodate.

You can see more examples of text generated from N-gram models in the trivia inset below. Also, if you’d like to try this out yourself, I’ve shared the application I used to make it on the Streamlit sharing platform. You can find it here.

Trivia — Guess the book behind the model

Can you guess which well-known public domain book was used to train the model that generated each of the following random word collages? Answers and explanations at the end of the article.

1. I want to be dwelt on; and between her sisters were of all that has passed off pleasantly to the door, she could hardly have resented a comparison that did not quit her present pursuits are not serious in your present distress, which delighted in anything else. Elizabeth was forced to be at home at any time, though he has not authorized me to show cause.

2. EMILIA. Good faith, how now? What is the very error of the Turkish fleet, and your bags! Thieves, thieves! Look to’t; O heaven defend your good grace shall think To be call’d. Look to’t as well i’ the dark! Where be these bloody thieves? How then?

3. Their irreconcilable hatred of the western armies had no temptation to flatter a minister, who, according to the throne. His short visit the public carriages, and immediately satisfied, unless it were, “that they would still persevere in the Roman legions were mustered by the length of the Roman pontiffs, who recommended him to lay the head of John XII.

4. “Some ships sail from their flaming ship; seemingly seeing in it except at long intervals; but, when, while the Pequod should fairly have entered the heads of all ships separately sailing the seas of life. But to what headsman each of these whale-bone whales no longer snuffing in the ambergris affair Stubb’s the difference now between man’s side…

5. I told Clerval that I was unable to bear me on the fifth from our house near Belrive, the monster of my murdered friends heard and that I should be rash enough to convict her.

6. Maybe it’s the stupidest tea-party I ever saw in my kitchen at all. Soup does very well without — Maybe it’ll do next! As for pulling me out of a well?”

7. He has refused his Assent should be perpetual, by the Congress Information of the Vice-President, or engage in War, conclude Peace, enter into any Agreement or Compact with another, shall appoint Ambassadors, other public Ministers and Consuls, Judges of the Electors shall meet in their respective States, and destroyed the lives of our political fabric depend, and to do with it.

Machine Learning

Photo by Pietro Jeng on Unsplash

Supervised machine learning using neural networks was found to be effective for language modeling. In supervised learning, you train a network to solve a problem by showing it examples. The most important thing in training a neural network is good-quality training data. The more training examples you can provide, the better it is able to learn.

It turns out that the Shannon Game is a fantastic task for supervised neural networks. This is because you can convert any corpus of text into a huge number of training examples – basically every word in the corpus is another example; another opportunity for the model to guess and be trained. With the entire internet available as a training corpus, the sky’s the limit.
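As a small illustration of how a corpus might be turned into (context, next-word) training pairs, one per position:

```python
def make_training_examples(words, context_size=2):
    """Every position in the corpus becomes one (context, target) example."""
    examples = []
    for i in range(context_size, len(words)):
        examples.append((words[i - context_size:i], words[i]))
    return examples

corpus = "chess is a board game for two players".split()
for context, target in make_training_examples(corpus):
    print(context, "->", target)
# ['chess', 'is'] -> a
# ['is', 'a'] -> board
# ...and so on, one example per remaining word in the corpus
```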

How to play the Shannon Game with a neural network model

Using a neural network to play the Shannon Game is a bit more complicated than using an N-gram model. Here’s a simplified version of how it works, with a toy sketch in code after the list:

  1. Train the model on a large corpus of text
  2. To guess the next word in a message:
    1. Encode the context words as a vector of bits – kind of like Morse code
    2. Pass the vector through the model
    3. The output is a probability distribution over all the words the model has seen
    4. The word with the highest probability is the most likely next word
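Here is a toy numpy sketch of the four sub-steps above. The vocabulary is tiny, the weights are random and untrained, and the single linear layer stands in for whatever architecture a real model uses, so treat it as an illustration of the flow rather than a working model:

```python
import numpy as np

vocab = ["chess", "is", "a", "board", "game", "for", "two", "players"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

def one_hot(word):
    """Encode a word as a vector of bits - the Morse-code-like encoding."""
    vec = np.zeros(V)
    vec[word_to_id[word]] = 1.0
    return vec

# Encode two context words as one input vector
context = np.concatenate([one_hot("chess"), one_hot("is")])

# Pass the vector through the model (here, one untrained linear layer)
rng = np.random.default_rng(0)
W = rng.normal(size=(2 * V, V))  # in a real model these weights are learned from the corpus
logits = context @ W

# The output is a probability distribution over the whole vocabulary
probs = np.exp(logits) / np.exp(logits).sum()

# The word with the highest probability is the guess
print(vocab[int(np.argmax(probs))])
```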

You can make models bigger and bigger, feed them more and more text, and they will get better and better at playing the Shannon Game. This requires more and more computational power.

Help for the computation problem came from an unexpected source – video games. Demanding gamers drove the development of powerful graphics processing units (GPUs) needed to render the 3D worlds they liked to shoot up. As it turns out, GPUs are also great at training incredibly large neural networks. Cloud computing providers started buying up computers with several GPUs and renting them out for AI training. This made it possible for researchers to train much larger models than ever before.

The advancement wasn’t just in computational power. Lots of human brain power went into a lot of transformational ideas. One such leap was called learned embeddings. Here, a network trained on the Shannon Game was used to build a kind of code for a word like ‘king’ – a vector of numbers which represents ‘kingness’ – kind of like a personality profile. You do the same for ‘queen’ and all the other words and make a dictionary. With this dictionary you can now train a new neural network using semantic encoding instead of the Morse-code-like encoding, which makes it easier and faster for the second network to learn language-based tasks like the Shannon Game.

Anecdotally, it was found that you could play around with these learned embeddings mathematically, since they are all just vectors of numbers. For example, if you take the embedding for ‘king’, subtract the embedding for ‘male’ and add the embedding for ‘female’, the resulting embedding is similar to the embedding for ‘queen’. Although this kind of example is rare, it did at least suggest that embeddings were capturing some kind of meaningful semantics of the words they represent.
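A toy illustration of that arithmetic, using tiny hand-made vectors (real embeddings are learned from data and have hundreds of dimensions, so this only shows the mechanics):

```python
import numpy as np

# Hypothetical 3-dimensional "personality profiles": [royalty, maleness, femaleness]
embeddings = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "queen":  np.array([0.9, 0.1, 0.8]),
    "male":   np.array([0.1, 0.9, 0.1]),
    "female": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Similarity between two vectors, ignoring their lengths."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

result = embeddings["king"] - embeddings["male"] + embeddings["female"]
closest = max(embeddings, key=lambda w: cosine(result, embeddings[w]))
print(closest)  # with these toy vectors, 'queen' comes out closest
```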

Another leap incorporated an analogy to attention. In a noisy and distracting world, humans are able to focus attention on the things that matter to what they are doing at the moment. The Transformer neural network architecture, introduced by Google in 2017, is one of the most successful attention-based networks. These days, most of the best-functioning large language models use the attention-grabbing Transformer architecture. For example, the T in GPT stands for Transformer.
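For the curious, the core operation inside the Transformer, scaled dot-product attention, is small enough to sketch in a few lines of numpy. This is only the central calculation, not the full architecture:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each position attends to every position, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how relevant each token is to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V  # blend the value vectors by attention weight

# Toy example: 4 tokens, each represented by an 8-dimensional vector
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(X, X, X).shape)  # (4, 8)
```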

Once these tricks were put to use, along with the general computational power and massive textual resources of the 2010s, people were amazed to see what large language models could do.

Going well beyond the simple act of picking the next word, they are able to answer questions, summarize long documents, hold reasonably coherent conversations, and generate new-sounding prose, poetry, and technical writing.

And surprise – they can even play chess!

Conclusion

It started with a game

Played in a parlor by a newlywed couple.

Minutes to learn, and a lifetime to master. For computers, at least. It’s taken nearly 75 years to get to this point.

So, if you like games, don’t be shy about it. The point doesn’t have to be about keeping score or finding the best (though that can be fun too). It’s about training the mind. For a great example of this, check out Rahul Jain’s recent chess-inspired post, ELO as a tool for ranking LLM’s.

After all, look at the Shannon Game. It’s a parlor game that learned to parlez.

💬 Let the conversation begin here, or start a chat on our LinkedIn group!

Further Reading/References

Answers to the quizzes

Trivia

Guess the book behind the model

  1. Pride and Prejudice by Jane Austen
  2. Othello, the Moor of Venice by William Shakespeare
  3. Decline and Fall of the Roman Empire by Edward Gibbon
  4. Moby Dick by Herman Melville
  5. Frankenstein by Mary Shelley
  6. Alice’s Adventures in Wonderland by Lewis Carroll
  7. Declaration of Independence and US Constitution by Thomas Jefferson et al.

Note: For both an explanation and an apples-to-apples comparison of the quality of N-gram model generation vs. LLM generation, I asked a chatbot powered by Anthropic’s Claude language model to identify each of the books from the random text as well, and here are its answers:

  1. This appears to be from Jane Austen’s novel “Pride and Prejudice”, with references to characters like Elizabeth and mentions of things like pursuits and comparisons that align with the novel’s storyline and writing style.
  2. The language and references to “bloody thieves”, “good grace”, and phrases like “O heaven defend” suggest this is from one of Shakespeare’s plays, likely “Othello” given the character name Emilia.
  3. The mentions of Roman legions, pontiffs, hatred of western armies, and a reference to the throne point to this being from Edward Gibbon’s historical work “The History of the Decline and Fall of the Roman Empire.”
  4. The nautical language, whaling references like “whale-bone whales” and “ambergris”, and the name “Pequod” indicate this is from Herman Melville’s classic novel “Moby Dick.”
  5. With the mention of a monster, murdered friends, and the name Clerval, this seems to be from Mary Shelley’s Gothic novel “Frankenstein.”
  6. The whimsical, nonsensical language like “stupidest tea-party” and “pulling me out of a well” is very characteristic of Lewis Carroll’s “Alice’s Adventures in Wonderland.”
  7. The formal legal language, references to the Congress, President/Vice-President, and political fabric clearly identify this as being from the U.S. Constitution.

Note that the Claude language model is reportedly named after Claude Shannon.
