Filling in the blank with a Cloze Test

Assessing if your words make sense

Ashwini Kamath

Published in

IBM Design

5 min read · Nov 18, 2019

I am a user researcher on a product team that was tasked with establishing a content style guide in order to:

Establish a consistent language when presenting information to customers
Create a content model that resonates with customers

In a true user-centered fashion, the team started thinking about ways to engage customers and potential customers in a co-creation process.

User research is often treated as synonymous with interviews and field research, and it is notorious for taking a few weeks to deliver an outcome. I love learning about methods that bust that myth by getting feedback more quickly, and the Cloze test was one such opportunity.

Tom Roach, a designer on the product team I work with, was looking to test the terminology used for some of the key concepts in the product. There was a lot of debate within the team about what the “right” terms were, and we decided to leverage our users to help us decide.

In short, we wanted to know if our users understood the terms we had come up with, or if we needed to keep exploring.

First, we evaluated some of the widely used statistical comprehension and readability testing methods like the Flesch-Kincaid Index and the SMOG Index and realized that:

These methods are meant for testing paragraphs of text, not just labels or terms by themselves.
These methods evaluate if the text is readable and don’t necessarily indicate if it is understandable/comprehensible.
They also do not help evaluate alternative options.
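To illustrate the first point, here is a minimal sketch of the widely published Flesch-Kincaid grade-level formula. The counts passed in are made-up inputs, not data from our study. Because the formula relies on words per sentence and syllables per word, a single label produces a score driven almost entirely by its syllable count.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
passage_grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)

# Hypothetical three-syllable label treated as a one-word "sentence".
label_grade = flesch_kincaid_grade(words=1, sentences=1, syllables=3)

print(round(passage_grade, 2), round(label_grade, 2))
```

The label scores far higher than the passage even though nothing about it is known to be harder to understand, which is why a formula like this is a poor fit for testing individual terms.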

Sticking to the spirit of the IBM Loop, we wanted to establish a process which would allow us to quickly and effectively test and iterate until the terminology was clear to our users. Also, phase 1 was testing only a few key concepts, so we needed something that could be easily repeated for future phases. Tom suggested the Cloze Test. I had read about this before but had never applied it.

So, what exactly is a Cloze Test?

A test in which one is asked to supply words that have been removed from a passage in order to measure one’s ability to comprehend text.
(Source: Oxford Dictionary)

A Cloze Test is like Mad Libs or fill-in-the-blanks. Participants add in words that have been removed from a passage, thereby demonstrating their understanding and expectations of the content.

Example of a Cloze Test passage with words removed

How we adapted the Cloze Test

Usually, a Cloze Test is open, wherein participants can fill in any word they think fits in the space. To add some structure, we decided to try out a closed Cloze Test, where we would provide a list of options to choose from. This also aligned with our goal of evaluating different options for each term. So we drafted a paragraph that included 10 key concepts, with 4–5 alternative options for each concept.

Example Options List

For each word, we provided a list of options that participants could choose from. Participants could also add their own words if none of the words in the list resonated with them.
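One way to picture a closed Cloze Test is as a passage with numbered blanks plus an options list per blank. The sketch below is only an illustration: the passage, the terms, and the helper names `fill_blanks` and `is_write_in` are all hypothetical and are not the ones we used.

```python
import re

# Hypothetical passage and options, invented for illustration only.
PASSAGE = "Upload your data to a {1}, then share the {2} with your team."

OPTIONS = {
    1: ["project", "workspace", "catalog", "repository"],
    2: ["link", "asset", "report", "dashboard"],
}


def fill_blanks(passage: str, answers: dict[int, str]) -> str:
    """Replace each {n} placeholder with the chosen word, or a blank if unanswered."""
    def replace(match: re.Match) -> str:
        return answers.get(int(match.group(1)), "_____")

    return re.sub(r"\{(\d+)\}", replace, passage)


def is_write_in(blank: int, answer: str) -> bool:
    """True when a participant supplied their own word instead of a listed option."""
    return answer not in OPTIONS[blank]


print(fill_blanks(PASSAGE, {}))
print(fill_blanks(PASSAGE, {1: "workspace"}))
```

Tracking write-ins separately keeps the option of spotting terms that none of the listed alternatives covered.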

Method

We published the test to an online testing tool in a survey-like format so that participants were able to see the entire paragraph at all times. For each missing word, participants were asked to select the best fit from the list of options provided to them. They were able to navigate between words if they changed their minds along the way.

Screenshot from a session showing the participant’s screen with the paragraph and the list of options for a word

The sessions were unmoderated, and we recruited thirty participants who represented the product’s target demographic.

Metric for success

We adapted the Nielsen benchmark for comprehension, where each word must get at least a 60% agreement score:

Nielsen says, “If users get 60% or more right on average, you can assume the text is reasonably comprehensible for the specified user profile employed to recruit test participants.”

We adapted it to use 60% agreement among participants as our threshold:

If the agreement score between participants is 60% or more, you can assume the text is reasonably comprehensible for the specified user profile employed to recruit test participants.

Method to the madness

Also known as synthesizing the results and playing back to stakeholders

For each option of each word, we calculated an agreement score: the percentage of participants who chose that option.
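As a rough sketch of that calculation, the snippet below tallies the choices for a single blank and checks the top option against the 60% benchmark. The responses are invented for illustration, and the function name `agreement_scores` is an assumption, not part of any tool we used.

```python
from collections import Counter


def agreement_scores(choices: list[str]) -> dict[str, float]:
    """Percentage of participants who chose each option for one blank."""
    total = len(choices)
    if total == 0:
        return {}
    counts = Counter(choices)
    return {option: 100 * n / total for option, n in counts.items()}


# Hypothetical responses from thirty participants for one blank.
choices = ["workspace"] * 19 + ["project"] * 8 + ["catalog"] * 3

scores = agreement_scores(choices)
top_option, top_score = max(scores.items(), key=lambda item: item[1])
meets_benchmark = top_score >= 60

print(top_option, round(top_score, 1), meets_benchmark)
```

With thirty participants, an option needs at least 18 votes to reach 60% agreement.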

For presenting the results to stakeholders, the word that got the top votes was filled into the blank, but we used color to identify if the word had met our “60% or greater agreement score” metric. This was a simple but important detail, which served as a reminder that a top vote didn’t always mean high agreement. We may need to go through multiple words until we get significant agreement.

Green: high agreement, i.e., 61–100% of participants chose this option
Orange: medium agreement, i.e., 31–60% of participants chose this option
Red: low agreement, i.e., 1–30% of participants chose this option
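The color bands above can be sketched as a small lookup. The function name `agreement_band` and the sample scores are hypothetical. The handling of fractional scores is also an assumption: anything above 60 is treated as green and anything above 30 as orange, so an exact 60% lands in orange, as the bands are listed.

```python
def agreement_band(score: float) -> str:
    """Map an agreement score (0-100) to the color used in the playback."""
    if score > 60:
        return "green"   # high agreement
    if score > 30:
        return "orange"  # medium agreement
    return "red"         # low agreement


# Hypothetical top-voted scores for three blanks.
top_scores = {"blank 1": 63.3, "blank 2": 46.7, "blank 3": 26.7}
bands = {blank: agreement_band(score) for blank, score in top_scores.items()}

print(bands)
```

If an exact 60% should count as meeting the “60% or greater” metric, the first comparison would become `score >= 60`. The bands as listed do not settle that edge case.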

Afterthoughts

This simple method helped us gather feedback on the effectiveness of the terminology relatively quickly.

After listening to the sessions, I realized how important it is for product-related Cloze tests to recruit participants who represent your demographic, even if your paragraph is simple. The context and subject matter expertise they bring reduce the chance of guesswork.
Listening to the recordings of a few sessions and the participants’ rationale as they made their choices was very insightful.
Last but not least, it’s important to align as a team on the important terms that you want to test.

Who knew a quick Mad Libs-like exercise could be leveraged to inform enterprise product design?

Resources we used to learn about Cloze Testing:

Ashwini Kamath is a Sr. Design Researcher who works on IBM Data and AI products in Austin, TX. The above article is personal and does not necessarily represent IBM’s positions, strategies, or opinions.