How Algorithms Help Us Measure Creativity

Julian Leßmann
7 min read · Apr 6, 2022


Would you consider yourself a creative person? Many of us immediately think of artists painting beautiful pictures like the one below and would say no. But there are many more dimensions to creativity than the artistic side, and at the end of the day none of this really matters as long as you're enjoying life. Accordingly, there is no need to measure creativity at all, right? Right? Well, kind of. Scientific studies often need to measure creativity, for example when it comes to the connection between creativity and happiness.

Photo by ractapopulous on Pixabay

And as it turns out, there are indeed scientifically valid ways to quantify creativity. But these often require manual evaluation, which makes them labour-intensive, error-prone, and vulnerable to bias.
In this article I will explore a common way to think about creativity, some proven procedures to measure it, and a new approach that relies on computational algorithms rather than manual analysis.

Convergent and divergent thinking

Creativity can simply be defined as the ability to create something new. Following this broad definition, it's difficult to measure creativity in its entirety; we can only approach it from different directions. In this sense it is comparable to physical fitness: we can conduct tests to measure how fit someone is in a specific category, but we're not able to say "person A is X% more fit than person B". The term is too broad, and the same goes for creativity.

While there are many different approaches to understanding the matter and process of creativity in more detail, there is one that is especially well suited to scientific measurement. The idea is to distinguish between convergent thinking and divergent thinking. Convergent thinking describes finding a single solution to a given task. Divergent thinking, on the other hand, aims at generating multiple answers to a given problem.
The theory combines these two to explain the generation of creative output. It’s important to note that this is one working hypothesis among many and, on its own, does not allow us to fully understand creativity.

How can we measure creativity?

One common test for convergent thinking is the Bridge-the-Associative-Gap Task (BAG), in which you're given two words and have to find a related third one. For example: giraffe and scarf are both related to neck.
The remote associates test is similar; here you're given three words and have to find a fourth one that connects them. Try it for yourself: broken, clear, eye. Too easy? Try another one: wise, work, tower. You can find the solutions at the bottom of this section.

The classical approach to measuring divergent thinking is the Alternative Uses Task (AUT). It challenges you to come up with as many uses for an everyday object as possible. A brick, for example, could be used as a doorstop, a paperweight, or for throwing through a window. The answers then get manually assessed based on their

  • fluency, the total number of use cases generated,
  • flexibility, the number of distinct categories of uses (hard to evaluate objectively),
  • originality, how rare a use case is compared to the rest of the sample, and sometimes also on their
  • elaboration, the amount of detail given in the answers (a small scoring sketch follows below the list).
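
To make the fluency and originality criteria a bit more concrete, here is a minimal scoring sketch in Python. Everything in it (the responses, the "nobody else mentioned it" rule for originality, the function name) is made up for illustration; real studies rely on trained human raters and sample-based cut-offs.

    from collections import Counter

    def score_aut(responses, all_responses):
        """Toy AUT scoring: fluency is the number of uses given, originality
        counts the uses that nobody else in the sample mentioned.
        Flexibility and elaboration still need human judgement, so they
        are left out here. (Hypothetical scheme, not a published one.)"""
        fluency = len(responses)

        # How often does each use appear across the whole sample?
        counts = Counter(use for person in all_responses for use in person)

        # A use counts as "original" here if only one participant mentioned it.
        originality = sum(1 for use in responses if counts[use] == 1)
        return {"fluency": fluency, "originality": originality}

    # Example: three made-up participants answering "uses for a brick"
    sample = [
        ["doorstop", "paperweight", "build a wall"],
        ["doorstop", "build a wall"],
        ["doorstop", "paperweight", "nutcracker", "bookend"],
    ]
    print(score_aut(sample[2], sample))  # {'fluency': 4, 'originality': 2}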

You can try the Alternative Uses Task for yourself here.

However, there are some problems with the AUT. People who work with the given object on a regular basis have an unfair advantage. In addition, it is culture-dependent: ten-ish years ago it was uncommon to use a paper clip to eject the SIM card tray of a smartphone; this has changed over time. Furthermore, the originality scoring depends on the other answers in the same sample, which makes scores hard to compare across studies. And from the perspective of the evaluation process, one can argue that it is time-intensive, as multiple judges are required to ensure reliability.
To tackle these limitations, recent efforts have focused on using computational algorithms in the evaluation.

The solutions for the remote associates task are glass and clock. Click here for more tasks.

Using technological assistance

There are some obstacles to automatically processing results of the AUT, though. The meaning of a word often depends on its context, for example: "Time flies like an arrow; fruit flies like a banana" (source). The same issue appears when you play with the order of words: using a pen to "record a break" is not very creative, while using it to "break a record" is much harder to come up with. These subtleties are incredibly difficult to interpret correctly with algorithms, including neural networks.

However, single-word responses are pretty easy to process with computational assistance. And this is where a team of researchers came up with a neat solution. The newly developed Divergent Association Task (DAT) challenges you to come up with ten nouns that are as different from each other as possible, in all meanings and uses of the words. If you're curious about the test you can take it here. I recommend taking it right away, as this article contains spoilers regarding the processing and algorithms behind it. Here you can find the entire DAT paper.

The way the DAT works is that it takes the first seven valid nouns out of the ten input words. This makes the scoring more tolerant of invalid inputs such as misspelled words or terms unknown to the model.
The algorithm then calculates the semantic distance between all pairs of the remaining seven words. The idea is to quantify how closely two words are related, or "how much does term A have to do with term B". Pizza and burger, for example, have a smaller distance than pizza and river.
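
To illustrate the first step, here is what such a validity filter could look like in Python. This is only my own sketch under a few assumptions (lower-casing, dropping duplicates, checking against the model's vocabulary); the official DAT implementation handles more cases, such as what counts as a valid English word in the first place.

    def first_valid_words(words, vocabulary, needed=7):
        """Keep the first `needed` words that the scoring model actually knows.
        `vocabulary` is any container supporting `in`, e.g. the keys of a
        word-vector dictionary. Hypothetical helper, not the official DAT code."""
        valid = []
        for word in words:
            w = word.strip().lower()
            if w in vocabulary and w not in valid:  # skip unknown words and duplicates
                valid.append(w)
            if len(valid) == needed:
                break
        return valid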

This calculation is performed with an algorithm called GloVe, which involves some pretty heavy math. If you're into that, here's the paper and the concrete implementation on GitHub. GloVe learns a vector for every word from the text of millions of websites, based on which other words it tends to appear alongside. The semantic distance between two words is then simply the distance between their vectors.
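
If you'd like to play with the distances yourself, a minimal version could look like the snippet below. I'm assuming the publicly available glove.840B.300d.txt vector file and plain cosine distance between word vectors; this is my own sketch, not the authors' code.

    import numpy as np

    def load_glove(path="glove.840B.300d.txt", dim=300):
        """Load pre-trained GloVe vectors from a plain-text file in the
        usual 'word v1 v2 ... v300' format. Parsing from the right keeps
        the occasional multi-token entry from breaking the loader."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                word = " ".join(parts[:-dim])
                vectors[word] = np.asarray(parts[-dim:], dtype=float)
        return vectors

    def semantic_distance(a, b, vectors):
        """Cosine distance between two word vectors: small for related
        words, close to 1 (or above) for unrelated ones."""
        va, vb = vectors[a], vectors[b]
        similarity = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
        return 1.0 - similarity

    # vectors = load_glove()
    # semantic_distance("pizza", "burger", vectors)  # smaller than ...
    # semantic_distance("pizza", "river", vectors)   # ... this one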

So the DAT takes the first seven valid words from your input and compares them in pairs. That gives 21 possible combinations and therefore 21 values for the semantic distance. The average of these distances is then taken as the final score. Higher scores represent a better ability to come up with remote associations, or to inhibit overly related ones; in short, creativity.
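
Putting the pieces together, a toy version of the whole scoring step might look like this. It reuses the two hypothetical helpers sketched above, and the factor of 100 is my reading of the score range shown on the official site; the authors' published code is the authoritative reference.

    from itertools import combinations

    def dat_score(words, vectors):
        """Toy DAT-style score: average pairwise cosine distance of the
        first seven valid words, scaled by 100. Returns None if fewer
        than seven usable words were given."""
        valid = first_valid_words(words, vectors, needed=7)
        if len(valid) < 7:
            return None

        pairs = combinations(valid, 2)  # 7 choose 2 = 21 pairs
        distances = [semantic_distance(a, b, vectors) for a, b in pairs]
        return 100 * sum(distances) / len(distances)

    # Example (with vectors loaded as above):
    # dat_score(["pizza", "river", "violin", "algebra", "umbrella",
    #            "senate", "plankton", "doorknob", "lava", "quilt"], vectors)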

My result of the DAT with a score of 82.55

Is it any good?

The DAT results correlate well with the fluency, flexibility, and originality scores of the AUT. The same was shown for the BAG answers and the remote associates test.

So the DAT might actually be a valid alternative for testing divergent thinking. Thanks to its algorithmic evaluation it requires less time than manually scored tasks and removes rater bias, which is great. Also, the AUT can be gamed by 'spamming' answers, which inflates fluency and originality scores (because there's a greater chance of hitting an answer nobody else submitted). This is not possible with the DAT, as it requires the same number of inputs from every participant. Most of the participants in the study also said they enjoyed taking the test, so that's an advantage as well.

On the downside, the DAT measurement is only as good as the GloVe algorithm and its input. If you feed it nonsense, the results will be nonsense as well, and unfortunately, not everything published online is of high quality. I think this concern is negligible, though, as the source texts were specifically chosen.
Another disadvantage is that the DAT can easily be gamed once you know how the algorithm operates. For instance, you could feed it technical terms that are rarely used outside their subject domain.
One thing I've been thinking about is differences in demographics and DAT results. Some dimensions (age, gender) have been investigated by the authors, but others remain untouched. My hypothesis would be that people with a higher level of education are familiar with a wider vocabulary, more technical terms, and more abstract constructs, and will thus achieve better DAT results.

All in all, I think the Divergent Association Task provides a great alternative, if not a substitute, for traditional divergent thinking tests such as the Alternative Uses Task. I love the idea and the simplicity behind this approach. It's also interesting to note that computational algorithms help us evaluate creativity, while some would argue that bits and bytes cannot even generate creative results. Looking at the results achieved with GPT-3 and the related model DALL·E, I'd generally disagree with that. On the other hand, one could argue that GPT-3 just mirrors the stories we've been telling ourselves as humans.

So what do we do with it now? For me, the DAT might have some value in seminar situations. From time to time, I lead Business Model Innovation workshops using these business model patterns to challenge an existing business model. This creative process is divergent thinking par excellence, and I'll definitely use the DAT as a creative warm-up next time.
Apart from that, I really enjoyed taking the tests myself and diving into the topic of creativity and its measurement from a scientific perspective.

I’d like to leave you with one last thought: when we’re talking about measuring creativity there’s something we should not forget.

“Comparison kills creativity.”

Karen Walrond

Being able to measure creativity does not mean we should use it to compare ourselves with each other. It's tempting to benchmark ourselves on a one-dimensional scale, but just as with IQ, such a comparison does not help anyone. Just enjoy life and love yourself. Cheers!

