A Test for Artificial Creativity

Can a machine create a crossword puzzle as pleasing as one by Will Shortz?

Backdrop: From Automated Computing to Automated Intelligence

In the 1940s, the first fully electronic computing machines allowed for the rapid calculation of solutions to the differential equations needed for in-theatre artillery firing and, of course, for building the first atomic bomb. Human computers, once the lifeblood of precision numerical computation, were becoming a bottleneck that the military complex could not afford given the time pressures of the war. Ever since, and even now at the dawn of the “Big Data” era, electronic computation, thanks to breakthroughs in hardware and methodologies, has kept pace with many of the commercial needs and with the utility of the data collected. However, numerical computing for the purposes of simulation and analysis, whatever the scale and whether done by people or machines, does not imply an understanding of the results. Brute-force mastery of chess by machines is as effective, and as dumb, as a sledgehammer. Doing math on data connotes neither inference nor intelligence. Inference is left to the humans running the computation, who interpret the results in light of a physical theory or preconceived model.

The field of artificial intelligence, with roots in Alan Turing’s initial papers positing the modern precepts of computation using hardware and software, grew out of a broader curiosity about the roles and capabilities of machines. Could machines understand and react to language? Could machines be taught to answer inferential questions (“machine learning”), mimicking and outsmarting the insights of people? Could machines assimilate disparate knowledge and facts and draw conclusions? Could machines think? This academic field has matured tremendously in the past two decades, just in time to help relieve the mounting pressure to make decisions quickly on a vast amount of streaming data. Imagine if we needed humans to look at all our email to decide what is spam and what is not. Or if we needed someone working at Amazon as a personal concierge for every customer to make suggestions for us. Indeed, practical, scalable implementations of machine learning are flourishing throughout industry. Just as electronic computers replaced human computers, it appears that electronic thinking machines are replacing knowledge workers. I and many others have pondered some of the societal implications of this not-so-distant future.

Machines that Ask Questions

Fundamentally, machine learning helps answer questions that humans pose. Can machines do even more than help make decisions? Can they learn to ask fundamentally important (and new) questions that people have not thought of or cannot think of? Most of us would find this preposterous (and certainly subversive to the role that scientists play in hypothesis-driven inquiry), in part because we believe that asking great questions is an art form: it is a creative process.

The question of whether machines can be “creative” and inventive could be construed as the next frontier. Indeed, artificial (or computational) creativity, as a field of study, is starting to gain traction. Can machines write poetry? Draw something that astonishes? In reading through some past scholarly work on the subject, it is clear that people are trying to build creative machines, but it seems that the field still has not settled on a set of metrics for whether artificial creativity is actually working or not.

One of the wonders of machine intelligence is that Turing proposed a test of it before anyone knew how to build it: in a carefully constructed laboratory setting, an intelligent machine would be indistinguishable from a person in a conversation. In today’s software engineering parlance, Turing set up what we call “test-driven development”: establish a test for what we want something to do, then build that something until the test passes.
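To make the test-first idea concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the function names, the idea of scoring a blind test by how often judges correctly spot the machine, and the 10% margin above chance. It is not Turing's protocol, just the shape of "write the test, then keep building until it passes."

```python
from typing import Dict, Optional, Sequence

CHANCE = 0.5  # accuracy of a judge guessing blindly between person and machine


# Step 1: write the test before any machine exists.
def passes_blind_test(judge_was_right: Sequence[bool], margin: float = 0.1) -> bool:
    """Pass when judges identify the machine no better than chance plus a margin.

    The margin is an arbitrary illustrative choice, not an established threshold.
    """
    if not judge_was_right:
        raise ValueError("at least one judged trial is required")
    accuracy = sum(judge_was_right) / len(judge_was_right)
    return accuracy <= CHANCE + margin


# Step 2: keep building until the test passes.
def first_passing_build(builds: Dict[str, Sequence[bool]]) -> Optional[str]:
    """Return the name of the first build whose judged trials pass, else None."""
    for name, verdicts in builds.items():
        if passes_blind_test(verdicts):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical judging results: True means the judge spotted the machine.
    builds = {
        "v1": [True] * 9 + [False],   # judges right 90% of the time
        "v2": [True, False] * 5,      # judges right 50% of the time
    }
    print(first_passing_build(builds))
```

The point of the design is that `passes_blind_test` can be written, argued over, and agreed upon long before anyone knows how to produce a build that satisfies it.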

The Crossword Test

Along the same lines, I’d like to propose a blind test for artificial creativity: a creative machine builds a crossword puzzle, and the test is said to pass if the puzzle is indistinguishable from one created by a top puzzle maker. The one rule is that no clue can be knowingly reused verbatim from any previously published puzzle (otherwise a computer could brute-force build a crossword from a huge library of previous puzzles). Why would passing this test be a notable measure of creative accomplishment? Because clues themselves are artful. They surprise, they delight, they test the range of knowledge of the creator and the solver. They challenge semantics and our playfulness with language and popular, shared culture. Case in point:

34 Down: First name in Twerking?

Answer: Miley

(I’m not a puzzle maker, but I think this clue could be awesome in a New York Times puzzle, Mr. Shortz).

Clues are not islands unto themselves: they interact in the puzzle physically but also conceptually. Referencing 34 Down, we might devise a clue related to Ms. Cyrus’s past.

30 Across: 41st state

Answer: Montana

The M in Montana and the M in Miley could intersect on the page, just as Miley Cyrus overlapped with Hannah Montana in her first big role. (Again, I’m not a crossword puzzle maker…)
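The physical half of that interaction is easy to mechanize; the conceptual half is the hard part. As a small sketch (the function name and return format are my own choices for illustration), this finds every cell where an across answer and a down answer could legally cross because they share a letter:

```python
from typing import List, Tuple


def crossings(across: str, down: str) -> List[Tuple[int, int, str]]:
    """Return (index in across, index in down, letter) for each shared letter."""
    a, d = across.upper(), down.upper()
    return [
        (i, j, letter)
        for i, letter in enumerate(a)
        for j, other in enumerate(d)
        if letter == other
    ]


if __name__ == "__main__":
    # The two answers from the example clues above.
    print(crossings("MONTANA", "MILEY"))  # → [(0, 0, 'M')]
```

For these two answers the M is the only letter in common, so the first cell of each word is the only place they can meet.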

Creative machines would also be able to create new phrases that trip off the tongue but have never been uttered before.

12 Across: Ho-hum visit to godfather of computing birthtown?

Answer: TriteTuringTour

One of the things I like about the crossword test is that it is highly constrained: there are a finite number of clues to be generated, and the mechanics of evaluation are straightforward. Art always lives in context: the canvas, the medium, the epoch in which it is created, the audience, the creator, the viewer, all of which serve to both constrain and liberate the artist (in this context, Colton, Wiggins and others have argued against blind validation of artificial creativity). It would also be simple to conduct the test. And, at the time of writing, I have no idea how anyone could program a machine to win it.
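The one rule of the test, at least, is simple to check by machine. Here is a rough sketch, assuming we have the candidate puzzle's clues keyed by slot and some library of previously published clues. The function names, the tiny made-up library, and the decision to ignore case and extra whitespace when deciding what counts as "verbatim" are all my own assumptions.

```python
from typing import Dict, Iterable, List


def normalize(clue: str) -> str:
    """Fold case and collapse whitespace (an assumed reading of 'verbatim')."""
    return " ".join(clue.casefold().split())


def reused_clues(puzzle: Dict[str, str], published: Iterable[str]) -> List[str]:
    """Return the slots whose clue already appears in the published library."""
    seen = {normalize(clue) for clue in published}
    return [slot for slot, clue in puzzle.items() if normalize(clue) in seen]


if __name__ == "__main__":
    candidate = {
        "34 Down": "First name in Twerking?",
        "30 Across": "41st state",
    }
    # A made-up stand-in for a real archive of published clues.
    library = ["41st  State", "Capital of Norway"]
    print(reused_clues(candidate, library))  # → ['30 Across']
```

A puzzle with an empty result obeys the rule. Whether it delights anyone is the part no such check can tell us.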

One of the remaining places in society for those displaced from knowledge work is in creative work. At some level, it’s easy to believe that we do not lack a sufficient number of creative workers, and so an excess of creatives would mean that they are paid even less than they are now. However, in an age with more creative workers, we might expect demand for personalized creative work to grow: a poem written just for you at this moment in time, or an original painting for everyone in your family on their birthday, using colors and themes they individually love. A TV show auto-generated for you and your spouse to watch together. The recommendation engine becomes the personalized art engine. And so while we have no need for artificial creativity now, in a decade or two we might really need it.