Creative AI: let’s automate storytelling

Lancelot Salavert
My Messaging Store Blog
Nov 9, 2015


Robot writers that generate basic stories based on data interpretation are flourishing in some businesses. What about getting to full-length creative novels?

At dawn on the 17th of March 2014, the inhabitants of Los Angeles were woken by the tremors of an earthquake. Within three minutes, the Los Angeles Times website had already published an initial post on the subject. How could it have published such a detailed report in so little time? Well, a clue was embedded in the last sentence of the article: “This information comes from the USGS Earthquake Notification Service and this post was created by an algorithm written by the author.”

The latter is a Los Angeles Times staff member who is not only a journalist but also a computer programmer. What happened is that various seismographs along the Californian coast automatically sent data to the USGS servers, which translated them into figures. The author's algorithm then selected the most relevant information and automatically drafted an article in proper English from it. From there, the author just had to review it and hit send.
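To make the idea concrete, here is a minimal Python sketch of how such a bot could turn a handful of figures into a sentence. It is not the Times' actual code: the `QuakeAlert` fields, the size thresholds and the sample values are all assumptions made up for illustration, and nothing is fetched from the USGS.

```python
from dataclasses import dataclass


@dataclass
class QuakeAlert:
    """Hypothetical record holding the figures extracted from an alert."""
    magnitude: float
    distance_km: int
    place: str
    local_time: str


def describe_size(magnitude: float) -> str:
    # Illustrative thresholds only, not an official scale.
    if magnitude >= 6.0:
        return "strong"
    if magnitude >= 4.0:
        return "moderate"
    return "light"


def draft_post(alert: QuakeAlert) -> str:
    """Fill a fixed sentence template with the figures from the alert."""
    return (
        f"A {describe_size(alert.magnitude)} earthquake of magnitude "
        f"{alert.magnitude:.1f} was reported at {alert.local_time}, "
        f"{alert.distance_km} km from {alert.place}. "
        "This draft was generated by an algorithm and awaits human review."
    )


# Made-up sample values, not the real 2014 readings.
print(draft_post(QuakeAlert(4.2, 8, "Sampletown", "6:25 a.m.")))
```

The human stays in the loop: the function only produces a draft, and publishing remains a separate, manual step.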

As a matter of fact, this small incident got a lot of media attention because, at that time, the Times was letting a lot of journalists go due to financial problems. It did not take more than that coincidence for some people to claim that robots were replacing them. To what extent is that accurate? How much of today's written material is partly or fully automated?

The science behind robot writing was largely developed by artificial intelligence specialists at Northwestern University in Illinois. Professor Larry Birnbaum, joint head of the Intelligent Information Laboratory, was among the inventors of the Quill system. This technology became the first product of a company called Narrative Science, which was founded in Chicago in 2010.

The way it works can be broken down into four steps:

  1. First, Quill needs structured data to be imported (tables, lists, graphs), such as company accounts or sports box scores.
  2. Quill will then carry out narrative analysis. “Data is sorted and ranked using a method which focuses exclusively on building a narrative,” explains Birnbaum. “It selects certain facts, underlines actions, highlights figures.”
  3. Then, based on these key facts, it will generate a narrative. “The algorithms define a plan, with a list of facts,” Birnbaum adds. “Then, thanks to a modelling process, they choose the appropriate editorial angles. In practice the result is a mixture of words, lines of code, graphs — a representation which only machines can understand.”
  4. Last but not least, it will draft a report in proper English. “To compose sentences it has a library of rules, words and turns of phrase, taken from everyday English, but also specialist professional terminology,” Birnbaum says.
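
The four steps above can be sketched as a toy pipeline in Python. This is only an illustration of the general shape and makes no claim about how Quill is implemented: the box score, the field names, the weights, the angle thresholds and the tiny phrase library are all invented for the example.

```python
# Step 1: structured input (made-up numbers, hypothetical field names).
BOX_SCORE = {
    "home": {"name": "Hawks", "runs": 7, "hits": 11},
    "away": {"name": "Owls", "runs": 2, "hits": 5},
}


def analyse(score):
    """Step 2: derive facts from the data and rank them by importance."""
    home, away = score["home"], score["away"]
    # Assumes there is always a winner (no ties), as in baseball.
    winner, loser = (home, away) if home["runs"] > away["runs"] else (away, home)
    facts = [
        {"kind": "result", "winner": winner["name"], "loser": loser["name"],
         "score": f"{winner['runs']}-{loser['runs']}", "weight": 10},
        {"kind": "margin", "value": winner["runs"] - loser["runs"], "weight": 5},
        {"kind": "hits", "team": winner["name"], "value": winner["hits"], "weight": 2},
    ]
    return sorted(facts, key=lambda fact: fact["weight"], reverse=True)


def plan(facts, max_facts=2):
    """Step 3: choose an editorial angle and keep the top-ranked facts."""
    margin = next(f["value"] for f in facts if f["kind"] == "margin")
    angle = "blowout" if margin >= 5 else "close" if margin <= 1 else "win"
    return {"angle": angle, "facts": facts[:max_facts]}


# Step 4: a very small "library of turns of phrase", one verb per angle.
VERBS = {"blowout": "routed", "close": "edged", "win": "beat"}


def realise(story_plan):
    """Step 4: turn the machine-readable plan into an English sentence."""
    result = next(f for f in story_plan["facts"] if f["kind"] == "result")
    verb = VERBS[story_plan["angle"]]
    return f"The {result['winner']} {verb} the {result['loser']} {result['score']}."


print(realise(plan(analyse(BOX_SCORE))))  # The Hawks routed the Owls 7-2.
```

Note that the plan returned by `plan` is a plain dictionary rather than text, which mirrors Birnbaum's remark that the intermediate representation is something only machines can read.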

This last stage might sound impressive, but most of the underlying technical complexity lies in the third step. Long story short, Quill can be considered one of the first advanced natural language generation platforms powered by artificial intelligence.

Narrative Science's primary focus was initially on generating baseball game recaps. Since then, its team has also taken an interest in finance, another field that is backed by large sets of organized data and requires regular (if not repetitive) reporting. Hence, for a couple of years already, Forbes.com has been posting material authored by Narrative Science, and a number of banks, brokers and rating agencies use Quill (anonymously, of course) to draft the countless reports required by the federal administration. Finally, Narrative Science also took an interest in online marketing: in March 2014, it launched Quill Engage, a Google Analytics application that delivers plain-English, narrative-style reports for website owners.

Kris Hammond, cofounder of Narrative Science, recently claimed that by 2025, 90% of the news read by the general public would be generated by computers. According to him, this shift will occur not by replacing existing journalists but through a drastic increase in the volume of published material, enabled by broader media coverage capabilities. But the real deal will come when such technology merges with individual tracking through big data and the internet of things. According to Hammond, one day there will be millions of articles generated from the same template, each one customized for its reader based on their interests, habits and routine. We will be shifting from one article read by millions of people to millions of articles each read by one person only.

Of course, competitors to Narrative Science did not take long to appear. Companies competing in the narrative analytics industry include Automated Insights, headquartered in Durham, North Carolina, which sells a very similar system called Wordsmith, as well as the French startup Yseop. With this expansion, the technology is starting to be used pretty much anywhere there is a reasonable amount of data and a need for short, standardized text. Hence Wordsmith is used by Yahoo for Fantasy Sports, a game in which players create their dream football team using the professional profiles of real athletes, then compete in fictitious games against virtual teams fielded by other players. Automated Insights also leverages its technology for property advertisements. Yseop, for its part, offers reports in no fewer than 5 different languages. On its website, it offers small demos of country descriptions that are automatically and endlessly generated. According to Yseop CEO Jean Rauscher, we will soon see collaborative work between journalists and robot journalists. “If the algorithm realises some data is missing it will stop and ask for it. Once it has what it needs, it goes back to work,” Rauscher explains.
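
The "stop and ask" behaviour Rauscher describes can be pictured with a short Python sketch. It is purely hypothetical and not based on Yseop's product: the required fields, the `ask` callback and the report sentence are assumptions chosen for the example.

```python
# Hypothetical list of fields the report cannot be written without.
REQUIRED_FIELDS = ("company", "quarter", "revenue")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


def draft_report(record, ask):
    """Write a one-line report, asking a human for any missing value first.

    `ask` is a callback that receives a field name and returns its value,
    for example by prompting a journalist.
    """
    data = dict(record)  # work on a copy, leave the caller's data untouched
    for name in missing_fields(data):
        data[name] = ask(name)  # pause here until the human answers
    return f"{data['company']} reported revenue of {data['revenue']} for {data['quarter']}."


# Made-up record with one gap; the lambda stands in for the human collaborator.
incomplete = {"company": "Acme Corp", "revenue": "12 million dollars"}
print(draft_report(incomplete, ask=lambda name: "Q3 2015"))
```

In an interactive setting, `ask` could simply wrap Python's built-in `input`.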

Another Parisian startup, Labsense, specializes in online marketing content for SEO. It helps websites with large catalogues and countless products by providing them with complete, detailed and unique descriptions. “For example, almost 300,000 hotels worldwide are listed on travel sites but many of them lack a proper introductory text, or if they do have one it’s the same on all the sites,” Edouard de Ménibus, cofounder of Labsense, says. “Our system drafts a different text for each hotel and each site.” In that case, the end user is not so much the potential traveller as the Google bots that determine search ranking. “Just for hotel blurbs we’ve produced more text than a human could write in several lifetimes,” he adds.
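
One naive way to get "a different text for each hotel and each site" is to rotate through a set of sentence templates. The Python sketch below does just that. It is an assumption-laden toy, not Labsense's method: the templates, the hotel fields and the site names are invented, and a real system would need far more variety than three sentences.

```python
import zlib

# Hypothetical sentence templates; each one presents the same facts differently.
TEMPLATES = [
    "Located in {city}, {name} offers {rooms} rooms.",
    "{name} welcomes guests to its {rooms} rooms in {city}.",
    "With {rooms} rooms, {name} is a place to stay in {city}.",
]


def blurbs_for(hotel, sites):
    """Return one blurb per site, rotating templates so no two sites match.

    The starting template depends on the hotel name, so different hotels
    on the same site also tend to get different wording. Uniqueness across
    sites only holds while there are no more sites than templates.
    """
    start = zlib.crc32(hotel["name"].encode("utf-8")) % len(TEMPLATES)
    return {
        site: TEMPLATES[(start + offset) % len(TEMPLATES)].format(**hotel)
        for offset, site in enumerate(sites)
    }


# Made-up hotel and placeholder site names.
hotel = {"name": "Hotel Exemple", "city": "Paris", "rooms": 42}
for site, text in blurbs_for(hotel, ["site-a.example", "site-b.example"]).items():
    print(site, "->", text)
```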

Creating short-form reports is one thing, but what about real books? Novelists, at least, have nothing to fear. Or so you might think. Literature has something unique: an author shares personal life experiences through a set of emotions. No matter how smart a string of zeroes and ones gets, it will never have lived such experiences, right?

Fred Zimmerman, CEO of Nimble Books, made the headlines in 2012 for his AI-driven algorithms that produce complete books on a given topic from simple queries. These algorithms were rather simple: at that time, their main strategy was “finds all content whose title includes the keyword”. The machine went through all publicly available libraries and documentation, then combined the results and arranged them in alphabetical order. As video games are a universe that requires lots of basic stories and where pretty much all the data is accessible and understandable by an AI, Nimble Books has been focusing on that specific industry. Unfortunately for them, the real world is not a video game. It is huge, messy, dynamic and never fully understandable.
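
The strategy quoted above is simple enough to fit in a few lines of Python. The sketch below is a guess at what such a title filter might look like, not Nimble Books' actual code: the `compile_book` name, the document structure and the sample titles are all made up.

```python
def compile_book(keyword, documents):
    """Keep documents whose title contains the keyword, sorted by title.

    Matching and sorting ignore case. Each document is assumed to be a
    dictionary with "title" and "body" keys.
    """
    needle = keyword.casefold()
    matches = [doc for doc in documents if needle in doc["title"].casefold()]
    return sorted(matches, key=lambda doc: doc["title"].casefold())


# Made-up library of documents.
library = [
    {"title": "Robots in the Newsroom", "body": "..."},
    {"title": "A History of Chess", "body": "..."},
    {"title": "Building robots at Home", "body": "..."},
]

for chapter in compile_book("robots", library):
    print(chapter["title"])
```

The weakness is visible right away: a document about robots whose title never uses the word is silently left out, and nothing ties the selected chapters together.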

In 2013, following a Twitter proposal from Darius Kazemi, National Novel Generation Month was created. It is basically the techy version of National Novel Writing Month in the US. The only rule for participating is that entrants must share a novel of more than 50,000 words along with the code that generated it. Each year, Kazemi collects the various attempts, and he admits that the best ones still rely heavily on human input.

One of the biggest challenges for anyone who intends to tackle such a project is said to be teaching the machine to understand what is interesting from a fictional point of view. Another obvious challenge is length. “Once you hit that 3,000-word barrier, it starts to get very difficult to sustain people’s attention,” Kazemi says. Taken together, these limitations mean that bots are not particularly good at creative tricks such as making a character vanish and then reappear 83 pages later. In addition, they have a hard time convincingly integrating sensory details, as these are rarely found in a database.

In 1975, Romain Gary tricked the whole literary world by winning the Goncourt prize for a second time, which is against the prize's rules, under the name of a fictitious writer he had created. The deception was revealed in 1981, a few months after his death, and it is still considered one of the biggest scams in the literary world. When will a greater scam take place through the participation of an AI?
