Generating Stories about Images
Stories are a fundamental human tool that we use to communicate thought. Creating a story about an image is a difficult task that many people struggle with. New machine-learning experiments are enabling us to generate stories based on the content of images. This experiment explores how to generate little romantic stories about images (incl. guest star Taylor Swift).
neural-storyteller is a recently published experiment by Ryan Kiros (University of Toronto). It combines recurrent neural networks (RNNs), skip-thought vectors, and other techniques to generate little stories about images. neural-storyteller’s outputs are creative and often comedic. It is open-source.
This experiment started by running 5,000 randomly selected web images through neural-storyteller and experimenting with hyperparameters. neural-storyteller comes with two pre-trained models: one trained on 14 million passages from romance novels, the other trained on Taylor Swift lyrics. Inputs and outputs were manually filtered and recombined into two videos.
Using the romance novel model. Voices generated with a text-to-speech engine.
Generating Taylor Swift
Using the Taylor Swift model, combined with a well-known Swift instrumental.
How does it work?
- Train a recurrent neural network (RNN) decoder on romance novels.
- Each passage from a novel is mapped to a skip-thought vector.
- Condition the RNN on the skip-thought vector and generate the encoded passage.
- Train a visual-semantic embedding between COCO images and captions. Captions and images are mapped into a common vector space.
- After training, embed new images and retrieve their nearest captions.
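The steps above can be sketched as vector arithmetic. The toy NumPy sketch below stands in for the real system: the trained encoders and the RNN decoder are replaced with random mock vectors, and all names (`embed_image`, `retrieve_captions`, `shift_style`) are illustrative, not neural-storyteller's actual API. The final "style shifting" step (subtract the mean caption vector, add the mean romance-passage vector) follows the formula described in the neural-storyteller README.

```python
# Toy sketch of the neural-storyteller pipeline (mock vectors, not real models).
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding size; real skip-thought vectors are much larger

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Step 1: a visual-semantic embedding maps images and captions into a
# common vector space. Here we fake it with fixed random caption vectors.
captions = ["a man on a beach", "a dog in the park", "two people dancing"]
caption_vecs = rng.normal(size=(len(captions), DIM))

def embed_image(image_id):
    # Stand-in for a trained image encoder: returns a vector close to
    # one caption vector so retrieval has something to find.
    return caption_vecs[image_id] + 0.1 * rng.normal(size=DIM)

# Step 2: embed a new image and retrieve its nearest captions.
def retrieve_captions(img_vec, k=2):
    scores = [cosine(img_vec, c) for c in caption_vecs]
    best = np.argsort(scores)[::-1][:k]
    return [captions[i] for i in best]

# Step 3: "style shifting" -- subtract the mean caption style and add
# the mean romance-novel style before decoding.
caption_style = caption_vecs.mean(axis=0)
book_style = rng.normal(size=DIM)  # mock mean vector of romance passages

def shift_style(caption_vec):
    return caption_vec - caption_style + book_style

top = retrieve_captions(embed_image(0))
story_vec = shift_style(caption_vecs[captions.index(top[0])])
# A trained RNN decoder would now generate a passage conditioned on story_vec.
print(top[0])
```

The style-shifted vector is what makes the output read like a romance passage rather than a literal caption; swapping `book_style` for a Taylor Swift style vector would change the register of the generated text.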
A selection of generated stories
neural-storyteller gives us a fascinating glimpse into the future of storytelling. Even though these technologies are not fully mature yet, the art of storytelling is bound to change. In the near future, authors will be training custom models, combining styles across genres, and generating text alongside images and sounds. Exploring this exciting new medium is rewarding!