Let’s Read A Story: talking to children’s books using semantic similarity

ml5.js
May 17, 2019 · 13 min read

Written by Itay Niv. Edited by Ashley Jane Lewis.

Using ml5 SketchRNN as a homage to endpapers in bookbinding tradition

The Heart of Reading Together

It comes as no surprise that the experience of using this project is just as contemplative and considered as its creator. Let’s Read a Story is the beautiful outcome of what happens when you build from the heart, working toward something special for the ones you love. This agile system will help parents and kids cuddle up, sprinkling creativity into quiet book nooks, bedtime stories, read-on-the-go moments, and everything in between.

As an undergrad, Itay studied graphic design while working as a motion VFX artist and exploring interactivity and physical computing. Before coming to ITP, he worked for several years as a UX and UI designer. During this master’s degree he learned how to code robustly and ventured into machine learning for the first time, shortly before building Let’s Read a Story. Itay’s work is impressive as a standalone piece, but it’s also a motivator for those who have a big idea before they have the skills to execute it.

It has been a pleasure to help document and share this wonderful thesis project with our community. Follow Itay Niv directly to see where Let’s Read a Story goes next! You can find his website here and follow him on Twitter here.

Ashley Jane Lewis, 1st year ITP student and ml5 Community Manager Research Assistant

Let’s Read A Story is a speculative exploration of how computers and technology can turn story time into a conversation between parents, children, and a computer. The project takes the corpus of Aesop’s Fables and investigates the possibility of exploring connections between characters and ideas from the original fables in new and fun ways, using recently available machine learning language models.

You can find a working demo here (best performance on Chrome for desktop).

In the following post I’ll describe the thought process and some of the technical aspects I encountered while designing and building the project.

👶 Some Background

When the moment came to think about what I wanted to work on for my thesis project, the arrival of my child was a great inspiration. I understood quite quickly that I wanted to build something with technology that would allow me to connect with him. As my partner and I started reading about child development and parenthood, I came across this quote by the poet Friedrich Schiller:

“Deeper meaning resides in the fairy tales told to me in my childhood than in the truth that is taught by life.”

Reading this quote reminded me of my own childhood and of the children’s books my parents and brothers had read to me, before there were any screens in our lives: the simple pleasure of browsing through the pages of a children’s book, the smells and textures, the colors and sounds. Like any other child, books and movies were my entire world. From them I learned what friendship meant and the values of family and love, and at a later stage it was from animated movies that I learned English.


So I got to thinking: what has changed in the 30 years since I was a child? How will my child explore the stories, sounds, and textures of fairy tales and other children’s narratives?

In recent years there has been a rise in the popularity of smart devices and speakers using advanced NLP technologies. By next year, the average American household is expected to own a smart device like the Apple HomePod, Amazon Alexa, or Google Home. With applications intended for preschoolers available on many of these popular platforms, children are talking to technology and technology is starting to talk back.

Talking to Alexa

Does that mean we can start talking to books? This question echoed in my mind for days and became the main driver of the research behind my thesis project.

🏗️ The Narrative Structure

Gustav Freytag’s Pyramid

For children, a story is an interactive experience: as it progresses and develops, children ask questions. This is a great learning activity. Kids learn to associate the images in the book with the story, which develops their capacity for visualization and their imagination. They learn how to read and think.

For me, a good children’s book is much more than just text. It’s an immersive experience that includes (but is not limited to) textures, illustrations, and the music that accompanies them. One of my favorite books growing up was Peter and the Wolf by Sergei Prokofiev. That book had an orchestral soundtrack, a symphony: each character in the book had its own theme, played by a different instrument. The illustrations were amazing too!

Peter and the Wolf, Sergei Prokofiev — unknown publication

For my son, as we move away from printed matter toward digital storytelling for children, my hope is that we will seize the opportunities inherent in new technologies to not only preserve but enhance these unique experiences.

Peter and the Wolf, Sergei Prokofiev

📚 Finding a Suitable Dataset

Since the main part of this project is text generation, it was clear to me that the avenue I wanted to research and explore first and foremost was language models, and that I had to choose a dataset to analyze.

After experimenting with a few canonical corpora, I chose to focus on Aesop’s Fables. I was drawn to this text because of its concise yet rich story lines, its use of animal archetypes as metaphors, and the strong moral embedded in each story.

Aesop’s Fables

Each original Aesop fable contains:

  1. A short title, usually very descriptive of the story’s content and characters.
  2. The story itself, usually no more than 30 sentences.
  3. The moral of the story, usually containing a metaphor built on the inherent nature or traits of the animals in the story.
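For reference, a single fable record in the compiled dataset might look something like this (a hypothetical sketch; the field names are illustrative, not the project’s actual schema):

```js
// A hypothetical fable record; field names are illustrative,
// not necessarily the project's actual schema.
const fable = {
  title: 'The Swallow and the Crow',
  sentences: [
    'The Swallow and the Crow had a contention about their plumage.',
    // ...one entry per sentence of the story
  ],
  characters: ['swallow', 'crow'],
  moral: 'Fair weather friends are not worth much.'
};
```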

⚙️ Analyzing the Data

When embedding a sentence or a paragraph, the vector needs to capture not only the individual words but the context of the whole sentence, which makes the embedding process a little trickier. This is where the Universal Sentence Encoder comes into the picture. This publicly available pre-trained model enables users to encode corpora of text (sentences as well as whole paragraphs!) into high-dimensional vectors that can be used for text classification, semantic similarity, clustering, and other natural language tasks.

For Let’s Read A Story I used the publicly available TensorFlow.js model of the Universal Sentence Encoder and implemented it in a Node application, after compiling a JSON file that holds all the stories broken down into individual sentences, along with their titles, characters, and animals.
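Sketched in code, that preprocessing step might look something like this, assuming the @tensorflow-models/universal-sentence-encoder package and illustrative file names:

```js
// Node.js preprocessing sketch: embed every fable sentence once and
// save the resulting 'embedding map' to disk.
// Assumes: npm install @tensorflow/tfjs-node @tensorflow-models/universal-sentence-encoder
const fs = require('fs');
require('@tensorflow/tfjs-node');
const use = require('@tensorflow-models/universal-sentence-encoder');

async function buildEmbeddingMap() {
  // sentences.json: an array of every sentence from the fables (name illustrative)
  const sentences = JSON.parse(fs.readFileSync('sentences.json', 'utf8'));
  const model = await use.load();
  // embed() returns a [numSentences, 512] tensor of sentence vectors
  const embeddings = await model.embed(sentences);
  const vectors = await embeddings.array();
  const map = sentences.map((text, i) => ({ text, vector: vectors[i] }));
  fs.writeFileSync('embeddings.json', JSON.stringify(map));
}

buildEmbeddingMap();
```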


The Node application analyzed all the sentences derived from the fables (~1,500 sentences) and created another file holding all the embeddings. This yields an ‘embedding map’ of sentence embeddings in a high-dimensional space, which we can then visualize in two dimensions like this:

A live version of the map is available here

In this interactive map we can see that sentences with similar semantic meanings are clustered next to each other.

💡 First Results

My first experiment was to generate a short text by starting from a random seed sentence and repeatedly fetching semantically similar sentences from the map. This was the result I got:

First test output (10 lines from a random seed)

This was exciting to me, as it was a new piece of content that made sense and had some sort of narrative structure to it.

I continued experimenting with this technique. After talking to Allison Parrish about it, an idea popped up: try to generate content based on an original story’s progression. To do this, I started exploring different ways to visualize Aesop’s fables on the embedding map. Using the TensorFlow Projector tool, if we visualize the story “The Swallow and the Crow” on the map, it looks like this:

“The Swallow and the Crow” → red line signifies story progression

If we can visualize a story and see its progression through vector space, what if we replaced the original sentences with the nearest semantic neighbor of each sentence in the story? This might give us a new version of that story while keeping its structure.
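A minimal sketch of that substitution, assuming plain cosine similarity over the precomputed embedding map (the helper names are mine, not the project’s):

```js
// Cosine similarity between two plain-array vectors.
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Replace each sentence of a story with its nearest semantic neighbor
// from the map, skipping the sentence itself so the variation differs
// from the original.
function storyVariation(storySentences, embeddingMap) {
  return storySentences.map((sentence) => {
    const entry = embeddingMap.find((e) => e.text === sentence);
    let best = null, bestScore = -Infinity;
    for (const candidate of embeddingMap) {
      if (candidate.text === sentence) continue;
      const score = cosineSimilarity(entry.vector, candidate.vector);
      if (score > bestScore) { bestScore = score; best = candidate; }
    }
    return best.text;
  });
}
```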

Variation on “The Swallow and the Crow” → purple line signifies similar story progression

*** Hypothetically, if we used a larger dataset of sentences, or augmented the current one, the output would appear less nonsensical.

🛠️ Building the Application

Based on the concept of retrieving similar stories, I began building the application, whose main focus was to simulate a conversation. Since speech is the most natural user interface for kids and allows minimal friction between action and result, I decided to use the Web Speech API: readers can input content using their voice.
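Capturing the reader’s voice in the browser can be as simple as this minimal Web Speech API sketch (the handler that forwards the transcript is illustrative):

```js
// Minimal Web Speech API sketch (constructor is prefixed in Chrome).
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';

recognition.onresult = (event) => {
  // Take the top-ranked transcript of the first result
  const transcript = event.results[0][0].transcript;
  sendToStoryAlgorithm(transcript); // illustrative: forward the text to the backend
};

recognition.start();
```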


Using the Universal Sentence Encoder in the backend with TensorFlow.js for Node, the captured speech is turned into a vector.

Using ml5’s Word2Vec, modified into a Sentence2Vec class (see code snippet here), the script iterates through all the sentences embedded beforehand by the Universal Sentence Encoder, i.e. the ‘embedding map’. This returns the sentence in the dataset whose semantic meaning is most similar to the reader’s input, which I then serve to the reader as the first line of the new story. The story algorithm also traces that sentence back to its origin story and generates a similar story based on it; that story, derived from the reader’s prompt, is the seed structure each new story begins with.
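In outline, that lookup might look like this, reusing the cosineSimilarity helper from the earlier sketch (again an illustrative sketch, not the project’s actual code):

```js
// Given the reader's transcribed speech, embed it and return the
// semantically closest sentence from the precomputed embedding map.
async function nearestSentence(readerInput, model, embeddingMap) {
  const inputTensor = await model.embed([readerInput]);
  const [inputVector] = await inputTensor.array();

  let best = null;
  let bestScore = -Infinity;
  for (const entry of embeddingMap) {
    const score = cosineSimilarity(inputVector, entry.vector);
    if (score > bestScore) {
      bestScore = score;
      best = entry;
    }
  }
  return best; // { text, vector } plus whatever story metadata is stored
}
```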

As the story unfolds, readers can intervene and change its direction by suggesting how the story should continue. The story algorithm fetches relevant content based on similar plot lines from the dataset: every time the reader inputs text, the algorithm pivots and matches a new narrative arc based on the reader’s input.

To enrich the stories, I chose to use a recurrent neural network trained on the Quick, Draw! dataset. The story algorithm identifies different elements in the text and, using the ml5 library and p5.js, draws them to the page as each sentence is added.

A simple RegEx search runs on the resulting sentences. The story algorithm determines which animal, character, or object appears in the generated story and then reconstructs an illustration from the trained SketchRNN model using p5.js. If a sentence contains an animal that does not exist in the model, another function ‘enriches’ the model’s keywords and matches similar animals, objects, and characters specified in the story; for example, the dog class will stand in for foxes and wolves.
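A sketch of that matching step, with a hand-rolled enrichment map standing in for the real keyword list (the specific mappings and model names below are illustrative):

```js
// Map animals missing from the SketchRNN vocabulary onto similar
// categories the model does know (mappings here are illustrative).
const enrichmentMap = { fox: 'dog', wolf: 'dog', crow: 'bird', swallow: 'bird' };
const knownModels = ['dog', 'bird', 'cat', 'lion', 'rabbit'];

// Find the first drawable animal mentioned in a sentence.
function findDrawableAnimal(sentence) {
  const words = sentence.toLowerCase().match(/[a-z]+/g) || [];
  for (const word of words) {
    if (knownModels.includes(word)) return word;          // direct hit
    if (enrichmentMap[word]) return enrichmentMap[word];  // similar category
  }
  return null;
}

// With ml5.js, the matched category can then seed a SketchRNN drawing:
// const sketch = ml5.sketchRNN(findDrawableAnimal(sentence), onModelLoaded);
```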

Using an AFINN-based sentiment analysis library, the story algorithm analyzes each sentence to determine whether its sentiment is positive or negative. Positive sentiments get a major-scale melody and negative sentiments get a minor-scale one. Each animal also gets a different musical instrument, according to its characteristics.
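Assuming the npm sentiment package (one common AFINN-based library), the scale selection could be sketched like this (note names are illustrative):

```js
// AFINN-based sentiment sketch: positive scores pick a major scale,
// negative scores a minor one.
// Assumes: npm install sentiment
const Sentiment = require('sentiment');
const sentiment = new Sentiment();

const MAJOR_SCALE = ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4'];
const MINOR_SCALE = ['C4', 'D4', 'Eb4', 'F4', 'G4', 'Ab4', 'Bb4'];

function scaleForSentence(text) {
  // analyze() returns an AFINN score: > 0 positive, < 0 negative
  const { score } = sentiment.analyze(text);
  return score >= 0 ? MAJOR_SCALE : MINOR_SCALE;
}
```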

After user-testing the application with dozens of children, I noticed kids’ growing desire to engage with the screen and the story medium. As a solution, I turned the entire screen into a drawing canvas; this way, a child can add drawings to the story and be even more engaged.
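Turning the whole screen into a drawing surface is straightforward in p5.js; a minimal version might look like this:

```js
// Minimal p5.js full-screen drawing canvas sketch.
function setup() {
  createCanvas(windowWidth, windowHeight);
  strokeWeight(4);
}

// Draw a line segment wherever the child drags a mouse or finger.
function mouseDragged() {
  line(pmouseX, pmouseY, mouseX, mouseY);
}

function windowResized() {
  resizeCanvas(windowWidth, windowHeight);
}
```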


At the end of each session, the story algorithm embeds the newly generated story using the Universal Sentence Encoder. Because all the original stories were preprocessed through the encoder into one vector embedding per fable, the story algorithm can fetch the moral from the dataset that best fits, based on semantic similarity to the original stories.
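Sketched out, the moral lookup mirrors the sentence lookup above: the finished story is embedded as a single paragraph and compared against the precomputed per-fable vectors (helper names are mine):

```js
// Embed the finished story as one paragraph and return the moral
// of the most semantically similar original fable.
async function moralForStory(generatedStory, model, fableEmbeddings) {
  const storyText = generatedStory.join(' ');
  const [storyVector] = await (await model.embed([storyText])).array();

  let best = null;
  let bestScore = -Infinity;
  for (const fable of fableEmbeddings) {
    const score = cosineSimilarity(storyVector, fable.vector);
    if (score > bestScore) { bestScore = score; best = fable; }
  }
  return best.moral;
}
```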

👨‍👩‍👦‍👦 User Testing


Through testing with children I learned that my early assumptions about content creation and interaction had to change at different stages of development.

A strong example of this was the reaction I saw from kids before and after enabling drawing in the experience. I hope to iterate on this in future versions: I believe drawing could be a great force for creation, and connecting kids’ drawings as input to the experience would be an extremely interesting experiment.


🏁 Final Thoughts

As I tested and documented Let’s Read A Story, I noticed something magical in the interaction between child and parent, something that transforms ordinary story time into a creative adventure. My son, Carmi, is 3 months old now, and I can’t help but anticipate using it with him.


This is not the end of my experiment, and there’s much more to learn and explore. But I did reach one firm conclusion that I’ll take with me through the next parts of the project: there is a human soul at the core of every story, and that’s irreplaceable. I must always remember to create tools that help people tell their story.

Let’s Read A Story was completed with the help and guidance of Nancy Hechinger, Daniel Shiffman, Allison Parrish, Gene Kogan and friends from the ITP + ml5 communities, spring 2019.
