Improving a chatbot with relevant movie quotes

How ULMFiT helped us improve a chatbot

Nicolas Crausaz
Empathic Labs
May 1, 2020

When a chatbot does not understand the user input (the intent), most of the time it will answer something along the lines of “Could you rephrase, I don’t understand”. This can be quite annoying.

During my semester project at HEIA-FR (the engineering school of Fribourg, Switzerland), together with the folks at Deeplink (a Swiss start-up specialized in chatbots), the HumanTech Institute and Empathic Labs, I explored a rather fun way to replace this “Could you rephrase” answer from chatbots.

The idea is to use quotes from well-known movies and, with the help of machine learning, show users the quote that best matches their input whenever the conversation goes off the bot's script.

Movie quotes preparation

The first task was to collect and sort famous movie quotes. Two datasets were used for this purpose:
https://www.cs.cornell.edu/~cristian/memorability.html
https://www.kaggle.com/rounakbanik/the-movies-dataset

Data preparation was largely handled with the Pandas library. The goal was to create a dataset containing each famous quote, the name of the actor who delivers it, the name of the character in the movie and the title of the movie. The ready-to-use dataset looks like this.

Prepared Dataset
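
To give an idea of what this kind of preparation can look like, here is a minimal Pandas sketch. It is not the project's actual code: the rows are invented, and the column names (movie, character, quote, actor) and the normalize helper are assumptions chosen for illustration. The real datasets use their own layouts.

```python
import pandas as pd

# Hypothetical, hand-made rows standing in for the two source datasets.
# The real files use different layouts and column names.
quotes = pd.DataFrame({
    "movie": ["Movie A", "Movie B"],
    "character": ["ALICE", "BOB"],
    "quote": ["First memorable line.", "Second memorable line."],
})
cast = pd.DataFrame({
    "movie": ["Movie A", "Movie A", "Movie B"],
    "character": ["Alice", "Carol", "Bob"],
    "actor": ["Actor One", "Actor Two", "Actor Three"],
})


def normalize(names: pd.Series) -> pd.Series:
    """Lower-case and trim names so both sources can be matched."""
    return names.str.strip().str.lower()


quotes["key"] = normalize(quotes["character"])
cast["key"] = normalize(cast["character"])

# Inner join: keep only the quotes whose speaker can be linked to an actor.
prepared = quotes.merge(
    cast[["movie", "key", "actor"]], on=["movie", "key"], how="inner"
)[["quote", "actor", "character", "movie"]]

print(prepared)
```

An inner join is used here so that quotes without a matching cast entry are dropped rather than kept with missing values. Whether that is the right trade-off depends on how much data you can afford to lose.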

Another important part of the data preparation was to extract the sentence that precedes each quote in the movie script. The reason is simple: the final algorithm compares the user input to these preceding sentences to find the best match, and then answers with the quote itself.
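
The extraction step can be sketched as follows, assuming a script is already available as an ordered list of dialogue lines. The preceding_line function and the sample dialogue are hypothetical and only illustrate the idea. Real scripts need more careful matching, since a quote rarely appears word for word on a line of its own.

```python
from typing import List, Optional


def preceding_line(script_lines: List[str], quote: str) -> Optional[str]:
    """Return the line spoken just before `quote`, or None if there is none."""
    target = quote.strip().lower()
    for i, line in enumerate(script_lines):
        if line.strip().lower() == target:
            # The first line of a script has nothing before it.
            return script_lines[i - 1].strip() if i > 0 else None
    return None  # quote not found in this script


# Invented dialogue, for illustration only.
script = [
    "Where are you going?",
    "Somewhere quiet.",
    "Take me with you.",
]

print(preceding_line(script, "Somewhere quiet."))
```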

Algorithm

To begin with, all sentences preceding the quotes were tokenized. Then, with the help of the fast.ai ULMFiT NLP model, a 400-dimensional vector was generated for each of these preceding sentences.

When the user sends a sentence to the bot, the algorithm will tokenize the sentence and compute the associated vector.

Then, the algorithm looks for the preceding sentence whose vector is closest to this one. Once it is found, the algorithm returns the associated quote along with background information about where the quote comes from.
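
The retrieval steps above can be sketched in a few lines of Python. Two things here are assumptions rather than the project's implementation. First, embed is a simple hashed bag of words standing in for the ULMFiT sentence vector, so the example runs without a trained model. Second, cosine similarity is assumed as the measure of closeness, since the metric is not specified. The entries are invented.

```python
import math
import re
import zlib

DIM = 400  # same size as the sentence vectors mentioned above


def tokenize(text):
    """Split a sentence into lower-case word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def embed(text):
    """Stand-in for the ULMFiT vector: a hashed bag of words (NOT ULMFiT)."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented entries: the preceding sentence is what gets matched,
# the quote and its origin are what gets returned.
ENTRIES = [
    {"preceding": "Where are you going tonight?",
     "quote": "Somewhere quiet.", "movie": "Movie A"},
    {"preceding": "Do you trust me?",
     "quote": "With my life.", "movie": "Movie B"},
]

# Vectors for the preceding sentences are computed once, ahead of time.
INDEX = [(embed(entry["preceding"]), entry) for entry in ENTRIES]


def best_quote(user_input):
    """Return the entry whose preceding sentence is closest to the input."""
    query = embed(user_input)
    _, entry = max(INDEX, key=lambda pair: cosine(query, pair[0]))
    return entry


reply = best_quote("do you trust me")
print(reply["quote"], "-", reply["movie"])
```

Precomputing the vectors of all preceding sentences means that only the user's sentence has to be embedded at answer time. With a real language model in place of the stand-in, sentences with similar meaning but different wording could also end up close to each other, which a bag of words cannot do.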

Is it working?

That’s the big question about this project. Can this kind of response entertain users while making it clear that the bot could not handle their request?

That’s where we need you. The algorithm is available as a Telegram bot:
@ps6-movie-dialog-bot-test

Try it and tell us what you think about it!

Have fun! Nicolas

Demonstration video
Example of using the bot

Special thanks

Mrs Elena Mugellini
Mr Daniel Peppicelli
Mr Jacky Casas
Mr Karl Daher

HumanTech Institute
Deeplink
Empathic_Labs
HEIA-FR
