Dialogue systems, or chatbots, are often divided into two categories: goal-oriented dialogue systems and chit-chat dialogue systems. Goal-oriented chatbots have a specific task to accomplish, e.g. helping the user book a hotel or get a train schedule. Chit-chat bots focus on keeping the user entertained and engaged in the conversation. Recommendation dialogues, on the other hand, typically involve both types of tasks.
In order to give a relevant recommendation, it’s usually necessary to get to know a person better and understand their tastes — what they already like from that category. That conversation needs to be engaging, while keeping a clearly defined goal.
In a collaborative project with Polytechnique Montréal, Microsoft Research Montréal, Element AI, Mila and HEC Montréal, we created the ReDial dataset, the first large-scale dataset of real-world dialogues for movie recommendations. We also propose a novel architecture combining dialogue and recommendations, setting the first benchmark on the ReDial dataset.
ReDial consists of more than 11,000 dialogues centered around movie recommendations. The dataset was collected using Amazon Mechanical Turk. Users were paired up, with one asking for and the other providing movie recommendations. Participants were also asked to tag all movies that they mentioned by using the “@” character. Finally, participants filled out a questionnaire about each movie mentioned in the conversation.
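As a rough illustration of how the “@” tags make movie mentions easy to recover from raw text, here is a minimal sketch in Python. The exact format of the tags in the released data is not described here, so the pattern below (an “@” followed directly by a single token) and the toy dialogue are assumptions made for the example, not the dataset's actual schema.

```python
import re

# Assumed convention for this sketch: "@" immediately followed by one token.
# The real dataset may encode mentions differently (e.g. with identifiers).
MENTION = re.compile(r"@(\w+)")

def extract_mentions(utterance):
    """Return the tokens tagged with "@" in a single utterance, in order."""
    return MENTION.findall(utterance)

# Hypothetical two-turn dialogue between a seeker and a recommender.
dialogue = [
    ("seeker", "I loved @Inception, anything similar?"),
    ("recommender", "Have you seen @Interstellar or @Memento?"),
]

# Collect every tagged movie across the whole conversation.
mentions = [m for _, text in dialogue for m in extract_mentions(text)]
print(mentions)
```

Tagging at collection time means the mentions can be pulled out with a simple pattern like this, rather than with a separate entity-recognition step.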
The proposed architecture, shown in the figure below, combines a denoising auto-encoder recommender system with a Hierarchical Recurrent Encoder-Decoder, which is a well-established dialogue model. To populate an input to the recommender system, we use a sentiment analysis module. This module extracts whether a user enjoyed any of the movies mentioned thus far. Having some information on a user’s movie preferences, the recommender system predicts a vector of movie recommendations for that user. Finally, a gating mechanism in the decoder of HRED chooses whether the next generated word should be a regular word (chosen by the regular decoder), or a movie name (provided by the recommendation vector).
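To make the gating idea more concrete, here is a minimal sketch in plain Python. It assumes a soft gate: a scalar probability that splits the output mass between the decoder's word distribution and the recommender's movie distribution. The function names, the toy vocabulary and the soft-mixture formulation are assumptions for illustration; the model described above may implement the gate differently.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_next_token_distribution(word_logits, movie_scores, gate_logit):
    """Mix word and movie distributions with a scalar gate (illustrative).

    word_logits:  decoder scores over the regular vocabulary
    movie_scores: recommender scores over the movie catalogue
    gate_logit:   scalar; higher values favor emitting a movie name
    """
    p_movie = sigmoid(gate_logit)
    word_probs = [(1.0 - p_movie) * p for p in softmax(word_logits)]
    movie_probs = [p_movie * p for p in softmax(movie_scores)]
    # One distribution over the concatenation [words..., movies...].
    return word_probs + movie_probs

# Hypothetical toy vocabulary and catalogue.
vocab = ["you", "should", "watch"]
movies = ["Movie A", "Movie B"]
tokens = vocab + movies

dist = gated_next_token_distribution([0.1, 0.2, 0.3], [2.0, 0.5], gate_logit=3.0)
best = tokens[max(range(len(dist)), key=dist.__getitem__)]
print(best)
```

With a strongly positive gate the most likely token is a movie from the recommendation vector; with a strongly negative gate the same call falls back to a regular word from the decoder.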
While 11,000 conversations are not enough to train a vanilla HRED without overfitting, this modular architecture allows one to pre-train the different sub-components and thus compensates for the small size of the dataset, compared to others that are composed of millions of dialogues. Our sentiment analysis module is pre-trained using the questionnaires filled in by participants, while the recommender system is pre-trained on the MovieLens dataset. We also make use of general-purpose sentence representations in the first layer of the utterance encoder, giving the model some prior knowledge of language.
This model serves as a good baseline for future work on the ReDial dataset. It opens up the possibility of exploring neural architectures for the task of conversational recommendation and will hopefully raise the community’s awareness of this interesting dialogue task.
This project was performed by Raymond Li (Polytechnique Montréal, now at Element AI), Samira Kahou (Microsoft Research Montréal), Hannes Schulz, Vincent Michalski (Mila, Université de Montréal), Laurent Charlin (HEC Montréal, Mila) and Christopher Pal (Element AI, Polytechnique Montréal, Mila).
This article is part of a series on Element AI papers presented at NeurIPS 2018. Click here for a full list of papers and our NeurIPS schedule. This blog is released alongside one from Microsoft, which can be found here.
Blog written by Raymond Li and edited by Rachel Samson, with visual design by Manon Gruaz.