Movix.ai — movie recommendations with Deep Learning

Supervise.ly (Supervisely) · 11 min read · May 2, 2017

“What movie should I watch this evening?” Have you ever had to answer this question after coming home from work? We have, and more than once. Here we will say a few words about what we’ve been working on for the past six months: Movix.ai, an interactive movie recommender system. The system is based on Deep Learning and adapts to the user’s preferences in real time. As big movie fans we felt the need for such a service, and we believe it will be useful for every movie lover.

Movie recommender. How the Idea was born

At Deep Systems we create solutions and products based on machine learning and Deep Learning. Among our projects are a “mind” for a self-driving car prototype and automatic defect detection for roads and airport runways. Recommender systems are an important part of our work, and the strong desire to create our own version of one has long kept us from sleeping peacefully.

We are writing this post to share our service with the world, to get feedback on our work and vision, and to share experience that may be of interest to deep learning practitioners and other readers.

From Big Dreams to actual product

Here is a phrase from J. Schmidhuber* that we cannot get out of our heads:

Google of the future together with all its services is just a single giant LSTM.

Here we mean that there is one large neural network that interacts with the user and solves a variety of his tasks.

*not completely sure that Jurgen actually said this but deep learning researchers are aware now of how dangerous it is not to cite him :-)

The idea seems too ambitious, perhaps utopian. We tried to bring it down to earth and find a domain where a single large neural network can solve all of the user’s tasks. That is how the idea was born to build a movie recommender system that interacts with the user in a smart way, and to model that interaction end to end with a deep LSTM-like network.

Today there is huge hype around chatbots. For the academic community, at the end of the day, it’s all about passing the Turing test. For large companies the concern is optimizing operational costs, so the folks in tech support should keep a weather eye open. Jokes aside, in many cases typing text is an inconvenient way to talk to a computer, and the “language of clicks” is more appropriate.

Many recommender systems are built on the concept of similar items: for each movie there is a predefined set of movies similar to it. This does not take into account the preferences of a particular user. As a consequence, the user is forced to explore static content and has no tool for telling the system about his preferences. This is not an interactive approach. We, on the other hand, believe that interactivity is a “must have” component of a good recommender system.

Our concept is the following: no registration is required; the user visits the site, makes a few clicks on movies or tags, and receives recommendations reflecting his current mood and preferences. The system predicts two kinds of entities: movies and tags. The movie is the ultimate goal, i.e. the user is here because he wants to find a movie to watch, whereas tags are an additional interaction tool that lets the user feed his current preferences into the system faster.

Why Deep Learning?

There is one reasonable question to ask: “Why use neural networks at all? Collaborative filtering approaches have existed for many years, are well understood, and work fine.”

We will answer step by step. When talking about collaborative filtering, we should clearly distinguish the following two tasks: (1) rating prediction and (2) top N recommendations.

Rating prediction is by far the more popular task and, as a consequence, there are tons of papers and open source libraries for it. For top N recommendation the situation is quite the opposite. The reason is the Netflix Challenge (2006–2009), with its prize fund of $1 million, where participants were asked to predict “how user u will rate movie m”.

However, most business applications require top N recommendations. A typical case: based on a user’s historical data, show the 10 items he is most likely to buy or, as in our case, the 10 films he is most likely to want to see.

Without a doubt, task (2) can be reduced to task (1) in the following naive way: take the user, predict his ratings for all the movies in the catalog, sort the movies by predicted rating in descending order, then take the top 10 and recommend them. It sounds like a good idea, but there is one problem: it does not work (we made this mistake ourselves). When you look at the resulting recommendations you simply do not like them, and the metrics agree with that gut feeling!
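For concreteness, the naive reduction described above can be sketched in a few lines of Python. This is only an illustration: `predict_rating` stands in for any trained rating model, and the toy ratings are made-up numbers, not results from our system.

```python
from typing import Callable, Iterable, List


def naive_top_n(user_id: int,
                catalog: Iterable[int],
                predict_rating: Callable[[int, int], float],
                n: int = 10) -> List[int]:
    """Score every movie in the catalog and keep the n highest-rated ones."""
    scored = [(predict_rating(user_id, movie_id), movie_id) for movie_id in catalog]
    # Highest predicted rating first; ties are broken by movie id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [movie_id for _, movie_id in scored[:n]]


# Hypothetical stand-in for a trained rating predictor.
toy_ratings = {1: 4.9, 2: 3.1, 3: 4.2, 4: 4.9, 5: 2.0}


def toy_predict(user_id: int, movie_id: int) -> float:
    return toy_ratings[movie_id]


top3 = naive_top_n(user_id=0, catalog=toy_ratings, predict_rating=toy_predict, n=3)
print(top3)
```

Note that the procedure scores the whole catalog for every request, and its output is only as good as the rating model's ranking of movies the user has never rated.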

With the rating prediction problem out of the way, let’s return to top N recommendations. Again, there are two ways of solving the problem: (a) matrix factorization and (b) the nearest neighbours approach.

Matrix factorization methods have the following drawback: they are not interactive, in the sense that once a user has rated a movie, you need to redo the factorization to update his recommendations. Since we want to recommend “on the fly”, this is unacceptable for us.

Nearest neighbours are interactive. Pedro Domingos classifies this approach as a “lazy” machine learning method, because the training procedure amounts to saving new training examples in the database. So, in terms of computational cost, training is free and all the work is done at the inference stage. But when it comes to choosing a similarity metric, the best one can do is rely on some sort of heuristic. If we want to go beyond movies and also work with other entities, like tags, the metric issue becomes even bigger.
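To make the “lazy” nature of nearest neighbours tangible, here is a minimal sketch. The choice of Jaccard similarity, the weighting scheme and the parameters `k` and `n` are our illustrative assumptions; this is exactly the kind of heuristic the paragraph above refers to, not a description of any particular library.

```python
from typing import Dict, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Heuristic similarity between two sets of liked movies."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_recommend(liked: Set[int],
                  database: Dict[str, Set[int]],
                  k: int = 2,
                  n: int = 3) -> List[int]:
    # "Training" was just storing `database`; all the work happens at inference.
    neighbours = sorted(database.items(),
                        key=lambda kv: (-jaccard(liked, kv[1]), kv[0]))[:k]
    scores: Dict[int, float] = {}
    for _, movies in neighbours:
        weight = jaccard(liked, movies)
        for movie_id in movies - liked:  # only suggest movies not liked yet
            scores[movie_id] = scores.get(movie_id, 0.0) + weight
    return sorted(scores, key=lambda m: (-scores[m], m))[:n]


database = {"u1": {100, 200, 123}, "u2": {100, 10, 300}, "u3": {1, 2, 3, 4, 5}}
recs = knn_recommend({100, 200}, database)
print(recs)
```

Adding a new user is a single dictionary insert, which is why the approach is interactive; the price is that every design decision sits in the hand-picked `jaccard` function.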

We are not saying that the standard approaches are bad; we are just pointing out the evident advantage of the deep learning approach for this task: feed all the available data to a deep model and formulate a training objective that is somehow correlated with the quality of the user experience. If that is done the right way, all we have to do is wait until training converges to some local minimum.

Top N recommendation problem in terms of Deep Learning

The Deep Learning revolution first came to speech recognition, then to computer vision and, after that, to natural language processing (NLP). Many NLP tasks reduce to answering the question: what is the probability distribution of the next word, given the N previous words? Or, put simply, to predicting the next word in a sentence (in a text).

Today, in most NLP tasks, large recurrent neural networks (LSTMs) dominate other approaches, i.e. neural networks are pretty good at predicting the next word in a sequence.

The point is that the database of user ratings can be represented as one very long text. This text will consist of sentences, and each sentence is a list of movie IDs that a particular user liked.

Consider a very simple example:

“100 200 123/0 100 10 300/0 1 2 3 4 5/0”

We can see that

  • There are 3 users in our database
  • The first user likes movies with the identifiers: 100 200 123
  • The second: 100 10 300
  • The third: 1 2 3 4 5
  • “/0” — special symbol for separating different users

User IDs are not important; only the movie IDs (and their relative order) matter.
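As an illustration, the toy “text” above can be split back into per-user sequences and turned into next-item training pairs with a few lines of Python. The function names are ours and purely illustrative.

```python
from typing import List, Tuple

SEPARATOR = "/0"  # special symbol separating different users


def parse_ratings_text(text: str) -> List[List[int]]:
    """Split the long 'text' into one list of movie IDs per user."""
    sequences = []
    for chunk in text.split(SEPARATOR):
        ids = [int(token) for token in chunk.split()]
        if ids:  # skip the empty chunk after the final separator
            sequences.append(ids)
    return sequences


def next_item_pairs(sequence: List[int]) -> List[Tuple[List[int], int]]:
    """Every prefix of a user's history paired with the movie that follows it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


text = "100 200 123/0 100 10 300/0 1 2 3 4 5/0"
users = parse_ratings_text(text)
pairs = next_item_pairs(users[0])
print(users)
print(pairs)
```

Each pair is a (history, next movie) example, which is the same shape of data a next-word language model is trained on.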

Afterwards, in theory, one can take a state-of-the-art NLP model and train it to predict the next identifier in our “text”; at serving time, the predicted identifiers are the actual recommendations.
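To show the train-then-serve loop without a neural network, here is a deliberately tiny stand-in: a bigram counter that predicts the next movie ID from the last one only. It is not the model we use. A real system would replace the counter with an LSTM conditioned on the whole history, but the interface is the same.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(sequences: List[List[int]]) -> Dict[int, Counter]:
    """Count how often each movie ID is followed by each other movie ID."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def recommend_next(counts: Dict[int, Counter], last_movie: int, n: int = 10) -> List[int]:
    """Serving: return the most frequent followers of the last liked movie."""
    followers = counts.get(last_movie, Counter())
    ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie_id for movie_id, _ in ranked[:n]]


# The three toy users from the example above.
sequences = [[100, 200, 123], [100, 10, 300], [1, 2, 3, 4, 5]]
model = train_bigram(sequences)
print(recommend_next(model, 100))
print(recommend_next(model, 1))
```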

More than a year ago, we took the MovieLens dataset and a Torch7-based NLP project, and applied the above procedure to obtain our first movie recommender prototype.

But we wanted more, both in terms of the quality of the recommendations, and in the way we utilize Deep Learning techniques for our task.

Movies plus tags, putting it all together

The hypothesis is that by letting the user work with both movies and tags, we shorten his path to a list of relevant movies reflecting his current mood.

This raises the task of constructing a neural network architecture capable of working with both entities. See figure 1.

Fig.1. The neural network architecture, recommending movies to a user who has chosen (liked) three movies (Avatar, District 9, I Am Legend) and two tags (Dystopia, Police). The “Emb” block stands for Embedding, “Avg” for Average, and “FC” for Fully Connected.

Each movie the user has liked has a fixed, predefined set of tags associated with it. Both movies and tags are embedded, i.e. movie and tag identifiers are mapped to fixed-size vectors. For tags, the resulting embedding vectors are averaged. So, for each movie the user liked, the LSTM cell takes as input the concatenation of the following vectors:

  1. Movie embedding vector
  2. Average of the following vectors:
     • Tag embedding vectors associated with the current movie
     • Tag embedding vectors associated with the next movie in the sequence (in figure 1, we call these “tags of future movie in a sequence”)

The output of the 2-layer LSTM (the output vector of the upper right LSTM cell) goes to two separate fully connected (FC) layers. Softmax layers then estimate the “like” probability for each movie and tag in the database. The top N movies and tags are shown to the user.
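The input construction and the final ranking step can be sketched in plain Python, leaving out the LSTM and FC layers themselves. Everything here is a simplified illustration: the embedding tables are toy dictionaries, the helper names are hypothetical, and we assume one joint average over the tags of the current and the next movie.

```python
import math
from typing import Dict, List, Optional, Sequence


def average(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean; a zero vector if there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def lstm_step_input(movie_id: int,
                    next_movie_id: Optional[int],
                    movie_emb: Dict[int, List[float]],
                    tag_emb: Dict[str, List[float]],
                    movie_tags: Dict[int, List[str]],
                    tag_dim: int) -> List[float]:
    """Concatenate the movie embedding with the averaged tag embeddings."""
    tags = list(movie_tags[movie_id])
    if next_movie_id is not None:  # "tags of future movie in a sequence"
        tags += list(movie_tags[next_movie_id])
    tag_vectors = [tag_emb[t] for t in tags]
    return list(movie_emb[movie_id]) + average(tag_vectors, tag_dim)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_n(logits: Sequence[float], n: int) -> List[int]:
    """Indices of the n entries with the highest 'like' probability."""
    probs = softmax(logits)
    return sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:n]


# Toy embedding tables (made-up numbers).
movie_emb = {0: [1.0, 0.0], 1: [0.0, 1.0]}
tag_emb = {"dystopia": [1.0, 1.0], "police": [3.0, 5.0]}
movie_tags = {0: ["dystopia"], 1: ["police"]}

x = lstm_step_input(0, 1, movie_emb, tag_emb, movie_tags, tag_dim=2)
best = top_n([0.1, 2.0, -1.0, 1.5], n=2)
print(x, best)
```

In the real network the logits passed to `top_n` would come from the two FC heads, one over movies and one over tags.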

Let’s say a few more words about tags. In terms of recommendations quality, tags may be useful even if we do not directly predict them. They give the model more information that some movies are similar to each other. For example, consider two movies in a case when there is no user in the database that liked both of them. The fact that these movies may have a lot of common tags gives an opportunity for the system to figure out that the movies are, indeed, similar. The type of tags we have just talked about is called “tags associated with movies” (figure 1).

Another scenario is to allow the user to like tags along with movies (“tags chosen by user” in figure 1). It is important that we can simulate this scenario at the training stage. Initially, the neural network predicts the next movie the user likes based on previously “liked” movies, but we also know the tags of the next movie in the sequence. Therefore, for a significant share of the training time, we can force the model to solve the following problem: knowing a user’s movie history and some set of tags associated with the next movie in the sequence, guess what exactly the next movie is. (It would be convenient to state this in terms of conditional probability, but we decided to do without formulas in this post; if there is interest, we will write a more technical one.) Also note that in this scenario there may be no “liked” movies at all: the user can, for example, choose a group of diverse tags and still receive recommendations.
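One possible way to simulate “tags chosen by user” when generating training examples is sketched below. The probability `tag_prob`, the cap `max_tags` and the uniform sampling are illustrative assumptions, not the exact settings of our training pipeline.

```python
import random
from typing import Dict, List, Tuple


def make_training_example(history: List[int],
                          next_movie: int,
                          movie_tags: Dict[int, List[str]],
                          rng: random.Random,
                          tag_prob: float = 0.5,   # hypothetical share of examples with tag hints
                          max_tags: int = 2        # hypothetical cap on revealed tags
                          ) -> Tuple[List[int], List[str], int]:
    """Return (history, hint_tags, target).

    With probability tag_prob, reveal a random subset of the next movie's
    tags, imitating a user who clicked on those tags before liking it.
    """
    hint_tags: List[str] = []
    candidates = movie_tags.get(next_movie, [])
    if candidates and rng.random() < tag_prob:
        k = rng.randint(1, min(max_tags, len(candidates)))
        hint_tags = rng.sample(candidates, k)
    return history, hint_tags, next_movie


movie_tags = {123: ["dystopia", "police", "aliens"]}
rng = random.Random(0)

# An empty history is allowed: the user may start from tags alone.
with_hint = make_training_example([], 123, movie_tags, rng, tag_prob=1.0)
no_hint = make_training_example([100, 200], 123, movie_tags, rng, tag_prob=0.0)
print(with_hint, no_hint)
```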

Technology stack and training data

Our deep model is an LSTM-based neural network built with the TensorFlow framework.

To create the training data, we took users’ movie preferences from the MovieLens dataset. We parsed IMDB and used the Movie DB API to build the tag database.

The API interacts with TensorFlow through ZeroMQ, and Elasticsearch serves as the storage for retrieving information about the movies.

The frontend is made using Vue.js and Element UI.

Movix.ai features

Movix allows you to perform the following actions:

  • Like and dislike both movies and tags;
  • Reorder “liked” or “disliked” movies (the more to the right, the more important);
  • Filter recommendations by rating and release date;
  • View trailer and stills from the movie;
  • Save your favorite movies.

The features mentioned above let you interact with the system in a flexible way. For example, choosing a few old favorite movies and then turning on the “2010s” filter lets you discover the most recent movies similar to the ones you have chosen (“liked”). The same logic works in the opposite direction, for discovering old movies.
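Conceptually, such a filter can be applied as a post-processing step over the ranked list the model returns. The sketch below assumes exactly that; the `Movie` record, the placeholder titles and the numbers are hypothetical and do not mirror our actual backend.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Movie:
    title: str
    year: int
    rating: float


def filter_recommendations(ranked: List[Movie],
                           min_year: Optional[int] = None,
                           max_year: Optional[int] = None,
                           min_rating: Optional[float] = None) -> List[Movie]:
    """Drop movies outside the chosen bounds, preserving the model's order."""
    def keep(m: Movie) -> bool:
        if min_year is not None and m.year < min_year:
            return False
        if max_year is not None and m.year > max_year:
            return False
        if min_rating is not None and m.rating < min_rating:
            return False
        return True

    return [m for m in ranked if keep(m)]


# Ranked output for a user who liked a few old favorites (placeholder data).
ranked = [
    Movie("Placeholder A", 1982, 8.1),
    Movie("Placeholder B", 2013, 7.4),
    Movie("Placeholder C", 2016, 6.2),
    Movie("Placeholder D", 2010, 7.9),
]

# The "2010s" filter combined with a minimum rating.
recent = filter_recommendations(ranked, min_year=2010, max_year=2019, min_rating=7.0)
print([m.title for m in recent])
```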

Improvements to make: deep learning aspects

Let’s say a few words about the improvements to make the system more interactive and intelligent. We will focus more on Deep Learning aspects, rather than possible features and GUI:

  • Negative feedback or dislike button. For now, this function is implemented as a little hack outside the neural network. Of course, negative feedback should be modeled at the training stage. Our attempts to do so have not yet led to the results we expected. Often, if a movie is strongly recommended by the model, disliking it does not make the model change its mind, and the network still recommends it. The issue has a lot to do with the neural network structure, the objective function and the way the training data is generated.
  • Shorter path to relevant movies. Currently, in some cases you need to like more than one movie to get something relevant. The key question is: what is our model trained to do? At present, it is trained to guess the next movie in a sequence for a given user. So if a user has liked only one movie (i.e. we have very little information about his preferences), guessing that the next movie is a blockbuster is not an unreasonable strategy, at least from a human perspective.
  • More Deep Learning techniques. A number of deep learning techniques are key to obtaining state-of-the-art performance in various tasks: attention, beam search, bi-directional RNNs, and others. We need to figure out how to use them to make the system smarter.
  • Our dreams in terms of deep learning. We would really love to build a truly intelligent neural network within a well-specified domain such as movie recommendations. For example, a neural network could ask the user questions so that he could quickly find a movie to watch. After all, the ability to ask the right questions is a good sign of intelligence. Such questions might be: “Are you interested in movies before the year X or after?”, “Do you like the actor / director Y?”, etc. The good news is that the training datasets (MovieLens plus whatever can be parsed from the Web) have enough information to train the system to ask these kinds of questions and to use the answers to recommend better. The bad news is that these questions are too trivial; we want something beyond them. Let us try to formulate it: we would like to train the neural network to maximize the “quality of user experience”. Any ideas? Let us know!

Conclusions

It was a pleasure to work on the service that may be useful to so many people. We love the service we managed to build and regularly use it to discover new movies to watch.

Any feedback, comments, ideas and suggestions are very much appreciated!

We hope you’ll discover your movie!

Press about us: Movix uses artificial intelligence to hit you with the best movie suggestions
