Collaborative Filtering — Understanding embeddings in User Movie Ratings

Collaborative filtering is an important technique used in recommendation systems. It predicts the interests of a person by collecting and comparing the preferences of other people with similar interests. User movie ratings are a frequently cited example of collaborative filtering: users are asked to rate movies from 1 to 5, and based on these ratings, the recommendation system can predict the rating for a combination of user and movie it has not seen before. There are many resources available on the different machine learning techniques that help solve this problem.

In Lectures 4 & 5 of the FastAI Deep Learning course Part 1 v3 in November 2018 (link available only to registered course attendees as of now; to be available to all in January 2019), Jeremy Howard, FastAI co-founder, explained the workings of Collaborative Filtering and the concept of Entity Embeddings via an Excel workbook, as applied to user movie ratings. He then used the PyTorch-based FastAI library to demonstrate impressive results on the same problem.

The MovieLens website maintains a huge collection of such user movie ratings data. An extract by Jeremy of the top 15 users' movie ratings from the MovieLens database is shown in one of the worksheets of the above workbook, below.

He then introduced the concept of Entity Embeddings as applied to Users & Movies.

What are Entity Embeddings? An entity is a noun; an embedding is an instance of a mathematical structure contained within another instance. With Entity Embeddings, every instance of an entity (in this case a user or a movie) is represented as a vector of numbers. These numbers capture the latent characteristics of each entity.

In the user movie ratings example, a vector of 5 numbers was introduced as the user embedding vector for each user id. The same was done for each movie id, where a movie embedding vector of 5 numbers was introduced. These are shown with a yellow background in the diagram below.
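In code, such an embedding is simply a lookup from an entity id to its vector. A minimal pure-Python sketch follows; the ids and vector values here are made up for illustration and are not taken from the worksheet:

```python
# Each entity id maps to a vector of 5 latent factors.
# These particular ids and numbers are illustrative, not from the worksheet.
user_embeddings = {
    14: [0.21, -0.53, 0.80, 0.10, -0.02],
    29: [-0.40, 0.95, 0.13, -0.67, 0.44],
}
movie_embeddings = {
    27: [0.55, 0.12, -0.30, 0.88, -0.15],
    49: [-0.09, 0.71, 0.60, -0.24, 0.33],
}

# Looking up an entity's embedding is just a dictionary access.
user_vector = user_embeddings[14]   # the 5-number vector for user 14
```

In a deep learning framework this lookup table would be a trainable embedding layer, but the idea is the same: one fixed-length vector per entity.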

You might ask why the user and movie embedding vector size should be restricted to 5. There are different ways to determine the size of an entity embedding vector; this can be covered in a future article.

In one of the FastAI machine learning study groups, someone asked me what these numbers represent. What do they stand for with respect to users and movies?

My response — these numbers represent the different latent factors of that entity. A user embedding vector contains numbers, each of which represents a different aspect of the user; similarly, a movie embedding vector captures different aspects of a movie. This is the real-world intuition behind entity embeddings. Let us look at this in more detail.

Let us assume that 5 factors are important for user rating of movies:

f1 — Age of movie

f2 — Degree of Action

f3 — Quality of Dialogues

f4 — Degree of Romance

f5 — Extent of Comedy

Based on the above factors f1 to f5, any movie can have the corresponding movie attributes (designated m1 to m5). Similarly, user preferences can be categorised (u1 to u5) based on the same five factors. These factors and the corresponding movie attributes and user preferences are illustrated in Table 1 below:

These have been indicated in the Collaborative Filtering Excel worksheet below.

Each of the above five movie attributes m1 to m5 in the movie embedding matrix, and each of the five user preferences u1 to u5 in the user embedding matrix, is given a random numeric value to start with. This is shown below.
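This random start can be sketched in a few lines of Python. The entity counts, seed, and the range of the random values below are arbitrary choices for illustration (the 15 matches the size of the worksheet extract):

```python
import random

random.seed(42)  # for reproducibility; any seed works

n_factors = 5    # size of each embedding vector (one entry per factor f1..f5)
n_users = 15     # illustrative counts, matching the 15-user extract
n_movies = 15

def random_embedding_matrix(n_entities, n_factors):
    """One row per entity, each entry a small random number to start with."""
    return [[random.uniform(-0.05, 0.05) for _ in range(n_factors)]
            for _ in range(n_entities)]

user_matrix = random_embedding_matrix(n_users, n_factors)    # u1..u5 per user
movie_matrix = random_embedding_matrix(n_movies, n_factors)  # m1..m5 per movie
```

During training, gradient descent nudges these random values until the dot products approximate the observed ratings, which is what the workbook's solver step does.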

Now, for the first factor f1 (age of movie), the assumption is that if a movie is recent (indicated by the value of m1) and a user likes recent movies (indicated by the value of u1), then the portion of the user rating corresponding to the age of the movie will be m1 × u1.

Similarly, for all the movie factors f1 to f5, the corresponding contribution to user rating will be as below in Table 2.

Hence, it is possible to theorize the entire movie rating (with all its factors) into a mathematical formula as:

User movie rating = m1 × u1 + m2 × u2 + m3 × u3 + m4 × u4 + m5 × u5

This is the dot product of the movie attribute vector and the user preference vector. (The individual user and movie bias factors have not been mentioned by me, but are covered in the Excel workbook and Jeremy Howard's class in quite some detail.)
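The prediction step can be sketched as a plain dot product, with the bias terms as optional additions. The vectors below are made-up illustrative values, not numbers from the worksheet:

```python
def predict_rating(movie_vec, user_vec, movie_bias=0.0, user_bias=0.0):
    """Dot product of the two embedding vectors, plus optional bias terms."""
    dot = sum(m * u for m, u in zip(movie_vec, user_vec))
    return dot + movie_bias + user_bias

# Illustrative values for m1..m5 and u1..u5 (not from the worksheet):
movie_vec = [0.9, 0.2, 0.7, 0.1, 0.5]  # recent, little action, good dialogue...
user_vec  = [0.8, 0.1, 0.9, 0.3, 0.4]  # likes recent movies and dialogue...

rating = predict_rating(movie_vec, user_vec)
# 0.9*0.8 + 0.2*0.1 + 0.7*0.9 + 0.1*0.3 + 0.5*0.4 = 1.60
```

Each term of the sum is one factor's contribution (Table 2); the total is the predicted rating before any bias adjustment.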

The above user movie ratings example of collaborative filtering can be extended to other scenarios in a generic fashion. Because the example is in Excel, it can be understood easily even by non-technical people.

In summary, understanding the real-world intuition behind embeddings is important for solving collaborative filtering problems in recommendation systems.