Opening the black box: Matrix factorisation for content and consumer insight
Whichever media services you use for films and music, you will have encountered recommendations – suggestions as to what you should watch or listen to in the future.
Broadly, there are two main ways for a company to make these recommendations:
- Content-based filtering, which uses similarity between items (i.e. we recommend this film because it’s similar to another one you liked)
- Collaborative filtering, which uses similarity between users (i.e. we recommend this film because you’re similar to another user who liked it)
Collaborative filtering is a more flexible approach because it doesn’t rely on user or content metadata to generate recommendations. That’s why, in 2006, Netflix launched an open competition to find the best collaborative filtering algorithm – the optimum way to predict user preferences based on nothing but previous ratings.¹
One particularly influential solution was matrix factorisation. Deriving from the principles of linear algebra, matrix factorisation works by representing users and content in the same lower-dimensional latent space. Although Simon Funk proposed the original algorithm in 2006, several alternatives have since emerged.²
Typically, matrix factorisation is deployed through a ‘black box’ without regard for interpretation. Indeed, the ultimate goal is to improve user recommendations – not necessarily to understand why those recommendations are being made.
However, by examining the model outputs, you can gain rich insight into your content and consumers alike. To understand the kind of insight matrix factorisation facilitates, it’s worth taking a quick tour through how it works.
A short introduction to matrix factorisation
Imagine you have a data table where each row represents a user, each column represents a film, and each value represents some kind of user feedback. This is called a user-item matrix (A).
Feedback can either be explicit (e.g. where the user provides a numerical rating) or implicit (e.g. where the system infers interest from whether a user watched a film or not). For the sake of simplicity, we will only deal with explicit feedback here.
In this imagined (and highly contrived) example, it is obvious – despite the missing values – that Amy, Ben and Chloe like comedy films, while Daniel, Emily and Fred like horror films. In other words, comedy and horror are the two latent factors that explain the variance in the dataset.
The goal of matrix factorisation is to predict the missing values using the information described by the latent factors. To do this, we first need to decompose (or ‘factorise’) the user-item matrix into two matrices:
- A user matrix (U), which scores each user on each latent factor
- An item matrix (V), which scores each film on each latent factor
We do so in such a way that the product of these two matrices is a good approximation of the original matrix. In other words, we need to minimise the distance (in this case, the squared Frobenius distance) between the original user-item matrix A and the estimated user-item matrix UVᵀ.³
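Written out, the objective can be sketched as below. The notation is our own shorthand rather than anything taken from a particular library: Ω is assumed to be the set of (user, film) pairs that actually have a rating, uᵢ is the i-th row of U and vⱼ is the j-th row of V. Restricting the sum to observed ratings is an assumption that reflects common practice when the matrix has missing values.

```latex
\min_{U,\,V} \; \sum_{(i,j) \in \Omega} \left( A_{ij} - \mathbf{u}_i^{\top} \mathbf{v}_j \right)^{2}
```

If every entry of A were observed, this sum would be exactly the squared Frobenius norm of A − UVᵀ.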
A common algorithm for minimising the objective function is Alternating Least Squares (ALS), which alternates between fixing U and solving for V, and vice versa. ALS accepts many parameters, but the crucial one is rank (k), which is equal to the number of latent factors in the model.⁴
In this case, we know that k = 2. The model will therefore return a 6×2 user matrix U (one row for each user; one column for each latent factor) and a 2×6 item matrix (one row for each latent factor; one column for each item), which corresponds to Vᵀ in the notation above.
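To make the alternating updates concrete, here is a minimal NumPy sketch of ALS. It is not the code behind the figures in this post: the ratings are invented, and the function name `als`, the small ridge penalty `lam`, the iteration count and the random seed are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical ratings, invented for this sketch.
# Rows: Amy, Ben, Chloe, Daniel, Emily, Fred.
# Columns: Airplane, Bridesmaids, Superbad, Halloween, Psycho, Scream.
# np.nan marks a film the user has not rated.
A = np.array([
    [5, 4, np.nan, 1, 2, 1],
    [4, 5, 5, 1, np.nan, 2],
    [np.nan, 5, 4, 2, 1, 1],
    [1, 2, 1, 5, 4, np.nan],
    [2, np.nan, 1, 4, 5, 5],
    [1, 1, 2, np.nan, 5, 4],
])


def als(A, k=2, lam=0.1, n_iters=50, seed=0):
    """Factorise A into U (users x k) and V (items x k), skipping NaNs."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(A)
    n_users, n_items = A.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = lam * np.eye(k)  # assumed penalty; keeps each solve well-posed
    for _ in range(n_iters):
        # Fix V and solve a small least-squares problem for each user.
        for i in range(n_users):
            Vi = V[observed[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge, Vi.T @ A[i, observed[i]])
        # Fix U and solve the same kind of problem for each film.
        for j in range(n_items):
            Uj = U[observed[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge, Uj.T @ A[observed[:, j], j])
    return U, V


U, V = als(A)
print(U.shape, V.T.shape)  # (6, 2) (2, 6)

# Root-mean-square error on the ratings we actually observed.
observed = ~np.isnan(A)
rmse = np.sqrt(np.mean((A - U @ V.T)[observed] ** 2))
```

Note that the two learned columns need not line up neatly with ‘comedy’ and ‘horror’: any rotation of the latent space gives the same fit, which is part of why naming the factors takes some interpretation.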
Looking at the item matrix, we can see that:
- Latent Factor #1 is positively associated with Airplane, Bridesmaids and Superbad, so it represents comedy
- Latent Factor #2 is positively associated with Halloween, Psycho and Scream, so it represents horror.
Looking at the user matrix, we can see that:
- Amy, Ben and Chloe are positively associated with Latent Factor #1, so they like comedy
- Daniel, Emily and Fred are positively associated with Latent Factor #2, so they like horror.
As this example shows, one of the benefits of matrix factorisation is that items and users share the same latent space, so you can describe both your content and your users in terms of the same latent factors. This provides an insightful way to describe your content and why certain audiences enjoy it.
How does this work in a recommendation model? Well, if you multiply those two matrices together, you end up with an estimate of how each user would rate each film: each entry is the dot product of one user’s factor scores and one film’s factor scores. In effect, this means that we have filled in the missing values of our original matrix.
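As a small illustration of that product, the sketch below uses factor scores chosen by hand rather than learned by a model. All of the numbers, and the variable names, are hypothetical.

```python
import numpy as np

# Hypothetical user matrix (6x2), chosen by hand rather than learned.
# Rows: Amy, Ben, Chloe, Daniel, Emily, Fred. Columns: comedy, horror.
U = np.array([[2.0, 0.5]] * 3 + [[0.5, 2.0]] * 3)

# Hypothetical item matrix in the 2x6 layout described above.
# Columns: Airplane, Bridesmaids, Superbad, Halloween, Psycho, Scream.
V_t = np.array([
    [2.0, 2.0, 2.0, 0.5, 0.5, 0.5],  # comedy factor
    [0.5, 0.5, 0.5, 2.0, 2.0, 2.0],  # horror factor
])

# Matrix product: one estimated rating per (user, film) pair.
predicted = U @ V_t

# Hypothetical observed ratings for Amy, with Superbad missing.
amy_ratings = np.array([5.0, 4.0, np.nan, 1.0, 2.0, 1.0])

# Keep observed ratings and fill only the gap with the model's estimate.
amy_filled = np.where(np.isnan(amy_ratings), predicted[0], amy_ratings)
print(amy_filled[2])  # 4.25 = 2.0 * 2.0 + 0.5 * 0.5
```

Here Amy’s missing rating for Superbad is estimated at 4.25, while her estimate for a horror film would be 2.0, which is what we would hope for from a comedy fan.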
You can reproduce the above example in R – including the graphics – by running the code published here.
How we use matrix factorisation at the BBC
At the BBC, we have used matrix factorisation to explore how cross-product recommendations across iPlayer and Sounds might deliver greater value to our audiences. Although the aim of matrix factorisation here and elsewhere is to improve user recommendations in the end product, we have also been using the outputs of the model – the user matrix and the item matrix – to gain insights into our audiences and content portfolio respectively.
One way to figure out what our own latent factors represent is to look at the top and bottom programmes associated with each of them. Let’s take Latent Factor #1 as an example. Most of the positively associated programmes are human-centred documentaries that deal with sobering issues such as murder and sexual abuse. By contrast, the negatively associated programmes are often comedies or panel shows. This latent factor therefore might be a measure of thematic gravity or seriousness.
Turning to Latent Factor #2, we see that the positively associated programmes (The Rap Game UK, Mayans M.C. and Breaking Fashion) generally do well with a younger, metropolitan audience, while the negatively associated programmes (The Repair Shop, Scarborough, Countryfile) seem more tied to an older, rural audience. This factor therefore appears to be a measure of metropolitan youth interest.
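Picking out the top and bottom programmes for a factor is a simple ranking exercise. The sketch below assumes the item matrix has been reshaped into a dictionary of programme scores for one factor. The signs follow the description of Latent Factor #2 above, but the magnitudes are invented, and `top_and_bottom` is a hypothetical helper rather than part of our pipeline.

```python
# Hypothetical scores on one latent factor. Only the signs reflect the
# text above; the magnitudes are invented for illustration.
factor_2 = {
    "The Rap Game UK": 0.9,
    "Mayans M.C.": 0.7,
    "Breaking Fashion": 0.6,
    "Scarborough": -0.5,
    "Countryfile": -0.8,
    "The Repair Shop": -0.9,
}


def top_and_bottom(scores, n=3):
    """Return the n most positive and the n most negative programmes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[::-1][:n]


top, bottom = top_and_bottom(factor_2)
print(top)     # most positively associated first
print(bottom)  # most negatively associated first
```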
We now have two latent factors that seem to describe each programme on iPlayer by: 1) how serious it is; 2) how youth-skewing/metropolitan it is. By plotting these two latent factors against each other, we can visualise our portfolio in a way that opens up fresh insight about our programmes and how they relate to each other.
In the top right, we have programmes that score highly on both seriousness and metropolitan youth interest, such as Louis Theroux and Secrets of Sugar Baby Dating. In the bottom right, we have programmes that score highly on seriousness but low on metropolitan youth interest, such as Les Miserables and Mrs Wilson. This implicit categorisation of our programmes makes intuitive sense and consolidates our belief in the validity of the latent factors.
Although matrix factorisation is a highly technical method, the process of understanding and naming our latent factors is inherently qualitative and requires domain knowledge. To fully understand the outputs of our recommendation model, it is therefore important for us to work with other teams across the BBC – particularly those who have a deep understanding of the semiotic dimensions of our programming.
Talking about matrix factorisation with non-technical stakeholders
Matrix factorisation is not what you would call ‘stakeholder friendly’. Both the procedure and its outputs can be difficult to understand, even for those who are schooled in its underlying mathematics. How then do we go about communicating the details of such a complicated method to a non-technical audience?
Firstly, we keep these details to a minimum. Stakeholders don’t need to know about how to decompose a matrix or the algorithmic alternatives for doing so. In fact, they don’t need to know anything about matrix factorisation itself. All they need to grasp is the concept of latent factors.
Latent factors are relatively easy to understand, since we speak about them in our everyday lives. Happiness, extraversion, conservatism – these are all familiar concepts that cannot be directly observed. Instead, we infer them from behaviour or the answers on a personality test – in other words, from observed variables that can be directly measured.
To communicate the latent factors that we derived, we produced a micro app using R Shiny that showed some of the graphics above. It allows stakeholders to see the top and bottom programmes associated with each factor, and to see what happens when you cross-reference two factors against each other. The app refers to latent factors in less technical parlance as ‘hidden variables’.
When showing these visualisations to our colleagues in creative planning, we found they had an intuitive understanding of what each latent factor represented. Through their contextual knowledge of our programming and audience ‘need states’, they were able to say what was threading certain bits of content together – for example its association with certain subcultures or wind-down viewing modes.
That said, the conversations were not without their challenges. Two common questions were:
- How can you derive content-level descriptions (e.g. ‘human-centred’) from audience-level data (i.e. what people do and do not watch)?
- Why are some programmes appearing as opposites when they do not seem so different?
The answer to the first question is that this method assumes there are aspects of the content that are driving differences in audience consumption. For example, the presence of a strong female lead is a content feature, but one that might affect who is drawn to the programme.
The answer to the second question is simpler. Even if two programmes are opposite on one dimension, they might be very similar across all other dimensions. For example, Fleabag and Killing Eve might be opposites in the sense that one is a comedy-drama and one is a spy-action thriller, but they are both written by Phoebe Waller-Bridge and share a similar sense of humour.
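A toy calculation can help make this point. The sketch below uses two made-up programmes scored on four latent factors (all values hypothetical) and compares them on a single factor and then across all factors using cosine similarity, which is one common choice of measure rather than the only one.

```python
import math

# Hypothetical scores for two programmes on four latent factors.
programme_a = [0.8, -0.6, 0.7, 0.5]
programme_b = [0.7, 0.6, 0.6, 0.4]


def cosine_similarity(x, y):
    """Cosine of the angle between two score vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# On the second factor alone the programmes sit on opposite sides of zero...
opposite_on_factor_2 = programme_a[1] * programme_b[1] < 0

# ...but taken across all four factors they still point the same way.
overall = cosine_similarity(programme_a, programme_b)
print(opposite_on_factor_2, round(overall, 2))  # True 0.53
```

A chart of a single factor would show these two programmes as opposites, even though their overall profiles are positively related.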
Despite its challenges, collaboration with the audience planners was a fruitful exercise that helped us to better understand what the latent factors were representing, and helped them to uncover patterns in the data that wouldn’t have been observable otherwise. Matrix factorisation may be esoteric, but its outputs are accessible enough to extend beyond the realm of data science.
¹ Netflix Prize (Wikipedia): https://en.wikipedia.org/wiki/Netflix_Prize
² Matrix factorization (Wikipedia): https://en.wikipedia.org/wiki/Matrix_factorization_(recommender_systems)
³ Matrix factorization (Google Developers): https://developers.google.com/machine-learning/recommendation/collaborative/matrix
⁴ Collaborative filtering (Apache Spark): https://spark.apache.org/docs/2.2.0/ml-collaborative-filtering.html