Recommending Articles in Real-Time at News UK

Daoud Clarke
Sep 21, 2017 · 6 min read


Recommending news is not like recommending items for sale at Amazon. Amazon has a fairly static set of products, whereas the news is changing all the time. This is the problem we face in generating personalised recommendations for The Times and The Sun at News UK.

Because the news changes constantly, we need to be able to generate recommendations quickly.

This week we published a paper describing how we built our system, Algorithms and Architecture for Real-time Recommendations at News UK, written with my colleagues Dion Bailey, Tom Pajak and Carlos Rodriguez. It is available as a preprint on arXiv, and will be presented at SGAI 2017 in December.

Finding the Best Algorithm

When I arrived at News UK to help out with this problem, I thought I knew what I was doing. After all, I had done news recommendations before, working for a startup called Lumi. However, the scale and speed needed at News UK were unlike anything I'd seen before.

In addition, we quickly found that the algorithms that worked best for a small number of users were different to those that worked best with hundreds of thousands of users. With a small number of users, a content-based system worked best. This system builds a model for each user based on the words in the articles they have read; a rough sketch of the idea is below.
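
To make the content-based idea concrete, here is a minimal sketch that represents each article by TF-IDF weights over its words and a user as the average of the articles they have read. The feature choice, corpus and names here are illustrative, not our production feature set:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

# Toy corpus; the production articles, features and storage are very different.
articles = {
    "a1": "chancellor announces budget changes for small businesses",
    "a2": "england squad named ahead of world cup qualifier",
    "a3": "new budget measures target housing and small business tax",
}

vectorizer = TfidfVectorizer()
article_matrix = vectorizer.fit_transform(articles.values())  # one row per article
article_index = {aid: i for i, aid in enumerate(articles)}

def user_profile(read_article_ids):
    """Represent a user as the average vector of the articles they have read."""
    rows = [article_index[a] for a in read_article_ids]
    return np.asarray(article_matrix[rows].mean(axis=0)).ravel()

profile = user_profile(["a1", "a3"])  # a reader of the two budget stories
```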

In contrast, a collaborative system generates recommendations based on user behaviour. Two users who have read the same articles in the past are more likely to read similar things in the future. It turns out that at the scale of The Times and The Sun, collaborative systems work much better than content-based ones. This is counter-intuitive because there is so much information in the content of an article — it is hard to ignore this information, especially for a company whose lifeblood is content.

But the results of our offline evaluation were clear: you could make a content-based system that worked as well as the baseline (just recommending popular items), but the collaborative system knocked it out of the park.

We implemented both the content-based and the collaborative systems in production because we wanted to check that the results of the offline experiments could be replicated in online experiments (results of the online experiments are still to come).

Recommending in Real-Time

Recommending in real time with a content-based system is not too hard: you can build a model for each user offline and store it in a database. As new articles are created, you extract their features and store the resulting vectors. When recommendations are requested, all you need to do is score the vectors for the current set of articles against the user's model and rank them by the resulting score.
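
Serving then reduces to a dot product and a sort. A rough sketch, assuming the user model and the vectors for recent articles have already been loaded from their stores (all names here are hypothetical):

```python
import numpy as np

def recommend(user_vector, candidate_vectors, candidate_ids, k=10):
    """Score each current article against the user's model and return the top k."""
    scores = candidate_vectors @ user_vector  # simple dot-product relevance
    top = np.argsort(scores)[::-1][:k]
    return [(candidate_ids[i], float(scores[i])) for i in top]

# user_vector would come from the user-model store; candidate_vectors and
# candidate_ids from the feature store of articles currently live on the site.
```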

But you can’t do this with a collaborative system, at least not one based on state-of-the-art matrix factorisation methods, because the factorisation is a one-off computation, and it is not clear how to update it when there is new information.

Well, you couldn’t before; we found a way to do it. Actually, the idea is very simple. Matrix factorisation works as follows. We have a row for each user and a column for each article. We put a “1” in a cell if the user has read the article, otherwise we put a “0”. This matrix is the input to the algorithm. The output is two new matrices, one for the articles and one for the users. The product of these matrices approximates the original matrix. [Note: we got the matrix factors the wrong way round in the paper!]

What we observed is that you can take away (or add) a row for a user, and all you have to do is remove (or add) the corresponding row from the output user matrix. You can do the same with articles (but with columns). This means we can easily handle new users or articles.
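
A toy illustration of that correspondence, using numpy and made-up dimensions (the real matrices are enormous and sparse):

```python
import numpy as np

rng = np.random.default_rng(0)

# Interaction matrix R: one row per user, one column per article,
# 1 if the user has read the article, 0 otherwise.
R = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
], dtype=float)

# Factor matrices: U has a row per user, V a row per article,
# and U @ V.T approximates R (fitting them is shown in the next sketch).
k = 2
U = rng.random((R.shape[0], k))
V = rng.random((R.shape[1], k))

# Removing a user means dropping the same row from R and from U; V is untouched.
R = np.delete(R, 1, axis=0)
U = np.delete(U, 1, axis=0)

# Adding a user means appending a row to R and a randomly initialised row to U.
R = np.vstack([R, [[0, 0, 1, 1]]])
U = np.vstack([U, rng.random((1, k))])

# Articles work the same way, except the change is a column of R and a row of V.
```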

The matrix factorisation algorithm we use is an iterative one called Alternating Least Squares. We start off by initialising the factor matrices randomly, and iterate until we get close enough to the result. You can add a new row (or column) to the source matrix, and add a corresponding randomly initialised row (or column) to the factor matrix. We then perform a few more iterations, and the new user's vector converges, so that the factor matrices once again approximate the source matrix.
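
A minimal sketch of that warm start, using plain unweighted ALS with L2 regularisation; the production algorithm and scale are different, and this is only meant to show the shape of the update:

```python
import numpy as np

def als_iterations(R, U, V, n_iters=5, reg=0.1):
    """Alternate between solving for the user factors with the article
    factors fixed, and vice versa (unweighted least squares with L2 reg)."""
    I = reg * np.eye(U.shape[1])
    for _ in range(n_iters):
        U = R @ V @ np.linalg.inv(V.T @ V + I)
        V = R.T @ U @ np.linalg.inv(U.T @ U + I)
    return U, V

rng = np.random.default_rng(1)
R = (rng.random((6, 8)) < 0.3).astype(float)  # toy read/not-read matrix
U, V = als_iterations(R, rng.random((6, 3)), rng.random((8, 3)))

# A new user arrives: append their interactions to R and a random row to U,
# then a few more iterations let the new row converge without a full retrain.
R = np.vstack([R, (rng.random((1, 8)) < 0.3).astype(float)])
U = np.vstack([U, rng.random((1, 3))])
U, V = als_iterations(R, U, V, n_iters=3)
```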

We took this observation one step further. We don't need to use the whole matrix when updating with new data. As long as there is enough to learn how users interact, we can use just a subset of the users. We looked at how many users were needed in each batch for our data set.

We found that above around 10,000 users in a batch, there is no discernible benefit to adding more. This is great, since it means we can update the model much faster than if we had to use all the data, and we don't have to worry about it all fitting in memory.
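
A sketch of a single batch update under that scheme, in the same style as the ALS sketch above; the 10,000 figure comes from our experiments, while the function and variable names are purely illustrative:

```python
import numpy as np

def update_on_user_batch(R, U, V, user_ids, n_iters=3, reg=0.1):
    """Run a few ALS-style iterations using only the sampled users' rows,
    then write their refreshed factors back into the full user matrix."""
    I = reg * np.eye(U.shape[1])
    R_b, U_b = R[user_ids], U[user_ids]
    for _ in range(n_iters):
        U_b = R_b @ V @ np.linalg.inv(V.T @ V + I)
        V = R_b.T @ U_b @ np.linalg.inv(U_b.T @ U_b + I)
    U[user_ids] = U_b
    return U, V

# e.g. sample ~10,000 users for each incremental update:
# batch = np.random.default_rng().choice(R.shape[0], size=10_000, replace=False)
# U, V = update_on_user_batch(R, U, V, batch)
```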

Training in Parallel

Updating the model incrementally is a huge win for us, but it would be even better if we could do it in parallel. Well, it turns out we can! Simply parallelising the batch updates seems to work fine, as long as we don’t run too many batches in parallel.

Batches will often have overlapping sets of users and articles, and so one batch will frequently overwrite the output of another batch. This is why things fall apart when the level of parallelism gets too high. But as long as the level of parallelism is reasonable, the accuracy is not harmed, and we can update the model even faster.
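
Sketching the parallel version with a thread pool, reusing the hypothetical update_on_user_batch from the previous sketch. The last-writer-wins merging below is deliberately naive; it mirrors the overwriting behaviour described above, which is harmless at modest levels of parallelism:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def train_in_parallel(R, U, V, batch_size=10_000, n_batches=4, max_workers=2):
    """Run several user-batch updates concurrently; overlapping batches may
    overwrite each other's factors, so max_workers is kept small."""
    rng = np.random.default_rng()
    size = min(batch_size, R.shape[0])
    batches = [rng.choice(R.shape[0], size=size, replace=False)
               for _ in range(n_batches)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(update_on_user_batch, R, U, V, batch)
                   for batch in batches]
        for future in futures:
            U, V = future.result()
    return U, V
```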

Complexity

Keeping the model up-to-date in this way comes at a cost. The algorithm we're using to perform the matrix factorisation is O(n), where n is the number of data points (user-article interactions). Since in general we update the model with a user's data multiple times, each incremental run is linear in the size of the matrix at that point, and the matrix keeps growing; summed over the life of the system, the total cost of training becomes O(n²). In practice, we limit the articles in the training set to those from the last six months, effectively applying the same trick we used for users to the articles. This keeps the size of each update bounded, and recovers O(n) complexity.
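
The windowing itself is straightforward; a sketch of the filter, with illustrative field names, might look like this:

```python
from datetime import datetime, timedelta, timezone

SIX_MONTHS = timedelta(days=183)

def recent_interactions(interactions, now=None):
    """Drop interactions with articles older than six months so the training
    matrix, and hence each incremental update, stays bounded in size."""
    now = now or datetime.now(timezone.utc)
    return [x for x in interactions
            if now - x["article_published_at"] <= SIX_MONTHS]
```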

Conclusion

The challenges involved in recommending news are unique, but the algorithm we have developed has broad applicability. The ability to incrementally update a model is useful in many contexts. We have shown that this is possible, and explored the limits of its applicability on our data. We’d love to hear your view on what we’ve done.
