How we built a recommendation system that helps readers discover 38 million interesting articles per month

Taringa!
8 min read · Apr 6, 2016

We deliver recommendations to our audience all over the world 180MM times per month, or in other terms, 70 times per second. 21% of the time (38MM) readers click on a recommendation. In this article I’ll walk you through the situation we faced when we started this project and how we solved it.

Around 96.5% of our readers didn’t have an account on our website, so we knew little about their interests. Since 2013, with the advent of SSL for everything (a Good Thing®), Google Search stopped sending a very useful bit of information: what was the visitor searching for when they were led to this article? Basically, all we had was the article they had just landed on, and maybe (if we were lucky) some other articles recently read from the same device. For the other 3.5% of the audience it couldn’t be more different: they had an account on our website and they read, on average, 40 articles per month. Our recommendations had to work well for both cases.

We don’t only deliver recommendations as a widget on the side of an article; we also show them on our website’s homepage, and we send them through an email newsletter every day.

Overall, recommendations in our articles account for 26.7% (green slice labeled ‘taringa_post’) of all article reads, or approximately 38 million article reads every month:

On our desktop website, this means 13 million reads, or 13.9% (green slice labeled ‘taringa_post’) of all article reads:

Even more interesting: recommendations in articles account for 45.2% (blue slice labeled ‘taringa_mobile_post’) of article reads on our mobile website, or approximately 25 million.

During 2015 we built most of the recommendation engine and its four main products: recommendations for anonymous users, recommendations for active users, recommendations for the homepage and email recommendations. In a very creative moment of inspiration, we decided to nickname this project “Discovery”.

So how did we build Discovery?

Metrics

First of all, we needed a system to collect and store usage data, both to measure and compare the performance of different recommendations and to feed the recommendation engines. For this purpose, we extended our metrics system to record the internal source of every page view on our website. If a user lands on the homepage coming from an article, we know it; if they read an article from search or from another article, we log that too. This information is pushed to a message queue powered by RabbitMQ. The benefit of pushing this kind of information to a message queue is that we can easily create different programs that use this data. In message queue slang, each program is a consumer, and it receives an exact copy of each message with no extra effort required from our programmers.
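As a rough illustration, a tracking publisher along these lines could look like the sketch below. It assumes a pika-based RabbitMQ client; the exchange name and message fields are made up for the example, not our production schema.

```python
# A minimal sketch of a page-view tracking publisher, assuming pika.
# "pageview.tracking" and the event fields are illustrative names.
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
# A fanout exchange delivers a copy of every message to each bound consumer.
channel.exchange_declare(exchange="pageview.tracking", exchange_type="fanout")

def track_pageview(article_id, source, session_id):
    """Publish one page-view event with its internal traffic source."""
    event = {
        "article_id": article_id,
        "source": source,          # e.g. "homepage", "search", "taringa_post"
        "session_id": session_id,
    }
    channel.basic_publish(
        exchange="pageview.tracking",
        routing_key="",
        body=json.dumps(event),
    )

track_pageview(article_id=12345, source="taringa_post", session_id="abc-123")
```

Because every consumer bound to the exchange gets its own copy of each event, adding a new program that uses this data is just a matter of binding another queue.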

A/B Testing

Once we had these messages flowing, we started feeding them to an ELK instance. ELK (Elasticsearch, Logstash and Kibana) is a stack that lets you store, search and visualize log messages. With ELK we were able to create a visual dashboard showing how many people read an article from each source. Once we had this running, we added an extra piece of information to all messages: a Test Track label. So if we wanted to compare two versions of a recommendation widget, we sent the Test Track name with each tracking message and could see the performance of both widgets in real time from the ELK instance.
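As a sketch of the idea, a tracking message might carry its track label like this; the bucketing scheme and track names below are illustrative assumptions, not our actual assignment logic.

```python
# A hedged sketch of attaching a Test Track label to each tracking message.
import hashlib
import json

TRACKS = ["widget_v1", "widget_v2"]

def assign_track(session_id):
    """Deterministically bucket a session into a track, so the same
    visitor always sees the same widget variant."""
    bucket = int(hashlib.md5(session_id.encode()).hexdigest(), 16) % len(TRACKS)
    return TRACKS[bucket]

def tracking_message(article_id, source, session_id):
    return json.dumps({
        "article_id": article_id,
        "source": source,
        "session_id": session_id,
        "test_track": assign_track(session_id),  # shows up as a field in Kibana
    })
```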

Aggregate data and collaborative filtering

Our first recommendation engine aggregated article-read data from all users using an algorithm known as item-to-item collaborative filtering. This is the same algorithm Amazon uses when they say “People who bought this also bought that”. Our system logs every article read along with a unique identifier for that navigation session. When a second article is read, a counter increments for the combination Article A + Article B. Whenever someone reads Article A, we can then say that people who read Article A also read Article B. This system is quite straightforward and uses a limited amount of storage: in the worst case, the square of the total number of articles available, in our case approximately 15 MM articles.
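A minimal in-memory sketch of this co-occurrence counting follows; in production the counters live in a datastore rather than Python dicts, and the function names are just illustrative.

```python
# Toy item-to-item collaborative filtering: count article pairs read
# in the same navigation session and recommend the most frequent co-reads.
from collections import defaultdict

pair_counts = defaultdict(int)      # (article_a, article_b) -> co-read count
session_reads = defaultdict(list)   # session_id -> articles read so far

def log_read(session_id, article_id):
    """Record a read and bump the counter for every pair it completes."""
    for previous in session_reads[session_id]:
        if previous != article_id:
            pair = tuple(sorted((previous, article_id)))
            pair_counts[pair] += 1
    session_reads[session_id].append(article_id)

def also_read(article_id, top_n=10):
    """Articles most often read in the same session as article_id."""
    scores = {}
    for (a, b), count in pair_counts.items():
        if a == article_id:
            scores[b] = count
        elif b == article_id:
            scores[a] = count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```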

Another collaborative filtering approach is known as user similarity. In this approach, each time two or more articles are read by the same users, they are categorized as a “reading pattern”. Clusters of these reading patterns are detected and scored by their frequency. When someone reads Article A, the most frequent pattern that includes it is used as a source of other potentially interesting articles. Once someone reads two articles, the pattern detection becomes more and more precise. This helps detect people who might be “looking for the biography of Lionel Messi”, a need typically satisfied by three different articles that cover the most popular aspects of the soccer player.
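A rough sketch of the reading-pattern idea is below, treating a pattern simply as a set of articles read by the same reader within the processing window; real cluster detection is more involved than this frequency count.

```python
# Toy "reading pattern" detection: count frequent co-read article sets
# and recommend the unseen articles from patterns that overlap with
# what the user already read.
from collections import Counter
from itertools import combinations

def detect_patterns(readers, pattern_size=3):
    """readers: iterable of article-id lists, one per reader, limited to
    the processing window (e.g. the last 2-3 days of activity)."""
    patterns = Counter()
    for reads in readers:
        for combo in combinations(sorted(set(reads)), pattern_size):
            patterns[combo] += 1
    return patterns

def recommend_from_patterns(patterns, read_articles, top_n=10):
    """Score unseen articles by the frequency of the patterns they share
    with the user's recent reads."""
    candidates = Counter()
    seen = set(read_articles)
    for combo, freq in patterns.items():
        if seen & set(combo):
            for article in combo:
                if article not in seen:
                    candidates[article] += freq
    return [article for article, _ in candidates.most_common(top_n)]
```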

While implementing both algorithms we found that limiting the time allowed for patterns to emerge changed results dramatically: shorter time spans performed better but recommended more recent content, while longer time spans performed worse but allowed older, high-quality articles to be recommended. To clarify, by time span I mean that we only process activity from the last 2–3 days, and any article reads outside of that window are not considered for cluster detection.

Once set up, the item-to-item collaborative filtering engine only needs the article a user is currently reading to recommend dozens of others, while the user similarity engine needs the user’s recent reads as input. As we tuned the time window and cluster detection algorithms, we sent recommendations to different A/B Testing Tracks to detect the best combinations.

In the end, we found that combining long-window, short-window, user similarity and item-to-item recommendations provided the best results. So we created a small recommendation engine aggregator that lets us set the percentage of recommendations drawn from each source.

Example configuration of a recommendation engine mix.
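As a rough idea of what such a mix might look like in code, here is a small sketch; the engine names and percentages are illustrative, not our production configuration.

```python
# Hypothetical aggregator configuration: each engine contributes a share
# of the final recommendation slots.
ENGINE_MIX = {
    "item_to_item_short_window": 0.40,
    "item_to_item_long_window": 0.20,
    "user_similarity": 0.25,
    "tfidf_similar_content": 0.15,
}

def mix_recommendations(results_by_engine, total_slots=12):
    """results_by_engine: dict of engine name -> ranked article ids.
    Fill the widget proportionally to the configured shares, skipping
    articles already picked by another engine."""
    picked = []
    for engine, share in ENGINE_MIX.items():
        quota = round(total_slots * share)
        taken = 0
        for article in results_by_engine.get(engine, []):
            if article in picked:
                continue
            picked.append(article)
            taken += 1
            if taken >= quota:
                break
    return picked[:total_slots]
```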

Similar Content

Before this project, Taringa! already had content recommendations powered by Sphinx, and they were all based on a simple similarity system. Users tagged their articles when they uploaded them, and those tags were used to search for content in the database. The search results were displayed as recommendations. This worked in some cases, and the results weren’t bad enough to throw away. But there was a significant number of cases where users didn’t tag their content properly and the recommendations for those articles were really bad.

To keep similarity recommendations but avoid the pitfalls of the previous implementation, we used an approach called TF-IDF. It involves reading every term found in every article in a collection and counting how many times it is repeated within each article, to see how “important” it is to that article, and how many times it appears across the collection, to see how “normal” it is. If an important term in an article is also particular to that article and seldom found across the collection, TF-IDF determines that it is a significant term that describes the article’s content. Using this algorithm we can extract the most significant terms from millions of articles and find similar articles that share many of those terms. This proved to be more reliable than user tagging.
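A minimal sketch of TF-IDF-based similar-content lookup is shown below, assuming scikit-learn; our production pipeline works over millions of articles and doesn’t keep everything in memory like this toy example.

```python
# Toy similar-content lookup: vectorize article text with TF-IDF and
# rank other articles by cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = {
    101: "lionel messi biography barcelona argentina world cup",
    102: "messi career goals records and trophies",
    103: "how to bake sourdough bread at home",
}

ids = list(articles.keys())
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform([articles[i] for i in ids])

def similar_articles(article_id, top_n=5):
    """Rank other articles by cosine similarity of their TF-IDF vectors."""
    row = ids.index(article_id)
    scores = cosine_similarity(tfidf[row], tfidf).ravel()
    ranked = sorted(zip(ids, scores), key=lambda x: x[1], reverse=True)
    return [(i, s) for i, s in ranked if i != article_id][:top_n]

print(similar_articles(101))  # the Messi articles score far above the bread one
```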

Individual navigation history and personal recommendations

Once aggregate recommendations were in production, we started working on the most personalized recommendations. In this case we would store each article read by each user and, based on this information, find other articles that might interest them. We logged article reads, scores (when a user rates an article), shares, comments, and additions to favorites. Our plan was to use this data to recommend content using user collaborative filtering.

Our first results weren’t very good, and we found a big culprit. Of the Active Users who read articles every month, only a small share had enough participation activity to gather the data needed to create recommendations. 75% of these users dedicated most of their time to the primary activity on Taringa!: reading articles. For the other 25% we could build a decent profile, but that wasn’t good enough for us. We analyzed users’ navigation patterns and found that we could infer interests if we worked out which articles they spent more time reading. From this insight we created a new metric called “QualityView”. It is triggered when a user spends a certain amount of time reading an article, with their browser focused on the article and occasional mouse or keyboard usage to scroll while reading. With this, we could expand our dataset.
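As a hedged sketch of that heuristic, the check might look roughly like this; the thresholds and event shape are illustrative assumptions, not the exact production rules.

```python
# Toy QualityView check: dwell time, tab focus and some interaction
# while reading, evaluated from data collected on the client.
MIN_READ_SECONDS = 30     # time actually spent on the article
MIN_FOCUS_RATIO = 0.7     # share of that time with the tab focused
MIN_INTERACTIONS = 2      # scroll / key / mouse events while reading

def is_quality_view(view):
    """view: dict with seconds_on_page, seconds_focused and
    interaction_count, shipped along with the tracking message."""
    if view["seconds_on_page"] < MIN_READ_SECONDS:
        return False
    focus_ratio = view["seconds_focused"] / max(view["seconds_on_page"], 1)
    return (focus_ratio >= MIN_FOCUS_RATIO
            and view["interaction_count"] >= MIN_INTERACTIONS)

# Example: a 90-second read with the tab focused most of the time
print(is_quality_view({"seconds_on_page": 90,
                       "seconds_focused": 80,
                       "interaction_count": 5}))
```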

This helped create better recommendations, but they still weren’t beating aggregated collaborative filtering, and this was really annoying for us. It seemed counterintuitive that even with such precise information about our readers’ interests we couldn’t generate more engaging recommendations than those for anonymous readers. We tested several approaches, but found that on the article recommendation widgets we couldn’t beat aggregate data by a significant margin. So we decided to test these recommendations in other contexts.

Take recommendations beyond the article: email and home

We figured that when a user is reading an article and focusing on a certain topic, our aggregate recommendation engines were delivering very good results. But there are a few situations where our users are not reading any particular article, and there’s no way to know whether they are focusing on a topic or just exploring, looking for anything interesting. One of those contexts is the website’s homepage. The other one is their email inbox.

We decided to package our recommendations in a new homepage widget that recommends content “based on your interest in X”, similar to what Netflix does in their movie exploration UI. This proved to beat our existing homepage algorithms for those active users. We also started sending personalized emails to active users with some of those same recommendations. These emails have the highest conversion ratios of all our email campaigns, with a 31% open rate and a 16% click-through rate, against an 18% open rate and a 4% click-through rate for our standard “top ranking” posts emails. That’s roughly 2x the open rate and 4x the CTR.

Closing remarks

OK, this is my first post here, and I feel it has definitely run longer than planned and gone a bit too deep on the technical side, but the whole process of developing and figuring this out has been so interesting for us that we didn’t want to keep all this learning to ourselves. I’m interested in hearing from you about any doubts, or insights you might have to make recommendations on Taringa! better.
