A digital news giant’s journey towards developing a personalized user experience

Recommendation engines have been a compelling AI concept for years, but they have rapidly become the norm for online players in almost every industry. From retailers like Amazon telling you what you want to buy before you've even realized it, to Netflix predicting your next weekend binge, as consumers we have come to expect a deeply personalized digital experience. It has become increasingly difficult to engage users online unless our minds are read and our needs almost instantaneously met.

Digital news publications have been experimenting with this trend as well, and through a hands-on course called Analytics in Action at Columbia Business School, MBA and engineering students had the unique opportunity to partner with a major digital publisher to dive into the real-world journey of developing an in-house recommendation engine. Working with the company's anonymized subscriber clickstream data, we observed several opportunities to boost engagement. As Exhibit 1 illustrates, while the average session length (amount of time spent on the site) is ~12 minutes, a large group of users bounces from the site much more quickly, perhaps because they don't find what they're looking for fast enough. Specifically, 11% of subscribers leave the website within 30 seconds. An ML-enhanced recommendation engine that captures a user's attention more immediately could drive deeper engagement and increase return visits!

Exhibit 1:
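The kind of session-length analysis behind Exhibit 1 can be sketched as follows. The real clickstream data is proprietary, so this uses a synthetic sample; the column names and the exponential session-length distribution are illustrative assumptions, not the client's actual schema or distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the subscriber clickstream: one row per session,
# with session lengths drawn to have a mean of roughly 12 minutes (720 seconds).
rng = np.random.default_rng(42)
sessions = pd.DataFrame({
    "user_id": np.arange(1000),
    "session_seconds": rng.exponential(scale=720, size=1000),
})

# Average session length in minutes, and the share of "bounces" under 30 seconds.
avg_minutes = sessions["session_seconds"].mean() / 60
bounce_rate = (sessions["session_seconds"] < 30).mean()

print(f"Average session length: {avg_minutes:.1f} minutes")
print(f"Share of sessions under 30 seconds: {bounce_rate:.1%}")
```

On the real data, the same two summary statistics (mean session length and the sub-30-second share) yielded the ~12 minutes and 11% figures cited above.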

In the spirit of tech sprints and MVPs, which aim to deliver business value incrementally rather than only at the end of a project, we decided that giving our client a deeper understanding of its users along the way to building the recommendation engine would be beneficial in multiple capacities. To do this, we wanted to create user clusters that the client could later use for personalized recommendations as well as targeted branding and marketing. The first challenge we encountered, however, was the sheer number of variables in our dataset. To run a clustering algorithm effectively, we needed to reduce its dimensionality. We used principal component analysis (PCA), which condenses many correlated variables into a small set of uncorrelated components, ranked by how much of the data's variance each one explains. As Exhibit 2 below illustrates, this analysis showed that more than 85% of the variance in our dataset of customer characteristics could be explained using only 5 dimensions.

Exhibit 2:

Using the results of PCA, we were then able to derive five unique clusters using a K-Means clustering algorithm:

Exhibit 3:
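The clustering step itself can be sketched in a few lines: project the standardized features onto the leading principal components, then run K-Means in that reduced space. Again, the data here is a synthetic stand-in with five planted groups; the real pipeline used the subscriber feature matrix described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic feature matrix with 5 planted user groups of 120 users each.
rng = np.random.default_rng(7)
centers = rng.normal(scale=3, size=(5, 20))
X = np.repeat(centers, 120, axis=0) + rng.normal(size=(600, 20))

# Reduce to the top 5 principal components, then cluster in that space.
scores = PCA(n_components=5).fit_transform(StandardScaler().fit_transform(X))
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(scores)

labels = kmeans.labels_
sizes = np.bincount(labels)
print("Cluster sizes:", sizes)
```

Interpreting the clusters then comes down to profiling each label group against the original variables (visit frequency, time of day, content categories read), which is how personas like "The Frequent User" emerged.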

We were excited to have derived five distinct and interpretable user clusters, and even more pleased to find that the behavioral trends identified through clustering made logical sense once we looked into what kind of content each cluster was reading. For example, users in Cluster 0 (The Frequent User) read by far the most news articles of any cluster, which fits: they visit the site most frequently and at many different times of day, consistent with someone using the platform to stay up to date on current-event headlines. Similarly, Cluster 3 reads only one of the company's many websites, and almost exclusively articles related to that website's historical specialty, despite recent diversification of content. This general alignment between content type and each cluster's behavioral trends gave us the confidence to move forward with developing a recommendation engine based on these learnings. The flowchart below illustrates the strategy behind what we ultimately built:

We believe this strategy is a good MVP for the client to test on live users: do cluster-specific content recommendations perform better than simply recommending the most popular content to everyone? Beyond the MVP, we also developed a long-term strategy for a recommendation engine that starts out with cluster-specific optimization goals but ultimately makes recommendations based on each user's individual engagement with content. See below for a summary of our vision:
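The MVP logic described above, cluster-specific recommendations with a global-popularity fallback for users not yet assigned to a cluster, can be sketched as follows. All article IDs and read counts here are made up for illustration; nothing below is the client's actual data or code.

```python
from collections import Counter, defaultdict

# Illustrative read log: (user's cluster, article read).
reads = [
    (0, "news-101"), (0, "news-102"), (0, "news-101"),
    (3, "food-201"), (3, "food-202"), (3, "food-201"),
]

# Tally article popularity per cluster and overall.
per_cluster = defaultdict(Counter)
overall = Counter()
for cluster, article in reads:
    per_cluster[cluster][article] += 1
    overall[article] += 1

def recommend(cluster, k=2):
    """Top-k articles for a cluster; global top-k if the cluster is unknown."""
    counts = per_cluster.get(cluster, overall)
    return [article for article, _ in counts.most_common(k)]

print(recommend(0))     # cluster-specific list for The Frequent User
print(recommend(None))  # new/unclassified user falls back to global popularity
```

The long-term vision replaces the popularity counters with a model of each individual user's engagement history, but keeps the same fallback structure for cold-start users.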

While we did not have the resources to A/B test our recommendation engine or the time to build out the more complex version, we are excited about the opportunities our findings can unlock for our client. Particularly for subscription-based businesses, which is where much of quality online news is heading, the first month or so of a new user's experience is crucial to retaining that user as a long-term (paying) customer. A deeper understanding of users will enable the company to classify new users into a cluster more quickly, immediately tailor recommendations that feel personalized to that user, keep the user onsite longer with higher engagement, and ultimately drive stickiness, which means more topline revenue from retained subscription fees. Of course, this is all still strictly in theory. We are excited to stay in touch with the client and learn the results when this is applied and tested on live users! Stay tuned.
