Using data, machine learning, and recommendation systems to understand and engage media audiences

5 min read · May 8, 2016

Note: This article is the last in a 3-part series that outlines technology solutions to the big challenges that traditional media companies (particularly those in Asia) currently face. If you have not read the first two parts yet, do take a look here and here.

In the last two posts, I talked about how media houses can drive internal cultural and technological changes to adapt to the web. This post is about how companies can and should collect data about how readers use their site and social media. It then discusses how companies can use this information to drive content strategy and personalize offerings for users.

Google Analytics is great, but you need more

Google Analytics (GA) is used by practically all media houses around the world, both for editorial and advertising purposes. While GA is a great tool for getting a high-level overview of how your content is performing, it does lack features that are essential for media companies (and tech companies in general):

GA only measures pageviews accurately. Its measurements of time spent on page and bounce rate are wildly inaccurate. In a world where time spent is becoming an increasingly important metric for both editorial and sales, an accurate measure of time spent engaging with the page (as opposed to switching to another tab, or opening the page without really engaging with it) is paramount. Upworthy wrote a great analysis of why a better way of measuring time spent is needed.
GA does not tell you how your articles performed on social media. As Facebook becomes the dominant source of pageviews for news sites, it is imperative that media houses get a unified platform to see both internal and social metrics. Moreover, Facebook also provides the click-through rate on your posts, which is essential for measuring the ‘clickability’ of your headlines.
GA does not effectively measure advanced engagement metrics, such as how far a user scrolled down a page or whether she engaged with interactives.
The GA interface is great for ad-sales teams, but does not provide editorial value. It is difficult to get aggregated analytics by author, categories or topic. It is also difficult to see a single, unified view for an individual page.
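One common way to approximate engaged time, as opposed to a tab merely being open, is to have the page send a small "heartbeat" event every few seconds while the reader is active, and to sum those events afterwards. The sketch below illustrates the idea in Python. The 5-second interval, the event shape and the function names are all assumptions made for illustration; this is not a description of how GA or any vendor actually works.

```python
HEARTBEAT_SECONDS = 5  # assumed interval between pings sent by the page


def engaged_seconds(heartbeats, interval=HEARTBEAT_SECONDS):
    """Sum engaged time from (timestamp_seconds, is_active) heartbeat pings.

    A ping counts only if the reader was active (tab focused, recent
    scroll, mouse or key activity) when it was sent.
    """
    return sum(interval for _, is_active in heartbeats if is_active)


def max_scroll_depth(scroll_events):
    """Return the deepest scroll position seen, as a fraction from 0.0 to 1.0."""
    return max(scroll_events, default=0.0)


# Hypothetical session: page open for 30 seconds, reader active for 4 of 6 pings.
session = [(0, True), (5, True), (10, False), (15, False), (20, True), (25, True)]
print(engaged_seconds(session))            # 20
print(max_scroll_depth([0.1, 0.45, 0.8]))  # 0.8
```

In this hypothetical session a naive "time on page" would report 30 seconds, while the heartbeat count reports 20 seconds of actual engagement.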

Providers like Parse.ly and Chartbeat are much better

GA’s shortcomings for editorial have created an opportunity for other players to fill the void, and Parse.ly and Chartbeat have done so effectively. These companies give users a comprehensive, visual overview of content performance on their website. Parse.ly also allows you to access some granular data, although it does not measure scroll depth. It is an effective, albeit somewhat expensive, option if you do not have in-house data engineering and infrastructure capabilities.
(Note: an earlier version of this post claimed that Parse.ly does not allow access to raw data. This has since been rectified.)

Storing and analysing your own data is critically important

Media companies are not traditionally known for high-end tech, but companies like BuzzFeed and the New York Times are increasingly changing that perception. Both these companies store and analyse their own data as well as third-party data, and use it to understand their audience and improve their content strategy. BuzzFeed’s POUND, for instance, is something that a company can only do if it builds its own data collection and analysis capabilities.

With this in mind, we have built our own data collection and social listening capabilities at The Broadline. This has helped us understand how our users behave on the site, and what they talk about online.

Social and search listening is critical for understanding what is hot right now

We scrape Facebook and Google Trends (along with articles on popular websites) every 30 minutes to find out what people are sharing and searching for. This allows content creators to figure out what to publish in the short term, as well as what kind of content tends to persist over time.

Social Trends for India, as seen at 4PM IST on Sunday, May 8 2016
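To separate short-lived spikes from topics that persist, one simple option is to count how many of the 30-minute snapshots each topic appears in. The Python sketch below assumes each snapshot has already been reduced to a list of topic strings (the scraping itself is out of scope here). The function name and the 50% threshold are hypothetical choices for illustration, not a description of our production system.

```python
from collections import Counter


def classify_topics(snapshots, persistent_share=0.5):
    """Label topics as 'persistent' or 'spike' based on snapshot coverage.

    snapshots: list of lists of topic strings, one list per 30-minute scrape.
    persistent_share: assumed threshold; a topic seen in at least this share
    of snapshots is treated as persistent.
    """
    if not snapshots:
        return {}
    # Count each topic once per snapshot, even if it appears several times.
    appearances = Counter(topic for snap in snapshots for topic in set(snap))
    total = len(snapshots)
    return {
        topic: "persistent" if count / total >= persistent_share else "spike"
        for topic, count in appearances.items()
    }


# Hypothetical data: four consecutive scrapes.
snapshots = [
    ["elections", "cricket"],
    ["elections", "monsoon"],
    ["elections", "cricket"],
    ["elections"],
]
labels = classify_topics(snapshots)
print(labels["elections"], labels["cricket"], labels["monsoon"])
# persistent persistent spike
```

A real system would likely weight by share counts or search volume rather than simple presence, but the same snapshot-over-time structure applies.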

Combining pageviews, social shares, attention seconds and scroll depth can give you a comprehensive idea of how your content is performing

We decided to create our own data collection and analysis platform to understand our audience better, and to get a better idea of how our content was performing. This is what we came up with.

By collecting referral data, geographic data, as well as devices used, and combining it with social shares, pageviews, active time spent and scroll depth, you can create a comprehensive understanding of how your content performs and which audience segments it most closely resonates with. You can also look at more segmented analysis by referrer type, city and/or device.
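As a rough illustration of the segmented analysis described above, the Python sketch below aggregates raw page events by any one dimension (referrer, city or device). The event fields and the summarize_by helper are assumptions made for this example; your own schema will differ.

```python
from collections import defaultdict


def summarize_by(events, key):
    """Aggregate pageviews, average active seconds and average scroll depth
    for each value of `key` (e.g. 'referrer', 'city' or 'device')."""
    totals = defaultdict(
        lambda: {"pageviews": 0, "active_seconds": 0.0, "scroll_depth": 0.0}
    )
    for event in events:
        bucket = totals[event[key]]
        bucket["pageviews"] += 1
        bucket["active_seconds"] += event["active_seconds"]
        bucket["scroll_depth"] += event["scroll_depth"]
    return {
        segment: {
            "pageviews": t["pageviews"],
            "avg_active_seconds": t["active_seconds"] / t["pageviews"],
            "avg_scroll_depth": t["scroll_depth"] / t["pageviews"],
        }
        for segment, t in totals.items()
    }


# Hypothetical events, one per pageview.
events = [
    {"referrer": "facebook", "city": "Mumbai", "device": "mobile",
     "active_seconds": 40, "scroll_depth": 0.5},
    {"referrer": "facebook", "city": "Delhi", "device": "mobile",
     "active_seconds": 20, "scroll_depth": 0.25},
    {"referrer": "google", "city": "Mumbai", "device": "desktop",
     "active_seconds": 90, "scroll_depth": 1.0},
]
by_referrer = summarize_by(events, "referrer")
print(by_referrer["facebook"])
# {'pageviews': 2, 'avg_active_seconds': 30.0, 'avg_scroll_depth': 0.375}
```

Calling summarize_by(events, "city") or summarize_by(events, "device") gives the same breakdown along the other dimensions.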

Understanding the user at a granular level

More importantly, collecting your own data means that you can profile users at a granular level. This means that if a user has historically engaged well with policy-related articles on your website, you can profile her as a policy-buff. By aggregating these profiles, you can begin to see what kind of people your content attracts.
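A minimal sketch of this kind of profiling, in Python, might label each user by the topic that dominates her engaged time and then count the labels. The build_profiles function, the tuple layout and the 50% threshold are hypothetical, chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict


def build_profiles(reads, min_share=0.5):
    """Label each user with the topic that dominates their engaged time.

    reads: iterable of (user_id, topic, active_seconds) tuples.
    min_share: assumed threshold; a user is labelled '<topic>-buff' only if
    one topic accounts for at least this share of their engaged time.
    """
    time_by_user = defaultdict(Counter)
    for user_id, topic, active_seconds in reads:
        time_by_user[user_id][topic] += active_seconds

    profiles = {}
    for user_id, topic_time in time_by_user.items():
        topic, seconds = topic_time.most_common(1)[0]
        total = sum(topic_time.values())
        if total and seconds / total >= min_share:
            profiles[user_id] = f"{topic}-buff"
        else:
            profiles[user_id] = "generalist"
    return profiles


# Hypothetical reading history.
reads = [
    ("u1", "policy", 120), ("u1", "policy", 90), ("u1", "sports", 30),
    ("u2", "sports", 40), ("u2", "policy", 35), ("u2", "tech", 45),
]
profiles = build_profiles(reads)
print(profiles)  # {'u1': 'policy-buff', 'u2': 'generalist'}

# Aggregating the labels shows what kind of audience the content attracts.
audience_mix = Counter(profiles.values())
print(audience_mix)
```

Here u1 spends 210 of 240 engaged seconds on policy and is labelled a policy-buff, while u2 has no dominant topic.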

Collecting data at a granular level is also the only way to reach the holy grail of the web — effective personalization. By tracking what a user reads, shares, and engages with, you can offer content that she would be interested in, and hide content that would turn her away. Doing this is a great way to turn your users into rabid fans while still retaining general appeal.
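To make the personalization idea concrete, here is a deliberately simple Python sketch that ranks articles by a reader's per-topic affinity and hides topics she has clearly shown no interest in. It is a toy content-based ranking, not a full recommendation system. The rank_articles function, the affinity scores, the neutral default and the cut-off are all assumptions for illustration.

```python
def rank_articles(user_affinity, articles, hide_below=0.1, neutral=0.2):
    """Rank articles by the reader's affinity for each article's topic.

    user_affinity: dict of topic -> score in [0, 1], e.g. share of engaged time.
    articles: list of (title, topic) tuples in editorial order.
    hide_below: assumed cut-off; topics scoring under it are hidden.
    neutral: assumed score for topics the reader has never seen, so that
    new topics still surface and the site keeps its general appeal.
    """
    scored = [(user_affinity.get(topic, neutral), title) for title, topic in articles]
    visible = [(score, title) for score, title in scored if score >= hide_below]
    # Highest affinity first; sorted() is stable, so ties keep editorial order.
    ranked = sorted(visible, key=lambda pair: pair[0], reverse=True)
    return [title for _, title in ranked]


# Hypothetical reader and front page.
affinity = {"policy": 0.7, "sports": 0.05, "tech": 0.25}
articles = [
    ("Budget explained", "policy"),
    ("Match report", "sports"),
    ("New phone review", "tech"),
    ("Film review", "culture"),
]
feed = rank_articles(affinity, articles)
print(feed)  # ['Budget explained', 'New phone review', 'Film review']
```

The neutral default is the design choice that matters most here: without it, every topic a reader has not yet encountered would be hidden, which works against retaining general appeal.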

Bringing it all together

While these changes seem enticing, implementing them is hard. The media industry generally does not attract high-end tech talent. NYT is the only traditional player that has been able to do so effectively so far. Companies need to start breaking down the walls between editorial, product and technology teams, and start to invest in effective data collection solutions. As online media begins to consolidate, and advertisers and platforms become more technology-savvy, these investments are critical for traditional companies to stay relevant.

Part 1: Traditional Media is far from doomed, but only if product and editorial start having lunch together

Part 2: Creating better articles faster: here is how you can half the time for creating in-depth, interactive articles

Part 3: Using analytics, machine learning, and recommendation systems to understand and keep users