Combining the Facebook API and web analytics: how does sharing correlate with reading?

More meaningful metrics thanks to teh interwebs

In the not-so-new digital age it’s important to look at the most helpful and appropriate metrics on offer. Coming from a media agency background into BBC Three as an analyst, I had been used to looking at things with an explicit business impact such as ROI, ROAS, or CPA. However, an organisation such as the BBC doesn’t sell directly to its consumers, so we have to use other ways to find out what’s working.

For years there have been teams at the BBC looking at a huge number of different data sources: qualitative and quantitative; first-party and third-party; research and public data; individual pieces of content or aggregations at a channel, brand or series level. The diversity and volume of all this information is one reason we have multiple teams and many people working on sharing and analysing it.

A bit of background on BBC Three: we’re just one smallish part of the BBC, but we have a specific mission — to provide content that will interest and entertain youth audiences, and help them to make sense of the world. We are pretty much 100% online, though, so we can’t measure success in the same way as a regular broadcast television channel might (e.g. using TV ratings measured by BARB). As a result, we need a different approach to measuring success — and crucially a faster one, since the world of online video and journalism doesn’t wait for anybody.

Combining Facebook with in-house analytics

Thankfully we have a robust and incredibly detailed analytics setup that helps us to understand which content is doing well (both in aggregate and real-time), with a selection of possible metrics used for interrogating performance.

However, it’s also fair to say that for all the traffic that BBC Three gets from around the BBC site, looking at consumption of content in conventional ways isn’t necessarily the best way of judging a richer metric like its value to an audience. We wanted to go further and look at content that was not just being consumed by an audience, but actively shared by them.

Knowing that the vast majority of our audience on Facebook is within our target audience of 16–34, we set out to compare how our editorial content performs vs. its shares on Facebook.

Using the Facebook API

Facebook are kind enough to offer an API that lets you access a number of pieces of information about pages, individual posts, videos, users, and also external URLs (the extent of this access is dependent on the permissions you have regarding each of these items).

In actual fact the API runs several versions in parallel, so that as Facebook makes updates, existing apps do not all suddenly break. The process I’m about to describe may not always be available, then, but as an example of using an API to create a simple tool that answers a specific problem, you’ll get the idea.

For the following I’m using v2.8 of the Facebook API, which lets you make a basic GET request for a specific URL (see below) and returns a few choice bits of info about that URL, including its “share_count” (which is really shares, likes and comments combined, all of which generate an ‘organic’ story, i.e. your content appears in people’s Facebook news feeds).
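To make that call concrete, here’s a minimal Python sketch of the request described above. The access token, article URL and canned JSON response are invented placeholders; in practice you’d fetch the built URL over HTTP (e.g. with `urllib.request`) rather than parse a hard-coded string.

```python
# Minimal sketch of the v2.8 Graph API call for a URL's share stats.
# Token and URLs below are placeholders, not real values.
import json
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com/v2.8/"

def share_count_request(article_url, access_token):
    """Build the GET request URL that asks Facebook about an external URL."""
    params = urlencode({"id": article_url, "access_token": access_token})
    return GRAPH_ROOT + "?" + params

def parse_share_count(response_body):
    """Pull share_count out of the JSON the API returns."""
    data = json.loads(response_body)
    return data["share"]["share_count"]

# A canned response standing in for what the live API would return:
sample = '{"id": "https://www.bbc.co.uk/bbcthree/article/x", "share": {"comment_count": 12, "share_count": 345}}'
print(parse_share_count(sample))  # 345
```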

Facebook’s Graph API Explorer in action

There are sites out there like MuckRack that will let you do these queries one by one or pay for bulk processing (to be fair, they will also deal with Pinterest, Twitter and so on too). I didn’t want to have to pay or even upload a CSV periodically, I wanted this all to be done for me. Here comes the fairly mild programming bit.

I’m definitely not a developer, which you might be able to spot from my coding style in the screenshot below, but I felt confident enough to take this on using R: first querying our digital analytics API, then using the simple API call in the screenshot above (plus a token that Facebook gives you once you register as a developer) to merge in the Facebook shares for each bit of content. Naturally you don’t have to use R for something like this — go for it in Python or whatever floats your boat.

The advantage of using a script for this is that at any given time (thanks to an always-on AWS instance and a cron job in the middle of the night) just running the script can take care of the following:

  1. Pulling the URLs, titles and article views on the BBC’s website. I make an API call to our own analytics system to do this.
  2. Cleaning the data, which might contain characters designed for human readability that don’t necessarily help R understand what it’s looking at.
  3. Filtering out some categories and topics that we considered to be either anomalous or not relevant. Having easy access to regular expressions within most modern programming languages makes this step a huge timesaver vs. manual work.
  4. Cross referencing the URLs with Facebook’s API to obtain shares, and merging on the URL in question.
  5. Creating a combined “score” for what works according to both Facebook and our own internal metrics — I’ve used a simple ratio at this point to expose those things that are shared far more proportionally but depending on your data some other transformations might also make sense.
  6. Outputting key columns to Excel and using the mailR package to share directly and regularly with key stakeholders such as our editorial teams for their own interrogation.
  7. Storing data in a Redshift database that means there’s an up-to-date, robust source of data at all times that can be connected to Tableau or other visualisation tools.
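The actual script is in R, but to illustrate the shape of steps 2–5 in a self-contained way, here’s a hypothetical Python sketch. Every field name, URL and topic filter below is invented for illustration; the real analytics API and its schema will differ.

```python
# Hypothetical sketch of steps 2-5 above: clean titles, filter unwanted
# topics with a regex, merge page views with Facebook shares on URL, and
# score each piece by its shares-to-views ratio. All data is made up.
import re

analytics_rows = [  # stand-in for what step 1 returns from the analytics API
    {"url": "/article/a", "title": "Story A\u00a0", "views": 20000, "topic": "news"},
    {"url": "/article/b", "title": "Story B", "views": 5000, "topic": "quiz"},
    {"url": "/article/c", "title": "Story C", "views": 8000, "topic": "docs"},
]
facebook_shares = {"/article/a": 400, "/article/c": 1200}

EXCLUDED_TOPICS = re.compile(r"^(quiz|poll)$")  # step 3: drop anomalous categories

def build_report(rows, shares):
    report = []
    for row in rows:
        if EXCLUDED_TOPICS.match(row["topic"]):
            continue
        title = row["title"].replace("\u00a0", " ").strip()  # step 2: tidy characters
        n_shares = shares.get(row["url"], 0)                 # step 4: merge on URL
        score = n_shares / row["views"]                      # step 5: simple ratio
        report.append({"url": row["url"], "title": title,
                       "views": row["views"], "shares": n_shares, "score": score})
    return sorted(report, key=lambda r: r["score"], reverse=True)

for r in build_report(analytics_rows, facebook_shares):
    print(r["url"], round(r["score"], 3))
# /article/c 0.15
# /article/a 0.02
```

The sorted output surfaces the disproportionately shared pieces first, which is exactly what the emailed report in step 6 is for.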
R & RStudio are a great way for non-Comp Sci grads to get stuck into data analysis & visualisation quickly

As you might expect, it turns out there is not always a correlation between the most shared content on Facebook and the most popular content on our own sites, validating the assumption we started with.

Sometimes this might be because we’ve got considerable traffic from around the BBC site and elsewhere, but that hasn’t automatically converted into shares. In a case like this it could be that an article or film is interesting and informative but isn’t something that people feel they can share (perhaps it’s a particularly contentious or difficult issue). Sometimes, a decent read is just not that shareable, and that’s okay.

Plotting data (this is an extract) in R reveals that there are outliers in both dimensions that will help us to focus our efforts depending on the KPI in question, but no obvious direct relationship — meaning we could consider 2 distinct strategies.

On the other hand, sometimes we will see a piece that people share proportionally more than we expect because of the cause it supports, or the issue it raises. Sometimes a well-written link on social media tells people all they need to know — and if that’s the case and they get behind it, we are only too happy for our work to be shared with their friends too.

Then there’s the sweet spot — where the shares on Facebook and traffic to the site are both at the upper end of what we expect. Sometimes these can be things that appeal to everyone (in this example — heart-warming, visual, well written) and sometimes they are those issues that affect everyone or about which passions run high (politics, gender/sexuality, social conscience). These don’t happen all the time but you can bet that we are always looking to these for inspiration.

Conclusion & next steps

For BBC Three, one of the most pleasing parts of this is that this type of analysis consistently shows that things we wouldn’t expect to be promoted heavily by other parts of the BBC, due to their more risky or post-watershed subject matter, work well for us in social regardless. They are not simply being read, they are actually being shared — and moreover, they are generating discussion (but analysing that is a topic for another day!).

I plan to extend this by looking at which bits of content get which ‘reactions’ on Facebook so that we can make even greater use of the info that Facebook shares with us and see if certain themes are more likely to make people happy/sad for example.
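As a hedged sketch of what that extension might look like, the snippet below builds a Graph API request asking for a per-reaction-type breakdown on a single post (using the Graph API’s nested-field and `.as()` aliasing syntax) and flattens a canned response into a simple dict. The post ID and token are placeholders, and the exact field syntax may vary by API version.

```python
# Sketch of requesting a per-type reaction breakdown for one post.
# Post ID, token and the canned counts are placeholders.
import json
from urllib.parse import urlencode

REACTION_TYPES = ["LIKE", "LOVE", "HAHA", "WOW", "SAD", "ANGRY"]

def reactions_request(post_id, access_token):
    """Build a request for a per-type reaction summary on one post."""
    fields = ",".join(
        "reactions.type({t}).limit(0).summary(total_count).as({alias})".format(
            t=t, alias=t.lower())
        for t in REACTION_TYPES)
    params = urlencode({"fields": fields, "access_token": access_token})
    return "https://graph.facebook.com/v2.8/{}?{}".format(post_id, params)

def reaction_totals(response_body):
    """Flatten the API's nested summaries into a {reaction: count} dict."""
    data = json.loads(response_body)
    return {t.lower(): data[t.lower()]["summary"]["total_count"]
            for t in REACTION_TYPES}

print(reactions_request("1234567890", "TOKEN")[:47])
```

With those totals merged into the same report as above, it becomes possible to ask whether certain themes skew happy, sad or angry.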

Yes, there are tools out there that can help with this, but using APIs to provide bespoke reporting and insights in this fashion has a number of benefits:

  1. It saves time.
  2. It can save money too if it doesn’t take you too long to write the code!
  3. It keeps people up to date as often as needed.
  4. It’s rewarding to make stuff…
  5. Metrics normally hidden behind APIs can sometimes help a brand or publisher to take a different look at how their content is performing!

This is only the tip of the iceberg for using Facebook’s API. It’s chock-full of valuable data for videos for example, and can be used to look at comments, analyse page growth and audiences, and much more. Other people in the team are doing cleverer stuff already which we’ll share in future articles!

Artist’s rendition of the sheer power of data