The discouraging lack of correlation between time and engagement when publishing news stories.

It’s Friday afternoon and I’ve spent the week elbow deep in building entity and concept relationship graphs for Kaleida. It’s hard work and I can’t get my charts to cluster in any useful way: pretty yes, helpful not so much.

Instead I bring you these, because 1) we’ll talk about how to read the graphs in a second and 2) Medium posts look better with an image near the top.

The time of day front page stories are published, along with number of engagements (distance from center)

What we’re looking at here is a circular graph thingy showing the time of day that articles appearing on the front page of various news organisations get published, in this case the BBC and Fox News. Each dot is an article: how far around is the time, and how far out is the number of engagements we measure (across various social networks), so the further out, the higher the engagements. There’s a whole bunch of details & caveats* that I’ll list at the bottom.
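For the curious, here is a minimal sketch of how one dot could be placed on a chart like this. It is not our actual plotting code: the function name `polar_position` and the `+ 1` inside the log (so zero engagements doesn’t blow up) are my assumptions for illustration. The log scaling and the rough 600,000 maximum come from the caveats at the bottom.

```python
import math
from datetime import datetime


def polar_position(published: datetime, engagements: int,
                   max_engagements: int = 600_000):
    """Return (angle_in_radians, radius_between_0_and_1) for one article.

    Hypothetical helper: the angle is the fraction of the day that has
    elapsed at publish time, the radius is log-scaled engagement.
    """
    seconds = published.hour * 3600 + published.minute * 60 + published.second
    angle = 2 * math.pi * seconds / 86_400  # 86,400 seconds in a day
    # +1 keeps log10 defined when an article has zero engagements (assumption)
    radius = math.log10(engagements + 1) / math.log10(max_engagements + 1)
    return angle, radius
```

So a story published at 6am sits a quarter of the way round, and a story with more engagements sits further from the center.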

Reading the charts we see that the BBC appear to publish their stories throughout the whole 24 hour news cycle, maybe they could do some kind of branding around that! If we squint a little we can pick out some slight clustering around midnight and 6pm, and a touch of sparseness between 6pm-midnight.

Compare this to Fox, which appears to publish a bunch at 1pm. I’m somewhat convinced that may be a daylight saving time conversion error on our side, but that wouldn’t explain the 5am (maybe 4am) cluster.

The other difference we can see is the bands of very low engagement at the center of Fox News that you don’t get with the BBC. Conclusion: Fox puts out a lot of low engagement stories, while the BBC has a good spread.

However, none of this is what I wanted. What I was hoping to see, and didn’t, was some kind of clustering that indicated that stories published at a certain time of day got better engagement. Turns out that, from looking at these graphs, we could guess that time has no correlation to engagement.

Which I guess is what we’d expect in a global news culture, but there goes my proving a theory with science and pretty charts. While there are lots of good reasons for publishing stories at certain times, chasing social engagement doesn’t seem to be one of them.
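If you wanted something slightly firmer than squinting at dots, one option is to bucket stories by publish hour and compare the median engagement per bucket. This is only a sketch of that idea, not something we ran for this post; `median_engagement_by_hour` and its input shape are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def median_engagement_by_hour(articles):
    """Group (hour_of_day, engagements) pairs by hour and take the median.

    Hypothetical helper: `articles` is any iterable of (hour, engagements)
    tuples, with hour as an integer from 0 to 23.
    """
    buckets = defaultdict(list)
    for hour, engagements in articles:
        buckets[hour].append(engagements)
    # Median rather than mean, so a few huge outliers don't dominate a bucket
    return {hour: median(values) for hour, values in sorted(buckets.items())}
```

If the medians come out roughly flat across the day, that backs up what the charts seem to be saying.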

Here’s two more…

I mean, I don’t think I even need to explain these ones. The Guardian has a strong publish on the hour ethos starting at around 6am, with a huge embargo/print newspaper push at midnight. While at The Times, well, I’m not really sure what to make of that.

What’s interesting to me is that, looked at together, they seem very much like news orgs’ publishing fingerprints, showing how everyone has a different approach. Here’s another selection.

Yes, you are correct, that last Washington Post one looks all kinds of wrong. This is where visualising data helps: when I first ran these graphs a couple of weeks ago and that one popped up, it made us go back and look at the data. Surely it couldn’t be right that they were only updating the front page between 7am & 7pm?

Turns out that the times we were extracting from the pages (2016-10-04T11:30-700) were written in what looked like a standard timestamp convention, i.e. somewhat like “2016-09-27T12:21:04.885-07:00”, and so they looked fine.

But, for reasons I can’t even, they were using a 12 hour format with neither an AM nor a PM to be seen, and some additional timezone issues threw the conversion to local time out, giving us that graph.

Which allowed us to fix the code and go back and correct the data using a different (more annoying) method of getting the publish time. I’m using the uncorrected data here because showing mistakes is cool.
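To show why a 12 hour stamp with no AM/PM can’t simply be parsed and trusted, here’s a small sketch that returns both moments it could mean. This is not the fix we shipped (we went and got the publish time another way); `candidate_times` is a hypothetical helper, and passing the UTC offset in as a whole number of hours is an assumption to keep the example short.

```python
from datetime import datetime, timedelta, timezone


def candidate_times(local_stamp: str, utc_offset_hours: int):
    """Return the (AM, PM) readings of a 12-hour stamp, both as UTC datetimes.

    Hypothetical helper: `local_stamp` looks like "2016-10-04T11:30" and is
    assumed to use a 12-hour clock with the AM/PM marker missing.
    """
    naive = datetime.strptime(local_stamp, "%Y-%m-%dT%H:%M")
    tz = timezone(timedelta(hours=utc_offset_hours))
    hour12 = naive.hour % 12  # 12:xx becomes 0, i.e. 00:xx (AM) or 12:xx (PM)
    am = naive.replace(hour=hour12, tzinfo=tz)
    pm = naive.replace(hour=hour12 + 12, tzinfo=tz)
    return am.astimezone(timezone.utc), pm.astimezone(timezone.utc)
```

Calling `candidate_times("2016-10-04T11:30", -7)` gives two UTC times twelve hours apart, and nothing in the stamp itself says which one is right. Treat every stamp as the morning reading and all the evening stories pile up in the morning half of the day, which is roughly what that graph looks like.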

So there we have it, quick Friday afternoon graph fun. Now for me it’s either back to sorting out why my clustering graphs go flying off in all directions (I think I may have my vector signing the wrong way round) or the pub. Decisions, decisions!


MANY *CAVEATS

As promised here are the caveats.

  1. Over at Kaleida we scan the front pages of a whole bunch of news organisations looking for updates. Normally when a story is published we pick it up within a minute or so. However it’s possible for stories to be published on one date at one time, and later moved to the front page. The graphs are using published time rather than discovered time. I don’t expect the difference to be significant, but maybe I’ll plot that at a later date to see.
  2. When there’s an option we look at the EU version of a news org’s front page. At some point I’ll graph the difference between US and EU front pages and dig into that a little more.
  3. On the engagement axis the distance from center is logarithmic. For reference the highest engagement value we have is around 600,000, but those are outliers; most fall around the 2,500–10,000 mark, hence the log axis. This somewhat explains the banding near the center on Fox News, the Telegraph and Washington Post. When you get down to very low numbers, after the log has been applied and then the scaling function for the graph, rounding errors creep in and lump them all at the same value.
  4. The colours don’t represent anything other than our branding, but if you want your colours to mean something on a graph consider them the dusky night and daylight sky colours. There was a temptation to plot the colour as positive & negative, but my guess is that would show either no correlation to time as well, or just when people drink their coffee.
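A rough sketch of the banding from caveat 3, for anyone who wants to poke at it. The 200 pixel chart radius, the `+ 1` inside the log and the name `plot_radius` are all assumptions for illustration, not values from our real graph code.

```python
import math


def plot_radius(engagements: int, max_engagements: int = 600_000,
                max_radius_px: int = 200) -> int:
    """Map an engagement count to a whole-pixel distance from the center.

    Hypothetical helper: log-scale the count, stretch it to the chart
    radius, then round to the nearest pixel.
    """
    # +1 keeps log10 defined at zero engagements (assumption)
    scaled = math.log10(engagements + 1) / math.log10(max_engagements + 1)
    return round(scaled * max_radius_px)


# Very low counts can only land on a handful of radii, so every story with
# the same small count is drawn at exactly the same distance from the center.
low_radii = sorted({plot_radius(count) for count in range(6)})
```

With lots of stories sharing those few radii near the center, you get rings rather than a smooth scatter.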