A Jolly Jaunt Through the Sentiment of News.

Over at Kaleida I have the joy of digging through all the data we’re grabbing to see if I can tease out interesting nuggets and trends. There will be more of those in the coming weeks and months.

Today though, I wanted to start with something simple: the rather inexact science of Sentiment Analysis. I confess to getting a little eye-rolly when it comes to “Data Scientists” talking about sentiment; it’s a bit like they’re drawing back a richly embellished curtain and beckoning you in to read your fortune. It seems to have the potential to be the fluffiest and most hand-wavy of all the analysis tools.

That said, for the sake of adventure and fun we can see what happens when we throw a lot of data at it. In this case that means running around 50,000 front page* news articles from the last 4 weeks (22nd Sept to 13th Oct 2016) through a sentiment analyser.

We can see that, according to computers, the overall levels of positive and negative sentiment within articles are roughly the same.

Let’s dig in a little more and look at a couple of what we call “Entities”, although you may have different names for them: Donald Trump and Hillary Clinton. Of our 47,958 articles, 5,844 mention either Trump, Clinton or both. That’s a whole 12% of the total front page articles we’ve collected.

To be more precise, 1,627 mention exclusively Trump, 808 just Clinton, while 3,409 mention both.
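As a quick sanity check, the counts above can be added up and turned into shares. This is only a sketch using the numbers quoted in this post; the variable names and the `share` helper are my own and not part of any Kaleida tooling.

```python
# Counts quoted in the post.
total_articles = 47_958
trump_only = 1_627
clinton_only = 808
both = 3_409

# Articles mentioning either candidate (or both).
either = trump_only + clinton_only + both

# Every article that mentions each candidate, exclusively or not.
trump_mentions = trump_only + both
clinton_mentions = clinton_only + both


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(either)                         # 5844
print(share(either, total_articles))  # 12.2
print(trump_mentions, clinton_mentions)
```

The three exclusive groups do sum to the 5,844 quoted, and the 12% figure is that total over all 47,958 articles.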

A question we could ask ourselves is… How do articles featuring one or the other (or both) compare positively or negatively to the “global” values? The answer would look something like this…

As with all stats you can cut this a number of ways. Articles exclusively about Trump are overall more positive than ones about Clinton, yet at the same time they are also more negative. Trump is more of everything except neutral. See how maths can give us these wonderful insights we may not have known before ;)

Another thing we can see is that articles discussing both candidates (and so presumably the election) are around 10 percentage points more negative than the base level of general negativity: 47.8% vs 37.9% negative.
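The gap between those two figures can be read two ways: as a difference in percentage points, or as a relative increase over the baseline. A small sketch using only the 47.8% and 37.9% quoted above (the variable names are mine):

```python
# Negative share for articles mentioning both candidates vs. all articles,
# both taken from the figures quoted in the post.
both_negative = 47.8
baseline_negative = 37.9

# Difference in percentage points.
point_gap = round(both_negative - baseline_negative, 1)

# Relative increase over the baseline, as a percentage of the baseline.
relative_gap = round(100 * (both_negative - baseline_negative) / baseline_negative, 1)

print(point_gap)     # 9.9
print(relative_gap)  # 26.1
```

So the two-candidate articles sit about 10 points above the baseline, which works out at roughly a quarter more negative in relative terms.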

One thing’s for certain though: Trump got more front page coverage than Clinton, for better and for worse.

Turning our attention to those front pages and publishers for a moment, here’s a quick breakdown of overall positive vs negative vs neutral sentiment of articles published that ended up on the front page.

There were a few different ways of displaying this data, including lots of proportional pie charts, but that was going to get complicated fast. I opted for some percentage bars ranked from most positive to least instead. Again this is because “Sentiment” seems to work best when compared to the system as a whole. You can only get so far with “is this article/tweet/review positive or negative”; it’s better to ask “is this more positive or negative than usual”.

Here are all 47,958 front page articles from the last 4 weeks or so broken down by publisher. The thin line in the middle is the general overall sentiment.

As you can see we have the Huffington Post at the top, and CNN down at the bottom, although Buzzfeed takes the prize for most negative in general.

We shouldn’t be surprised by the Huffington Post though, as they’ve actively attempted to be more positive in their news reporting. In an earlier article Arianna Huffington includes this quote…

“We see countless proof points on Twitter that positive messages have more engagement and obtain more reach on our global platform than negative content. [snip] The implications of these findings should be far-reaching, from how we think about creative and editorial content as well as how companies think about public engagement and customer service.” — Chris Moody, Twitter’s VP of data strategy.

…which is something our own data appears to back up (there’ll be another blog post about that in the future). Overall, articles with higher engagement tend to be more positive and (intuitively) articles that are positive tend to have higher engagement.

This all goes with the underlying caveat that sentiment analysis is far from an exact science. For example, a number of Clinton-exclusive articles are about pneumonia, containing words like “disease” and “illness” which push them towards the negative, while the article itself is one a reader would most likely consider to be positive.
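To see how that can happen, here is a deliberately naive word-counting scorer. It is not the analyser used for the figures in this post; the tiny lexicon, its weights and the headline are all invented purely for illustration.

```python
import re

# Hypothetical mini-lexicon, invented for this example only.
LEXICON = {
    "disease": -2, "illness": -2, "sick": -1,
    "recovers": 1, "recovering": 1, "well": 1, "healthy": 2,
}


def score(text):
    """Sum the lexicon weights of every word in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


# Made-up headline: good news to a reader, but dominated by "negative" words.
headline = "Candidate recovers well after illness, doctors rule out serious disease"
print(score(headline))  # -2
```

A human reads that headline as good news, but the two heavily weighted "negative" words outvote the two mildly positive ones, so the score comes out below zero.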

Positive articles about negative things and negative articles about positive things, oh my!

But before wrapping up, one last graph, in which I’ve dispensed with numbers altogether. I shall be winning no prizes from Edward Tufte for this.

This shows the various publications’ sentiment for exclusively Trump and exclusively Clinton articles compared to their normal +/- levels. Trump is shown above and Clinton below each publication’s usual overall +/- values from the chart above, which are drawn thinner and fainter.

Honestly it makes more sense looking at it than trying to describe it. To start off though, back with the Huffington Post we can see it’s more positive about Trump and more negative about Clinton than its normal articles. I’d suggest Buzzfeed as the next stop for drawing your own conclusions.

The fun part of this is being able to throw different terms into the engine behind it all and seeing what pops out. There’ll be a whole Brexit one I’ll post at some point, along with sentiment over time (place your bets now)!

It’s possible to drill down further, although that’s much harder to graph as static images, which is another way of saying keep an eye out for some interactive graphs on the Kaleida site in the future (but not quite yet because uh javascript). There we can start exploring the differences between positive and negative articles, the connections between different “entities” and whether you’re better off reporting news about cats & kittens over clowns in the long run.

What I can tell you though is there are more articles connecting Trump than Clinton to crafting and blanket making. What you now do with that knowledge is up to you. Use it wisely!


*front page: The news articles we gather are stories that appear on the international (when possible) homepages of the various news sources we check. Sometimes these stories are placed there editorially and sometimes they’re promoted there algorithmically. Stories placed on the homepage by editors are normally published and discovered by us within a few minutes. Stories that wind their way to the homepage (due most likely to social engagement) can often have hours and even days between being published and ending up on the homepage. At the time of writing, early clown stories would be an example of ones that didn’t start on the front page but were later moved there as newer stories surfaced.