Tone deaf

Jim Pierpoint
5 min read · Nov 13, 2022

A few years back, the top PR researchers in the field tried to tackle an intractable problem — how to reliably measure the tonality of news.

One of the first efforts to crack the code on tonality had fallen well short of that goal. Several experienced PR industry content analysts read dozens of news stories about major companies, working to refine a code frame devised as an industry standard, and were asked to record basic information about each story, including whether the tone of the published coverage was positive, negative, neutral or balanced.

“The research results yielded low to moderate inter-coder reliability,” the researchers concluded. Translation: It didn’t work.

A few years later, the PR researchers tried again, using experienced content analysts, a refined code frame, and a renewed push to demonstrate the viability of the measurement standards. This time, they cleared the bar. But just barely. And not before what some might consider a bit of p-hacking.

Frustrated by media monitoring approaches that yielded inconsistent results and unreliable data, industry leaders eventually flew to Barcelona to launch a push to standardize media measurement. Standards, so the thinking went, would get everyone on the same page, effectively transforming unreliable data into reliable data. Voila.

Why was a group of highly experienced content analysts ultimately unable to produce reliable data? Because tonality is simply not a discrete metric. Mathematically speaking, it cannot be counted.

In the research study, for example, the coders were asked to assess the influence of a news story on a reader’s likelihood to support, recommend, work for, or do business with a company mentioned in the coverage. Those are four separate and distinct attributes. But tonality scoring called for a single, binary data point.

Lessons learned

That’s not to say news does not impact those four brand attributes. It does. No question. Proprietary research has connected news to significant shifts in brand equity, purchase intent, employee satisfaction, and sales. One study, built on a multivariate model, found that news had an impact equivalent to advertising in supporting customer acquisition.

Using virtually any online news aggregation tool and some intuitive Boolean search strings, you can quickly generate news stream data these days. We built more than 40 such news streams in our N.C. State undergraduate PR case studies class alone over the past year.
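For readers who want to try this, here is a minimal sketch of what building such a news stream might look like in Python, assuming a CSV export from a news aggregation tool with “headline”, “outlet”, “date” and “body” columns. The file name, column names, brand and search terms are all illustrative assumptions, not the queries our class actually used.

```python
import pandas as pd

# Assumed export format: one row per story, with headline, outlet, date and body text.
stories = pd.read_csv("news_export.csv", parse_dates=["date"])

# A simple Boolean-style search: the brand AND a business topic, NOT a known false positive.
brand = stories["body"].str.contains("Acme Foods", case=False, na=False)
topic = stories["body"].str.contains("earnings|CEO|recall|expansion", case=False, na=False)
noise = stories["body"].str.contains("Acme Hardware", case=False, na=False)

news_stream = stories[brand & topic & ~noise]

# Daily story counts: the volume series discussed later in this post.
daily_volume = news_stream.groupby(news_stream["date"].dt.date).size()
print(daily_volume.tail())
```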

Here are some of the things that we learned. One, the highest and lowest average daily news tonality scores tend to occur on days when there is relatively little coverage. Two, not surprisingly, news tonality skews negative. More on all of that in future blogs. Three, the tonality scores do not correlate with survey data tracking consumer perceptions of news. In fact, statistically speaking, that relationship is close to random. More on that later, too.

But the most critical challenge with tonality data is that it is not statistically reliable. What do we mean by that? If you are into media monitoring, here’s a test you can do at home (or in the office, if you have gone back). Load a spreadsheet listing all of the stories published about your company, and the corresponding tonality scores. Then sort the spreadsheet by headline and date.

What you will likely see is that on big news days, virtually the same story is posted to a number of news outlets. Why? That’s a spillover effect from newsroom consolidation. There are a limited number of news outlets publishing original stories, and a virtually unlimited number of online sites that repost that content.

Now, look at the tonality scores for the otherwise identical news stories. One would think that machines would code the tonality of the stories uniformly. Often, that’s not the case. Based on news streams for dozens of companies, we found that 80% reliability is the gold standard. That equates, though, to a 20% margin of error. In business, confidence intervals are typically 95%. That makes tonality data unreliable.
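Here is a rough sketch of that at-home test in Python, assuming a monitoring export with “headline”, “date”, “outlet” and “tonality” columns, with tonality coded as positive, negative or neutral. The column names and coding scheme are assumptions about a generic vendor export, not any specific tool’s format.

```python
import pandas as pd

# Assumed export: one row per posted story, with a machine-assigned tonality label.
stories = pd.read_csv("monitoring_export.csv", parse_dates=["date"])
stories = stories.sort_values(["headline", "date"])

# Treat stories that share a headline on the same day as reposts of the same piece.
groups = stories.groupby(["headline", stories["date"].dt.date])["tonality"]

# A repost group is "consistent" if every copy received the same tonality label.
consistent = groups.nunique().eq(1)
reposted = groups.size().gt(1)

agreement_rate = consistent[reposted].mean()
print(f"Agreement across reposted stories: {agreement_rate:.0%}")
# A result near 80% implies roughly one disagreement in five, well short of the
# 95% confidence level business analysis typically expects.
```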

Here’s one final challenge. In our digital age, news has become increasingly polarizing. Media outlets can take the same fact set and draw polar-opposite conclusions. Take Chick-fil-A. For years, the CEO testified to his beliefs about the Biblical covenant of marriage, and people hardly noticed. When the L.A. Times cast the story as a rejection of gay marriage, about 40% of U.S. adults considered the story negative, while about 35% considered it positive.

Tonality is a binary metric in a multi-dimensional world.

And finally, the math. Tonality scores and news volume both measure news coverage, but the two data points do not move in the same direction at the same time. When a negative story generates national coverage, volume spikes upward while tonality drops sharply. That’s problematic when it comes to generating a single data stream representing published news that can be integrated with time-series business models.
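To see the problem with toy numbers (invented for illustration, not drawn from our news streams), consider a week in which one negative story goes national: the spike day has the highest volume and the lowest average tonality, so the two series pull in opposite directions.

```python
import pandas as pd

# Invented daily figures: story counts and mean tonality on a -1 to +1 scale.
daily = pd.DataFrame(
    {
        "volume":   [12, 15, 240, 180, 30],
        "tonality": [0.1, 0.0, -0.6, -0.5, -0.1],
    },
    index=pd.date_range("2022-11-07", periods=5),
)

print(daily)
# The two series move against each other, which is what makes collapsing them
# into a single news data stream for time-series models so awkward.
print("Volume/tonality correlation:", round(daily["volume"].corr(daily["tonality"]), 2))
```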

Bottom line

So what can we do about this? Let’s start with the basics. In business terms, news is an external variable associated with a broad spectrum of business impacts.

The trick, then, is to rationalize our media measurement to better monitor, manage and, at times, mitigate those impacts. This is part of the new communication research framework we talked about in Turn down the Volume. As with news volume, a new type of tonality scoring will need to be developed and integrated with a discrete time-series dataset representing news coverage.

What we will eventually need is a little alchemy. Because business decisions are guided by discrete, reliable and accurate data, we are going to need another way to assess news valence — the headwinds, tailwinds or swirl created by news. That’s a blog for another day.

So, net-net, what do we do about tonality scoring? Until we build a better mousetrap — and we will build a better mousetrap — my recommendation for now would be to scrap tonality and sentiment scoring altogether. The natural language processors do what they are built to do. But the data they spit out is random and unreliable, because the assumptions that went into creating the data are flawed.

Instead, let’s concentrate on news volume — what media monitoring calls “reach” — and actual readership. Can we really distinguish between what was published, and what people actually read, heard or saw?

Rhetorical question.

Next up: A proof of concept.

Are these blogs resonating with you? Let’s connect.

Drop me a line on Jim.Pierpoint@HeadlineRisk.com

Jim Pierpoint

Former wire correspondent, communication executive, media researcher and risk manager focused on elevating business communication tactics and strategies.