How data-driven organisations can harness the power of ‘many eyes’ and improve the accuracy of their data and analytics
Analytic Ops is, simply put, DevOps for data analytics, and it's essential if you're serious about scaling up data analytics and its applications so that they benefit the whole organisation. But with this wider reach should come an even greater regard for the appropriate use and interpretation of analytics. Understanding the underlying data, or at least being able to scrutinise it, makes appropriate use more likely. And yet metadata, which includes (among other things) details of how the data was created, is a critical but often overlooked element of effective Analytic Ops implementations.
An example might help illustrate the point. It's got nothing to do with data analytics, but I do think it highlights the downsides of insufficient or absent metadata. Several months ago, the ONS revealed that issues with the way it measured the telecoms sector meant that it had missed large cost-efficiencies of up to 90 per cent over a five-year period (2010–2015). The mis-estimation had knock-on effects, as noted in a Financial Times article:
“The error, which covers the period from 2010–2015, means that inflation statistics may have been significantly too high and economic growth figures too low, calling into question the consumer price index and retail price index, which help determine pay and pension increases.”
The FT article focused on the impact of this mis-estimation on other measures of economic health and on the least disruptive way of addressing it. These are valid considerations, but just as important is the unaddressed question of how the mis-estimation went undetected for so long.
The problem appears to be rooted in a measurement approach which, over time, became sub-optimal. That this wasn’t spotted suggests a contextual awareness issue. An FT reader’s letter* in response to the article captured this well:
Statisticians have admitted they missed recording properly the degree to which prices in the sector have fallen, meaning “real” telecoms output is likely to have risen much more than they thought. In this area it is easy to get a measure of the nominal cost of transmitting one data unit — a bit. It can be relatively simple to assess how prices over time have altered for doing essentially the same job.
What letter writer Peter Marsh has thoughtfully captured here is how measurement-induced blindness creeps in, limiting our ability to accurately interpret what we're measuring. This happens once metrics become unmoored from their context. In this particular case, efficiencies in telecoms and changes in the market meant that, over time, the nominal cost of transmitting one data unit no longer meant what it once did. The ONS statisticians, who aren't subject matter experts, don't appear to have had a way of working that made this shift easy to spot. As a result, their assumptions didn't change, and they kept measuring and interpreting their measurements the way they always had.
These statisticians measure a great many things, and while I'm sure they draw on the insights of telecoms experts, it's not realistic (or even helpful) to expect them to be experts themselves. More importantly, they don't need to be; there are loads of telecoms experts out there. The challenge is that there are few avenues for the many, many telecoms industry experts to assess the way this particular measurement is conducted or the assumptions underpinning the process. The upshot is that they couldn't raise warning flags. This is a metadata transparency problem (in this case, metadata about how the data is created). Further, once a metric becomes part of a business-as-usual (BAU) process, it's even easier to inadvertently cut off opportunities to review and check the validity of assumptions, especially when the report-generation process is as complex as those behind the consumer and retail price indices.
I think the ONS does many things right, a view that was reinforced by a presentation Andy Dudfield gave at a recent Citizen Beta event, so this isn't an attempt to single them out for criticism. I found this example involving them particularly interesting, but the underlying issue is by no means unique to the ONS; it happens in other organisations (across sectors) too. And it's why people have started pointing out the need for data ethnography to become standard practice in organisations.
Data-driven organisations and a culture of ‘many eyes’
One of the defining characteristics of a data-driven organisation is a workforce that's confident about accessing data and applying data analytics in a broad range of day-to-day activities. This has many knock-on effects; for example, it drives up demand for data, as well as the quality of the data in circulation within an organisation. More pertinent to the points made in this post, exposing data to more users means more questions get asked about it: questions about the method and frequency of collection, the variability of the data, and so on. The ease and extent to which people get answers to these questions determines how intelligently they can use the data. It also means they can provide useful feedback on the data itself and increase the contextual understanding of everyone who uses it. This improves the quality of any models built on such data and reduces the chances of issues like the one highlighted above.
But this doesn't happen automatically. The underlying infrastructure needs to support easy generation of metadata, ideally with a good degree of automation (so there's some standardisation). It's also necessary to provide tools that make it easy for people to access relevant metadata and give feedback. That's when an organisation can start to harness the power of many eyes.
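To make this concrete, here is a minimal sketch of what automated metadata capture might look like: a sidecar record that travels with a dataset and states its source, collection method, and assumptions, so that the 'many eyes' can scrutinise how the numbers were made, not just the numbers. All names here (DatasetMetadata, attach_metadata, the choice of fields) are hypothetical illustrations, not any existing tool or standard.

```python
import datetime
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical sidecar record: data about how the data was created."""
    name: str
    source: str                    # where the raw data came from
    collection_method: str         # how it was gathered
    collection_frequency: str      # e.g. "quarterly"
    assumptions: list = field(default_factory=list)  # stated, reviewable assumptions
    created_at: str = field(
        default_factory=lambda: datetime.datetime.now(datetime.timezone.utc).isoformat()
    )
    content_hash: str = ""         # fingerprint tying the record to the data


def attach_metadata(data: bytes, meta: DatasetMetadata) -> str:
    """Return a JSON sidecar whose hash binds the metadata to the exact data it describes."""
    meta.content_hash = hashlib.sha256(data).hexdigest()
    return json.dumps(asdict(meta), indent=2)


# Illustrative usage, loosely echoing the telecoms example above
raw = b"unit_cost_per_bit,2010,0.10\n"
meta = DatasetMetadata(
    name="telecoms_unit_costs",
    source="operator returns",
    collection_method="survey of nominal cost per data unit",
    collection_frequency="quarterly",
    assumptions=["transmitting one data unit does essentially the same job over time"],
)
sidecar = attach_metadata(raw, meta)
```

Because the assumptions are recorded explicitly rather than buried in process, a domain expert reviewing the sidecar could flag that the stated assumption no longer holds, which is precisely the feedback loop the ONS example lacked.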
*Don’t you just love that in our current climate of unexamined, hastily dashed off (and often needlessly vicious) ‘below the line’ comments, people still take time to write courteous, thoughtful letters such as this? Well done, Mr Marsh.