Are Insights Always Good?

Alan Mitchell · Published in Mydex
Feb 1, 2023 · 9 min read

Image generated by AI using openai.com/dall-e-2

This is one of a series of blogs exploring Hidden in Plain Sight: The Surprising Economics of Personal Data, the subject of Mydex CIC’s latest White Paper.

There’s a famous psychology experiment where subjects are asked to watch a video of people playing basketball and count the number of times they pass the ball. You can see the video here. If you haven’t already done it, try it. It’s fun. It only takes a minute.

The astonishing thing is that a very high proportion of those counting the passes simply don’t notice a gorilla that walks through the game and stands there, right in front of them. Plain as plain can be. They just don’t see it. When asked, many of them even deny it: they are so intent on looking for something else they can’t believe that a gorilla was standing there, right in front of their eyes. This is the phenomenon of selective attention, and it’s extremely powerful.

In this blog series, we’ve explored many aspects of the economics of personal data, showing how and why personal data stores offer huge opportunities for positive economic transformation. But if this opportunity is so big, why aren’t organisations and governments clamouring for it?

A narrative of our times

Selective attention is a key reason. One particular narrative has come to dominate the conversation to such a degree that many people simply assume it accounts for the whole picture: they are counting the passes and not seeing the gorilla in front of them. This is the narrative of Big Data, and it goes something like this.

“Data is needed to create the insights that drive improved and innovative services. The more data there is to analyse, the better the insights will be, the greater the benefits. Therefore, everything should be done to help organisations amass as much data as they can.”

There is a small element of truth to this narrative (every effective lie includes a small element of truth to make it seem credible). Like the telescope and microscope before it, data is helping us see the world in ways we never could before. Tackled the right way, this is indeed an opportunity. But most Big Data today falls far short of this. It’s either flawed, unnecessary, irrelevant or downright dangerous — a bandwagon fuelled by hype.

Big Data, we are being told, is a ‘game-changing’ development, revolutionising industries as diverse as marketing and health care and helping to ‘solve humanitarian issues around poverty, health, human rights, education and the environment’. It’s now ’a key factor in how nations, not just companies, compete and prosper’, a ‘foundation for disruptive business models’, ‘transforming processes and altering corporate ecosystems’. It can even predict the future (apparently).

Little wonder it dominates Government policies such as the UK Government’s National Data Strategy and the EU’s Horizon Europe research programme. But what is the reality behind this hype?

Are Big Data insights really transformational?

The Big Data bandwagon revolves around two words: ‘insights’ and ‘analytics’, which are widely referred to with close to mystical awe, as if they had magic powers.

Genuine insights, used for the right purposes, are wonderful. But Big Data has never generated an insight and never will. Why? Because only humans have insights. This is flaw Number One.

Big Data analytics are done by a computer crunching lots of numbers to surface correlations and patterns. But in themselves these correlations tell us nothing, because statistical correlation can be meaningless and is not the same as causation. While Big Data analytics can alert us to something we hadn’t seen before, only humans can understand the why: what it all means.

Confusing the two is one of the reasons why there is currently a crisis in medical research. Around 50% of medical research findings cannot be repeated if the experiment is conducted again. Why? Among the many reasons are that doctors (like most data scientists) are not trained in statistics and frequently confuse correlation with causation.

The trouble with Big Data is that the more data you crunch, the more noise there is: the more meaningless correlations pop up. Without clear ways of distinguishing signal from noise, chances are you’ll end up on multiple wild goose chases.

This assumes that the data used by Big Data is reliable in the first place. But often it isn’t. One of the best kept secrets about most organisations’ large databases is that the quality of their data is often very poor: incomplete, out of date or simply wrong. And most data sets, no matter how large, are biased in some way. No matter how big they might be, they are still a sample.

Building genuinely representative samples isn’t something that happens easily or automatically. In fact, one of the biggest barriers to Big Data research is the cost of accessing reliable data. According to the National Audit Office, 60–80% of researchers’ time is spent cleaning and merging data. In a world where each individual were able to aggregate information about themselves into their own personal data store, these costs could be slashed — assuming, that is, that the researchers can make a compelling case for the value and ethics of their research.

But as it is, much of the data used for Big Data analytics is of questionable quality. GIGO is an ancient IT catchphrase. It stands for Garbage In, Garbage Out. With Big Data the risks of Big Garbage are high, such as Big Data-generated algorithms that build racist and sexist assumptions into how they work.

Can the insight be applied?

Even when a good signal is identified, it doesn’t mean it’s automatically useful. To be useful, it has to be applied. But Big Data doesn’t help here, because it only deals with statistical data — e.g. probabilities — not specifics.

If you toss a fair coin many times, you know that in the long run it will come up heads 50% of the time and tails 50% of the time. But complete, certain knowledge of probabilities doesn’t translate into complete knowledge of the real world — of what is going to happen next. What’s it going to be? Heads or tails? For the real world, we need ways to deal with different, specific outcomes, not just generalised observations.

This can wreak havoc when it comes to applying Big Data insights in the real world. Say for example that Big Data analysis tells us that if you do X the probability of Y happening is doubled. Great! But if the probability of Y happening has doubled from 2% to 4%, it still means that 96% of the time it won’t happen. How useful is that when it comes to applying the ‘insight’ in the real world?
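The relative-versus-absolute gap here can be made concrete with a few lines of arithmetic (a minimal sketch using the article’s own hypothetical 2% and 4% figures):

```python
# The article's hypothetical: doing X doubles the probability of Y.
baseline = 0.02            # probability of Y without doing X (2%)
with_x = baseline * 2      # probability of Y after doing X (4%)

# In relative terms the risk has doubled, which sounds dramatic...
relative_increase = with_x / baseline

# ...but in absolute terms, Y still fails to happen 96% of the time.
miss_rate = 1 - with_x

print(f"Relative increase: {relative_increase:.0f}x")
print(f"Y still doesn't happen {miss_rate:.0%} of the time")
```

A doubled relative risk makes a great headline, but it is the absolute probability that decides whether the ‘insight’ is worth acting on in any individual case.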

OK. What about a 99.9% probability rather than just 4%? You might think you can rely on a 99.9% probability in a real world application. The German Ministry of the Interior thought so when it installed facial recognition programmes outside big railway stations to identify known terrorists.

The 99.9% figure meant that 0.1% of observations generated false positives: cases where the system declared an individual to be a terrorist when they weren’t. With 12 million people passing through big railway stations every day, the German police were presented with 12,000 people a day wrongly identified by the system as terrorists. Acting on this information would have resulted in a large-scale invasion of civil liberties (hundreds of thousands of people being treated as if they were terrorists when they were not), while wasting all available policing resources hounding them. The programme was quickly (and quietly) dropped.
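The base-rate arithmetic behind this story takes only a few lines (figures as quoted above; an illustration of the false-positive problem, not data from the actual German trial):

```python
# Base-rate arithmetic using the article's figures.
accuracy = 0.999                     # the headline "99.9%" figure
false_positive_rate = 1 - accuracy   # 0.1% of innocent passers-by flagged
daily_passengers = 12_000_000        # people passing through big stations each day

# Expected number of innocent people wrongly flagged as terrorists per day
false_alarms_per_day = daily_passengers * false_positive_rate
print(f"{false_alarms_per_day:,.0f} false alarms per day")
```

Even a seemingly tiny error rate, multiplied across a huge population of innocent people, swamps any genuine signal. This is the classic base-rate fallacy at work.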

Behind this lies another flawed assumption that keeps Big Data hype afloat: confusion between statistical data about populations — data that deals in probabilities — and the actual data about real, individual people that is needed to actually do something useful. The one does NOT automatically translate into the other.

Once Big Data is applied to an identifiable individual, it stops being Big Data and becomes deeply personal instead. It requires access to and use of personal data, and a matching, testing and application process to see if the generalised statistical ‘insight’ is relevant to this particular case. For that, you need to access exactly the right data, at the right time, for the use case in hand: you need the personal data logistics capabilities provided by personal data stores.

Without such a translation process from statistical probabilities to personal circumstances, Big Data insights can only be applied in a blanket manner — in a way that is often irrelevant, if not harmful.

Do you need Big Data to get big insights?

All this assumes that Big Data is the only or main way to generate useful insights. But it isn’t. Most of the insights used to inform decisions and actions today don’t come from Big Data. Simpler, easier, cheaper processes — traditional research, or even mundane activities such as filling in forms — provide the surprises (the new information needed to keep decisions in line with a changing world) that most people and organisations need to get stuff done.

So Big Data isn’t the be all and end all of ‘insight’ that it’s made out to be. Far from it. Viewed across the economy as a whole its contribution is actually marginal.

Beware hidden agendas

When you listen to Big Data hype however, it’s as though Big Data is a magic bullet that is going to save the world: it comes with a touching but naive faith that, by definition, if something is an ‘insight’ it must be ‘good’ — it will be used to help people rather than harm them or take advantage of them.

Unfortunately this is not true. Much of the hype about ‘insights’ derives from the activities of Silicon Valley advertising giants like Google and Facebook. But their insights weren’t generated to help people live better, richer lives. They were generated for the purposes of manipulation and control, to get people to do what advertisers wanted them to do. Cambridge Analytica was a business driven by Big Data insights. Its agenda was profit via manipulation and control — for which it needed as many ‘insights’ as it could get.

The sad reality behind Big Data hype is that it is widely used as camouflage to extend surveillance capitalism into new areas, to justify a privacy-invading corporate data landgrab; an excuse for even further concentrations of data power and rewards in the hands of a tiny number of organisations (along with increasing pressure to bypass citizens’ data protection rights).

Are these corporations suddenly really that interested in data analytics for ‘data for good’? Or is this sudden interest a cover for a much more cynical hidden agenda?

Squandered resources, missed opportunities

The biggest damage wreaked by Big Data hype lies in a different direction, however. In a classic case of selective attention, countless politicians and policy-makers around the world have drunk the Big Data Kool-Aid, sanctioning and promoting vast Big Data programmes that promise the world and cost almost as much — billions of £/$/€.

In doing so, they fail to notice the gorilla in their midst: the immense opportunities for productivity improvements, service quality, outcome improvements and innovation that lie in the opposite direction, in empowering citizens with their own data. The result is missed opportunities on a vast scale, coupled with an equally vast misallocation of available resources — a misallocation that deepens the imbalances of power and reward that already blight today’s data-driven economy.

Conclusion

Don’t get us wrong. We have nothing against Big Data per se. We are all for genuine analytics and insights, if done properly and ethically. But we are also all for seeing gorillas as well as counting passes — for seeing opportunities that are NOT reliant on big data, such as empowering individuals with their own data.

It’s not one versus the other. To really work well, Big Data needs to advance hand in hand with the personal data logistics capabilities offered by personal data stores. Focusing only on Big Data while ignoring personal data logistics — and the citizen empowerment that goes with it — is like trying to run on one leg: extremely hard work that doesn’t get you far.

That, unfortunately, is what we’ve got right now. As long as policy-makers look determinedly in just one direction — only counting the passes — they won’t see the gorilla standing there, right in front of them. It’s time for them to look up.

Other blogs in this series are:

  1. The Great Data Delusion 19th century doctors thought bloodletting would cure most diseases. Today’s prevailing theories of personal data are little better.
  2. Why is personal data so valuable? Because of two fundamentals: reliability and surprise
  3. Is it a bird? Is it a plane? No! It’s Super…! With personal data, what sort of a problem are we dealing with?
  4. The vital issue people don’t want to talk about Productivity is key to prosperity. But nobody wants to talk about it. Why?
  5. When organisations become waste factories The single design flaw at the heart of our economic system, and what happens if we can fix it.
  6. Why are joined-up services so difficult to deliver? Because the organisation-centric database is designed NOT to share data.
  7. People: the dark matter of the economy Individuals and households are all but invisible to economics. They shouldn’t be.
  8. An engine of economic possibilities How personal data stores open up new vistas of innovation and growth
  9. What has data got to do with net zero? A lot more than you might think.
  10. Google and Facebook: Steam Engines of the Information Age They hardly touch the fundamental economics of personal data.
