The Big Lie about Data

Matthew G. Johnson
Published in DataSeries · Dec 5, 2022

Say something frequently enough, and you can get anyone to believe it. The best lies are often the biggest…

Figure 1: The Big Lie about Data, DALL·E 2

The value of data is in insights
The value of data is in insights
The value of data is in insights

Millions of careers and billions in market capitalisation depend on classical data systems: data warehouses, high-performance query engines and data visualisation software. Classical data systems are founded on this story. Nonetheless, the truth is slowly starting to emerge…

The value of data is not in insights

Most dashboards fail to provide useful insights and quickly become derelict. A usage report from any online business intelligence portal will quickly reveal that 80–90% of all dashboards are rarely, if ever, accessed.

Meanwhile, those few dashboards that do offer useful insights rarely provide a concrete basis for action. Multiple additional analyses are often required over subsequent weeks and months, many of which fail to deliver the basis required for action.

The hard reality is that the intelligence of classical data systems is limited, and they destroy most of the information value of the data they process. Some insights they deliver do have value and a few are actionable, but sadly only a handful result in action. We assuage our conscience by insisting our insights are “actionable”, while rarely putting in place any measures to assess whether they truly are. The truth is starting to emerge…

The value of data is in actions

However, if the truth is so obvious, why does anyone believe in the Big Lie about Data?

The challenge is that getting from data to actions is hard. Most classical data systems lack the required intelligence: they are only capable of providing basic insights, and so deliver limited value.

We have chosen to accept the Big Lie about Data, not because we believe it. We have chosen to accept the Big Lie about Data because it is easier to work with the limitations of classical data systems than to challenge and overcome them. We have chosen to accept the Big Lie about Data because it is convenient.

Thankfully, new data systems are arriving which overcome these limitations. These future systems have the intelligence to surpass the “actionable insights” of the past, and reach directly from data to actions.

However, if we are going to understand these future data systems, we need to compare them with classical data systems. Perhaps the best way to illustrate their differences is to take a simple business problem and examine how we might approach it with each type of system.

Imagine we are trying to enhance the distribution of health insurance policies. In a classical data system we might start with a simple query using SQL (Structured Query Language):

SELECT
    policy_year AS year,
    policy_type AS policy,
    COUNT(*)    AS volume
FROM policies
GROUP BY
    year,
    policy;
+------+--------+------------+
| year | policy | volume     |
+------+--------+------------+
| 2019 | Alpha  | 20,637,609 |
| 2019 | Beta   | 15,234,792 |
| 2020 | Alpha  | 17,840,839 |
| 2020 | Beta   | 18,840,839 |
| 2021 | Alpha  | 16,998,797 |
| 2021 | Beta   | 20,223,711 |
+------+--------+------------+
6 rows in set (3.7 sec)

We typically translate this into a chart to aid in comprehension.

Figure 2: Policy issuance by year and type

We see that the policies issued for Alpha are declining, while those for Beta are increasing. We used to issue more of Alpha and now we are issuing more of Beta. This is an insight, but what action should we take?

Perhaps we could direct agents towards Alpha to reduce the decline, or maybe towards Beta because it is popular? The challenge is that our analysis cannot tell us. It was able to describe the past, but not to predict how each action might impact the future. While we could certainly produce better analyses, this situation of having insights, but no clear action, is very common.

So what is the source of the problem? We started with three years of detailed policy data with over a billion data points and compressed this down to only six. We have discarded almost all the information value of this data: for example, which type of policy each customer prefers, how they connect, and what additional coverage they might need. So why did we do it?

The fundamental challenge is that no human is capable of looking at a billion data points and making sense of them. We summarise the data down to six, sixty or maybe six hundred data points to ensure we have the capacity to consume them. The fundamental constraint is the power of human intelligence.

Sadly, the tools we use in classical data systems to compress data are simply not intelligent enough to retain sufficient information value. This is the fundamental reason why so few insights are actionable.

Thankfully, new data systems are arriving which overcome these limitations. These are powered by a range of technologies such as machine learning, deep learning, statistical methods and others. We increasingly refer to these technologies collectively as Artificial Intelligence (AI). In an increasing number of cases, AI now allows us to surpass the limitations of human intelligence.

So, how might we apply AI to approach the insurance policy challenge? The fundamental difference is that instead of starting with a target insight, we start with a target action. A well-proven action for this use case is a personalised policy recommendation for each customer.

We use AI to understand each individual customer: what types of policy and additional coverage they may need, and perhaps the best message and channel of communication. We use this intelligence to make individually targeted communications and increase premiums, one customer at a time.
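To make the idea concrete, here is a minimal sketch of per-customer recommendation in Python. Everything in it is hypothetical: the customer features, the policy names and the weights are made up for illustration. In practice the weights would come from a model trained on the policy data (for example a logistic regression or gradient-boosted trees), not set by hand.

```python
# Toy sketch: score each customer against each policy and recommend the best.
# Features, policy names and weights are illustrative stand-ins for a trained model.
from dataclasses import dataclass

@dataclass
class Customer:
    age: int
    dependants: int

# Hand-set weights standing in for learned model parameters.
WEIGHTS = {
    "Alpha": {"bias": -0.3, "age": 0.02, "dependants": -0.10},
    "Beta":  {"bias": 0.1,  "age": -0.01, "dependants": 0.25},
}

def score(customer: Customer, policy: str) -> float:
    """Relative preference score for one customer/policy pair."""
    w = WEIGHTS[policy]
    return w["bias"] + w["age"] * customer.age + w["dependants"] * customer.dependants

def recommend(customer: Customer) -> str:
    """Pick the highest-scoring policy for this customer: one micro-action each."""
    return max(WEIGHTS, key=lambda policy: score(customer, policy))

for c in [Customer(age=62, dependants=0), Customer(age=34, dependants=3)]:
    print(recommend(c))  # a different recommendation per customer
```

The point of the sketch is the shape of the output: one decision per customer, rather than one aggregate insight for the whole portfolio.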

Figure 3: Macro-insights versus micro-actions

With classical data systems, we compress millions of data points into a handful of macro-insights. We do this to work within the constraints of human intelligence. However, in the process of compression, we lose most of the information value of the original data.

With AI-powered data systems, we take a different approach. We use AI to extract every last drop of information from the data to develop detailed and comprehensive understanding. We use this understanding to power millions of micro-actions.
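Whether those micro-actions actually create value can be measured with a controlled test: hold out a control group that does not receive the personalised recommendation and compare conversion rates. A minimal sketch of that measurement, with made-up counts, might look like this:

```python
# Sketch: absolute conversion uplift from a controlled test, with an
# approximate 95% confidence interval (normal approximation).
# The counts below are invented for illustration only.
from math import sqrt

def uplift_with_ci(conv_t: int, n_t: int, conv_c: int, n_c: int, z: float = 1.96):
    """Treatment-minus-control conversion rate and its approximate 95% CI."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical test: 10,000 customers each in treatment and control.
diff, (lo, hi) = uplift_with_ci(conv_t=540, n_t=10_000, conv_c=450, n_c=10_000)
print(f"uplift = {diff:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```

If the whole interval sits above zero, the uplift is unlikely to be chance, and the value of the actions has been demonstrated rather than asserted.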

This new approach to data requires new tools and new skills. However, it unlocks new value; value which can be proven through controlled tests. The truth is starting to emerge…

The value of data is in actions
The value of data is unlocked with AI

Figure 4: Connecting human and artificial intelligence, DALL·E 2



I am an informatician, fine arts photographer and writer who is fascinated by AI, dance and all things creative. https://photo.mgj.org