Data is not an asset, it’s a liability

Sep 10, 2015

If you work in software development, sooner or later you learn that code is a liability — all things being equal, the less code you have, the better off you are.

This is because code slows you down. Code equals complexity, and complexity makes it hard to change things and move forward. Code also has bugs, and bugs will make you spend time chasing after them. Code makes it harder to scale up, because it makes onboarding new developers more difficult while simultaneously hurting the productivity of even the most senior members of your team. The list of downsides goes on and on and on.

This, of course, is the in-the-trenches view. Ask top management, and they are very likely to view code in the opposite way. It’s a valuable asset for which the company has paid dearly in developer salaries and consultant fees.

The same dichotomy is beginning to emerge in our attitude towards data. The big data megatrend has taught companies in general and publishers in particular that user data is hugely valuable. And unlike code, data seems almost free: user activity generates an essentially endless amount of it. You just need to write it down on a disk somewhere.

On the proverbial business end of a big data operation, different viewpoints appear. As with code, more data makes things more difficult. When the amount of data gets truly big, so do the problems in managing it all. IDC estimates that big data companies will sell $125 billion worth of solutions to those problems in 2015 alone. These direct costs are huge, but they are dwarfed by the inherent risks of storing unbounded amounts of private user data.

Regulatory compliance is a factor that big corporations, publishers among them, may be uniquely suited to tackle, but the business risks of storing data are manifold. Nobody wants to be the next Ashley Madison, but the even bigger risk is breaking the trust of users in more mundane ways.

Here’s a hard truth: regardless of the boilerplate in your privacy policy, none of your users have given informed consent to being tracked. Every tracker and beacon script on your web site increases the privacy cost they pay for transacting with you, chipping away at the trust in the relationship.

So data is a liability with an ongoing cost. But what are we getting for the price? The all too typical corporate big data strategy boils down to three steps:

1. Write down all the data

2. ???

3. Profit

This never makes sense. You can’t expect the value of data to just appear out of thin air. Data isn’t fissile material. It doesn’t spontaneously reach critical mass and start producing insights.

The solution we at Richie advocate is simple. You don’t start with the raw data. You start with the questions you want answered. Then you collect the data you need (and just the data you need) to answer those questions.
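To make that concrete, here is a minimal sketch of what starting from the questions might look like in code. Everything in it is hypothetical: the question names, the event fields, and the `minimize` helper are illustrations, not a description of any real system. The idea is only that each question declares the fields it needs, and anything no question asks for never gets written down.

```python
# Hypothetical questions, each listing the event fields it needs to be answered.
QUESTIONS = {
    "which_articles_are_read_today": {"article_id", "timestamp"},
    "which_platforms_need_support": {"platform"},
}

# The union of those fields is the only data worth collecting.
ALLOWED_FIELDS = set().union(*QUESTIONS.values())


def minimize(event: dict) -> dict:
    """Keep only the fields that some stated question actually needs."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


# An example raw event (made-up values) with more in it than we have questions for.
raw_event = {
    "article_id": "a-123",
    "timestamp": "2015-09-10T12:00:00Z",
    "platform": "ios",
    "email": "reader@example.com",
    "ip_address": "203.0.113.7",
}

# Only the fields tied to a question survive; email and IP address are dropped.
stored_event = minimize(raw_event)
```

In a setup like this, collecting a new field means first writing down the question it answers, which is the ordering the approach asks for.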

Think this way for a while, and you notice a key factor: old data usually isn’t very interesting. You’ll be much more interested in what your users are doing right now than in what they were doing a year ago. Sure, spotting trends in historical data might be cool, but in all likelihood it isn’t actionable. Today’s data is.

This is important, because it invalidates the whole premise of storing data just in case you’ll need it later. You simply won’t, so incurring the cost of storing and managing and safeguarding it makes no sense at all.

Actionable insight is an asset. Data is a liability. And old data is a non-performing loan.

Written by Marko Karppinen