Why it’s time dark data came out of the shadows

Ellen Children
Published in InfoSum
3 min read · Aug 9, 2017

Imagine buying a ring for £10, wearing it every day for 30 years and then finding out it’s worth a massive £350,000. Companies worldwide are potentially sitting on diamonds of their own: dark data.

What is dark data?

Gartner defines dark data as “the information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes”. For example, this includes “text messages, documents, email, video and audio files, and still images”.

Dark data gained its rather unfortunate name from being difficult to process and analyse using traditional methods. It is generally unstructured, often hoarded in one large repository and then left untouched, so it usually gets forgotten about, like the bummock of an iceberg.

Image credit: NASA/JPL-Caltech

This hidden data can contain confidential or sensitive information, such as “log archives and other untagged, non-inventoried data”, which poses a range of risks if it ends up in the wrong hands. Crucially, dark data is subject to the same data privacy and security laws as its lighter alternative (or the hummock). It has, after all, been generated by people, so some of it may be personally identifiable data.

Time to shine

In a similar way to integrating external data, harnessing previously unanalysed dark data gives companies the opportunity to unearth new insights and inch ahead of competitors. According to the International Data Corporation, dark data makes up a whopping 90% of the data collected by companies — and is growing at an exponential rate.

The technology to find the gems in dark data is out there. Interestingly, Apple spent $200 million earlier this year to buy Lattice Data, essentially to get hold of its machine learning technology, which makes dark data usable. Although shining a light on dark data will require some time and investment, the benefits will be worth it in the long term. There is also the risk that competitors will invest in processing dark data, so companies that hold back may end up paying the cost through lost opportunities anyway.

In the same way that only the right big data is useful, only the right dark data is actionable and the rest should be deleted. Finding that diamond ring does require some creative thinking around asking the right questions. For example, as pointed out by Steven Dough in a recent piece for Ad Age, in the context of e-commerce:

Using data collected during an interaction with the customer can also help marketers infer and test different types of behaviours for which they don’t have direct answers.

Take hover data. Consumers who hover their cursor over a link for a while but do not click it, or hover over a product but never put it in their cart, might have a totally different intent than someone who clicks and buys or simply scrolls on through.

This “hover insight” can be the basis of exploratory re-marketing campaigns to see if a consumer didn’t take action because of pricing, product details or simply bad timing.
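As a rough illustration of the hover idea, here is a minimal Python sketch that picks out products a customer lingered over but never clicked or added to their cart. The event records, the field names (`customer`, `product`, `action`, `seconds`) and the 1.5 second threshold are all assumptions made up for this example; a real analytics tool would have its own log format and a threshold tuned by testing.

```python
from collections import defaultdict

# Hypothetical event records. The field names and values are assumptions
# for illustration and do not come from any real analytics product.
EVENTS = [
    {"customer": "a", "product": "ring", "action": "hover", "seconds": 4.0},
    {"customer": "a", "product": "watch", "action": "hover", "seconds": 0.3},
    {"customer": "b", "product": "ring", "action": "hover", "seconds": 2.5},
    {"customer": "b", "product": "ring", "action": "click", "seconds": 0.0},
]


def hover_without_click(events, min_seconds=1.5):
    """Return {customer: [products]} hovered over for a while but never acted on."""
    longest_hover = defaultdict(float)  # (customer, product) -> longest hover seen
    acted_on = set()                    # (customer, product) pairs clicked or carted

    for event in events:
        key = (event["customer"], event["product"])
        if event["action"] == "hover":
            longest_hover[key] = max(longest_hover[key], event["seconds"])
        elif event["action"] in ("click", "add_to_cart"):
            acted_on.add(key)

    result = defaultdict(list)
    for (customer, product), seconds in longest_hover.items():
        # Keep only long hovers that never led to a click or a cart addition.
        if seconds >= min_seconds and (customer, product) not in acted_on:
            result[customer].append(product)

    return {customer: sorted(products) for customer, products in result.items()}


candidates = hover_without_click(EVENTS)
print(candidates)
```

The resulting list is only a starting point for the kind of exploratory re-marketing test described above; it says nothing about why the customer held back.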

Dough also references the insight that can be derived by examining how many comments a customer reads, and whether they read more comments with positive or negative connotations. This information would enable an e-commerce retailer to take action to encourage customers to make that purchase.

Of course this is only one use case, but there are plenty more: from improving health care and travel, to making scientific discoveries and preventing fraud. So the choice is up to companies: stick to the norm and risk becoming a laggard, or explore something new and potentially find a diamond.
