Data is useless

We are drowning in a sea of data, and yet we are no smarter


Quick quiz. Can you identify the dismantled product above, with all its parts neatly arrayed? If you haven’t seen the image before, it’s not easy to guess. For the impatient, the answer is at the bottom of the post; for the rest, keep reading for a detour into a discussion of data and design before your moment of zen.

We are drowning in data.

Everywhere we look, we are bombarded by data in all forms. And yet, it is a paradox of our time that the more data we have, the harder it is to find the nugget that is of value. That’s why I get annoyed when folks use the words data and information interchangeably.

Data

noun: facts and statistics collected together for reference or analysis

Information

noun: what is conveyed or represented by a particular arrangement or sequence of things

Let’s consider this distinction in the context of one of the largest sources of data: the internet. Over the past few years, the open sharing of data has become an important indicator of transparency in societies. The image below plots the growth of the open data movement across the world.

http://visual.ly/open-data-movement

Shining a light on the data is useful for many reasons, but we haven’t been able to harness its full power. One reason is the lack of correlation between most of the data out on the web. What we seem to have is a predominance of unrelated, sparse data sets or chunks of unstructured data. Interesting pieces by themselves, but of limited utility.

There have been some attempts at correlating these disparate sets of data. For instance, Google uses keywords to link data sources. This approach works well for a search engine, but it lacks context. Wikipedia uses the wisdom of the crowds to collate snippets around a topic into something meaningful.

And yet, all this feels a bit unsatisfying; it’s like going to a fancy restaurant with a sumptuous spread and limiting yourself to the appetizers.

Enter Mashups

No. Not the music ones, but their geeky web counterpart.

In web development a mashup is a web page, or web application, that uses content from more than one source to create a single new service displayed in a single graphical interface.

In mashups, data from various sources are combined in the right order and presented in an intuitive visual. Most mashups combine data from various sources which have at least one common link. This makes data accessible and easy to interpret. Data can now become information.
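To make the "common link" idea concrete, here is a minimal sketch in Python. It assumes two small, made-up data sets that happen to share a city name; the cities, the numbers, and the `mash_up` function are all hypothetical and exist only for illustration.

```python
# Two unrelated, made-up sources. The only thing they share is the city name.
population = {"Alpha City": 1_000_000, "Beta Town": 500_000, "Gamma Bay": 250_000}
libraries = [
    {"city": "Alpha City", "branches": 20},
    {"city": "Beta Town", "branches": 15},
    {"city": "Delta Springs", "branches": 4},  # no population record for this one
]


def mash_up(population, libraries):
    """Join the two sources on their common link (the city name)."""
    combined = []
    for row in libraries:
        city = row["city"]
        if city not in population:
            continue  # no common link, so nothing to combine
        per_100k = row["branches"] / population[city] * 100_000
        combined.append({
            "city": city,
            "population": population[city],
            "branches": row["branches"],
            "branches_per_100k": round(per_100k, 2),
        })
    return combined


result = mash_up(population, libraries)
for row in result:
    print(row)
```

Neither source says anything about library access per resident on its own. The combined rows do, and that is the step from data to information.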

Now, most mashups are created out of structured data sources which have APIs or have some inherent structure to the data itself. But the more interesting insights can be found by combing through blobs of unstructured data. Unfortunately, this isn’t easy. Correlating unstructured data to find useful information requires serious computing horsepower and a substantial amount of engineering effort.

But it can be done.

In fact, at Compile, we have been working on gleaning actionable information from the mounds of unstructured data on the web. Some think of us as a contextual search engine, while others view us as a contact/lead stream. We like to think we are a mix of both. Our engine crawls the web, specifically the dark side of the web, and identifies documents that can indicate buying intent. We then identify the organisation and the right contacts for this opportunity.

The key is that we assemble both structured and unstructured information from many different sources to arrive at an opportunity.

So how do we do it?

compile.com/technology/

It all starts with a single keyword which is relevant for your business. The machine then scours the web to find documents related to the keyword. Next, our scoring algorithm kicks in and attaches an opportunity score to each document. The organization behind each document is then used to identify the location and contact details of the relevant people. All of this is then neatly presented to our customers.
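For the curious, the flow described above can be sketched as a toy pipeline in Python. This is not Compile's actual engine: the in-memory corpus, the `INTENT_TERMS` list, the scoring rule, and the `CONTACTS` directory are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    url: str
    organization: str
    text: str


# Hypothetical phrases that might hint at buying intent.
INTENT_TERMS = ("request for proposal", "looking for", "evaluating")

# Hypothetical directory mapping an organization to location and contact details.
CONTACTS = {
    "Acme Corp": {"location": "Exampleville", "contact": "procurement@acme.example"},
}


def find_documents(keyword, corpus):
    """Stand-in for the crawl: keep documents that mention the keyword."""
    return [doc for doc in corpus if keyword.lower() in doc.text.lower()]


def score(doc, keyword):
    """Toy opportunity score: keyword mentions plus 2 per intent phrase found."""
    text = doc.text.lower()
    return text.count(keyword.lower()) + 2 * sum(term in text for term in INTENT_TERMS)


def opportunities(keyword, corpus):
    """Keyword -> documents -> scores -> organization details, best first."""
    results = []
    for doc in find_documents(keyword, corpus):
        details = CONTACTS.get(doc.organization, {})  # empty if we know nothing
        results.append({
            "organization": doc.organization,
            "score": score(doc, keyword),
            "url": doc.url,
            **details,
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)


corpus = [
    Document("https://example.com/a", "Acme Corp",
             "Acme Corp issued a request for proposal for a CRM. "
             "The CRM must integrate with email."),
    Document("https://example.com/b", "Globex", "Globex published its annual report."),
    Document("https://example.com/c", "Initech", "Initech blog: our favourite CRM tips."),
]

leads = opportunities("crm", corpus)
for lead in leads:
    print(lead)
```

The real work, of course, lies in the parts this sketch waves away: crawling at scale, making sense of unstructured text, and resolving which organisation and people a document actually points to.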

But that’s enough about us. The point is, data by itself isn’t of much use. Correlating it with other data sets as well as events, and presenting it in an easy-to-consume package, is the key. And when people have access to all this data, they go crazy finding all sorts of correlations in it. That’s when you start to uncover some interesting trends.

Finally, since you have read this far, here’s your reward: the picture above is a dismantled Adler Favorit typewriter from the early 1900s.

It’s elegant when the pieces are put together, don’t you think?


Cross-posted from Compilations