Dark data is more important than big data

This is based on a talk given at the Big Data Debate organized by Import.io. Here are my slides.

Imagine if…

Imagine if you had Google Glass, or the Iron Man suit, and your heads-up display (HUD) could tell you anything you wanted to know about everything in your field of vision.

What would you want to know? What would you benefit from knowing?

  • How old is this?
  • Who owns this?
  • How much does it cost?
  • How was it manufactured?
  • What material is it made of?
  • Where did it come from?
  • Who else has been here?

These are just a few of the many questions that you could ask of your surroundings.

What is “Dark Data”?

There are three types of dark data. Let me briefly define them and provide an example for each:

  • 1) Data that is not currently being collected. An example of this is location data before Foursquare, or social data before Facebook. Where did people go? Who did they know? Now we know.
  • 2) Data that is being collected, but that is difficult to access at the right time and place. In front of you, there is a pine tree. How do you know it is not a fir? Because, in some book, in some library, there is an explanation of the difference. That’s useless in the moment. Here and now, we need information applied to the present.
  • 3) Data that is collected and available, but that has not yet been productized, or fully applied. You’re walking down Fifth Avenue in Manhattan. Wikipedia has vast amounts of data about every building you look at. But technology startups are only just beginning to figure out how to bring that data to you, and make it valuable. The burgeoning field of augmented reality is full of opportunities like this.

What’s the difference between “Dark Data” and “Big Data”?

Big data problems are problems caused not by the inaccessibility of data, but by the abundance of it.

That’s why big data opportunities are smaller than dark data opportunities. Dark data is a bigger problem, because it hasn’t been surfaced yet. And the bigger the problem, the bigger the opportunity.

Big companies tend to have big data problems, and they know it. That’s why big data is a great market. Lots of customers with lots of data willing to pay startups to help them make sense of it all. Think banks, insurance companies, telcos, hospitals, and on and on…

Startups going after dark data problems are usually not playing in existing markets, where customers are already aware of their problems. They are creating new markets by surfacing new kinds of data and creating unimagined applications with that data. But when they succeed, they become big companies, ironically, with big data problems.

Dark data is everywhere

In my “useless” liberal arts background I learned about this dude named Immanuel Kant. Kant split reality in two. There is reality itself, which is infinite, multi-layered and complex. Kant called this the “noumenal” realm. Then there is reality as we perceive, interpret and understand it, as we describe it with language and data. Kant called this the “phenomenal” realm. To make sense of reality, and to navigate our way through it, we have to abstract meaning from it by simplifying it through creating models, frameworks, world-views, etc.

If reality is infinite, multi-layered and complex, the good news is that there are always more types of data to extract, and new types of applications to create on top of that data. That’s why there are so many dark data opportunities all around us.

Great companies that are surfacing dark data

If your startup is surfacing dark data, I’d like to hear about it, so feel free to reach out. Full disclosure: I am friends with, or advise, several of these companies. Here are some that come to mind:

Boxes — a social network for stuff. Stuff is dark data. All of your stuff is not online. There’s no place online that has all the things that I own, all the things that I want to own, etc.

NewHive — the blank canvas for the web; a social network for creativity. Expression and art are dark data, but create the right platform, and all of a sudden, all of it springs into light.

Xola — a booking and distribution platform that powers businesses offering lifestyle experiences. Their software helps these businesses manage their back-office and online reservations, payment processing, calendaring, inventory and guide management, and customer relationship management. All of this is dark data: until Xola, most of these businesses were being run with pen and paper, out of a cigar box. Now, all of their data is running through their platform.

The Tip Network — these guys are taking tips at restaurants (and eventually bars, hotels, casinos, etc.), which are currently all handled old-school, with receipts and cash and paper records (dark data), and moving them into the digital era, with beautiful software that adds value (in multiple ways) to both servers and restaurants. They will be processing the $35B in tips in the US every year, and soon will be adding other services for restaurants, from payroll to banking, on top of that platform.

Newtrust — Louis Anslow’s startup idea is based on the realization that everything from the school you go to, to your LinkedIn profile, is ultimately about signaling credibility to create trust, so that you can be employable and well compensated. Instead of relying on proxies for trust, we should go right to the source: the work itself, as it is done, every hour of every day, and track and measure that. It is valuable dark data.

NeuroVigil — your brain activity is dark data.

Nest — your home energy consumption patterns are dark data.

23andMe — your DNA is dark data.

My friend Louis Anslow trotted out this great line recently:

“Often that is treated as important which happens to be accessible to measurement”
Friedrich von Hayek

That which is not accessible to measurement may be very important tomorrow, even though it is dark to us today. It just needs to be brought to light.