The Metadata Economy — The Future of Trusted Data Sharing

Indrayudh Ghoshal
Published in Scribble Data Blog · May 29, 2017
Metadata — the portable trailer for your data movie (Photo by Tracy Thomas)

So here’s the idea: the best companies of the future will share data among themselves to multiply the rewards for each. Data qualifies as an asset on a company’s balance sheet (Gartner account required), but archaic accounting principles prevent it from showing up there. Businesses should treat data as the asset it is and use it maximally.

What this post is about

This post talks about symbiotic businesses buying and selling their data from and to each other. Imagine an ecommerce business and a property developer: together, their data could unearth areas of the city with higher disposable incomes, letting the former ramp up delivery capacity there and the latter consider building fancier apartments. The participating businesses can be orthogonal in the markets they target, the industries they’re in, and even their revenue models, but the key to building value is in organizations discovering the right, trusted external data to grow their business.

What this post is NOT about

It’s not the whispered nexus between your WhatsApp contacts and your Facebook friend suggestions; no, that’s all intra-company. Nor is it the mashing of a company’s data with public records. Valuable, but not the thrust of this post.

What this post is DEFINITELY NOT about

This post departs massively from what is commonly called the “data brokerage” industry, in which companies exist to shadily collect consumer information and sell it to any Johnny. Limited regulation and massive loopholes have allowed that industry to reach a stable equilibrium: a cesspool of unethical behaviour, abject lack of concern for the impact on consumers, and, expectedly, crappy data. The legitimate companies that sell data (and there are several) are a casualty of this post.

The general argument that data sharing adds economic value isn’t likely to be disputed, so let’s dig into a possible framework.

A data transaction scenario

Let’s imagine that Company A and Company B are able to exchange data that adds economic value to each, or that either company is able to buy data from the other. Trust between the two companies is critical. Once established, the framework should extend to a multipartite arrangement (i.e. Company N can also participate).

Depending on the type of data that’s being transacted, the toes most vulnerable in this arrangement are those of the layman (the ‘C’ in B2C), who might’ve consented to Company A storing and using their data, but hasn’t permitted reselling it to Company B.

The second set of exposed toes belongs to the companies themselves. In this utopian data-sharing future, these are the best and biggest businesses, and the data they share cannot compromise their relationship with their customers. The data also has to be limited in its use by other companies (i.e. collateral use of the data, in time or purpose, shouldn’t compete against the original company that owned it), and finally, the data-sharing arrangement (i.e. the data they get in return) actually needs to add to their top line (e.g. via better marketing) or bottom line (e.g. risk mitigation).

So what does this dance look like?

I propose two key players:

  1. A Trusted Data Custodian (TDC)
  2. Metadata

The TDC’s office has three mandates: the first is to establish trust with companies, the second is to enable valuable discovery of data and opportunities external to each company, and the third is to set up a framework for these data transactions. The market will not clear if any of these is deficient.

To accomplish this, bear with me as I use a dating analogy below. The TDC’s office has three layers:

Layer 1 [Repository]: The TDC is like a wingman, except that this wingman, Company A’s trusted confidante, plays the same role for companies B through N. They entrust the TDC with their data. The TDC audits the quality of the data held by each company and assigns each a rating. Companies are protected from one another by a Chinese wall.

Companies can confide in the TDC about the business contexts or business cases for which they need external data. A more accurate description of this layer might be that it operates like a large law firm (but I went with wingman so that the extended dating metaphor below holds).

Layer 2 [Discovery]: Here, the TDC presents a Tinder-like experience for businesses to swipe right or left on the data of other businesses. The main value-add here is the matches that the TDC suggests. The TDC knows the preferences (business context) in which companies are looking for data so the matches suggested are between companies that have the kind of data that would be useful to one another. Filters such as data quality, adjacency of business models and overlapping data sets (i.e. data sets of two or more companies where the TDC finds a primary key) can all be applied.
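
To make the filters above concrete, here is a minimal sketch of how a TDC might suggest matches between datasets that share a candidate join key and clear a quality bar. The field names (`company`, `columns`, `quality_rating`) are hypothetical, invented for this illustration, not any real TDC schema:

```python
def suggest_matches(catalog, min_quality=3):
    """Pair up datasets from different companies that share a column
    (a candidate primary key) and meet a minimum quality rating."""
    matches = []
    for i, a in enumerate(catalog):
        for b in catalog[i + 1:]:
            if a["company"] == b["company"]:
                continue  # Chinese wall: never match a company with itself
            if a["quality_rating"] < min_quality or b["quality_rating"] < min_quality:
                continue
            shared = set(a["columns"]) & set(b["columns"])  # overlapping columns
            if shared:
                matches.append((a["company"], b["company"], sorted(shared)))
    return matches

catalog = [
    {"company": "A", "columns": ["postcode", "order_value"], "quality_rating": 4},
    {"company": "B", "columns": ["postcode", "property_price"], "quality_rating": 5},
    {"company": "C", "columns": ["device_id"], "quality_rating": 2},
]
print(suggest_matches(catalog))  # [('A', 'B', ['postcode'])]
```

Here the ecommerce company (A) and the property developer (B) match on `postcode`, while the low-quality dataset (C) is filtered out, which is exactly the role of the TDC’s quality rating.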

The profile pic? Why, it’s the metadata. Depending on the type of date the companies want to go on, the metadata could be descriptive or structural; it could be technical metadata or business metadata.
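
As an illustration (every field here is invented for this sketch, not a standard), such a metadata “profile pic” might look like:

```python
# A hypothetical metadata profile; the groupings mirror the
# descriptive / structural / technical / business distinction above.
profile = {
    "descriptive": {"title": "Delivery orders, 2016", "owner": "Company A"},
    "structural": {"columns": ["order_id", "postcode", "order_value"],
                   "row_count": 1_200_000},
    "technical": {"format": "parquet", "refresh": "daily", "pii": ["postcode"]},
    "business": {"use_case": "delivery capacity planning",
                 "quality_rating": 4},
}
```

A record like this is enough for Company B to decide whether to swipe right, without ever seeing a single actual row of Company A’s data.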

When Company A finds a fit-for-purpose dataset in Company B, Company A swipes right. Unlike Tinder, though, Company B gets notified when this happens: even if they don’t see immediate value in swapping data, there may be an opportunity down the line, or, backed by the TDC’s guarantee, Company B may trust Company A as a buyer. This notification process helps expedite the matchmaking.

It should be entirely possible to extend this to a multipartite scenario.

Layer 3 [Facilitation]: The TDC plays chaperone. It’s once the companies have swiped right that the TDC’s hard work begins. For brevity, I’ll just list the broad activities and responsibilities this stage entails; a deeper dive will follow in a subsequent post.

  • Set boundaries on the use of the data (specific purpose, period of validity, NDA, etc.)
  • Set stringent standards for itself as it facilitates the data transactions. These should be far tighter than any government regulation or policy, and should specifically address third-party privacy, i.e. the privacy of any parties identified in the data.
  • Maintain a ledger of the quantity and quality of data exchanged, especially for transactions separated in time and for those involving multiple companies. This is critical if the transactions are unilateral, i.e. data from Company A is bought outright for cash by Company B.
  • Set up a transparent data transfer process, even if the underlying technology is proprietary. The process should be auditable by any participating company or third party. The competitive advantage of this TDC over others that emerge in the market will be in the technology it uses, not in the process, which is a trust-builder.
  • Measure and report on the economic benefit of the data share.
  • Grow the virtuous cycle of building trust, surfacing the right data to the right company, and facilitating the transactions seamlessly. The idea is to build a repeatable and scalable process whose cost comes down with each transaction.
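
The ledger idea above can be sketched as follows. This is an illustration only: the entry fields and the choice of row counts as the unit of “quantity” are assumptions, and a real ledger would also need cryptographic audit trails and richer valuation:

```python
class TransactionLedger:
    """Records every data transfer so that exchanges separated in time
    (or settled in cash) stay balanced and auditable."""

    def __init__(self):
        self.entries = []

    def record(self, sender, receiver, dataset, rows, quality_rating, cash=0.0):
        self.entries.append({
            "sender": sender, "receiver": receiver, "dataset": dataset,
            "rows": rows, "quality_rating": quality_rating, "cash": cash,
        })

    def balance(self, company):
        """Net rows supplied minus rows received: a crude proxy for whether
        a company has given more data than it has gotten back."""
        sent = sum(e["rows"] for e in self.entries if e["sender"] == company)
        received = sum(e["rows"] for e in self.entries if e["receiver"] == company)
        return sent - received

ledger = TransactionLedger()
ledger.record("A", "B", "orders_2016", rows=10_000, quality_rating=4)
ledger.record("B", "A", "prices_2016", rows=4_000, quality_rating=5)
print(ledger.balance("A"))  # 6000
```

A positive balance for Company A tells the TDC that A is owed value, which could be settled with future data or, in the unilateral case, with cash.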

* The obligatory disclaimer: this dating analogy, especially the Tinder interface, is simply an abstraction of the concept. No company is actually going to go on a data date this way.

The TDC here is a for-profit organization that sets standards and processes around the data transactions conducted through it. It’s a technology services firm (using cutting edge analytical tools to study the metadata and data of its clients), with a strong legal and accounting arm.

A short note on Metadata

So many businesses are under-utilizing their existing structured data because they’re either caught in a FOMO cycle of focusing on collection and storage (of both structured and unstructured data), or they don’t have the right people, processes, and tools to meaningfully digest what they already have.

Their efforts are split between Big Data collection, and, given the torrent of new AI and ML technologies/startups, playing catch-up as to what wringer to best put their data through. As a result, the structured data, even with its messiness, isn’t being optimally used. Money is routinely left on the table.

I see metadata playing a big role in the coming years. It’s possible to annotate a number of attributes about data that make pre-processing rich and insightful. The adoption of the right tools and processes for metadata management will immensely ease both the identification of reliable data among the haystacks and the ad hoc analytics process. It will give users a preview of what to expect in the actual data, and so will ease the sharing, querying, and enriching of data, not just within an organization, but between organizations.

Metadata will be dynamic: tools will auto-update it on the fly, based on the actual data. In that sense, metadata will be the trailer of a movie that you can carry around with you, and share as you will, rather than having to lug the movie itself. Perhaps it makes sense to call it Small Data.
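
A minimal sketch of that dynamic behaviour: recompute the structural summary from the actual rows whenever the data changes, so the shared “trailer” never drifts from the “movie”. The schema inference here is deliberately naive (it looks only at the first row) and the output fields are illustrative assumptions:

```python
def refresh_metadata(rows):
    """Derive up-to-date structural metadata from a list of row dicts."""
    if not rows:
        return {"row_count": 0, "columns": []}
    columns = sorted(rows[0].keys())
    return {
        "row_count": len(rows),
        "columns": columns,
        # Infer each column's type from the first row's value
        "types": {c: type(rows[0][c]).__name__ for c in columns},
    }

rows = [{"postcode": "560001", "order_value": 1499.0},
        {"postcode": "560034", "order_value": 250.0}]
print(refresh_metadata(rows))
```

Because the summary is derived rather than hand-maintained, it can be regenerated and re-shared on every refresh, which is what makes it safe to pass around between organizations.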

Indra is a co-founder of Scribble Data, a data analytics product company that, unsurprisingly, makes lightweight, cloud-based analytic tools that turn metadata into analytic assets. You can reach him here, and here.
