The Datawallet Difference

Published in

Datawallet Blog

7 min read · Dec 10, 2017

This post is for those of you asking “What is this ‘Datawallet’ and how is it different from {other company that says they want to host and create a marketplace for my data}?” The common thread running through these companies is that they are developing permission-based data exchanges (PBDEs), and the recent excitement around them validates what Datawallet has been arguing for years: it is time for your data to work for you. But how exactly is Datawallet unique among the flood of new entrants in the C2B data supply ecosystem? This post answers that question.

Distilled to its essence, Datawallet’s approach has two major differentiating features. Both stem from our front-line experience developing an ethical data exchange over the last 2+ years.

Difference #1: An Operational End-to-End Ecosystem

The first feature that sets Datawallet apart from other C2B data marketplaces is our ability to deliver a high-availability ecosystem throughout the development process. Because our end-to-end system is already operational, we can keep delivering a functional ecosystem, without open question marks or far-reaching promises, while we develop, rigorously test, and then integrate the decentralized components.

We have been building our personal data marketplace for 2+ years and have already developed a complete ecosystem: native mobile applications for data producers to manage their data, sophisticated data products that anonymize and add value to our users’ data, and a proprietary customer insights platform whose insights enable enterprise clients to offer data producers a compelling price for their data. This end-to-end marketplace is the backbone of our technical roadmap and allows us to provide continuous service to all stakeholders throughout the development and deployment of our decentralized system.

We like to think of it this way: we already have the bridge over the river. In fact, we are the only bridge for internet users that spans the turbulent waters of the data brokerage industry. And while we will be adding lanes and upgrading the infrastructure, you’ll always be able to get where you want to go. This dependability will nurture development on each shore. Contrast this with PBDEs that span only part of the river or, worse, are just blueprints (of varying specificity). Would you build on the shoreline while waiting for a blueprint to be realized before any traffic can reach you?

Construction of the new Tappan Zee Bridge across NY’s Hudson River (courtesy of weewestchester).

As with Datawallet, commuters can still get where they want to go on the original span while the new bridge is being built.

We also follow the tenets of modular and composable design. At a functional level, the decentralized components will be drop-in replacements for the current implementations. This gives us a unique opportunity for user-guided adoption: users can choose when to transition to an available decentralized implementation, for example from a centrally hosted Datawallet to a decentralized or localized Datawallet. Letting users drive the process avoids the often jarring experience of forced updates and lets them choose which component is right for them, since each implementation has its own advantages. Many decentralized components, for example, offer compelling advantages (security, transparency, control, trustless exchange) but come with trade-offs (complexity, friction, gas costs). In other words, users not only have the assurance that they can get across the river with Datawallet, but will also be able to choose which lane to use as we add spans.
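To make the drop-in idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `WalletStore` interface, the two store classes, and the `migrate` helper are hypothetical names, not Datawallet code. It only shows the general pattern in which calling code depends on a shared interface, so one implementation can replace another when the user chooses to switch.

```python
from typing import List, Optional, Protocol


class WalletStore(Protocol):
    """Hypothetical interface shared by every storage implementation."""

    def read(self, key: str) -> Optional[str]: ...

    def write(self, key: str, value: str) -> None: ...


class HostedStore:
    """Stand-in for a centrally hosted wallet (an in-memory dict here)."""

    def __init__(self) -> None:
        self._data = {}

    def read(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def write(self, key: str, value: str) -> None:
        self._data[key] = value


class LocalStore:
    """Stand-in for a local or decentralized wallet (an append-only log here)."""

    def __init__(self) -> None:
        self._log = []

    def read(self, key: str) -> Optional[str]:
        # The most recent entry for a key wins.
        for logged_key, value in reversed(self._log):
            if logged_key == key:
                return value
        return None

    def write(self, key: str, value: str) -> None:
        self._log.append((key, value))


def migrate(source: WalletStore, target: WalletStore, keys: List[str]) -> None:
    """Copy the listed keys across when the user decides to switch."""
    for key in keys:
        value = source.read(key)
        if value is not None:
            target.write(key, value)


def share_age(store: WalletStore) -> Optional[str]:
    """Calling code sees only the interface, never the implementation."""
    return store.read("demographics/age")
```

Because `share_age` accepts anything that satisfies `WalletStore`, it behaves the same before and after a call to `migrate`, which is the property a drop-in replacement needs.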

Difference #2: The Datawallet API (or what some may label a “data protocol”)

The success of any PBDE will depend on the value that data consumers can derive from the data provided. Most companies we see entering the market operate on the assumption that simply giving data providers a tool to source their data is enough.

This is an extreme oversimplification of the data supply ecosystem and misses a very important point: data in its raw format has little to no value. If you were offered a barrel of crude oil, you probably would not be willing to spend a single dime on it. The reason is that oil in its raw form does not provide any value to you. You don’t attribute value to oil itself; you value the result it can deliver for you, which is going from A to B. To achieve this result, oil must pass through a lengthy value chain: exploration, production, transportation, storage, refining, and finally retail distribution. We can observe the exact same phenomenon in the realm of data. Just as crude oil has no real value to you in your everyday life, companies don’t attribute value to raw data in their normal course of business; they attribute value to the insight that can be derived from it.

Exploration, which is what most PBDEs focus on, constitutes the upstream activity in the data supply ecosystem. However, it is only the first of three major steps on the way to a truly valuable product. The remaining two steps are arguably much harder to master, since they require domain expertise. At Datawallet, we have spent the last two years developing an ecosystem that focuses not just on the first step of data exploration but on the entire value chain of data productization. We don’t simply deliver crude oil. We deliver the data equivalent of refined fuel, namely actionable data.

But what exactly does actionable data mean and how precisely do we derive it?

The biggest hurdle to gaining insights from data, big or small, is data preparation. This task goes by many odious monikers (data munging, data structuring, pre-processing, data janitor work) and is the most time-consuming and least enjoyable task that data engineers, data scientists, and ML/AI experts engage in. The challenge posed by messy data is exacerbated in decentralized data-sharing ecosystems. PBDEs that allow the transfer of unverified data of unknown structure pose a serious hurdle to the scientists and developers looking to use the data.

We can break down three features of actionable data:

1) The data is verified (if it says it is a tiger, it is a tiger)

2) The access (place/route/method) to each particular datum is defined (I know where to go to get a tiger, and where to go to get a bear)

3) The access is persistent (if I got a tiger from there last week, I can expect to get a tiger there next week, and not get surprised by a random aardvark) with pre-declared null values (if there isn’t a tiger in this zoo, I’ll get a clear indication, not another animal).
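The three features above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Datawallet’s implementation: the `Datum` type, the `fetch` function, and the `zoo/...` paths are hypothetical names borrowed from the zoo analogy.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Set

# Hypothetical fixed catalogue of access paths (feature 2: access is defined).
KNOWN_PATHS = {"zoo/tiger", "zoo/bear"}


@dataclass(frozen=True)
class Datum:
    path: str
    present: bool  # False is the pre-declared null (feature 3)
    verified: bool  # True only if the value passed verification (feature 1)
    value: Optional[Any] = None


def fetch(store: Dict[str, Any], verified_paths: Set[str], path: str) -> Datum:
    """Return the datum at `path`, or an explicit null if nothing is stored."""
    if path not in KNOWN_PATHS:
        # Undefined paths fail loudly instead of returning something unrelated.
        raise KeyError(f"undefined path: {path}")
    if path not in store:
        # A known path with no data yields a clear "nothing here" marker.
        return Datum(path=path, present=False, verified=False)
    return Datum(
        path=path,
        present=True,
        verified=path in verified_paths,
        value=store[path],
    )
```

With a store that holds only `zoo/tiger`, asking for `zoo/bear` returns a `Datum` with `present=False`, and asking for `zoo/aardvark` raises `KeyError`. In neither case does the caller receive a different animal.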

Our ecosystem is based upon actionable data that adheres to these principles. In adhering to the first principle, verified data, we distinguish ourselves not only from other PBDEs but also from the current data brokerage industry. Contemporary data brokerage services obscure the sources of their data, so the only way to trust the data is to trust the broker. Similarly, other PBDEs require data consumers to trust the data producers. In comparison, Datawallet collates multiple independently verified data sources, which establishes both the identity of the data source and the veracity of the data (with the probability of faked data decreasing exponentially in the number of sources collated). We will open-source our existing and future data pipelines so that the path from a trusted external data source (e.g. Facebook) to the collated Datawallet is transparent and the data cannot be tampered with.
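The exponential claim can be illustrated with a short calculation. The sketch below assumes that each source could be faked independently with the same probability `p_single`; both the independence assumption and the function name are ours and are not stated in the post.

```python
def prob_all_sources_faked(p_single: float, n_sources: int) -> float:
    """Probability that all n collated sources carry the same fake.

    Assumes each source is faked independently with probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if n_sources < 1:
        raise ValueError("need at least one source")
    # Independent events multiply, so the result shrinks exponentially in n.
    return p_single ** n_sources
```

Under these assumptions, a 10% chance of faking any one source falls to about 0.1% when three sources must agree. If sources can be faked together, the independence assumption fails and the decrease is slower.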

We address the remaining features of actionable data with our RESTful Data API. This is how we tame the unruly menagerie of data. Our API provides the scientists, engineers, and developers in our ecosystem with coherent, reliable, and persistent endpoints on which to build the next generation of data-driven applications. Through our experience as data scientists and product developers, we have devised a high-level taxonomy that allows data consumers to search and access information intuitively, starting from high-level categories and branching smoothly into more granular endpoints. Overlapping information from different sources is automatically collated, while service-specific access remains available for interested consumers (for example, /demographics/age is based upon a weighted average of all age information available, but a provider’s age from a specific source is also available at /demographics/age/fb). Access through the API provides continuous availability (which is difficult in many PBDE architectures) and persistent endpoints. Developers will not need to keep track of what a particular data source calls a particular datum at a particular time, or worry about the inevitable breaking update to a source API.
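As a rough sketch of how a collated endpoint and a source-specific endpoint could relate, consider the Python below. Only the two paths `/demographics/age` and `/demographics/age/fb` come from the description above. The `resolve_age` function, the `(age, weight)` layout, and the weights themselves are assumptions for illustration and do not describe the actual Data API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-source readings for one data producer: source -> (age, weight).
AgeReadings = Dict[str, Tuple[float, float]]


def resolve_age(path: str, readings: AgeReadings) -> Optional[float]:
    """Resolve /demographics/age or /demographics/age/<source>.

    Returns None as the explicit null when no matching reading exists.
    """
    parts = [part for part in path.split("/") if part]
    if parts[:2] != ["demographics", "age"] or len(parts) > 3:
        raise KeyError(f"unknown endpoint: {path}")
    if len(parts) == 3:
        # Source-specific access, e.g. /demographics/age/fb.
        reading = readings.get(parts[2])
        return None if reading is None else reading[0]
    # Collated access: weighted average over every available source.
    total_weight = sum(weight for _, weight in readings.values())
    if total_weight == 0:
        return None
    return sum(age * weight for age, weight in readings.values()) / total_weight
```

The caller always uses the same two path shapes, whatever each upstream source happens to call its age field. That stability is the point of persistent endpoints.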

Taken together, these features of our API will allow confident development in our ecosystem. Combined with material support from the developer pool and community support in the form of data and tokens, the Datawallet Data API will give data-product creators access to production-environment data throughout the development process. The success of a PBDE will be determined by its adoption among those creating value from the data made available, and there will be no ecosystem where it is easier to go from idea to innovation than the Datawallet application exchange built upon our Data API.

Written by Datawallet