Tokenized Data Markets

Bharath Ramsundar
Computable Blog
Jun 4, 2018

We just posted our first research paper, Tokenized Data Markets, on the arXiv. This post provides a teaser introduction to the paper's contents. Enjoy reading, and download the full paper to learn more!

Data markets connect buyers and sellers of datasets with one another. Such markets may prove to be a fundamental new primitive for the next stage of the internet, especially as machine learning and AI systems continue to embed themselves at the heart of the modern technology ecosystem. Learning methods are often data-hungry and require access to large datasets in order to make accurate predictions. Unfortunately, such datasets are nontrivial to gather, and existing data markets lack liquidity; only the largest and most connected organizations have the resources to secure access to the data they require. The construction of liquid data markets would fundamentally shift this distribution of power and facilitate the broad adoption of machine learning methods.

How can such a data market be constructed? One option is to identify a trusted entity to act as a centralized data broker. Such a broker could enable transactions between buyers and sellers of data by storing datasets on-site and transferring them upon payment. Unfortunately, this model creates a heavy burden of trust: how can buyers and sellers know that the broker is behaving fairly? Centralized cryptocurrency exchanges already have a checkered history of fraud and theft, and it seems all too likely that a centralized data exchange would fall prey to similar problems. For these reasons, the construction of a decentralized data exchange could prove an enabling technology for liquid data markets. Such an exchange would facilitate transactions of data between buyers and sellers without the need for a trusted third-party broker. Furthermore, tokenization of data offers a powerful new primitive for solving the cold-start problems that generally make bootstrapping a marketplace difficult. While many might agree that pooling data creates non-zero-sum value for all participants, most hesitate to be the first to contribute without some contractual guarantee of value. With decentralized data markets, the earliest contributors have a financial incentive to participate, because they receive tangible cryptoeconomic assets (tokens) even before buyers enter the market.
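To make the cold-start incentive concrete, here is a minimal sketch (not taken from the paper) of a market that mints tokens to contributors the moment they list a dataset. The class, the flat per-listing reward, and all names are illustrative assumptions.

```python
# Hypothetical sketch: a market that mints tokens to data contributors
# immediately upon listing, before any buyer has entered. The flat
# per-listing reward is an assumption for illustration, not the
# mechanism specified in the paper.

class DataMarket:
    def __init__(self, reward_per_listing=100):
        self.reward_per_listing = reward_per_listing
        self.balances = {}   # participant -> token balance
        self.listings = {}   # dataset hash -> contributor

    def contribute(self, contributor, dataset_hash):
        """Record a dataset listing and mint tokens to the contributor."""
        if dataset_hash in self.listings:
            raise ValueError("dataset already listed")
        self.listings[dataset_hash] = contributor
        # Early contributors are rewarded in tokens right away, which
        # addresses the cold-start hesitation described above.
        self.balances[contributor] = (
            self.balances.get(contributor, 0) + self.reward_per_listing
        )


market = DataMarket()
market.contribute("alice", "0xabc123")
print(market.balances)  # {'alice': 100}
```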

The construction of a decentralized data exchange is not straightforward. How can participants ensure that their datasets are stored and transferred correctly? How can cheaters be caught and removed from the system? These are deep questions which delve into the heart of multiparty protocols. Luckily, the advent of blockchain-based systems with associated smart contract platforms has triggered significant research into the design of multi-agent systems that perform nontrivial work. Prediction markets, decentralized token exchanges, curation markets, token curated registries, storage markets, and computational markets all provide examples of systems that perform useful work by coordinating selfish actors. Primitives introduced by such protocols can be repurposed to serve as a foundation for decentralized data markets.

The token curated registry (TCR) in particular provides a powerful abstraction for how a collection of participants can work together to build a curated list. For example, such a list could contain the names of colleges that enable students to rapidly pay back their student debt after graduation. Basic implementations of TCRs in Solidity already exist, but these implementations have a number of limitations. For example, storage is typically kept on-chain for simplicity; this basic design wouldn't permit the construction of a list of images, since images are too large to be stored on existing smart contract platforms. In addition, the contents of the registry are publicly visible, so sensitive information can't be assembled.
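For readers unfamiliar with the mechanism, the following is a minimal sketch of the standard TCR apply/challenge/vote flow, written in Python rather than Solidity. The plain-majority vote, the parameter names, and the omitted reward distribution are simplifying assumptions, not a faithful port of any existing implementation.

```python
# Minimal TCR sketch: candidates stake a deposit to apply, anyone may
# challenge an application, and token-weighted votes resolve the
# challenge. The plain-majority rule is a simplifying assumption, and
# reward distribution to winning voters is omitted.

class TokenCuratedRegistry:
    def __init__(self, min_deposit=50):
        self.min_deposit = min_deposit
        self.listings = {}    # entry -> staked deposit (accepted)
        self.applicants = {}  # entry -> staked deposit (pending)

    def apply_listing(self, entry, deposit):
        """Stake a deposit to propose an entry for the list."""
        if deposit < self.min_deposit:
            raise ValueError("deposit below minimum")
        self.applicants[entry] = deposit

    def promote_unchallenged(self, entry):
        """After the challenge window closes, an unchallenged entry is listed."""
        self.listings[entry] = self.applicants.pop(entry)

    def resolve_challenge(self, entry, votes_for, votes_against):
        """Token holders vote on a challenged entry; the majority decides."""
        deposit = self.applicants.pop(entry)
        if votes_for > votes_against:
            self.listings[entry] = deposit
        # Otherwise the deposit is forfeited to the challenger and
        # winning voters (distribution omitted for brevity).
```

Note that every entry in this sketch is a short string held directly in the registry's own storage, which is exactly the limitation described above: nothing as large as an image could live on-chain this way.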

To overcome these issues, it proves useful to specialize the basic design of token curated registries into a structured framework that explicitly allows for off-chain storage and private data. In addition, we introduce the new notion of recursively nesting TCRs, which allows for the construction of more complex data structures. We call this modified mathematical class of structures tokenized data structures.

[Figure: An example of a tokenized data structure]
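As a rough illustration of the idea (under our own naming assumptions, not the paper's formal construction), a tokenized data structure can be pictured as a registry whose entries are content hashes pointing to off-chain, possibly encrypted data, and whose entries may themselves be registries:

```python
# Sketch of a tokenized data structure: a registry whose entries point
# to off-chain (possibly encrypted) data, and which may recursively
# contain other registries. Classes and fields are illustrative
# assumptions; see the paper for the formal construction.

from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class OffChainPointer:
    content_hash: str        # hash of the raw data held off-chain
    encrypted: bool = False  # private data stays encrypted off-chain

@dataclass
class TokenizedRegistry:
    name: str
    # An entry is either a pointer to off-chain data or, recursively,
    # another registry, so nested structures can be curated as units.
    entries: Dict[str, Union[OffChainPointer, "TokenizedRegistry"]] = field(
        default_factory=dict
    )

# A registry of image datasets, where each dataset is itself a registry
# of pointers to individual (off-chain) images:
cats = TokenizedRegistry(
    "cat-images",
    {"img1": OffChainPointer("hash-of-img1"),
     "img2": OffChainPointer("hash-of-img2", encrypted=True)},
)
datasets = TokenizedRegistry("image-datasets", {"cats": cats})
```

Because the raw images live off-chain, only their hashes need to fit on-chain; and because entries may be encrypted pointers, sensitive data can be curated without being publicly revealed.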

To continue reading, download our research paper, Tokenized Data Markets.

Originally published at www.computable.io.


Bharath Ramsundar
Co-founder and CTO at @ComputableLabs. Previously: creator of https://DeepChem.io. Author at @OReillyMedia. Stanford CS PhD on deep drug discovery.