Blockchain-based Machine Learning Marketplaces

Machine learning models trained on data from blockchain-based marketplaces have the potential to create the world’s most powerful artificial intelligences. They combine two potent primitives: private machine learning, which allows for training to be done on sensitive private data without revealing it, and blockchain-based incentives, which allow these systems to attract the best data and models to make them smarter. The result is open marketplaces where anyone can sell their data and keep their data private, while developers can use incentives to attract the best data for their algorithms to them.

Constructing these systems is challenging and the requisite building blocks are still being created, but simple initial versions look like they are starting to become possible. I believe these marketplaces will transition us out of the current era of Web 2.0 data monopolies into a Web 3.0 era of open competition for data and algorithms, where both are directly monetized.

Origin

Having data scientists compete seemed like a powerful idea. So it got me thinking: can you create a fully decentralized version of this system that could be generalized to any problem? I believe the answer is yes.

Construction

Data Data providers stake data and make it available to modelers.

Model building Modelers choose what data to use and create models. Training is done using a secure computation method which allows models to be trained without revealing the underlying data. Models are staked as well.

Metamodel building A metamodel is created based on an algorithm that takes into account the staking of each model.

Creating a metamodel is optional — you can imagine models that are used without being combined into a metamodel.

Using the metamodel A smart contract takes the metamodel and trades programmatically through decentralized exchange mechanisms on-chain.

Distributing gains/losses After some time period passes, trading produces a profit or loss. This profit or loss is divided up amongst contributors to the metamodel based on how much smarter they made it. Models which contributed negatively have some or all of their staked funds taken. Models then turn around and perform similar distributions/stake slashing to their data providers.

Verifiable computation Computation for each step is either performed centralized but verifiable and challengeable using a verification game like Truebit or decentralized using secure multiparty computation.

Hosting Data and models are either hosted on IPFS or with nodes in a secure multiparty computation network, as on-chain storage would be too expensive.

What makes this system powerful?

Competition between algorithms Creates open competition between models/algorithms in places where it previously didn’t exist. Picture a decentralized Facebook with thousands of competing newsfeed algorithms.

Transparency in rewards Data and model providers can see they are getting the fair value of what they’ve submitted since all computation is verifiable, making them far more likely to participate.

Automation Taking action on-chain and generating value directly in tokens creates an automated and trustless closed loop.

Network effects Multi-sided network effects from users, data providers, and data scientists make the system self-reinforcing. The better it performs, the more capital it attracts, which means more potential payouts, which attracts more data providers and data scientists, who make the system smarter, which in turn attracts more capital, and back around again.

Privacy

A partial solution to the free rider problem is to privately sell data. Even if buyers choose to resell or release the data, its value decays with time. However, this approach restricts us to short duration use cases and still creates typical privacy concerns. As a result, the more complicated but powerful approach is to use a form of secure computation.

Secure computation

The Ultimate Recommender System

This recommender system would be extremely potent. More than any of the existing data silos of Google, Facebook, or others could ever be because it has a maximally longitudinal view of you and it can learn from data that otherwise would be too private to consider sharing. Similar to the prior cryptocurrency trading system example, it would work by allowing a marketplace of models focused on different areas (ex: web site recommendations, music) to compete for access to your encrypted data and recommend things to you, and perhaps even pay you for contributing your data or your attention to the recommendations generated.

Google’s federated learning and Apple’s differential privacy are one step in this private machine learning direction, but still require trust, don’t allow users to directly examine their security, and keep data siloed.

Current approaches

A simple construction from Algorithmia Research places a bounty on a model that is accurate above a certain backtesting threshold:

Simple construction creating a bounty on a machine learning model by Algorithmia Research

Numerai currently takes things three steps further: it uses encrypted data (although not fully homomorphically), it combines crowdsourced models into a metamodel, and it rewards models based on future performance (in this case, one week of stock trading) rather than backtesting through a native Ethereum token called Numeraire. Data scientists must stake Numeraire as skin in the game, incentivizing performance on what will happen (future performance), not what has happened (backtested performance). However, it currently centrally distributes data, limiting what feels like the most important ingredient.

No one has created a successful blockchain-based marketplace for data yet. The Ocean is an early attempt to outline one.

Still others are starting by building secure compute networks. Openmined is creating a multiparty compute network for training machine learning models on top of Unity that can run on any device, including game consoles (similar to Folding at Home), then expanding to secure MPC. Enigma has a similar tact.

A fascinating end state would be mutually owned metamodels which give data providers and model creators ownership proportional to how much smarter they’ve made them. The models would be tokenized, could pay dividends over time, and potentially even be governed by those who trained them. A sort of mutually owned hive mind. The original Openmined video is the closest construction to this I have seen so far.

What approaches are likely to work first?

One thesis I use to evaluate blockchain ideas is: on a spectrum of physically native to digitally native to blockchain native, the more blockchain native, the better. The less blockchain native, the more trusted third parties are introduced, increasing complexity and reducing ease of use as a building block with other systems.

Here, I think that means a system is more likely to work if the value created is quantifiable — ideally directly in the form of money, and better yet, tokens. This allows for a clean, closed-loop system. Compare the prior example of a cryptocurrency trading system to one that identifies tumors in X-rays. In the latter, you’d need to convince an insurance company that the X-ray model is valuable, negotiate how valuable, and then trust a small group of physically present people to verify the model’s success/failure.

That’s not to say more clearly positive sum for society uses which are digitally native won’t emerge. Recommender systems like the one previously mentioned could be enormously useful. If attached to curation markets, they are another case where the model can take action programmatically on-chain and the system’s reward is tokens (in this case from the curation market), again creating a clean closed loop. It seems obscure now, but I expect the realm of blockchain-native tasks to expand with time.

Implications

Standardization and commoditization cycles in tech, where we are nearing the end of the networks era of data monopolies. Chart from Placeholder.

Said another way, they create a direct business model for AI. Both feeding and training it.

Second, they create the most powerful AI systems in the world, attracting the best data and models to them through direct economic incentives. Their strength increases through multi-sided network effects. As the Web 2.0 era data network monopolies become commoditized, they seem like a good candidate for the next re-aggregation point. We are probably a few years out from this, but it seems directionally correct.

Third, as the recommender system example shows, search gets inverted. Instead of people searching for products, products search and compete for people (credit to Brad for this framing). Everyone might have personal curation markets, where recommender systems compete to place the most relevant content in their feed, and relevance is defined by the individual.

Fourth, they allow us to get the same benefits of the powerful machine-learning based services we are used to from companies like Google and Facebook without giving away our data.

Fifth, machine learning can advance more quickly, as any engineer can access an open marketplace for data, not just a small group of engineers in the large Web 2.0 companies.

Challenges

Calculting the value a particular set of data or model provides to the metamodel is hard.

Cleaning and formatting crowdsourced data is challenging. We’re likely to see some combination of tools, standardization, and small businesses pop up to solve this.

Finally and ironically, the business model for creating the generalized construction of this sort of system is less clear than creating an individual instance it. This seems to be true of a lot of new crypto primitives, including curation markets.

Conclusion

Thanks to Andrew Trask, Richard Craib, Trent McConaghy, Brad Burnham, Joel Monegro, Simon de la Rouviere, Gavin Uhma, Morten Dahl, Jonathan Libov, Matt Huang, Laura Behrens Wu, Naval Ravikant, and Daniel Gross for conversations which contributed to this post.

--

--

@Paradigm. Previously co-founder @Coinbase, trader @GoldmanSachs, computer science @DukeU.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Fred Ehrsam

@Paradigm. Previously co-founder @Coinbase, trader @GoldmanSachs, computer science @DukeU.