Blockchains are a data buffet for AIs

Sam Altman recently wrote that we are entering an era of hyperscale technology companies. These companies own massive troves of data with strong network effects around them, and they are only getting stronger. Google and Facebook now take in almost 70% of internet ad revenue, and that share is rising: together they captured 103% of the industry's growth in 2016, while the sum of everyone else shrank.

This has important implications for the development of AI. AIs are only as good as the data they are trained on. And while many of the tech giants working on AI like Google and Facebook have open sourced some of their algorithms, they hold back most of their data.

In contrast, blockchains represent and even incentivize open data. While some blockchain-based data will be encrypted and private, much of it will be open out of necessity. To create an open protocol that helps coordinate resources towards a common goal, the resources need to be known at some level, in the same way a lot of data on the web needs to be public for it to be traversable and useful. For example, creating a decentralized Uber requires a relatively open dataset of riders and drivers to coordinate the network.

The network effects and economic incentives around these open systems and their data can be more powerful than those of current centralized companies because they are open standards that anyone can build on. In the same way, the protocols of the internet like TCP/IP, HTML, and SMTP have achieved far greater scale than any company that sits atop them. Bitcoin is an early example of this: in just a few years it went from zero to the largest computing network in the world, 10,000x bigger than the top 500 supercomputers in the world combined. And oracle systems like Augur (a fancy way of saying getting people all over the world to report real world information to the blockchain in a way we can trust) will inject more data.
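
To make the oracle idea a bit more concrete, here is a toy sketch of one way reports from many people could be combined into a single answer: each reporter puts up a stake behind an outcome, and the outcome with the most stake behind it wins. This is an illustration under my own assumptions, not Augur's actual mechanism. The function name `resolve_outcome`, the reporter names, and the stake amounts are all hypothetical.

```python
from collections import defaultdict


def resolve_outcome(reports):
    """Pick the outcome backed by the most stake.

    `reports` is a list of (reporter, outcome, stake) tuples.
    Returns (winning_outcome, share_of_total_stake).
    Toy model only: real oracle systems add disputes, penalties, and more.
    """
    if not reports:
        raise ValueError("at least one report is required")

    # Sum the stake behind each reported outcome.
    totals = defaultdict(float)
    for _reporter, outcome, stake in reports:
        if stake <= 0:
            raise ValueError("stake must be positive")
        totals[outcome] += stake

    total_stake = sum(totals.values())
    # On a tie, the outcome that was reported first wins (dict insertion order).
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_stake


# Hypothetical reports on a yes/no question.
reports = [
    ("alice", "yes", 40.0),
    ("bob", "yes", 25.0),
    ("carol", "no", 35.0),
]
winner, share = resolve_outcome(reports)
print(winner, round(share, 2))
```

The point of the sketch is only that both the reports and the result are visible to everyone, so anyone, including an AI, can read them and build on them.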

This open data has the potential to commoditize the data silos most tech companies like Google, Facebook, Uber, LinkedIn, and Amazon are built on and extract rent from. This is great for society: it incentivizes the creation of a more open and connected world. And it creates an open data layer for AIs to train on.

This is important for AI development for 3 reasons:

  1. It allows for more rapid and open innovation by creating a global data buffet for anyone who wants to create AI.
  2. It gives us a higher chance of creating safe AI. AIs trained on open data are more likely to be neutral and trustworthy instead of biased by the interests of the corporation that created and trained them.
  3. Since blockchains allow us to explicitly program incentive structures, they may make the incentives of AI more transparent.

Simplified, AI is driven by 3 things: tools, compute power, and training data. OpenAI is doing important work by releasing tools that encourage AI to be developed in the open. Compute power is largely produced by NVIDIA and Intel and is still relatively expensive, but it is openly purchasable. Blockchains may be the key final ingredient by providing massive pools of open training data.

So what happens to business models in computing when you commoditize the data layer? My guess is they shift to 1) creating blockchain protocols and their native tokens and 2) building AIs that leverage the open, global data layer of the blockchain. This combination will create a lot of value, both economically and societally. You could imagine token-incentivized marketplaces for the best AIs to come to you. I suspect more projects will be created at this intersection going forward. If this is something you’re working on, I’d love to speak with you.

Thanks to Joel Monegro, Sam Altman, Richard Craib, Chris Dixon, and Trent McConaghy (his expanded thoughts on the subject here) for discussions leading to this post.