Generative AI needs Blockchain

Andrew Christian Davis
Dec 6, 2022

An image created with Dall-E 2 with a prompt made using ChatGPT

I started digging into and writing about generative AI in the fall of 2021 on the heels of winding down a company. We were experimenting with GANs, autoencoders, and DRL models to create a holistic, digital taxonomy for music for social and discovery use cases.

While we wanted to use deep learning to make discriminative models to better understand music, I soon saw very clearly how generative models would be used to create music and other media in powerful ways.

I knew we were at the beginning of an exciting period in AI, but I didn’t think the space would gain as much steam as it has so quickly. It’s stunning to witness what’s happening in image and text generation with Dall-E, Stable Diffusion, Midjourney, ChatGPT, and other models.

With all the fiascos we’ve seen in crypto this year, industry interest is shifting away from blockchain and Web3 toward AI. The thing is, blockchain is well positioned to underpin the coming AI explosion and to secure and protect this ecosystem. We’ve got big challenges ahead if we don’t use it that way.

Generative AI’s data problem

Have you ever wondered where the data used to train AI models comes from? The powerful models we’re witnessing need massive amounts of it; Dall-E 2, for example, was trained on hundreds of millions of images. That raises critical questions for the industry, both now and going forward:

  • How is data for these models acquired?
  • If there are protections or licenses of any sort for this data, how do we ensure that they’re honored?
  • What systems will track and verify ownership and usage of this data, particularly as it is used and transformed throughout an AI pipeline or value chain?
  • As new, derivative assets are created from models trained on other people’s data, will value continue to flow down, even in part, to the original IP owners?
  • For rights holders, where does their claim to the value created by their data and IP stop?

Microsoft, parent company of GitHub and a major investor in OpenAI (the company behind GPT-3 and Dall-E), is facing a proposed class action lawsuit alleging “software piracy on an unprecedented scale” in connection with how its AI coding model, GitHub Copilot, was built. This is one in a string of recent disputes, including Clearview AI’s alleged use of biometric data scraped from 3 billion images and a Disney illustrator whose design style was used, without consent, to train an AI image model. These come on top of lawsuits over the past few years involving Google’s Book Search algorithm and Facebook’s use of data for AI model training. While those earlier lawsuits were dismissed, all of these situations are symptomatic of a core data-ownership issue that needs to be addressed.

Companies, engineers, and data scientists are taking a build-first, ask-questions-later approach, and it is undoubtedly driving the industry forward technologically. From a legal standpoint, there simply aren’t clear frameworks for this yet. Even so, exploiting non-sensitive, public data and IP is an ethical gray zone that needs clarity. The industry likely won’t slow down, but my sense is that a day of reckoning lies ahead for the companies and developers using this data, and the bill will come due.

We have to work through these questions in order to build a transparent, equitable, and scalable system.

We need standards and technical infrastructure

Here’s the thing: we need insanely large amounts of data to move this Generative AI industry forward. Data is owned by people and organizations, and most of them don’t want to give up that data — their IP and assets — for free or at least without attribution. This sets up a three-sided dynamic:

  • Data owners need to receive value for the use of their data/IP
  • Engineers and data scientists are hungry for bountiful data to build innovative AI models
  • Consumers and creators have had a taste of generative AI’s capabilities and are eager for more amazing experiences

Blockchain-based data and IP infrastructure and protocols could underpin this entire system at every stage of the AI value chain. This means that every participant in the ecosystem (those who provide data, train models, create applications, and produce IP from those applications) can benefit and share in the upside. Not only would each layer receive compensation or some other meaningful value from the layer above it (e.g., model developers paying for data, application developers paying model developers for access to their models), but value could also flow down and accrue throughout the whole stack as end users purchase and engage with content and data.
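
To make the idea of value flowing down the stack more concrete, here is a minimal Python sketch. It is not a blockchain or smart-contract implementation; it only models the accounting such a system would have to enforce. The `Asset` class, the `distribute` function, the pass-through rates, and the participant names are all hypothetical, chosen for illustration. Each asset keeps part of any payment it receives and passes the rest to the assets it was built from, weighted by an assumed contribution.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Asset:
    """A hypothetical node in the AI value chain: a dataset, a model, or an app."""
    owner: str
    # Share of each incoming payment passed on to upstream assets (an assumed rate).
    pass_through: Fraction = Fraction(0)
    # Assets this one was built from, each with a relative contribution weight.
    upstream: list[tuple[Asset, Fraction]] = field(default_factory=list)


def distribute(asset: Asset, amount: Fraction, ledger: dict[str, Fraction]) -> None:
    """Credit the owner's share, then recurse into upstream assets by weight."""
    passed_on = amount * asset.pass_through if asset.upstream else Fraction(0)
    ledger[asset.owner] = ledger.get(asset.owner, Fraction(0)) + amount - passed_on
    total_weight = sum(weight for _, weight in asset.upstream)
    for parent, weight in asset.upstream:
        distribute(parent, passed_on * weight / total_weight, ledger)


# Hypothetical stack: two data owners -> one model -> one app.
photos = Asset("photographer")
captions = Asset("caption_writer")
model = Asset("model_dev", Fraction(3, 10), [(photos, Fraction(2)), (captions, Fraction(1))])
app = Asset("app_dev", Fraction(1, 5), [(model, Fraction(1))])

ledger: dict[str, Fraction] = {}
distribute(app, Fraction(100), ledger)  # an end user pays the app 100 units
for owner, share in ledger.items():
    print(owner, share)
```

Using `Fraction` keeps the arithmetic exact, so no value leaks through rounding; an on-chain version would more likely work in integer amounts of the smallest currency unit. How the pass-through rates and weights get set is the hard, unsolved part, and this sketch simply assumes them.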

In practice, issuing blockchain-based standards would enable “an understanding of how data…has moved or been processed within a complex data ecosystem from the time the data was originally collected” and the ability to “track all the downstream applications that touched or may have used this data.” These are the qualities needed for data provenance, as noted by Krishnaram Kenthapadi, Chief Scientist at machine-learning model monitoring company Fiddler, and award-winning reporter Kate Kaye in her Protocol article.
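
The provenance qualities in that quote rest on one core mechanism: an append-only log in which each entry commits to the one before it, so history can’t be quietly rewritten. The Python sketch below shows that hash-chaining idea in isolation, assuming SHA-256 fingerprints and a made-up set of actors and actions. `ProvenanceLog` and `ProvenanceRecord` are hypothetical names, not any existing protocol’s API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


@dataclass(frozen=True)
class ProvenanceRecord:
    """One hypothetical step in a dataset's history."""
    actor: str      # who performed the step
    action: str     # e.g. "collected", "annotated", "used_to_train"
    data_hash: str  # SHA-256 fingerprint of the data this step produced
    prev_hash: str  # hash of the previous record, which links the chain

    def record_hash(self) -> str:
        # Hash a canonical JSON encoding so the same record always hashes the same way.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


class ProvenanceLog:
    """An append-only, hash-chained log of how a piece of data moved and changed."""

    def __init__(self) -> None:
        self.records: list[ProvenanceRecord] = []

    def append(self, actor: str, action: str, data: bytes) -> ProvenanceRecord:
        prev = self.records[-1].record_hash() if self.records else GENESIS_HASH
        record = ProvenanceRecord(actor, action, hashlib.sha256(data).hexdigest(), prev)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Check that every record still points at the true hash of its predecessor."""
        expected_prev = GENESIS_HASH
        for record in self.records:
            if record.prev_hash != expected_prev:
                return False
            expected_prev = record.record_hash()
        return True


log = ProvenanceLog()
log.append("photographer", "collected", b"raw photo bytes")
log.append("labeling_vendor", "annotated", b"photo plus labels")
log.append("model_dev", "used_to_train", b"model v1 weights")
ok_before = log.verify()

# Quietly rewrite who performed the middle step.
original = log.records[1]
log.records[1] = ProvenanceRecord("someone_else", original.action, original.data_hash, original.prev_hash)
ok_after = log.verify()
print(ok_before, ok_after)
```

Note the limitation: in this sketch, editing the newest record, or rewriting every record from some point onward, goes unnoticed because nothing outside the log pins down its latest hash. Publishing that hash somewhere others can check independently is the role a blockchain would play.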

Particularly for the media & entertainment industry, a market with powerful walled gardens owned by labels, publishers, and studios, this would introduce a system where rights holders could still derive value from their assets while giving other industry participants the data they need, raising the tide for everyone. Such a system could be formed by protocols, platforms, products, or some combination thereof.

Where this is going

Tech stack evolution

Over time, generative AI will augment the entire tech stack — increasing ease of use and accessibility in addition to output capabilities. This will happen in model development, app development, and app usage. We’ll see several noteworthy impacts from this:

  • Data will become even more accessible and abundant
  • Rich AI models will become easier to build
  • Devs will have a field day with building apps, and no-code solutions will continue making this layer in the stack more accessible to non-technical people
  • End-user creators will do what they do best, putting more intuitive and robust tools to work to take us to new heights

The collective impact of these shifts will lead to an explosion of data that I don’t think we can fully grasp yet. It would also unleash a system that builds on itself exponentially: a powerful, circular, and progressive data economy.

Generative AI’s impressiveness and value will soar as models that tackle richer, more complex media emerge. Text and image models are the stars of the first act we’re in, but it’s not difficult to see how these capabilities can be applied to video, audio, and games, which span a number of industries.

Building models of the caliber we’re seeing in text and images for other forms of media will be much more difficult, because that data is more complex and increasingly hard to access.

This reinforces the need for a blockchain-based system that will unlock a sea of properly accessed data. This is critical for us to push further into the dynamic, AI-driven creativity that’s ahead.

Composable data and IP

We’re already seeing people use ChatGPT in fascinating ways, like building a Linux virtual machine and creating games from scratch. This kind of creativity will only increase over time, especially as these tools get more robust.

With the combined capabilities of generative AI and various blockchain tools, the day is nearing when all of the data, models, apps, and creative works will converge and build on each other. We’ll see composability interfaces emerge in various sectors, similar to what exists in crypto with applications such as DeFi Saver, which leverages the capabilities of protocols like Aave, Maker, and Compound in one application.

With these assets on chain, it’s plausible to have a system where each part of this ecosystem is more easily composable and tracked:

  • A plethora of data and data types can be pooled together
  • Collaborative model building/training mechanisms can succeed
  • Combining diverse models and their capabilities will be possible
  • Applications and protocols will be combined to create super apps — unlocking incredible experiences for consumers, and more interestingly, robust tools for creators
  • Creators will create in new ways — for example, taking what we’ve seen made rudimentarily, and elevating it to pristine, professional-grade quality work

This infrastructure for combined, divergent, and derivative work would establish powerful plumbing for DAOs and decentralized IP creation, media or otherwise. It’s here that incentive mechanisms, tokens, smart contracts, oracles, and other Web3 tech and frameworks, used in tandem with generative AI, create a beautiful, wide, and protectable design surface for builders and creators.

This is the evolution of the internet that’s ahead of us. It’s blockchain-enabled and AI-fueled.

Implications

As we move forward here, we must consider the implications of this progress. This is uncharted territory with powerful tech that will improve exponentially and have incredibly far-reaching effects.

Our workforce

What this means for our workforce and existing roles/functions throughout industries

There is no question that some jobs and roles will either disappear or become niche as AI’s capabilities expand. It will be comparable to the decline in horse breeders, stable hands, and wagon makers as cars and railroads came on the scene and became ubiquitous. At the same time, new jobs and functions will arise in the era we’re walking into. And in certain instances, such as creative and artistic fields where craftsmanship is valued, some roles will command even higher premiums.

IP law

A massive consideration here is how the regulatory environment will evolve with generative AI and copyrights. There has been a mix of outcomes in the courts involving the protection of work from AI — some ruling in favor of protection, but others ruling against it.

There are a bunch of questions and thoughts to dig into on this topic:

  • What can be protected in the courts, if at all, with AI in the mix?
  • How and where do you draw lines of ownership between AI and human contributions? With any given creative asset that’s developed, how much ownership goes to the creator(s)? How much to the AI? How much to the developer(s) of the model(s)? How much to the people whose data contributed to the model?
  • Does legal protection even matter here as long as everything is tracked on chain? Do you need to protect something that code, via smart contracts, can defend while ensuring value is distributed properly? If any asset (be it data, a model, or output from a model) is verified on chain, and that asset is consumed or used in some way, the creator of that asset could theoretically always receive some value back, monetary or otherwise.
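
Smart contracts typically split payments using integer arithmetic (Solidity, for instance, has no floating-point type), and shares are often expressed in basis points. The Python sketch below illustrates how a per-asset revenue split could behave under those assumptions. The `RevenueSplit` class, the parties, and the 60/25/15 split are hypothetical; deciding those percentages is exactly the open ownership question raised above.

```python
from dataclasses import dataclass

BPS_DENOMINATOR = 10_000  # 1 basis point = 0.01%, so 10,000 bps = 100%


@dataclass(frozen=True)
class RevenueSplit:
    """A hypothetical per-asset revenue split, expressed in basis points."""
    shares_bps: dict[str, int]

    def __post_init__(self) -> None:
        if any(bps < 0 for bps in self.shares_bps.values()):
            raise ValueError("shares must be non-negative")
        if sum(self.shares_bps.values()) != BPS_DENOMINATOR:
            raise ValueError("shares must add up to 10,000 basis points")

    def payout(self, amount: int) -> dict[str, int]:
        """Split an integer amount (e.g. in the smallest currency unit) among the parties."""
        payouts = {party: amount * bps // BPS_DENOMINATOR for party, bps in self.shares_bps.items()}
        # Integer division rounds down, so a few units may be left over; give them to the
        # first-listed party so the payouts always sum to the full amount.
        remainder = amount - sum(payouts.values())
        first_party = next(iter(payouts))
        payouts[first_party] += remainder
        return payouts


split = RevenueSplit({"creator": 6_000, "model_developer": 2_500, "data_contributors": 1_500})
payouts = split.payout(1_001)
print(payouts)
```

Any units lost to integer division go to the first-listed party, so the payouts always add up to the full amount; a real contract would need an explicit, agreed-upon rule for that remainder.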

Blockchain and AI introduce new considerations relating to IP law that our regulators must keep pace with to maintain protection and fairness for the market in this period of transformation.

Transparency

Transparency in generative AI touches on various areas as it pertains to how we consume and engage with data and content from generative models.

Imagine you’re buying a piece of art or a song. Would knowing how much human effort or creativity went into the creative process influence your price sensitivity?

Imagine a farmer in Kansas using an agronomic analysis model to help with crop planning, unaware that the model was developed, in small part, with data from a farmer in a sanctioned country. If monetary value were to accrue back to the farmer in the sanctioned country, should the Treasury Department fine the Kansas farmer?

Imagine deepfakes becoming indistinguishable from originally recorded videos. How would we protect ourselves from widespread misinformation — malicious or not?

Things can get messy very quickly as we start to look into our future with this technology — balancing transparency, equity, and privacy. The business community and our policymakers have to keep pace with the technology’s development to ensure fair, clear, and holistic approaches to protect all industry participants.

This is just the beginning

We’re at the start of a very special and important time in history. We have an opportunity to reshape fundamental aspects of business and creativity across industries with the combination of generative AI and blockchain.

We’re only at the appetizer course of a generative AI meal, and it’s exciting to see what’s ahead. It’s also critical to remember the past. We’ve had a bitter taste of what happens when we allow tech to progress without safeguards and without weighing the costs. The stakes will be higher this time.

I’m hopeful for the future and believe builders in Web3 and generative AI will rise to the occasion to create things that make a dent in our world for the better.

If you’re doing deep thinking about some of these topics or investing/building in this space, drop me a note at andrew.davis@techstars.com or you can find me on Twitter and LinkedIn. I’d love to chat.
