Chapter 7.2: The World’s 1st zkGAN NFTs

Modulus Labs
Oct 11, 2023

This is Part 2 of a 2-part series on how we are using Ethereum to verify generative AI art models, thereby creating the world’s first zkGAN NFTs.

7.1 explores why this might be valuable, while 7.2 chronicles how we constructed the model and corresponding ZK circuits. Special thanks to Peiyuan Liao and the Polychain Monsters team.

Artists aren’t born. They’re built!

Nay…

Trained!

Nay again…

They’re forged from fire and clay (read: large sums of training data and learning parameters that balance diversity + coherence)!

Training progress of zkMon over the Midjourney-generated dataset!

And you, dear friend, have come to the heart of our crucible of creation… welcome!

Here, the water boils with the rage of a thousand soundness challenges! The sky thunders with the roar of a million GPUs!! And together… together we will create that which itself, creates!!!

But first, a recap:

Recap Hour: ZKPs & AI Gen Art

Zero-knowledge proofs are a compute integrity technology.

“Integrity is doing the right thing, even when no one is watching” — C.S. Lewis

To put it plainly, ZKPs prove that some compute was executed correctly (without risk of tampering). Though this alone doesn’t make ZK unique, the magic shows up in the compute overhead of verification. Specifically, verifying the correctness guarantees of a zero-knowledge proof is significantly easier than running the compute naively — a property often known as succinctness.

Cue the blockchain application: if we only perform compute verification on-chain (as opposed to the original, compute-heavy operation), we can bring more sophisticated algorithms on-chain — all without giving up a single ounce of blockchain security. Pretty neat!

AI generative art, on the other hand, combines the power of artificial intelligence with artistic expression. It begins with collecting a large dataset of visual data, typically a set of images, which serves as the training data for the AI. The model then analyzes and learns from this data, developing its own internal representation of artistic styles, colors, shapes, and other visual elements.

Once trained, the AI model can generate new artwork by starting from a random vector acting as a seed, and slowly refining this internal representation until out pops a fully-fledged image.

GANs, or Generative Adversarial Networks, are a type of generative model trained via a detective-and-forger game: the forger (generator model) attempts to create images from random noise that mimic those in the actual training set, while the detective (discriminator model) attempts to determine, given an image, whether it is real or counterfeit. By the end of this game, the forger/generator is producing images highly similar to those found in the training set, and has thus learned to generate novel artwork in the style it has already seen.
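
For the code-curious, here is a minimal PyTorch sketch of that detective-and-forger game. The tiny fully-connected networks and the 32x32 image size are illustrative stand-ins, not the actual zkMon architecture:

    import torch
    import torch.nn as nn

    latent_dim, img_dim = 64, 32 * 32

    # The "forger": turns a random noise vector into a (flattened) fake image.
    generator = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, img_dim), nn.Tanh(),
    )

    # The "detective": scores an image as real vs. counterfeit.
    discriminator = nn.Sequential(
        nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1),
    )

    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_images):
        batch = real_images.size(0)
        fake_images = generator(torch.randn(batch, latent_dim))

        # Detective's turn: label real images as 1, forgeries as 0.
        d_loss = bce(discriminator(real_images), torch.ones(batch, 1)) + \
                 bce(discriminator(fake_images.detach()), torch.zeros(batch, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Forger's turn: try to get the detective to call the forgery "real".
        g_loss = bce(discriminator(fake_images), torch.ones(batch, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return d_loss.item(), g_loss.item()

    # e.g. train_step(torch.randn(16, img_dim)) runs one step on a dummy batch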

Kachow! Modulus is all about EFFICIENCY, after all

Now that was a whirlwind! Alright, let’s get into the meat-and-potatoes.

Step A: Creating the Creator

1. Knowledge Distillation

An old deep learning trick for compressing models, i.e. taking a larger model and creating a smaller one that performs the same task with roughly the same proficiency, is knowledge distillation. In the context of supervised learning, this involves taking the larger “teacher network” and training a smaller “student network” on both the true labels from the training set and the teacher’s output labels; the student is thus given feedback not only on the ground truth, but also on what the teacher thinks of each training example.
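
For the deep learning folks, classic distillation in code looks roughly like the sketch below, where the student’s loss blends the ground-truth labels with the teacher’s softened outputs. The temperature and weighting here are illustrative defaults, not values from our training runs:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # Hard-label term: ordinary cross-entropy against the ground truth.
        hard = F.cross_entropy(student_logits, labels)
        # Soft-label term: match the teacher's softened output distribution.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft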

2. Generative “Distillation”

Our task is slightly different: rather than taking an image and categorizing it into one of many buckets, we instead seek to train a model which takes in a latent “noise” vector and turns it into an image. Thus, rather than training on teacher labels for an input, we simply use the teacher network to generate conditional examples for our student model, effectively collecting training data by querying the teacher model. Our basic data collection pipeline is as follows:

  • Generate a basket of prompt templates for Midjourney, describing the various characters and monsters we eventually wish our model to generate (a quick sketch of this step follows the list below). For example,

“64-bit pixel art style, {species} monster wielding {equipment} over a dark background, full body visible”

  • Unfortunately, Midjourney doesn’t yet have a programmatic API, and the only way to interact with their model is through the Discord bot. Fortunately, Discord has a consistent interface which can be manually automated — that’s right, we used Hammerspoon to physically automate the tasks of copying a generated prompt, pasting it into Discord’s chat interface, and hitting “enter” to activate Midjourney’s generation cycle before starting the whole process all over again. Forty-thousand times.
Yes we actually did this. No it was not fun.
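
For concreteness, the prompt-basket step from the first bullet above looks roughly like this; the species and equipment lists are made-up examples, not the production prompt set:

    import itertools

    template = ("64-bit pixel art style, {species} monster wielding {equipment} "
                "over a dark background, full body visible")
    species = ["dragon", "golem", "sprite"]
    equipment = ["a flaming sword", "an ice staff", "twin daggers"]

    # One prompt per (species, equipment) combination in the basket.
    prompts = [template.format(species=s, equipment=e)
               for s, e in itertools.product(species, equipment)]

    # Each prompt then gets pasted into the Midjourney Discord bot (via the
    # Hammerspoon automation described above) to collect teacher images.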

3. The Student Becomes the Master

With the training data collected, we march onward to preprocessing! Species and equipment labels are parsed from the prompts associated with each image, and images themselves are split into 4 (Midjourney generates four at a time) and collated into tensor form for training.
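
Here is a rough sketch of that preprocessing step, assuming each Midjourney result arrives as a single 2x2 grid image tagged with its originating prompt; the regex and target resolution are illustrative, not our exact pipeline:

    import re
    import torch
    from PIL import Image
    from torchvision import transforms

    to_tensor = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
    # Pull species/equipment labels back out of the generating prompt.
    PROMPT_RE = re.compile(r"(?P<species>\w+) monster wielding (?P<equipment>[\w ]+) over")

    def split_grid(path):
        # Split one 2x2 Midjourney grid into its four individual images.
        grid = Image.open(path).convert("RGB")
        w, h = grid.size
        corners = [(0, 0), (w // 2, 0), (0, h // 2), (w // 2, h // 2)]
        return [grid.crop((x, y, x + w // 2, y + h // 2)) for x, y in corners]

    def preprocess(path, prompt):
        labels = PROMPT_RE.search(prompt).groupdict()    # {"species": ..., "equipment": ...}
        images = torch.stack([to_tensor(im) for im in split_grid(path)])
        return images, labels                            # tensor of shape (4, 3, 64, 64)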

Finally, we arrive at the training process. I won’t bore you with the model details here, but the deep learning enthusiasts among you might be keen to know that —

  • The architecture of the generator is that of an upsampling convolutional network, complete with residual blocks and the classic 3x3 convolutional filters throughout (roughly sketched after this list).
  • The overall training pipeline is that of SNGAN (using the spectral norm as a regularizer on the discriminator, an alternative way of approximating the Lipschitz constraint introduced in the foundational WGAN work).
  • For more details (including training hyperparameters and samples), see our code repo, a modified version of SNGAN’s!
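
To make the first two bullets concrete, here is a hedged PyTorch sketch of an upsampling residual generator block built from 3x3 convolutions, plus spectral normalization applied to a discriminator layer; channel counts and layer choices are illustrative, not the zkMon configuration:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UpResBlock(nn.Module):
        # Generator block: nearest-neighbor upsample, two 3x3 convs, and a skip path.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
            self.skip = nn.Conv2d(in_ch, out_ch, 1)

        def forward(self, x):
            up = F.interpolate(x, scale_factor=2, mode="nearest")
            h = self.conv2(F.relu(self.conv1(F.relu(up))))
            return h + self.skip(up)                  # residual connection

    # SNGAN's regularizer: wrap discriminator layers in spectral normalization.
    disc_layer = nn.utils.spectral_norm(nn.Conv2d(3, 64, 3, padding=1))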

Step B: Zap it with ZK!

Now that you’re all set up for poking around SNGAN, you’re ready for the main course: the nitty-gritty details of the ZKP.

Imagine a world without free AWS credits… hmm OH GOD MAKE IT STOP

As you wish! The what, the why, and the how, all wrapped up in three neat bullet points right here:

  • What? Excellent question! For all the jargon-lovers out there, we used the PSE fork of Halo2 with KZG commitments, and recursively aggregated and proved our models using Axiom’s Halo2 recursive verifier. For the entire rest of the planet, the tl;dr is that most modern zero-knowledge proving systems have a prover frontend, where the computation being proven (the generative model, in this case) is encoded. Halo2 serves as this encoding layer, and additionally provides a KZG backend: the cryptographic machinery that converts the frontend description of the model into a small proof which can be verified on-chain. (For those of you who are curious — a recursive verifier is a verifier whose execution is itself proven; a “proof of correct proof verification”, if you will. The reason for having such a step is that the initial proof might be too large to send on-chain directly, so an intermediate compression step is necessary.)
  • Why? Incredible inquiry! Halo2 + KZG ended up being our proof system of choice for several reasons. Firstly, the PSE team at the Ethereum Foundation, alongside many other contributors to the library, have done an exceptional job at creating helpful developer tooling and a flexible interface for writing program specifications. Moreover, Halo2 boasts several modern ZKP bells and whistles, including lookup tables, in which a verifier can enforce that certain values belong to a particular pre-agreed set, and rotations, in which a prover can reference data values from rows other than the one a particular constraint applies to.
  • How? Superb interrogation! We drew heavily on gate designs and gadgets we’d created for our previous on-chain deep learning project, Leela vs. the World. The tl;dr is that for each individual model operation (e.g. convolutional layers, activation functions, etc.), we painstakingly replicated these within both Rust and Halo2, and applied significant optimizations so the prover runs even faster and with less memory usage (a toy sketch follows this list). (For those of you who are curious about these gate designs, check out our open-source GitHub repository!)
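
One common ingredient in this kind of circuit replication, and an assumption in the toy sketch below since we are glossing over the actual gate designs, is expressing the model’s floating-point arithmetic as scaled integers so that it maps cleanly onto field elements. The scale factor and quantization scheme here are purely illustrative:

    import numpy as np

    SCALE = 2 ** 16                         # assumed fixed-point scale factor

    def quantize(x):
        # Represent real numbers as scaled integers, as a circuit would.
        return np.round(np.asarray(x) * SCALE).astype(np.int64)

    def fixed_point_matmul(w_q, x_q):
        # Integer matmul followed by a rescale back to the original scale.
        return (w_q @ x_q) // SCALE

    w, x = np.random.randn(4, 8), np.random.randn(8)
    approx = fixed_point_matmul(quantize(w), quantize(x)) / SCALE
    print(np.max(np.abs(approx - w @ x)))   # tiny quantization error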

What was the end result of all this? Two things — firstly, a standalone verifier whose sole job is to strictly guard what is allowed to be posted on-chain: such a verifier only accepts proofs generated from the exact circuit which describes the generative model, and thus only outputs from such a model are allowed on-chain.

Secondly, a prover which both computes the generation process of the model and a proof of correct generation alongside this, such that both the art output from the model and the proof get submitted to the chain, with the former being minted as a real zkMon NFT if and only if the latter is accepted by the on-chain verifier.
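
Conceptually, the whole submission flow fits in a few lines. Every name in the sketch below is a hypothetical stand-in: in the real system, the prover is an off-chain binary and the verification plus minting logic live in on-chain contracts.

    # Everything here is a placeholder to show the gating logic, not real APIs.
    def generate_and_prove(seed):
        art = f"image generated from seed {seed}"     # stand-in for the model output
        proof = f"proof for {art}"                    # stand-in for the ZK proof
        return art, proof

    def verifier_accepts(art, proof):
        # Stand-in for the on-chain verifier checking the proof against the output.
        return proof == f"proof for {art}"

    def submit(seed):
        art, proof = generate_and_prove(seed)
        if verifier_accepts(art, proof):              # mint iff the proof checks out
            return f"minted zkMon: {art}"
        raise ValueError("proof rejected: output did not come from the committed model")

    print(submit(42))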

We take security seriously. Especially for pixel art.

Next Station: The Stratosphere

Welcome to a world of machine creativity, now authenticated with cryptographic proofs.

A world where AI models stamp their outputs with digital signatures that are nearly impossible to fake.

One where we can fairly manage AI models, build autonomous collaborators, and of course, create AI based services (artistic or otherwise) that can never betray our trust — all made possible thanks to zero-knowledge cryptography.

It’s wild. Except…

Yikes!

We are reaching the limits of modern provers. Even as we eke out more algorithmic efficiency at every turn, it’s clear to us from our time building Rocky, Leela, and now zkMon, that today’s proving paradigm has become a deafening ceiling on the ambition of projects we’re giddy to build.

And it’s not just the Modulus team. Over the past couple of months, frens from across the ecosystem, everything from protocols just getting off the ground to services already supporting millions, have come to us looking to integrate ZKML for themselves.

And time and again, the same deafening cost barrier strangles our design space, killing our nascent category before its first real steps.

The cost of intelligence is simply too high for scale…

For now.
