Chapter 5: The Cost of Intelligence

Modulus Labs
Jan 30, 2023 · 8 min read


*In a warm, British accent:* “Would you be a darling and put this on in the background?”

Hey you, we know it’s been a little bit since our last update. Please forgive our relative quiet the past two months — don’t worry, we’re back with a protein-filled chapter. And as always, our Twitter and Discord are great places to stay in the loop.

Lack of updates makes the heart grow fonder?

We are so psyched to finally be sharing our first paper with you, brought to you via grant support from the Ethereum Foundation: “The Cost of Intelligence: Proving Machine Learning Inference with Zero-Knowledge” (or paper0, as the cool kids call it).

That’s right, real numbers! Graphs! Discussion of theoretical constructions and their impact on performance! It’s the whole enchilada — in fact, paper0 is the first work to benchmark ZK proof systems across a common suite of AI primitives. And yes, you can read it right now, right here.

To that end, please treat this blog as a companion piece (or a course reader for those lecture skippers out there) that captures the high level takeaways from paper0. While we’ve worked hard to make sure the rigor of the paper honors the impressive technical works we reference, this blog will just be summarizing the juicy bits. For the details, please refer to the paper.

Limited for now… thanks for the plug ChatGPT ;)

Without further ado, let’s dive in:

Paper0: The Point of our Survey

Autonomy is here. Indeed, the future of compute will involve the heavy use of sophisticated Artificial Intelligence. Just look at my text editor:

Notion’s tooltip prompt telling me their LLM would do a much better job on this sentence

Yet, there exist no functional neural networks on-chain, not even the smallest recommender system or matching algorithm. Heck! Not even an experiment… The reason, of course, is abundantly obvious — it’s simply too expensive. After all, the cost of running even several hundred thousand FLOPs worth of compute (barely enough for a single inference pass on a tiny neural net) is millions of gas, equivalent to hundreds of dollars.
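To put rough numbers behind that claim, here’s a back-of-the-envelope sketch. The gas-per-operation, gas-price, and ETH-price figures are illustrative assumptions for the sake of the arithmetic, not measurements from paper0:

```python
# Illustrative estimate of on-chain inference cost.
# Every constant below is an assumption for this sketch, not paper0 data.
FLOPS_PER_INFERENCE = 500_000   # forward pass of a tiny neural net
GAS_PER_FLOP = 10               # assumed gas cost of one multiply-add done in EVM opcodes
GAS_PRICE_GWEI = 20             # assumed gas price
ETH_PRICE_USD = 1_500           # assumed ETH price (early-2023 ballpark)

gas_used = FLOPS_PER_INFERENCE * GAS_PER_FLOP        # 5,000,000 gas
cost_eth = gas_used * GAS_PRICE_GWEI * 1e-9          # gwei -> ETH
cost_usd = cost_eth * ETH_PRICE_USD

print(f"{gas_used:,} gas ≈ {cost_eth:.3f} ETH ≈ ${cost_usd:,.0f} per inference")
```

Even with generous assumptions, a single tiny inference lands in “millions of gas, hundreds of dollars” territory.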

So what do we do if we want to bring the AI paradigm into the trustless world? Are we going to roll-over and give up? Of course not… wait a minute! Roll-over… Give up

If rollup services like Starkware, Matter Labs, and others are using zero-knowledge proofs to massively scale compute while preserving cryptographic security, could we do the same for AI?

This question became the motivating seed that drove our work in paper0. Spoiler alert, here’s what we found:

paper0 takeaway

Here’s the well-known secret: AI performance almost always scales with model size, and this trend doesn’t look to be slowing. So long as that remains the case, it will prove especially painful for those of us here in web3.

After all, compute cost is the ultimate, unavoidable source of our nightmares.

Today’s ZKPs can already support small models. But moderate-to-large models break the paradigm.

The Benchmark: Experimental Design

For paper0, we focus on 2 foundational metrics in any zero-knowledge proof system:

  • Proof generation time: the amount of time it takes a prover to create an accompanying proof to an AI inference, and
  • Peak prover memory usage: the maximum memory the prover uses to generate the inference proof, at any given time during proving

This was primarily a practical choice, informed by our experience building Rockybot (proof time and memory use are the direct priorities in determining feasibility for any trustless AI use-case). Additionally, all measurements cover proof generation only, and do not account for pre-processing or witness generation.

There are, of course, also other dimensions of cost to track. This includes verifier runtime and proof size. We may revisit these metrics in the future, but consider them outside of the scope of paper0.
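For intuition, measuring these two quantities can be as simple as timing a prover run and recording its peak resident memory. The sketch below is a minimal harness, not the tooling used in paper0; the prover binary name and flags are placeholders:

```python
import resource   # Unix-only
import subprocess
import time

def benchmark_prover(cmd):
    """Run a prover binary as a subprocess and return wall-clock proving time
    (seconds) plus the child's peak resident memory (MiB). On Linux,
    ru_maxrss is reported in kilobytes."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_kb / 1024

# Hypothetical invocation -- substitute your own prover and arguments.
secs, peak_mib = benchmark_prover(["./prove_mlp", "--model", "mlp_18m.json"])
print(f"proof generation: {secs:.1f} s, peak prover memory: {peak_mib:.0f} MiB")
```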

As for the actual proof systems we tested, by popular vote, we settled on 6:

Summary table of the proof systems tested in paper0, as well as the authors who assisted us

Finally, we created two suites of Multi-Layer Perceptrons (MLPs) to benchmark. MLPs are relatively simple and consist mostly of linear operations that are both common to and representative of more sophisticated architectures. The first suite scaled with increasing parameter count (up to 18M parameters and 22 GFLOPs), while the second scaled with an increasing number of layers (up to 500 layers). Each suite, as seen in the tables below, tested the prover systems’ ability to scale in different ways, and roughly spanned the scale of well-known deep learning architectures, from LeNet5 (60k parameters, 0.5 MFLOPs) to ResNet-34 (22M parameters, 3.77 GFLOPs).

Parameter and Depth Benchmark Suites
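As a rough guide to how suites like these get sized, the parameter and FLOP counts of a dense MLP can be estimated layer by layer. The layer widths below are a hypothetical example for illustration, not one of the paper0 architectures:

```python
def mlp_cost(layer_widths):
    """Estimate parameter and FLOP counts for a dense MLP: each fully-connected
    layer contributes (fan_in * fan_out) weights plus fan_out biases, and roughly
    2 * fan_in * fan_out FLOPs per forward pass (one multiply and one add per
    weight). Activation costs are ignored."""
    params, flops = 0, 0
    for fan_in, fan_out in zip(layer_widths, layer_widths[1:]):
        params += fan_in * fan_out + fan_out
        flops += 2 * fan_in * fan_out
    return params, flops

# Hypothetical layer widths, for illustration only (not a paper0 architecture).
p, f = mlp_cost([784, 1024, 1024, 1024, 10])
print(f"{p / 1e6:.1f}M parameters, {f / 1e6:.1f} MFLOPs per inference")
```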

Results: Fast and Furious

Proof generation timing results across param and depth scales, for all 6 proof systems
Peak memory results across param and depth scales, for all 6 proof systems

For a full breakdown of these results, along with in-depth analysis of bottlenecks within each system, please see Section 4 of paper0.

Use-Cases & Final Takeaways

Alrighty, those are some pretty neat graphs, but here’s the headline: today’s ZKPs can already support small models, while moderate-to-large models break the paradigm.

But of course, what does that actually mean in practice? We’ll spotlight 2 examples:

1. Worldcoin — Worldcoin is building the world’s first “Privacy-Preserving Proof-of-Personhood Protocol” (or PPPoPP, if you have a high-quality sense of humor). In other words, it aims to solve Sybil attacks by tying authentication to a deeply unique biometric characteristic: the iris.

It is… a wild idea, and one that uses a convolutional neural network to compress, transform, and attest to the stored iris data. While their current setup involves a trusted computing environment within a secure enclave in the orb’s hardware, they would like to instead use a ZKP to attest to the correct computation of the model. This would similarly allow users to have self-custody over their own biometric data with cryptographic security guarantees (so long as it’s processed on the user’s hardware — say, their mobile phone).

Now to get specific: Worldcoin’s model features 1.8M parameters and 50 layers. It’s the kind of model complexity necessary to distinguish between 10 billion different irises. Yikes!

While proving systems such as Plonky2 on compute-optimized cloud CPUs can generate proofs-of-inference for a model of this size in just a matter of minutes, the prover’s memory consumption would overwhelm any commercially available mobile hardware (tens of gigabytes of RAM).

The fact is, none of the systems tested can prove this neural network on mobile hardware…

2. AI Arena — AI Arena is an on-chain platform fighting game in the style of Super Smash Bros, with a distinctive twist: rather than player-operated avatars fighting against one another in real-time, player-owned AI models compete and battle autonomously. And yes, it’s as cool as it sounds.

Over time, the spectacular team at AI Arena is working to move their game to a fully trustless tournament scheme. The problem: this requires verifying a staggering number of AI computations per game.

Matches run at 60 frames per second and last 3 minutes, which translates to 20,000+ inferences between the two player models each round. Take one of AI Arena’s policy networks as an example: a relatively small MLP requiring ∼0.008 seconds for a single forward pass. Proving that same forward pass with zkCNN takes roughly 0.6 seconds, i.e., about a 75x compute blowup for every single action taken.

This also means a roughly 75x increase in compute cost. As unit economics become increasingly important for on-chain services, devs must balance the value of decentralized security against the literal cost of proof generation.
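To make the match-level numbers concrete, here’s the same arithmetic spelled out. The constants simply restate the figures quoted above:

```python
# Per-match proving cost, restating the figures quoted above.
FPS = 60                 # frames per second
MATCH_SECONDS = 3 * 60   # 3-minute match
PLAYERS = 2              # two player-owned models per match
INFER_SECONDS = 0.008    # one forward pass of the policy network
PROVE_SECONDS = 0.6      # one zkCNN proof of that forward pass

inferences = FPS * MATCH_SECONDS * PLAYERS     # 21,600 actions per match
infer_total = inferences * INFER_SECONDS       # ~2.9 minutes of inference
prove_total = inferences * PROVE_SECONDS       # ~3.6 hours of proving

print(f"{inferences:,} inferences per match")
print(f"inference: {infer_total / 60:.1f} min of compute, "
      f"proving: {prove_total / 3600:.1f} h of compute "
      f"(~{PROVE_SECONDS / INFER_SECONDS:.0f}x blowup per action)")
```

In other words, a 3-minute match needs hours of proving on the same hardware, before you even start paying for it.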

https://aws.amazon.com/ec2/pricing/

Whether it’s the examples above, ZK-KYC, image generation in the flavor of DALL-E, or even large language models in smart contracts, there exists an entire universe of use-cases in the world of ZKML. To actually realize these, however, we strongly believe that ZK provers still need to be massively improved. Especially for a future of self-improving chains.

So where do we go from here?

We have concrete performance numbers. We know which techniques tend to perform best when it comes to proving neural networks. And of course, we’re starting to discover the kinds of use-cases that both inspire and excite our growing community.

I wonder what’s coming next…

More updates for you soon ;)

Remembering Our Manners: Thank Yous

It’s not lost on us that a lot of y’all in the community have been incredibly patient as the paper got closer — thank you for your continued support throughout! You deserve the rigor and care of a well-researched, composed, and reviewed piece of work, so we’ve put in the effort to make sure we got there.

But of course, we didn’t get there alone; they say it takes a village. They, however, never mention the part about optimizing circuits… To that end, and recognizing that this is a far-from-exhaustive list of all the folks we absolutely must acknowledge, the world’s biggest thanks to:

  • Our world-class advisors + mentors: Zhang Zhenfei, Riad Wahby, and Rand Hindi
  • The Ethereum Foundation, for funding our research
  • And, for their feedback and critique: Jens Groth, Daniel Lubarov, Tianyi Liu, Max Gillett, Michele Orrú, Yi Sun, Kobi Gurkan, Luke Pearson, Nima Vaziri, Jonatan Luther-Bergquist, dcbuilder, Brandon Da Silva, and guiltygyoza
Thank you, thank you, thank you!

And that’s a wrap for today!
