TensorPlonk: A “GPU” for ZKML, Delivering 1,000x Speedups

Daniel Kang
5 min read · Sep 18, 2023


By Suppakit Waiwitlikhit and Daniel Kang

Over the past year, we’ve seen incredible interest and progress in verified machine learning in the form of zero-knowledge machine learning (ZKML). ZKML can enable audits of the Twitter timeline, help fight deepfakes (images and audio), and provide transparency into opaque ML systems. Nathan Labenz has even proposed using verified machine learning to power autonomous lawyers!

Unfortunately, ZKML is currently too slow for practical applications. For example, proving the Twitter model currently takes 6 hours for a single example using ezkl. Verifying the tweets published in one second (~6,000) would cost ~$88,704 on cloud compute hardware.

Enter TensorPlonk, a “GPU” for ZKML. We’ve developed a new proving system to enable high-performance proving for a wide range of ML models. TensorPlonk can deliver up to 1,000x speedups for certain classes of models, such as the Twitter recommendation system. Using TensorPlonk, the proving cost for the Tweet example above would be ~$30 compared to ~$88,704.

We’ll describe how we achieved these speedups below! Reach out to us if you’re interested in using TensorPlonk.

TensorPlonk has the potential to be the “GPU” of ZKML. Credit: https://www.viperatech.com/product/nvidia-h100-tensor-core-gpu/

Common ML operations

To understand TensorPlonk, we first need to understand the breakdown of ML operations. ML models consist of linear layers interspersed with non-linear layers. The linear layers are operations like matrix multiplication. The non-linear layers are typically operations like the ReLU.
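As a toy illustration of this structure, here is a minimal two-layer model in plain Python. The layer sizes and weights are made up for the example; real models simply repeat this linear/non-linear alternation many times:

```python
def relu(v):
    # Non-linear layer: elementwise max(0, x)
    return [max(0.0, x) for x in v]

def linear(v, W):
    # Linear layer: vector-matrix product v @ W
    return [sum(v[i] * W[i][j] for i in range(len(v)))
            for j in range(len(W[0]))]

# A toy two-layer model: linear -> ReLU -> linear,
# the alternating pattern most ML models follow.
x = [1.0, -2.0]
W1 = [[1.0, 0.0], [0.0, 1.0]]   # first linear layer (identity, for clarity)
W2 = [[2.0], [3.0]]             # second linear layer
h = relu(linear(x, W1))         # [1.0, 0.0]
y = linear(h, W2)               # [2.0]
```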

For many models, including the Twitter algorithm model, the linear layers take up the bulk of the computation. This is due to the nature of the linear layer computations, which are commonly matrix multiplications:

Matrix multiplication: a common ML operation

The runtime of matrix multiplication is cubic in the size of the inputs! Namely, if the weights are of size m x n and the input is n x d, performing the matrix multiplication takes O(m * n * d) time. In the Twitter example, the largest weight matrix has over 8 million elements. A single layer can require more than 15 million floating point operations!
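A naive implementation makes the cubic cost visible: each of the m × d output entries needs n multiply-adds, hence the triple loop below. This is a textbook sketch, not TensorPlonk code:

```python
def matmul(A, B):
    # Naive matrix multiplication: A is m x n, B is n x d.
    # The triple loop performs m * n * d multiply-adds, i.e. O(m * n * d).
    m, n, d = len(A), len(B), len(B[0])
    C = [[0.0] * d for _ in range(m)]
    for i in range(m):
        for j in range(d):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]   # 2 x 2
B = [[5.0, 6.0], [7.0, 8.0]]   # 2 x 2
print(matmul(A, B))            # [[19.0, 22.0], [43.0, 50.0]]
```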

The non-linearities are often relatively cheap on traditional computing platforms, such as GPUs. However, they can be expensive in ZKML when done naively. To speed up the computation of non-linearities, we can use lookup tables. However, using lookup tables enforces constraints on the ZKML proving in ways that can dramatically increase the cost of proving.
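To make the lookup idea concrete, here is a toy sketch (not ezkl's or TensorPlonk's actual machinery). The non-linearity over an assumed quantized 8-bit domain is tabulated once, and "evaluating" it reduces to showing the (input, output) pair appears in the table:

```python
# Illustrative only: in ZK circuits, non-linearities over quantized values
# are often proved via a precomputed lookup table rather than re-computing
# the function inside the circuit. A toy 8-bit signed ReLU table:
DOMAIN = range(-128, 128)                  # assumed quantized input domain
RELU_TABLE = {x: max(0, x) for x in DOMAIN}

def relu_lookup(x):
    # A "lookup" replaces evaluation: the prover shows (x, y) is in the table.
    return RELU_TABLE[x]

print(relu_lookup(-5), relu_lookup(7))     # 0 7
```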

Finally, for verified ML, we often want to keep the weights hidden (as in the case of Twitter) but ensure that the model provider is using a fixed set of weights. This process is often done by hashing the weights, which can be as expensive as the model computation itself!
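A sketch of the commitment idea, using SHA-256 purely as a stand-in (ZK systems typically use circuit-friendly hashes such as Poseidon, and proving the hash in-circuit is exactly what makes this step expensive):

```python
import hashlib

def commit_weights(weights):
    # A simple binding commitment: hash the serialized weights.
    # The model provider publishes this digest once; any later proof
    # must be consistent with the same weights.
    h = hashlib.sha256()
    for w in weights:
        h.update(repr(w).encode())
    return h.hexdigest()

weights = [0.5, -1.25, 3.0]
c = commit_weights(weights)
# The same weights always reproduce the same commitment:
assert c == commit_weights([0.5, -1.25, 3.0])
```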

TensorPlonk: 1,000x faster proving

To accelerate ML model proving, we built TensorPlonk. TensorPlonk optimizes matrix multiplications, non-linearities, and weight commitments with new optimizations and by extending recent work.

The first part of TensorPlonk accelerates matrix multiplications. We leverage recent work by Eagen & Gabizon called cqlin, which can prove the result of matrix multiplication in O(n) time, if one of the matrices is fixed ahead of time. Luckily, this is the case for many ML models, like the Twitter model.

However, naively using cqlin would require large proofs and excessive work for the verifier. Technically, it would require sending a commitment per matrix multiplication, and the verifier would need to do a pairing check per matrix multiplication. Doing so would increase the proof size and verifier work by nearly 10x compared to our prior work. Large proofs and increased verification times would make ZKML infeasible for blockchain applications.

To solve this problem, we can use a randomized check to verify all of the matrix multiplications within a model at once! Stay tuned for our technical report, which we will release in the coming weeks.
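Our actual protocol is described in the forthcoming report, but the flavor of a randomized check can be seen in the classic Freivalds test: rather than recomputing A·B, a verifier checks A·(B·r) = C·r for a random vector r, which needs only cheap matrix-vector products and catches a wrong C with high probability. This is a standard illustration, not TensorPlonk's protocol:

```python
import random

def matvec(M, v):
    # Matrix-vector product M @ v
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def freivalds_check(A, B, C, trials=20):
    # Check A @ B == C without recomputing the product:
    # pick a random 0/1 vector r and compare A @ (B @ r) with C @ r.
    # Each trial catches a wrong C with probability >= 1/2.
    d = len(C[0])
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(d)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]
print(freivalds_check(A, B, C_good))  # True
print(freivalds_check(A, B, C_bad))   # False, with overwhelming probability
```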

The second part of TensorPlonk accelerates the non-linearities. To understand why this is needed, note that existing ZKML work uses a lookup argument called plookup. The proof is split into a “circuit” and “lookup tables.” With plookup, the circuit must be at least as large as the table, so a large table forces a large circuit. In our setting, the lookup table can be as large as 2²⁴ elements while the circuit only needs to be 2¹⁶ elements. This means plookup would impose an extra ~250x overhead just to use a large lookup table for the non-linearities.
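The ~250x figure is simply the ratio of the two sizes mentioned above:

```python
table_size = 2 ** 24     # large lookup table for the non-linearity
circuit_size = 2 ** 16   # the circuit the model actually needs
# plookup forces the circuit up to the table size, so the overhead is:
overhead = table_size // circuit_size
print(overhead)          # 256, i.e. the "~250x" above
```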

Instead, we can leverage recent work called cq, which allows for lookup tables of size independent of the circuit size. We further optimized cq to “batch” work across lookup tables, which can reduce the computations for the lookups by up to 3x.

Our third optimization targets the weight commitment, which binds the model provider to a fixed set of weights. Instead of using a hash, which can be extremely expensive, we use a KZG commitment of the weight vectors used in cqlin. This can reduce the proving time by up to 10x.

Benchmarks

To benchmark TensorPlonk, we used the c5a.16xlarge AWS instance, which has 64 vCPU cores. We benchmarked the Twitter recommendation system model, which is used in production. We compared against ezkl, a ZKML library. For ezkl, we compared against public weights (“no hash”) and when using a hash to commit to the model weights (“hash”).

TensorPlonk compared to ezkl.

To understand the magnitude of these improvements, let’s go back to the Twitter example. Users produce around 6,000 tweets per second, or ~500M tweets per day. Using stochastic verification, we could verify ~1% of the tweets for ~$21,000 per day. This is <0.5% of Twitter’s yearly infrastructure costs! In contrast, using ezkl would cost ~$75,000,000 per day, which over a year comes to around 18 times Twitter’s yearly infrastructure costs.
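A back-of-envelope version of this arithmetic, using only the figures quoted above (the ~$21,000 in the text presumably comes from slightly more precise per-proof costs):

```python
# All figures approximate, taken from the numbers quoted in this post.
tweets_per_sec = 6_000
tweets_per_day = tweets_per_sec * 60 * 60 * 24   # ~518M, i.e. the "~500M" above
cost_per_second_of_tweets = 30.0                  # TensorPlonk: ~$30 per ~6,000 tweets
cost_per_tweet = cost_per_second_of_tweets / tweets_per_sec   # $0.005

sample_rate = 0.01                                # stochastically verify ~1% of tweets
daily_cost = tweets_per_day * sample_rate * cost_per_tweet
print(f"~${daily_cost:,.0f} per day")             # ~$25,920, the same ballpark as ~$21,000
```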

Our results show the feasibility of verified ML in these increasingly important settings.

Conclusion

TensorPlonk is the next step for verified ML. However, there’s still much more to do. While our system has been algorithmically optimized, we have barely scratched the surface of optimizations. If you’d like to discuss your idea or brainstorm with us, fill out this form and join our Telegram group. Follow us on Twitter (Pun’s Twitter) for the latest updates as well!

Note 1: We were unable to successfully prove with ezkl on our test harness, since ezkl requires memory resources beyond what our test harness provides. We instead estimated the proving time, verification time, and proof size based on the configuration provided by the ezkl compiler.

Note 2: TensorPlonk is being actively developed and has not been audited.
