# Flexible precomputation for verification checks

Last week, I published version 1.1 of `curve25519-dalek`, which has two main features over 1.0: a new SIMD backend using IFMA instructions, and a new API that allows using precomputation for verification checks. As far as I’m aware, the IFMA backend is the fastest Curve25519 ever; I wrote about it in a previous post, and it’s now integrated into the released library.

This post will describe the other new feature: a flexible precomputation API aimed at accelerating verification checks across a wide range of protocols.

## Multiscalar multiplications versus multiple scalar multiplications

Often, rather than just needing to compute a single scalar multiplication, an implementation needs to compute an equation like

`a_1*A_1 + ... + a_n*A_n`

It turns out that performing this computation as one operation (a multiscalar multiplication) is potentially much more efficient than computing `n` individual scalar multiplications.

At a high level, this is because specifying the entire problem at once (“compute this entire equation”), rather than specifying multiple sub-problems (“compute this part, then this part, then this part”) allows the implementation to share work between the different parts, in a way that’s not possible when the scalar multiplications are done individually.

For example, one method for computing a scalar multiplication decomposes the scalar multiplication operation into a sequence of additions and doublings; in a multiscalar multiplication, the doublings can be shared between all the points, rather than done once per point. (More details are available in the `curve25519-dalek` internal documentation.)

This operation is provided in `curve25519-dalek` by the `MultiscalarMul` and `VartimeMultiscalarMul` traits, which do constant-time and variable-time operations respectively. These interfaces take iterators of points and scalars, allowing them to be used very flexibly, as in these examples.

## Saving time with precomputation

For instance, an implementation can save time by building lookup tables of small multiples of the points, then using these to perform “chunks” of the scalar multiplication. Larger tables save more time (at the cost of more memory), but when the points aren’t known in advance, the time cost of the precomputation also has to be balanced against the time savings.

## Specifying the problem at a high level

`a_1*A_1 + ... + a_n*A_n + b_1*B_1 + ... + b_m*B_m,`

where the `B_1,...,B_m` are points fixed in advance and the `A_1,...,A_n` are points that vary with each computation.

## The VartimePrecomputedMultiscalarMul trait

• `vartime_multiscalar_mul`, which handles the special case where `n=0` and there are no dynamic points;
• `vartime_mixed_multiscalar_mul`, which takes the dynamic points as already-validated `Point`s and is infallible;
• `optional_mixed_multiscalar_mul`, which takes the dynamic points as `Option<Point>`s and returns an `Option<Point>`, so that a verification check on compressed point data can compose point decompression/validation into the input iterators.

This trait is implemented for ristretto255 points by the `VartimeRistrettoPrecomputation` struct and for Curve25519 points by the `VartimeEdwardsPrecomputation` struct. At the moment, these all share a common backend implementation, but they could be specialized in the future.

Currently, only variable-time computations are supported. This is because these are the computations used for verification, and many protocols can use multiscalar multiplications for verification even when they can’t use them for proving (because a verifier can combine multiple statements into one, or verify proofs in a batch).

The speedup varies by the problem size, but for medium sizes can give a respectable 20–30% speedup.

More importantly, by specifying the problem at a high-level, the underlying implementation of the multiscalar multiplication can be optimized transparently to the implementation of the protocol.

This means that any future optimization work will propagate its speedup to users of the precomputation API when they run `cargo update`, with no changes required to each protocol’s implementation.

interested in zero-knowledge, privacy, freedom, mathematics, & the number 24

## More from Henry de Valence

interested in zero-knowledge, privacy, freedom, mathematics, & the number 24