Flexible precomputation for verification checks

Last week, I published version 1.1 of curve25519-dalek, which has two main features over 1.0: a new SIMD backend using IFMA instructions, and a new API that allows using precomputation for verification checks. As far as I’m aware, the IFMA backend is the fastest Curve25519 ever; I wrote about it in a previous post, and it’s now integrated into the released library.

This post will describe the other new feature: a flexible precomputation API aimed at accelerating verification checks across a wide range of protocols.

Multiscalar multiplications versus multiple scalar multiplications

Often, rather than just needing to compute a single scalar multiplication, an implementation needs to compute an equation like

a_1*A_1 + ... + a_n*A_n

It turns out that performing this computation as one operation (a multiscalar multiplication) is potentially much more efficient than computing n individual scalar multiplications.

At a high level, this is because specifying the entire problem at once (“compute this entire equation”), rather than specifying multiple sub-problems (“compute this part, then this part, then this part”) allows the implementation to share work between the different parts, in a way that’s not possible when the scalar multiplications are done individually.

For example, one method for computing a scalar multiplication decomposes the scalar multiplication operation into a sequence of additions and doublings; in a multiscalar multiplication, the doublings can be shared between all the points, rather than done once per point. (More details are available in the curve25519-dalek internal documentation.)

This operation is provided in curve25519-dalek by the MultiscalarMul and VartimeMultiscalarMul traits, which do constant-time and variable-time operations respectively. These interfaces take iterators of points and scalars, allowing them to be used very flexibly, as in these examples.

Saving time with precomputation

For instance, an implementation can save time by building lookup tables of small multiples of the points, then using these to perform “chunks” of the scalar multiplication. Larger tables save more time (at the cost of more memory), but when the points aren’t known in advance, the time cost of the precomputation also has to be balanced against the time savings.

Specifying the problem at a high level

a_1*A_1 + ... + a_n*A_n + b_1*B_1 + ... + b_m*B_m,

where the B_1,...,B_m are points fixed in advance and the A_1,...,A_n are points that vary with each computation.

The VartimePrecomputedMultiscalarMul trait

  • vartime_multiscalar_mul, which handles the special case where n=0 and there are no dynamic points;
  • vartime_mixed_multiscalar_mul, which takes the dynamic points as already-validated Points and is infallible;
  • optional_mixed_multiscalar_mul, which takes the dynamic points as Option<Point>s and returns an Option<Point>, so that a verification check on compressed point data can compose point decompression/validation into the input iterators.

This trait is implemented for ristretto255 points by the VartimeRistrettoPrecomputation struct and for Curve25519 points by the VartimeEdwardsPrecomputation struct. At the moment, these all share a common backend implementation, but they could be specialized in the future.

Currently, only variable-time computations are supported. This is because these are the computations used for verification, and many protocols can use multiscalar multiplications for verification even when they can’t use them for proving (because a verifier can combine multiple statements into one, or verify proofs in a batch).

The speedup varies by the problem size, but for medium sizes can give a respectable 20–30% speedup.

More importantly, by specifying the problem at a high-level, the underlying implementation of the multiscalar multiplication can be optimized transparently to the implementation of the protocol.

This means that any future optimization work will propagate its speedup to users of the precomputation API when they run cargo update, with no changes required to each protocol’s implementation.

interested in zero-knowledge, privacy, freedom, mathematics, & the number 24