# Flexible precomputation for verification checks

Last week, I published version 1.1 of `curve25519-dalek`, which has two main features over 1.0: a new SIMD backend using IFMA instructions, and a new API that allows using precomputation for verification checks. As far as I’m aware, the IFMA backend is the fastest Curve25519 ever; I wrote about it in a previous post, and it’s now integrated into the released library.

This post will describe the other new feature: a flexible precomputation API aimed at accelerating verification checks across a wide range of protocols.

## Multiscalar multiplications versus multiple scalar multiplications

Many elliptic curve protocols spend most of their time performing scalar multiplication operations: computing `a*A` for a curve point `A` and a scalar (integer modulo the curve order) `a`. In this post I’ll use capital letters `A,B,P,Q,...` for curve points and lower-case letters `a,b,x,y,...` for scalars, so it’s always possible to distinguish the type from the variable name.

Often, rather than just needing to compute a single scalar multiplication, an implementation needs to compute an equation like

`a_1*A_1 + ... + a_n*A_n`

It turns out that performing this computation as one operation (a multiscalar multiplication) is potentially much more efficient than computing `n` individual scalar multiplications.

At a high level, this is because specifying the entire problem at once (“compute this entire equation”), rather than specifying multiple sub-problems (“compute this part, then this part, then this part”) allows the implementation to share work between the different parts, in a way that’s not possible when the scalar multiplications are done individually.

For example, one method for computing a scalar multiplication decomposes the scalar multiplication operation into a sequence of additions and doublings; in a multiscalar multiplication, the doublings can be shared between all the points, rather than done once per point. (More details are available in the `curve25519-dalek` internal documentation.)

This operation is provided in `curve25519-dalek` by the `MultiscalarMul` and `VartimeMultiscalarMul` traits, which do constant-time and variable-time operations respectively. These interfaces take iterators of points and scalars, allowing them to be used very flexibly, as in these examples.

## Saving time with precomputation

Unfortunately, what these traits don’t allow is taking any advantage of precomputation. If the points are known up-front, it’s possible to trade memory for time by performing some precomputation on the points, then using it for each multiscalar multiplication.

For instance, an implementation can save time by building lookup tables of small multiples of the points, then using these to perform “chunks” of the scalar multiplication. Larger tables save more time (at the cost of more memory), but when the points aren’t known in advance, the time cost of the precomputation also has to be balanced against the time savings.

## Specifying the problem at a high level

As noted above, one of the core reasons that multiscalar multiplication can be faster than multiple scalar multiplications is that the entire problem is specified at once, and an implementation can share work between all the parts. We can write the general problem of multiscalar multiplication with precomputation in the form

`a_1*A_1 + ... + a_n*A_n + b_1*B_1 + ... + b_m*B_m,`

where the `B_1,...,B_m` are points fixed in advance and the `A_1,...,A_n` are points that vary with each computation.

## The VartimePrecomputedMultiscalarMul trait

This functionality is now provided in curve25519-dalek by the `VartimePrecomputedMultiscalarMul` trait, which takes the fixed points `B_1,...,B_m` in a constructor and returns a precomputation struct with three slightly different variants of the problem:

• `vartime_multiscalar_mul`, which handles the special case where `n=0` and there are no dynamic points;
• `vartime_mixed_multiscalar_mul`, which takes the dynamic points as already-validated `Point`s and is infallible;
• `optional_mixed_multiscalar_mul`, which takes the dynamic points as `Option<Point>`s and returns an `Option<Point>`, so that a verification check on compressed point data can compose point decompression/validation into the input iterators.

This trait is implemented for ristretto255 points by the `VartimeRistrettoPrecomputation` struct and for Curve25519 points by the `VartimeEdwardsPrecomputation` struct. At the moment, these all share a common backend implementation, but they could be specialized in the future.

Currently, only variable-time computations are supported. This is because these are the computations used for verification, and many protocols can use multiscalar multiplications for verification even when they can’t use them for proving (because a verifier can combine multiple statements into one, or verify proofs in a batch).

The speedup varies by the problem size, but for medium sizes can give a respectable 20–30% speedup.

More importantly, by specifying the problem at a high-level, the underlying implementation of the multiscalar multiplication can be optimized transparently to the implementation of the protocol.

This means that any future optimization work will propagate its speedup to users of the precomputation API when they run `cargo update`, with no changes required to each protocol’s implementation.

Written by

Written by