Optimizing Numeric Outcome DLC creation

Thibaut Le Guilly
Crypto Garage
Mar 4, 2021 · 11 min read

TL;DR

Creation of adaptor signatures in the case of numeric outcomes was made drastically faster.

Introduction

We’ve recently announced that we implemented so-called numerical decomposition in our P2PDerivatives application. This technique makes it possible to reduce the number of adaptor signatures required to cover ranges of outcomes with identical payout values (if that sentence didn’t make sense to you, the first section is for you; if it did, you can skip it). While this is a great improvement in terms of space (fewer signatures to exchange between the two peers of a DLC), Matthew Black reported to us that the creation of adaptor signatures had become significantly slower. In this article, we’ll first give a refresher on numerical decomposition, before explaining why this slowdown happened, and finally how we managed to significantly reduce it.

Numerical Decomposition DLC

Before using numerical decomposition, our Oracle (reporting the BTC/USD rate) was simply signing a string representing the price at a certain time. Now imagine that we have Alice and Bob wanting to enter into a DLC based on the BTC/USD price with the following conditions:

  • Price above $50000: Alice gets everything,
  • Price between $40000 and $50000: a linear payout function distributes the total collateral between the two parties,
  • Price below $40000: Bob gets everything.

If they want to make sure that they can close their contract properly for a price of Bitcoin between $0 and $1000000, they would have had to create about 1000000 different adaptor signatures. This is quite impractical, both in terms of generating these signatures and in terms of exchanging and storing them.

With numerical decomposition, our Oracle now signs each digit separately. For example, if the price is $45000 (and the Oracle is set to report on 8 digits), it would release a separate signature for each of the digits ‘0’ ‘0’ ‘0’ ‘4’ ‘5’ ‘0’ ‘0’ ‘0’.

How does that help Alice and Bob? As a quick example, in order to cover the range of outcomes below $10000, they could just create adaptor points using the first four digits set to zero (‘0’ ‘0’ ‘0’ ‘0’) and ignore the remaining four digits. So they need a single adaptor signature instead of 10000 to cover the $0-$9999 range.

Note that in practice, our oracle signs the binary representation of outcome values (instead of base 10 as in the example above), as it was found to be more efficient. You can also find more details in the DLC specification repository.
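As a small illustration, here is a minimal sketch (in Rust, with hypothetical names) of how an outcome value can be decomposed into the fixed number of digits that the oracle signs; the base would be 2 in practice, 10 in the example above:

    // Decompose `value` into `num_digits` digits in the given base,
    // most significant digit first.
    fn decompose(mut value: u64, base: u64, num_digits: usize) -> Vec<u8> {
        let mut digits = vec![0u8; num_digits];
        for slot in digits.iter_mut().rev() {
            *slot = (value % base) as u8; // fill least significant digits first, from the right
            value /= base;
        }
        digits
    }

    fn main() {
        // $45000 reported over 8 base-10 digits -> [0, 0, 0, 4, 5, 0, 0, 0]
        println!("{:?}", decompose(45_000, 10, 8));
    }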

Slowdown

So what caused the slowdown of adaptor signature creation after switching to numerical decomposition? To understand it, we need to dive a little bit into how adaptor signatures are created (nothing more than additions and multiplications, so no worries!).

Adaptor Signature

Here we’ll only recall the concept of adaptor signatures very quickly; for a more in-depth description, have a look at our previous article on the topic. An adaptor signature is a signature that has been encrypted with a secret, and for which we can prove that after decryption (or with knowledge of the secret) we obtain a valid signature for a given message. Under ECDSA, an adaptor signature s’ for a signature s over a message m is created as follows:

s’ = k⁻¹(H(m) + r·x)

given

x, a private key, and t, a secret,

T = t·G, the public image of t (also called the adaptor point),

P = x·G, the public key of x,

and where k is a nonce, R = k·T, and r is the x-coordinate of R.

We’ll skip over the verification part for brevity (and again, check our adaptor series if you’re interested), but with knowledge of the secret t we can obtain a valid signature s as follows:

s = s’·t⁻¹ = (k·t)⁻¹(H(m) + r·x)

which is a valid ECDSA signature for m (its effective nonce is k·t, whose public point is k·T = R, matching the r used above).

For DLC, we use the signature of an oracle as a secret to create adaptor signatures. The interesting thing is that the actual secret is not known by any of the parties involved at first. And the secret to most of the adaptor signatures that are created for a DLC will (hopefully) never be revealed, as the oracle is supposed to release a single signature for each event. So in practice, both parties compute the images of all the possible signatures that could be released by the oracle (also called signature points) and use them to encrypt their signatures for each of the payout transactions. Note that the oracle uses BIP340 Schnorr signatures, whose equation looks like this:

s = k + H(R, P, m)·x

where k is the oracle’s signing nonce, R = k·G, and P is the oracle’s public key.

So for each possible m (which represents the value of a possible outcome of the DLC), the parties will create an adaptor point S as follows:

S = s·G = R + H(R, P, m)·P

Note that S can be created using only publicly known values (recall that DLC oracles need to release the R values in advance for each event).
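To make this concrete, here is a minimal sketch of the signature point computation. Toy integer arithmetic modulo a prime stands in for the EC operations so that the example runs on its own; a real implementation would use secp256k1 points and the BIP340 tagged hash:

    // Toy stand-ins for EC arithmetic (a real implementation uses curve points).
    const ORDER: u64 = 2_147_483_647; // prime modulus, stand-in for the group order
    type Scalar = u64;
    type Point = u64; // stand-in for an EC point

    fn add(a: Point, b: Point) -> Point { (a + b) % ORDER }
    fn mul(s: Scalar, p: Point) -> Point { ((s as u128 * p as u128) % ORDER as u128) as u64 }

    // Stand-in for the BIP340 challenge hash H(R, P, m).
    fn challenge(r: Point, p: Point, msg: &[u8]) -> Scalar {
        msg.iter()
            .fold((r ^ p) % ORDER, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64) % ORDER)
    }

    // S = R + H(R, P, m) * P, computed from public values only.
    fn signature_point(r_nonce: Point, oracle_pubkey: Point, msg: &[u8]) -> Point {
        add(r_nonce, mul(challenge(r_nonce, oracle_pubkey, msg), oracle_pubkey))
    }

    fn main() {
        let (r, p) = (123_456_789, 987_654_321); // hypothetical oracle nonce and public key
        println!("S = {}", signature_point(r, p, b"45000"));
    }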

Adaptor Signature with decomposition

Now to understand why creating adaptor signatures is much slower with numerical decomposition, let’s see what changes. With numerical decomposition, we use as encryption secret the sum of all the signature images. So if we have an oracle that will release signatures s0, s1, s2, …, s7, we will use the adaptor point S constructed as follows:

S = S0 + S1 + S2 + … + S7

where each Si is constructed as shown previously.

This works thanks to the linearity of Schnorr signatures, and enables us to later decrypt an adaptor signature using the sum of the oracle’s signatures as the decryption secret:

t = s0 + s1 + s2 + … + s7

So we can see that the main difference with “regular” DLC is that we need to compute an adaptor point that is a sum of points.
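As a rough sketch (again with toy integer arithmetic standing in for EC operations), this per-outcome computation before any optimization uses one nonce, one challenge hash, and one EC multiplication per digit, all summed together:

    const ORDER: u64 = 2_147_483_647; // toy modulus standing in for the group order
    type Scalar = u64;
    type Point = u64; // toy stand-in for an EC point

    fn add(a: Point, b: Point) -> Point { (a + b) % ORDER }
    fn mul(s: Scalar, p: Point) -> Point { ((s as u128 * p as u128) % ORDER as u128) as u64 }
    fn challenge(r: Point, p: Point, digit: u8) -> Scalar {
        (r ^ p).wrapping_mul(31).wrapping_add(digit as u64) % ORDER
    }

    // S = sum_i (R_i + H(R_i, P, m_i) * P): one EC multiplication per digit.
    fn aggregate_signature_point(nonces: &[Point], pubkey: Point, digits: &[u8]) -> Point {
        // 0 stands in for the identity element (point at infinity).
        nonces.iter().zip(digits).fold(0, |acc, (&r, &d)| {
            add(acc, add(r, mul(challenge(r, pubkey, d), pubkey)))
        })
    }

    fn main() {
        let nonces = vec![11, 22, 33]; // hypothetical oracle nonces R_i
        println!("S = {}", aggregate_signature_point(&nonces, 7_777, &[0, 1, 1]));
    }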

Adaptor Signature with decomposition and multi-Oracles

Using a similar approach to the one used for decomposition, it is possible to create adaptor signatures encrypted using signature points from multiple oracles. This is done by simply summing up the signature points of each oracle. Assuming oracles A, B and C and numerical decomposition, we can compute a signature point for each oracle as follows:

S_A = S_A,0 + S_A,1 + … + S_A,7
S_B = S_B,0 + S_B,1 + … + S_B,7
S_C = S_C,0 + S_C,1 + … + S_C,7

Then we can compute an aggregate S by summing up the signature points:

S = S_A + S_B + S_C

Understanding

To understand why adaptor signature creation is slow when using numerical decomposition and multiple oracles, we need to note that Elliptic Curve (EC) multiplications are much more computationally intensive than additions. This is because EC multiplication is performed by successively adding a point to itself. So to compute:

n·A

where A is an EC point and n a scalar, we actually do:

A + A + … + A (n times)

While there exist algorithmic optimizations to compute this (like double-and-add), it remains generally true that multiplying an EC point by a scalar is much more costly than adding two points.
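For intuition, here is a toy sketch contrasting multiplication by repeated addition with double-and-add (integer arithmetic stands in for actual EC point operations): even the optimized form costs on the order of 256 doublings and additions per multiplication on secp256k1, against a single addition for combining two points.

    const ORDER: u64 = 2_147_483_647; // toy modulus standing in for the group order
    type Point = u64; // toy stand-in for an EC point

    fn add(a: Point, b: Point) -> Point { (a + b) % ORDER }

    // n * A written as repeated addition (for n >= 1), as in the definition above.
    fn mul_naive(n: u64, a: Point) -> Point {
        (1..n).fold(a, |acc, _| add(acc, a))
    }

    // n * A with double-and-add: O(log n) doublings and additions.
    fn mul_double_and_add(mut n: u64, mut a: Point) -> Point {
        let mut acc: Point = 0; // stand-in for the identity (point at infinity)
        while n > 0 {
            if n & 1 == 1 {
                acc = add(acc, a);
            }
            a = add(a, a); // point doubling
            n >>= 1;
        }
        acc
    }

    fn main() {
        assert_eq!(mul_naive(20, 42), mul_double_and_add(20, 42)); // both compute 20 * A
        println!("20 * A = {}", mul_double_and_add(20, 42));
    }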

Now let’s look again at our equation for computing signature points in the case of numerical decomposition:

S = R0 + H(R0, P, m0)·P + R1 + H(R1, P, m1)·P + … + R7 + H(R7, P, m7)·P

We can see that we have eight multiplications (or, more generally, as many as we have nonces or digits). If we in addition use multiple oracles, this number gets multiplied by the number of oracles. We can thus hypothesize that this increase in the number of multiplications in adaptor point computation is the cause of the slowness of adaptor signature computation. To confirm this, we benchmarked the creation of a single adaptor signature on a single decomposed outcome with 5 oracles and 20 nonces for each oracle. Here are the results:

  • Full adaptor signature computation: 6,444,640 ns/iter (+/- 628,459)
  • Adaptor signature computation from signature point: 280,801 ns/iter (+/- 62,948)
  • Signature point computation: 6,222,608 ns/iter (+/- 667,325)

We can see that indeed most of the time is spent on the computation of the signature point.

Optimizing

Now that we have an understanding of what makes the computation costly, let’s see how we can improve the situation.

Factoring out

Recall the equation for signature point computation in the case of numerical decomposition:

S = R0 + H(R0, P, m0)·P + R1 + H(R1, P, m1)·P + … + R7 + H(R7, P, m7)·P

Again we have eight multiplications, but notice that they all involve the public key P. So we can use something that you probably learned a long time ago: factorization! And thankfully it works just as well with elliptic curve points. So we can rewrite our equation as:

S = (R0 + R1 + … + R7) + (H(R0, P, m0) + H(R1, P, m1) + … + H(R7, P, m7))·P

And we now have a single EC multiplication operation!
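A minimal sketch of the factored computation (toy integer arithmetic stands in for EC operations): the nonces and the challenge hashes are summed first, and the oracle public key is multiplied only once:

    const ORDER: u64 = 2_147_483_647; // toy modulus standing in for the group order
    type Scalar = u64;
    type Point = u64; // toy stand-in for an EC point

    fn add(a: u64, b: u64) -> u64 { (a + b) % ORDER }
    fn mul(s: Scalar, p: Point) -> Point { ((s as u128 * p as u128) % ORDER as u128) as u64 }
    fn challenge(r: Point, p: Point, digit: u8) -> Scalar {
        (r ^ p).wrapping_mul(31).wrapping_add(digit as u64) % ORDER
    }

    // S = (R_0 + ... + R_n) + (H(R_0, P, m_0) + ... + H(R_n, P, m_n)) * P
    fn signature_point_factored(nonces: &[Point], pubkey: Point, digits: &[u8]) -> Point {
        let nonce_sum = nonces.iter().fold(0, |acc, &r| add(acc, r));
        let hash_sum = nonces
            .iter()
            .zip(digits)
            .fold(0, |acc, (&r, &d)| add(acc, challenge(r, pubkey, d)));
        add(nonce_sum, mul(hash_sum, pubkey)) // single EC multiplication
    }

    fn main() {
        let nonces = vec![11, 22, 33]; // hypothetical oracle nonces R_i
        println!("S = {}", signature_point_factored(&nonces, 7_777, &[0, 1, 1]));
    }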

To check, we benchmarked this factorized version against the “regular” one. We expect the factorized version to perform increasingly better than the original one as the number of nonces grows. We verify this by benchmarking both for different numbers of nonces, with 5 oracles. The following graph shows the results:

[Graph: signature point computation time for the original and factorized versions, as a function of the number of nonces, with 5 oracles.]

We can see that while the original version is slightly better when using a single nonce, our optimized version is already faster when using two nonces.

While this is already quite a good improvement, we still need to perform as many EC multiplications as there are adaptor signatures to compute.

Pre-computing

To further reduce the number of EC multiplications when creating a large number of signature points, we need to change strategy. Recall that the signature point for each “final” outcome is the sum of a set of signature points computed using the expected value of each digit. By pre-computing the set of signature points for each possible value of each digit, the maximum number of EC multiplications becomes nb_nonces * nb_outcome_per_nonce, and after that we only need to perform additions. So if we have two nonces and two outcomes per nonce, we need to pre-compute:

S(0,0) = R0 + H(R0, P, 0)·P
S(0,1) = R0 + H(R0, P, 1)·P
S(1,0) = R1 + H(R1, P, 0)·P
S(1,1) = R1 + H(R1, P, 1)·P

where S(i,v) is the signature point for digit i taking value v. Then we can compute all possible combined signature points as follows:

S_00 = S(0,0) + S(1,0)
S_01 = S(0,0) + S(1,1)
S_10 = S(0,1) + S(1,0)
S_11 = S(0,1) + S(1,1)

In this example we didn’t reduce the number of EC multiplications, since we pre-computed 4 signature points and have 4 possible outcomes in total. But with, for example, 10 nonces, instead of performing 2¹⁰ = 1024 EC multiplications, we only need to do 20.
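A sketch of the pre-computation strategy, with toy integer arithmetic standing in for EC operations: the table costs nb_nonces * nb_outcome_per_nonce multiplications, after which each outcome’s signature point is obtained with additions only:

    const ORDER: u64 = 2_147_483_647; // toy modulus standing in for the group order
    type Scalar = u64;
    type Point = u64; // toy stand-in for an EC point

    fn add(a: u64, b: u64) -> u64 { (a + b) % ORDER }
    fn mul(s: Scalar, p: Point) -> Point { ((s as u128 * p as u128) % ORDER as u128) as u64 }
    fn challenge(r: Point, p: Point, digit: u8) -> Scalar {
        (r ^ p).wrapping_mul(31).wrapping_add(digit as u64) % ORDER
    }

    // table[i][v] = R_i + H(R_i, P, v) * P: nb_nonces * base multiplications in total.
    fn precompute(nonces: &[Point], pubkey: Point, base: u8) -> Vec<Vec<Point>> {
        nonces
            .iter()
            .map(|&r| {
                (0..base)
                    .map(|v| add(r, mul(challenge(r, pubkey, v), pubkey)))
                    .collect::<Vec<Point>>()
            })
            .collect()
    }

    // The signature point for one full outcome is now just a sum of table entries.
    fn outcome_signature_point(table: &[Vec<Point>], digits: &[u8]) -> Point {
        digits
            .iter()
            .enumerate()
            .fold(0, |acc, (i, &d)| add(acc, table[i][d as usize]))
    }

    fn main() {
        let nonces = vec![11, 22, 33];             // hypothetical oracle nonces R_i
        let table = precompute(&nonces, 7_777, 2); // base-2 digits
        println!("S(0, 1, 1) = {}", outcome_signature_point(&table, &[0, 1, 1]));
    }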

To see the impact of this optimization, we benchmarked the creation of signature points for all possible outcomes when using 10 nonces, 3 oracles and 2 outcomes per nonce, for the “original” version, the factorized version and the pre-computation version.

  • Original: 1,935,444,337 ns/iter (+/- 187,182,386)
  • Factorized: 518,583,393 ns/iter (+/- 19,961,490)
  • Pre-computation: 142,026,577 ns/iter (+/- 6,604,498)

We can see that as expected, the pre-computation version performs much better.

Memoization

To further optimize, we should note that in the above case, we re-compute some additions. For example, when computing the signature points for the outcomes with values “0 0 0 0 0 0 0 0 0 0” and “0 0 0 0 0 0 0 0 0 1”, the intermediary signature point for the value “0 0 0 0 0 0 0 0 0” is computed twice. By caching intermediate values, we can greatly reduce the number of additions performed. Roughly, the memoization algorithm that we apply works as follows:

  1. Compute the signature point for the value “0” with nonce R0,
  2. Compute the signature point for the value “0 0” by adding the signature point computed using nonce R1 and value “0” to the previously obtained value,
  3. Compute the signature point for the value “0 0 0” by adding the signature point computed using nonce R2 and value “0” to the previously obtained value (the point for “0 0”),

Once we have computed the signature point for the outcome value “0 0 0 0 0 0 0 0 0 0”, we backtrack and reuse the signature point computed for value “0 0 0 0 0 0 0 0 0” to compute the signature point for value “0 0 0 0 0 0 0 0 0 1”. And so on and so forth until we have computed a signature point for all possible outcome values.
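A sketch of this memoized enumeration, reusing the pre-computed per-digit points from the previous section (toy integer arithmetic again stands in for EC operations): the running prefix sum plays the role of the cached intermediate signature points, so each outcome only costs one addition per digit that changes:

    const ORDER: u64 = 2_147_483_647; // toy modulus standing in for the group order
    fn add(a: u64, b: u64) -> u64 { (a + b) % ORDER }

    // Enumerates the signature points of all outcomes (base^depth of them) in
    // lexicographic order. `table[i][v]` is the pre-computed point for digit i
    // having value v; `prefix` caches the sum for the digits fixed so far.
    fn all_signature_points(table: &[Vec<u64>]) -> Vec<u64> {
        fn walk(table: &[Vec<u64>], level: usize, prefix: u64, out: &mut Vec<u64>) {
            if level == table.len() {
                out.push(prefix); // prefix is now the signature point of a full outcome
                return;
            }
            for &point in &table[level] {
                // Reuse `prefix` (the sum for the first `level` digits) for every
                // value of the current digit instead of recomputing it from scratch.
                walk(table, level + 1, add(prefix, point), out);
            }
        }
        let mut out = Vec::new();
        walk(table, 0, 0, &mut out);
        out
    }

    fn main() {
        // Two nonces, two outcomes per nonce (toy pre-computed values).
        let table = vec![vec![10, 20], vec![1, 2]];
        println!("{:?}", all_signature_points(&table)); // [11, 12, 21, 22]
    }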

Using the same benchmark as above, we obtain the following result:

  • Memoization: 42,049,313 ns/iter (+/- 2,844,778)

This gives us an algorithm more than 3 times faster than the pre-computation version! We can expect that as the number of nonces increases, the performance gain will as well.

Parallelizing

Another common trick to improve the performance of an algorithm is to parallelize operations, making it possible to run them on separate cores. While it is not always possible or worth doing, in our case we can easily parallelize the signature point computation for each oracle, as well as their aggregation. So instead of sequentially computing:

S_A = S_A,0 + S_A,1 + … + S_A,7
S_B = S_B,0 + S_B,1 + … + S_B,7
S_C = S_C,0 + S_C,1 + … + S_C,7
S = S_A + S_B + S_C

We can instead compute (where || means “in parallel”):

S_A || S_B || S_C

then

S_AB = S_A + S_B

and finally

S = S_AB + S_C

(note that in practice the order in which the operations are performed is non-deterministic)
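As a sketch of one way to do this in Rust (the article does not say which library was used; the rayon crate is assumed here as a dependency, and toy integer arithmetic stands in for EC operations), each oracle’s aggregate signature point is computed on its own thread and the results are then summed:

    use rayon::prelude::*;

    const ORDER: u64 = 2_147_483_647; // toy modulus standing in for the group order
    fn add(a: u64, b: u64) -> u64 { (a + b) % ORDER }
    fn mul(s: u64, p: u64) -> u64 { ((s as u128 * p as u128) % ORDER as u128) as u64 }
    fn challenge(r: u64, p: u64, digit: u8) -> u64 {
        (r ^ p).wrapping_mul(31).wrapping_add(digit as u64) % ORDER
    }

    struct OracleInfo {
        pubkey: u64,
        nonces: Vec<u64>,
    }

    // Factored single-oracle computation, as shown earlier.
    fn oracle_signature_point(oracle: &OracleInfo, digits: &[u8]) -> u64 {
        let nonce_sum = oracle.nonces.iter().fold(0, |acc, &r| add(acc, r));
        let hash_sum = oracle.nonces.iter().zip(digits)
            .fold(0, |acc, (&r, &d)| add(acc, challenge(r, oracle.pubkey, d)));
        add(nonce_sum, mul(hash_sum, oracle.pubkey))
    }

    // S = S_A + S_B + S_C, with each S_x computed in parallel; the reduction order
    // is non-deterministic, which is fine since point addition is associative.
    fn combined_signature_point(oracles: &[OracleInfo], digits: &[u8]) -> u64 {
        oracles.par_iter()
            .map(|o| oracle_signature_point(o, digits))
            .reduce(|| 0, add)
    }

    fn main() {
        let oracles = vec![
            OracleInfo { pubkey: 7_777, nonces: vec![11, 22, 33] },
            OracleInfo { pubkey: 8_888, nonces: vec![44, 55, 66] },
            OracleInfo { pubkey: 9_999, nonces: vec![77, 88, 99] },
        ];
        println!("S = {}", combined_signature_point(&oracles, &[0, 1, 1]));
    }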

Using the same benchmark as before, we obtain the following results:

  • Memoization + Parallelization: 24,089,747 ns/iter (+/- 5,290,419)

This is roughly a 2 times improvement. If we increase the number of oracles, the improvement will increase as well.

Conclusion and future work

Thanks to reducing the number of EC multiplications and applying parallelization, we were able to drastically improve the computation of signature points. As DLC is a peer-to-peer protocol, it is expected that adaptor signatures will be computed on end-user devices, making it important to reduce the computational resources required. And while it might not always be possible to apply parallelization, the pre-computation and memoization approaches have already been shown to provide great benefits.

We will be implementing these optimizations in our P2PDerivatives application in the near future to hopefully further improve user experience.

Finally, Nadav Kohen mentioned that we could further improve performance by adapting the batch verification algorithm described in BIP340. A proposal from Lloyd Fournier to replace the message hashes by indexes could also yield further gains. We’ll hopefully cover that in a follow-up article, so don’t forget to follow us on Twitter to be notified!

Acknowledgements

Thanks to Lloyd Fournier, Ichiro Kuwahara, Nadav Kohen and Matthew Black for useful suggestions and discussions that led to the various optimizations described in this article.
