Mathemagic: Multiplicative Inverses

Remco Bloemen (@wicketh)
Jan 13, 2018 · 6 min read

In this post, I will show a trick to divide a large number using only a small multiplication by an inverse. I will show two ways of computing these inverses and how to build a small fast lookup table. In the process, we will see that a single instruction can sometimes be more expensive than many cheaper instructions!

In this post, I build a function to compute the modular multiplicative inverse for modulus 2²⁵⁶. Given a number a, I want a number r such that

[a · r]_(2²⁵⁶) = 1
Note on notation: As before, I use square brackets with a subscript to denote the modulus operation.

We write the solution formally as

r = [a⁻¹]_(2²⁵⁶)
This number r has an interesting property: multiplication by r is the same as division by a. Let me illustrate this with an example. Take 1000 as the modulus and let a be 73. The inverse of 73 is 137:

[73 · 137]_(1000) = [10,001]_(1000) = 1
If I want to divide 876 by 73, I can multiply 876 by 137 which is 120,012 and take the last three digits, 12. Indeed 876 = 12 · 73. This is an interesting party trick.

The real magic happens with larger numbers. Let’s say we want to divide 46,209 by 73. For this, we multiply 209 by 137 and take the last three digits. This is 633 and indeed 46,209 = 633 · 73. But wait! We did not even use the digits 46! How can we divide a number without even looking at the full number? This trick works when 1) the division is exact, 2) the multiplicative inverse exists, and 3) the result fits in three digits.
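The trick above can be checked mechanically. Here is a quick sketch in Python, using Python’s built-in modular inverse:

```python
# Dividing by 73 via multiplication by its inverse, modulus 1000.
M = 1000
a = 73
r = pow(a, -1, M)  # modular inverse, available since Python 3.8
assert r == 137

# Divide 46,209 by 73 using only its last three digits, 209:
q = (209 * r) % M
assert q == 633 and 633 * 73 == 46209
```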

Those who have read my post on 512-bit multiplication can guess where I’m going with this. We can use this to do division on 512-bit numbers with a 256-bit multiply! But before we get to that, we first need a way to compute these inverses.

The Fermat-Euler-Carmichael theorem

This funny theorem is an elementary result in number theory. It started out as Pierre de Fermat’s little theorem (1640), was later proven and generalized by Leonhard Euler (1736), and was finally perfected by Robert Carmichael (1914). The theorem is:

[a^(λ(m))]_(m) = 1 whenever a and m are coprime,
where λ(m) is the Carmichael function. This implies that for numbers modulo m, the exponents behave as numbers modulo λ(m). For our modulus λ(2²⁵⁶) = 2²⁵⁴. Thus, for 256-bit integers, exponents behave as 254-bit integers. This also extends to negative exponents, and we can use that to get modular inverses:

[a⁻¹]_(2²⁵⁶) = [a^(2²⁵⁴ − 1)]_(2²⁵⁶)
Solidity supports exponentiation, so we can directly implement this:

Inversion using the Fermat-Euler-Carmichael theorem
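The Solidity source itself was embedded as a gist and is not reproduced here; as an illustration of the same computation, a Python sketch (the function name is mine):

```python
M = 2**256

def inverse_exp(a: int) -> int:
    """Inverse mod 2**256 via the Fermat-Euler-Carmichael theorem:
    a**(-1) = a**(lambda(m) - 1) = a**(2**254 - 1), for odd a."""
    return pow(a, 2**254 - 1, M)

assert (inverse_exp(73) * 73) % M == 1
```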

This compiles to a single EXP instruction with a constant exponent, which is about as short as it can get. But this function takes no less than 1614 gas! This is because the EXP operation takes 10 gas plus 50 gas for every byte in the exponent. In our case the exponent is almost all ones, so we pay the maximum price of 1600.

Why is this operation so expensive? In the virtual machine, the operation is likely implemented using exponentiation by squaring. In that case, a single EXP instruction does the same amount of work as about 512 MUL instructions. At five gas each, those MULs would cost about 2560 gas. In light of this, the EXP operation is not so oddly priced. In fact, it used to be a bit cheaper, but that led to problems and the price had to be raised in EIP-160.

Can we do better? Yes. We could use the extended Euclidean algorithm to find r and s such that r · a + s · m = gcd(a, m) = 1. This is generally faster than exponentiation. But there is an even better way.
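For reference, a minimal Python sketch of that identity (the helper name is mine):

```python
def egcd(a: int, b: int) -> tuple:
    """Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

# For odd a and m = 2**256, g == 1 and x is the inverse (up to sign).
g, x, _ = egcd(73, 2**256)
assert g == 1 and (73 * x) % 2**256 == 1
```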

Newton-Raphson-Hensel inversion

The Newton-Raphson method can be used to approximate division in real numbers. Thanks to Hensel’s lemma, this works even better in modular arithmetic. For powers of two it is the recurrence:

r_(k+1) = [r_k · (2 − a · r_k)]_(2²⁵⁶)
For odd numbers a, the values of this recurrence satisfy the formula

[a · r_k]_(2^(2^k)) = 1
Each iteration doubles the number of correct bits. This means that we want to compute r₈ to get the modular inverse for a 256-bit number.

We can skip one iteration by observing that r₁ = a for all odd values of a. This leaves only seven iterations to compute.
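In Python, the seven remaining iterations look like this (a sketch; the post’s Solidity version performs the same arithmetic):

```python
M = 2**256

def inverse_newton(a: int) -> int:
    """Newton-Raphson-Hensel inversion mod 2**256, for odd a.
    Starting from r1 = a (a * a == 1 mod 4 for odd a), each step
    r <- r * (2 - a*r) doubles the number of correct bits, so r8,
    reached after seven iterations, is correct modulo 2**256."""
    r = a  # r1 = a for all odd a
    for _ in range(7):
        r = (r * (2 - a * r)) % M
    return r

assert (inverse_newton(73) * 73) % M == 1
```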

Small lookup table

We can skip one more iteration by creating a small lookup table for r₂. The values of a where an inverse exists are a ∈ {1, 3, 5, …, 15}, with corresponding inverses [1, 11, 13, 7, 9, 3, 5, 15].

For other values of a we use zero instead of the real values of r₂. Using zeros has the advantage that it makes the final result zero when an inverse does not exist. Zero is never a valid inverse, so this is a nice flag value.

A sixteen entry lookup table for r₂
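The Solidity snippet itself lived in an embedded gist; in Python, the table lookup amounts to the following (names are mine):

```python
# Inverses modulo 16 at the odd indices; zero at the even indices flags
# that no inverse exists.
R2_TABLE = [0, 1, 0, 11, 0, 13, 0, 7, 0, 9, 0, 3, 0, 5, 0, 15]

def r2(a: int) -> int:
    """Look up r2, the inverse of a modulo 16, or 0 if a is even."""
    return R2_TABLE[a & 15]

assert (r2(73) * 73) % 16 == 1
assert r2(4) == 0
```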

The instructions the Solidity compiler (v. 0.4.18) produces from this snippet are a bit disappointing. An unnecessary bounds check is inserted. A multiplication is done to typecast from uint8 to bytes1 which is immediately undone by a division to convert from bytes1 to uint256. Finally, the % 16 is not replaced by the cheaper & 15. Let’s fix all this:

A much faster lookup table for r₂

With these optimizations, the table lookup is a net gain of a few gas. Going further is currently not worth it. For r₃ we would need a lookup table of 256 entries. This could theoretically be done using CODECOPY and MLOAD, but that is currently unsupported. For r₄ the table would have to be 65 kilobytes, which is not worth the code size.

Conclusion

Combining the lookup table with six iterations results in the final implementation:

Multiplicative inverse modulo 2²⁵⁶
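The original Solidity gist is not shown above; the structure of the final function, sketched in Python:

```python
M = 2**256

# r2 lookup: inverses mod 16 at odd indices, zero (the "no inverse"
# flag) at even indices.
R2_TABLE = [0, 1, 0, 11, 0, 13, 0, 7, 0, 9, 0, 3, 0, 5, 0, 15]

def inverse(a: int) -> int:
    """Multiplicative inverse modulo 2**256. The table gives r2 (correct
    modulo 16); six Newton-Hensel iterations then double the precision
    each time, reaching 2**256. Returns 0 when no inverse exists, since
    0 * (2 - a*0) stays 0."""
    r = R2_TABLE[a & 15]
    for _ in range(6):
        r = (r * (2 - a * r)) % M
    return r

assert (inverse(73) * 73) % M == 1
assert inverse(4) == 0
```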

Despite having 20 times more operations, this function takes only about 154 gas, less than 10% of the EXP-based implementation!

With this inversion function, I finally have all the tools I need to solve the problem I wanted to solve. In the next post, I will work around the constraints and create a proportion function that never fails.

Update: I have since learned that r₂ = (3 · a) ^ 2, where ^ is XOR, works instead of the table (Montgomery, cited in Mayer 2016). It is slightly cheaper and requires no assembly, but it does not return zero when the inverse does not exist.
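A quick check of that identity (in Solidity, as in Python, ^ is XOR and binds looser than multiplication):

```python
# Montgomery's trick: for odd a, (3*a) XOR 2 inverts a modulo 16
# (in fact modulo 32), so it can seed the Newton-Hensel iterations.
for a in range(1, 32, 2):
    assert (a * ((3 * a) ^ 2)) % 32 == 1

# For even a the result is nonzero garbage rather than the zero flag:
assert (3 * 4) ^ 2 != 0
```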

If you enjoy these posts and/or want to encourage me to write more, please clap & follow!
