Mathemagic: 512-bit Division

Remco Bloemen
wicketh
Published in
5 min readDec 19, 2017

In the previous posts, I introduced the Chinese Remainder Theorem and used it to build a full multiplication function. In this post, I will introduce five small tricks and use them to build a simple division algorithm.

Find the Largest Power of Two Divisor of a Given Number

Given a number a. Find the largest power of two, r = 2ⁿ, that divides a given number. Equivalently, find a number r such that a / r is an exact division and its quotient is an odd number.

The answer is -a & a. It is hard to explain mathematically because it combines arithmetic with bitwise operations. But an example in binary makes it intuitively clear:

The key insight is that negation flips all high bits of a number up to but not including the least significant one. A simple and then removes all the flipped bits and we are left with only the least significant one.

For completeness, I’ll present it in Solidity as well:

Compute the largest power of two dividing a number

For zero there is no divisor, and the function returns zero. For odd numbers, it returns one. This makes sense as it is the largest power of two (1 = 2⁰) that divides the number.

Divide 2²⁵⁶ by a Given Number

The number 2²⁵⁶ is fundamental to working with the Ethereum virtual machine, so it’s useful to do some tricks with it. It can not be expressed directly, but we can express fractions of it. For a given number a, we can compute 2²⁵⁶ divided by a. For this I found a trick involving negated numbers:

Implemented in Solidity and optimized, this becomes:

Divide ²²⁵⁶ by a given number

For the answer to be defined and fit in a single word, a needs to be larger than one. The assembly is to avoid a redundant zero check on division.

Compute 2²⁵⁶ Modulo a Given Number

Like the previous, we can also compute the remainder of 2²⁵⁶ divided by a instead of the quotient. The same trick with negation works:

Again, in Solidity:

Remainder of 2²⁵⁶ divided by a given number

The answer always exists for non-zero a and is strictly less than a. The assembly is to avoid a redundant zero check.

Add / Subtract Two 512-bit Numbers

The multiplication algorithm from before produces 512-bit numbers. To add two such numbers we can use an expression like:

Simple 512-bit addition

The second line has an extra term that checks for carry and adds it. It takes about 65 gas. The Solidity compiler (v. 0.4.18) does not produce optimal code here. The ternary expression turns in to a branch. We can avoid this. Observe the yellow paper definition of the LT, the less-than instruction:

This function already returns zero or one exactly as we want it. We can thus add it directly. Casting a boolean to a number is not allowed in Solidity, so we need to switch to assembler again:

Add two 512 bit numbers

This function takes only 10 gas (when measured by comparing to a function returning zero). Similarly for subtraction:

Subtract 512-bit numbers

Dividing a 512-bit Number by a 256-bit Number

To show the usefulness of the above functions, I’ll make a simple division algorithm. We want to build a 512-by-256-bit division function, that is, we want to compute x such that:

Where a₀ and a₁ are the lower and higher half of the 512-bit number and b is the 256-bit number. The result, x, will also be 512-bit. So it is also split into its low and high bits x₀ and x₁.

To solve it, first, we observe the following. With the div256 and mod256 functions we can compute two numbers q = div256(b) and r = mod256(b) . These then satisfy:

If we substitute this identity in the fraction, we can split out a term:

Because r is smaller than 2²⁵⁶, the new numerator is smaller than the one we had before — we made some progress. We can keep repeating this process. Each time splitting of a term and computing the new numerator. Eventually, we reach a point where the new x₁ is zero. When that happens we do a regular division of x₀ by c. The final result is the sum of all the x₁ · q terms and the final division. This creates an elegant algorithm:

512-by-256-division for b > 1

How many iterations does the loop take? In the worst case is r is as large as possible because then we ‘scale back’ x₁ by the smallest amount. The largest r can get is 2²⁵⁵ — 1, when c equals 2²⁵⁵ + 1. This means x₁ is at least halved every iteration. Since x₁ is 256-bit, it takes at most 256 iterations.

As presented, it requires b > 1. A special case for b = 1 is easily added, just return the input (a₀, a₁).

I did a bit of searching, but could not find this approach to division in the literature; a division algorithm where the radix is rewritten in terms of the denominator. Please tell me if you know one.

With these five new tools, we are developing a nice toolbox. We can already tackle some interesting problems. The division algorithm shown here is not optimized because there are better approaches. The really good ones require one more tool, modular inverses, which I will present in the next post.

If you enjoy these posts and/or want to encourage me to write more, please clap & follow!

--

--

Remco Bloemen
wicketh

Doing JFKs “other things”. Technical Fellow at 0x.org.