OpenVDF: Modular Squaring Unit

Supranational
May 5, 2020

In the first post of this series we presented an ASIC architecture for the Öztürk modular squaring unit. We then studied the various components required to build this circuit, looking for the best way to implement each of the pieces.

In this final post of the series we will put the pieces together and present an implementation of the modular squaring unit along with some initial results.

We can now walk through the various components of the design above and summarize our design and implementation parameters.

Partial product generation (PPG): We looked at using either AND gates or multipliers (standard unsigned or Booth-encoded) and evaluated each approach in the context of the entire squaring operation. The results point to using AND gates and leaving the CSA trees intact.
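To make the AND-gate option concrete, here is a minimal bit-level Python sketch that assumes a single 16-bit coefficient. Each partial product bit is one AND of an input bit pair, weighted by 2^(i+j). Summing those bits is the job of the downstream CSA tree and CPA. The squaring variant applies the standard symmetry trick: a_i·a_j appears twice, and a_i·a_i = a_i. We are not claiming that the OpenVDF RTL is structured this way, and all function names here are hypothetical.

```python
def and_partial_products(a: int, b: int, width: int) -> list[int]:
    """All width*width partial products: bit a_i AND bit b_j, weighted by 2^(i+j)."""
    pps = []
    for i in range(width):
        a_i = (a >> i) & 1
        for j in range(width):
            b_j = (b >> j) & 1
            pps.append((a_i & b_j) << (i + j))  # one AND gate per bit pair
    return pps


def multiply_via_and_gates(a: int, b: int, width: int = 16) -> int:
    # In hardware, the CSA tree plus a final CPA performs this summation.
    return sum(and_partial_products(a, b, width))


def square_partial_products(a: int, width: int) -> list[int]:
    """Partial products for a*a, exploiting a_i*a_j == a_j*a_i and a_i*a_i == a_i."""
    pps = []
    for i in range(width):
        a_i = (a >> i) & 1
        pps.append(a_i << (2 * i))  # diagonal term: a_i AND a_i == a_i, no gate needed
        for j in range(i + 1, width):
            a_j = (a >> j) & 1
            # This pair appears twice in a*a, so keep one copy shifted one extra bit.
            pps.append((a_i & a_j) << (i + j + 1))
    return pps
```

For a 16-bit input, the squaring variant produces 136 partial product terms instead of 256, and `sum(square_partial_products(x, 16))` equals `x * x`.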

Squaring and reduction CSA trees: We looked at various ways to code CSAs (explicit trees vs “+”) as well as building single-level or multi-level trees. For this design, the results point to using single-level trees coded with “+”.
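The following Python sketch shows what an explicit carry-save tree computes, assuming non-negative integer operands. `csa` is a 3:2 compressor, which amounts to full adders applied bitwise. `csa_tree` is a hypothetical helper that compresses operands level by level until only two remain for the final CPA. In the design itself the trees were written with “+”, and synthesis was left to build the compressor structure.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                                # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit majority, moved up one position
    return s, c


def csa_tree(operands: list[int]) -> list[int]:
    """Reduce any number of operands to exactly two using levels of 3:2 compressors."""
    ops = list(operands)
    while len(ops) > 2:
        next_level = []
        # Compress groups of three. Any leftovers pass through to the next level.
        while len(ops) >= 3:
            s, c = csa(ops.pop(), ops.pop(), ops.pop())
            next_level += [s, c]
        next_level += ops
        ops = next_level
    return ops + [0] * (2 - len(ops))  # pad to two operands for the final CPA
```

Each level turns every three operands into two, so the number of levels grows only logarithmically with the operand count. The single carry-propagate addition happens once, at the end: `a, b = csa_tree(ops); total = a + b`.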

Squaring and reduction CPAs: For CPAs we considered various adder styles (ripple-carry, Brent-Kung, Kogge-Stone, “+”). With modern tools, the best approach is simply to use “+” and let the tools optimize for the design constraints.
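For reference, the sketch below models two of the adder styles named above at the bit level. It assumes unsigned `width`-bit inputs and no carry-in. Ripple carry resolves carries one bit at a time, which takes `width` serial steps. Kogge-Stone computes all carries with a parallel prefix over generate/propagate signals in about log2(width) levels. These are behavioral models for illustration only. The design itself writes “+” and lets the tools choose.

```python
def ripple_carry_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Bit-serial ripple-carry adder. Returns (sum mod 2^width, carry_out)."""
    carry, result = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))  # carry must ripple to the next bit
    return result, carry


def kogge_stone_add(a: int, b: int, width: int) -> tuple[int, int]:
    """Kogge-Stone parallel-prefix adder. Returns (sum mod 2^width, carry_out)."""
    mask = (1 << width) - 1
    g = a & b          # generate: this bit produces a carry
    p = a ^ b          # propagate: this bit passes an incoming carry along
    half_sum = p
    d = 1
    while d < width:   # about log2(width) prefix levels
        g = g | (p & (g << d))   # combine each group with the group d bits below it
        p = p & (p << d)
        d <<= 1
    # After the prefix, bit i of g is the carry out of bits [0, i].
    carries = (g << 1) & mask
    return (half_sum ^ carries) & mask, (g >> (width - 1)) & 1
```

Both functions return the same result. The difference lies in logic depth, which is exactly the trade-off synthesis tools weigh when they map “+” to an adder architecture.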

Ideal coefficient bitwidth: In the polynomial multiplier we are free to choose the coefficient size. The FPGA design used 16-bit coefficients to take advantage of the high-speed 17-bit DSPs. In an ASIC we can size resources to fit the need. Nonetheless, 16 bits turned out to be within the margin of error of the optimum, so we kept that value.
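As an illustration of the coefficient representation, here is a minimal Python sketch. It assumes 16-bit coefficients, the value chosen above, and uses a schoolbook polynomial square. Each output coefficient accumulates products of up to 32 bits each, so the column sums need extra headroom bits. For a 2048-bit operand (128 coefficients), a column sum stays below 2^39. The names `to_coeffs`, `poly_square`, and `from_coeffs` are hypothetical. The final carry propagation is done with Python integers purely to check the result.

```python
COEFF_BITS = 16  # assumed coefficient width, matching the choice described above


def to_coeffs(x: int, n_bits: int, w: int = COEFF_BITS) -> list[int]:
    """Split an n_bits-wide integer into little-endian w-bit coefficients."""
    count = -(-n_bits // w)  # ceiling division
    return [(x >> (w * i)) & ((1 << w) - 1) for i in range(count)]


def poly_square(coeffs: list[int]) -> list[int]:
    """Schoolbook polynomial square. Column sums are left unnormalized (redundant)."""
    n = len(coeffs)
    out = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            out[i + j] += coeffs[i] * coeffs[j]
    return out


def from_coeffs(coeffs: list[int], w: int = COEFF_BITS) -> int:
    """Evaluate the polynomial at 2^w, which propagates all carries between columns."""
    return sum(c << (w * i) for i, c in enumerate(coeffs))
```

For example, `from_coeffs(poly_square(to_coeffs(x, 256)))` equals `x * x` for any 256-bit `x`. The widest column value indicates how many guard bits each accumulator needs.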

Lookup tables: The FPGA design used 8-bit addressable lookup tables to store the reduction constants in order to take advantage of the low-latency on-chip BRAM resources. Since we assume a fixed modulus and are no longer bound by fixed FPGA resources, the ASIC design uses 1-bit reduction lookups. Because each lookup is addressed by a single bit, on average 50% of the reduction constant bits are 0 and can be optimized away, while the other 50% each manifest as a single AND gate that can be merged into the reduction accumulator trees. We did consider larger lookup tables in the ASIC context, but for a fixed modulus 1-bit lookups were the preferred approach.
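The 1-bit lookup idea can be sketched in Python as follows. The sketch assumes a fixed modulus `M` and an input whose bits above position `lo_bits` must be folded back. For each upper bit position k, the table holds the single constant 2^k mod M, addressed by that one bit. When the bit is 0 the contribution is 0, and otherwise the constant is added to the accumulator. Constant bits that are 0 disappear entirely in hardware, which is where the roughly 50% savings comes from. This is a simplified integer-level model with hypothetical names, not the coefficient-level structure of the actual design. Its result is congruent to the input modulo M but not fully reduced.

```python
def reduction_constants(modulus: int, lo_bits: int, hi_bits: int) -> dict[int, int]:
    """Hypothetical helper: one precomputed constant 2^k mod M per upper bit position k."""
    return {k: pow(2, k, modulus) for k in range(lo_bits, hi_bits)}


def reduce_with_1bit_luts(x: int, modulus: int, lo_bits: int, hi_bits: int) -> int:
    """Fold bits [lo_bits, hi_bits) of x back below lo_bits using 1-bit lookups.

    Assumes 0 <= x < 2^hi_bits. The result is congruent to x mod modulus only.
    """
    consts = reduction_constants(modulus, lo_bits, hi_bits)
    acc = x & ((1 << lo_bits) - 1)  # the low part passes through unchanged
    for k in range(lo_bits, hi_bits):
        x_k = (x >> k) & 1  # the single address bit for this table
        # Each output bit is x_k AND a constant bit: zero bits vanish, one bits need a gate.
        acc += consts[k] * x_k
    return acc
```

Because the modulus is fixed, every constant is known at design time. Only the set bits of each constant turn into logic feeding the accumulator trees.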

In summary, the design parameters:

Results

Below you’ll find the relative performance of the Öztürk modular squaring unit relative to a 256-bit baseline:

For production use of VDFs we are currently evaluating a modulus of 2048 bits or larger. In concrete terms, a 2048-bit MSU is expected to perform one modular squaring (square and reduce) in the low single digits of nanoseconds, in an area of roughly 25 mm². In a future blog post we’ll explore the initial power estimates and identify some opportunities for power reduction.

Source Code

Source code for the designs presented here is available on GitHub at supranational/hardware and is licensed under the Apache 2 license.

Acknowledgements

We’d like to give a special acknowledgement to Synopsys® for their support of this work.
