# Introducing Monolith, for faster hashing in ZK settings

## Evaluating hash functions is a major bottleneck in many modern proving systems. Introducing Monolith, the fastest hash function to-date.

--

*This work is the result of a collaboration between Horizen Labs and various universities and other companies, including Ethereum Foundation and Graz University of Technology. For more details we refer to the **full paper**.*

The evaluation of hash functions is a major bottleneck in many modern proving systems. While using standardized hash functions such as SHA-2 or SHA-3 is certainly possible, using these classical constructions often result in a fast plain evaluation but **expensive proof generations**.

In recent years, new “arithmetization-oriented” or “circuit-friendly” hash functions were designed for settings of computational integrity. The Poseidon hash function, one of these new hash functions, is widely used in many different frameworks and projects in the ZK space.

**Here we introduce Monolith, a new family of hash functions.** Together with all the other recently proposed hash functions, it may seem that there are almost too many different hash functions to choose from. In the following, we’ll give a background of some clear distinctions between these proposed hash functions, before explaining Monolith and its advantages.

# Hash Functions in ZK Settings

Some components in modern (zero-knowledge) proving systems require the evaluation of hash functions, or more precisely chains of hash function evaluations. In Plonky2, this step needs a significant amount of the total time required to generate a proof.

Symmetric cryptographers have recently started to design hash functions to make modern proving systems more efficient. Multiple of these hash functions are now proposed every year, which makes it difficult to maintain an overview of the different tradeoffs.

# From Poseidon to Newer Approaches

Generally speaking, components with a *low degree* when written down as a mathematical function are more efficient in (zero-knowledge) proving systems. For example, the function f(x) = x³ is generally more efficient than g(x) = x⁸²⁶³¹⁷. The idea for the Poseidon permutation was, therefore, to only use low-degree components and to iterate them until a sufficiently large degree is reached in order to withstand cryptographic attacks. Of course, this means that potentially many of these iterations are necessary in order to reach a very large degree.

Hash functions like Poseidon2, Neptune, and GMiMC-based approaches follow the same approach. While many iterations of low-degree components (or “rounds” in symmetric cryptography terms) are needed, these functions usually provide a good plain performance (i.e., a fast computation time on software).

# From Low-Degree Components to High-Degree Components

In 2018, the Friday hash function was proposed. The main idea of the paper is that using exclusively low-degree components may not be the best approach when reducing the number of constraints (i.e., increasing the circuit efficiency) in a proving system. In particular, the authors use the fact that the generally high-degree inverse function *f(x) = x^(-1)* can be rewritten as *f(x) = x^(-1) = y* if and only if *x*y* = *1 *for a nonzero x (some details are omitted for simplicity). This means that an otherwise high-degree function can be represented as a degree-*2* constraint (degree *2* due to the multiplication of *x* and *y*).

While the Friday hash function was broken shortly after its publication, its main idea, later named *non-procedural computation*, is still a very relevant result. Indeed, the popular and efficient hash functions Rescue and Rescue-Prime and newer ones such as Griffin and Anemoi follow exactly this approach. The advantage is that fewer rounds are needed to achieve cryptographic security, and thus the circuit efficiency in terms of constraints can often be increased significantly. However, the high-degree functions need to be evaluated in software when computing a hash value, and these evaluations are generally expensive. As a result, hash functions following this approach achieve higher circuit efficiency, but often a lower plain performance in software.

# Combining Good Plain Performance with Circuit Efficiency

The two approaches shortly described above essentially offer two different tradeoffs. If we want to achieve the fastest hash computation, we opt for the first approach. If we want to reduce the number of constraints and hence achieve more efficient proofs, we opt for the second approach. But what if we want to achieve both? We could then efficiently compute hash values used in the proving system while at the same time maintaining a circuit-friendly representation in terms of constraints.

Luckily, a new component, called the *lookup argument*, allows us to do exactly that. In particular, it allows us to evaluate *any* function f(x) = y efficiently inside the circuit, as long as x and y cannot take too many different values (for example, 256 different values is perfectly fine). Following this definition, the function f can then essentially be defined as a small “lookup table”, where for each of the different input values a specific “output” or “result” is chosen. In software, this can be implemented rather efficiently using only memory accesses to different addresses.

Arguably, the first new hash function following this approach is Reinforced Concrete from 2021. It was immediately the fastest hash function for a particular type of proving system. Unfortunately, however, it isn’t possible to use Reinforced Concrete efficiently in other proving systems. This issue was solved with the Tip5 hash function in 2023, which makes it possible to increase the performance in proving systems for which Reinforced Concrete was not designed.

This already brings us to Monolith, which is the most recent hash function following the lookup-based approach at the time of writing this document.

# What is Monolith?

Just like Poseidon or Poseidon2, Monolith is a family of hash functions optimized for zero-knowledge protocols (or, more generally, computational integrity use cases). By default it supports 31-bit and 64-bit prime fields, which have gained popularity recently due to their advantageous implementation characteristics. Indeed, arithmetic operations like additions and multiplications are particularly efficient when using these fields.

Moreover, Monolith relies on the new “lookup argument” to speed up its computation. However, in contrast to other lookup-oriented schemes like Reinforced Concrete or Tip5, the lookup table is not actually stored in the software implementation, and instead simple CPU-friendly instructions are used. This has the following two advantages.

- It makes a parallelized implementation comparatively straightforward, which helps making it faster than actually using the lookup tables directly in software.
- It allows for a constant-time implementation, which is not possible for Reinforced Concrete or Tip5 in an efficient way. This mitigates the risk of side-channel timing attacks, which have for example been applied to AES in the past.
- To the best of our knowledge, this is the first time that CPU-friendly lookup-based S-boxes over a binary field are used in combination with a larger element of a prime field.

# How does Monolith work?

Similar to the Poseidon or Rescue permutation, the Monolith one consists of multiple round functions. A single round function is defined as given in the following illustration.

One round is divided into Bars, Bricks, and Concrete. Concrete is defined by a matrix multiplication and Bricks is essentially a nonlinear component called a “Type-3 Feistel network” in the area of symmetric cryptography. The most novel component is Bars, which implements the lookup-based behavior. It is responsible for achieving security against algebraic attacks while still being efficiently computable. Monolith uses 6 rounds, i.e., it iterates the above round function 6 times.

The components Bricks and Concrete are responsible for the security against statistical attacks. Bricks is chosen such that the number of multiplications is minimized. Indeed, while many competitors like Poseidon, Rescue, or Tip5 use low-degree monomial power maps such as f(x) -> x⁷, using the square map x -> x² reduces the number of multiplications significantly. Moreover, it means that we only need degree-2 constraints for this layer, which is again a major advantage of Monolith.

# Applications

We suggest using Monolith when two conditions are fulfilled. First, a proving system focusing on small prime fields is needed. This is the case for most FRI-based approaches, as used for example in Risc0, Plonky2, and Plonky3. Moreover, the proving system should support the lookup argument. Implementing Monolith without the lookup argument is certainly also possible, but it results in a larger proving cost.

We implemented Monolith and compared its performance against competing constructions. Our focus initially was plain performance, but of course the circuit performance also played a major role during the design phase.

In the following table, we analyze the plain performance of Monolith.

As can be seen, Monolith is significantly faster than all other arithmetization-oriented (circuit-friendly) constructions, especially when being used in compression mode with a state size of eight 64-bit elements. This particular instance is even faster than SHA-3, which makes Monolith the first circuit-friendly construction faster than a popular classical hash function.

In the next table, we compare the circuit performance of Monolith. Here we focus on the number of lookups being used and on the area-degree product, which gives a good indication of how efficient the function is when implemented in an arithmetic circuit.

We can observe that Monolith is very efficient also in terms of circuit performance. Indeed, its major advantage of having a native degree of only 2 makes it possible to use many different tradeoffs when writing the circuit. This is not the case for most of the competing designs.

We emphasize, however, that comparing the circuit performance of different hash functions is in general a nontrivial task. This is mainly due to the fact that various different tradeoffs are possible.

# Conclusion

The Monolith family of hash functions is specifically tailored towards lookup-based proving systems and exploits this fact in order to achieve a plain performance very similar to that of SHA-3. In some cases, it’s even faster.

While being very fast in a software implementation, Monolith is also circuit-friendly and supports various efficient arithmetization approaches.

Markus Schofnegger, mschofnegger@horizenlabs.io

# Further Reading

Monolith: https://eprint.iacr.org/2023/1025

Reinforced Concrete: https://eprint.iacr.org/2021/1038

Tip5: https://eprint.iacr.org/2023/107

Poseidon: https://eprint.iacr.org/2019/458

Poseidon2: https://eprint.iacr.org/2023/323

Rescue: https://eprint.iacr.org/2019/426

Griffin: https://eprint.iacr.org/2022/403

Anemoi: https://eprint.iacr.org/2022/840

Risc0: https://www.risczero.com/

Plonky2: https://polygon.technology/blog/introducing-plonky2

Plonky3: https://github.com/Plonky3/Plonky3