Incremental & Multiset Hashing

Here small variations lead to… small computations

Ramsès Fernàndez-València

Published in

RootstockLabs: Research & Technology

11 min readMay 6, 2022

Introduction

Hash functions map strings of variable finite length to strings of fixed length. The main property of these functions is its collision resistance, meaning that it is hard to find two different input strings yielding the same result.

Incremental hashing, introduced in (Bellare et al.) presents an interesting idea: could it be possible to define hash functions such that, given changes on the input strings, the computations required to update the hash were proportional to the amount of change in the input strings?

The applications of incremental hash functions in Blockchain are not abundant, nevertheless one can find a proposal in (Mihajloska et al.) of incremental hashing based on SHA3 which is put into practice by the private Blockchain Kadena (Martino) together with the use of Merkle trees. Kadena uses incremental hashing for the updates of the log among the nodes.

The other topic that we are covering in this post is multiset hashing, an interesting primitive where the ordering of the inputs is not important. This kind of hash function maps multisets of arbitrary finite size to hashes of fixed length. They are incremental in that when new members are added to the multiset, the hash can be quickly updated.

Multiset hash functions play a role in the construction of cryptographic accumulators, which provide proofs of membership of constant size, independent of the length of the list of members. In particular, the proposal ECMH (Maitin-Shepard et al.), is explored by the Dusk Network (Francioni and Venturelli) for the definition of a cryptographic accumulator since the ECMH proposal does not need a trusted setup. Multiset hashing is also being explored for the definition of efficient verifiable searchable encryption schemes (Cai et al.)

Concerning multiset hashing, we will include the main definitions avoiding as much as possible any cryptographic technicality. In particular, we will introduce some proposals in additive and multiplicative hashing, such as a proposal based on XOR addition, which is very efficient but requires a secret key and only provides a weak form of collision resistance.

In this post, we are going to explore incremental hashing, including the randomize-then-combine paradigm of Micciancio and Bellare, which is the main ingredient for the definition of this kind of function. We will include some examples of incremental hash functions.

Incremental Hashing

The main paradigm and properties

The main paradigm for the definition of incremental hash functions was introduced by Micciancio and Bellare in (Bellare and Micciancio). This paradigm, called randomize-then-combine, requires a randomizer function

which maps strings of b bits to elements in a group G.

Let (G, ⨂) be an abelian group, b > 0 an integer and H = {hᵢ}, i ∈ I, a family of randomizer functions, indexed by I = [0, k]. An incremental hash is a function

such that it maps a message M = M₁ || … || Mₙ to

Mind the intrinsic operation, where the message is split into k strings of b bits.

The incrementality property can be detailed as follows: if one changes the block Mᵢ in the concatenation M = M₁ || … || Mᵢ || … || Mₙ by a block Mᵢ, then the hash of the new message M’ = M₁ || … || M’ᵢ || … || Mₙ becomes:

One observes that the formula could be applied to the case of deletions: if we replace M = M₁||…||Mᵢ||…||Mₖ by M’ = M₁||…||M’ᵢ||…||Mₖ then:

Some examples

The Implagliazzo — Naor proposal

One of the first examples of an incremental hash function was introduced by Impagliazzo and Naor. This proposal hashes messages bitwise (b = 1), which makes it a bit inefficient for rather long messages.

Let (G, ⨂) be an abelian group and let us consider k elements g₁, …, gₖ ∈ G each one being associated with a block in the message M = M₁||…||Mₖ. The Impagliazzo-Naor proposal can be described as

where the family of randomizer functions is given by

The Chaum — van Heijst — Pfitzmann proposal

Let p = 2q + 1 be a prime for some large prime q and let a₁, a₂ ∈ ℤₚ* random. A message M ∈ ℤ_q² can be written uniquely as M = M₁ + q · M₂ for 0 ≤ M₁, M₂ ≤ q -1. This proposal can be described as

Where the randomizer functions are given by

Concerning incrementality, if we replace M = (M₁, M₂) with M’ = (M’₁, M₂) we get:

The Bellare — Goldreich — Goldwasser proposal

Let (G, ⨂) be an abelian group of prime order p together with g₁, …, gₙ ∈ G. If we have a message M = M₁ || … || Mₙ parsed in blocks of length b, the proposal for an incremental hash function from Bellare, Goldreich and Goldwasser follows:

In this case, the randomizer family is given by the functions

Here [Mᵢ] represents the integer representation of Mᵢ.

The integrality property of this proposal is given by the expression below. If we replace M = M₁||…||Mⱼ||…||Mₙ by M’ = M₁||…||M’ⱼ||…||Mₙ then:

The Bellare — Micciancio variations

This proposal led to the randomize-then-combine paradigm. Let (G, ⊙) be an abelian group, b the block size and l an upper bound parameter for the number of blocks. Let us consider a family of randomizer functions hi defined on bitstrings of size b + l with values in G. Let us consider a message M = M1 || … || Mn parsed into blocks of size b. Then the Bellare and Micciancio proposal is given by:

For Ii the l-bit representation of the block index i.

It is important to note that, depending on the group operation ⊙ (product, modular addition, vector addition in lattices or XOR addition), the authors proposed the incremental functions MuHash, AdHash, LtHash and XHash respectively.

Multiset hashing

Basic concepts

A multiset is a finite, unordered, group where an element can occur as a member more than once: all sets are multisets where all elements have multiplicity 1, but a multiset is not a set if an element appears more than once. If M is a multiset of elements of a countable set B, the number of times b ∈ B appears in M is called the multiplicity of b in M, and is denoted M_b. The sum of all multiplicities in M is the cardinality of M. The union of multisets combines two multisets M and N leading to a multiset M ∪ N in which elements appear with a multiplicity that is the sum of the multiplicities of that element in each multiset.

A multiset hash function is given by a triple {H, +H, ≡H} of probabilistic polynomial-time (henceforth PPT) algorithms such that the following properties hold:

Compression: H maps multisets of B into elements of cardinality approximately 2ᵐ for an integer m > 0. This property guarantees storing hashes in small bounded amounts of memory.
Comparability: since H is PPT, a multiset may not be hashed to the same value each time. To compare hashes one needs to set a comparing relation. The following relation must hold for all multisets M of B: H(M) ≡H H(M).
Incrementality: for all multisets M, N ∈ B, the following holds:

In particular, for b ∈ B one has:

A collision for M’ is a multiset M not equal to M such that H(M) = H(M’). A multiset hash functions family can be multiset collision-resistant or set collision-resistant: avoiding technicalities, the family will be (multi)set collision-resistant there is a negligible probability of finding a (multi)set S and a multiset M of B, with S ≠ M and equivalent hashes.

Additive & multiplicative multiset hashing

Let B = {0, 1}ᵐ represent the set of bit vectors of length m and let M be a multiset of elements of B. Let

be taken from a family of pseudorandom hash functions with indexes in a space of keys k ∈ K. Let L ≤ 2ᵐ and define:

We say that two triples [h, c, r] and [h’, c’, r’] are equivalent if, and only if:

h-H_k(0, r) ≡ h’ -H_k(0, r’) mod n.
c ≡ c’ mod L.

The addition of two triples [h, c, r] and [h’, c’, r’] is defined as

It is easy to check that

One can check (Clarke et al.) that the multiset hash corresponding to

for ⊕ the XOR addition is set collision-resistant. If one considers

the resulting multiset hash is multiset collision-resistant (Clarke et al.).

Let us consider a poly-random function (a function for such there is no polynomial-time algorithm with oracle access to it able to distinguish between values of the function and random strings)

One may define the following family of additive multiset hash functions:

Where the equivalence is set to be = and the addition is defined to be modular vector addition. Now, assuming that the worst-case shortest vector problem is infeasible to solve in polynomial time, this family is multiset collision-resistant (Clarke et al.).

It is time to define multiplicative multiset hashing. For q a large prime power, and H a poly-random function taking values from B to Zq. One may define a multiplicative multiset hash function by setting

where the equivalence is set to be = and the addition is defined as the product in Zq. This multiset hash function is multiset collision-resistant if one assumes that the discrete logarithm problem is hard.

Elliptic curves multiset hashing: the ECMH proposal

This proposal (Maitin-Shepard et al.) uses elliptic curves and the BLAKE2 hash function to define a function with good efficiency properties, resistant to timing attacks and unencumbered with trusted setups.

Besides BLAKE2, this is a proposal that relies on the Shallue — van de Woestijne algorithm. This algorithm is used to map a point

to a pair of points

which satisfies an arbitrary elliptic curve:

The algorithm, denoted as SW2, requires the following linear maps:

A trace function:

A quadratic solver:

A function COEFF₀(x) mapping x to the 0th coefficient of any (fixed) polynomial representation of x.

This function, given on input x, creates three possible values of w, among which only one is associated with y satisfying the elliptic curve. We now describe the algorithm:

The function requires a parameter

and computations

For values

We have, for each j = 1, 2, 3:

Finally, for M a multiset of elements of B, the ECMH proposal is defined as

In this definition, one may replace the hash function BLAKE2 with any other hash function compatible with the requirements of SW2. One should replace BLAKE2 with the proposal BLAKE3 to obtain a much more efficient multiset hash function.

Final remark

It is interesting to note that Bellare and Micciancio, in their seminal work in incremental hashing, set a framework that can be extended for the construction of multiset hash functions.

Let A be a possibly infinite set (resp. a multiset), and H a usual hash function taking values in A with outputs in an additive group (G, ⨂). One can define an incremental hash function by setting

Where mn is the multiplicity of an in A (mind the abuse of notation when writing the product). It is not hard to see that the latter expression for multisets is exactly the randomize-then-combine paradigm of Bellare and Micciancio.

References

Bellare, Mihir, et al. “Incremental Cryptography: The Case of Hashing and Signing.” Advances in Cryptology — CRYPTO ’94, Springer, 1994, pp. 216–233.

Bellare, Mihir, and Daniele Micciancio. “A new paradigm for collision-free hashing: Incrementality at reduced cost.” EUROCRYPT’97: Proceedings of the 16th annual international conference on Theory and application of cryptographic techniques, Springer, 1997, pp. 163–192.

Cai, Chengjun, et al. “Towards trustworthy and private keyword search in encrypted decentralized storage.” 2017 IEEE International Conference on Communications, 2017.

Clarke, Dwaine, et al. “Incremental Multiset Hash Functions and Their Application to Memory Integrity Checking.” Advances in Cryptology — ASIACRYPT 2003, Springer, 2003, pp. 188–207.

Francioni, Emanuele, and Fulvio Venturelli. The Dusk Network And Blockchain Architecture. 2018. Dusk Network, https://dusk.network/.

Maitin-Shepard, Jeremy, et al. Elliptic Curve Multiset Hash. arXiv:1601.06502v1 [cs.CR]. 2016. Arxiv, https://arxiv.org/.

Martino, Will. Kadena: The first scalable, high performance private blockchain. Whitepaper. 2016. blockchainlab.com, https://blockchainlab.com/pdf/Kadena-ConsensusWhitePaper-Aug2016.pdf.

Mihajloska, Hristina, et al. “Reviving the Idea of Incremental Cryptography for the Zettabyte Era Use Case: Incremental Hash Functions Based on SHA-3.” Open Problems in Network Security. iNetSec 2015, Springer, 2015, pp. 97–111.

The header image was adapted from a photo by Andrew Seaman on Unsplash.