The Universal Hash and The HalftimeHash
In computer science, we often want to limit our data ranges, and we might take large amounts of data and represent them with a hash value. So let’s say we have a data universe U, and m possible output values. These output values can be defined as bins, and our inputs as keys. We then have n possible keys which map to these m bins. As n is likely to be greater than m, different keys will map to the same bin. Thus, for keys x and y, and a hash function h(), we could get:
h(x) = h(y)
This is defined as a collision, and we cannot avoid it if the number of possible input keys is greater than the number of bins and every key in the universe is possible. In the following, we have a number of possible values in our universe, where the keys 99 and 17 map to the same output bin and thus cause a collision:
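To see how such a collision arises, here is a minimal sketch using a simple modulo hash. The choice of m = 41 bins is an assumption made purely so that the keys 99 and 17 from the example land in the same bin:

```python
# A toy hash that maps keys into m bins using modulo.
# m = 41 is an assumed bin count chosen for illustration: with it,
# the keys 99 and 17 fall into the same bin, giving a collision.
def h(key, m=41):
    return key % m

print(h(99))           # bin 17
print(h(17))           # bin 17
print(h(99) == h(17))  # True: a collision
```

Any m that divides 99 − 17 = 82 (such as 2, 41 or 82) would produce the same collision for this pair of keys.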
In Figure 2, we thus have n keys which map to m bins. For this, Carter and Wegman in 1979 defined the term of a universal hash [here]. This defines a family of hash functions with the property that, for any two distinct keys x and y, a function h() drawn at random from the family matches on them with a probability of at most 1/m:

Pr[ h(x) = h(y) ] ≤ 1/m
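A classic construction of such a family picks a prime p larger than any key, then draws a and b at random and computes h(x) = ((a·x + b) mod p) mod m. The sketch below uses assumed parameters (p = 101, m = 10 bins) and empirically estimates the collision probability for two fixed keys over many random draws of h:

```python
import random

# Carter–Wegman-style universal hash family (assumed parameters):
# pick a prime p larger than any key, then draw a in [1, p-1] and
# b in [0, p-1] uniformly at random.
p = 101  # prime > any key we will hash
m = 10   # number of bins

def make_hash(a, b):
    return lambda x: ((a * x + b) % p) % m

# Estimate Pr[h(x) = h(y)] for fixed keys x != y over random h.
x, y = 99, 17
trials = 20000
rng = random.Random(42)
collisions = 0
for _ in range(trials):
    h = make_hash(rng.randrange(1, p), rng.randrange(0, p))
    if h(x) == h(y):
        collisions += 1

# The observed fraction should be close to (and bounded by) 1/m = 0.1.
print(collisions / trials)
```

The key point is that the bound holds for any fixed pair of keys, because the randomness is in the choice of h rather than in the data, so no adversarially chosen input set can force a high collision rate.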