The Universal Hash and The HalftimeHash

--

In computer science, we often want to limit our data ranges. Along with this, we might take large amounts of data and represent them with a hash value. So let's say we have a data universe U, and m possible output values. These output values can be defined as bins, and our inputs as keys. We then have n possible keys which map onto these m bins. As n is likely to be greater than m, we will map different keys onto the same output value. Thus, for a key of x and a key of y, and for a hash function of h(), we could get:

h(x) = h(y)

This is defined as a collision, and we cannot avoid it if the number of possible keys is greater than the number of output values and every key in the universe is possible. In the following, we have a number of possible keys in our universe, and where the keys of 99 and 17 map to the same output bin and thus cause a collision:

Figure 1: Causing a collision
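As a quick sketch of this (the modulus value here is my own assumption, chosen only so that the keys 99 and 17 from the figure collide), we can use the simple modulus hash h(k) = k mod m:

```python
# A minimal sketch of a collision (assumed parameters, not the article's code).
# With the modulus hash h(k) = k mod m and m = 41 bins, the keys 99 and 17
# land in the same bin, since 99 = 2*41 + 17.

M = 41  # number of bins m (assumed value for illustration)

def h(key: int) -> int:
    """Map a key into one of M bins."""
    return key % M

x, y = 99, 17
print(h(x), h(y))    # both keys map to bin 17
print(h(x) == h(y))  # True -> a collision
```

Any hash function which maps a larger key space into a smaller range must behave this way for some pair of keys; the choice of m only changes which keys collide.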

In Figure 2, we thus have n keys which map onto m bins. For this, Carter and Wegman in 1979 defined the term of a universal hash [here]. This defines a hashing method which aims to ensure that, for any two distinct keys x and y, and for a hash function h() picked at random from the family, the probability of a match of the hash of x and the hash of y is no more than:

Pr[ h(x) = h(y) ] <= 1/m
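A classic family with this property (a sketch of the standard Carter–Wegman construction; the prime p, the bin count m, and the sampling loop below are my own choices for illustration) is h_{a,b}(k) = ((a*k + b) mod p) mod m, where p is a prime larger than any key, a is drawn from 1..p-1, and b from 0..p-1. We can check the collision bound empirically:

```python
import random

# A sketch of the Carter-Wegman universal family (parameters assumed):
#   h_{a,b}(k) = ((a*k + b) mod p) mod m
P = 101  # a prime larger than any key we will hash (assumed)
M = 10   # number of bins m (assumed)

def make_hash(a: int, b: int):
    """Return one member h_{a,b} of the family."""
    return lambda k: ((a * k + b) % P) % M

random.seed(42)  # fixed seed so the experiment is repeatable
x, y = 99, 17    # two distinct keys
trials = 100_000
collisions = 0
for _ in range(trials):
    a = random.randrange(1, P)  # a in 1..p-1
    b = random.randrange(0, P)  # b in 0..p-1
    h = make_hash(a, b)
    if h(x) == h(y):
        collisions += 1

# The measured collision rate should sit at or below the bound 1/M = 0.1.
print(collisions / trials)
```

The key design point is that the adversary may know the family, but not which (a, b) pair was picked, so no fixed pair of keys can be made to collide with probability above 1/m.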

--

Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice
