The Universal Hash and The HalftimeHash
In computer science, we often want to limit our data ranges, and we might take large amounts of data and represent them with a hash value. So let’s say we have a data universe U, and m possible output values. These output values can be defined as bins, and our inputs as keys. We then have n possible keys which map to these m bins. As n is likely to be greater than m, different keys will map to the same bin. Thus, for keys x and y, and a hash function h(), we could get:
h(x) = h(y)
This is defined as a collision, and we cannot avoid it if the number of possible input keys is greater than the number of bins and every key in the universe is possible. In the following, we have a number of possible values in our universe, where the keys 99 and 17 map to the same output bin and thus cause a collision:
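To see how such a collision arises, here is a minimal sketch using a simple modulo hash. The choice of m = 41 bins is an assumption made purely so that the keys 99 and 17 from the example land in the same bin:

```python
# A toy hash that maps keys into m bins using modulo.
# m = 41 is an assumed bin count chosen for illustration: with it,
# the keys 99 and 17 fall into the same bin, giving a collision.
def h(key, m=41):
    return key % m

print(h(99))           # bin 17
print(h(17))           # bin 17
print(h(99) == h(17))  # True: a collision
```

Any m that divides 99 − 17 = 82 (such as 2, 41 or 82) would produce the same collision for this pair of keys.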
In Figure 2, we thus have n keys which map to m bins. For this, Carter and Wegman in 1979 defined the term of a universal hash [here]. This defines a family of hash functions with the property that, for any two distinct keys x and y, a function h() drawn at random from the family matches on them with a probability of at most 1/m:

Pr[ h(x) = h(y) ] ≤ 1/m
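A classic construction of such a family picks a prime p larger than any key, then draws a and b at random and computes h(x) = ((a·x + b) mod p) mod m. The sketch below uses assumed parameters (p = 101, m = 10 bins) and empirically estimates the collision probability for two fixed keys over many random draws of h:

```python
import random

# Carter–Wegman-style universal hash family (assumed parameters):
# pick a prime p larger than any key, then draw a in [1, p-1] and
# b in [0, p-1] uniformly at random.
p = 101  # prime > any key we will hash
m = 10   # number of bins

def make_hash(a, b):
    return lambda x: ((a * x + b) % p) % m

# Estimate Pr[h(x) = h(y)] for fixed keys x != y over random h.
x, y = 99, 17
trials = 20000
rng = random.Random(42)
collisions = 0
for _ in range(trials):
    h = make_hash(rng.randrange(1, p), rng.randrange(0, p))
    if h(x) == h(y):
        collisions += 1

# The observed fraction should be close to (and bounded by) 1/m = 0.1.
print(collisions / trials)
```

The key point is that the bound holds for any fixed pair of keys, because the randomness is in the choice of h rather than in the data, so no adversarially chosen input set can force a high collision rate.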