The Wonder of Bloom

--

A Bloom filter is used to create a probabilistic guess on whether an item is in a data structure, and was created by Burton Howard Bloom (Bloom, 1970). Within the test, the query will define if the value is “possibly in the set” or “definitely not in the set”. Each added element is hashed with two or more hashing methods, and the values generated values are used to set the bits in a bit array. In this example we use a 32-bit bit vector, and use Murmur 2 and FNV for the hashes. Typically we use non-crypto hashes, in order to speed up the process.

In this demo, the first value is taken from Murmur 2, and the second one is from FNV. Each of these are used to generate a 32-bit bit vector. We will add “fred”, “bert” and “greg”, and which gives a Bloom filter of:

                01234567890123456789012345678901
Add fred: 00000000000000100000010000000000 fred [21,14]
Add bert: 00000000100000100000010000000100 bert [29,8]
Add greg: 00000000100100100000011000000100 greg [11,22]
We now have bit position 8, 11, 14, 21, 22 and 29 set.

We can now test for “amy” and “greg”:

Now we can test for amy:
amy is not there [16,12]
New we can test for greg:
greg may be in there [11,22]

--

--

Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.