Bloom Filters

Márcio Althmann
2 min readJan 30, 2023

--

A Bloom Filter is a space-efficient probabilistic data structure invented by Burton Howard Bloom in 1970 and used to test whether an element is a set member with a small probability of false positives but no false negatives.

Do you want to store the recently viewed items from your store efficiently, the thousand of blocked IP addresses, or even a list of fraudulent payment methods? Bloom filters can help you!

Bloom filters are a relatively simple data structure. They use a fixed-sized bit array, with each bit initially set to 0. To add an element to the set, the element is hashed multiple times, and the bits at the corresponding positions are set to 1.

The same hash values are computed to test an element, and the bits at the corresponding positions are checked. If all the bits are set to 1, the element is likely in the set.

It is not possible to remove an element from a Bloom filter. Once a bit is set to 1 in the bit array, it cannot be reset to 0. This is one of the key limitations of Bloom filters.

Bloom filter representation

When designing your Bloom filter, the size of the filter, the number of elements, and how many hash functions will be applied will affect the false positive rate.

You can understand more about the mathematics behind these values in this great article.

You can use the Bloom Filter Calculator to help you to choose the best values for your scenario. The below example shows the best values (filter size and the number of hash functions) for storing 1.000.000 elements with a false positive rate of only 0.10%.

It is essential to choose a uniformly distributed and fast hash function for your Bloom filter implementation. This example requires ten hash functions for each element insertion and lookup.

And the filter size will be only 1.7MB.

Below you can see a simple implementation using C#. Given an acceptable false positive rate and the expected number of elements, the filter size and required hash functions are automatically calculated.

See you in the next post :)

--

--