Constructing Bloom Filters without False Positives

Ori Rottenstreich
The Orbs Blog
Published in
2 min readApr 18, 2018
Image by Marina Rudinsky

On April 17, I attended the IEEE International Conference on Computer Communications in Honolulu (😁) to present some new research on bloom filters.

The presentation will be based on a paper I co-wrote with Sándor Z. Kiss from the Budapest University of Technology and Economics (BME), Éva Hosszu and János Tapolcai from MTA-BME Future Internet Research Group, High-Speed Networks Laboratory and Lajos Rónyai from Computer and Automation Research Institute Hungarian Academy of Sciences and BME.

So what is a bloom filter?

A bloom filter is a probabilistic data structure for set representation with relatively low data storage use. It allows developers to efficiently test if an element is located within the set. Cryptocurrencies such as Bitcoin and Ethereum use bloom filters to improve their performance, where it helps them avoid searching for non-existent data and as a less memory-intensive alternative to indexing.

Why is it important for blockchains?

Since a blockchain is a (potentially) giant database, it contains a very large amount of data. Imagine a light client of a cryptocurrency supporting limited verification that refers only to a part of the blockchain. By representing by a bloom filter the set of addresses that refers to the light client, a regular client can determine whether or not to forward a given transaction to the light client.

A new construction for bloom filters

One of the drawbacks of bloom filters is that it can result in false positives — indication that the element is a member of the set when it really isn’t.

In the presented paper, “Bloom Filter with a False Positive Free Zone”, we suggest a new construction for bloom filters that avoids false positives.

This construction applies to a finite universe from which set elements are taken. It relies on existing non-adaptive combinatorial group testing scheme. Unlike the typical bloom filter, elements are hashed to a bit array through deterministic, fast and simple-to-calculate functions. The maximal set size for which false positives are completely avoided is a function of the universe size and is controlled by the amount of allocated memory.

The paper can be found at:

--

--

Ori Rottenstreich
The Orbs Blog

Ori Rottenstreich (PhD) is the Chief Research Officer of Orbs