A Case of False Positives in Bloom Filters.

A Bloom Filter is a Probabilistic data structure,that is used to test the existence of an element in a set. If ‘x’ is the element and ‘S’ is the set,the existence of the element ‘x’ in set ’S’ returns 1(true),else will return 0(false).

A Bloom Filter consists of a vector array of n boolean values,initially all set to 0(false), as well as ‘k’ independent hash functions, h0,h1,….h(k-1) each within range of {0,1,2,….,n-1}. An element can be added into the bloom filter but not deleted from it, when an element ‘x’ has to be added to it,the element is hashed with ‘k’ hash functions and ‘n’ array positions are obtained,and the value in those indexes are changed to 1. so when we want to search the existence of the element it is hashed with it hash functions and checked if the values at their array positions are 1 and if the value is 1(true) at all its array positions, the existence of element is returned 1(true).

There might be a case where an element can refer to the same array position as that of an another element,in those cases the new element doesn’t change the value as 1 in its array position ,but the previous element would have set the value as 1. Hence in these cases even if the element is not present in the set,its existence is returned as 1. This is called ‘False Positives’. There is no proper way to distinguish between the two cases of positives,we can just calculate the probability of false positives.

To calculate Probability of False Positives:

lets consider, 
n = size of bloom filter array.
m = no of elements.
k = no of hash functions.

The probability that a bit ‘j’ is not set to 1 is,
(1–1/n)^n which is equivalent to e^-km/n.

Hence the probability of False Positives and that all bits of new element is set to 1 is,
P = (1-e^-km/n)^k — — — ->(1)
where P = Probability of False Positives.
if m,n is constant, then P is minimum when
k = log(2)(n/m)

Substituting the value of k in equation (1)
P = (0.5^log 2 )*(n/m)

Hence considering the Bloom filter of length ‘n’ and ‘m’ number of elements and probability of false positives as P, the system is said to have minimal possibility of false positives if,
n/m > 0.7log(1/P)

Points to be Noted:

  • The bloom filter provides a faster way to check the existence of an element in a set.It just returns a true(1) or false(0) as results.
  • Unlike a standard hash table, a Bloom filter of a fixed size can represent a set with an arbitrarily large number of elements.
  • It doesn’t do a exhaustive search to find a element in set, it simply applies hash function’s on the element and check only the array positions which were the result of hash function.
  • The Bloom Filters are space efficient and space constant. The complexity is defined as O(k) ans since k is a constant, the space complexity of a bloom filter is always a constant.
  • As the Number of elements ‘m’ in a n-bit Bloom filter array increases, the probability of the False Positives ‘P’ increases.
  • False Negative is case wherein the element ‘x’ exists in the system but its existence is returned as 0(false). The False Negative cases are not permitted in Bloom Filters and hence the removal of an element from a bloom filter is not possible.
  • The hash Functions ‘k’ depends on the length of the Bloom Filter ’n’ and the number of elements ‘m’ present in the set.