data-structures algorithms probability
Definition
Bloom Filter
A Bloom filter is a space-efficient, probabilistic data structure used to test whether an element is a member of a set. It is designed to be incredibly fast and use minimal memory, at the cost of providing false positive results.
- False Positives: The filter may report that an element is in the set when it is not.
- False Negatives: The filter never reports that an element is not in the set if it has been inserted (i.e., it is 100% accurate for negative results).
Mechanism
A Bloom filter consists of a bit array of bits, all initially set to 0. It also requires different hash functions, each of which maps an element to one of the array positions.
Insertion
To insert an element , it is hashed by each of the hash functions to get array indices. The bits at all these indices are set to 1.
Query
To query for an element , hash it using the same functions. If all bits at these indices are 1, the element is possibly in the set. If any bit is 0, the element is definitely not in the set.
Mathematical Properties
False Positive Probability
The probability of a false positive depends on the number of elements inserted, the size of the bit array , and the number of hash functions :
For a given and , the value of that minimises the false positive probability is:
Complexity
| Operation | Time Complexity |
|---|---|
| Insertion | |
| Query |
Space complexity is , which is independent of the size of the elements themselves, making Bloom filters ideal for large-scale data processing.