A simple C++ library providing a few variations of Bloom filters.
Three types of Bloom filters are supported:
- Ordinary BFs
- Counting BFs
- Paired BFs (as seen in Mick et al.)
The following operations are supported on all types:
- Insert
- Query
- Serialize
- Deserialize
There are also a few operations supported by particular types:
- Counting and paired BFs also support a Delete operation
- Ordinary and paired BFs also support a Union operation
- Counting BFs support conversion into ordinary BFs
- Ordinary BFs support conversion into paired BFs
- Ordinary BFs can be compressed using the halving method (as seen in Wang et al.)
A few additional helper mechanisms may eventually be implemented:
- A helper to compute the optimal BF parameters given the number of content objects to be indexed
- Helpers to compute the probabilities of false positives for queries on existing populated BFs
Doxygen documentation can be compiled with make docs
.
Everything lives in the namespace bloom
.
These classes are templated, so you must specify the type of the object you are indexing when you instantiate a Bloom filter. For example:
OrdinaryBloomFilter<std::string> bf(4, 32);
This creates an ordinary BF with 4 hashes and a 32-bit array, capable of indexing strings.
You must specialize std::hash
for HashParams<T>
for each type T
you wish to construct a BF for. The HashParams
type is a structure consisting of the template parameter T
and a uint8_t
; the uint8_t
serves as a salt, to allow multiple hashes to be generated for a single object (as per the semantics of a BF).
A helper class implementing a 32-bit FNV-1 hash is given in FnvHash.hpp
. An example of how to specialize std::hash
using it can be found in tests/ordinary_insert_query.cpp
.
To insert an object o
into the BF, call bf.Insert(o)
, and to check for existence of an object, call bf.Query(o)
. If using a CountingBloomFilter or PairedBloomFilter, you can remove items using bf.Delete(o)
.
To serialize a BF into a std::ostream
os
, call bf.Serialize(os)
. To deserialize a BF from a std::istream
is
, use the static function Deserialize(is)
within the appropriate BF class.
For information about the other operations, refer to the Doxygen documentation or read the comments in the code.