HiveBrain v1.2.0

Counting Bloom filter library in Java

Submitted by: @import:stackexchange-codereview
Tags: counting, bloom, java, filter, library

Problem

See also the next iteration.

This snippet is a counting Bloom filter that also supports removing elements. Instead of a bit vector it maintains a vector of "bucket counters". The API also makes it easy to plug in your own customised hash functions.

Inserting: Just compute all the hash values and increment every counter they index.

Querying: Compute the hash values again; if at least one of them indexes a counter whose value is zero, the element is guaranteed not to be in the filter.

Removal: Compute the hash values. Check that the element may be in the filter, and if so, decrement by one all the counters indexed by the hash values.

All in all, a Bloom filter may report false positives, but never false negatives.
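The three operations described above can be sketched as follows. This is an illustrative reconstruction, not the snippet's actual code: the class name, the use of `ToIntFunction` in place of the snippet's `AbstractHashFunction`, and the `floorMod` index mapping are all assumptions.

```java
import java.util.function.ToIntFunction;

// Minimal sketch of a counting Bloom filter's insert/query/remove logic,
// as described in the text above. Names and structure are illustrative.
final class CountingBloomSketch {

    private final int[] counters;
    private final ToIntFunction<Object>[] hashFunctions;

    @SafeVarargs
    CountingBloomSketch(int capacity, ToIntFunction<Object>... hashFunctions) {
        this.counters = new int[capacity];
        this.hashFunctions = hashFunctions;
    }

    void add(Object element) {
        for (ToIntFunction<Object> h : hashFunctions) {
            counters[index(h, element)]++;      // increment every indexed bucket
        }
    }

    boolean mightContain(Object element) {
        for (ToIntFunction<Object> h : hashFunctions) {
            if (counters[index(h, element)] == 0) {
                return false;                   // definitely not present
            }
        }
        return true;                            // possibly present (may be a false positive)
    }

    void remove(Object element) {
        if (!mightContain(element)) {
            return;                             // nothing to decrement
        }
        for (ToIntFunction<Object> h : hashFunctions) {
            counters[index(h, element)]--;
        }
    }

    // Map an arbitrary hash value onto a valid counter index.
    private int index(ToIntFunction<Object> h, Object element) {
        return Math.floorMod(h.applyAsInt(element), counters.length);
    }
}
```

Note that `remove` guards on `mightContain` first: decrementing counters for an element that was never inserted could drive counters of other elements to zero and introduce false negatives.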

A couple of questions:

  • Is the API design good enough?



  • What hash functions would a grown-up programmer use?



And the code follows:

net.coderodde.util.BloomFilter:

```
package net.coderodde.util;

/**
 * This class implements a counting Bloom filter which allows not only
 * inserting and querying, but deleting an element as well.
 *
 * @author Rodion "rodde" Efremov
 * @version 1.6
 */
public class BloomFilter {

    /**
     * The minimum capacity of the counter array.
     */
    private static final int MINIMUM_CAPACITY = 128;

    /**
     * The array holding the counts of each bucket.
     */
    private final int[] array;

    /**
     * The array of hash functions.
     */
    private final AbstractHashFunction[] hashFunctions;

    /**
     * Constructs a counting Bloom filter with array capacity {@code capacity}
     * and given hash functions.
     *
     * @param capacity the capacity of the counter array.
     * @param firstHashFunction the mandatory hash function.
     * @param otherHashFunctions the array of voluntary hash functions.
     */
    public BloomFilter(int capacity,
                       AbstractHashFunction firstHashFunction,
                       AbstractHashFunction... otherHashFunctions) {
        capacity = Math.max(capacity, MINIMUM_CAPACITY);
        this.array = new int[capacity];
        this.hashFunctions =
                new AbstractHashFunction[otherHashFunctions.length + 1];
        this.hashFunctions[0] = firstHashFunction;
        System.arraycopy(otherHashFunctions, 0,
                         this.hashFunctions, 1,
                         otherHashFunctions.length);
    }

    // The rest of the snippet (the add/query/remove methods and the
    // AbstractHashFunction hierarchy) was truncated in this archive.
}
```

Solution

I can't speak to hashing functions, but I can talk about API.

BloomFilter

  • Should be final. You don't want users extending BloomFilter, since the class is not designed for that.



  • The documentation doesn't discuss what happens under exceptional conditions, such as when null values are passed into any of the methods.

  • The class-level documentation should specify what the type represents.

  • In the constructor docs, don't say array. The array is an implementation detail.

  • You label 'firstHashFunction' as 'mandatory', but that's the only place this is discussed. Call it 'first', and state in the class/constructor documentation that at least one hash function is required.

  • Finally, if clients may not be familiar with Bloom filters, you may want to summarize at the class level why they're useful and how to use the class.
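Put together, these documentation suggestions might look roughly like the sketch below. The wording, the 'first'/'rest' parameter names, and the nested HashFunction type are illustrative assumptions, not code from the answer.

```java
import java.util.Objects;

/**
 * A counting Bloom filter: a probabilistic set that may report false
 * positives but never false negatives, and that also supports removal.
 * At least one hash function must be supplied at construction time.
 *
 * (Hypothetical rewrite illustrating the documentation advice above.)
 */
public final class BloomFilter {

    /** Hypothetical SPI stand-in for the snippet's AbstractHashFunction. */
    public interface HashFunction {
        int hash(Object element);
    }

    private final int[] counters;
    private final HashFunction[] hashFunctions;

    /**
     * Constructs a counting Bloom filter with the given capacity.
     *
     * @param capacity the number of counters in the filter.
     * @param first    the first hash function; at least one is required.
     * @param rest     optional additional hash functions.
     * @throws NullPointerException if {@code first} or {@code rest} is
     *         {@code null}.
     */
    public BloomFilter(int capacity, HashFunction first, HashFunction... rest) {
        Objects.requireNonNull(first, "first hash function is null");
        Objects.requireNonNull(rest, "hash function array is null");
        this.counters = new int[Math.max(capacity, 1)];
        this.hashFunctions = new HashFunction[rest.length + 1];
        this.hashFunctions[0] = first;
        System.arraycopy(rest, 0, this.hashFunctions, 1, rest.length);
    }
}
```

Note that nothing in the documentation mentions the backing array; it only describes observable behavior and failure conditions.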



AbstractHashFunction

  • Should be an interface named HashFunction. You provide no implementation, and this is SPI - something intended to be customizable by the library client. Don't take away their ability to extend a concrete class without a good reason.



  • The documentation is poor. The interface defines a contract, not an API. The class/method documentation should clearly indicate the contract the implementer is expected to uphold, including standard behavior, pre- and post- conditions, if applicable, and failure conditions.
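As a sketch of what such a contract-bearing interface could look like (the name and wording are assumptions, not the answer's code):

```java
// Hypothetical HashFunction SPI with the implementer's contract documented,
// along the lines suggested above.
public interface HashFunction {

    /**
     * Returns a hash value for {@code element}.
     *
     * <p>Contract: implementations must be deterministic (equal inputs
     * always yield the same value), must accept {@code null} without
     * throwing, and should distribute values uniformly so that the
     * filter's false-positive rate stays low. Any {@code int} may be
     * returned; the filter maps it onto a valid bucket index.
     */
    int hash(Object element);
}
```

Because the interface has a single abstract method, clients can also supply implementations as lambdas.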



BitPermutationHashFunction, XorHashFunction

While they are part of the library, these shouldn't really be part of the API. Currently they are, because you allow clients to extend them. Since they are not designed or documented for that, you should make them final.

Context

StackExchange Code Review Q#101208, answer score: 2
