patternjavaMinor
Improved Van Emde Boas tree based map in Java
Viewed 0 times
mapimprovedvanboasjavabasedtreeemde
Problem
After working on Van Emde Boas tree -based map in Java, I came up with this code. I managed to fix the bugs and improve the performance by using internally primitive integers instead of wrapped
VanEmdeBoasTreeMap.java
```
package net.coderodde.util;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
/**
* This class implements a van Emde Boas tree -based map that maps integer keys
* to arbitrary satellite data.
*
* @author Rodion "rodde" Efremov
* @version 1.61 (Mar 16, 2017)
*
* @param the type of the satellite data.
*/
public class VanEmdeBoasTreeMap implements Map {
/**
* Holds the minimum universe size.
*/
private static final int MINIMUM_UNIVERSE_SIZE = 2;
/**
* Used to denote the absence of an element.
*/
public static final int NIL = -1;
/**
* This static inner class implements recursively the entire van Emde Boas-
* tree.
*
* @param The type of the satellite data.
*/
private static final class VEBTree {
/**
* The universe size of this vEB-tree.
*/
private final int universeSize;
/**
* The mask used to compute the low index.
*/
private final int lowMask;
/**
* The shift length for computing the high index.
*/
private final int highShift;
/**
* The minimum integer key in this tree.
*/
private int min;
/**
* The maximum integer key in this tree.
*/
private int max;
/**
* The summary vEB-tree.
*/
private final VEBTree summary;
/**
* The children vEB-trees of thi
java.lang.Integers:VanEmdeBoasTreeMap.java
```
package net.coderodde.util;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
/**
* This class implements a van Emde Boas tree -based map that maps integer keys
* to arbitrary satellite data.
*
* @author Rodion "rodde" Efremov
* @version 1.61 (Mar 16, 2017)
*
* @param the type of the satellite data.
*/
public class VanEmdeBoasTreeMap implements Map {
/**
* Holds the minimum universe size.
*/
private static final int MINIMUM_UNIVERSE_SIZE = 2;
/**
* Used to denote the absence of an element.
*/
public static final int NIL = -1;
/**
* This static inner class implements recursively the entire van Emde Boas-
* tree.
*
* @param The type of the satellite data.
*/
private static final class VEBTree {
/**
* The universe size of this vEB-tree.
*/
private final int universeSize;
/**
* The mask used to compute the low index.
*/
private final int lowMask;
/**
* The shift length for computing the high index.
*/
private final int highShift;
/**
* The minimum integer key in this tree.
*/
private int min;
/**
* The maximum integer key in this tree.
*/
private int max;
/**
* The summary vEB-tree.
*/
private final VEBTree summary;
/**
* The children vEB-trees of thi
Solution
I am sorry, but regarding the performance figures, you are comparing apples to oranges.
Your so-called VanEmdeBoasTreeMap implementation actually uses a java.util.HashMap for virtually all off its tested operations (apart from iteration).
On the other hand, you compare to a java.util.TreeMap. Well, it is clear that your internal java.util.HashMap is considerably faster than the java.util.TreeMap you compare to.
This single fact fully explains the ostensible speedup of your VanEmdeBoasTreeMap.
A second methodical flaw is that you are not benchmarking the two data structures independently of each other. The benchmark of the second data structure always inherits the memory garbage of the first datastructure's benchmark and thus is artificially slowed down by its predecessor.
And third, do not believe in a single measurement. In the following figures, I took the median measurement of nine independent measurements.
If you replace in the VanEmdeBoasTreeMap class
by
you receive the following realistic results, comparing your enriched TreeMap to the standard TreeMap:
Here, the two blocks of benchmarks figures were taken independently of each other, by commenting the benchmark code of the respective other data structure out.
As a result, you can see that your VanEmdeBoas implementation generates an average overhead of >35%.
Or, to have an alternative fair competition, replace in the demo code
by
and you get the following results, comparing your enriched HashMap to the standard HashMap (which still is termed "TreeMap" here, because I did not edit the console message strings):
Again, the two blocks of benchmarks figures were taken independently of each other, by commenting the benchmark code of the respective other data structure out. Every block is the median block of 9 measurements.
Still your VanEmdeBoas implementation generates an average overhead of >20%.
Obviously, the alleged speed up of your VanEmdeBoasTreeMap only attributes to the fact that you are indeed encapsulating a standard java.util.HashMap.
In terms of speed up, I cannot recognize any added value of your implementation (apart from the ordering in the iteration).
A nice programming exercise though.
If you are interested in true benchmark figures, please throw the
Your so-called VanEmdeBoasTreeMap implementation actually uses a java.util.HashMap for virtually all off its tested operations (apart from iteration).
On the other hand, you compare to a java.util.TreeMap. Well, it is clear that your internal java.util.HashMap is considerably faster than the java.util.TreeMap you compare to.
This single fact fully explains the ostensible speedup of your VanEmdeBoasTreeMap.
A second methodical flaw is that you are not benchmarking the two data structures independently of each other. The benchmark of the second data structure always inherits the memory garbage of the first datastructure's benchmark and thus is artificially slowed down by its predecessor.
And third, do not believe in a single measurement. In the following figures, I took the median measurement of nine independent measurements.
If you replace in the VanEmdeBoasTreeMap class
private final Map map = new HashMap<>();by
private final Map map = new TreeMap<>();you receive the following realistic results, comparing your enriched TreeMap to the standard TreeMap:
Seed = 1511631843489
VanEmdeBoasTreeMap.put in 1223 milliseconds.
VanEmdeBoasTreeMap iteration in 18 milliseconds.
VanEmdeBoasTreeMap.get in 423 milliseconds.
VanEmdeBoasTreeMap.remove in 85 milliseconds.
VanEmdeBoasTreeMap total time: 1731 milliseconds.
Seed = 1511631930097
TreeMap.put in 795 milliseconds.
TreeMap iteration in 6 milliseconds.
TreeMap.get in 401 milliseconds.
TreeMap.remove in 77 milliseconds.
TreeMap total time: 1273 milliseconds.Here, the two blocks of benchmarks figures were taken independently of each other, by commenting the benchmark code of the respective other data structure out.
As a result, you can see that your VanEmdeBoas implementation generates an average overhead of >35%.
Or, to have an alternative fair competition, replace in the demo code
tree2 = new TreeMap<>();by
tree2 = new HashMap<>();and you get the following results, comparing your enriched HashMap to the standard HashMap (which still is termed "TreeMap" here, because I did not edit the console message strings):
Seed = 1511632397148
VanEmdeBoasTreeMap.put in 260 milliseconds.
VanEmdeBoasTreeMap iteration in 17 milliseconds.
VanEmdeBoasTreeMap.get in 88 milliseconds.
VanEmdeBoasTreeMap.remove in 48 milliseconds.
VanEmdeBoasTreeMap total time: 396 milliseconds.
Seed = 1511632550587
TreeMap.put in 202 milliseconds.
TreeMap iteration in 8 milliseconds.
TreeMap.get in 80 milliseconds.
TreeMap.remove in 47 milliseconds.
TreeMap total time: 329 milliseconds.Again, the two blocks of benchmarks figures were taken independently of each other, by commenting the benchmark code of the respective other data structure out. Every block is the median block of 9 measurements.
Still your VanEmdeBoas implementation generates an average overhead of >20%.
Obviously, the alleged speed up of your VanEmdeBoasTreeMap only attributes to the fact that you are indeed encapsulating a standard java.util.HashMap.
In terms of speed up, I cannot recognize any added value of your implementation (apart from the ordering in the iteration).
A nice programming exercise though.
If you are interested in true benchmark figures, please throw the
java.util.HashMap out of your class VanEmdeBoasTreeMap and rework the benchmark code so that it only uses the actual VanEmdeBoas data structure.Code Snippets
private final Map<Integer, E> map = new HashMap<>();private final Map<Integer, E> map = new TreeMap<>();Seed = 1511631843489
VanEmdeBoasTreeMap.put in 1223 milliseconds.
VanEmdeBoasTreeMap iteration in 18 milliseconds.
VanEmdeBoasTreeMap.get in 423 milliseconds.
VanEmdeBoasTreeMap.remove in 85 milliseconds.
VanEmdeBoasTreeMap total time: 1731 milliseconds.
Seed = 1511631930097
TreeMap.put in 795 milliseconds.
TreeMap iteration in 6 milliseconds.
TreeMap.get in 401 milliseconds.
TreeMap.remove in 77 milliseconds.
TreeMap total time: 1273 milliseconds.tree2 = new TreeMap<>();tree2 = new HashMap<>();Context
StackExchange Code Review Q#157955, answer score: 3
Revisions (0)
No revisions yet.