patterncppMinor
Tree/array mashup
Viewed 0 times
arraymashuptree
Problem
I'm currently trying to prove a data structure which I've created to speed up lookups for integral keys. The specific case I have in mind has the following usage case:
I've tried the following:
And now this tree. I've found through profiling that the following holds
With this tree, I've found that it's quicker in all cases, but what I would like is some critique on the approach - are there any further optimisations that I am missing for example, and here is the tree:
```
/** Copyright: Nim 2011 */
template
class tree
{
public:
typedef ValueType value_type;
typedef ValueType& reference;
typedef ValueType const& const_reference;
typedef KeyType key_type;
typedef ValueType* iterator;
typedef ValueType const* const_iterator;
private:
static const size_t MAX_SIZE = 256;
typedef unsigned char index_type;
struct node
{
node() : _cKey(), _cValue(), _vNodes() {}
~node()
{
cleanup();
}
void add(key_type index, key_type key, const_reference value)
{
if (!index)
{
_cKey = key;
_cValue.reset(new value_type(value));
return;
}
// get the lsb
index_type lsb = index & 0xff;
if (!_vNodes)
{
_vNodes = new node*[MAX_SIZE]();
_vNodes[lsb] = new node();
}
else if (!_vNodes[lsb])
_vNodes[lsb] = new node();
_vNodes[lsb]->add(index >>
- At start is populated with n number of items (where n can be from 1 to thousands), and the specific key is actually an
unsigned int.
- The critical path in the code requires a lookup based on this key.
I've tried the following:
- sorted
vectorwith binary search
std::map
boost::unordered_map
And now this tree. I've found through profiling that the following holds
- If every item looked up exists, then
unordered_setis quicker thanstd::mapwhich is quicker than sorted vector
- If the number of successful lookups for the search set is low, then
std::mapis quicker thanunordered_set(I guess the hash overhead is higher for failure case)
With this tree, I've found that it's quicker in all cases, but what I would like is some critique on the approach - are there any further optimisations that I am missing for example, and here is the tree:
```
/** Copyright: Nim 2011 */
template
class tree
{
public:
typedef ValueType value_type;
typedef ValueType& reference;
typedef ValueType const& const_reference;
typedef KeyType key_type;
typedef ValueType* iterator;
typedef ValueType const* const_iterator;
private:
static const size_t MAX_SIZE = 256;
typedef unsigned char index_type;
struct node
{
node() : _cKey(), _cValue(), _vNodes() {}
~node()
{
cleanup();
}
void add(key_type index, key_type key, const_reference value)
{
if (!index)
{
_cKey = key;
_cValue.reset(new value_type(value));
return;
}
// get the lsb
index_type lsb = index & 0xff;
if (!_vNodes)
{
_vNodes = new node*[MAX_SIZE]();
_vNodes[lsb] = new node();
}
else if (!_vNodes[lsb])
_vNodes[lsb] = new node();
_vNodes[lsb]->add(index >>
Solution
Just a few points (caveat: I'm not a C++ expert).
-
Comparing structured (i.e., non-scalar) keys is often the most expensive part of doing a search; it is usually worth searching first on an integer hash of the key. This way you only need to do integer comparisons for the most part.
-
I am AMAZINGLY surprised that binary search on a sorted array doesn't beat the pants off everything else. Binary search is vastly simpler than anything else and has optimal pruning. I seriously suspect your results here are to do with issue (1) above.
-
You don't say how much your results differed in your experiments. If the differences are small, it probably isn't worth rolling your own code.
-
Comparing structured (i.e., non-scalar) keys is often the most expensive part of doing a search; it is usually worth searching first on an integer hash of the key. This way you only need to do integer comparisons for the most part.
-
I am AMAZINGLY surprised that binary search on a sorted array doesn't beat the pants off everything else. Binary search is vastly simpler than anything else and has optimal pruning. I seriously suspect your results here are to do with issue (1) above.
-
You don't say how much your results differed in your experiments. If the differences are small, it probably isn't worth rolling your own code.
Context
StackExchange Code Review Q#3294, answer score: 2
Revisions (0)
No revisions yet.