HiveBrain v1.2.0
pattern · cpp · Minor

Accessing an unordered_map

Submitted by: @import:stackexchange-codereview
Tags: unordered_map, accessing, stackoverflow

Problem

How can I improve the access time to this unordered map? If I instead allocate an 8 x 10 x 30 x 30 x 24000 structure as nested vectors of floats:

std::vector<std::vector<std::vector<std::vector<std::vector<float> > > > >


each access is about 0.0001 ms. Using an unordered_map implementation, each access is about 0.001 ms, 10x slower. What can I do to reduce the access time for this unordered_map? I'm doing a lot of insertions and modifications of existing entries.

```
#include <cstdio>
#include <random>
#include <unordered_map>

#include "boost/functional/hash.hpp"
#include "boost/date_time/posix_time/posix_time.hpp"

struct Key {
  Key(int a, int b, int x, int y, int z) :
      a_(a), b_(b), x_(x), y_(y), z_(z) {
  }

  bool operator==(const Key& other) const {
    return (a_ == other.a_ &&
            b_ == other.b_ &&
            x_ == other.x_ &&
            y_ == other.y_ &&
            z_ == other.z_);
  }
  int a_, b_, x_, y_, z_;
};

namespace std {
template <>
struct hash<Key> {
  typedef Key argument_type;
  typedef std::size_t result_type;

  result_type operator()(const Key& key) const {
    std::size_t val = 0;
    boost::hash_combine(val, key.a_);
    boost::hash_combine(val, key.b_);
    boost::hash_combine(val, key.x_);
    boost::hash_combine(val, key.y_);
    boost::hash_combine(val, key.z_);
    return val;
  }
};
}  // namespace std.

int main(int argc, char** argv) {
  // Iteration count reconstructed; the original bound did not survive
  // extraction.
  const int kNumIterations = 2000;

  std::default_random_engine generator;
  std::uniform_int_distribution<int> random_z(1, 24000);
  std::uniform_real_distribution<float> random_vote(0.0, 1.0);

  std::unordered_map<Key, float> votes;
  int total_accesses = 0;
  boost::posix_time::ptime start =
      boost::posix_time::microsec_clock::local_time();
  for (int i = 0; i < kNumIterations; ++i) {
    for (int a = 0; a < 8; ++a) {
      for (int b = 0; b < 10; ++b) {
        for (int x = 0; x < 30; ++x) {
          for (int y = 0; y < 30; ++y) {
            const int z = random_z(generator);
            Key key(a, b, x, y, z);
            const float this_vote = random_vote(generator);
            if (this_vote > 0.8) {
              votes[key] += this_vote; // This is what is slow.
              ++total_accesses;
            }
          }
        }
      }
    }

    if ((i + 1) % 1000 == 0) {
      boost::posix_time::ptime now =
          boost::posix_time::microsec_clock::local_time();
      boost::posix_time::time_duration diff = now - start;
      printf("%d / %d : Milliseconds per access: %f\n",
             i + 1, kNumIterations,
             diff.total_milliseconds() /
                 static_cast<float>(total_accesses));
    }
  }
  return 0;
}
```

Solution

Firstly, it's somewhat hard to say, because you haven't given the code you are using with vector. Are you keeping it sorted and doing some kind of binary search to find the element, or is this just raw vector access times where you don't care about adding this_vote to a specific key? If so, of course raw vector access with something like push_back will be much faster, due to locality of reference, not requiring the computation of a hash function for each Key, and so on.

Secondly, this is where we grab our trusty profiler to see what exactly is taking up so much time. Compiling with -O3 and running gives this:

%   cumulative   self              self     total           
time  seconds   seconds    calls  ms/call  ms/call  name  
79.19     26.15    26.15      165   158.48   158.48  std::_Hashtable<Key, std::pair<Key const, float>, std::allocator<std::pair<Key const, float> >, std::_Select1st<std::pair<Key const, float> >, std::equal_to<Key>, std::hash<Key>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, true, false, true>::_M_rehash(unsigned int, std::pair<unsigned int, unsigned int> const&)


If you squint hard enough at that, it becomes clear that _M_rehash is the culprit, taking almost 80% of the time. Unfortunately, I'm running this on GCC 4.7, which has a known bug that slows down unordered_map quite a lot; it's discussed on the mailing list if you're interested. Further, reserve doesn't fix this problem. If you're using GCC 4.7, unordered_map will be slow.

The actual insert call is called 14311307 times, and accounts for only 0.64% of the time at 0.21s:

0.64     32.88     0.21 14311307     0.00     0.00  std::__detail::_Node_iterator<std::pair<Key const, float>, false, true> std::_Hashtable<Key, std::pair<Key const, float>, std::allocator<std::pair<Key const, float> >, std::_Select1st<std::pair<Key const, float> >, std::equal_to<Key>, std::hash<Key>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, true, false, true>::_M_insert_bucket<std::pair<Key, float> >(std::pair<Key, float>&&, unsigned int, unsigned int)


Let's switch over to a regular std::map:

4.26     10.10     0.46 14311307     0.00     0.00  std::_Rb_tree_iterator<std::pair<Key const, float> > std::_Rb_tree<Key, std::pair<Key const, float>, std::_Select1st<std::pair<Key const, float> >, std::less<Key>, std::allocator<std::pair<Key const, float> > >::_M_insert_unique_<std::pair<Key const, float> >(std::_Rb_tree_const_iterator<std::pair<Key const, float> >, std::pair<Key const, float>&&)


Now we have the same 14311307 calls to insert, which take up 4.26% of the time at 0.46s. However, there is not the huge overhead of having to rehash the map, hence this is actually quite a bit quicker.

The std::map code is exactly the same, except with a specialization of std::less:

#include <map>
#include <tuple>
#include <functional>

namespace std
{
template <>
struct less<Key>
{
    bool operator()(const Key& k1, const Key& k2) const
    {
        return std::tie(k1.a_, k1.b_, k1.x_, k1.y_, k1.z_) <
               std::tie(k2.a_, k2.b_, k2.x_, k2.y_, k2.z_);
    }
};
}


and with votes being declared as:

std::map<Key, float> votes;


Since you're already using boost, why not try boost::unordered_map instead?

#include "boost/unordered_map.hpp"

std::size_t hash_value(const Key& key)
{
    std::size_t val = 0;
    boost::hash_combine(val, key.a_);
    boost::hash_combine(val, key.b_);
    boost::hash_combine(val, key.x_);
    boost::hash_combine(val, key.y_);
    boost::hash_combine(val, key.z_);
    return val;
}

boost::unordered_map<Key, float> votes;


You can also save a bit of time by not constructing the Key if it isn't needed, and passing an rvalue instead of an lvalue reference:

float this_vote = random_vote(generator);
if (this_vote > 0.8) {
    votes[Key(a, b, x, y, z)] += this_vote; 
    ++total_accesses;
}


If you have an up-to-date version of boost, you might also try using emplace:

if (this_vote > 0.8) {
    auto val = votes.emplace(Key(a, b, x, y, z), this_vote);
    if (!val.second)
        val.first->second += this_vote;
    ++total_accesses;
}

This may or may not be slower. I can't test it, but my gut feeling is it will be slower because Key is a thin object. If it contained more than a few integers, this would possibly be faster.

Basically, the final advice is to profile. Beyond that, you're unlikely to get access times equivalent to a vector from any other container: vector storage is guaranteed to be contiguous, so it has good locality of reference and doesn't require as many pointer dereferences as node-based containers do.


Context

StackExchange Code Review Q#27013, answer score: 6
