Accessing an unordered_map
Tags: unordered_map, accessing, stackoverflow
Problem
How can I improve the access time to this unordered map? If I instead allocate an 8 x 10 x 30 x 30 x 24000 std::vector<std::vector<std::vector<std::vector<std::vector<float> > > > >, each access is about 0.0001 ms. Using an unordered_map implementation, each access is about 0.001 ms, 10x slower. What can I do to reduce the access time for this unordered_map? I'm doing a lot of insertions and modifications of existing entries.
```
#include <cstdio>
#include <random>
#include <unordered_map>
#include "boost/functional/hash.hpp"
#include "boost/date_time/posix_time/posix_time.hpp"

struct Key {
  Key(int a, int b, int x, int y, int z) :
      a_(a), b_(b), x_(x), y_(y), z_(z) {
  }

  bool operator==(const Key& other) const {
    return (a_ == other.a_ &&
            b_ == other.b_ &&
            x_ == other.x_ &&
            y_ == other.y_ &&
            z_ == other.z_);
  }

  int a_, b_, x_, y_, z_;
};

namespace std {
template <>
struct hash<Key> {
  typedef Key argument_type;
  typedef std::size_t result_type;

  result_type operator()(const Key& key) const {
    std::size_t val = 0;
    boost::hash_combine(val, key.a_);
    boost::hash_combine(val, key.b_);
    boost::hash_combine(val, key.x_);
    boost::hash_combine(val, key.y_);
    boost::hash_combine(val, key.z_);
    return val;
  }
};
}  // namespace std.

int main(int argc, char** argv) {
  std::default_random_engine generator;
  std::uniform_int_distribution<int> random_z(1, 24000);
  std::uniform_real_distribution<float> random_vote(0.0, 1.0);
  std::unordered_map<Key, float> votes;
  int total_accesses = 0;
  boost::posix_time::ptime start =
      boost::posix_time::microsec_clock::local_time();
  // Loop bounds follow the 8 x 10 x 30 x 30 dimensions above; ~1000
  // outer iterations reproduces the ~14.3 million insertions profiled
  // in the answer below.
  const int kIterations = 1000;
  for (int i = 0; i < kIterations; ++i) {
    for (int a = 0; a < 8; ++a) {
      for (int b = 0; b < 10; ++b) {
        for (int x = 0; x < 30; ++x) {
          for (int y = 0; y < 30; ++y) {
            int z = random_z(generator);
            Key key(a, b, x, y, z);
            float this_vote = random_vote(generator);
            if (this_vote > 0.8) {
              votes[key] += this_vote;  // This is what is slow.
              ++total_accesses;
            }
          }
        }
      }
    }
    if ((i + 1) % 1000 == 0) {
      boost::posix_time::ptime now =
          boost::posix_time::microsec_clock::local_time();
      boost::posix_time::time_duration diff = now - start;
      printf("%d / %d : Milliseconds per access: %f\n",
             i + 1, kIterations,
             diff.total_milliseconds() / float(total_accesses));
    }
  }
  return 0;
}
```
Solution
Firstly, it's somewhat hard to say, because you haven't given the code you are using with the vector. Are you keeping it sorted and doing some kind of binary search to find the element, or is this just raw vector access times where you don't care about adding this_vote to a specific key? If so, of course raw vector access with something like push_back will be much faster, due to locality of reference, not requiring the computation of a hash function for each Key, and so on.

Secondly, this is where we grab our trusty profiler to see what exactly is taking up so much time. Compiling with -O3 and running gives this:
```
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 79.19     26.15    26.15      165   158.48   158.48  std::_Hashtable<Key, std::pair<Key const, float>, std::allocator<std::pair<Key const, float> >, std::_Select1st<std::pair<Key const, float> >, std::equal_to<Key>, std::hash<Key>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, true, false, true>::_M_rehash(unsigned int, std::pair<unsigned int, unsigned int> const&)
```
Which, if you squint at it hard enough, makes clear that _M_rehash is the culprit, taking almost 80% of the time. Unfortunately, I'm running this on GCC 4.7, which has a known bug that slows down unordered_map quite a lot: it's on the mailing list if you're interested. Further, reserve doesn't fix this problem. If you're using GCC 4.7, unordered_map will be slow.

The actual insert call is called 14311307 times, and accounts for only 0.64% of the time at 0.21s:
```
  0.64     32.88     0.21 14311307     0.00     0.00  std::__detail::_Node_iterator<std::pair<Key const, float>, false, true> std::_Hashtable<Key, std::pair<Key const, float>, std::allocator<std::pair<Key const, float> >, std::_Select1st<std::pair<Key const, float> >, std::equal_to<Key>, std::hash<Key>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, true, false, true>::_M_insert_bucket<std::pair<Key, float> >(std::pair<Key, float>&&, unsigned int, unsigned int)
```
Let's switch over to a regular std::map:
```
  4.26     10.10     0.46 14311307     0.00     0.00  std::_Rb_tree_iterator<std::pair<Key const, float> > std::_Rb_tree<Key, std::pair<Key const, float>, std::_Select1st<std::pair<Key const, float> >, std::less<Key>, std::allocator<std::pair<Key const, float> > >::_M_insert_unique_<std::pair<Key const, float> >(std::_Rb_tree_const_iterator<std::pair<Key const, float> >, std::pair<Key const, float>&&)
```
Now we have the same 14311307 calls to insert, which take up 4.26% of the time at 0.46s. However, there is not the huge overhead of having to rehash the map, hence this is actually quite a bit quicker.
The std::map code is exactly the same, except with a specialization of std::less:
```
#include <map>
#include <tuple>
#include <functional>

namespace std
{
template <>
struct less<Key>
{
    bool operator()(const Key& k1, const Key& k2) const
    {
        return std::tie(k1.a_, k1.b_, k1.x_, k1.y_, k1.z_) <
               std::tie(k2.a_, k2.b_, k2.x_, k2.y_, k2.z_);
    }
};
}
```
and with votes being declared as:
```
std::map<Key, float> votes;
```
Since you're already using boost, why not try boost::unordered_map instead?
```
#include "boost/unordered_map.hpp"

std::size_t hash_value(const Key& key)
{
    std::size_t val = 0;
    boost::hash_combine(val, key.a_);
    boost::hash_combine(val, key.b_);
    boost::hash_combine(val, key.x_);
    boost::hash_combine(val, key.y_);
    boost::hash_combine(val, key.z_);
    return val;
}

boost::unordered_map<Key, float> votes;
```
You can also save a bit of time by not constructing the Key if it isn't needed, and passing an rvalue instead of an lvalue reference:
```
float this_vote = random_vote(generator);
if (this_vote > 0.8) {
    votes[Key(a, b, x, y, z)] += this_vote;
    ++total_accesses;
}
```
If you have an up-to-date version of boost, you might also try using emplace (note that for a map, emplace needs std::piecewise_construct to build the key in place, and the mapped value is reached through the returned iterator):
```
if (this_vote > 0.8) {
    auto val = votes.emplace(std::piecewise_construct,
                             std::forward_as_tuple(a, b, x, y, z),
                             std::forward_as_tuple(this_vote));
    if (!val.second)
        val.first->second += this_vote;
}
```
This may or may not be slower. I can't test it, but my gut feeling is it will be slower because Key is a thin object. If it contained more than a few integers, this would possibly be faster.
Basically, the final advice is to profile. Further, you're unlikely to get access times equivalent to a vector for anything else, as vector is guaranteed to be contiguous, hence will have good locality of reference and likely won't require as many dereferences as any other container.
Code Snippets
```
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 79.19     26.15    26.15      165   158.48   158.48  std::_Hashtable<Key, std::pair<Key const, float>, std::allocator<std::pair<Key const, float> >, std::_Select1st<std::pair<Key const, float> >, std::equal_to<Key>, std::hash<Key>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, true, false, true>::_M_rehash(unsigned int, std::pair<unsigned int, unsigned int> const&)
  0.64     32.88     0.21 14311307     0.00     0.00  std::__detail::_Node_iterator<std::pair<Key const, float>, false, true> std::_Hashtable<Key, std::pair<Key const, float>, std::allocator<std::pair<Key const, float> >, std::_Select1st<std::pair<Key const, float> >, std::equal_to<Key>, std::hash<Key>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, true, false, true>::_M_insert_bucket<std::pair<Key, float> >(std::pair<Key, float>&&, unsigned int, unsigned int)
  4.26     10.10     0.46 14311307     0.00     0.00  std::_Rb_tree_iterator<std::pair<Key const, float> > std::_Rb_tree<Key, std::pair<Key const, float>, std::_Select1st<std::pair<Key const, float> >, std::less<Key>, std::allocator<std::pair<Key const, float> > >::_M_insert_unique_<std::pair<Key const, float> >(std::_Rb_tree_const_iterator<std::pair<Key const, float> >, std::pair<Key const, float>&&)
```
```
#include <map>
#include <tuple>
#include <functional>

namespace std
{
template <>
struct less<Key>
{
    bool operator()(const Key& k1, const Key& k2) const
    {
        return std::tie(k1.a_, k1.b_, k1.x_, k1.y_, k1.z_) <
               std::tie(k2.a_, k2.b_, k2.x_, k2.y_, k2.z_);
    }
};
}

std::map<Key, float> votes;
```
Context
StackExchange Code Review Q#27013, answer score: 6