principlecppMinor
Using includes() vs. hash function to check if an unordered_set is a subset of another unordered_set
Viewed 0 times
includesfunctionhashusingunordered_setanotherchecksubset
Problem
I did two methods to determine if an
I want to see if using a hash function will improve the run time, but I've noticed that there is no really big different in their time, so I start wondering, if what I did is the right idea of hashing!
Could you please look at my code to see if I did the hashing correctly?
```
#include
#include
#include
#include
#include
#include
struct Sstruct {
int Iv;
float Fv1;
float Fv2;
};
bool operator==(Sstruct a, Sstruct b) { return a.Iv == b.Iv; }
struct MyHash {
size_t operator()(const Sstruct& x) const { return std::hash()(x.Iv); }
};
bool IsIn(const std::unordered_set& S1, const std::unordered_map >::const_iterator& iter)
{
for (std::unordered_set::const_iterator iter2 = iter->second.begin(); iter2 != iter->second.end(); ++iter2)
if (S1.find(*iter2) == S1.end())
{
return false;
}
return true;
}
int main()
{
std::unordered_set S1{ { 1, 0.9, 0.1 }, { 2, 0.8, 0.2 }, { 3, 0.9, 0.01 }, { 4, 0.7, 0.5 }, { 7, 0.7, 0.2 }, { 8, 0.1, 0.5 } };
std::unordered_map > S2;
S2[0].insert({ { 1 }, { 2 }, { 3} }); // S2 is a subset of S1
S2[1].insert({ { 3}, { 4 }, { 5} }); // S2 is a subset of S1
S2[3].insert({ { 2 }, { 4 } }); // S2 is a subset of S1
S2[4].insert({ { 3 }, { 4 }}); // S2 is a subset of S1
S2[5].insert({ { 3 }, { 5 } }); // S2 is a subset of S1
S2[6].insert({ { 1 }, { 2 }, { 8 } }); // S2 is a subset of S1
S2[7].insert({ { 3 }, { 4 }, { 6 } }); // S2 is not a subset of S1
S2[8].insert({ { 2 }, {7 } }); // S2 is a subset of S1
S2[9].insert({ { 7 }, { 8 } }); // S2 is a subset of S1
S2[10].insert({ { 5 }, { 6 } }); // S2 is not a subset of S1
std::unordered_set SetOne;
for (std::unordered_set ::iterator iter = S1.begin(); iter != S1.end(); ++iter)
SetOne.insert(iter->Iv);
float time
unordered_set is a subset of another unordered_set, using hashing, and the built-in includes().I want to see if using a hash function will improve the run time, but I've noticed that there is no really big different in their time, so I start wondering, if what I did is the right idea of hashing!
Could you please look at my code to see if I did the hashing correctly?
```
#include
#include
#include
#include
#include
#include
struct Sstruct {
int Iv;
float Fv1;
float Fv2;
};
bool operator==(Sstruct a, Sstruct b) { return a.Iv == b.Iv; }
struct MyHash {
size_t operator()(const Sstruct& x) const { return std::hash()(x.Iv); }
};
bool IsIn(const std::unordered_set& S1, const std::unordered_map >::const_iterator& iter)
{
for (std::unordered_set::const_iterator iter2 = iter->second.begin(); iter2 != iter->second.end(); ++iter2)
if (S1.find(*iter2) == S1.end())
{
return false;
}
return true;
}
int main()
{
std::unordered_set S1{ { 1, 0.9, 0.1 }, { 2, 0.8, 0.2 }, { 3, 0.9, 0.01 }, { 4, 0.7, 0.5 }, { 7, 0.7, 0.2 }, { 8, 0.1, 0.5 } };
std::unordered_map > S2;
S2[0].insert({ { 1 }, { 2 }, { 3} }); // S2 is a subset of S1
S2[1].insert({ { 3}, { 4 }, { 5} }); // S2 is a subset of S1
S2[3].insert({ { 2 }, { 4 } }); // S2 is a subset of S1
S2[4].insert({ { 3 }, { 4 }}); // S2 is a subset of S1
S2[5].insert({ { 3 }, { 5 } }); // S2 is a subset of S1
S2[6].insert({ { 1 }, { 2 }, { 8 } }); // S2 is a subset of S1
S2[7].insert({ { 3 }, { 4 }, { 6 } }); // S2 is not a subset of S1
S2[8].insert({ { 2 }, {7 } }); // S2 is a subset of S1
S2[9].insert({ { 7 }, { 8 } }); // S2 is a subset of S1
S2[10].insert({ { 5 }, { 6 } }); // S2 is not a subset of S1
std::unordered_set SetOne;
for (std::unordered_set ::iterator iter = S1.begin(); iter != S1.end(); ++iter)
SetOne.insert(iter->Iv);
float time
Solution
There are several things that are problematic:
Broken code
The version using
The for loop is clearly the better alternative here because you intend to loop over the whole container anyways.
Broken code
The version using
std::includes is actually broken. When you read up on std::includes you will find that it requires the ranges it is operating on to be ordered by some comparison function (defaults to operator
- too short measurement period: This is somewhat connected the previous one. If the runtime of an algorithm is very short (close to the time measurement granularity limit) there will be big steps for very little change so one would try to increase the runtime (again my doing multiple runs)
- too little problem size: the effect of hashing is the advantage of \$O(1)\$ element access compared to \$O(log n)\$ (for trees) or even \$O(n)\$ for linear structures. However, these are only asymptotic complexities which become meaningful as n grows towards infinity. For the little problem sizes you are presenting here they hardly matter and are dominated by surrounding (constant complexity) code
- you print results during measuring which will most likely account for most of the time you are measuring because printing is very slow.
C++ 11
If you've got C++11 then why not use the whole package?
Use range-based for to clean up messy iterator loops:
for (std::unordered_map >::const_iterator SetTwo = S2.begin(); SetTwo != S2.end(); ++SetTwo)
can become
for (auto const &element: S2)
However notice that element is already the element pointed to by the dereferenced iterator.
Function Interface
Now that we got away from the iterator we should clean up the interface of IsIn. This function is far too specialized by taking an iterator. Instead take the type you want directly:
bool IsIn(const std::unordered_set& haystack, std::unordered_set& needles) {
for (auto const& needle: needles)
if(haystack.find(needle) == haystack.end())
return false;
return true;
}
I renamed the variables to make their intention more clearly (though they are not fitting for the current task). The function should probably be renamed too.
Unused Variable
What is the intention of variable S1? You initialize it and then just copy some of its values into SetOne. You could have just initialized SetOne and be done.
Wrong container type
Using an unordered_map for S2 is totally overkill. You have a continuous range of indices starting at 0 and it seems like you want to iterate this in exactly this order. You can use a vector for this and initialize it with the desired values:
std::vector> S2{
{ { 1 }, { 2 }, { 3} }, // S2 is a subset of S1
{ { 3}, { 4 }, { 5} }, // S2 is a subset of S1
{ { 2 }, { 4 } }, // S2 is a subset of S1
{ { 3 }, { 4 }}, // S2 is a subset of S1
{ { 3 }, { 5 } }, // S2 is a subset of S1
{ { 1 }, { 2 }, { 8 } }, // S2 is a subset of S1
{ { 3 }, { 4 }, { 6 } }, // S2 is not a subset of S1
{ 2 }, {7 } }, // S2 is a subset of S1
{ 7 }, { 8 } }, // S2 is a subset of S1
{ 5 }, { 6 } }, // S2 is not a subset of S1
};
I would propose renaming S2 to testSets or some other name that indicates that there are multiple sets that should be tested. It may even be advisable to create two vectors, one with subsets and one that has no subsets.
Inadequate loop
In your second method you are using a while` loop instead of a for loop. Benchmarkwise this might skew your results because you don't want to measure differences in the loops, so you should be using the same code there.The for loop is clearly the better alternative here because you intend to loop over the whole container anyways.
Code Snippets
for (std::unordered_map <int, std::unordered_set<int>>::const_iterator SetTwo = S2.begin(); SetTwo != S2.end(); ++SetTwo)for (auto const &element: S2)bool IsIn(const std::unordered_set<int>& haystack, std::unordered_set<int>& needles) {
for (auto const& needle: needles)
if(haystack.find(needle) == haystack.end())
return false;
return true;
}std::vector<std::unordered_set<int>> S2{
{ { 1 }, { 2 }, { 3} }, // S2 is a subset of S1
{ { 3}, { 4 }, { 5} }, // S2 is a subset of S1
{ { 2 }, { 4 } }, // S2 is a subset of S1
{ { 3 }, { 4 }}, // S2 is a subset of S1
{ { 3 }, { 5 } }, // S2 is a subset of S1
{ { 1 }, { 2 }, { 8 } }, // S2 is a subset of S1
{ { 3 }, { 4 }, { 6 } }, // S2 is not a subset of S1
{ 2 }, {7 } }, // S2 is a subset of S1
{ 7 }, { 8 } }, // S2 is a subset of S1
{ 5 }, { 6 } }, // S2 is not a subset of S1
};Context
StackExchange Code Review Q#54326, answer score: 7
Revisions (0)
No revisions yet.