principlecppMinor

Using includes() vs. hash function to check if an unordered_set is a subset of another unordered_set

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

stl codereview cpp c++stackoverflow c++11 performance set

includesfunctionhashusingunordered_setanotherchecksubset

Problem

I did two methods to determine if an unordered_set is a subset of another unordered_set, using hashing, and the built-in includes().

I want to see if using a hash function will improve the run time, but I've noticed that there is no really big different in their time, so I start wondering, if what I did is the right idea of hashing!

Could you please look at my code to see if I did the hashing correctly?

```
#include
#include
#include
#include
#include
#include

struct Sstruct {
int Iv;
float Fv1;
float Fv2;
};

bool operator==(Sstruct a, Sstruct b) { return a.Iv == b.Iv; }

struct MyHash {
size_t operator()(const Sstruct& x) const { return std::hash()(x.Iv); }
};

bool IsIn(const std::unordered_set& S1, const std::unordered_map >::const_iterator& iter)
{
for (std::unordered_set::const_iterator iter2 = iter->second.begin(); iter2 != iter->second.end(); ++iter2)

if (S1.find(*iter2) == S1.end())
{
return false;
}

return true;

}

int main()
{

std::unordered_set S1{ { 1, 0.9, 0.1 }, { 2, 0.8, 0.2 }, { 3, 0.9, 0.01 }, { 4, 0.7, 0.5 }, { 7, 0.7, 0.2 }, { 8, 0.1, 0.5 } };

std::unordered_map > S2;

S2[0].insert({ { 1 }, { 2 }, { 3} }); // S2 is a subset of S1
S2[1].insert({ { 3}, { 4 }, { 5} }); // S2 is a subset of S1
S2[3].insert({ { 2 }, { 4 } }); // S2 is a subset of S1
S2[4].insert({ { 3 }, { 4 }}); // S2 is a subset of S1
S2[5].insert({ { 3 }, { 5 } }); // S2 is a subset of S1
S2[6].insert({ { 1 }, { 2 }, { 8 } }); // S2 is a subset of S1
S2[7].insert({ { 3 }, { 4 }, { 6 } }); // S2 is not a subset of S1
S2[8].insert({ { 2 }, {7 } }); // S2 is a subset of S1
S2[9].insert({ { 7 }, { 8 } }); // S2 is a subset of S1
S2[10].insert({ { 5 }, { 6 } }); // S2 is not a subset of S1

std::unordered_set SetOne;

for (std::unordered_set ::iterator iter = S1.begin(); iter != S1.end(); ++iter)
SetOne.insert(iter->Iv);

float time

Solution

There are several things that are problematic:

Broken code

The version using std::includes is actually broken. When you read up on std::includes you will find that it requires the ranges it is operating on to be ordered by some comparison function (defaults to

operator

too short measurement period: This is somewhat connected the previous one. If the runtime of an algorithm is very short (close to the time measurement granularity limit) there will be big steps for very little change so one would try to increase the runtime (again my doing multiple runs)



too little problem size: the effect of hashing is the advantage of \$O(1)\$ element access compared to \$O(log n)\$ (for trees) or even \$O(n)\$ for linear structures. However, these are only asymptotic complexities which become meaningful as n grows towards infinity. For the little problem sizes you are presenting here they hardly matter and are dominated by surrounding (constant complexity) code



you print results during measuring which will most likely account for most of the time you are measuring because printing is very slow.



C++ 11

If you've got C++11 then why not use the whole package?

Use range-based for to clean up messy iterator loops:

for (std::unordered_map >::const_iterator SetTwo = S2.begin(); SetTwo != S2.end(); ++SetTwo)


can become

for (auto const &element: S2)


However notice that

element

 is already the element pointed to by the dereferenced iterator.

Function Interface

Now that we got away from the iterator we should clean up the interface of

IsIn

. This function is far too specialized by taking an iterator. Instead take the type you want directly:

bool IsIn(const std::unordered_set& haystack, std::unordered_set& needles) {
    for (auto const& needle: needles)
        if(haystack.find(needle) == haystack.end()) 
            return false;
    return true;
}


I renamed the variables to make their intention more clearly (though they are not fitting for the current task). The function should probably be renamed too.

Unused Variable

What is the intention of variable

S1? You initialize it and then just copy some of its values into SetOne. You could have just initialized SetOne

 and be done.

Wrong container type

Using an

unordered_map for S2

 is totally overkill. You have a continuous range of indices starting at 0 and it seems like you want to iterate this in exactly this order. You can use a

vector

 for this and initialize it with the desired values:

std::vector> S2{
    { { 1 }, { 2 }, { 3} }, //  S2 is a subset of S1
    { { 3}, { 4 }, { 5} }, // S2 is a subset of S1
    { { 2 }, { 4 } }, // S2 is  a subset of S1
    { { 3 }, { 4 }}, // S2 is  a subset of S1
    { { 3 },  { 5 } }, // S2 is a subset of S1
    { { 1 }, { 2 }, { 8 } }, //  S2 is a subset of S1
    { { 3 }, { 4 }, { 6 } }, // S2 is not a subset of S1
    { 2 }, {7 } }, // S2 is  a subset of S1
    { 7 }, { 8 } }, // S2 is  a subset of S1
    { 5 }, { 6 } }, // S2 is not a subset of S1
};


I would propose renaming

S2 to testSets

 or some other name that indicates that there are multiple sets that should be tested. It may even be advisable to create two vectors, one with subsets and one that has no subsets.

Inadequate loop

In your second method you are using a

while` loop instead of a for loop. Benchmarkwise this might skew your results because you don't want to measure differences in the loops, so you should be using the same code there.

The for loop is clearly the better alternative here because you intend to loop over the whole container anyways.

Code Snippets

for (std::unordered_map <int, std::unordered_set<int>>::const_iterator SetTwo = S2.begin(); SetTwo != S2.end(); ++SetTwo)

for (auto const &element: S2)

bool IsIn(const std::unordered_set<int>& haystack, std::unordered_set<int>& needles) {
    for (auto const& needle: needles)
        if(haystack.find(needle) == haystack.end()) 
            return false;
    return true;
}

std::vector<std::unordered_set<int>> S2{
    { { 1 }, { 2 }, { 3} }, //  S2 is a subset of S1
    { { 3}, { 4 }, { 5} }, // S2 is a subset of S1
    { { 2 }, { 4 } }, // S2 is  a subset of S1
    { { 3 }, { 4 }}, // S2 is  a subset of S1
    { { 3 },  { 5 } }, // S2 is a subset of S1
    { { 1 }, { 2 }, { 8 } }, //  S2 is a subset of S1
    { { 3 }, { 4 }, { 6 } }, // S2 is not a subset of S1
    { 2 }, {7 } }, // S2 is  a subset of S1
    { 7 }, { 8 } }, // S2 is  a subset of S1
    { 5 }, { 6 } }, // S2 is not a subset of S1
};

Context

StackExchange Code Review Q#54326, answer score: 7

Revisions (0)

No revisions yet.