HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Counting number of unique elements in a sorted container

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
uniquenumbercountingcontainerelementssorted

Problem

Similar to this question, but in C++. Although my priority is performance (and correctness!) rather than readability.

#include 
#include 
#include 

template 
size_t count_unique(const Container& elements)
{
    size_t result {};

    for (auto it = std::cbegin(elements), last = std::cend(elements); it != last;) {
        ++result;
        it = std::upper_bound(it, last, *it);
    }

    return result;
}

Solution

Worst case is \$O(n \log n)\$

If we pass an array of completely unique elements, we get a pretty bad runtime. Your solution is good if we expect a lot of duplicates, but bad if we don't.

As an optimization for the worst case, we could add some exponential bounding to find the next element. Basically, replace

it = std::upper_bound(it, last, *it);


with:

std::ptrdiff_t bound = 1;
while (std::distance(it, last) < bound && *std::next(it, bound) == *it) {
    bound *= 2;
}

it = std::upper_bound(
    std::next(it, bound/2), 
    std::next(it, std::min(bound, std::distance(it, last))),
    *it);


Of course is now pretty bad for non-random access iterators since we have to keep doing lots of distance and advance related calls, so I would simply default to the trivial comparison algorithm in that case. That is:

template 
size_t count_unique(C const& container) {
    using It = decltype(std::cbegin(container));
    return count_unique(container, std::iterator_traits::iterator_category{});
}

template 
size_t count_unique(C const& container, std::random_access_iterator_tag ) {
   // what I suggested
}

template 
size_t count_unique(C const& container, std::forward_iterator_tag ) {
    auto first = std::cbegin(container), last = std::cend(container);
    if (first == last) return 0;

    size_t count = 1;
    for (auto next = std::next(first); next != last; ++first, ++next)
    {
        count += !(*next == *first);
    }
    return count; 
}


Additional Requirements

The binary-search optimization requires that the type the container holds is ordered. You may want to fallback to the simple forward-iterator version if the type is only equality-comparable but not less-than-comparable.

Initialization

I would prefer:

size_t result = 0;


to

size_t result {};


No reason to hide the 0.

Code Snippets

it = std::upper_bound(it, last, *it);
std::ptrdiff_t bound = 1;
while (std::distance(it, last) < bound && *std::next(it, bound) == *it) {
    bound *= 2;
}

it = std::upper_bound(
    std::next(it, bound/2), 
    std::next(it, std::min(bound, std::distance(it, last))),
    *it);
template <class C>
size_t count_unique(C const& container) {
    using It = decltype(std::cbegin(container));
    return count_unique(container, std::iterator_traits<It>::iterator_category{});
}

template <class C>
size_t count_unique(C const& container, std::random_access_iterator_tag ) {
   // what I suggested
}

template <class C>
size_t count_unique(C const& container, std::forward_iterator_tag ) {
    auto first = std::cbegin(container), last = std::cend(container);
    if (first == last) return 0;

    size_t count = 1;
    for (auto next = std::next(first); next != last; ++first, ++next)
    {
        count += !(*next == *first);
    }
    return count; 
}
size_t result = 0;
size_t result {};

Context

StackExchange Code Review Q#116229, answer score: 8

Revisions (0)

No revisions yet.