HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppModerate

Dividing a range into N sub-ranges

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rangeintosubrangesdividing

Problem

Introduction

The function divides a range given by a begin_iterator and an end_iterator into n sub-ranges of the same format, where the sub-ranges must be as equally sized as possible.

Example

Consider a set of size 10:


\$\{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 \}\$

The output would be is a set of n = 3 sub-ranges:


\$\{ \{ 0, 3 \}, \{ 4, 6 \}, \{ 7, 9 \} \}\$

Where each sub-set is as even as possible, in the previous example the sets have sizes 4, 3, and 3, respectively; that is the most even way to distribute 10 elements into 3 sub-sets.

Review goals

  • Efficiency



  • Correctness



Implementation

The oringinal code contained an issue which I missed by pure luck. Here is the fixed working code; the changes do not invalidate the present answer.

```
#include // std::vector
#include // std::distance(), std::next()

template
decltype( auto ) divide_work( Iterator begin, Iterator end, std::size_t n )
{
using size_type = std::size_t;

auto sz = std::distance( begin, end );
auto element_per_group_count = sz / n;
auto remaining_element_count = sz % n;

std::vector> ranges;
ranges.reserve( n );

if ( remaining_element_count == 0 )
{
--element_per_group_count;

ranges.emplace_back( std::make_pair( std::next( begin, 0 ),
std::next( begin, element_per_group_count ) ) );

for ( auto i = 1U; i < n; ++i )
{
auto first = std::next( ranges[ i - 1 ].second, 1 );
ranges.emplace_back( std::make_pair( first,
std::next( first, element_per_group_count ) ) );
}
}
else
{
ranges.emplace_back( std::make_pair( std::next( begin, 0 ),
std::next( begin, element_per_group_count ) ) );

for ( auto i = 1U; i < remaining_element_count; ++i )
{
auto first = std::next( ranges[ i - 1 ].second, 1 );
ranges.emplace_back( std::make_pair( first,
std::next( first, element_per_group

Solution

Misleading Return Type

You have:

template 
decltype( auto ) divide_work( Iterator begin, Iterator end, std::size_t n )


The difference between auto and decltype(auto) is that the latter can deduce reference types as well. But you're returning a value, always, regardless of Iterator. This choice makes it seem as if you could be returning a reference. If you are going to use a placeholder return type here, use simply auto.

Even better, the return type for this function is not a complicated expression that's difficult to spell. It's simply std::vector>, so let's just use that:

template 
std::vector>
divide_work( Iterator begin, Iterator end, std::size_t n )


emplace_back

The advantage of emplace_back is not having to actually construct the object to pass in. So when you write:

ranges.emplace_back( std::make_pair( std::next( begin, 0 ),
            std::next( begin, element_per_group_count ) ) );


That's inefficient (you're copy constructing a pair instead of in-place constructing it) and just a bunch of extra typing. Also next(iter, 0) is iter, so the above line is logically exactly equivalent to:

ranges.emplace_back(begin, std::next(begin, element_per_group_count));


Overly Complicated Algorithm!

I'm not sure why you have three for loops or why even division is a special case of the algorithm... Also I'm pretty sure that special case is wrong. Consider a range of size 15 dividing into 3 chunks, so it divides evenly. We'd start with element_per_group_count == 5, but then we decrement it to 4, and add the first range of:

[0, 4)


Then the next two ranges will be:

[5, 9)
[10, 14)


We successfully divided into three chunks, but we're actually missing elements! 4, 9, and 14 will not appear in any of the three chunks. The correct division would be [[0, 5), [5, 10), [10, 15)].

We're just chunking the range (begin, end) into n pieces. That implies a simple for-loop. We need n things. Each one is approximately of size distance(begin, end)/n. There will be distance(begin, end)%n remainder, which we will allocate one at a time as we go until we run out of remainder. When we already added n-1 things, we just can add the rest as the last chunk (since there's nothing else to do at that point).

This is all you need:

template 
std::vector>
divide_work( Iterator begin, Iterator end, std::size_t n )
{
    std::vector> ranges;
    ranges.reserve(n);

    auto dist = std::distance(begin, end);
    auto chunk = dist / n;
    auto remainder = dist % n;

    for (size_t i = 0; i < n-1; ++i) {
        auto next_end = std::next(begin, chunk + (remainder ? 1 : 0));
        ranges.emplace_back(begin, next_end);

        begin = next_end;
        if (remainder) remainder -= 1;
    }

    // last chunk
    ranges.emplace_back(begin, end);
    return ranges;
}


Dividing up the remainder one at a time means that if we have 15 elements into 6 chunks, we end up with sizes of [3, 3, 3, 2, 2, 2] instead of any other weird alternative like the naive [2, 2, 2, 2, 2, 5].

Too many chunks? None?

What do you do in the case where n > distance(begin, end) or n == 0? I guess we should probably add some error checks:

template 
std::vector>
divide_work( Iterator begin, Iterator end, std::size_t n )
{
    std::vector> ranges;
    if (n == 0) return ranges;
    ranges.reserve(n);

    auto dist = std::distance(begin, end);
    n = std::min(n, dist);
    auto chunk = dist / n;
    auto remainder = dist % n;

    for (size_t i = 0; i < n-1; ++i) {
        auto next_end = std::next(begin, chunk + (remainder ? 1 : 0));
        ranges.emplace_back(begin, next_end);

        begin = next_end;
        if (remainder) remainder -= 1;
    }

    // last chunk
    ranges.emplace_back(begin, end);
    return ranges;
}

Code Snippets

template <typename Iterator>
decltype( auto ) divide_work( Iterator begin, Iterator end, std::size_t n )
template <typename Iterator>
std::vector<std::pair<Iterator, Iterator>>
divide_work( Iterator begin, Iterator end, std::size_t n )
ranges.emplace_back( std::make_pair( std::next( begin, 0 ),
            std::next( begin, element_per_group_count ) ) );
ranges.emplace_back(begin, std::next(begin, element_per_group_count));
[5, 9)
[10, 14)

Context

StackExchange Code Review Q#106773, answer score: 10

Revisions (0)

No revisions yet.