HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Copy range up to delimiter

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
rangedelimitercopy

Problem

The problem I was trying to solve sounded pretty basic, but turned out to be a bit hairy.

I need to select some lines of arbitrary length from an input stream (file or standard input) and print them to output. In other words, a function that just echos the line would do the job. The input stream is encoded in UTF-8. For reasons that I don't quite understand, the normal ` mangles the stream, so I couldn't use:

std::string line;
std::readline(line);
std::cout << line;


I did solve it using
basic_istream::getline(); however, this function writes to a buffer, and I need to tell it how big the buffer is. To make it deal with lines of arbitrary length, I need to read to a buffer in a loop, checking whether failbit is set (when the end of the line wasn't reached).

Then, the easier solution was to use iterators on the input and the output stream, like this:

template
I copy_until(I first, I last, O d_first, T delim)
{
    while (first != last && *first != delim) {
        *d_first = *first;
        ++first;
        ++d_first;
    }
    return first;
}


This copies from
first up to last at most, and stops if the delimiter is encountered. It returns a pointer to the first character in the input range that was not copied.

It can be used like this to copy all input to output:

int main()
{
    std::istreambuf_iterator first(std::cin);
    std::istreambuf_iterator last;
    std::ostreambuf_iterator d_first(std::cout);

    while (first != last) {
        first = copy_until(first, last, d_first, '\n');
        std::cout.put(*first);
        ++first;
    }
}


The third argument of
copy_until could of course be a unary predicate, then instead of while (first != last && first != delim) it would say while (first != last && !p(first)).

More important questions:

  • Is there an algorithm in that I have overlooked? copy_if and partition_point come close, but they don't do exactly that, and partition_point followed by copy_n` will no

Solution

The third argument of copy_until could of course be a unary predicate, then instead of while (first != last && first != delim) it would say while (first != last && !p(first)).

Even better: remove the I last argument entirely! Just use the predicate for the whole thing, and the condition becomes while (!p(first)).

You're returning I (the place you stopped parsing the input stream), but similar functions in the STL return O (the place they stopped outputting to the output stream). The latter is more useful in STL cases, because typically the caller knows where he stopped parsing the input: it's last!

In your case, both I and O are useful; so why not return both?

template
std::pair copy_until(I first, O d_first, Pred pred)
{
    while (!pred(first)) {
        *d_first = *first;
        ++first;
        ++d_first;
    }
    return {first, d_first};
}


Your main thus becomes:

int main()
{
    std::istreambuf_iterator first(std::cin);
    std::istreambuf_iterator last;
    std::ostreambuf_iterator d_first(std::cout);

    while (first != last) {
        std::tie(first, std::ignore) = copy_until(
            first, d_first, [=](auto it){ return it == last || *it == '\n'; }
        );
        if (first == last) break;  // Whoops!
        std::cout.put(*first);
        ++first;
    }
}


The // Whoops! comment indicates the fix for a crashing bug in your original main.

Boost.Algorithm actually has both copy_until and its inverse, copy_while: http://marshall.calepin.co/copy_while.html
Strangely, Boost's version takes last as a parameter exactly the way yours does. In Boost's case, I think that might be because they're trying to target C++03, which doesn't have lambdas, so there's no way that's both easy and lightweight to express last { return it != last && *it != delim; } — in C++03 you had to do that with boost::bind and maybe boost::function as well.


I am not sure how to declare that the two iterators need to point to the same type as the type of the delimiter or the argument to the unary predicate.

The short answer is "don't"; complicated dependencies like that usually indicate that you've put the abstraction boundary in the wrong place (e.g. in this example, passing First, Last, Delim instead of First, Pred). But just for the record, you could do something crazy with enable_if, which can go on the return type of the function or on one of its parameters or on one of its template parameters, your choice.

auto copy_until(I first, I last, O d_first, T delim)
    -> std::enable_if_t, I>
{
    // ...
}


I'm pretty sure you don't want to assert that decltype(*d_first) is T, because that would prevent you from using something like a back_insert_iterator> with this algorithm.

Code Snippets

template<typename I, typename O, typename Pred>
std::pair<I, O> copy_until(I first, O d_first, Pred pred)
{
    while (!pred(first)) {
        *d_first = *first;
        ++first;
        ++d_first;
    }
    return {first, d_first};
}
int main()
{
    std::istreambuf_iterator<char> first(std::cin);
    std::istreambuf_iterator<char> last;
    std::ostreambuf_iterator<char> d_first(std::cout);

    while (first != last) {
        std::tie(first, std::ignore) = copy_until(
            first, d_first, [=](auto it){ return it == last || *it == '\n'; }
        );
        if (first == last) break;  // Whoops!
        std::cout.put(*first);
        ++first;
    }
}
auto copy_until(I first, I last, O d_first, T delim)
    -> std::enable_if_t<std::is_same_v<decltype(*first), T>, I>
{
    // ...
}

Context

StackExchange Code Review Q#138626, answer score: 5

Revisions (0)

No revisions yet.