HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Lazy String splitter in C++

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
lazystringsplitter

Problem

I created a string splitter in C++. It splits a string in a lazy fashion and makes use of forward iterators to sequentially supply the next token from the string until it runs out of tokens.

```
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include

class StringSplitter {

template
class SplitIterator: public std::iterator::type> {

// The type of the elements to be iterated over
using Type = typename std::iterator_traits::value_type;
using UnqualifiedType = typename std::remove_const::type;

friend class StringSplitter;

const StringSplitter& split;
UnqualifiedType token;
std::smatch results;
typename Type::const_iterator pos, fin;

/**
* @brief This constructor takes a StringSplitter reference to make use of
*
* @param splitter The underlying StringSplitter object which
* contains the string to split
*/
SplitIterator (const StringSplitter& splitter) :split(splitter) {
this->pos = splitter.string_to_split.begin();
this->fin = splitter.string_to_split.end();
this->prepare_next_token();
std::cout pos = split.string_to_split.begin();
this->fin = split.string_to_split.end();
this->prepare_next_token();
std::cout pos == this->fin) {
this->token = "";
return;
}

if (!std::regex_search(this->pos, this->fin, this->results, this->split.pattern,
std::regex_constants::match_not_null)) {
// If the end of the string is reached without finding a delimiter
// then copy the rest of the string and set pos to fin
if (this->results.suffix().length() == 0) {
this->token = std::string(pos, fin);
this->pos = this->fin;
// otherwise

Solution

I'm going to focus on your iterator first.

The way you've templated it is a little strange. Typically you'll just template it as a ` and then give either std::string or const std::string as the type.

Also, you almost never see
this in C++ code - using it means you're either shadowing a member variable with a parameter (don't do this) or it is completely redundant (don't do this either).

You should also prefer colon initialization as well whenever possible.

I don't like that you have all of these debug statements printing to stdout, and all the time. For this situation I'd probably use something like

#ifdef DEBUG_MODE
std::cerr << "Iterator String split object copy ctor called\n";
#endif


Even better would be to wrap it in a function.

void logMessage(const std::string& message, ostream& out = std::cerr) {
    #ifdef DEBUG_MODE
        out << message;
    #endif
}


I've never seen a move constructor where the parameter is
const - I don't actually know what, if any impact that would have, but it is surprising and shouldn't have an impact on anything. I'm also not sure of a situation when you'd actually want the iterator to use move construction - in the interest of YAGNI, I'd say just remove it.

Your constructor is not ideal - I'd explicitly pass in begin, end, and pattern.

In
prepare_next_token I think you don't need to check the case of it being at the end - in C++ we generally prefer to make the user handle those cases, and not introduce extra overhead for everyone else.

I don't see a point to
is_valid, you don't use it anywhere I see.

A default constructor should be just that - default. I don't think it is worth setting the values to be anything (it might even make something work, or seem to work, that it shouldn't if you don't let them be default initialized).

Equality seems questionable to me - I think you'd also want to check that they have the same position within the string (it seems conceivable that a string could have the same token within it twice, although I don't know much about C++ regex).

It is also completely unnecessary to have a reference to your
StringSplitter in the iterator (and is probably not what you actually want, too). Instead, you should store the pattern in your iterator as well, and then you can get rid of it entirely.

I also wouldn't put the iterator at the very top of your class - I generally prefer to put private things, including my iterators, at the end.

Also, your

begin()/end() const;


functions should return a
const_iterator, not an iterator.

I'd rewrite your iterator to look something like this.

``
void logMessage(const std::string& message, ostream& out = std::cerr) {
#ifdef DEBUG_MODE
out
class SplitIterator: public std::iterator {

using unqualified_type = typename std::remove_const::type;
using const_reference = const reference;
using const_pointer = const pointer;

friend class StringSplitter;

unqualified_type token;
std::smatch results;
std::regex pattern;
typename value_type::const_iterator pos, fin;

SplitIterator(
const std::string& pattern, typename value_type::const_iterator start,
typename value_type::const_iterator end)
: pattern(pattern), pos(start), fin(end) {
prepare_next_token();
}

/**
* @brief Creates the next token which can be accessed by dereferencing
* the iterator
* @details It will search for the next occurence of the delimiters supplied
* and return the next token available
*/
void prepare_next_token() noexcept {
if (!std::regex_search(pos, fin, results, pattern, std::regex_constants::match_not_null)) {
// If the end of the string is reached without finding a delimiter
// then copy the rest of the string and set pos to fin
if (!results.suffix().length()) {
token = std::string(pos, fin);
pos = fin;
} else {
token = results.prefix();
}
} else {
token = results.prefix();
pos += results.position() + results.length();
}
}

public:
// Satisfy Default Constructible
SplitIterator() {}

SplitIterator (const SplitIterator& other)
:pattern(other.pattern), token(other.token),
results(other.results), pos(other.pos),
fin(other.fin) {

logMessage("Iter copy ctor called\n");
}

// Satisfy Swappable
void swap(SplitIterator& other) noexcept {
std::swap(pattern, other.pattern);
std::swap(pos, other.pos);
std::swap(fin, other.fin);
std::swap(token, other.token);
std::swap(results, other.results);
}

SplitIterator& operator++() {
logMessage("Pre-increment called!!\n");
prepare_next_token();
return *this;
}

Code Snippets

#ifdef DEBUG_MODE
std::cerr << "Iterator String split object copy ctor called\n";
#endif
void logMessage(const std::string& message, ostream& out = std::cerr) {
    #ifdef DEBUG_MODE
        out << message;
    #endif
}
begin()/end() const;
void logMessage(const std::string& message, ostream& out = std::cerr) {
    #ifdef DEBUG_MODE
        out << message;
    #endif
}

template <typename T>
class SplitIterator: public std::iterator<std::forward_iterator_tag, T> {

    using unqualified_type = typename std::remove_const<value_type>::type;
    using const_reference = const reference;
    using const_pointer = const pointer;

    friend class StringSplitter;

    unqualified_type token;
    std::smatch results;
    std::regex pattern;
    typename value_type::const_iterator pos, fin;

    SplitIterator(
        const std::string& pattern, typename value_type::const_iterator start,
        typename value_type::const_iterator end)
      : pattern(pattern), pos(start), fin(end) {
        prepare_next_token();
      }

    /**
     * @brief Creates the next token which can be accessed by dereferencing
     * the iterator
     * @details It will search for the next occurence of the delimiters supplied
     * and return the next token available
     */
    void prepare_next_token() noexcept {
        if (!std::regex_search(pos, fin, results, pattern, std::regex_constants::match_not_null)) {
            // If the end of the string is reached without finding a delimiter
            // then copy the rest of the string and set pos to fin
            if (!results.suffix().length()) {
                token = std::string(pos, fin);
                pos = fin;
            } else {
                token = results.prefix();
            }
        } else {
            token = results.prefix();
            pos += results.position() + results.length();
        }
    }

    public:
        // Satisfy Default Constructible
        SplitIterator() {}

        SplitIterator (const SplitIterator<T>& other)
            :pattern(other.pattern), token(other.token),
            results(other.results), pos(other.pos),
            fin(other.fin) {

            logMessage("Iter copy ctor called\n");
        }

        // Satisfy Swappable
        void swap(SplitIterator<T>& other) noexcept {
            std::swap(pattern, other.pattern);
            std::swap(pos, other.pos);
            std::swap(fin, other.fin);
            std::swap(token, other.token);
            std::swap(results, other.results);
        }

        SplitIterator<T>& operator++() {
            logMessage("Pre-increment called!!\n");
            prepare_next_token();
            return *this;
        }

        SplitIterator<T> operator++(int) {
            logMessage("Post-increment called!!\n");
            SplitIterator<T> prev_i = *this;
            prepare_next_token();
            return prev_i;
        }

        bool operator==(const SplitIterator<T>& rhs) const {
            return pos == rhs.pos;
        }

        // Satisfy EqualityComparable
        bool operator != (const SplitIterator<T>& rhs) const {
            return !operator==(rhs);
        }

        reference operator*() {
            return token;
        }

        const_r
StringSplitter(some_type string_source, const std::string& pattern)

Context

StackExchange Code Review Q#126972, answer score: 2

Revisions (0)

No revisions yet.