
Templated Tokenizer Functions

Submitted by: @import:stackexchange-codereview

Problem

I wanted to be able to tokenize a few different containers so I created a generic way to do it. I originally wrote this in Visual Studio 2012, but I had to modify it to get it to compile on Ideone.

Here is a brief description of my tokenizer functions:

Tokenize (): Tokenizes a container based on a single delimiter.

TokenizeIf (): Tokenizes a container based on a single delimiter and a condition.

BackInsertTokenize () and BackInsertTokenizeIf (): Wrappers for containers that can be filled through a std::back_inserter iterator.

A specialized BackInsertTokenize () for character types (char, wchar_t, etc.).
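As a rough illustration of what "tokenizes a container based on a single delimiter" means in practice, the sketch below splits a std::string and shows that adjacent and trailing delimiters produce empty tokens. It is a simplified stand-in that mirrors the loop in the listing that follows, not the code under review; the names `SplitOn`, `SplitOnIf` and `CheckSplitOn` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Tokenize (): split `text` on a single delimiter.
std::vector<std::string> SplitOn(std::string const &text, char delimiter)
{
    std::vector<std::string> tokens;
    if (text.empty()) {
        return tokens;              // empty input yields no tokens at all
    }

    auto current = text.begin();
    for (;;) {
        auto next = std::find(current, text.end(), delimiter);
        tokens.emplace_back(current, next);   // may be an empty token
        if (next == text.end()) {
            break;
        }
        current = next + 1;         // step over the delimiter
    }
    return tokens;
}

// Hypothetical stand-in for TokenizeIf (): keep only tokens accepted by `keep`.
template <typename Condition>
std::vector<std::string> SplitOnIf(std::string const &text, char delimiter, Condition keep)
{
    std::vector<std::string> kept;
    for (auto &token : SplitOn(text, delimiter)) {
        if (keep(token)) {
            kept.push_back(std::move(token));
        }
    }
    return kept;
}

// Self-check: adjacent and trailing delimiters produce empty tokens.
void CheckSplitOn()
{
    assert((SplitOn("a,b,,c,", ',') == std::vector<std::string>{"a", "b", "", "c", ""}));
    assert(SplitOn("", ',').empty());
}
```

A condition such as `[] (std::string const &t) { return !t.empty (); }` passed to `SplitOnIf` would drop the empty tokens.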

Here are my generic tokenizer functions:

```
#include <algorithm>  // std::find
#include <iterator>   // std::iterator_traits, std::back_inserter
#include <utility>    // std::move

template <typename Token, typename Iter, typename OutIter, typename Condition>
auto TokenizeIf (Iter begin, Iter end, OutIter out, typename std::iterator_traits <Iter>::value_type delimiter, Condition condition) -> OutIter
{
    if (begin == end) {
        return out ;
    }

    auto current = begin ;
    auto next = begin ;

    do {
        next = std::find (current, end, delimiter) ;
        Token token (current, next) ;

        if (condition (token) == true) {
            *out++ = std::move (token) ;
        }
        current = next ;

    } while (next != end && ++current != end) ;

    // The input ended with a delimiter, so one more (empty) token follows it.
    if (next != end) {
        Token token ;
        if (condition (token) == true) {
            *out++ = std::move (token) ;
        }
    }

    return out ;
}

template <typename Token, typename Iter, typename OutIter>
auto Tokenize (Iter begin, Iter end, OutIter out, typename std::iterator_traits <Iter>::value_type delimiter) -> OutIter
{
    if (begin == end) {
        return out ;
    }

    auto current = begin ;
    auto next = begin ;

    do {
        next = std::find (current, end, delimiter) ;
        *out++ = Token (current, next) ;
        current = next ;

    } while (next != end && ++current != end) ;

    // The input ended with a delimiter, so one more (empty) token follows it.
    if (next != end) {
        *out++ = Token () ;
    }

    return out ;
}

template
auto BackInsertTokenizeIf (ContainerIn const &in, typename ContainerIn::value_type delimiter, Condition condition
```

Solution

The code generally looks good to me: it is clear, complete and working. The naming and formatting are clear and consistent, although I prefer to read code without a space before each statement-terminating semicolon. I did see a few things that may help you improve it.

Consider using {} style initializers

There are a few places where a Token is constructed, such as this:

Token token (current, next) ;


However, a reader might misconstrue this as a function call. Assuming that you're using C++11 or later, it may be worth considering the {} style for the constructor:

Token token {current, next} ;


This form can't be read as a function call or declaration, so it is slightly less ambiguous to the reader.
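A small sketch of the two spellings, assuming `Token` is `std::string` (an assumption; the templates accept other types). Both forms end up calling the iterator-range constructor here. One caveat worth knowing: for a type with a `std::initializer_list` constructor whose element type matches the arguments, braces select that constructor instead, so the two spellings are not always interchangeable in generic code. All helper names below are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Parenthesized form: calls the iterator-range constructor.
std::string MakeTokenParen(std::string::const_iterator first,
                           std::string::const_iterator last)
{
    std::string token (first, last);
    return token;
}

// Braced form: same constructor, because iterators do not convert to char
// and so the initializer_list<char> constructor is not viable.
std::string MakeTokenBrace(std::string::const_iterator first,
                           std::string::const_iterator last)
{
    std::string token {first, last};
    return token;
}

// Caveat: when the arguments match the element type, braces pick the
// initializer_list constructor and the meaning changes.
std::vector<int> CountAndValue() { return std::vector<int> (2, 7); }  // two elements, both 7
std::vector<int> ListOfTwo()     { return std::vector<int> {2, 7}; }  // elements 2 and 7

// Self-check that both spellings agree for std::string tokens.
void CheckBraceAndParenAgree()
{
    std::string const s = "ab,cd";
    assert(MakeTokenParen(s.begin(), s.begin() + 2) ==
           MakeTokenBrace(s.begin(), s.begin() + 2));
}
```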

Consider alternative usage

The code works well for the use cases you've said you're interested in addressing. That's good, and it may be all you ever need, but when I first saw the code, I thought it might be useful to be able to use it like this:

std::string const s2 = "55,33,1,7,42";
auto const t4 = BackInsertTokenize <std::vector <int>> (s2, ',') ;


The intent was to create a vector of integers from the const string, but this code doesn't actually compile. The problem is essentially this line:

*out++ = Token (current, next) ;


That works fine for any Token type that can be constructed from an iterator range like this, but not for a primitive type like int, which has no such constructor. One way to address that might be to provide another template that additionally takes a callable (a function object or lambda) to perform this conversion explicitly for the given types.
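One possible shape for such a template is sketched below. It reuses the splitting loop from the question but hands each `[first, last)` range to a caller-supplied converter instead of constructing a `Token`. The names `TokenizeWith`, `ParseInts` and `CheckParseInts` are hypothetical and do not appear in the original code, and `std::stoi` is just one possible choice of conversion.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical variant of Tokenize (): `convert (first, last)` turns each
// delimited range into the value that is written to `out`.
template <typename Iter, typename OutIter, typename Convert>
OutIter TokenizeWith(Iter begin, Iter end, OutIter out,
                     typename std::iterator_traits<Iter>::value_type delimiter,
                     Convert convert)
{
    if (begin == end) {
        return out;
    }

    auto current = begin;
    auto next = begin;

    do {
        next = std::find(current, end, delimiter);
        *out++ = convert(current, next);
        current = next;
    } while (next != end && ++current != end);

    if (next != end) {
        *out++ = convert(end, end);   // trailing delimiter: one more, empty, range
    }

    return out;
}

// Example use: parse delimiter-separated integers.
// std::stoi throws std::invalid_argument on an empty or non-numeric token.
std::vector<int> ParseInts(std::string const &text, char delimiter)
{
    std::vector<int> values;
    TokenizeWith(text.begin(), text.end(), std::back_inserter(values), delimiter,
                 [] (std::string::const_iterator first, std::string::const_iterator last) {
                     return std::stoi(std::string(first, last));
                 });
    return values;
}

// Self-check using the string from the example above.
void CheckParseInts()
{
    assert((ParseInts("55,33,1,7,42", ',') == std::vector<int>{55, 33, 1, 7, 42}));
}
```

Keeping the converter as a separate parameter leaves the existing `Token`-constructing templates untouched, at the cost of a second entry point to maintain.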

Omit return 0

When a C++ program reaches the end of main the compiler will automatically generate code to return 0, so there is no reason to put return 0; explicitly at the end of main.

Code Snippets

Token token (current, next) ;
Token token {current, next} ;
std::string const s2 = "55,33,1,7,42";
auto const t4 = BackInsertTokenize <std::vector <int>> (s2, ',') ;
*out++ = Token (current, next) ;

Context

StackExchange Code Review Q#95614, answer score: 2
