HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppModerate

C++ Split string into a vector

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
intosplitstringvector

Problem

While profiling parts of my code, I noticed that my Split( ... ) function is rather slow, (about 50% of the time).
I was wondering if there was a more efficient way of Splitting a string into a vector.

My subject string can be anything between 2 words to well over 1000 words, hence the reason why I would like to speed it up.

Here is an example program with the function.

#include 
#include 
#include 
#include 

void Split(const std::string& subject, std::vector& container)
{
  container.clear();
  size_t len = subject.length() + 1;
  char* s = new char[ len ];
  memset(s, 0, len*sizeof(char));
  memcpy(s, subject.c_str(), (len - 1)*sizeof(char));
  for (char *p = strtok(s, " "); p != NULL; p = strtok(NULL, " "))
  {
    container.push_back( p );
  }
  delete[] s;
}

int main()
{
  std::vector container;
  Split( "Hello World", container );

  for( std::vector::const_iterator it = container.begin(); it != container.end();++it)
  {
    std::cout << *it << "!\n";
  }
}


Any suggestions on how I could speed it up.

Of course any other general comments are more than welcome.

Edit profiling is very simple.

...
clock_t t = clock (); 
std::vector container;
Split( "actus non facit reum nisi mens sit rea", container );
t = clock() - t;
std::cout << (((float)t)/CLOCKS_PER_SEC) << "ms\n";
...

Solution

I haven't attempted to measure the timing of this, in part because I don't know how you're planning to use it, but here are some ideas for improving your code.

Consider returning something useful from functions

Your Split function doesn't return anything, which isn't an error, but it's an odd design. I'd expect it to be declared like this instead:

std::vector Split(const std::string& subject)


Understand sizeof

The sizeof operator is defined as always returning 1 for sizeof(char), so having that expression as part of your code probably isn't useful.

Use "range for" to simplify your code

The code to print the vector could be made much simpler and shorter by using a "range for" that was introduced in C++11:

for(const auto &item : container) {
    std::cout << item << "!\n";
}


Use standard library algorithms

I'd be inclined to write this using C++ rather than C functions. One way to do this is to use std::copy and std::stringstream:

std::vector Split(const std::string& subject)
{
    std::istringstream ss{subject};
    using StrIt = std::istream_iterator;
    std::vector container{StrIt{ss}, StrIt{}};
    return container;
}


I've also been playing with std::sregex_token_iterator lately. Here's how that might look:

std::vector Split(const std::string& subject) {
    static const std::regex re{"\\s+"};
    std::vector container{
        std::sregex_token_iterator(subject.begin(), subject.end(), re, -1), 
        std::sregex_token_iterator()
    };
    return container;
}

Code Snippets

std::vector<std::string> Split(const std::string& subject)
for(const auto &item : container) {
    std::cout << item << "!\n";
}
std::vector<std::string> Split(const std::string& subject)
{
    std::istringstream ss{subject};
    using StrIt = std::istream_iterator<std::string>;
    std::vector<std::string> container{StrIt{ss}, StrIt{}};
    return container;
}
std::vector<std::string> Split(const std::string& subject) {
    static const std::regex re{"\\s+"};
    std::vector<std::string> container{
        std::sregex_token_iterator(subject.begin(), subject.end(), re, -1), 
        std::sregex_token_iterator()
    };
    return container;
}

Context

StackExchange Code Review Q#159628, answer score: 17

Revisions (0)

No revisions yet.