Fastest way to search istringstream for patterns in around 0.02 seconds
Problem
I have a stream composed of 2 columns and 1000 lines:
- Column 1: contains the patterns that I want to find
- Column 2: contains the values corresponding to the patterns in Column 1
I want to search for the patterns in column 1, extract their corresponding values in column 2 and save them in vectors. There are 3 patterns that I want to look for: {type, label, name}. Therefore I'll have one vector per pattern, where each pair holds column 1 and column 2:
- vector<pair<string, string>> types
- vector<pair<string, string>> labels
- vector<pair<string, string>> names
I read the file sequentially. If I find a pattern that I am looking for in column 1, I extract the value in column 2 and append the pair to the corresponding vector.
All the other patterns will be saved in a vector called others:
- vector<pair<string, string>> others
Sample
Column 1 contains the patterns that I want to look for {type, label, name} and column 2 the corresponding values:
rdf-syntax-ns#type base.qualia.topic
rdf-syntax-ns#type common.topic
rdf-syntax-ns#type film.producer
rdf-schema#label สตีฟ จอบส์
rdf-schema#label ﺎﺴﺗیﻭ ﺝﺎﺑﺯ
rdf-schema#label Styvas Džobsas
type.object.name ﺎﺴﺗیﻭ ﺝﺎﺑﺯ
type.object.name Styvas Džobsas
type.object.name Steve Jobs
type.object.name Steve Jobs
What is the fastest way to search and extract the values?
The following source code takes 0.04 seconds to read 1000 lines, find the patterns and extract their corresponding values.
```
void returnValues(const string & file, vector<pair<string, string>> & types, vector<pair<string, string>> & labels, vector<pair<string, string>> & names, vector<pair<string, string>> & others)
{
    istringstream str(file);
    string line;
    //skip first line
    getline(str,line);
    while(getline(str, line))
    {
        vector<string> values;
        line.erase(remove( line.begin(), line.end(), '\"' ), line.end());
        boost::split(values, line, boost::is_any_of("\t"));
        if(contains(values[0],"type"))
        {
            pair<string, string> fact = make_pair(values[0], values[1]);
            types.push_back(fact);
        }
        else if(contains(values[0],"label"))
        {
            // ...
```
Solution
Without profiling data, I can only guess... so here goes:
At a quick glance, the biggest inefficiency I can see is that you allocate and deallocate the vector capacity for values in each iteration. This takes some time; just move the vector outside of the loop and call clear() at the head of the loop (clear() destroys the elements but keeps the vector's capacity). Like this:
```
vector<string> values;
while(getline(str, line))
{
    values.clear();
```
Another thing that you can do (even though I don't believe it will affect your result significantly) is to use emplace_back instead of push_back to avoid the possibility of a copy. Like this:
```
types.emplace_back(values[0], values[1]);
```
This will construct the pair in place and avoid the pair copy constructor (the compiler might have optimized this for you already). While we're at it, note that values will not be used after this statement until it is cleared. So we can steal the memory and avoid another two new/deletes (unless your STL has SSO and your strings are small, in which case this is moot) by activating the move construction of the pair, like this:
```
types.emplace_back(std::move(values[0]), std::move(values[1]));
```
Code Snippets
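The three suggestions from the answer can be combined into one loop. The following is a minimal sketch, not the original poster's function. It assumes `pair<string, string>` entries, swaps `boost::split` for a hand-written tab split (the hypothetical `splitTabs`) so that it builds with the standard library alone, and uses a hypothetical `containsText` helper in place of `contains`, whose definition is not shown in the question. The branch order (`name` tested before `type`) is also an assumption, chosen so that `type.object.name` is not caught by the `type` test; the question's listing does not show its branches past `label`.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Fact  = std::pair<std::string, std::string>;  // assumed element type: (column 1, column 2)
using Facts = std::vector<Fact>;

// Hypothetical stand-in for the question's contains(): plain substring test.
inline bool containsText(const std::string& text, const char* pattern)
{
    return text.find(pattern) != std::string::npos;
}

// Hypothetical stand-in for boost::split: appends the tab-separated fields of `line` to `out`.
inline void splitTabs(const std::string& line, std::vector<std::string>& out)
{
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos)
        {
            out.push_back(line.substr(start));
            return;
        }
        out.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
}

void returnValuesSketch(const std::string& file, Facts& types, Facts& labels,
                        Facts& names, Facts& others)
{
    std::istringstream str(file);
    std::string line;
    std::getline(str, line);            // skip first line, as in the question

    std::vector<std::string> values;    // hoisted out of the loop
    while (std::getline(str, line))
    {
        values.clear();                 // destroys the elements, keeps the capacity
        line.erase(std::remove(line.begin(), line.end(), '"'), line.end());
        splitTabs(line, values);
        if (values.size() < 2)
            continue;                   // assumption: skip lines without two columns

        // Assumed branch order: "name" first so type.object.name is not matched by "type".
        Facts* target = &others;
        if (containsText(values[0], "name"))       target = &names;
        else if (containsText(values[0], "type"))  target = &types;
        else if (containsText(values[0], "label")) target = &labels;

        // values is cleared before its next use, so both strings can be moved from.
        target->emplace_back(std::move(values[0]), std::move(values[1]));
    }
}
```

Whether this reaches the 0.02 second target can only be confirmed by measuring it on the real input.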
```
vector<string> values;
while(getline(str, line))
{
    values.clear();
```
```
types.emplace_back(values[0], values[1]);
```
```
types.emplace_back(std::move(values[0]), std::move(values[1]));
```
Context
StackExchange Code Review Q#94771, answer score: 7