patterncppModerate
Efficiency/design of trim() library code
Viewed 0 times
trimdesignlibrarycodeefficiency
Problem
I am working on some library code and I am trying to optimize my
What I want to do with this piece of code is take advice in all areas:
Nomenclature:
I am open to suggestions for function names. For example do I name them after the type of parameter or the type of value (
Unified function name based on overloads:
Also it would be nice to end up with a universal set of functions that select the most efficient method based on input parameter without having a suffix to select between them:
I am pretty sure that's not completely achievable due to ambiguity. The question is what version should I leave out? The copy or the mutating?
Efficiency of basic algorithm:
I am using this basic algorithm:
I would like to think the library is smart enough to special-case
Efficiency of calling:
I have implemented these functions in terms of each other, but I am not sure (at all) if what I have done is most efficient. Or even if I am not reducing efficiency (by inhibiting RVO for example).
Am I even on the right track?
Do I even need 4 functions (or overloads)? Would it be just as efficient to simply take parameters as copies and rely on RVO?
```
const char* const ws = " \t\n\r\f\v";
inline std::string& ltrim_mute(std::string& s, const char* t);
inline std::s
trim() functions. To that effect I am trying to figure out how best to deal with each kind of input. Move semantics are starting to fry my brains a little. Sometimes I think I understand them... but not very often.What I want to do with this piece of code is take advice in all areas:
Nomenclature:
I am open to suggestions for function names. For example do I name them after the type of parameter or the type of value (
trim_cref vs trim_rval)? What do people prefer?Unified function name based on overloads:
Also it would be nice to end up with a universal set of functions that select the most efficient method based on input parameter without having a suffix to select between them:
std::string trim(std::string s);
std::string& trim(std::string& s); // mutating
std::string trim(std::string&& s);
std::string trim(const std::string& s);I am pretty sure that's not completely achievable due to ambiguity. The question is what version should I leave out? The copy or the mutating?
Efficiency of basic algorithm:
I am using this basic algorithm:
std::string& trim(std::string& s, const char* t = ws)
{
s.erase(0, s.find_first_not_of(t));
s.erase(s.find_last_not_of(t) + 1);
return s;
}I would like to think the library is smart enough to special-case
s.erase() when the first parameter is zero or the last parameter is std::string::npos. If not is there a faster way to do this?Efficiency of calling:
I have implemented these functions in terms of each other, but I am not sure (at all) if what I have done is most efficient. Or even if I am not reducing efficiency (by inhibiting RVO for example).
Am I even on the right track?
Do I even need 4 functions (or overloads)? Would it be just as efficient to simply take parameters as copies and rely on RVO?
```
const char* const ws = " \t\n\r\f\v";
inline std::string& ltrim_mute(std::string& s, const char* t);
inline std::s
Solution
I'll take a swing at this:
2) Unified function name based on overloads:
...
Also it would be nice to end up with a universal set of functions that select the most efficient method based on input parameter without having a suffix to select between them:
...
I am pretty sure that's not completely achievable due to ambiguity. The question is what version should I leave out? The copy or the mutating?
I think that it would be a very bad idea to have 2 overloads of the same function name where one mutates its argument and the other returns a copy. Overloads that don't perform the same semantic operation shouldn't be overloads, at least in my mind. So, I'd pick 2 different names instead, say "inplace_trim" and "trim". If you take that approach, then it would make perfect sense to have overloaded versions of "trim" for by-value, by-reference, by-const-reference, and by-xvalue.
This also solves:
1) Nomenclature:
I am open to suggestions for function names. For example do I name them after the type of parameter or the type of value (trim_cref vs trim_rval)?
This way you only need to have 2 names, "inplace_trim" and "trim". You don't need any funky naming convention since it allows overloading to work.
3) Efficiency of basic algorithm:
I have only one thought here, and it's a minor one: It will be slightly more efficient to trim the trailing whitespace first, then the leading. The reason is that
Efficiency of calling:
It has just occurred to me that you're doing more copying than necessary in the case where there are actually leading spaces in your strings. The way that you implement
So it seems like, by trying to implement these functions in terms of each other, you may be missing out on some optimization opportunities.
Other thoughts:
-
I'd rename
-
If
2) Unified function name based on overloads:
...
Also it would be nice to end up with a universal set of functions that select the most efficient method based on input parameter without having a suffix to select between them:
...
I am pretty sure that's not completely achievable due to ambiguity. The question is what version should I leave out? The copy or the mutating?
I think that it would be a very bad idea to have 2 overloads of the same function name where one mutates its argument and the other returns a copy. Overloads that don't perform the same semantic operation shouldn't be overloads, at least in my mind. So, I'd pick 2 different names instead, say "inplace_trim" and "trim". If you take that approach, then it would make perfect sense to have overloaded versions of "trim" for by-value, by-reference, by-const-reference, and by-xvalue.
This also solves:
1) Nomenclature:
I am open to suggestions for function names. For example do I name them after the type of parameter or the type of value (trim_cref vs trim_rval)?
This way you only need to have 2 names, "inplace_trim" and "trim". You don't need any funky naming convention since it allows overloading to work.
3) Efficiency of basic algorithm:
I have only one thought here, and it's a minor one: It will be slightly more efficient to trim the trailing whitespace first, then the leading. The reason is that
eraseing from the front of a string is just like eraseing from the front of a vector; it requires copying every character after the erasure point into its new location in the sequence. Erasing from the end just updates pointers, so if you do the erase from the end first, there are fewer characters needing copying during the erase from the beginning.Efficiency of calling:
It has just occurred to me that you're doing more copying than necessary in the case where there are actually leading spaces in your strings. The way that you implement
ltrim_cref, for instance, you copy the input string and pass the copy into ltrim_move, which then deletes the leading whitespace by shifting every character after the erasure point left (another hidden copy). You could avoid this extra copy by instead doing (untested):inline std::string ltrim_cref(const std::string& s, const char* t = ws)
{
return s.substr(std::min(s.size(), s.find_first_not_of(t)));
}So it seems like, by trying to implement these functions in terms of each other, you may be missing out on some optimization opportunities.
Other thoughts:
-
I'd rename
t to chars_to_skip or something like that - t isn't a particularly meaningful name.-
If
ws is going to be exposed as part of your API through your header file, I'd make sure that a) it's in a namespace to avoid namespace pollution, and b) it gets a more descriptive name, like StringTrimming::ASCII_WHITESPACE.Code Snippets
inline std::string ltrim_cref(const std::string& s, const char* t = ws)
{
return s.substr(std::min(s.size(), s.find_first_not_of(t)));
}Context
StackExchange Code Review Q#59243, answer score: 14
Revisions (0)
No revisions yet.