patterncMinor
String compaction algorithm
Viewed 0 times
algorithmcompactionstring
Problem
I've been working on improving my C coding. I wanted to write two string algorithms:
I have a few problems with this code that I was hoping to get some advice on:
I'm looking for a review on algorithmic correctness, code style, design, and any other improvements you can think of. My main concern is performance!
```
#include
#include
#include
#include
#include
#include
static const char TERMINATOR = '\0';
/* skips space in @param[str] depending on the provided direction
@param[in] -- str -- the string
@param[in] -- direction -- the direction to skip space in
1 -- forward direction
0 -- back direction
@return -- pointer to first non-space character in the string if
@param[direction] and @param[str] are valid or the original
string otherwise
*/
const char skip_space(const char str, int direction)
{
if (direction == 1) {
while (isspace(*str)) { ++str; }
} else if (direction == 0) {
while (isspace(*str)) { --str; }
}
return str;
}
const char ltrim_white(const char str, size_t len)
{
char end = (char )&str[len - 1];
end = (char *)skip_space(end, 0);
*++end = TERMINATOR;
return skip_space(str, 1);
}
const char trim_white(const char str)
{
return ltrim_white(str, strlen(str));
}
char compact_with_mem(char str)
{
str = (char *)skip_space(str, 1);
size_t len = strlen(str);
if (len == 0) {
char *c = malloc(1);
if (c) {
*c = TERMINATOR;
}
return c;
}
char back = (char )skip_space(&str[len - 1], 0);
*++back = T
- to trim white space from the beginning and end of a string
- to compact a string
I have a few problems with this code that I was hoping to get some advice on:
- The casts from
const chartochar. Is there a clean and idiomatic way of cleaning those up?
- The string compaction algorithm requires extra storage and I couldn't think of a clean way to do the compaction in place. Is there a clean way of doing it in place?
I'm looking for a review on algorithmic correctness, code style, design, and any other improvements you can think of. My main concern is performance!
```
#include
#include
#include
#include
#include
#include
static const char TERMINATOR = '\0';
/* skips space in @param[str] depending on the provided direction
@param[in] -- str -- the string
@param[in] -- direction -- the direction to skip space in
1 -- forward direction
0 -- back direction
@return -- pointer to first non-space character in the string if
@param[direction] and @param[str] are valid or the original
string otherwise
*/
const char skip_space(const char str, int direction)
{
if (direction == 1) {
while (isspace(*str)) { ++str; }
} else if (direction == 0) {
while (isspace(*str)) { --str; }
}
return str;
}
const char ltrim_white(const char str, size_t len)
{
char end = (char )&str[len - 1];
end = (char *)skip_space(end, 0);
*++end = TERMINATOR;
return skip_space(str, 1);
}
const char trim_white(const char str)
{
return ltrim_white(str, strlen(str));
}
char compact_with_mem(char str)
{
str = (char *)skip_space(str, 1);
size_t len = strlen(str);
if (len == 0) {
char *c = malloc(1);
if (c) {
*c = TERMINATOR;
}
return c;
}
char back = (char )skip_space(&str[len - 1], 0);
*++back = T
Solution
The functionality you have implemented in your “compact” is generally known as white-space normalization.
Get rid of redundant functions
You have a lot of functions. While breaking complicated logic up into manageable bits is a good thing, there is also a point where it becomes too much and hurts maintainability rather than helping it.
For example, the only use of
This function doesn't really buy you anything. Making a function call
isn't really simpler than writing
directly. And the latter form saves you the headache figuring out what that
Make auxiliary functions
Those auxiliary functions that you determine are in fact useful and should be kept, should still be private to your implementation and not pollute the global name-space. Therefore, you should declare them
Put the standard library to good use
The `
Instead of advancing the start pointer, you could actually move the bytes of the string to the beginning. While this might seem costly at first, note that it doesn't actua
Get rid of redundant functions
You have a lot of functions. While breaking complicated logic up into manageable bits is a good thing, there is also a point where it becomes too much and hurts maintainability rather than helping it.
For example, the only use of
ltrim_white is in trim_white which consists of a single line calling that function. You could have equally well put all the code into trim_white directly. ltrim_white also is a misnomer because unlike its name suggests, it trims white-space from both sides, not just from the left.const char *skip_space(const char *str, int direction)
{
if (direction == 1) {
while (isspace(*str)) { ++str; }
} else if (direction == 0) {
while (isspace(*str)) { --str; }
}
return str;
}This function doesn't really buy you anything. Making a function call
s = (char *) skip_space(s, 1);isn't really simpler than writing
while (isspace(*s)) ++s;directly. And the latter form saves you the headache figuring out what that
1 as second parameter means. As a bonus, it saves a branch and avoids the const trouble. If you really want a dedicated function, provide skip_space_forward and skip_space_backward instead of introducing the second parameter.Make auxiliary functions
staticThose auxiliary functions that you determine are in fact useful and should be kept, should still be private to your implementation and not pollute the global name-space. Therefore, you should declare them
static. As a nice bonus, this might also make your code run faster (for free).Put the standard library to good use
The `
standard library header provides highly optimized functions for dealing with memory. Use them. In particular, memmove and strlen will come in handy.
Be const-correct
As you have already realized yourself, the adding and removing of const has become a total mess. You should only add const where you can, not at all cost. It is far better to have an internal pointer being not const even if you don't need to modify the pointed-to object than it is to repetitively cast const away. Once you start casting, const doesn't buy you anything in terms of code correctness.
If you have a function like skip_space, that takes a pointer to a buffer and returns a pointer into that buffer without modifying the buffer, it is good to keep the buffer const. Unfortunately, when you call skip_space with a non-const pointer, what you get back is always const and you cannot use the pointer to write through it even though you know it would be safe. This can be solved without a cast while still keeping the buffer skip_space operates on const, though: use the returned pointer to compute an index and use that one!
char buffer[] = "the quick brown fox"; // mutable buffer
const char *const p = skip_space(buffer, 1); // pointer to const
const ptrdiff_t i = p - buffer; // just an integer
buffer[i] = 'X'; // perfectly fine now
If you have a decent compiler, it should be able to optimize the additional arithmetic away while still being able to verify the const-correctness of your code.
Don't over-generalize
NUL terminated character strings are so prevalent throughout C that you can safely assume the convention is here to stay. By spelling out '\0', you make your code actually more readable than by using a constant like TERMINATOR.
If you really want generality, make the caller pass in a pair of pointers. (If you have ever used the C++ standard library, this will be very familiar to you.) Then your function also works with arrays that are not terminated at all. You can provide a convenience function that only accepts a single pointer (that has to point to a terminated array in this case, of course) and uses rawmemchr (if you have it) or strlen to find the end. Then it calls the generic two-argument version.
Actually, a more useful generalization would probably be to accept a function pointer to use in place of isspace. Such indirect calls can have a performance penalty, though.
Consider pointer arithmetic
Instead of &p[n - i] you can just write p + n - i. I find this much more readable.
Modify pointers with care
Your trim_white function trims space from the beginning by simply adding an offset to the pointer. While this is super-fast, there is a big downside. If I call your function with a pointer that I obtained through malloc, I still have to keep the original pointer along with the pointer to the trimmed buffer. I need the former to pass it to free` when I no longer need the string. I'm sure I will confuse them all the time.Instead of advancing the start pointer, you could actually move the bytes of the string to the beginning. While this might seem costly at first, note that it doesn't actua
Code Snippets
const char *skip_space(const char *str, int direction)
{
if (direction == 1) {
while (isspace(*str)) { ++str; }
} else if (direction == 0) {
while (isspace(*str)) { --str; }
}
return str;
}s = (char *) skip_space(s, 1);while (isspace(*s)) ++s;char buffer[] = "the quick brown fox"; // mutable buffer
const char *const p = skip_space(buffer, 1); // pointer to const
const ptrdiff_t i = p - buffer; // just an integer
buffer[i] = 'X'; // perfectly fine nowchar *
trim_space(const char *const src, char *const dst)
{
size_t first = 0;
size_t last = strlen(src);
size_t length = last - first;;
if (length > 0)
{
while (isspace(src[first]))
++first;
while ((last > first) && isspace(src[last - 1]))
--last;
length = last - first;
memmove(dst, src + first, length);
}
dst[length] = '\0';
return dst + length;
}Context
StackExchange Code Review Q#118282, answer score: 3
Revisions (0)
No revisions yet.