patterncsharpMinor
Consecutive whitespace reduction
Viewed 0 times
whitespaceconsecutivereduction
Problem
I pursued an according algorithm with an absolute best performance, the following class embodies the solution.
Any comment is welcome.
Any comment is welcome.
class TextCompressor
{
private const char Seperator = '\x20';
private StringBuilder builder;
public TextCompressor()
{
this.builder = new StringBuilder();
}
public string Compress(string str)
{
int offset = 0;
int end = str.Length - 1;
for (int i = 0; i
Compress vs. Regex.Replace
.. using \s{2,}
String: " The quick brown fox jumps over the lazy dog. "
- 1.97017 * (10 times faster)
- 19.333874
String: "The quick brown fox jumps over the lazy dog."`- 0.485821 * (16 times faster)
- 8.049062
Solution
The good stuff
Your code looks neat and is pretty fast.
Remarks
I would second that the name
Review
The biggest problem I see is the missing
The
Initializing the
Instead of having this method in the
I would like to suggest using a
Some numbers
You mentioned in your question that you have measured the performance and provided some numbers, which are unfortunately missing a unit. I couldn't compare them to my numbers so I just measured your method and my method.
For both scenarios each benchmarked method first does a warm-up phase for 5000 ms and then it is called 10000 times and returns the average of
-
Passed in a string with
Yours: 0,0289723431498079 ms
Mine: 0,0308240396927017 ms
Other: 0,110759507042254 ms
-
Passed in a string with
Yours: 6,70626021126761 ms
Mine: 7,66718351472471 ms
Other: 42,3948242637644 ms
So, what is Other? It is only taking 42 ms for the second scenario which is IMO fast enough for such a task. It's very simple:
Your code looks neat and is pretty fast.
Remarks
I would second that the name
Separator isn't chosen well and would like to suggest SpaceChar which is more obvious. Review
The biggest problem I see is the missing
null check for the parameter of the public method Compress. Throwing an ArgumentNullException instead of getting a NullReferenceException is better. The
Trim(string s) method doesn't buy you much. Saving some Ticks this way is a optimization which comes with less readable code. Just a call to string.Trim() is sufficient. Initializing the
StringBuilder like @Shelby115 suggested in his answer won't save you a lot but it is a good habit to do so. If you know how big the buffer of the StringBuilder can increase you should use that knowledge. With the default constructor the StringBuilder's internal buffer has a size of 16 which is doubled each time it is reached. Instead of having this method in the
TextCompressor class which implies that it compresses some text I would suggest making this an extension method. I would like to suggest using a
char [] instead of a StringBuilder and change the arrow code to something like this:public static class StringExtensions
{
public static string ReplaceMultipleWhiteSpaceWithSpace(this string str)
{
return str.ReplaceMultipleWhiteSpaceWithSeparator('\x20');
}
public static string ReplaceMultipleWhiteSpaceWithSeparator(this string str, char separator)
{
if (str == null) { throw new ArgumentNullException(str); }
if (str.Length == 0) { return str; }
int end = str.Length;
int destinationOffset = 0;
bool foundWhiteSpace = true;
char current = separator;
char[] destination = new char[end];
for (int i = 0; i = values.Length) { startIndex = values.Length - 1; }
for (int i = startIndex; i >= 0; i--)
{
if (!Char.IsWhiteSpace(values[i]) && values[i]!='\0')
{
return i +1;
}
}
return startIndex;
}
}Some numbers
You mentioned in your question that you have measured the performance and provided some numbers, which are unfortunately missing a unit. I couldn't compare them to my numbers so I just measured your method and my method.
For both scenarios each benchmarked method first does a warm-up phase for 5000 ms and then it is called 10000 times and returns the average of
StopWatch.ElapsedTicks which is then converted to milliseconds.-
Passed in a string with
Length of 7368 containing 4124 white space characters.Yours: 0,0289723431498079 ms
Mine: 0,0308240396927017 ms
Other: 0,110759507042254 ms
-
Passed in a string with
Length of 1591488 containing 890784 white space characters.Yours: 6,70626021126761 ms
Mine: 7,66718351472471 ms
Other: 42,3948242637644 ms
So, what is Other? It is only taking 42 ms for the second scenario which is IMO fast enough for such a task. It's very simple:
string.Join(" ", str.Split().Where(s => s.Length > 0));Code Snippets
public static class StringExtensions
{
public static string ReplaceMultipleWhiteSpaceWithSpace(this string str)
{
return str.ReplaceMultipleWhiteSpaceWithSeparator('\x20');
}
public static string ReplaceMultipleWhiteSpaceWithSeparator(this string str, char separator)
{
if (str == null) { throw new ArgumentNullException(str); }
if (str.Length == 0) { return str; }
int end = str.Length;
int destinationOffset = 0;
bool foundWhiteSpace = true;
char current = separator;
char[] destination = new char[end];
for (int i = 0; i < end; i++)
{
if (Char.IsWhiteSpace(str[i]))
{
if (foundWhiteSpace) { continue; }
foundWhiteSpace = true;
current = separator;
}
else
{
foundWhiteSpace = false;
current = str[i];
}
destination[destinationOffset] = current;
destinationOffset++;
}
int newSize = foundWhiteSpace
? IndexOfLastNonWhithSpaceChar(destination, destinationOffset)
: destinationOffset;
Array.Resize(ref destination, newSize);
return new string(destination);
}
private static int IndexOfLastNonWhithSpaceChar(char[] values, int startIndex)
{
if (startIndex >= values.Length) { startIndex = values.Length - 1; }
for (int i = startIndex; i >= 0; i--)
{
if (!Char.IsWhiteSpace(values[i]) && values[i]!='\0')
{
return i +1;
}
}
return startIndex;
}
}string.Join(" ", str.Split().Where(s => s.Length > 0));Context
StackExchange Code Review Q#118640, answer score: 7
Revisions (0)
No revisions yet.