HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Consecutive whitespace reduction

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
whitespaceconsecutivereduction

Problem

I pursued an according algorithm with an absolute best performance, the following class embodies the solution.

Any comment is welcome.

class TextCompressor
{
private const char Seperator = '\x20';

private StringBuilder builder;

public TextCompressor()
{
this.builder = new StringBuilder();
}

public string Compress(string str)
{
int offset = 0;
int end = str.Length - 1;

for (int i = 0; i

Compress vs. Regex.Replace

.. using
\s{2,}

String:
" The quick brown fox jumps over the lazy dog. "

  • 1.97017 * (10 times faster)



  • 19.333874



String:
"The quick brown fox jumps over the lazy dog."`

  • 0.485821 * (16 times faster)



  • 8.049062

Solution

The good stuff

Your code looks neat and is pretty fast.

Remarks

I would second that the name Separator isn't chosen well and would like to suggest SpaceChar which is more obvious.

Review

The biggest problem I see is the missing null check for the parameter of the public method Compress. Throwing an ArgumentNullException instead of getting a NullReferenceException is better.

The Trim(string s) method doesn't buy you much. Saving some Ticks this way is a optimization which comes with less readable code. Just a call to string.Trim() is sufficient.

Initializing the StringBuilder like @Shelby115 suggested in his answer won't save you a lot but it is a good habit to do so. If you know how big the buffer of the StringBuilder can increase you should use that knowledge. With the default constructor the StringBuilder's internal buffer has a size of 16 which is doubled each time it is reached.

Instead of having this method in the TextCompressor class which implies that it compresses some text I would suggest making this an extension method.

I would like to suggest using a char [] instead of a StringBuilder and change the arrow code to something like this:

public static class StringExtensions
{
    public static string ReplaceMultipleWhiteSpaceWithSpace(this string str)
    {
        return str.ReplaceMultipleWhiteSpaceWithSeparator('\x20');
    }

    public static string ReplaceMultipleWhiteSpaceWithSeparator(this string str, char separator)
    {
        if (str == null) { throw new ArgumentNullException(str); }
        if (str.Length == 0) { return str; }

        int end = str.Length;
        int destinationOffset = 0;
        bool foundWhiteSpace = true;
        char current = separator;
        char[] destination = new char[end];

        for (int i = 0; i = values.Length) { startIndex = values.Length - 1; }
        for (int i = startIndex; i >= 0; i--)
        {
            if (!Char.IsWhiteSpace(values[i]) && values[i]!='\0')
            {
                return i +1;
            }
        }
        return startIndex;
    }
}


Some numbers

You mentioned in your question that you have measured the performance and provided some numbers, which are unfortunately missing a unit. I couldn't compare them to my numbers so I just measured your method and my method.

For both scenarios each benchmarked method first does a warm-up phase for 5000 ms and then it is called 10000 times and returns the average of StopWatch.ElapsedTicks which is then converted to milliseconds.

-
Passed in a string with Length of 7368 containing 4124 white space characters.

Yours: 0,0289723431498079 ms

Mine: 0,0308240396927017 ms

Other: 0,110759507042254 ms

-
Passed in a string with Length of 1591488 containing 890784 white space characters.

Yours: 6,70626021126761 ms

Mine: 7,66718351472471 ms

Other: 42,3948242637644 ms

So, what is Other? It is only taking 42 ms for the second scenario which is IMO fast enough for such a task. It's very simple:

string.Join(" ", str.Split().Where(s => s.Length > 0));

Code Snippets

public static class StringExtensions
{
    public static string ReplaceMultipleWhiteSpaceWithSpace(this string str)
    {
        return str.ReplaceMultipleWhiteSpaceWithSeparator('\x20');
    }

    public static string ReplaceMultipleWhiteSpaceWithSeparator(this string str, char separator)
    {
        if (str == null) { throw new ArgumentNullException(str); }
        if (str.Length == 0) { return str; }

        int end = str.Length;
        int destinationOffset = 0;
        bool foundWhiteSpace = true;
        char current = separator;
        char[] destination = new char[end];

        for (int i = 0; i < end; i++)
        {
            if (Char.IsWhiteSpace(str[i]))
            {
                if (foundWhiteSpace) { continue; }

                foundWhiteSpace = true;
                current = separator;
            }
            else
            {
                foundWhiteSpace = false;
                current = str[i];
            }
            destination[destinationOffset] = current;
            destinationOffset++;
        }


        int newSize = foundWhiteSpace
            ? IndexOfLastNonWhithSpaceChar(destination, destinationOffset)
            : destinationOffset;

        Array.Resize(ref destination, newSize);

        return new string(destination);

    }
    private static int IndexOfLastNonWhithSpaceChar(char[] values, int startIndex)
    {
        if (startIndex >= values.Length) { startIndex = values.Length - 1; }
        for (int i = startIndex; i >= 0; i--)
        {
            if (!Char.IsWhiteSpace(values[i]) && values[i]!='\0')
            {
                return i +1;
            }
        }
        return startIndex;
    }
}
string.Join(" ", str.Split().Where(s => s.Length > 0));

Context

StackExchange Code Review Q#118640, answer score: 7

Revisions (0)

No revisions yet.