Partitioning a string into chunks

Submitted by: @import:stackexchange-codereview

Problem

Inspired by this question: Split a string into chunks of the same length

The code works on text elements rather than chars to avoid Unicode problems, such as splitting a base character from its combining mark.

public static class StringExtensions
{
    public static IEnumerable<string> Partition(this string value, int chunkSize)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }
        if (chunkSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }
        var sb = new StringBuilder(chunkSize);
        var enumerator = StringInfo.GetTextElementEnumerator(value);
        while (enumerator.MoveNext())
        {
            sb.Append(enumerator.GetTextElement());
            for (var i = 0; i < chunkSize - 1; i++)
            {
                if (!enumerator.MoveNext())
                {
                    break;
                }
                sb.Append(enumerator.GetTextElement());
            }
            yield return sb.ToString();
            sb.Length = 0;
        }
    }
}


And a couple of unit tests:

[TestMethod]
public void Partition_SplittingAnAsciiString_ShouldSplitTheStringIntoTheRequiredChunkSize()
{
    string input = "123456";
    string[] expected = { "123", "456" };
    string[] actual = input.Partition(3).ToArray();

    CollectionAssert.AreEqual(expected, actual);
}

[TestMethod]
public void Partition_SplittingAPartiallyDecomposedString_ShouldSplitTheStringIntoTheRequiredChunkSize()
{
    string input = "éée\u0301";
    string[] expected = { "é", "é", "e\u0301" };
    string[] actual = input.Partition(1).ToArray();

    CollectionAssert.AreEqual(expected, actual);
}


I usually use Machine Specifications for unit tests, so any tips on writing MSTest unit tests would be great, especially on naming conventions!

The method feels slightly too long to me, but I couldn't see a nice way of splitting it up.

Solution

Nice and clean implementation. I ran it against the (new) tests from my own implementation and they all passed (skipping the tests with chunkSize == 0).

The code could be slightly more readable with some vertical space to group related code, for instance after validating the input.

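You could improve this a little bit by adding a shortcut for the case where the whole string fits into a single chunk. A string never has more text elements than chars, so if value.Length <= chunkSize the only chunk is the string itself:

if (value.Length > 0 && value.Length <= chunkSize)
{
    yield return value;
    yield break;
}


Placed after the validation, this means you need neither the StringBuilder nor the enumerator in that case. The value.Length > 0 check keeps the existing behavior of returning no chunks for an empty string.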

A slightly different approach would be to remove the for loop. It makes the intent clearer (IMO) and removes the need to check the return value of enumerator.MoveNext() in two places.

Unfortunately this makes the code 2 lines (3 with the vertical spacing) longer:

var sb = new StringBuilder(chunkSize);
var enumerator = StringInfo.GetTextElementEnumerator(value);
var counter = 0;
while (enumerator.MoveNext())
{
    counter++;
    sb.Append(enumerator.GetTextElement());
    if (counter == chunkSize)
    {
        yield return sb.ToString();

        sb.Length = 0;
        counter = 0;
    }
}
if (counter > 0)
{
    yield return sb.ToString();
}
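For reference, here is a minimal sketch of how the counter-based loop might look when dropped into the full extension method. The using directives and the extra blank lines are my additions; the validation is copied from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class StringExtensions
{
    public static IEnumerable<string> Partition(this string value, int chunkSize)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }
        if (chunkSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        var sb = new StringBuilder(chunkSize);
        var enumerator = StringInfo.GetTextElementEnumerator(value);
        var counter = 0;

        // Collect text elements until a chunk is full, then emit it.
        while (enumerator.MoveNext())
        {
            counter++;
            sb.Append(enumerator.GetTextElement());
            if (counter == chunkSize)
            {
                yield return sb.ToString();

                sb.Length = 0;
                counter = 0;
            }
        }

        // Emit the shorter trailing chunk, if any elements are left over.
        if (counter > 0)
        {
            yield return sb.ToString();
        }
    }
}
```

With this version, string.Join("|", "1234567".Partition(3)) gives "123|456|7", and an empty string yields no chunks at all.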


Regarding the naming of unit tests, I usually (though not in my own posted question) use the pattern

UnitOfWork_StateUnderTest_ExpectedBehavior

as shown in the accepted answer here: unit test naming best practices
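Applied to Partition, that pattern could produce names like the ones in this sketch. The class name PartitionTests and the two scenarios are hypothetical choices of mine, not taken from the question. The Partition method is repeated only so the example compiles on its own, and it assumes the MSTest package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class StringExtensions
{
    // Same logic as the counter-based version, repeated for self-containment.
    public static IEnumerable<string> Partition(this string value, int chunkSize)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var sb = new StringBuilder(chunkSize);
        var enumerator = StringInfo.GetTextElementEnumerator(value);
        var counter = 0;
        while (enumerator.MoveNext())
        {
            counter++;
            sb.Append(enumerator.GetTextElement());
            if (counter == chunkSize)
            {
                yield return sb.ToString();
                sb.Length = 0;
                counter = 0;
            }
        }
        if (counter > 0)
        {
            yield return sb.ToString();
        }
    }
}

[TestClass]
public class PartitionTests // hypothetical test class name
{
    // UnitOfWork = Partition, StateUnderTest = length not a multiple of the
    // chunk size, ExpectedBehavior = the last chunk is shorter.
    [TestMethod]
    public void Partition_LengthIsNotAMultipleOfChunkSize_ReturnsShorterLastChunk()
    {
        string[] expected = { "123", "456", "7" };
        string[] actual = "1234567".Partition(3).ToArray();

        CollectionAssert.AreEqual(expected, actual);
    }

    [TestMethod]
    public void Partition_EmptyString_ReturnsNoChunks()
    {
        string[] actual = "".Partition(3).ToArray();

        Assert.AreEqual(0, actual.Length);
    }
}
```

Each name states the unit, the state and the expectation, so a failing test can be understood from the test runner output without opening the test body.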

Context

StackExchange Code Review Q#112004, answer score: 3
