pattern · csharp · Minor

Lazy String.Split

Submitted by: @import:stackexchange-codereview
split · lazy · string

Problem

C#'s String.Split method dates from the early days of .NET, before LINQ made lazy (deferred) operations the norm, so it eagerly returns a complete array. The task is to split a string on a (single) separator. With String.Split, that looks like

string[] split = myString.Split(new string[] { separator }, StringSplitOptions.None);


That isn't bad in itself, but if you want to apply more operations to that string[] (and you probably do), you'll have to loop over the whole array afterwards, effectively iterating over the data twice. Using the coroutine-like behaviour of the lazy yield keyword, you can (maybe) chain several operations while iterating over the string only once.
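To illustrate the one-element-at-a-time evaluation this relies on, here is a minimal sketch. PipelineDemo and its Log helper are hypothetical names used only for illustration; Log passes items through unchanged and records each one as it is pulled through a stage. Split itself still builds the whole array first; the log only shows that the Where and Select stages chained after it run as a single fused pass, which is the behaviour a yield-based split would extend to the splitting step itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipelineDemo
{
    // Hypothetical helper: yields items unchanged, but records each item
    // at the moment it is pulled, so the evaluation order becomes visible.
    public static IEnumerable<T> Log<T>(this IEnumerable<T> source, string stage, List<string> log)
    {
        foreach (var item in source)
        {
            log.Add(stage + ":" + item);
            yield return item;
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();

        // Split is eager: the whole array exists before anything below runs.
        string[] parts = "ab;;cd".Split(new[] { ";" }, StringSplitOptions.None);

        // Building the query executes nothing yet; every stage is deferred.
        var query = parts
            .Log("split", log)
            .Where(p => p.Length != 0)
            .Log("filtered", log)
            .Select(p => p.ToUpperInvariant())
            .Log("mapped", log);

        // A single enumeration pulls each piece through all stages in turn.
        query.ToList();
        return log;
    }
}
```

Run() returns split:ab, filtered:ab, mapped:AB, split:, split:cd, filtered:cd, mapped:CD. Each piece travels through every stage before the next piece is read, and the empty piece stops at the filter.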

public static IEnumerable<string> LazySplit(this string stringToSplit, string separator) {

    if (stringToSplit == null) throw new ArgumentNullException("stringToSplit");
    if (separator == null) throw new ArgumentNullException("separator");

    var lastIndex = 0;
    var index = -1;
    do {
        index = stringToSplit.IndexOf(separator, lastIndex);
        if (index >= lastIndex) {
            yield return stringToSplit.Substring(lastIndex, index - lastIndex);
        }
        lastIndex = index + separator.Length;
    } while (index > 0);
}


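While this does not have the "remove empty entries" option, myString.LazySplit(separator).Where(str => !String.IsNullOrEmpty(str)) should do the job as an O(n) operation, or am I wrong here? (String.IsNullOrWhiteSpace would go further than StringSplitOptions.RemoveEmptyEntries, which keeps whitespace-only entries.)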
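The choice of predicate matters here. The small sketch below uses two hypothetical extension methods (WithoutEmpty and WithoutBlank, names chosen for illustration): the first drops only zero-length pieces, which is what StringSplitOptions.RemoveEmptyEntries does, while the second uses String.IsNullOrWhiteSpace and also drops whitespace-only pieces. Both are lazy Where filters doing constant work per piece, so adding either one keeps the pipeline linear in the length of the input. They are applied to Split's output below only so the example runs on its own; they accept any IEnumerable<string>, including a lazily split one.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EmptyEntryFilters
{
    // Same effect as StringSplitOptions.RemoveEmptyEntries:
    // only zero-length pieces are dropped.
    public static IEnumerable<string> WithoutEmpty(this IEnumerable<string> pieces)
    {
        return pieces.Where(s => s.Length != 0);
    }

    // Stricter: also drops pieces consisting only of whitespace,
    // which RemoveEmptyEntries would keep.
    public static IEnumerable<string> WithoutBlank(this IEnumerable<string> pieces)
    {
        return pieces.Where(s => !string.IsNullOrWhiteSpace(s));
    }
}
```

For example, splitting "a; ;;b" on ";" gives { "a", " ", "", "b" }. WithoutEmpty turns that into { "a", " ", "b" }, exactly what RemoveEmptyEntries returns, while WithoutBlank also removes " " and leaves { "a", "b" }.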

I'm not sure about the time complexity when using coroutines, but for the functionality I've written some unit tests to make sure it's working:

```
[TestMethod]
public void LazyStringSplit() {
    var str = "ab;cd;;";
    var resp = str.LazySplit(";");
    var expected = new[] { "ab", "cd", "" };
    var result = resp.ToArray();
    CollectionAssert.AreEqual(expected, result);
}

[TestMethod]
public void LazyStringSplitEmptyString() {
    var str = "";
    var resp = str.LazySplit(";");
    var expected = new string[0];
    var result = resp.ToArray();
    CollectionAssert.AreEqual(expected, result);
}
```

Solution

Edge cases:

- ";abc".LazySplit(";") stops as soon as it finds the separator at index 0 (the loop only continues while index > 0), so "abc" is never returned. To match the behaviour of ";abc".Split(new char[] { ';' }) it should return the sequence { "", "abc" }.

- ";abc".LazySplit("") will return a sequence with a single item, the empty string. To match the behaviour of ";abc".Split(new char[] { }) it should return the sequence { ";abc" }.

Here's how I would suggest writing it.

First, deal with the empty separator:

if (separator.Length == 0)
{
    yield return value;
    yield break;
}


Then use two variables, start and end, that mark the start and end of the substring we want to extract:

var start = 0;
for (var end = value.IndexOf(separator); end != -1; end = value.IndexOf(separator, start))
{
    yield return value.Substring(start, end - start);
    start = end + separator.Length;
}

yield return value.Substring(start);
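Putting these pieces together, a minimal sketch of the complete extension method might look like the following. Two details are assumptions of this sketch rather than part of the answer. First, the public method validates its arguments eagerly and delegates to a private iterator (LazySplitIterator is a hypothetical name), because checks written inside an iterator method only run on the first MoveNext(). Second, IndexOf is called with StringComparison.Ordinal, on the assumption that this matches String.Split's ordinal matching; the IndexOf(string) overload is culture-sensitive.

```csharp
using System;
using System.Collections.Generic;

public static class StringExtensions
{
    // Not an iterator itself, so null arguments throw immediately
    // instead of when the sequence is first enumerated.
    public static IEnumerable<string> LazySplit(this string value, string separator)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));
        if (separator == null) throw new ArgumentNullException(nameof(separator));
        return LazySplitIterator(value, separator);
    }

    private static IEnumerable<string> LazySplitIterator(string value, string separator)
    {
        // An empty separator never splits: the whole string is the only piece.
        if (separator.Length == 0)
        {
            yield return value;
            yield break;
        }

        // start: beginning of the current piece; end: index of the next separator.
        var start = 0;
        for (var end = value.IndexOf(separator, StringComparison.Ordinal);
             end != -1;
             end = value.IndexOf(separator, start, StringComparison.Ordinal))
        {
            yield return value.Substring(start, end - start);
            start = end + separator.Length;
        }

        // Whatever follows the last separator (possibly "") is the final piece.
        yield return value.Substring(start);
    }
}
```

With this version, "ab;cd;;".LazySplit(";") yields "ab", "cd", "", "" and ";abc".LazySplit(";") yields "" and "abc", the same pieces String.Split produces, while still computing each piece only when the caller asks for it.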


To make your unit tests match the behaviour of string.Split, you also want to change LazyStringSplit to have

var expected = new[] { "ab", "cd", "", "" };


and LazyStringSplitEmptyString to have

var expected = new string[] { "" };


If you want to test that your implementation matches the behaviour of string.Split, I would suggest introducing a helper method for the tests, something like:

var expected = value.Split(new string[] { separator }, StringSplitOptions.None);
CollectionAssert.AreEqual(expected, value.LazySplit(separator));
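One way such a helper could be fleshed out is sketched below; SplitConformance and Mismatches are hypothetical names. The splitting function is passed in as a Func, so the helper does not depend on any particular LazySplit implementation, and it returns the failing inputs rather than a bool, so a failed assertion can report which input broke.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitConformance
{
    // Hypothetical test helper: returns every input on which `split`
    // disagrees with string.Split(..., StringSplitOptions.None).
    public static List<string> Mismatches(
        Func<string, string, IEnumerable<string>> split,
        string separator,
        params string[] values)
    {
        var failures = new List<string>();
        foreach (var value in values)
        {
            var expected = value.Split(new[] { separator }, StringSplitOptions.None);
            var actual = split(value, separator).ToArray();
            if (!expected.SequenceEqual(actual))
            {
                failures.Add(value);
            }
        }
        return failures;
    }
}
```

In an MSTest method this could be used as Assert.AreEqual(0, SplitConformance.Mismatches((v, s) => v.LazySplit(s), ";", "", ";", ";abc", "ab;cd;;").Count).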


Context

StackExchange Code Review Q#84163, answer score: 9
