Divide list into batches

Submitted by: @import:stackexchange-codereview

Problem

Three questions: Is there a more performant way? Is there a more succinct way, or rather, a way of expressing this so that reading just the body makes it immediately apparent what the algorithm does? And should I be returning an IEnumerable of an IEnumerable? What would be the point of that over an IEnumerable of IList?

public static IEnumerable<IEnumerable<T>> IntoBatches<T>(this IEnumerable<T> list, int size)
{
    if (size < 1)
    {
        yield return list;
    }
    else
    {
        var count = 0;
        var batch = new List<T>();
        foreach (var item in list)
        {
            batch.Add(item);
            if (size == ++count)
            {
                yield return batch;
                batch.Clear();
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}

Solution

Bug

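Your method has two big bugs. The first is that you never reset the count variable to 0, and the second is that you yield the same List instance every time and then clear it.

If I call your method with a List containing 10000 ints (with a size of 3, judging by the numbers) and call ToList() on the result, I get 2 lists, both containing 9997 ints. Because count is never reset, the batch is yielded only once inside the loop, and because both results reference the same List instance, both end up holding the remaining 9997 items.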
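The aliasing can be made visible with a small check. The following is only a sketch: `AliasingDemo`, `BuggyBatches` and `Inspect` are hypothetical names introduced here, and the iterator body mirrors the `else` branch of the method under review rather than being the original code verbatim.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AliasingDemo
{
    // Mirrors the reviewed logic: one List<T> is reused and count is never reset.
    public static IEnumerable<IEnumerable<T>> BuggyBatches<T>(this IEnumerable<T> list, int size)
    {
        var count = 0;
        var batch = new List<T>();
        foreach (var item in list)
        {
            batch.Add(item);
            if (size == ++count)
            {
                yield return batch;
                batch.Clear(); // also empties the list the caller already received
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Materializes the batches and reports how many there are, whether the first
    // two are the same object, and how many items the first one holds.
    // Assumes at least two batches are produced.
    public static (int Batches, bool SameInstance, int FirstCount) Inspect(int items, int size)
    {
        var result = Enumerable.Range(0, items).BuggyBatches(size).ToList();
        return (result.Count, ReferenceEquals(result[0], result[1]), result[0].Count());
    }
}
```

Calling `AliasingDemo.Inspect(10000, 3)` should report two batches that are the same instance, each holding 9997 items, which matches the observation above.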

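This is easy to fix, though:

public static IEnumerable<IEnumerable<T>> IntoBatches<T>(this IEnumerable<T> list, int size)
{
    if (size < 1)
    {
        yield return list;
    }
    else
    {
        var count = 0;
        var batch = new List<T>();
        foreach (var item in list)
        {
            batch.Add(item);
            if (size == ++count)
            {
                yield return batch;
                batch = new List<T>(); // fresh list, so batches already yielded stay intact
                count = 0;             // restart counting for the next batch
            }
        }
        if (batch.Count > 0) yield return batch;
    }
}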


For a List with 10000 items, this solution takes the following time per batch size:
3: 0.506 ms
13: 0.505 ms
113: 0.505 ms


whereas an array-based solution like this (taken from here)

public static IEnumerable<IEnumerable<T>> Chunkify<T>(this IEnumerable<T> source, int size)
{
    using (var iter = source.GetEnumerator())
    {
        while (iter.MoveNext())
        {
            var chunk = new T[size];
            chunk[0] = iter.Current;
            for (int i = 1; i < size && iter.MoveNext(); i++)
            {
                chunk[i] = iter.Current;
            }
            yield return chunk;
        }
    }
}


takes
3: 0.270 ms
13: 0.270 ms
113: 0.270 ms

Edit

Unfortunately, that Chunkify() method has a bug: if the number of items in the passed-in IEnumerable isn't divisible by the chunk size, it produces too many items, because the last array is always allocated at full size and padded with default values.

E.g. passing in an int[] with the values 1,2,3,4 and a size argument of 3 produces 1,2,3,4,0,0.
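The padding is easy to reproduce. This sketch copies the unfixed method under the hypothetical name `ChunkifyUnfixed` (renamed only so it can sit next to other versions) and flattens the chunks with `SelectMany`; `PaddingDemo` and `Flatten` are likewise illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddingDemo
{
    // Same logic as the unfixed Chunkify: every chunk is a full-size array.
    public static IEnumerable<IEnumerable<T>> ChunkifyUnfixed<T>(this IEnumerable<T> source, int size)
    {
        using (var iter = source.GetEnumerator())
        {
            while (iter.MoveNext())
            {
                var chunk = new T[size]; // unused slots keep default(T)
                chunk[0] = iter.Current;
                for (int i = 1; i < size && iter.MoveNext(); i++)
                {
                    chunk[i] = iter.Current;
                }
                yield return chunk;
            }
        }
    }

    // Concatenates all chunks back into one array so the padding shows up.
    public static int[] Flatten(int[] input, int size)
    {
        return input.ChunkifyUnfixed(size).SelectMany(c => c).ToArray();
    }
}
```

`PaddingDemo.Flatten(new[] { 1, 2, 3, 4 }, 3)` yields six elements, the last two being the default value 0.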

Fixed version

public static IEnumerable<IEnumerable<T>> Chunkify<T>(this IEnumerable<T> source, int size)
{
    int count = 0;
    using (var iter = source.GetEnumerator())
    {
        while (iter.MoveNext())
        {
            var chunk = new T[size];
            count = 1;
            chunk[0] = iter.Current;
            for (int i = 1; i < size && iter.MoveNext(); i++)
            {
                chunk[i] = iter.Current;
                count++;
            }
            if (count < size)
            {
                Array.Resize(ref chunk, count);
            }
            yield return chunk;
        }
    }
}
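One thing the fixed version still leaves open is a size below 1, where `new T[size]` followed by `chunk[0]` would fail only once enumeration starts. A possible variation, sketched below under assumed names (`ChunkDemo`, `ChunkifyChecked`, `Iterate`, `ChunkLengths` are not from the original), validates arguments in a non-iterator wrapper so the exception surfaces at the call site. Throwing here is an assumption; the method in the question returns the whole list instead. The sketch also returns `T[]` rather than `IEnumerable<T>`, which is one way to give callers `Length` and indexing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Not an iterator itself, so these checks run as soon as the method is called.
    public static IEnumerable<T[]> ChunkifyChecked<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        return Iterate(source, size);
    }

    // Same chunking logic as the fixed version, with count doubling as the index.
    private static IEnumerable<T[]> Iterate<T>(IEnumerable<T> source, int size)
    {
        using (var iter = source.GetEnumerator())
        {
            while (iter.MoveNext())
            {
                var chunk = new T[size];
                chunk[0] = iter.Current;
                var count = 1;
                for (; count < size && iter.MoveNext(); count++)
                {
                    chunk[count] = iter.Current;
                }
                if (count < size) Array.Resize(ref chunk, count); // trim the last chunk
                yield return chunk;
            }
        }
    }

    // Helper for inspection: the length of each chunk, in order.
    public static int[] ChunkLengths(int[] input, int size)
    {
        return input.ChunkifyChecked(size).Select(c => c.Length).ToArray();
    }
}
```

For the input 1,2,3,4 with a size of 3, `ChunkLengths` gives 3 and 1, so no padding remains.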

Code Snippets

public static IEnumerable<IEnumerable<T>> IntoBatches<T>(this IEnumerable<T> list, int size)
{
    if (size < 1)
    {
        yield return list;
    }
    else
    {
        var count = 0;
        var batch = new List<T>(); // declared once, outside the loop, so it is still in scope after it
        foreach (var item in list)
        {
            batch.Add(item);
            if (size == ++count)
            {
                yield return batch;
                batch = new List<T>();
                count = 0;
            }

        }
        if (batch.Count > 0) yield return batch;
    }
}

public static IEnumerable<IEnumerable<T>> Chunkify<T>(this IEnumerable<T> source, int size)
{

    using (var iter = source.GetEnumerator())
    {
        while (iter.MoveNext())
        {
            var chunk = new T[size];
            chunk[0] = iter.Current;
            for (int i = 1; i < size && iter.MoveNext(); i++)
            {
                chunk[i] = iter.Current;
            }
            yield return chunk;
        }
    }
}

public static IEnumerable<IEnumerable<T>> Chunkify<T>(this IEnumerable<T> source, int size)
{
    int count = 0;
    using (var iter = source.GetEnumerator())
    {
        while (iter.MoveNext())
        {
            var chunk = new T[size];
            count = 1;
            chunk[0] = iter.Current;
            for (int i = 1; i < size && iter.MoveNext(); i++)
            {
                chunk[i] = iter.Current;
                count++;
            }
            if (count < size)
            {
                Array.Resize(ref chunk, count);
            }
            yield return chunk;
        }
    }
}

Context

StackExchange Code Review Q#122471, answer score: 7
