HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Concatenating two IEnumerables with a limit

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
withlimitconcatenatingtwoienumerables

Problem

I've been playing around with generators, generics and extension methods in C# (5.0) and wanted to create an extension method for IEnumerable, which would append another IEnumerable to it and allow a limit to be put on the whole thing.

The way this differs from just using Concat() and then Take() is that instead of passing an IEnumerable, you pass a function that takes an int (which represents the amount of elements that are needed at maximum from the second IEnumerable) and returns an IEnumerable.

Here's what I came up with:

using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExt
{
    public static IEnumerable ConcatLimiting(this IEnumerable first, Func> getSecondWithLimit, int limit = int.MaxValue)
    {
        int elems = 0;
        foreach (T elem in first.Take(limit))
        {
            yield return elem;
            elems++;
        }

        foreach (T elem in getSecondWithLimit(limit - elems).Take(limit - elems))
        {
            yield return elem;
        }

        yield break;
    }
}


An example of a use case for this is when you need a limited amount of records from two different databases. You can write something like:

return GetEnumerableFromDatabase(someDatabase, 20)
    .ConcatLimiting(limit => GetEnumerableFromDatabase(someOtherDatabase, limit))
    .ToList();


My concerns:

  • Are the two IEnumerables going to be handled correctly, assuming that the resulting one is properly disposed of?



  • I have tried out quite a few possible scenarios and it seems to be handled correctly, but I'm not entirely confident.



  • Is the name ConcatLimiting() not very conform with what you usually see in C#?



  • Is this extension method completely useless?



  • Does an easy way to do this already exist?



  • Am I more or less following best practices?



  • Are there any other glaring issues?

Solution

Before answering your questions let me note that your example code is wrong, second parameter should be Func> directly IEnumerable. If your concern is to do not retrieve unneeded elements from DB you probably need to let this task to Entity Framework (or your favorite ORM), it understands Take() and generate proper TOP clause.

Let me assume, in this example, that you're using EF (and Users and Employees table have same type...):

return dbContext.Users.Concat(dbContext.Employees).Take(20);


EF may generate something like this:

SELECT TOP 20 * FROM
(
    SELECT * FROM Users
    UNION ALL
    SELECT * FROM Employees
)


Now from QP you may verify what effectively will be generated. Almost-optimal expected result (what you're trying to do with your custom method) is something like this:

SELECT TOP 20 * FROM
(
    SELECT TOP 20 * FROM Users
    UNION ALL
    SELECT TOP 20 * FROM Employees
)


As you can imagine if (2) is ORM generated code (check with your) or ORM generates (1) but you verify QP and you see (2) then simplest proposed code is preferable. ORM will generate best SQL code (for its knowledge) and you will be able to defer query to the last moment (using expressions you can't call custom functions in this way) effectively losing some optimizations it may apply. Note that generated code for, let's say, SQL Server and MySQL may be pretty different (recall rules for nested LIMIT in MySQL.)

My favorite is still this code (unless you have a measured performance issue):

return dbContext.Users.Concat(dbContext.Employees).Take(20);


Let's say that you do need your extension method and you can't use proposed code. What should be changed?

  • First of all if called function respects its contract (to return at most limit items) then you do not need to reinforce this constraint.



  • You're not validating your parameters (and if you did you will have trouble because generated IEnumerable does it when enumeration effectively happens, that's why so many *Impl() method proliferated.)



In code:

public static IEnumerable ConcatLimiting(this IEnumerable first,
    Func> getSecondWithLimit,
    int limit = int.MaxValue)
{
    if (first == null)
        throw new ArgumentNullException(nameof(first));

    if (second == null)
        throw new ArgumentNullException(nameof(second));

    if (limit <= 0)
        throw new ArgumentOutOfRangeException(nameof(limit));

    var itemsFromFirst = first.Take(limit);
    return itemsFromFirst.Concat(getSecondWithLimit(itemsFromFirst.Count() - elems);
}


Third: I don't like guard values like Int32.MaxValue even when they're literally used. In this case you do not really want to get only 2,147,483,647 items but you're using a very big limit to...don't have limits.

Not to mention that useless TOP clause may slow down your query. See, for example but it's not the only one, Why SELECT TOP clause could lead to long time cost. What I'd do is then to replace int limit = Int32.MaxValue with a nullable argument and apply Take() only when required.

public static IEnumerable ConcatLimiting(this IEnumerable first,
    Func> getSecondWithLimit,
    int? limit = null)
{
    // Arguments validation...

    if (limit == null)
        return first.Concat(getSecondWithLimit(null));

    var itemsFromFirst = first.Take((int)limit);
    return itemsFromFirst.Concat(getSecondWithLimit((int)limit - itemsFromFirst.Count());
}


For completeness some answers to your questions.


Are the two IEnumerables going to be handled correctly...

Yes, compiler generated code is correct and it will dispose everything as expected. Note, however, that initialization code won't be executed until you effectively enumerate returned enumeration. This caused some funny bugs in code like File.ReadLines("path_to_file").Take(0).


Is the name ConcatLimiting() not very conform with what you usually see in C#?

It has to be clear, waiting for elevator we may discuss if ConcatWithLimit() is better or not...


Is this Extension Method completely useless?
Am I more or less following best practices?
Are there any other glaring issues?

Maybe yes, maybe not. See answer...

Code Snippets

return dbContext.Users.Concat(dbContext.Employees).Take(20);
SELECT TOP 20 * FROM
(
    SELECT * FROM Users
    UNION ALL
    SELECT * FROM Employees
)
SELECT TOP 20 * FROM
(
    SELECT TOP 20 * FROM Users
    UNION ALL
    SELECT TOP 20 * FROM Employees
)
return dbContext.Users.Concat(dbContext.Employees).Take(20);
public static IEnumerable<T> ConcatLimiting<T>(this IEnumerable<T> first,
    Func<int, IEnumerable<T>> getSecondWithLimit,
    int limit = int.MaxValue)
{
    if (first == null)
        throw new ArgumentNullException(nameof(first));

    if (second == null)
        throw new ArgumentNullException(nameof(second));

    if (limit <= 0)
        throw new ArgumentOutOfRangeException(nameof(limit));

    var itemsFromFirst = first.Take(limit);
    return itemsFromFirst.Concat(getSecondWithLimit(itemsFromFirst.Count() - elems);
}

Context

StackExchange Code Review Q#142229, answer score: 5

Revisions (0)

No revisions yet.