patterncsharpMinor
Concatenating two IEnumerables with a limit
Viewed 0 times
withlimitconcatenatingtwoienumerables
Problem
I've been playing around with generators, generics and extension methods in C# (5.0) and wanted to create an extension method for
The way this differs from just using
Here's what I came up with:
An example of a use case for this is when you need a limited amount of records from two different databases. You can write something like:
My concerns:
IEnumerable, which would append another IEnumerable to it and allow a limit to be put on the whole thing.The way this differs from just using
Concat() and then Take() is that instead of passing an IEnumerable, you pass a function that takes an int (which represents the amount of elements that are needed at maximum from the second IEnumerable) and returns an IEnumerable.Here's what I came up with:
using System;
using System.Collections.Generic;
using System.Linq;
public static class EnumerableExt
{
public static IEnumerable ConcatLimiting(this IEnumerable first, Func> getSecondWithLimit, int limit = int.MaxValue)
{
int elems = 0;
foreach (T elem in first.Take(limit))
{
yield return elem;
elems++;
}
foreach (T elem in getSecondWithLimit(limit - elems).Take(limit - elems))
{
yield return elem;
}
yield break;
}
}An example of a use case for this is when you need a limited amount of records from two different databases. You can write something like:
return GetEnumerableFromDatabase(someDatabase, 20)
.ConcatLimiting(limit => GetEnumerableFromDatabase(someOtherDatabase, limit))
.ToList();My concerns:
- Are the two
IEnumerablesgoing to be handled correctly, assuming that the resulting one is properly disposed of?
- I have tried out quite a few possible scenarios and it seems to be handled correctly, but I'm not entirely confident.
- Is the name
ConcatLimiting()not very conform with what you usually see in C#?
- Is this extension method completely useless?
- Does an easy way to do this already exist?
- Am I more or less following best practices?
- Are there any other glaring issues?
Solution
Before answering your questions let me note that your example code is wrong, second parameter should be
Let me assume, in this example, that you're using EF (and
EF may generate something like this:
Now from QP you may verify what effectively will be generated. Almost-optimal expected result (what you're trying to do with your custom method) is something like this:
As you can imagine if (2) is ORM generated code (check with your) or ORM generates (1) but you verify QP and you see (2) then simplest proposed code is preferable. ORM will generate best SQL code (for its knowledge) and you will be able to defer query to the last moment (using expressions you can't call custom functions in this way) effectively losing some optimizations it may apply. Note that generated code for, let's say, SQL Server and MySQL may be pretty different (recall rules for nested
My favorite is still this code (unless you have a measured performance issue):
Let's say that you do need your extension method and you can't use proposed code. What should be changed?
In code:
Third: I don't like guard values like
Not to mention that useless
For completeness some answers to your questions.
Are the two IEnumerables going to be handled correctly...
Yes, compiler generated code is correct and it will dispose everything as expected. Note, however, that initialization code won't be executed until you effectively enumerate returned enumeration. This caused some funny bugs in code like
Is the name ConcatLimiting() not very conform with what you usually see in C#?
It has to be clear, waiting for elevator we may discuss if
Is this Extension Method completely useless?
Am I more or less following best practices?
Are there any other glaring issues?
Maybe yes, maybe not. See answer...
Func> directly IEnumerable. If your concern is to do not retrieve unneeded elements from DB you probably need to let this task to Entity Framework (or your favorite ORM), it understands Take() and generate proper TOP clause.Let me assume, in this example, that you're using EF (and
Users and Employees table have same type...):return dbContext.Users.Concat(dbContext.Employees).Take(20);EF may generate something like this:
SELECT TOP 20 * FROM
(
SELECT * FROM Users
UNION ALL
SELECT * FROM Employees
)Now from QP you may verify what effectively will be generated. Almost-optimal expected result (what you're trying to do with your custom method) is something like this:
SELECT TOP 20 * FROM
(
SELECT TOP 20 * FROM Users
UNION ALL
SELECT TOP 20 * FROM Employees
)As you can imagine if (2) is ORM generated code (check with your) or ORM generates (1) but you verify QP and you see (2) then simplest proposed code is preferable. ORM will generate best SQL code (for its knowledge) and you will be able to defer query to the last moment (using expressions you can't call custom functions in this way) effectively losing some optimizations it may apply. Note that generated code for, let's say, SQL Server and MySQL may be pretty different (recall rules for nested
LIMIT in MySQL.)My favorite is still this code (unless you have a measured performance issue):
return dbContext.Users.Concat(dbContext.Employees).Take(20);Let's say that you do need your extension method and you can't use proposed code. What should be changed?
- First of all if called function respects its contract (to return at most
limititems) then you do not need to reinforce this constraint.
- You're not validating your parameters (and if you did you will have trouble because generated
IEnumerabledoes it when enumeration effectively happens, that's why so many*Impl()method proliferated.)
In code:
public static IEnumerable ConcatLimiting(this IEnumerable first,
Func> getSecondWithLimit,
int limit = int.MaxValue)
{
if (first == null)
throw new ArgumentNullException(nameof(first));
if (second == null)
throw new ArgumentNullException(nameof(second));
if (limit <= 0)
throw new ArgumentOutOfRangeException(nameof(limit));
var itemsFromFirst = first.Take(limit);
return itemsFromFirst.Concat(getSecondWithLimit(itemsFromFirst.Count() - elems);
}Third: I don't like guard values like
Int32.MaxValue even when they're literally used. In this case you do not really want to get only 2,147,483,647 items but you're using a very big limit to...don't have limits.Not to mention that useless
TOP clause may slow down your query. See, for example but it's not the only one, Why SELECT TOP clause could lead to long time cost. What I'd do is then to replace int limit = Int32.MaxValue with a nullable argument and apply Take() only when required.public static IEnumerable ConcatLimiting(this IEnumerable first,
Func> getSecondWithLimit,
int? limit = null)
{
// Arguments validation...
if (limit == null)
return first.Concat(getSecondWithLimit(null));
var itemsFromFirst = first.Take((int)limit);
return itemsFromFirst.Concat(getSecondWithLimit((int)limit - itemsFromFirst.Count());
}For completeness some answers to your questions.
Are the two IEnumerables going to be handled correctly...
Yes, compiler generated code is correct and it will dispose everything as expected. Note, however, that initialization code won't be executed until you effectively enumerate returned enumeration. This caused some funny bugs in code like
File.ReadLines("path_to_file").Take(0).Is the name ConcatLimiting() not very conform with what you usually see in C#?
It has to be clear, waiting for elevator we may discuss if
ConcatWithLimit() is better or not...Is this Extension Method completely useless?
Am I more or less following best practices?
Are there any other glaring issues?
Maybe yes, maybe not. See answer...
Code Snippets
return dbContext.Users.Concat(dbContext.Employees).Take(20);SELECT TOP 20 * FROM
(
SELECT * FROM Users
UNION ALL
SELECT * FROM Employees
)SELECT TOP 20 * FROM
(
SELECT TOP 20 * FROM Users
UNION ALL
SELECT TOP 20 * FROM Employees
)return dbContext.Users.Concat(dbContext.Employees).Take(20);public static IEnumerable<T> ConcatLimiting<T>(this IEnumerable<T> first,
Func<int, IEnumerable<T>> getSecondWithLimit,
int limit = int.MaxValue)
{
if (first == null)
throw new ArgumentNullException(nameof(first));
if (second == null)
throw new ArgumentNullException(nameof(second));
if (limit <= 0)
throw new ArgumentOutOfRangeException(nameof(limit));
var itemsFromFirst = first.Take(limit);
return itemsFromFirst.Concat(getSecondWithLimit(itemsFromFirst.Count() - elems);
}Context
StackExchange Code Review Q#142229, answer score: 5
Revisions (0)
No revisions yet.