patterncsharpMinor
Efficiently checking if a large enumeration has a consistent order
Viewed 0 times
enumerationefficientlyorderconsistentcheckinghaslarge
Problem
The scenario is as follows: We have a streamed enumeration that we have to check if it's ordered or not before processing it (about 75-80% of the time it is ordered). We are talking about pretty big enumerations, in the range of 10-100 million elements.
Because the first step processing the data is ordering it and we need to repeat this process quite often, we obviously do not want to incur in the price of ordering the enumeration if it isn't necessary; our tests show that calling
So my job is to implement an efficient algorithm that checks if any given enumeration is ordered or not.
My initial goals are:
The code I've come up with is:
And I have the following performance test:
```
publi
Because the first step processing the data is ordering it and we need to repeat this process quite often, we obviously do not want to incur in the price of ordering the enumeration if it isn't necessary; our tests show that calling
OrderBy() on an ordered enumeration takes, on average, about 1/3 to 1/4 of the time it takes ordering a random enumeration, so it's still a significant price we want to avoid.So my job is to implement an efficient algorithm that checks if any given enumeration is ordered or not.
My initial goals are:
- Make it fast, at least about 20 times faster than directly calling
OrderBy()in the worst case scenario; the input stream is already ordered.
- Make it as generic as possible.
The code I've come up with is:
public static bool IsOrdered(this IEnumerable list, Func comparer)
{
using (var enumerator = list.GetEnumerator())
{
if (!enumerator.MoveNext())
return false;
var left = enumerator.Current;
int previousUnequalComparison = 0;
while (enumerator.MoveNext())
{
var right = enumerator.Current;
var currentComparison = comparer(left, right);
if (currentComparison != 0)
{
if (previousUnequalComparison != 0 &&
Math.Sign(currentComparison) != Math.Sign(previousUnequalComparison))
return false;
previousUnequalComparison = currentComparison;
}
left = right;
}
}
return true;
}
public static bool IsOrdered(this IEnumerable list) where T : IComparable
{
return IsOrdered(list, (t, s) => t.CompareTo(s));
}And I have the following performance test:
```
publi
Solution
IEnumerable extension method
Your method is very fast.
Nevertheless it can be made a little bit faster:
It makes sense to store a result of
The improved method:
I don't think it's possible to get better results for
IList parallel extension method
If it is convenient to pass a collection as
Sample code:
On my 6-core machine the last method is ~5 times faster.
Your method is very fast.
Nevertheless it can be made a little bit faster:
- You are calling
Math.Sign(previousUnequalComparison)on each iteration.
It makes sense to store a result of
Math.Sign in the previousUnequalComparison variable.- The assignment
left = right;can be moved into theif (currentComparison != 0)block.
The improved method:
public static bool IsOrdered(this IEnumerable list, Func comparer)
{
using (var enumerator = list.GetEnumerator())
{
if (!enumerator.MoveNext())
return false;
var left = enumerator.Current;
int previousUnequalComparison = 0;
while (enumerator.MoveNext())
{
var right = enumerator.Current;
var currentComparison = comparer(left, right);
if (currentComparison != 0)
{
currentComparison = Math.Sign(currentComparison);
if (previousUnequalComparison != 0
&& currentComparison != previousUnequalComparison)
{
return false;
}
previousUnequalComparison = currentComparison;
left = right;
}
}
}
return true;
}I don't think it's possible to get better results for
IEnumerable.IList parallel extension method
If it is convenient to pass a collection as
IList instead of IEnumerable then you could use parallelism.Sample code:
public static bool IsOrderedParallel(this IList list, Func comparer)
{
// Degree of parallelism:
int nProc = Environment.ProcessorCount;
// Ranges for each 'thread':
int[] mins = new int[nProc]; // Minimums (including)
int[] maxs = new int[nProc]; // Maximums (including)
// Arrays of results (per-thread):
bool[] isSorted = new bool[nProc]; // true - if sorted
int[] signs = new int[nProc]; // Order of range (-1, 0, +1)
int prev = 0;
for (int i = 0; i
{
var left = list[mins[index]];
int previousUnequalComparison = 0;
for (int i = mins[index] + 1; i true if all ranges are sorted and all of them have the same order:
return isSorted.All(r => r) && (signs.All(s => s >= 0) || signs.All(s => s <= 0));
}On my 6-core machine the last method is ~5 times faster.
Code Snippets
public static bool IsOrdered<T>(this IEnumerable<T> list, Func<T, T, int> comparer)
{
using (var enumerator = list.GetEnumerator())
{
if (!enumerator.MoveNext())
return false;
var left = enumerator.Current;
int previousUnequalComparison = 0;
while (enumerator.MoveNext())
{
var right = enumerator.Current;
var currentComparison = comparer(left, right);
if (currentComparison != 0)
{
currentComparison = Math.Sign(currentComparison);
if (previousUnequalComparison != 0
&& currentComparison != previousUnequalComparison)
{
return false;
}
previousUnequalComparison = currentComparison;
left = right;
}
}
}
return true;
}public static bool IsOrderedParallel<T>(this IList<T> list, Func<T, T, int> comparer)
{
// Degree of parallelism:
int nProc = Environment.ProcessorCount;
// Ranges for each 'thread':
int[] mins = new int[nProc]; // Minimums (including)
int[] maxs = new int[nProc]; // Maximums (including)
// Arrays of results (per-thread):
bool[] isSorted = new bool[nProc]; // true - if sorted
int[] signs = new int[nProc]; // Order of range (-1, 0, +1)
int prev = 0;
for (int i = 0; i < nProc; i++)
{
mins[i] = prev; // Ranges should overlap
prev = maxs[i] = (i + 1) * list.Count / nProc;
}
maxs[nProc - 1] = list.Count - 1;
Parallel.For(0, nProc, (index, loopState) =>
{
var left = list[mins[index]];
int previousUnequalComparison = 0;
for (int i = mins[index] + 1; i <= maxs[index] && !loopState.IsStopped; i++)
{
var right = list[i];
var currentComparison = comparer(left, right);
if (currentComparison != 0)
{
currentComparison = Math.Sign(currentComparison);
if (previousUnequalComparison != 0 &&
currentComparison != previousUnequalComparison)
{
isSorted[index] = false;
loopState.Stop();
break;
}
previousUnequalComparison = currentComparison;
left = right;
}
}
isSorted[index] = true;
signs[index] = previousUnequalComparison;
});
// Return <c>true<c> if all ranges are sorted and all of them have the same order:
return isSorted.All(r => r) && (signs.All(s => s >= 0) || signs.All(s => s <= 0));
}Context
StackExchange Code Review Q#105676, answer score: 4
Revisions (0)
No revisions yet.