HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

Efficiently checking if a large enumeration has a consistent order

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
enumerationefficientlyorderconsistentcheckinghaslarge

Problem

The scenario is as follows: We have a streamed enumeration that we have to check if it's ordered or not before processing it (about 75-80% of the time it is ordered). We are talking about pretty big enumerations, in the range of 10-100 million elements.

Because the first step processing the data is ordering it and we need to repeat this process quite often, we obviously do not want to incur in the price of ordering the enumeration if it isn't necessary; our tests show that calling OrderBy() on an ordered enumeration takes, on average, about 1/3 to 1/4 of the time it takes ordering a random enumeration, so it's still a significant price we want to avoid.

So my job is to implement an efficient algorithm that checks if any given enumeration is ordered or not.

My initial goals are:

  • Make it fast, at least about 20 times faster than directly calling OrderBy() in the worst case scenario; the input stream is already ordered.



  • Make it as generic as possible.



The code I've come up with is:

public static bool IsOrdered(this IEnumerable list, Func comparer)
{
    using (var enumerator = list.GetEnumerator())
    {
        if (!enumerator.MoveNext())
            return false;

        var left = enumerator.Current;
        int previousUnequalComparison = 0;

        while (enumerator.MoveNext())
        {
            var right = enumerator.Current;
            var currentComparison = comparer(left, right);

            if (currentComparison != 0)
            {
                if (previousUnequalComparison != 0 &&
                    Math.Sign(currentComparison) != Math.Sign(previousUnequalComparison))
                    return false;

                previousUnequalComparison = currentComparison;
            }

            left = right;
        }
    }

    return true;
}

public static bool IsOrdered(this IEnumerable list) where T : IComparable
{
    return IsOrdered(list, (t, s) => t.CompareTo(s));
}


And I have the following performance test:

```
publi

Solution

IEnumerable extension method

Your method is very fast.

Nevertheless it can be made a little bit faster:

  • You are calling Math.Sign(previousUnequalComparison) on each iteration.



It makes sense to store a result of Math.Sign in the previousUnequalComparison variable.

  • The assignment left = right; can be moved into the if (currentComparison != 0) block.



The improved method:

public static bool IsOrdered(this IEnumerable list, Func comparer)
{
    using (var enumerator = list.GetEnumerator())
    {
        if (!enumerator.MoveNext())
            return false;

        var left = enumerator.Current;
        int previousUnequalComparison = 0;

        while (enumerator.MoveNext())
        {
            var right = enumerator.Current;
            var currentComparison = comparer(left, right);

            if (currentComparison != 0)
            {
                currentComparison = Math.Sign(currentComparison);
                if (previousUnequalComparison != 0
                    && currentComparison != previousUnequalComparison)
                {
                    return false;
                }
                previousUnequalComparison = currentComparison;

                left = right;
            }
        }
    }

    return true;
}


I don't think it's possible to get better results for IEnumerable.

IList parallel extension method

If it is convenient to pass a collection as IList instead of IEnumerable then you could use parallelism.

Sample code:

public static bool IsOrderedParallel(this IList list, Func comparer)
{
    // Degree of parallelism:
    int nProc = Environment.ProcessorCount;

    // Ranges for each 'thread':
    int[] mins = new int[nProc];  // Minimums (including)
    int[] maxs = new int[nProc];  // Maximums (including)
    // Arrays of results (per-thread):
    bool[] isSorted = new bool[nProc];  // true - if sorted
    int[] signs = new int[nProc];       // Order of range (-1, 0, +1)

    int prev = 0;
    for (int i = 0; i 
        {
            var left = list[mins[index]];
            int previousUnequalComparison = 0;
            for (int i = mins[index] + 1; i true if all ranges are sorted and all of them have the same order:
    return isSorted.All(r => r) && (signs.All(s => s >= 0) || signs.All(s => s <= 0));
}


On my 6-core machine the last method is ~5 times faster.

Code Snippets

public static bool IsOrdered<T>(this IEnumerable<T> list, Func<T, T, int> comparer)
{
    using (var enumerator = list.GetEnumerator())
    {
        if (!enumerator.MoveNext())
            return false;

        var left = enumerator.Current;
        int previousUnequalComparison = 0;

        while (enumerator.MoveNext())
        {
            var right = enumerator.Current;
            var currentComparison = comparer(left, right);

            if (currentComparison != 0)
            {
                currentComparison = Math.Sign(currentComparison);
                if (previousUnequalComparison != 0
                    && currentComparison != previousUnequalComparison)
                {
                    return false;
                }
                previousUnequalComparison = currentComparison;

                left = right;
            }
        }
    }

    return true;
}
public static bool IsOrderedParallel<T>(this IList<T> list, Func<T, T, int> comparer)
{
    // Degree of parallelism:
    int nProc = Environment.ProcessorCount;

    // Ranges for each 'thread':
    int[] mins = new int[nProc];  // Minimums (including)
    int[] maxs = new int[nProc];  // Maximums (including)
    // Arrays of results (per-thread):
    bool[] isSorted = new bool[nProc];  // true - if sorted
    int[] signs = new int[nProc];       // Order of range (-1, 0, +1)

    int prev = 0;
    for (int i = 0; i < nProc; i++)
    {
        mins[i] = prev;  // Ranges should overlap
        prev = maxs[i] = (i + 1) * list.Count / nProc;
    }
    maxs[nProc - 1] = list.Count - 1;

    Parallel.For(0, nProc, (index, loopState) =>
        {
            var left = list[mins[index]];
            int previousUnequalComparison = 0;
            for (int i = mins[index] + 1; i <= maxs[index] && !loopState.IsStopped; i++)
            {
                var right = list[i];
                var currentComparison = comparer(left, right);

                if (currentComparison != 0)
                {
                    currentComparison = Math.Sign(currentComparison);
                    if (previousUnequalComparison != 0 &&
                        currentComparison != previousUnequalComparison)
                    {
                        isSorted[index] = false;
                        loopState.Stop();
                        break;
                    }

                    previousUnequalComparison = currentComparison;

                    left = right;
                }
            }
            isSorted[index] = true;
            signs[index] = previousUnequalComparison;
        });

    // Return <c>true<c> if all ranges are sorted and all of them have the same order:
    return isSorted.All(r => r) && (signs.All(s => s >= 0) || signs.All(s => s <= 0));
}

Context

StackExchange Code Review Q#105676, answer score: 4

Revisions (0)

No revisions yet.