HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpMinor

CodeKata: Find all the runs in the set

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
theallcodekatafindrunsset

Problem

Given a finite set of unique numbers, find all the runs in the set. Runs are 1 or more consecutive numbers.

That is, given {1,59,12,43,4,58,5,13,46,3,6}, the output should be: {1}, {3,4,5,6}, {12,13}, {43}, {46},{58,59}.

Note that the size of the set may be very large and comes form external source at any time (is not fully in memory). Also you may be able to query in any moment while other items are arriving.

So the chalenge is:

  • Can not full set sorting items because comes from external source in any time and need to keep partiar result for quering it in any moment.



  • Avoid binary search, sort again or shift the set on inserts every time a new item is added to speed up the add operation because when set is hurge will kill performance.



This implementation seems to work pretty fast:

```
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{

Stopwatch stopwatch = new Stopwatch();

Random rand = new Random();
//int[] initValues = Enumerable.Range(1, 20000000).OrderBy(s => Guid.NewGuid()).Where(x => (rand.Next() % 2) == 0).ToArray();
int[] initValues = { 1, 59, 12, 43, 4, 58, 5, 13, 46, 3, 6, 2, 8, 9, 54, 37, 22, 105, 44, 7 };

Console.WriteLine("Number of input items: " + initValues.Length);

RunManager manager = new RunManager();

stopwatch.Start(); //measure adding and query
foreach (int item in initValues)
{
manager.addToIndex(item);
foreach (var run in manager.runListFromIndex())
{
// Console.WriteLine(run.ToString());
}
}

stopwatch.Stop();;

Console.WriteLine("Millisecons: " + stopwatch.ElapsedMilliseconds);
Console.ReadLine();

}
}

public class RunManager
{
Dictionary upperBoundIndex

Solution

I didn't think about something advanced but even the following straight solution works much faster then your

var orderedValues = initValues.AsParallel().OrderBy(e => e).ToList();
var runs = new List();
var lastValue = orderedValues.First();
Run lastRun = new Run(lastValue);
foreach (var currentValue in orderedValues)
{
    if (lastValue + 1 == currentValue)
    {
        lastRun.upperBound++;
    }
    else
    {
        runs.Add(lastRun);
        lastRun = new Run(currentValue);
    }

    lastValue = currentValue;
}


Before - 23776 miliseconds

After - 4669 miliseconds

I tested with this

int[] initValues = Enumerable.Range(1, 20000000).OrderBy(s => Guid.NewGuid()).Where(x => (rand.Next() % 2) == 0).ToArray();

Code Snippets

var orderedValues = initValues.AsParallel().OrderBy(e => e).ToList();
var runs = new List<Run>();
var lastValue = orderedValues.First();
Run lastRun = new Run(lastValue);
foreach (var currentValue in orderedValues)
{
    if (lastValue + 1 == currentValue)
    {
        lastRun.upperBound++;
    }
    else
    {
        runs.Add(lastRun);
        lastRun = new Run(currentValue);
    }

    lastValue = currentValue;
}

Context

StackExchange Code Review Q#136182, answer score: 6

Revisions (0)

No revisions yet.