HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

Gaps and islands: client solution vs T-SQL query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqlqueryclientandgapssolutionislands

Problem

Can a T-SQL solution for gaps and islands run faster than a C# solution running on the client?

To be specific, let us provide some test data:

CREATE TABLE dbo.Numbers
  (
    n INT NOT NULL
          PRIMARY KEY
  ) ; 
GO 

INSERT  INTO dbo.Numbers
        ( n )
VALUES  ( 1 ) ; 
GO 
DECLARE @i INT ; 
SET @i = 0 ; 
WHILE @i  500005
                )
GO


This first set of test data has exactly one gap:

SELECT  StartedAt ,
        FinishedAt
FROM    dbo.Tasks
WHERE   StartedAt BETWEEN DATEADD(MINUTE, 499999, '20100101')
                  AND     DATEADD(MINUTE, 500006, '20100101')


The second set of test data has 2M -1 gaps, a gap between each two adjacent intervals:

TRUNCATE TABLE dbo.Tasks;
GO

INSERT  INTO dbo.Tasks
        ( StartedAt ,
          FinishedAt
        )
        SELECT  DATEADD(MINUTE, 3*n, '20100101') AS StartedAt ,
                DATEADD(MINUTE, 3*n + 2, '20100101') AS FinishedAt
        FROM    dbo.Numbers
        WHERE   ( n  500005
                )
GO


Currently I am running 2008 R2, but 2012 solutions are very welcome.
I have posted my C# solution as an answer.

Solution

The following C# code solves the problem:

var connString =
        "Initial Catalog=MyDb;Data Source=MyServer;Integrated Security=SSPI;Application Name=Benchmarks;";

    var stopWatch = new Stopwatch();
    stopWatch.Start();

    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        var command = conn.CreateCommand();
        command.CommandText = "dbo.GetAllTaskEvents";
        command.CommandType = CommandType.StoredProcedure;
        var gaps = new List();
        using (var dr = command.ExecuteReader())
        {
            var currentEvents = 0;
            var gapStart = new DateTime();
            var gapStarted = false;
            while (dr.Read())
            {
                var change = dr.GetInt32(1);
                if (change == -1 && currentEvents == 1)
                {
                    gapStart = dr.GetDateTime(0);
                    gapStarted = true;
                }
                else if (change == 1 && currentEvents == 0 && gapStarted)
                {
                    gaps.Add(string.Format("({0},{1})", gapStart, dr.GetDateTime(0)));
                    gapStarted = false;
                }
                currentEvents += change;
            }
        }
        File.WriteAllLines(@"C:\Temp\Gaps.txt", gaps);
    }

    stopWatch.Stop();
    System.Console.WriteLine("Elapsed: " + stopWatch.Elapsed);


This code invokes this stored procedure:

CREATE PROCEDURE dbo.GetAllTaskEvents
AS 
  BEGIN ;
    SELECT  EventTime ,
            Change
    FROM    ( SELECT  StartedAt AS EventTime ,
                      1 AS Change
              FROM    dbo.Tasks
              UNION ALL
              SELECT  FinishedAt AS EventTime ,
                      -1 AS Change
              FROM    dbo.Tasks
            ) AS TaskEvents
    ORDER BY EventTime, Change DESC ;
  END ;
GO


It finds and prints one gap in 2M intervals in the following times, warm cache:

1 gap: Elapsed: 00:00:01.4852029 00:00:01.4444307 00:00:01.4644152


It finds and prints 2M-1 gaps in 2M intervals in the following times, warm cache:

2M-1 gaps Elapsed: 00:00:08.8576637 00:00:08.9123053 00:00:09.0372344 00:00:08.8545477


This is a very simple solution - it took me 10 minutes to develop. A recent college graduate can come up with it. On the database side, execution plan is a trivial merge join which uses very little CPU and memory.

Edit: to be realistic, I am running client and server on separate boxes.

Code Snippets

var connString =
        "Initial Catalog=MyDb;Data Source=MyServer;Integrated Security=SSPI;Application Name=Benchmarks;";

    var stopWatch = new Stopwatch();
    stopWatch.Start();

    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        var command = conn.CreateCommand();
        command.CommandText = "dbo.GetAllTaskEvents";
        command.CommandType = CommandType.StoredProcedure;
        var gaps = new List<string>();
        using (var dr = command.ExecuteReader())
        {
            var currentEvents = 0;
            var gapStart = new DateTime();
            var gapStarted = false;
            while (dr.Read())
            {
                var change = dr.GetInt32(1);
                if (change == -1 && currentEvents == 1)
                {
                    gapStart = dr.GetDateTime(0);
                    gapStarted = true;
                }
                else if (change == 1 && currentEvents == 0 && gapStarted)
                {
                    gaps.Add(string.Format("({0},{1})", gapStart, dr.GetDateTime(0)));
                    gapStarted = false;
                }
                currentEvents += change;
            }
        }
        File.WriteAllLines(@"C:\Temp\Gaps.txt", gaps);
    }

    stopWatch.Stop();
    System.Console.WriteLine("Elapsed: " + stopWatch.Elapsed);
CREATE PROCEDURE dbo.GetAllTaskEvents
AS 
  BEGIN ;
    SELECT  EventTime ,
            Change
    FROM    ( SELECT  StartedAt AS EventTime ,
                      1 AS Change
              FROM    dbo.Tasks
              UNION ALL
              SELECT  FinishedAt AS EventTime ,
                      -1 AS Change
              FROM    dbo.Tasks
            ) AS TaskEvents
    ORDER BY EventTime, Change DESC ;
  END ;
GO
1 gap: Elapsed: 00:00:01.4852029 00:00:01.4444307 00:00:01.4644152
2M-1 gaps Elapsed: 00:00:08.8576637 00:00:08.9123053 00:00:09.0372344 00:00:08.8545477

Context

StackExchange Database Administrators Q#39272, answer score: 4

Revisions (0)

No revisions yet.