principlesqlMinor
Gaps and islands: client solution vs T-SQL query
Viewed 0 times
sqlqueryclientandgapssolutionislands
Problem
Can a T-SQL solution for gaps and islands run faster than a C# solution running on the client?
To be specific, let us provide some test data:
This first set of test data has exactly one gap:
The second set of test data has 2M -1 gaps, a gap between each two adjacent intervals:
Currently I am running 2008 R2, but 2012 solutions are very welcome.
I have posted my C# solution as an answer.
To be specific, let us provide some test data:
CREATE TABLE dbo.Numbers
(
n INT NOT NULL
PRIMARY KEY
) ;
GO
INSERT INTO dbo.Numbers
( n )
VALUES ( 1 ) ;
GO
DECLARE @i INT ;
SET @i = 0 ;
WHILE @i 500005
)
GOThis first set of test data has exactly one gap:
SELECT StartedAt ,
FinishedAt
FROM dbo.Tasks
WHERE StartedAt BETWEEN DATEADD(MINUTE, 499999, '20100101')
AND DATEADD(MINUTE, 500006, '20100101')The second set of test data has 2M -1 gaps, a gap between each two adjacent intervals:
TRUNCATE TABLE dbo.Tasks;
GO
INSERT INTO dbo.Tasks
( StartedAt ,
FinishedAt
)
SELECT DATEADD(MINUTE, 3*n, '20100101') AS StartedAt ,
DATEADD(MINUTE, 3*n + 2, '20100101') AS FinishedAt
FROM dbo.Numbers
WHERE ( n 500005
)
GOCurrently I am running 2008 R2, but 2012 solutions are very welcome.
I have posted my C# solution as an answer.
Solution
The following C# code solves the problem:
This code invokes this stored procedure:
It finds and prints one gap in 2M intervals in the following times, warm cache:
It finds and prints 2M-1 gaps in 2M intervals in the following times, warm cache:
This is a very simple solution - it took me 10 minutes to develop. A recent college graduate can come up with it. On the database side, execution plan is a trivial merge join which uses very little CPU and memory.
Edit: to be realistic, I am running client and server on separate boxes.
var connString =
"Initial Catalog=MyDb;Data Source=MyServer;Integrated Security=SSPI;Application Name=Benchmarks;";
var stopWatch = new Stopwatch();
stopWatch.Start();
using (var conn = new SqlConnection(connString))
{
conn.Open();
var command = conn.CreateCommand();
command.CommandText = "dbo.GetAllTaskEvents";
command.CommandType = CommandType.StoredProcedure;
var gaps = new List();
using (var dr = command.ExecuteReader())
{
var currentEvents = 0;
var gapStart = new DateTime();
var gapStarted = false;
while (dr.Read())
{
var change = dr.GetInt32(1);
if (change == -1 && currentEvents == 1)
{
gapStart = dr.GetDateTime(0);
gapStarted = true;
}
else if (change == 1 && currentEvents == 0 && gapStarted)
{
gaps.Add(string.Format("({0},{1})", gapStart, dr.GetDateTime(0)));
gapStarted = false;
}
currentEvents += change;
}
}
File.WriteAllLines(@"C:\Temp\Gaps.txt", gaps);
}
stopWatch.Stop();
System.Console.WriteLine("Elapsed: " + stopWatch.Elapsed);This code invokes this stored procedure:
CREATE PROCEDURE dbo.GetAllTaskEvents
AS
BEGIN ;
SELECT EventTime ,
Change
FROM ( SELECT StartedAt AS EventTime ,
1 AS Change
FROM dbo.Tasks
UNION ALL
SELECT FinishedAt AS EventTime ,
-1 AS Change
FROM dbo.Tasks
) AS TaskEvents
ORDER BY EventTime, Change DESC ;
END ;
GOIt finds and prints one gap in 2M intervals in the following times, warm cache:
1 gap: Elapsed: 00:00:01.4852029 00:00:01.4444307 00:00:01.4644152It finds and prints 2M-1 gaps in 2M intervals in the following times, warm cache:
2M-1 gaps Elapsed: 00:00:08.8576637 00:00:08.9123053 00:00:09.0372344 00:00:08.8545477This is a very simple solution - it took me 10 minutes to develop. A recent college graduate can come up with it. On the database side, execution plan is a trivial merge join which uses very little CPU and memory.
Edit: to be realistic, I am running client and server on separate boxes.
Code Snippets
var connString =
"Initial Catalog=MyDb;Data Source=MyServer;Integrated Security=SSPI;Application Name=Benchmarks;";
var stopWatch = new Stopwatch();
stopWatch.Start();
using (var conn = new SqlConnection(connString))
{
conn.Open();
var command = conn.CreateCommand();
command.CommandText = "dbo.GetAllTaskEvents";
command.CommandType = CommandType.StoredProcedure;
var gaps = new List<string>();
using (var dr = command.ExecuteReader())
{
var currentEvents = 0;
var gapStart = new DateTime();
var gapStarted = false;
while (dr.Read())
{
var change = dr.GetInt32(1);
if (change == -1 && currentEvents == 1)
{
gapStart = dr.GetDateTime(0);
gapStarted = true;
}
else if (change == 1 && currentEvents == 0 && gapStarted)
{
gaps.Add(string.Format("({0},{1})", gapStart, dr.GetDateTime(0)));
gapStarted = false;
}
currentEvents += change;
}
}
File.WriteAllLines(@"C:\Temp\Gaps.txt", gaps);
}
stopWatch.Stop();
System.Console.WriteLine("Elapsed: " + stopWatch.Elapsed);CREATE PROCEDURE dbo.GetAllTaskEvents
AS
BEGIN ;
SELECT EventTime ,
Change
FROM ( SELECT StartedAt AS EventTime ,
1 AS Change
FROM dbo.Tasks
UNION ALL
SELECT FinishedAt AS EventTime ,
-1 AS Change
FROM dbo.Tasks
) AS TaskEvents
ORDER BY EventTime, Change DESC ;
END ;
GO1 gap: Elapsed: 00:00:01.4852029 00:00:01.4444307 00:00:01.46441522M-1 gaps Elapsed: 00:00:08.8576637 00:00:08.9123053 00:00:09.0372344 00:00:08.8545477Context
StackExchange Database Administrators Q#39272, answer score: 4
Revisions (0)
No revisions yet.