patternsqlMinor
Find continuous records with equal values
Viewed 0 times
withequalrecordsfindvaluescontinuous
Problem
I have a table named "TOOLS" .
Sample data:
So, the SQL query should select sum of TOOLS having continuous
How can I do it?
Sample data:
ID TAG
1 0
2 1
3 1
4 1
5 0
6 0
7 1
8 1
9 0So, the SQL query should select sum of TOOLS having continuous
1 in field TAG for a specified row, for example ID=2 then the query returns 3 in this sample. It also returns 2 for ID=7.How can I do it?
Solution
Or a more traditional approach:
The trouble with all these lovely window function-based solutions is that if you write them as pseudo-code you get something like this:
You also can't inject the ID at any point as it invalidates the window range. Even @JulienVavasseur's solution which uses an ID variable, will scan 999,999 rows in a 1 million row table where the
More traditional set-based solutions can often work best, even recursive CTEs (under certain conditions) and these were the first two that occurred to me:
Plugging the other solutions into my test rig does not end well for them:
In my tests, my recursive CTE solution (which normally would be thought of unfavourably as RBAR) performs the best with short sequences (< ~10,000), although this dips with really long sequences; My subquery was best with these.
I would be interested to see if a window-based approach could out-perform either of these two, even with different indexes eg I was surprised not to see a LAG / LEAD solution here. Does anyone know Itzik? : )
SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )The trouble with all these lovely window function-based solutions is that if you write them as pseudo-code you get something like this:
Do something across ALL the IDs, then work out the sequence, then aggregate...You also can't inject the ID at any point as it invalidates the window range. Even @JulienVavasseur's solution which uses an ID variable, will scan 999,999 rows in a 1 million row table where the
@ID is 2. Basically these solutions do not scale well. So if you have 1 million or 1 billion rows in your tools table, well the query will take a long time, 4+ seconds versus 0-50ms in my simple test rig (see below). Now maybe you might have only 10 rows but it's a bad habit to write code that doesn't scale IMHO. You might know the volume today but you might not tomorrow.More traditional set-based solutions can often work best, even recursive CTEs (under certain conditions) and these were the first two that occurred to me:
DECLARE @tools TABLE ( ID INT PRIMARY KEY, TAG INT NOT NULL )
;WITH cte AS (
SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO @tools ( ID, TAG )
SELECT rn, CAST( ( rn % 7 ) AS BIT )
FROM cte
DECLARE @id INT = 2
-- Set-based with subquery
SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(b.ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )
-- CTE
;WITH cte AS
(
SELECT ID, ID AS parentID, TAG
FROM @tools t
WHERE ID = @id
UNION ALL
SELECT b.ID, a.parentID, a.TAG
FROM cte a
INNER JOIN @tools b ON b.ID = a.ID + 1
WHERE b.TAG = 1
)
SELECT parentID AS ID, SUM(TAG) TAG
FROM cte
GROUP BY parentID
OPTION ( MAXRECURSION 999 ); -- <-- this can be set to 0 but set to lower boundary for safetyPlugging the other solutions into my test rig does not end well for them:
In my tests, my recursive CTE solution (which normally would be thought of unfavourably as RBAR) performs the best with short sequences (< ~10,000), although this dips with really long sequences; My subquery was best with these.
I would be interested to see if a window-based approach could out-perform either of these two, even with different indexes eg I was surprised not to see a LAG / LEAD solution here. Does anyone know Itzik? : )
Code Snippets
SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )DECLARE @tools TABLE ( ID INT PRIMARY KEY, TAG INT NOT NULL )
;WITH cte AS (
SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
CROSS JOIN master.sys.columns c2
CROSS JOIN master.sys.columns c3
)
INSERT INTO @tools ( ID, TAG )
SELECT rn, CAST( ( rn % 7 ) AS BIT )
FROM cte
DECLARE @id INT = 2
-- Set-based with subquery
SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(b.ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )
-- CTE
;WITH cte AS
(
SELECT ID, ID AS parentID, TAG
FROM @tools t
WHERE ID = @id
UNION ALL
SELECT b.ID, a.parentID, a.TAG
FROM cte a
INNER JOIN @tools b ON b.ID = a.ID + 1
WHERE b.TAG = 1
)
SELECT parentID AS ID, SUM(TAG) TAG
FROM cte
GROUP BY parentID
OPTION ( MAXRECURSION 999 ); -- <-- this can be set to 0 but set to lower boundary for safetyContext
StackExchange Database Administrators Q#129999, answer score: 3
Revisions (0)
No revisions yet.