HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Find continuous records with equal values

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withequalrecordsfindvaluescontinuous

Problem

I have a table named "TOOLS" .

Sample data:

ID       TAG
1         0   
2         1
3         1
4         1
5         0
6         0
7         1
8         1
9         0


So, the SQL query should select sum of TOOLS having continuous 1 in field TAG for a specified row, for example ID=2 then the query returns 3 in this sample. It also returns 2 for ID=7.

How can I do it?

Solution

Or a more traditional approach:

SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )


The trouble with all these lovely window function-based solutions is that if you write them as pseudo-code you get something like this:

Do something across ALL the IDs, then work out the sequence, then aggregate...

You also can't inject the ID at any point as it invalidates the window range. Even @JulienVavasseur's solution which uses an ID variable, will scan 999,999 rows in a 1 million row table where the @ID is 2. Basically these solutions do not scale well. So if you have 1 million or 1 billion rows in your tools table, well the query will take a long time, 4+ seconds versus 0-50ms in my simple test rig (see below). Now maybe you might have only 10 rows but it's a bad habit to write code that doesn't scale IMHO. You might know the volume today but you might not tomorrow.

More traditional set-based solutions can often work best, even recursive CTEs (under certain conditions) and these were the first two that occurred to me:

DECLARE @tools TABLE ( ID INT PRIMARY KEY, TAG INT NOT NULL )

;WITH cte AS (
SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
    CROSS JOIN master.sys.columns c2
    CROSS JOIN master.sys.columns c3
)
INSERT INTO @tools ( ID, TAG )
SELECT rn, CAST( ( rn % 7 ) AS BIT )
FROM cte

DECLARE @id INT = 2

-- Set-based with subquery
SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(b.ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )

-- CTE
;WITH cte AS
(
SELECT ID, ID AS parentID, TAG
FROM @tools t
WHERE ID = @id
UNION ALL
SELECT b.ID, a.parentID, a.TAG
FROM cte a
    INNER JOIN @tools b ON b.ID = a.ID + 1
WHERE b.TAG = 1
)
SELECT parentID AS ID, SUM(TAG) TAG
FROM cte
GROUP BY parentID
OPTION ( MAXRECURSION 999 );        -- <-- this can be set to 0 but set to lower boundary for safety


Plugging the other solutions into my test rig does not end well for them:

In my tests, my recursive CTE solution (which normally would be thought of unfavourably as RBAR) performs the best with short sequences (< ~10,000), although this dips with really long sequences; My subquery was best with these.

I would be interested to see if a window-based approach could out-perform either of these two, even with different indexes eg I was surprised not to see a LAG / LEAD solution here. Does anyone know Itzik? : )

Code Snippets

SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )
DECLARE @tools TABLE ( ID INT PRIMARY KEY, TAG INT NOT NULL )

;WITH cte AS (
SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
    CROSS JOIN master.sys.columns c2
    CROSS JOIN master.sys.columns c3
)
INSERT INTO @tools ( ID, TAG )
SELECT rn, CAST( ( rn % 7 ) AS BIT )
FROM cte


DECLARE @id INT = 2

-- Set-based with subquery
SELECT @ID ID, SUM(a.TAG) r
FROM @tools a
WHERE a.ID Between @ID And ( SELECT MIN(b.ID) FROM @tools b WHERE b.ID > @ID AND b.TAG = 0 )


-- CTE
;WITH cte AS
(
SELECT ID, ID AS parentID, TAG
FROM @tools t
WHERE ID = @id
UNION ALL
SELECT b.ID, a.parentID, a.TAG
FROM cte a
    INNER JOIN @tools b ON b.ID = a.ID + 1
WHERE b.TAG = 1
)
SELECT parentID AS ID, SUM(TAG) TAG
FROM cte
GROUP BY parentID
OPTION ( MAXRECURSION 999 );        -- <-- this can be set to 0 but set to lower boundary for safety

Context

StackExchange Database Administrators Q#129999, answer score: 3

Revisions (0)

No revisions yet.