patternsqlMajor
Can you use COUNT DISTINCT with an OVER clause?
Viewed 0 times
distinctcanyouwithclausecountuseover
Problem
I'm trying to improve the performance of the following query:
Currently with my test data it takes about a minute. I have a limited amount of input into changes on the over all stored procedure where this query resides but I can probably get them to modify this one query. Or add an index. I tried adding the following index:
And it actually doubled the amount of time the query takes. I get the same effect with a NON-CLUSTERED index.
I tried re-writing it as follows with no effect.
Next I tried to use a windowing function like this.
At this point I started to get the error
So I have two ques
UPDATE [#TempTable]
SET Received = r.Number
FROM [#TempTable]
INNER JOIN (SELECT AgentID,
RuleID,
COUNT(DISTINCT (GroupId)) Number
FROM [#TempTable]
WHERE Passed = 1
GROUP BY AgentID,
RuleID
) r ON r.RuleID = [#TempTable].RuleID AND
r.AgentID = [#TempTable].AgentIDCurrently with my test data it takes about a minute. I have a limited amount of input into changes on the over all stored procedure where this query resides but I can probably get them to modify this one query. Or add an index. I tried adding the following index:
CREATE CLUSTERED INDEX ix_test ON #TempTable(AgentID, RuleId, GroupId, Passed)And it actually doubled the amount of time the query takes. I get the same effect with a NON-CLUSTERED index.
I tried re-writing it as follows with no effect.
WITH r AS (SELECT AgentID,
RuleID,
COUNT(DISTINCT (GroupId)) Number
FROM [#TempTable]
WHERE Passed = 1
GROUP BY AgentID,
RuleID
)
UPDATE [#TempTable]
SET Received = r.Number
FROM [#TempTable]
INNER JOIN r
ON r.RuleID = [#TempTable].RuleID AND
r.AgentID = [#TempTable].AgentIDNext I tried to use a windowing function like this.
UPDATE [#TempTable]
SET Received = COUNT(DISTINCT (CASE WHEN Passed=1 THEN GroupId ELSE NULL END))
OVER (PARTITION BY AgentId, RuleId)
FROM [#TempTable]At this point I started to get the error
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'distinct'.So I have two ques
Solution
This construction is not currently supported in SQL Server. It could (and should, in my opinion) be implemented in a future version.
Applying one of the workarounds listed in the feedback item reporting this deficiency, your query could be rewritten as:
The resulting execution plan is:
This has the advantage of avoiding an Eager Table Spool for Halloween Protection (due to the self-join), but it introduces a sort (for the window) and an often-inefficient Lazy Table Spool construction to calculate and apply the
The overall approach is a difficult one to make perform well. Applying updates (especially ones based on a self-join) recursively to a large structure may be good for debugging but it is a recipe for poor performance. Repeated large scans, memory spills, and Halloween issues are just some of the issues. Indexing and (more) temporary tables can help, but very careful analysis is needed especially if the index is updated by other statements in the process (maintaining indexes affects query plan choices and adds I/O).
Ultimately, solving the underlying problem would make for interesting consultancy work, but it is too much for this site. I hope this answer addresses the surface question though.
Alternative interpretation of the original query (results in updating more rows):
Note: eliminating the sort (e.g. by providing an index) might reintroduce the need for an Eager Spool or something else to provide the necessary Halloween Protection. Sort is a blocking operator, so it provides full phase separation.
Applying one of the workarounds listed in the feedback item reporting this deficiency, your query could be rewritten as:
WITH UpdateSet AS
(
SELECT
AgentID,
RuleID,
Received,
Calc = SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (
PARTITION BY AgentID, RuleID)
FROM
(
SELECT
AgentID,
RuleID,
Received,
rn = ROW_NUMBER() OVER (
PARTITION BY AgentID, RuleID, GroupID
ORDER BY GroupID)
FROM #TempTable
WHERE Passed = 1
) AS X
)
UPDATE UpdateSet
SET Received = Calc;The resulting execution plan is:
This has the advantage of avoiding an Eager Table Spool for Halloween Protection (due to the self-join), but it introduces a sort (for the window) and an often-inefficient Lazy Table Spool construction to calculate and apply the
SUM OVER (PARTITION BY) result to all rows in the window. How it performs in practice is an exercise only you can perform.The overall approach is a difficult one to make perform well. Applying updates (especially ones based on a self-join) recursively to a large structure may be good for debugging but it is a recipe for poor performance. Repeated large scans, memory spills, and Halloween issues are just some of the issues. Indexing and (more) temporary tables can help, but very careful analysis is needed especially if the index is updated by other statements in the process (maintaining indexes affects query plan choices and adds I/O).
Ultimately, solving the underlying problem would make for interesting consultancy work, but it is too much for this site. I hope this answer addresses the surface question though.
Alternative interpretation of the original query (results in updating more rows):
WITH UpdateSet AS
(
SELECT
AgentID,
RuleID,
Received,
Calc = SUM(CASE WHEN Passed = 1 AND rn = 1 THEN 1 ELSE 0 END) OVER (
PARTITION BY AgentID, RuleID)
FROM
(
SELECT
AgentID,
RuleID,
Received,
Passed,
rn = ROW_NUMBER() OVER (
PARTITION BY AgentID, RuleID, Passed, GroupID
ORDER BY GroupID)
FROM #TempTable
) AS X
)
UPDATE UpdateSet
SET Received = Calc
WHERE Calc > 0;Note: eliminating the sort (e.g. by providing an index) might reintroduce the need for an Eager Spool or something else to provide the necessary Halloween Protection. Sort is a blocking operator, so it provides full phase separation.
Code Snippets
WITH UpdateSet AS
(
SELECT
AgentID,
RuleID,
Received,
Calc = SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (
PARTITION BY AgentID, RuleID)
FROM
(
SELECT
AgentID,
RuleID,
Received,
rn = ROW_NUMBER() OVER (
PARTITION BY AgentID, RuleID, GroupID
ORDER BY GroupID)
FROM #TempTable
WHERE Passed = 1
) AS X
)
UPDATE UpdateSet
SET Received = Calc;WITH UpdateSet AS
(
SELECT
AgentID,
RuleID,
Received,
Calc = SUM(CASE WHEN Passed = 1 AND rn = 1 THEN 1 ELSE 0 END) OVER (
PARTITION BY AgentID, RuleID)
FROM
(
SELECT
AgentID,
RuleID,
Received,
Passed,
rn = ROW_NUMBER() OVER (
PARTITION BY AgentID, RuleID, Passed, GroupID
ORDER BY GroupID)
FROM #TempTable
) AS X
)
UPDATE UpdateSet
SET Received = Calc
WHERE Calc > 0;Context
StackExchange Database Administrators Q#76628, answer score: 32
Revisions (0)
No revisions yet.