patternsqlModerate
Optimize Subquery with Windowing Function
Viewed 0 times
withfunctionsubqueryoptimizewindowing
Problem
As my performance tuning skills never seem to feel sufficient, I always wonder if there is more optimization I can perform against some queries. The situation that this question pertains to is a Windowed MAX function nested within a subquery.
The data that I'm digging through is a series of transactions on various groups of larger sets. I've got 4 fields of importance, the unique ID of a transaction, the Group ID of a batch of transactions, and dates associated with the respective unique transaction or group of transactions. Most times the Group Date matches the Maximum Unique Transaction Date for a Batch, but there are times where manual adjustments come through our system and a unique date operation occurs after the group transaction date is captured. This manual edit doesn't adjust the group date by design.
What I identify in this query are those records where the Unique Date falls after the Group Date. The following sample query builds out a rough equivalent of the my scenario and the SELECT statement returns the records I'm looking for, however, am I approaching this solution in the most efficient manner? This takes a while to run during my fact table loads as my record counts number in the upper 9 digits, but mostly my disdain for subqueries makes me wonder if there's a better approach here. I'm not as concerned about any indexes as I'm confident those are already in place; what I'm looking for is an alternative query approach that will achieve the same thing, but even more efficiently. Any feedback is welcome.
dbfiddle here
The data that I'm digging through is a series of transactions on various groups of larger sets. I've got 4 fields of importance, the unique ID of a transaction, the Group ID of a batch of transactions, and dates associated with the respective unique transaction or group of transactions. Most times the Group Date matches the Maximum Unique Transaction Date for a Batch, but there are times where manual adjustments come through our system and a unique date operation occurs after the group transaction date is captured. This manual edit doesn't adjust the group date by design.
What I identify in this query are those records where the Unique Date falls after the Group Date. The following sample query builds out a rough equivalent of the my scenario and the SELECT statement returns the records I'm looking for, however, am I approaching this solution in the most efficient manner? This takes a while to run during my fact table loads as my record counts number in the upper 9 digits, but mostly my disdain for subqueries makes me wonder if there's a better approach here. I'm not as concerned about any indexes as I'm confident those are already in place; what I'm looking for is an alternative query approach that will achieve the same thing, but even more efficiently. Any feedback is welcome.
CREATE TABLE #Example
(
UniqueID INT IDENTITY(1,1)
, GroupID INT
, GroupDate DATETIME
, UniqueDate DATETIME
)
CREATE CLUSTERED INDEX [CX_1] ON [#Example]
(
[UniqueID] ASC
)
SET NOCOUNT ON
--Populate some test data
DECLARE @i INT = 0, @j INT = 5, @UniqueDate DATETIME, @GroupDate DATETIME
WHILE @i GroupDate
AND maxUniqueDate = UniqueDate
DROP TABLE #Exampledbfiddle here
Solution
When and if you are able to upgrade from SQL Server 2012 to SQL Server 2016, you may be able to take advantage of the much improved performance (especially for frameless window aggregates) provided by the new batch mode Window Aggregate operator.
Almost all large data processing scenarios work better with columnstore storage than rowstore. Even without changing to columnstore for your base tables, you can still gain the benefits of the new 2016 operator and batch mode execution by creating an empty nonclustered columnstore filtered index on one of the base tables, or by redundantly outer joining to a columnstore-organized table.
Using the second option, the query becomes:
db<>fiddle
Note the only change to the original query is creating an empty temporary table and adding the left join. The execution plan is:
For more information and options, see Itzik Ben-Gan's excellent series, What You Need to Know about the Batch Mode Window Aggregate Operator in SQL Server 2016 (in three parts).
Almost all large data processing scenarios work better with columnstore storage than rowstore. Even without changing to columnstore for your base tables, you can still gain the benefits of the new 2016 operator and batch mode execution by creating an empty nonclustered columnstore filtered index on one of the base tables, or by redundantly outer joining to a columnstore-organized table.
Using the second option, the query becomes:
-- Just to get batch mode processing and the window aggregate operator
CREATE TABLE #Dummy (a integer NOT NULL, INDEX DummyCC CLUSTERED COLUMNSTORE);
-- Identify any UniqueDates that are greater than the GroupDate within their GroupID
SELECT
calc_maxUD.UniqueID,
calc_maxUD.GroupID,
calc_maxUD.GroupDate,
calc_maxUD.UniqueDate
FROM
(
SELECT
E.UniqueID,
E.GroupID,
E.GroupDate,
E.UniqueDate,
maxUniqueDate = MAX(UniqueDate) OVER (
PARTITION BY GroupID)
FROM #Example AS E
LEFT JOIN #Dummy AS D -- The only change to the original query
ON 1 = 0
) AS calc_maxUD
WHERE
calc_maxUD.maxUniqueDate > calc_maxUD.GroupDate
AND calc_maxUD.maxUniqueDate = calc_maxUD.UniqueDate;db<>fiddle
Note the only change to the original query is creating an empty temporary table and adding the left join. The execution plan is:
(58 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table '#Example'. Scan count 1, logical reads 40, physical reads 0, read-ahead reads 0
For more information and options, see Itzik Ben-Gan's excellent series, What You Need to Know about the Batch Mode Window Aggregate Operator in SQL Server 2016 (in three parts).
Code Snippets
-- Just to get batch mode processing and the window aggregate operator
CREATE TABLE #Dummy (a integer NOT NULL, INDEX DummyCC CLUSTERED COLUMNSTORE);
-- Identify any UniqueDates that are greater than the GroupDate within their GroupID
SELECT
calc_maxUD.UniqueID,
calc_maxUD.GroupID,
calc_maxUD.GroupDate,
calc_maxUD.UniqueDate
FROM
(
SELECT
E.UniqueID,
E.GroupID,
E.GroupDate,
E.UniqueDate,
maxUniqueDate = MAX(UniqueDate) OVER (
PARTITION BY GroupID)
FROM #Example AS E
LEFT JOIN #Dummy AS D -- The only change to the original query
ON 1 = 0
) AS calc_maxUD
WHERE
calc_maxUD.maxUniqueDate > calc_maxUD.GroupDate
AND calc_maxUD.maxUniqueDate = calc_maxUD.UniqueDate;Context
StackExchange Database Administrators Q#171030, answer score: 11
Revisions (0)
No revisions yet.