HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Optimize Subquery with Windowing Function

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withfunctionsubqueryoptimizewindowing

Problem

As my performance tuning skills never seem to feel sufficient, I always wonder if there is more optimization I can perform against some queries. The situation that this question pertains to is a Windowed MAX function nested within a subquery.

The data that I'm digging through is a series of transactions on various groups of larger sets. I've got 4 fields of importance, the unique ID of a transaction, the Group ID of a batch of transactions, and dates associated with the respective unique transaction or group of transactions. Most times the Group Date matches the Maximum Unique Transaction Date for a Batch, but there are times where manual adjustments come through our system and a unique date operation occurs after the group transaction date is captured. This manual edit doesn't adjust the group date by design.

What I identify in this query are those records where the Unique Date falls after the Group Date. The following sample query builds out a rough equivalent of the my scenario and the SELECT statement returns the records I'm looking for, however, am I approaching this solution in the most efficient manner? This takes a while to run during my fact table loads as my record counts number in the upper 9 digits, but mostly my disdain for subqueries makes me wonder if there's a better approach here. I'm not as concerned about any indexes as I'm confident those are already in place; what I'm looking for is an alternative query approach that will achieve the same thing, but even more efficiently. Any feedback is welcome.

CREATE TABLE #Example
(
    UniqueID INT IDENTITY(1,1)
  , GroupID INT
  , GroupDate DATETIME
  , UniqueDate DATETIME
)

CREATE CLUSTERED INDEX [CX_1] ON [#Example]
(
    [UniqueID] ASC
)

SET NOCOUNT ON

--Populate some test data
DECLARE @i INT = 0, @j INT = 5, @UniqueDate DATETIME, @GroupDate DATETIME

WHILE @i  GroupDate
    AND maxUniqueDate = UniqueDate

DROP TABLE #Example


dbfiddle here

Solution

When and if you are able to upgrade from SQL Server 2012 to SQL Server 2016, you may be able to take advantage of the much improved performance (especially for frameless window aggregates) provided by the new batch mode Window Aggregate operator.

Almost all large data processing scenarios work better with columnstore storage than rowstore. Even without changing to columnstore for your base tables, you can still gain the benefits of the new 2016 operator and batch mode execution by creating an empty nonclustered columnstore filtered index on one of the base tables, or by redundantly outer joining to a columnstore-organized table.

Using the second option, the query becomes:

-- Just to get batch mode processing and the window aggregate operator
CREATE TABLE #Dummy (a integer NOT NULL, INDEX DummyCC CLUSTERED COLUMNSTORE);

-- Identify any UniqueDates that are greater than the GroupDate within their GroupID
SELECT
    calc_maxUD.UniqueID,
    calc_maxUD.GroupID,
    calc_maxUD.GroupDate,
    calc_maxUD.UniqueDate
FROM 
(
    SELECT
        E.UniqueID,
        E.GroupID,
        E.GroupDate,
        E.UniqueDate,
        maxUniqueDate = MAX(UniqueDate) OVER (
            PARTITION BY GroupID)
    FROM #Example AS E
    LEFT JOIN #Dummy AS D -- The only change to the original query
        ON 1 = 0
) AS calc_maxUD
WHERE 
    calc_maxUD.maxUniqueDate > calc_maxUD.GroupDate
    AND calc_maxUD.maxUniqueDate = calc_maxUD.UniqueDate;


db<>fiddle

Note the only change to the original query is creating an empty temporary table and adding the left join. The execution plan is:

(58 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table '#Example'. Scan count 1, logical reads 40, physical reads 0, read-ahead reads 0


For more information and options, see Itzik Ben-Gan's excellent series, What You Need to Know about the Batch Mode Window Aggregate Operator in SQL Server 2016 (in three parts).

Code Snippets

-- Just to get batch mode processing and the window aggregate operator
CREATE TABLE #Dummy (a integer NOT NULL, INDEX DummyCC CLUSTERED COLUMNSTORE);

-- Identify any UniqueDates that are greater than the GroupDate within their GroupID
SELECT
    calc_maxUD.UniqueID,
    calc_maxUD.GroupID,
    calc_maxUD.GroupDate,
    calc_maxUD.UniqueDate
FROM 
(
    SELECT
        E.UniqueID,
        E.GroupID,
        E.GroupDate,
        E.UniqueDate,
        maxUniqueDate = MAX(UniqueDate) OVER (
            PARTITION BY GroupID)
    FROM #Example AS E
    LEFT JOIN #Dummy AS D -- The only change to the original query
        ON 1 = 0
) AS calc_maxUD
WHERE 
    calc_maxUD.maxUniqueDate > calc_maxUD.GroupDate
    AND calc_maxUD.maxUniqueDate = calc_maxUD.UniqueDate;

Context

StackExchange Database Administrators Q#171030, answer score: 11

Revisions (0)

No revisions yet.