HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

Getting a SORT operator when I have an index

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexoperatorgettingwhensorthave

Problem

On a Azure SQL db (SQL2019 compat), I have an ETL process that populates HISTORY tables in a DeltaTrack pattern.

In the Proc, there's an UPDATE to the HISTORY table which the query engine is using a SORT but I have an index that should cover it.

The use case for this UPDATE is for existing rows where we've added additional columns to the ingest since the row was first added to the HISTORY table.

This SORT is resulting in the Procs where the updates are on our larger/wider tables to be painfully slow.

How do I adjust the index or query to remove the SORT in query 3?

Here's the updated execution plan as requested by J.D.

Here's the DDL.

```
DROP TABLE IF EXISTS dbo.STAGE;
GO
CREATE TABLE dbo.STAGE
(
Id varchar(18) NULL,
CreatedDate varchar(4000) NULL,
LastModifiedDate varchar(4000) NULL,
LastReferencedDate varchar(4000) NULL,
[Name] varchar(4000) NULL,
OwnerId varchar(4000) NULL,
SystemTimestamp datetime2(7) NULL
)
GO

DROP TABLE IF EXISTS dbo.HISTORY;
GO
CREATE TABLE dbo.HISTORY
(
HistoryRecordId int IDENTITY(1,1) NOT NULL,
[Hash] binary(64) NOT NULL,
[IsActive] BIT NOT NULL ,
ActiveFromDateTime datetime2(7) NOT NULL,
ActiveToDateTime datetime2(7) NOT NULL,
Id varchar(18) NOT NULL,
CreatedDate datetime2(7) NULL,
LastModifiedDate datetime2(7) NULL,
LastReferencedDate datetime2(7) NULL,
[Name] varchar(80) NULL,
OwnerId varchar(18) NULL,
SystemTimestamp datetime2(7) NULL
)
GO
CREATE UNIQUE CLUSTERED INDEX [CL__HISTORY] ON dbo.HISTORY
(
Id ,
[ActiveToDateTime] ASC,
[IsActive] ASC
)
GO
CREATE NONCLUSTERED INDEX [IX__HISTORY_IsActive] ON dbo.HISTORY
(
[Id] ASC
)
INCLUDE([IsActive],[ActiveToDateTime])
GO

DROP TABLE IF EXISTS #updates;
GO

WITH src AS (
SELECT
CONVERT(VARCHAR(18), t.[Id]) AS [Id]
, CONVERT(DATETIME2, t.[CreatedDate]) AS [CreatedDate]
, CONVERT(DATETIME2, t.[LastModifiedDate]) AS [LastModifiedDate]
, CONVERT(DATETIME2, t.[LastReferencedDate]) A

Solution

The Id column in your temporary table is unique, but you're not telling the optimizer about that.

Replace the existing nonclustered index on the temporary table with:

CREATE UNIQUE CLUSTERED INDEX CCI_#updates__Id
ON #updates ([Id]);


Note the index is UNIQUE and CLUSTERED.

This will remove the Hash Match Aggregate from the plan (choosing arbitrary row values per undeclared key). This aggregate was slow because it spilled to disk for lack of memory, but the main point is the aggregate was only needed because SQL Server couldn't be sure that one row from the HISTORY table would match at most one row from the temporary table. Adding a uniqueness guarantee to the temporary table resolves that problem and removes the aggregate.

Now add a FORCESEEK hint to the final update:

UPDATE tgt 
SET
    tgt.[Hash] = src.[Hash], 
    tgt.IsActive = 1, 
    tgt.[CreatedDate] = src.[CreatedDate], 
    tgt.[LastModifiedDate] = src.[LastModifiedDate],
    tgt.[LastReferencedDate] = src.[LastReferencedDate],
    tgt.[Name] = src.[Name],
    tgt.[OwnerId] = src.[OwnerId],
    tgt.SystemTimestamp = src.SystemTimestamp
FROM dbo.HISTORY tgt
INNER JOIN #updates src   
    WITH (FORCESEEK) -- NEW!
    ON tgt.[Id] = src.[Id]
    AND src.[ActiveFromDateTime] = tgt.[ActiveFromDateTime] 
    AND tgt.[Hash]  <> src.[Hash];


You should get a plan with no sorts or hashing like this:

The Eager Table Spool is required for Halloween Protection because you're updating a clustering key (IsActive).

You may find this plan shape works best. You're not updating a ton of rows.

The original sort was introduced to present rows to the Clustered Index Update operator in key order. This helps produce a sequential access pattern rather than seeking into the clustered index for each update. The plan above relies on preserving that key order, so no sort is needed.

I know you said you're following some pattern or other, but many aspects of your script seem redundant, inefficient, or unsafe.

  • The history table doesn't have a unique constraint on the identity column.



  • The hash calculation could use CONCAT_WS.



  • The hash calculation doesn't use style formats for date conversions.



  • The HistoryRecordId column saved to the temporary table is never used.



  • It's unclear if you're saving anything with the hashes over comparing columns directly.



  • Your final update changes the cluster key column IsActive unconditionally, requiring Halloween Protection. You could look into not doing that, or only doing it when absolutely necessary, perhaps in a separate update. This all rather depends on what that column means and what your process guarantees.

Code Snippets

CREATE UNIQUE CLUSTERED INDEX CCI_#updates__Id
ON #updates ([Id]);
UPDATE tgt 
SET
    tgt.[Hash] = src.[Hash], 
    tgt.IsActive = 1, 
    tgt.[CreatedDate] = src.[CreatedDate], 
    tgt.[LastModifiedDate] = src.[LastModifiedDate],
    tgt.[LastReferencedDate] = src.[LastReferencedDate],
    tgt.[Name] = src.[Name],
    tgt.[OwnerId] = src.[OwnerId],
    tgt.SystemTimestamp = src.SystemTimestamp
FROM dbo.HISTORY tgt
INNER JOIN #updates src   
    WITH (FORCESEEK) -- NEW!
    ON tgt.[Id] = src.[Id]
    AND src.[ActiveFromDateTime] = tgt.[ActiveFromDateTime] 
    AND tgt.[Hash]  <> src.[Hash];

Context

StackExchange Database Administrators Q#331081, answer score: 7

Revisions (0)

No revisions yet.