HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What order should my index's columns be for a SELECT a, b, MAX( c ) FROM d WHERE e = 1 GROUP BY a, b?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
ordercolumnswhatgroupwhereselectmaxforshouldindex

Problem

Supposing I have this table (in a multi-tenant SaaS database:

CREATE TABLE dbo.Messages (

    TenantId    int          NOT NULL,
    RecipientId int          NOT NULL,
    MessageId   int          NOT NULL IDENTITY,
    SentUtc     datetime2(7) NOT NULL,
    IsDeleted   bit          NOT NULL, /* Less than 5% of rows have IsDeleted = 1 */
    -- Body, Subject, etc
    
    CONSTRAINT PK_Messages PRIMARY KEY ( MessageId, TenantId ),

    CONSTRAINT UK_Messages UNIQUE ( MessageId ),

    CONSTRAINT FK_Messages_Tenants FOREIGN KEY ( TenantId ) REFERENCES dbo.Tenants ( TenantId ),
    CONSTRAINT FK_Messages_Recipients FOREIGN KEY ( RecipientId, TenantId ) REFERENCES dbo.Recipients ( RecipientId, TenantId )
);


I have a VIEW that is meant to get the latest MessageId for each RecipientId:

CREATE VIEW dbo.LatestMessages
    WITH SCHEMABINDING
AS

WITH latestMessageUtcPerRecipient AS (

    SELECT
        m.TenantId,
        m.RecipientId,
        MAX( m.SentUtc ) AS MaxSentUtc
    FROM
        dbo.Messages AS m
    WHERE
        m.IsDeleted = 0
    GROUP BY
        m.TenantId,
        m.RecipientId
)
SELECT
    lm.TenantId,
    lm.RecipientId,
    m .MessageId,
    m .SentUtc

FROM
    dbo.Messages AS m

    INNER JOIN latestMessageUtcPerRecipient AS lm ON
        m.TenantId = lm.TenantId
        AND
        m.RecipientId = lm.RecipientId
        AND
        m.SentUtc = lm.MaxSentUtc


-
I currently have foreign-key-covering (non-clustered) indexes over FX_Tenants ( TenantId ) and FX_Recipients ( RecipientId, TenantId ).

-
Without any other extra INDEX objects on dbo.Messages, this VIEW's execution-plan currently performs an Index Scan on the FX_Recipients index where it reads 14,000% of the final query rows (erk!)

-
So I need to create a good index for this VIEW's query that covers the TenantId, RecipientId, and SentUtc - and I understand I should exclude IsDeleted from the index as is a low-selectivity column) - the probl

Solution

Index and View

I would probably replace the FX indexes with:

CREATE NONCLUSTERED INDEX 
    [dbo.Messages TenantID, RecipientId, SentUtc, (MessageId, IsDeleted)]
ON dbo.[Messages]
    (TenantID ASC, RecipientId ASC, SentUtc ASC)
INCLUDE 
    (MessageId, IsDeleted);


There's not too much benefit to MessageId being an INCLUDE rather than at the end of the key. It will only be non-key at the leaf of the index because the index is not specified as unique. MessageId is part of the clustering key so it will be part of the nonclustered index key above the leaf level. It doesn't make a huge amount of difference either way.

You could make this index filtered on IsDeleted = 0 but whether this is worthwhile depends on how many deleted records you have and how your foreign keys are set up for deleted rows.

You would also need to be careful that your queries don't ever supply a parameterized value for IsDeleted (or that a supplied literal isn't simple/forced parameterized on the server) because then the filtered index can't be matched. There are also other issues. Initially, I probably wouldn't bother with it.

Add the missing IsDeleted = 0 predicate to your view:

CREATE OR ALTER VIEW dbo.LatestMessages
    WITH SCHEMABINDING
AS
WITH latestMessageUtcPerRecipient AS (
    SELECT
        m.TenantId,
        m.RecipientId,
        MAX( m.SentUtc ) AS MaxSentUtc
    FROM
        dbo.Messages AS m
    WHERE
        m.IsDeleted = 0
    GROUP BY
        m.TenantId,
        m.RecipientId
)
SELECT
    lm.TenantId,
    lm.RecipientId,
    m .MessageId,
    m .SentUtc

FROM
    dbo.Messages AS m
    INNER JOIN latestMessageUtcPerRecipient AS lm ON
        m.TenantId = lm.TenantId
        AND
        m.RecipientId = lm.RecipientId
        AND
        m.SentUtc = lm.MaxSentUtc
WHERE
    m.IsDeleted = 0; -- This was missing


Now, a query on the view uses a very efficient Segment Top plan:

SELECT LM.*
FROM dbo.LatestMessages AS LM;


Alternate Views

If it were my code, I might express the view equivalently as:

CREATE OR ALTER VIEW dbo.LatestMessages
WITH SCHEMABINDING
AS
SELECT
    M.TenantId,
    M.RecipientId,
    M.MessageId,
    M.SentUtc
FROM dbo.[Messages] AS M
WHERE
    M.IsDeleted = 0
    AND M.SentUtc =
    (
        SELECT
            MAX(M2.SentUtc)
        FROM dbo.[Messages] AS M2
        WHERE
            -- Correlations
            M2.TenantId = M.TenantId
            AND M2.RecipientId = M.RecipientId
            AND M2.IsDeleted = M.IsDeleted
    );


But this is mostly a question of style.

You might also prefer:

CREATE OR ALTER VIEW dbo.LatestMessages
    WITH SCHEMABINDING
AS
SELECT
    TR.TenantId, 
    TR.RecipientId, 
    T1.MessageId, 
    T1.SentUtc
FROM 
(
    SELECT 
        R.TenantId, 
        R.RecipientId
    FROM dbo.Recipients AS R
) AS TR
CROSS APPLY 
(
    SELECT TOP (1)
        M.MessageId, 
        M.SentUtc
    FROM dbo.[Messages] AS M
    WHERE 
        M.TenantId = TR.TenantId
        AND M.RecipientId = TR.RecipientId
        AND M.IsDeleted = 0
    ORDER BY
        M.SentUtc DESC
) AS T1;


This definition can also work well with the suggested index:

Window Functions

I would probably avoid writing the view with a DENSE_RANK windowing function because SQL Server has had some problems pushing parameterized or variable predicates down into such views. As I mention in the linked article, you can work around this by replacing the view with a parameterized inline TVF, though that is not always convenient.

For example:

DECLARE @TenantId integer = 1;

SELECT
    LM.*
FROM dbo.LatestMessages AS LM 
WHERE 
    LM.TenantId = @TenantId;


With the windowing function approach, the predicate on TenantId will not be pushed past the function, resulting in an index scan where you would expect a seek. It can be hard to spot because the stuck filter ends up combined with the existing filter for DENSE_RANK = 1.

This aspect is not a concern if you are running SQL Server 2017 CU30 or later with query optimizer hotfixes enabled, or SQL Server 2022 (where the hotfix setting is not required). In those configurations, the optimizer will push the predicate down to obtain a seek.

Code Snippets

CREATE NONCLUSTERED INDEX 
    [dbo.Messages TenantID, RecipientId, SentUtc, (MessageId, IsDeleted)]
ON dbo.[Messages]
    (TenantID ASC, RecipientId ASC, SentUtc ASC)
INCLUDE 
    (MessageId, IsDeleted);
CREATE OR ALTER VIEW dbo.LatestMessages
    WITH SCHEMABINDING
AS
WITH latestMessageUtcPerRecipient AS (
    SELECT
        m.TenantId,
        m.RecipientId,
        MAX( m.SentUtc ) AS MaxSentUtc
    FROM
        dbo.Messages AS m
    WHERE
        m.IsDeleted = 0
    GROUP BY
        m.TenantId,
        m.RecipientId
)
SELECT
    lm.TenantId,
    lm.RecipientId,
    m .MessageId,
    m .SentUtc

FROM
    dbo.Messages AS m
    INNER JOIN latestMessageUtcPerRecipient AS lm ON
        m.TenantId = lm.TenantId
        AND
        m.RecipientId = lm.RecipientId
        AND
        m.SentUtc = lm.MaxSentUtc
WHERE
    m.IsDeleted = 0; -- This was missing
SELECT LM.*
FROM dbo.LatestMessages AS LM;
CREATE OR ALTER VIEW dbo.LatestMessages
WITH SCHEMABINDING
AS
SELECT
    M.TenantId,
    M.RecipientId,
    M.MessageId,
    M.SentUtc
FROM dbo.[Messages] AS M
WHERE
    M.IsDeleted = 0
    AND M.SentUtc =
    (
        SELECT
            MAX(M2.SentUtc)
        FROM dbo.[Messages] AS M2
        WHERE
            -- Correlations
            M2.TenantId = M.TenantId
            AND M2.RecipientId = M.RecipientId
            AND M2.IsDeleted = M.IsDeleted
    );
CREATE OR ALTER VIEW dbo.LatestMessages
    WITH SCHEMABINDING
AS
SELECT
    TR.TenantId, 
    TR.RecipientId, 
    T1.MessageId, 
    T1.SentUtc
FROM 
(
    SELECT 
        R.TenantId, 
        R.RecipientId
    FROM dbo.Recipients AS R
) AS TR
CROSS APPLY 
(
    SELECT TOP (1)
        M.MessageId, 
        M.SentUtc
    FROM dbo.[Messages] AS M
    WHERE 
        M.TenantId = TR.TenantId
        AND M.RecipientId = TR.RecipientId
        AND M.IsDeleted = 0
    ORDER BY
        M.SentUtc DESC
) AS T1;

Context

StackExchange Database Administrators Q#330840, answer score: 8

Revisions (0)

No revisions yet.