HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Ordering on two columns is less efficient than order on only the first, why?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyorderthecolumnsthanefficientorderingtwofirstless

Problem

I have a table that stores transactions for accounts, and I want to select out the latest transaction for a specific account. But the DB (Azure SQL) seems to fetch all transactions for the account, and then do top 1 when I order the result on two columns, and I dont understand why.

My query looks something like this:

SELECT TOP 1 *
    FROM [dbo].[Transaction] T
    WHERE T.AccountId = 4
    ORDER BY
        T.[Timestamp] DESC,
        T.[Created] DESC


and plan:
https://www.brentozar.com/pastetheplan/?id=SyF7Y2ew2

If I instead just order on Timestamp it does it quicker:

SELECT TOP 1 *
    FROM [dbo].[Transaction] T
    WHERE T.AccountId = 4
    ORDER BY
        T.[Timestamp] DESC


https://www.brentozar.com/pastetheplan/?id=HJDqFhgD3

The table and index:

CREATE  TABLE [dbo].[Transaction](
    [TransactionId] [UNIQUEIDENTIFIER] NOT NULL PRIMARY KEY,
    [Created] [DATETIME] NOT NULL DEFAULT GETUTCDATE(),
    [Timestamp] [DATETIME] NOT NULL,
    [AccountId] int NULL)
GO
CREATE NONCLUSTERED INDEX [IX_Transaction_AccountId_Timestamp] ON [dbo].[Transaction]([AccountId], [Timestamp] DESC) WITH (ONLINE = ON)
GO


I have simplified the query and table a bit for testing purposes. In my real table I have 10s of millions of rows, with quite a lot of columns not included here, but the behavior regarding the query plan usage is the same.

Solution

The index ordering only enforces that for a particular AccountId the index rows are ordered by Timestamp.

Not by [Timestamp] , [Created]

The TOP N Sort needs to see all the rows for AccountId=4 and keep track of the top 1. (conceptually this is getting the TOP 1 row after all the rows have been sorted by [Timestamp] , [Created] and so the TOP operation does not benefit in any way from any ordering that the rows may arrive in)

You might consider

WITH TopTimestamp AS
(
SELECT TOP 1 WITH TIES *
    FROM [dbo].[Transaction] T
    WHERE T.AccountId = 4
    ORDER BY
        T.[Timestamp] DESC
)
SELECT TOP 1 *
FROM TopTimestamp
ORDER BY [Created] DESC


in order to get the execution plan you want (where the index ordering is used to get the rows with the top timestamp and then the top created is found from that subset) .

I dont understand why

I guess this is just because the product doesn't contain every possible optimisation rule so sometimes writing the query in a different way is needed to get the desired results.

Code Snippets

WITH TopTimestamp AS
(
SELECT TOP 1 WITH TIES *
    FROM [dbo].[Transaction] T
    WHERE T.AccountId = 4
    ORDER BY
        T.[Timestamp] DESC
)
SELECT TOP 1 *
FROM TopTimestamp
ORDER BY [Created] DESC

Context

StackExchange Database Administrators Q#328114, answer score: 3

Revisions (0)

No revisions yet.