patternsqlMinor
Ordering on two columns is less efficient than order on only the first, why?
Viewed 0 times
whyorderthecolumnsthanefficientorderingtwofirstless
Problem
I have a table that stores transactions for accounts, and I want to select out the latest transaction for a specific account. But the DB (Azure SQL) seems to fetch all transactions for the account, and then do top 1 when I order the result on two columns, and I dont understand why.
My query looks something like this:
and plan:
https://www.brentozar.com/pastetheplan/?id=SyF7Y2ew2
If I instead just order on Timestamp it does it quicker:
https://www.brentozar.com/pastetheplan/?id=HJDqFhgD3
The table and index:
I have simplified the query and table a bit for testing purposes. In my real table I have 10s of millions of rows, with quite a lot of columns not included here, but the behavior regarding the query plan usage is the same.
My query looks something like this:
SELECT TOP 1 *
FROM [dbo].[Transaction] T
WHERE T.AccountId = 4
ORDER BY
T.[Timestamp] DESC,
T.[Created] DESCand plan:
https://www.brentozar.com/pastetheplan/?id=SyF7Y2ew2
If I instead just order on Timestamp it does it quicker:
SELECT TOP 1 *
FROM [dbo].[Transaction] T
WHERE T.AccountId = 4
ORDER BY
T.[Timestamp] DESChttps://www.brentozar.com/pastetheplan/?id=HJDqFhgD3
The table and index:
CREATE TABLE [dbo].[Transaction](
[TransactionId] [UNIQUEIDENTIFIER] NOT NULL PRIMARY KEY,
[Created] [DATETIME] NOT NULL DEFAULT GETUTCDATE(),
[Timestamp] [DATETIME] NOT NULL,
[AccountId] int NULL)
GO
CREATE NONCLUSTERED INDEX [IX_Transaction_AccountId_Timestamp] ON [dbo].[Transaction]([AccountId], [Timestamp] DESC) WITH (ONLINE = ON)
GOI have simplified the query and table a bit for testing purposes. In my real table I have 10s of millions of rows, with quite a lot of columns not included here, but the behavior regarding the query plan usage is the same.
Solution
The index ordering only enforces that for a particular AccountId the index rows are ordered by
Not by
The
You might consider
in order to get the execution plan you want (where the index ordering is used to get the rows with the top timestamp and then the top
I dont understand why
I guess this is just because the product doesn't contain every possible optimisation rule so sometimes writing the query in a different way is needed to get the desired results.
Timestamp.Not by
[Timestamp] , [Created]The
TOP N Sort needs to see all the rows for AccountId=4 and keep track of the top 1. (conceptually this is getting the TOP 1 row after all the rows have been sorted by [Timestamp] , [Created] and so the TOP operation does not benefit in any way from any ordering that the rows may arrive in)You might consider
WITH TopTimestamp AS
(
SELECT TOP 1 WITH TIES *
FROM [dbo].[Transaction] T
WHERE T.AccountId = 4
ORDER BY
T.[Timestamp] DESC
)
SELECT TOP 1 *
FROM TopTimestamp
ORDER BY [Created] DESCin order to get the execution plan you want (where the index ordering is used to get the rows with the top timestamp and then the top
created is found from that subset) .I dont understand why
I guess this is just because the product doesn't contain every possible optimisation rule so sometimes writing the query in a different way is needed to get the desired results.
Code Snippets
WITH TopTimestamp AS
(
SELECT TOP 1 WITH TIES *
FROM [dbo].[Transaction] T
WHERE T.AccountId = 4
ORDER BY
T.[Timestamp] DESC
)
SELECT TOP 1 *
FROM TopTimestamp
ORDER BY [Created] DESCContext
StackExchange Database Administrators Q#328114, answer score: 3
Revisions (0)
No revisions yet.