HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

ROW_NUMBER() OVER (PARTITION BY B,A ORDER BY C) doesn't use index on (A,B,C)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
orderpartitiondoesnrow_numberindexuseover

Problem

Consider these two functions:

ROW_NUMBER() OVER (PARTITION BY A,B ORDER BY C)

ROW_NUMBER() OVER (PARTITION BY B,A ORDER BY C)


As far as I understand, they produce exactly the same result.
In other words, the order in which you list the columns in the PARTITION BY clause doesn't matter.

If there is an index on (A,B,C) I expected the optimiser to use this index in both variants.

But, surprisingly, optimiser decided to do an extra explicit Sort in the second variant.

I've seen it on SQL Server 2008 Standard and SQL Server 2014 Express.

Here is a full script that I used to reproduce it.

Tried on Microsoft SQL Server 2014 - 12.0.2000.8 (X64)
Feb 20 2014 20:04:26
Copyright (c) Microsoft Corporation
Express Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1)

and Microsoft SQL Server 2014 (SP1-CU7) (KB3162659) - 12.0.4459.0 (X64)
May 27 2016 15:33:17
Copyright (c) Microsoft Corporation
Express Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1)

with both old and new Cardinality Estimator by using OPTION (QUERYTRACEON 9481) and OPTION (QUERYTRACEON 2312).

Set up table, index, sample data

CREATE TABLE [dbo].[T](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [A] [int] NOT NULL,
    [B] [int] NOT NULL,
    [C] [int] NOT NULL,
    CONSTRAINT [PK_T] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, 
IGNORE_DUP_KEY = OFF, 
ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_ABC] ON [dbo].[T]
(
    [A] ASC,
    [B] ASC,
    [C] ASC
)WITH (PAD_INDEX = OFF, 
STATISTICS_NORECOMPUTE = OFF, 
SORT_IN_TEMPDB = OFF, 
DROP_EXISTING = OFF, 
ONLINE = OFF, 
ALLOW_ROW_LOCKS = ON, 
ALLOW_PAGE_LOCKS = ON)
GO

INSERT INTO [dbo].[T] ([A],[B],[C]) VALUES
(10, 20, 30),
(10, 21, 31),
(10, 21, 32),
(10, 21, 33),
(11, 20, 34),
(11, 21, 35),
(11, 21, 36),
(12, 20, 37),
(12, 21, 38),
(13, 21, 39);


Queries

```
SELECT --

Solution

It seems that there is no good definitive "answer" to the question "what is going on in the optimiser", unless you are its developer and know its internals.

I'll put together the comments here.

Overall, it seems that it would be too harsh to call it a bug, because the end result of the query is correct. In some cases the execution plan is simply not optimal. ypercubeᵀᴹ, Martin Smith and Aaron Bertrand call it "missed optimization".

-

Seems like GROUP BY a,b and GROUP BY b,a yields identical plans
but PARTITION BY can't use the same transformation

-

There are other missing optimisations too where window functions with
the same window specification can have an extra sort operation if
separated in the select list by one with a different specification.

-

Yeah, this seems like another missed optimization, and there are
plenty of those. The optimizer is written by humans and isn't perfect

There is a somewhat related article Descending Indexes. Index ordering, parallelism, and ranking calculations by Itzik Ben-Gan. There Itzik discusses descending indexes and also gives an example how direction of the index definition affects window functions with partitions. He shows examples of queries and generated plans with ROW_NUMBER that have extra sort operator that optimiser could have avoided.

To me the practical outcome would be to keep this peculiarity of optimiser in mind. When using PARTITION BY in window functions always try to match the order in which you list the columns in PARTITION BY with the order in which they are listed in the index. Even though it should not matter.

Another side of this precaution is when you review your indexes and decide to swap some columns around in the index definition. Be aware that you may inadvertently affect some existing queries that seemingly should not be affected. This is actually how I noticed this peculiarity of the optimiser.

If you don't, optimiser may not be able to use the index to its full potential. Even if optimiser does choose an optimal plan, such plan may change to less optimal with a slightest innocent change to the query, like changing the order of columns in the SELECT statement.

Context

StackExchange Database Administrators Q#141999, answer score: 5

Revisions (0)

No revisions yet.