HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlModerate

Why am I getting a sort when I have an index?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexwhygettingwhensorthave

Problem

Azure SQL Database.

I have a table from which I need to get the first and most recent rows for Col1 and Col2 based on CreateDate.

CREATE TABLE dbo.table1 (
    Id            INT    IDENTITY(1,1) PRIMARY KEY ,
    Col1        VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL ,
    Col2        VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CS_AS NOT NULL ,
    CreateDate    DATETIME NOT NULL
) ;


I have an index like so:

CREATE INDEX IX__table1_ASC
ON dbo.table1 (Col1, Col2, CreateDate );


My query to get the first row is (plan here):

--Get the first row
SELECT    TOP (1) WITH TIES
        *
FROM    table1
ORDER BY ROW_NUMBER()
        OVER (PARTITION BY Col1, Col2
              ORDER BY CreateDate );


The index scan is using the index (IX__table1_ASC) I created, but why am I getting a sort?

My query to get the most recent row (plan here):

--get latest row
SELECT    TOP (1) WITH TIES
        *
FROM    table1
ORDER BY ROW_NUMBER()
        OVER (PARTITION BY Col1, Col2
              ORDER BY CreateDate DESC); --desc here


Again, the index scan is using the index (IX__table1_ASC), but this time I'm getting two sorts.
The first one straight after the index scan.
Isn't the optimiser clever enough to read the index in reverse order?
Again, what's the second sort for?

The actual table is quite large, so you can imagine the sorts are costly. How can I best optimise here?

Solution

The index scan is using the index (IX__table1_ASC) I created but why
am I getting a sort?

Because you are using an inefficient way of selecting the top row per group.

Just use

WITH T
     AS (SELECT *,
                ROW_NUMBER()
                  OVER (
                    PARTITION BY Col1, Col2
                    ORDER BY CreateDate ) AS RN
         FROM   table1)
SELECT *
FROM   T
WHERE  RN = 1


The TOP (1) WITH TIES here is just a more obfuscated and less efficient way of selecting all rows where the row number equals 1. Unfortunately some answerers on StackOverflow use this method for no good reason that I can discern except liking the novelty.

In your first execution plan the sort is not there to calculate the row number but is there to order the rows by the result of that row numbering after it was already calculated without a sort.

Regarding your second query this is a long standing optimser limitation - you can get the backwards ordered index scan and no sort with the below.

WITH T
     AS (SELECT *,
                ROW_NUMBER()
                  OVER (
                    PARTITION BY Col1, Col2
                    ORDER BY CreateDate DESC ) AS RN
         FROM   table1
         ORDER BY Col1 DESC, Col2 DESC, CreateDate DESC
         OFFSET 0 ROWS
         )
SELECT *
FROM   T
WHERE  RN = 1


The OFFSET 0 ROWS is a hack to allow ORDER BY in a derived table, which is not normally allowed in SQL Server. The important thing is to give the optimizer a separate reason to consider the optimal ordering.

A presentation-level ORDER BY would serve the same purpose, but I prefer placing the hack closer to the thing that requires it. This approach also allows you to specify a different presentation order. Bear in mind the OFFSET 0 could be optimized away some day, like TOP (100) PERCENT is.

In SQL Server 2000 some people used to get an "ordered view" by adding a TOP 100 PERCENT ... ORDER BY. The effect of this, at least most of the time, was that just doing a plain SELECT from the view without any ORDER BY on the outer query returned the rows in the desired order. This was never guaranteed and in SQL Server 2005 logic was added to the optimiser that just optimised TOP 100 PERCENT out in this type of case as logically redundant. Potentially the same could happen in the future to OFFSET 0 ROWS as it is similarly redundant.

Personally I would hope that any spare engineering effort goes into improving the optimisation so this sort of hack isn't necessary first though!

Code Snippets

WITH T
     AS (SELECT *,
                ROW_NUMBER()
                  OVER (
                    PARTITION BY Col1, Col2
                    ORDER BY CreateDate ) AS RN
         FROM   table1)
SELECT *
FROM   T
WHERE  RN = 1
WITH T
     AS (SELECT *,
                ROW_NUMBER()
                  OVER (
                    PARTITION BY Col1, Col2
                    ORDER BY CreateDate DESC ) AS RN
         FROM   table1
         ORDER BY Col1 DESC, Col2 DESC, CreateDate DESC
         OFFSET 0 ROWS
         )
SELECT *
FROM   T
WHERE  RN = 1

Context

StackExchange Database Administrators Q#317999, answer score: 16

Revisions (0)

No revisions yet.