patternsqlModerate

Paging performance with customizable sorting over many millions of rows

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

sortingrowspagingmillionswithcustomizableperformancemanyover

Problem

In our application we have a grid where users can page over a large number of records (10-20 million). The grid supports sorting in ascending and descending order in a number of columns (20+). Many of the values are also not unique and so the application also sorts by id as a tie-breaker to make sure that rows always appear on the same page. As an example, should the user want to sort by widget size (starting with the largest), the application generates a query that looks a bit like this:

SELECT TOP 30
    * -- (Pretend that there is a list of columns here)
FROM Test
--  WHERE widgetSize > 100
ORDER BY
    widgetSize DESC,
    id ASC

This query takes ~15s to run (with cached data), the major of the cost appears to be sorting ~1.3m rows by widgetSize. In an attempt to tune this query I discovered that if I add in a WHERE clause restricted to just the largest widgetSizes(commented out in the above query) the query takes just ~800ms (all of the top 50,000 results have a widget size > 100).

Why is the query without the WHERE clause so much slower? I've checked the statistics on the widgetSize column and they show that the top 739 rows have a WidgetSize > 506. As only 30 rows are required can SQL server not use this information to deduce that it only needs to sort rows with a widget size which is large?

I know that I can make this specific query perform quicker by adding in an index on widgetSize and id, however this index is only useful in this specific scenario, and becomes worthless if (for example) the user reverses the sort direction. This table contains many additional columns and each index is large (~200mb) so I can't really afford to add an index for every possible sort order.

Is there some way I can get these queries query to perform without adding an index for every possible sort order? (the user can sort by any one of 20+ columns)

The following script creates the above table and populates it with some representative data. The table is

Solution

There is no magic solution to this type of problem. To avoid a potentially expensive sort, there has to be an index that can provide the requested order (and the optimizer must choose to use that index). Without a supporting index, the best SQL Server can do natively is to restrict the qualifying rows (based on the WHERE clause) before sorting the resulting set. Without a WHERE clause, this means sorting all the rows in the table.

I've checked the statistics on the widgetSize column and they show that the top 739 rows have a WidgetSize > 506

The 'top 739' rows in that statement presumably refers to the first entries in the statistics histogram, ordered by RANGE_HI_KEY. The histogram is built on an ordered stream (using a sort). No information is kept about where those rows are in the table. Even if those rows are encountered first in the table scan, the engine has no option but to fully complete the scan to ensure it doesn't encounter values that sort higher.

As only 30 rows are required can SQL server not use this information to deduce that it only needs to sort rows with a widget size which is large?

To find the 30 largest rows, SQL Server has to check every single row (that qualifies the WHERE clause). There is no way for SQL Server to pick an arbitrary 'minimum value' that qualifies as 'large enough', and even if it did, it couldn't locate those rows without the appropriate index.

In fact, Top N Sort where N <= 100 does use a replacement strategy where only incoming values that are larger than the current minimum are placed in the sort buffer, but this is a minor optimization compared to the cost of reading rows from the table and passing them to the sort.

In principle, the engine could push a dynamic filter (on the current minimum value present in the sort buffer) down into the table scan, to restrict rows as early as possible, but this is not implemented. To work around this, a similar idea involves creating an indexed view over the distinct values of widgetSize with the number of rows matching each value:

CREATE VIEW dbo.WidgetSizes
WITH SCHEMABINDING
AS
SELECT
    T.widgetSize,
    NumRows = COUNT_BIG(*) 
FROM dbo.Test AS T
GROUP BY
    T.widgetSize;
GO
CREATE UNIQUE CLUSTERED INDEX CUQ_WidgetSizes_widgetSize
ON dbo.WidgetSizes (widgetSize);

This indexed view will be much smaller than an equivalent nonclustered index on widgetSize if there are relatively few distinct values (as is the case with the sample data). This information can then be used to assess which minimum widgetSize to filter on, while still guaranteeing there will be at least 30 rows found.

First page

For the first page of 30 rows, the implementation looks like this:

DECLARE 
    @TopRows bigint = 30,
    @Minimum integer;

SELECT TOP (1)
    @Minimum = Filtered.widgetSize
FROM 
(
    SELECT * FROM 
    (
        SELECT
            WS.widgetSize,
            WS.NumRows,
            -- SQL Server 2012 or later
            SumNumRows = SUM(WS.NumRows) OVER (
                ORDER BY WS.widgetSize DESC)
        FROM dbo.WidgetSizes AS WS WITH (NOEXPAND)
    ) AS RunningTotal
    WHERE 
        RunningTotal.SumNumRows >= @TopRows
) AS Filtered
ORDER BY 
    Filtered.SumNumRows ASC;

SELECT TOP (@TopRows)
    T.id,
    T.widgetSize
FROM dbo.Test AS T
WHERE T.widgetSize >= @Minimum
ORDER BY
    T.widgetSize DESC,
    T.id ASC;

Execution plans:

This improves execution time markedly, with most of the remaining cost associated with the table scan and pushed-down filter. Performance can be improved further by creating a nonclustered column-store index (SQL Server 2012 onward):

CREATE NONCLUSTERED COLUMNSTORE INDEX 
    NCCI_Test_id_widgetSize 
ON dbo.Test (id, widgetSize);

On my laptop, performing the scan and filter in batch mode on the column-store index reduced execution time from around 300ms to just 20ms:

Next page

The last row returned by the first-page query has widgetSize = 2903 and id = 327:

Finding the next 30 rows (page 2) requires only simple modifications to the previous query:

DECLARE 
    @TopRows bigint = 30,
    @Minimum integer;

SELECT TOP (1)
    @Minimum = Filtered.widgetSize
FROM 
(
    SELECT * FROM 
    (
        SELECT
            WS.widgetSize,
            WS.NumRows,
            SumNumRows = SUM(WS.NumRows) OVER (
                ORDER BY WS.widgetSize DESC)
        FROM dbo.WidgetSizes AS WS WITH (NOEXPAND)
        WHERE
            -- Added
            WS.widgetSize = @TopRows
) AS Filtered
ORDER BY 
    Filtered.SumNumRows ASC;

SELECT TOP (@TopRows)
    T.id,
    T.widgetSize
FROM dbo.Test AS T
WHERE 
    T.widgetSize >= @Minimum
    AND 
    (
        -- Added
        T.widgetSize  327)
    )
ORDER BY
    T.widgetSize DESC,
    T.id ASC;

This produces the same results as the obvious extension of the original query:

```
SELECT TOP 30
* -- (Pretend that there is a list of columns here)
FROM Test
WHERE widgetSize 327)
ORDER BY

Code Snippets

CREATE VIEW dbo.WidgetSizes
WITH SCHEMABINDING
AS
SELECT
    T.widgetSize,
    NumRows = COUNT_BIG(*) 
FROM dbo.Test AS T
GROUP BY
    T.widgetSize;
GO
CREATE UNIQUE CLUSTERED INDEX CUQ_WidgetSizes_widgetSize
ON dbo.WidgetSizes (widgetSize);

DECLARE 
    @TopRows bigint = 30,
    @Minimum integer;

SELECT TOP (1)
    @Minimum = Filtered.widgetSize
FROM 
(
    SELECT * FROM 
    (
        SELECT
            WS.widgetSize,
            WS.NumRows,
            -- SQL Server 2012 or later
            SumNumRows = SUM(WS.NumRows) OVER (
                ORDER BY WS.widgetSize DESC)
        FROM dbo.WidgetSizes AS WS WITH (NOEXPAND)
    ) AS RunningTotal
    WHERE 
        RunningTotal.SumNumRows >= @TopRows
) AS Filtered
ORDER BY 
    Filtered.SumNumRows ASC;

SELECT TOP (@TopRows)
    T.id,
    T.widgetSize
FROM dbo.Test AS T
WHERE T.widgetSize >= @Minimum
ORDER BY
    T.widgetSize DESC,
    T.id ASC;

CREATE NONCLUSTERED COLUMNSTORE INDEX 
    NCCI_Test_id_widgetSize 
ON dbo.Test (id, widgetSize);

DECLARE 
    @TopRows bigint = 30,
    @Minimum integer;

SELECT TOP (1)
    @Minimum = Filtered.widgetSize
FROM 
(
    SELECT * FROM 
    (
        SELECT
            WS.widgetSize,
            WS.NumRows,
            SumNumRows = SUM(WS.NumRows) OVER (
                ORDER BY WS.widgetSize DESC)
        FROM dbo.WidgetSizes AS WS WITH (NOEXPAND)
        WHERE
            -- Added
            WS.widgetSize < 2903
    ) AS RunningTotal
    WHERE 
        RunningTotal.SumNumRows >= @TopRows
) AS Filtered
ORDER BY 
    Filtered.SumNumRows ASC;

SELECT TOP (@TopRows)
    T.id,
    T.widgetSize
FROM dbo.Test AS T
WHERE 
    T.widgetSize >= @Minimum
    AND 
    (
        -- Added
        T.widgetSize < 2903
        OR (widgetSize = 2903 AND id > 327)
    )
ORDER BY
    T.widgetSize DESC,
    T.id ASC;

SELECT TOP 30
    * -- (Pretend that there is a list of columns here)
FROM Test
    WHERE widgetSize < 2903
    OR (widgetSize = 2903 AND id > 327)
ORDER BY
    widgetSize DESC,
    id ASC;

Context

StackExchange Database Administrators Q#87351, answer score: 16

Revisions (0)

No revisions yet.