HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Seek and you shall Scan... on partitioned tables

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
seektablesscanyouandshallpartitioned

Problem

I've read these articles in PCMag by Itzik Ben-Gan:

Seek and You Shall Scan Part I: When the Optimizer Doesn't Optimize

Seek and You Shall Scan Part II: Ascending Keys

I’m currently having a "Grouped Max" problem with all of our partitioned tables.
We use the trick Itzik Ben-Gan provided for getting a max(ID), but sometimes it just doesn’t run:

DECLARE @MaxIDPartitionTable BIGINT
SELECT  @MaxIDPartitionTable = ISNULL(MAX(IDPartitionedTable), 0)
FROM    ( SELECT    *
          FROM      ( SELECT    partition_number PartitionNumber
                      FROM      sys.partitions
                      WHERE     object_id = OBJECT_ID('fct.MyTable')
                                AND index_id = 1
                    ) T1
                    CROSS APPLY ( SELECT    ISNULL(MAX(UpdatedID), 0) AS IDPartitionedTable
                                  FROM      fct.MyTable s
                                  WHERE     $PARTITION.PF_MyTable(s.PCTimeStamp) = PartitionNumber
                                            AND UpdatedID <= @IDColumnThresholdValue
                                ) AS o
        ) AS T2;
SELECT @MaxIDPartitionTable


I get this plan

But after 45 minutes, look at the reads

reads          writes   physical_reads
12,949,127        2       12,992,610


that I get out of sp_whoisactive.

Normally it runs quite quickly, but not today.

Edit:
table structure with partitions:

```
CREATE PARTITION FUNCTION MonthlySmallDateTime AS RANGE RIGHT FOR VALUES (N'2000-01-01T00:00:00.000', N'2000-02-01T00:00:00.000' / and many more /)
go
CREATE PARTITION SCHEME PS_FctContractualAvailability AS PARTITION [MonthlySmallDateTime] TO ([Standard], [Standard])
GO
CREATE TABLE fct.MyTable(
MyTableID BIGINT IDENTITY(1,1),
[DT1TurbineID] INT NOT NULL,
[PCTimeStamp] SMALLDATETIME NOT NULL,
Filler CHAR(100) NOT NULL DEFAULT 'N/A',
UpdatedID BIGINT NULL,
UpdatedDate DATETIME NULL
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUST

Solution

The basic issue is that the Index Seek is not followed by a Top operator. This is an optimization that is usually introduced when the seek returns rows in the correct order for a MIN\MAX aggregate.

This optimization exploits the fact that the min/max row is the first one in ascending or descending order. It may also be that the optimizer cannot apply this optimization to partitioned tables; I forget.

Anyway, the point is that without this transformation, the execution plan ends up processing every row that qualifies S.UpdatedID <= @IDColumnThresholdValue per partition, rather than the desired one row per partition.

You have not provided table, index, or partitioning definitions in the question so I can't be much more specific. You should check that your index would support such a transformation. More or less equivalently, you could also express the MAX as a TOP (1) ... ORDER BY UpdatedID DESC.

If this results in a Sort (including a TopN Sort), you know your index is not helpful. For example:

SELECT
    @MaxIDPartitionTable = ISNULL(MAX(T2.IDPartitionedTable), 0)
FROM    
( 
    SELECT
        O.IDPartitionedTable
    FROM      
    ( 
        SELECT
            P.partition_number AS PartitionNumber
        FROM sys.partitions AS P
        WHERE 
            P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
            AND P.index_id = 1
    ) AS T1
    CROSS APPLY 
    (    
        SELECT TOP (1) 
            S.UpdatedID AS IDPartitionedTable
        FROM fct.MyTable AS S
        WHERE
            $PARTITION.PF_MyTable(S.PCTimeStamp) = T1.PartitionNumber
            AND S.UpdatedID <= @IDColumnThresholdValue
        ORDER BY
            S.UpdatedID DESC
    ) AS O
) AS T2;


The plan shape this should produce is:

Notice the Top below the Index Seek. This limits the processing to one row per partition.

Or, using a temporary table to hold partition numbers:

CREATE TABLE #Partitions
(
    partition_number integer PRIMARY KEY CLUSTERED
);

INSERT #Partitions
    (partition_number)
SELECT
    P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE 
    P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
    AND P.index_id = 1;

SELECT
    @MaxIDPartitionTable = ISNULL(MAX(T2.UpdatedID), 0)
FROM #Partitions AS P
CROSS APPLY 
(
    SELECT TOP (1) 
        S.UpdatedID
    FROM fct.MyTable AS S
    WHERE
        $PARTITION.PF_MyTable(S.PCTimeStamp) = P.partition_number
        AND S.UpdatedID <= @IDColumnThresholdValue
    ORDER BY
        S.UpdatedID DESC
) AS T2;

DROP TABLE #Partitions;


Side note: accessing a system table in your query prevents parallelism. If this is important, consider materializing the partition numbers in a temporary table, then APPLY from that. Parallelism is not normally helpful in this pattern (with correct indexing) but it would be remiss of me not to mention it.

Side note 2: There is an active Connect item requesting built-in support for MIN\MAX aggregates and Top on partitioned objects.

Code Snippets

SELECT
    @MaxIDPartitionTable = ISNULL(MAX(T2.IDPartitionedTable), 0)
FROM    
( 
    SELECT
        O.IDPartitionedTable
    FROM      
    ( 
        SELECT
            P.partition_number AS PartitionNumber
        FROM sys.partitions AS P
        WHERE 
            P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
            AND P.index_id = 1
    ) AS T1
    CROSS APPLY 
    (    
        SELECT TOP (1) 
            S.UpdatedID AS IDPartitionedTable
        FROM fct.MyTable AS S
        WHERE
            $PARTITION.PF_MyTable(S.PCTimeStamp) = T1.PartitionNumber
            AND S.UpdatedID <= @IDColumnThresholdValue
        ORDER BY
            S.UpdatedID DESC
    ) AS O
) AS T2;
CREATE TABLE #Partitions
(
    partition_number integer PRIMARY KEY CLUSTERED
);

INSERT #Partitions
    (partition_number)
SELECT
    P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE 
    P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
    AND P.index_id = 1;

SELECT
    @MaxIDPartitionTable = ISNULL(MAX(T2.UpdatedID), 0)
FROM #Partitions AS P
CROSS APPLY 
(
    SELECT TOP (1) 
        S.UpdatedID
    FROM fct.MyTable AS S
    WHERE
        $PARTITION.PF_MyTable(S.PCTimeStamp) = P.partition_number
        AND S.UpdatedID <= @IDColumnThresholdValue
    ORDER BY
        S.UpdatedID DESC
) AS T2;

DROP TABLE #Partitions;

Context

StackExchange Database Administrators Q#117016, answer score: 28

Revisions (0)

No revisions yet.