patternsqlMajor
Seek and you shall Scan... on partitioned tables
Viewed 0 times
seektablesscanyouandshallpartitioned
Problem
I've read these articles in PCMag by Itzik Ben-Gan:
Seek and You Shall Scan Part I: When the Optimizer Doesn't Optimize
Seek and You Shall Scan Part II: Ascending Keys
I’m currently having a "Grouped Max" problem with all of our partitioned tables.
We use the trick Itzik Ben-Gan provided for getting a max(ID), but sometimes it just doesn’t run:
I get this plan
But after 45 minutes, look at the reads
that I get out of
Normally it runs quite quickly, but not today.
Edit:
table structure with partitions:
```
CREATE PARTITION FUNCTION MonthlySmallDateTime AS RANGE RIGHT FOR VALUES (N'2000-01-01T00:00:00.000', N'2000-02-01T00:00:00.000' / and many more /)
go
CREATE PARTITION SCHEME PS_FctContractualAvailability AS PARTITION [MonthlySmallDateTime] TO ([Standard], [Standard])
GO
CREATE TABLE fct.MyTable(
MyTableID BIGINT IDENTITY(1,1),
[DT1TurbineID] INT NOT NULL,
[PCTimeStamp] SMALLDATETIME NOT NULL,
Filler CHAR(100) NOT NULL DEFAULT 'N/A',
UpdatedID BIGINT NULL,
UpdatedDate DATETIME NULL
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUST
Seek and You Shall Scan Part I: When the Optimizer Doesn't Optimize
Seek and You Shall Scan Part II: Ascending Keys
I’m currently having a "Grouped Max" problem with all of our partitioned tables.
We use the trick Itzik Ben-Gan provided for getting a max(ID), but sometimes it just doesn’t run:
DECLARE @MaxIDPartitionTable BIGINT
SELECT @MaxIDPartitionTable = ISNULL(MAX(IDPartitionedTable), 0)
FROM ( SELECT *
FROM ( SELECT partition_number PartitionNumber
FROM sys.partitions
WHERE object_id = OBJECT_ID('fct.MyTable')
AND index_id = 1
) T1
CROSS APPLY ( SELECT ISNULL(MAX(UpdatedID), 0) AS IDPartitionedTable
FROM fct.MyTable s
WHERE $PARTITION.PF_MyTable(s.PCTimeStamp) = PartitionNumber
AND UpdatedID <= @IDColumnThresholdValue
) AS o
) AS T2;
SELECT @MaxIDPartitionTableI get this plan
But after 45 minutes, look at the reads
reads writes physical_reads
12,949,127 2 12,992,610that I get out of
sp_whoisactive.Normally it runs quite quickly, but not today.
Edit:
table structure with partitions:
```
CREATE PARTITION FUNCTION MonthlySmallDateTime AS RANGE RIGHT FOR VALUES (N'2000-01-01T00:00:00.000', N'2000-02-01T00:00:00.000' / and many more /)
go
CREATE PARTITION SCHEME PS_FctContractualAvailability AS PARTITION [MonthlySmallDateTime] TO ([Standard], [Standard])
GO
CREATE TABLE fct.MyTable(
MyTableID BIGINT IDENTITY(1,1),
[DT1TurbineID] INT NOT NULL,
[PCTimeStamp] SMALLDATETIME NOT NULL,
Filler CHAR(100) NOT NULL DEFAULT 'N/A',
UpdatedID BIGINT NULL,
UpdatedDate DATETIME NULL
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUST
Solution
The basic issue is that the Index Seek is not followed by a Top operator. This is an optimization that is usually introduced when the seek returns rows in the correct order for a
This optimization exploits the fact that the min/max row is the first one in ascending or descending order. It may also be that the optimizer cannot apply this optimization to partitioned tables; I forget.
Anyway, the point is that without this transformation, the execution plan ends up processing every row that qualifies
You have not provided table, index, or partitioning definitions in the question so I can't be much more specific. You should check that your index would support such a transformation. More or less equivalently, you could also express the
If this results in a Sort (including a TopN Sort), you know your index is not helpful. For example:
The plan shape this should produce is:
Notice the Top below the Index Seek. This limits the processing to one row per partition.
Or, using a temporary table to hold partition numbers:
Side note: accessing a system table in your query prevents parallelism. If this is important, consider materializing the partition numbers in a temporary table, then
Side note 2: There is an active Connect item requesting built-in support for
MIN\MAX aggregate.This optimization exploits the fact that the min/max row is the first one in ascending or descending order. It may also be that the optimizer cannot apply this optimization to partitioned tables; I forget.
Anyway, the point is that without this transformation, the execution plan ends up processing every row that qualifies
S.UpdatedID <= @IDColumnThresholdValue per partition, rather than the desired one row per partition.You have not provided table, index, or partitioning definitions in the question so I can't be much more specific. You should check that your index would support such a transformation. More or less equivalently, you could also express the
MAX as a TOP (1) ... ORDER BY UpdatedID DESC.If this results in a Sort (including a TopN Sort), you know your index is not helpful. For example:
SELECT
@MaxIDPartitionTable = ISNULL(MAX(T2.IDPartitionedTable), 0)
FROM
(
SELECT
O.IDPartitionedTable
FROM
(
SELECT
P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
AND P.index_id = 1
) AS T1
CROSS APPLY
(
SELECT TOP (1)
S.UpdatedID AS IDPartitionedTable
FROM fct.MyTable AS S
WHERE
$PARTITION.PF_MyTable(S.PCTimeStamp) = T1.PartitionNumber
AND S.UpdatedID <= @IDColumnThresholdValue
ORDER BY
S.UpdatedID DESC
) AS O
) AS T2;The plan shape this should produce is:
Notice the Top below the Index Seek. This limits the processing to one row per partition.
Or, using a temporary table to hold partition numbers:
CREATE TABLE #Partitions
(
partition_number integer PRIMARY KEY CLUSTERED
);
INSERT #Partitions
(partition_number)
SELECT
P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
AND P.index_id = 1;
SELECT
@MaxIDPartitionTable = ISNULL(MAX(T2.UpdatedID), 0)
FROM #Partitions AS P
CROSS APPLY
(
SELECT TOP (1)
S.UpdatedID
FROM fct.MyTable AS S
WHERE
$PARTITION.PF_MyTable(S.PCTimeStamp) = P.partition_number
AND S.UpdatedID <= @IDColumnThresholdValue
ORDER BY
S.UpdatedID DESC
) AS T2;
DROP TABLE #Partitions;Side note: accessing a system table in your query prevents parallelism. If this is important, consider materializing the partition numbers in a temporary table, then
APPLY from that. Parallelism is not normally helpful in this pattern (with correct indexing) but it would be remiss of me not to mention it.Side note 2: There is an active Connect item requesting built-in support for
MIN\MAX aggregates and Top on partitioned objects.Code Snippets
SELECT
@MaxIDPartitionTable = ISNULL(MAX(T2.IDPartitionedTable), 0)
FROM
(
SELECT
O.IDPartitionedTable
FROM
(
SELECT
P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
AND P.index_id = 1
) AS T1
CROSS APPLY
(
SELECT TOP (1)
S.UpdatedID AS IDPartitionedTable
FROM fct.MyTable AS S
WHERE
$PARTITION.PF_MyTable(S.PCTimeStamp) = T1.PartitionNumber
AND S.UpdatedID <= @IDColumnThresholdValue
ORDER BY
S.UpdatedID DESC
) AS O
) AS T2;CREATE TABLE #Partitions
(
partition_number integer PRIMARY KEY CLUSTERED
);
INSERT #Partitions
(partition_number)
SELECT
P.partition_number AS PartitionNumber
FROM sys.partitions AS P
WHERE
P.[object_id] = OBJECT_ID(N'fct.MyTable', N'U')
AND P.index_id = 1;
SELECT
@MaxIDPartitionTable = ISNULL(MAX(T2.UpdatedID), 0)
FROM #Partitions AS P
CROSS APPLY
(
SELECT TOP (1)
S.UpdatedID
FROM fct.MyTable AS S
WHERE
$PARTITION.PF_MyTable(S.PCTimeStamp) = P.partition_number
AND S.UpdatedID <= @IDColumnThresholdValue
ORDER BY
S.UpdatedID DESC
) AS T2;
DROP TABLE #Partitions;Context
StackExchange Database Administrators Q#117016, answer score: 28
Revisions (0)
No revisions yet.