patternsqlMinor
SQL Server query scanning more partitions than expected
Viewed 0 times
sqlquerymorethanexpectedpartitionsserverscanning
Problem
So my team has the following select statement in one of our stored procedures
And we've partitioned the DailyCosts table on the months of each year as follows:
We noticed that when we ran the procedure, rather than finding the appropriate partitions right away it looked like it was scanning partitions from either end. For the above example, it viewed partitions
When we removed the math from the where statement (
What causes the server to scan the partitions when this math is included, and is it actually looking through all of the indexes in these partitions or is it just checking their ranges and moving on?
Any and all sources you can provide would go a long way in helping us understand!
SELECT
ai.Name
,dc.Component
,SUM(dc.Value) Value
FROM
Warm.DailyCosts dc
JOIN Warm.AccountInfo ai
ON dc.AccountInfoId = ai.Id
WHERE
ai.CorrelationId = '00000000-0000-0000-0000-000000000000'
AND ((dc.CalendarId >= (20190601 + ai.DayOfMonth - 1)
AND dc.CalendarId = 20270601
AND dc.CalendarId < (20270601 + ai.DayOfMonth - 1)))
GROUP BY
Name
,ComponentAnd we've partitioned the DailyCosts table on the months of each year as follows:
CREATE PARTITION FUNCTION [PF_CostDateByYearMonth](int) AS RANGE RIGHT FOR VALUES (
20180101,
20180201,
20180301,
20180401,
20180501,
20180601,
20180701,
20180801,
20180901,
20181001,
20181101,
20181201,
...
20300101,
20300201,
20300301,
20300401,
20300501,
20300601,
20300701,
20300801,
20300901,
20301001,
20301101,
20301201)We noticed that when we ran the procedure, rather than finding the appropriate partitions right away it looked like it was scanning partitions from either end. For the above example, it viewed partitions
20180101 up to 20190701 and 20301201 down to 20270601 totaling 62 partitions.When we removed the math from the where statement (
+ ai.DayOfMonth - 1), the partitions read dropped down to 2, 20190601 and 20270601, as expected. Note that we use seed data and DayOfMonth is set to 15 on every account.What causes the server to scan the partitions when this math is included, and is it actually looking through all of the indexes in these partitions or is it just checking their ranges and moving on?
Any and all sources you can provide would go a long way in helping us understand!
Solution
Reasoning
When comparing the column of the non partitioned table against the partitioned table, sql server will not be able to know what
As a result it would not know which partitions to return to satisfy this filtering when joining the two tables.
A different example to give some more insight can be found here.
Testing
I was able to recreate your issue, for further questions please add as much information as possible.
This could be Table definition, indexes, partition scheme, ...
The DDL & DML is at the bottom.
When running the query we can get the same result:
62 partitions returned.
On the seek predicates it tries to filter what it can, this is the two filters on
This translates to
When running the query with these filters you will see the same amount of rows returned on accessing the table in the query plan.
Only after getting this data, it can and will be joined to the
To confirm your statement that when we remove the column that only two partitions are scanned:
Only two are accessed and used:
Solution
I am going to assume that
If you know these boundaries, you can 'hard code' them so that sql server knows what partitions to look for. After this you can add the extra filtering.
E.G.
The sole purpose of the
Sidenote: Adding a constraint does not work to enforce this.
This query gives us the result we want, with the correct partition elemination:
Only accessing partitions
You can probably also use
Test data
```
CREATE SCHEMA WARM
GO
CREATE TABLE Warm.DailyCosts(ID INT IDENTITY(1,1) NOT NULL,
Component int,
Value int,
CalendarId INT,
AccountInfoId int
)
CREATE TABLE Warm.AccountInfo(Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
Name VARCHAR(25),
CorrelationId uniqueidentifier,
DayOfMonth int
);
USE [master]
GO
ALTER DATABASE [test] ADD FILEGROUP [Partitionfg]
GO
USE [test]
GO
CREATE PARTITION FUNCTION PF_CostDateByYearMonth AS RANGE RIGHT FOR VALUES (
20180101,
20180201,
20180301,
20180401,
20180501,
20180601,
20180701,
20180801,
20180901,
20181001,
20181101,
20181201,
20190101,
20190201,
20190301,
20190401,
20190501,
20190601,
20190701,
20270601,
20270701,
20270801,
20270901,
20271001,
20271101,
20271201,
20280101,
20280201,
20280301,
20280401,
20280501,
20280601,
20280701,
20280801,
20280901,
20281001,
20281101,
20281201,
20290101,
20290201,
20290301,
20290401,
20290501,
20290601,
20290701,
20290801,
20290901,
20291001,
20291101,
20291201,
20300101,
20300201,
20300301,
20300401,
20300501,
20300601,
20300701,
20300801,
20300901,
20301001,
20301101,
20301201)
CREATE PARTITION SCHEME [PS_CostDateByYearMonth]
AS PARTITION [PF_CostDateByYearMonth]
ALL TO ( [Partitionfg] );
CREATE UNIQUE CLUSTERED INDEX ptci ON Warm.DailyCosts(CalendarId,Id) ON PS_CostDateByYearMonth ;
USE [master]
GO
ALTER DATABASE [test] ADD FILE ( NAME = N'TestPartition', FILENAME = N'D:\DATA\TestPartition.ndf' , SIZE = 3072KB , FILEGROWTH = 65536KB ) TO FILEGROUP [Partitionfg]
GO
USE [test]
GO
INSERT INTO Warm.DailyCosts(Component,
When comparing the column of the non partitioned table against the partitioned table, sql server will not be able to know what
DayOfMonth will hold, even if all of them are 15. As a result it would not know which partitions to return to satisfy this filtering when joining the two tables.
A different example to give some more insight can be found here.
Testing
I was able to recreate your issue, for further questions please add as much information as possible.
This could be Table definition, indexes, partition scheme, ...
The DDL & DML is at the bottom.
When running the query we can get the same result:
62 partitions returned.
On the seek predicates it tries to filter what it can, this is the two filters on
CalendarId that do not have the + DayOfMonth-1.This translates to
WHERE ...
AND (( dc.CalendarId = 20270601))
...When running the query with these filters you will see the same amount of rows returned on accessing the table in the query plan.
Only after getting this data, it can and will be joined to the
AccountInfo table on the ID's and the DayOfMonth - 1To confirm your statement that when we remove the column that only two partitions are scanned:
SELECT
ai.Name
,SUM(dc.Component )
,SUM(dc.Value) Value
FROM
Warm.DailyCosts dc
JOIN Warm.AccountInfo ai
ON dc.AccountInfoId = ai.Id
WHERE
ai.CorrelationId = '00000000-0000-0000-0000-000000000000'
AND ((dc.CalendarId >= (20190601 - 1)
AND dc.CalendarId = 20270601
AND dc.CalendarId < (20270601 - 1)))
GROUP BY NameOnly two are accessed and used:
Solution
I am going to assume that
DayOfMonth only goes up to 31 days. If you know these boundaries, you can 'hard code' them so that sql server knows what partitions to look for. After this you can add the extra filtering.
E.G.
WITH CTE
AS
(
SELECT
ai.Name
,dc.Component
,dc.Value
,dc.CalendarId
,ai.DayOfMonth
FROM
Warm.DailyCosts dc
JOIN Warm.AccountInfo ai
ON dc.AccountInfoId = ai.Id
WHERE
ai.CorrelationId = '00000000-0000-0000-0000-000000000000'
AND ((dc.CalendarId >= (20190601 )
AND dc.CalendarId = 20270601
AND dc.CalendarId = (20190601 + DayOfMonth - 1)
AND CalendarId = 20270601
AND CalendarId < (20270601 + DayOfMonth - 1)))
GROUP BY
Name
,ComponentThe sole purpose of the
cte is to let sql server know that it can also filter on dc.CalendarId >= dc.CalendarId >= (20190601 ) OR dc.CalendarId < (20270701)).Sidenote: Adding a constraint does not work to enforce this.
This query gives us the result we want, with the correct partition elemination:
Only accessing partitions
19 & 21.You can probably also use
OR / AND logic to get the same result if you prefer. The important part is getting the boundaries known.Test data
```
CREATE SCHEMA WARM
GO
CREATE TABLE Warm.DailyCosts(ID INT IDENTITY(1,1) NOT NULL,
Component int,
Value int,
CalendarId INT,
AccountInfoId int
)
CREATE TABLE Warm.AccountInfo(Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
Name VARCHAR(25),
CorrelationId uniqueidentifier,
DayOfMonth int
);
USE [master]
GO
ALTER DATABASE [test] ADD FILEGROUP [Partitionfg]
GO
USE [test]
GO
CREATE PARTITION FUNCTION PF_CostDateByYearMonth AS RANGE RIGHT FOR VALUES (
20180101,
20180201,
20180301,
20180401,
20180501,
20180601,
20180701,
20180801,
20180901,
20181001,
20181101,
20181201,
20190101,
20190201,
20190301,
20190401,
20190501,
20190601,
20190701,
20270601,
20270701,
20270801,
20270901,
20271001,
20271101,
20271201,
20280101,
20280201,
20280301,
20280401,
20280501,
20280601,
20280701,
20280801,
20280901,
20281001,
20281101,
20281201,
20290101,
20290201,
20290301,
20290401,
20290501,
20290601,
20290701,
20290801,
20290901,
20291001,
20291101,
20291201,
20300101,
20300201,
20300301,
20300401,
20300501,
20300601,
20300701,
20300801,
20300901,
20301001,
20301101,
20301201)
CREATE PARTITION SCHEME [PS_CostDateByYearMonth]
AS PARTITION [PF_CostDateByYearMonth]
ALL TO ( [Partitionfg] );
CREATE UNIQUE CLUSTERED INDEX ptci ON Warm.DailyCosts(CalendarId,Id) ON PS_CostDateByYearMonth ;
USE [master]
GO
ALTER DATABASE [test] ADD FILE ( NAME = N'TestPartition', FILENAME = N'D:\DATA\TestPartition.ndf' , SIZE = 3072KB , FILEGROWTH = 65536KB ) TO FILEGROUP [Partitionfg]
GO
USE [test]
GO
INSERT INTO Warm.DailyCosts(Component,
Code Snippets
WHERE ...
AND (( dc.CalendarId < 20190701)
OR (dc.CalendarId >= 20270601))
...SELECT
ai.Name
,SUM(dc.Component )
,SUM(dc.Value) Value
FROM
Warm.DailyCosts dc
JOIN Warm.AccountInfo ai
ON dc.AccountInfoId = ai.Id
WHERE
ai.CorrelationId = '00000000-0000-0000-0000-000000000000'
AND ((dc.CalendarId >= (20190601 - 1)
AND dc.CalendarId < 20190701)
OR (dc.CalendarId >= 20270601
AND dc.CalendarId < (20270601 - 1)))
GROUP BY NameWITH CTE
AS
(
SELECT
ai.Name
,dc.Component
,dc.Value
,dc.CalendarId
,ai.DayOfMonth
FROM
Warm.DailyCosts dc
JOIN Warm.AccountInfo ai
ON dc.AccountInfoId = ai.Id
WHERE
ai.CorrelationId = '00000000-0000-0000-0000-000000000000'
AND ((dc.CalendarId >= (20190601 )
AND dc.CalendarId < 20190701)
OR (dc.CalendarId >= 20270601
AND dc.CalendarId < (20270701)))
)
SELECT Name,Component,SUM(Value)
FROM CTE
WHERE ((CalendarId >= (20190601 + DayOfMonth - 1)
AND CalendarId < 20190701)
OR (CalendarId >= 20270601
AND CalendarId < (20270601 + DayOfMonth - 1)))
GROUP BY
Name
,ComponentCREATE SCHEMA WARM
GO
CREATE TABLE Warm.DailyCosts(ID INT IDENTITY(1,1) NOT NULL,
Component int,
Value int,
CalendarId INT,
AccountInfoId int
)
CREATE TABLE Warm.AccountInfo(Id INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
Name VARCHAR(25),
CorrelationId uniqueidentifier,
DayOfMonth int
);
USE [master]
GO
ALTER DATABASE [test] ADD FILEGROUP [Partitionfg]
GO
USE [test]
GO
CREATE PARTITION FUNCTION [PF_CostDateByYearMonth](int) AS RANGE RIGHT FOR VALUES (
20180101,
20180201,
20180301,
20180401,
20180501,
20180601,
20180701,
20180801,
20180901,
20181001,
20181101,
20181201,
20190101,
20190201,
20190301,
20190401,
20190501,
20190601,
20190701,
20270601,
20270701,
20270801,
20270901,
20271001,
20271101,
20271201,
20280101,
20280201,
20280301,
20280401,
20280501,
20280601,
20280701,
20280801,
20280901,
20281001,
20281101,
20281201,
20290101,
20290201,
20290301,
20290401,
20290501,
20290601,
20290701,
20290801,
20290901,
20291001,
20291101,
20291201,
20300101,
20300201,
20300301,
20300401,
20300501,
20300601,
20300701,
20300801,
20300901,
20301001,
20301101,
20301201)
CREATE PARTITION SCHEME [PS_CostDateByYearMonth]
AS PARTITION [PF_CostDateByYearMonth]
ALL TO ( [Partitionfg] );
CREATE UNIQUE CLUSTERED INDEX ptci ON Warm.DailyCosts(CalendarId,Id) ON [PS_CostDateByYearMonth](CalendarId) ;
USE [master]
GO
ALTER DATABASE [test] ADD FILE ( NAME = N'TestPartition', FILENAME = N'D:\DATA\TestPartition.ndf' , SIZE = 3072KB , FILEGROWTH = 65536KB ) TO FILEGROUP [Partitionfg]
GO
USE [test]
GO
INSERT INTO Warm.DailyCosts(Component,
Value,
CalendarId,
AccountInfoId
)
SELECT TOP(1000000) --1M
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
20180101 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) / 1000 % 1000 ,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2
-- 2018
INSERT INTO Warm.DailyCosts(Component,
Value,
CalendarId,
AccountInfoId
)
SELECT TOP(1000000) --1M
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
20190101 + ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) / 1000 % 1000 ,
ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FRContext
StackExchange Database Administrators Q#245575, answer score: 9
Revisions (0)
No revisions yet.