patternsqlModerate
Is there appreciable overhead from having a large proportion of empty table partitions in SQL Server?
Viewed 0 times
fromserversqlhavingproportionemptylargepartitionsappreciableoverhead
Problem
I recently inherited a project using partitioning on date where a daily scheduled task implements a sliding window scheme with 30 days in the past and 60 future dates.
In reality the data that is inserted uses
Is this an issue that needs addressing or should I just let sleeping dogs lie?
In reality the data that is inserted uses
SYSUTCDATETIME() for the partitioning column so the 60 future partitions are always empty. Is this an issue that needs addressing or should I just let sleeping dogs lie?
Solution
Having a large number of partitions can cause a noticeable performance impact, so removing the relatively large number of empty ones in this case may provide an easy benefit.
Setup
This sets up a situation somewhat similar to described in the question. The first 25 partitions have data and the remaining 66 are empty.
Query
The above took 9.8 seconds for me. It needed to scan the whole of
We have a partition aligned index with leading column
The loop join was actually slightly worse (0.4 seconds slower). The 30,000 seeks are unable to do any partition elimination and all need to inspect 91 partitions, multiplying out the work required significantly.
One final attempt...
This completed in 3.7 seconds for me. The difference is that the query now first identifies the top non empty partition and in the subsequent seeks uses this value to avoid needing to do any work for the empty ones.
So my conclusion is that the empty partitions certainly can have a noticeable impact on query performance, and likely should be addressed if performing queries that do not include the partition column as a predicate.
Removing all but one empty tail partition with the below...
... gives the same speedup (and no longer needed to use the hint to get a loop join plan as the estimated operator cost for the seek fell from
Setup
CREATE PARTITION FUNCTION [pf](datetime2(2)) AS RANGE RIGHT FOR VALUES (N'2020-05-06', N'2020-05-07', N'2020-05-08', N'2020-05-09', N'2020-05-10', N'2020-05-11', N'2020-05-12', N'2020-05-13', N'2020-05-14', N'2020-05-15', N'2020-05-16', N'2020-05-17', N'2020-05-18', N'2020-05-19', N'2020-05-20', N'2020-05-21', N'2020-05-22', N'2020-05-23', N'2020-05-24', N'2020-05-25', N'2020-05-26', N'2020-05-27', N'2020-05-28', N'2020-05-29', N'2020-05-30', N'2020-05-31', N'2020-06-01', N'2020-06-02', N'2020-06-03', N'2020-06-04', N'2020-06-05', N'2020-06-06', N'2020-06-07', N'2020-06-08', N'2020-06-09', N'2020-06-10', N'2020-06-11', N'2020-06-12', N'2020-06-13', N'2020-06-14', N'2020-06-15', N'2020-06-16', N'2020-06-17', N'2020-06-18', N'2020-06-19', N'2020-06-20', N'2020-06-21', N'2020-06-22', N'2020-06-23', N'2020-06-24', N'2020-06-25', N'2020-06-26', N'2020-06-27', N'2020-06-28', N'2020-06-29', N'2020-06-30', N'2020-07-01', N'2020-07-02', N'2020-07-03', N'2020-07-04', N'2020-07-05',N'2020-07-06', N'2020-07-07', N'2020-07-08', N'2020-07-09', N'2020-07-10', N'2020-07-11', N'2020-07-12', N'2020-07-13', N'2020-07-14', N'2020-07-15', N'2020-07-16', N'2020-07-17', N'2020-07-18', N'2020-07-19', N'2020-07-20', N'2020-07-21', N'2020-07-22', N'2020-07-23', N'2020-07-24', N'2020-07-25', N'2020-07-26', N'2020-07-27', N'2020-07-28', N'2020-07-29', N'2020-07-30', N'2020-07-31', N'2020-08-01', N'2020-08-02', N'2020-08-03')
CREATE PARTITION SCHEME [ps] AS PARTITION [pf] ALL TO ([PRIMARY])
CREATE TABLE T1(X INT PRIMARY KEY);
INSERT INTO T1
SELECT TOP 30000 ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM sys.all_objects o1,
sys.all_objects o2
CREATE TABLE T2
(
X INT,
dt2 DATETIME2(2),
OtherCol CHAR(100),
PRIMARY KEY(X, dt2) ON ps(dt2)
);
INSERT INTO T2 (X, dt2)
SELECT TOP 21474836 ROW_NUMBER() OVER (ORDER BY @@SPID),
DATEADD(MILLISECOND, 100 * ROW_NUMBER() OVER (ORDER BY @@SPID), '2020-05-05')
FROM sys.all_objects o1,
sys.all_objects o2,
sys.all_objects o3This sets up a situation somewhat similar to described in the question. The first 25 partitions have data and the remaining 66 are empty.
Query
SET STATISTICS TIME ON;
SELECT COUNT(*)
FROM T1 INNER JOIN T2 ON T1.X = T2.XThe above took 9.8 seconds for me. It needed to scan the whole of
T2.We have a partition aligned index with leading column
X so what happens if we force a loop join?SELECT COUNT(*)
FROM T1 INNER LOOP JOIN T2 ON T1.X = T2.XThe loop join was actually slightly worse (0.4 seconds slower). The 30,000 seeks are unable to do any partition elimination and all need to inspect 91 partitions, multiplying out the work required significantly.
One final attempt...
SELECT COUNT(*)
FROM T1
CROSS APPLY (SELECT TOP 1 $partition.pf(dt2) FROM T2 ORDER BY $partition.pf(dt2) DESC) CA(MaxPtn)
INNER LOOP JOIN T2 ON T1.X = T2.X AND $partition.pf(dt2) <= CA.MaxPtnThis completed in 3.7 seconds for me. The difference is that the query now first identifies the top non empty partition and in the subsequent seeks uses this value to avoid needing to do any work for the empty ones.
So my conclusion is that the empty partitions certainly can have a noticeable impact on query performance, and likely should be addressed if performing queries that do not include the partition column as a predicate.
Removing all but one empty tail partition with the below...
DECLARE @dt datetime2(2) = '2020-08-03'
WHILE @dt >= '2020-05-31'
BEGIN
ALTER PARTITION FUNCTION pf()
MERGE RANGE (@dt)
SET @dt = DATEADD(DAY, -1, @dt)
END... gives the same speedup (and no longer needed to use the hint to get a loop join plan as the estimated operator cost for the seek fell from
523.328 in the original INNER LOOP JOIN case to 149.546)Code Snippets
CREATE PARTITION FUNCTION [pf](datetime2(2)) AS RANGE RIGHT FOR VALUES (N'2020-05-06', N'2020-05-07', N'2020-05-08', N'2020-05-09', N'2020-05-10', N'2020-05-11', N'2020-05-12', N'2020-05-13', N'2020-05-14', N'2020-05-15', N'2020-05-16', N'2020-05-17', N'2020-05-18', N'2020-05-19', N'2020-05-20', N'2020-05-21', N'2020-05-22', N'2020-05-23', N'2020-05-24', N'2020-05-25', N'2020-05-26', N'2020-05-27', N'2020-05-28', N'2020-05-29', N'2020-05-30', N'2020-05-31', N'2020-06-01', N'2020-06-02', N'2020-06-03', N'2020-06-04', N'2020-06-05', N'2020-06-06', N'2020-06-07', N'2020-06-08', N'2020-06-09', N'2020-06-10', N'2020-06-11', N'2020-06-12', N'2020-06-13', N'2020-06-14', N'2020-06-15', N'2020-06-16', N'2020-06-17', N'2020-06-18', N'2020-06-19', N'2020-06-20', N'2020-06-21', N'2020-06-22', N'2020-06-23', N'2020-06-24', N'2020-06-25', N'2020-06-26', N'2020-06-27', N'2020-06-28', N'2020-06-29', N'2020-06-30', N'2020-07-01', N'2020-07-02', N'2020-07-03', N'2020-07-04', N'2020-07-05',N'2020-07-06', N'2020-07-07', N'2020-07-08', N'2020-07-09', N'2020-07-10', N'2020-07-11', N'2020-07-12', N'2020-07-13', N'2020-07-14', N'2020-07-15', N'2020-07-16', N'2020-07-17', N'2020-07-18', N'2020-07-19', N'2020-07-20', N'2020-07-21', N'2020-07-22', N'2020-07-23', N'2020-07-24', N'2020-07-25', N'2020-07-26', N'2020-07-27', N'2020-07-28', N'2020-07-29', N'2020-07-30', N'2020-07-31', N'2020-08-01', N'2020-08-02', N'2020-08-03')
CREATE PARTITION SCHEME [ps] AS PARTITION [pf] ALL TO ([PRIMARY])
CREATE TABLE T1(X INT PRIMARY KEY);
INSERT INTO T1
SELECT TOP 30000 ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM sys.all_objects o1,
sys.all_objects o2
CREATE TABLE T2
(
X INT,
dt2 DATETIME2(2),
OtherCol CHAR(100),
PRIMARY KEY(X, dt2) ON ps(dt2)
);
INSERT INTO T2 (X, dt2)
SELECT TOP 21474836 ROW_NUMBER() OVER (ORDER BY @@SPID),
DATEADD(MILLISECOND, 100 * ROW_NUMBER() OVER (ORDER BY @@SPID), '2020-05-05')
FROM sys.all_objects o1,
sys.all_objects o2,
sys.all_objects o3SET STATISTICS TIME ON;
SELECT COUNT(*)
FROM T1 INNER JOIN T2 ON T1.X = T2.XSELECT COUNT(*)
FROM T1 INNER LOOP JOIN T2 ON T1.X = T2.XSELECT COUNT(*)
FROM T1
CROSS APPLY (SELECT TOP 1 $partition.pf(dt2) FROM T2 ORDER BY $partition.pf(dt2) DESC) CA(MaxPtn)
INNER LOOP JOIN T2 ON T1.X = T2.X AND $partition.pf(dt2) <= CA.MaxPtnDECLARE @dt datetime2(2) = '2020-08-03'
WHILE @dt >= '2020-05-31'
BEGIN
ALTER PARTITION FUNCTION pf()
MERGE RANGE (@dt)
SET @dt = DATEADD(DAY, -1, @dt)
ENDContext
StackExchange Database Administrators Q#268647, answer score: 15
Revisions (0)
No revisions yet.