patternsqlMinor
Does partitioning help if you don't query on the partition value?
Viewed 0 times
thepartitionyouqueryhelpvaluedoespartitioningdon
Problem
Say I have an
The table has over 10 million rows and the partition is quarterly (or some such regular interval).
If I run a query like:
Am I going to see any benefit from the partition for this query?
I don't see how there could be a benefit because I am not using the partition value in my query and all of the table still needs to be searched.
Note: The results of the query will all be in the most recent partition. But I don't see how that would help.
Second Note: The partitions are all on the same SAN drive (Sys Admins won't let it be any other way), so the disk IO will not be parallel.
Is there a better way to partition so that random queries, like the one above, will perform better? (Maybe on the primary key?)
Third Note: The query above is made up. I am using it to show that I have many queries that hit many random columns (which are properly indexed).
orders.Order table that is partitioned on CreatedWhen (a datetime column set to GetDate() when the row is created). The table has over 10 million rows and the partition is quarterly (or some such regular interval).
If I run a query like:
SELECT ord.OrderId, ord.ClientId, ord.ReceptId, ord.Cost, agt.State
FROM orders.Order ord
join personnel.Agent agt
on ord.AgentId = agt.AgentId
WHERE agt.FirstName = 'Bob'
and LastName = 'Whiely'
and ord.Cost > 135Am I going to see any benefit from the partition for this query?
I don't see how there could be a benefit because I am not using the partition value in my query and all of the table still needs to be searched.
Note: The results of the query will all be in the most recent partition. But I don't see how that would help.
Second Note: The partitions are all on the same SAN drive (Sys Admins won't let it be any other way), so the disk IO will not be parallel.
Is there a better way to partition so that random queries, like the one above, will perform better? (Maybe on the primary key?)
Third Note: The query above is made up. I am using it to show that I have many queries that hit many random columns (which are properly indexed).
Solution
Partitioning is not typically a performance enhancer. Normally, partitioning is employed to enable efficient data management. For instance, you can employ partitioning to easily swap old rows out of a table into an archive table, one partition at a time.
Querying against a partitioned table where your query does not include the partitioning key forces SQL Server to look at the entire table. This is less efficient than simply scanning or seeking into a non-partitioned table because each partition is essentially a table, necessitating a
To make this more obvious, lets set up a simple test rig in tempdb, with a date-based partitioning key:
This will populate the table with 1,000,000 rows with randomly generated data evenly spread over all partitions:
Let's go ahead and create a non-clustered index using our partitioning scheme:
Now, let's look at three "typical" queries you might execute (the first query just gets an actual value out of the table, which is then used in the
Here is a quick screenshot of the execution plans so you can see the difference adding the partitioning key to the
The "query costs relative to the batch" numbers are instructive here. The first query consumes 50% of the batch cost since it performed a scan of all 13 partitions. It does that since it has no idea where in the table the particular value of
The "statistics" for the 3 queries are:
Table 'Tab'. Scan count 13, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tab'. Scan count 6, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tab'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
What if we add one of the other columns to these queries:
```
DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';
SELECT Data1
, Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d
Querying against a partitioned table where your query does not include the partitioning key forces SQL Server to look at the entire table. This is less efficient than simply scanning or seeking into a non-partitioned table because each partition is essentially a table, necessitating a
UNION ALL style query against all the partitions. This is not particularly obvious when looking at a query plan for a partitioned table, although you can see it if you look carefully at the pop-up properties for an index or table seek or scan:To make this more obvious, lets set up a simple test rig in tempdb, with a date-based partitioning key:
USE tempdb;
GO
IF OBJECT_ID(N'dbo.Tab', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.Tab;
DROP PARTITION SCHEME PartScheme;
DROP PARTITION FUNCTION PartFun;
END
GO
CREATE PARTITION FUNCTION PartFun (datetime)
AS RANGE LEFT
FOR VALUES (
N'2012-01-01T00:00:00'
, N'2012-04-01T00:00:00'
, N'2012-07-01T00:00:00'
, N'2012-10-01T00:00:00'
, N'2013-01-01T00:00:00'
, N'2013-04-01T00:00:00'
, N'2013-07-01T00:00:00'
, N'2013-10-01T00:00:00'
, N'2014-01-01T00:00:00'
, N'2014-04-01T00:00:00'
, N'2014-07-01T00:00:00'
, N'2014-10-01T00:00:00'
);
CREATE PARTITION SCHEME PartScheme
AS PARTITION PartFun
ALL TO ([PRIMARY]);
IF OBJECT_ID(N'dbo.Tab', N'U') IS NOT NULL
DROP TABLE dbo.Tab;
CREATE TABLE dbo.Tab
(
TabID int NOT NULL
, CreateDate datetime NOT NULL
, Data1 varchar(100) NOT NULL
, Data2 varchar(10) NOT NULL
, Data3 varchar(1000) NOT NULL
, CONSTRAINT PK_Tab
PRIMARY KEY CLUSTERED
(CreateDate, TabID)
) ON [PartScheme](CreateDate);This will populate the table with 1,000,000 rows with randomly generated data evenly spread over all partitions:
;WITH Ten AS
(
SELECT v.Num
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))v(Num)
)
, Million AS
(
SELECT Num = (t6.Num * POWER(10, 5))
+ (t5.Num * POWER(10, 4))
+ (t4.Num * POWER(10, 3))
+ (t3.Num * POWER(10, 2))
+ (t2.Num * POWER(10, 1))
+ (t1.Num)
FROM Ten t1
CROSS JOIN Ten t2
CROSS JOIN Ten t3
CROSS JOIN Ten t4
CROSS JOIN Ten t5
CROSS JOIN Ten t6
)
INSERT INTO dbo.Tab (TabID, CreateDate, Data1, Data2, Data3)
SELECT m.Num, DATEADD(DAY, m.Num % 1000, N'2012-01-01T00:00:00')
, CONVERT(varchar(100), CRYPT_GEN_RANDOM(100))
, CONVERT(varchar(10), CRYPT_GEN_RANDOM(10))
, CONVERT(varchar(1000), CRYPT_GEN_RANDOM(1000))
FROM Million m;Let's go ahead and create a non-clustered index using our partitioning scheme:
CREATE NONCLUSTERED INDEX IX_Tab
ON dbo.Tab(Data1)
ON [PartScheme](CreateDate);Now, let's look at three "typical" queries you might execute (the first query just gets an actual value out of the table, which is then used in the
WHERE clause for the three queries):DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';
SELECT Data1
FROM dbo.Tab
WHERE Tab.Data1 = @d;
SELECT Data1
FROM dbo.Tab
WHERE Tab.Data1 = @d
AND tab.CreateDate >= N'2013-08-23T00:00:00'
SELECT Data1
FROM dbo.Tab
WHERE Tab.Data1 = @d
AND tab.CreateDate >= N'2013-08-23T00:00:00'
AND tab.CreateDate <= N'2013-08-29T00:00:00';Here is a quick screenshot of the execution plans so you can see the difference adding the partitioning key to the
WHERE clause can make:The "query costs relative to the batch" numbers are instructive here. The first query consumes 50% of the batch cost since it performed a scan of all 13 partitions. It does that since it has no idea where in the table the particular value of
Data1 exists. The second query consumes only 23% by virtue of the "Start Date" parameter, which can eliminate half the partitions. Finally, the third query only consumes 4% since it only needs to look at a single partition, because we included a "start date" and an "end date".The "statistics" for the 3 queries are:
Table 'Tab'. Scan count 13, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tab'. Scan count 6, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tab'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
What if we add one of the other columns to these queries:
```
DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';
SELECT Data1
, Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d
Code Snippets
USE tempdb;
GO
IF OBJECT_ID(N'dbo.Tab', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.Tab;
DROP PARTITION SCHEME PartScheme;
DROP PARTITION FUNCTION PartFun;
END
GO
CREATE PARTITION FUNCTION PartFun (datetime)
AS RANGE LEFT
FOR VALUES (
N'2012-01-01T00:00:00'
, N'2012-04-01T00:00:00'
, N'2012-07-01T00:00:00'
, N'2012-10-01T00:00:00'
, N'2013-01-01T00:00:00'
, N'2013-04-01T00:00:00'
, N'2013-07-01T00:00:00'
, N'2013-10-01T00:00:00'
, N'2014-01-01T00:00:00'
, N'2014-04-01T00:00:00'
, N'2014-07-01T00:00:00'
, N'2014-10-01T00:00:00'
);
CREATE PARTITION SCHEME PartScheme
AS PARTITION PartFun
ALL TO ([PRIMARY]);
IF OBJECT_ID(N'dbo.Tab', N'U') IS NOT NULL
DROP TABLE dbo.Tab;
CREATE TABLE dbo.Tab
(
TabID int NOT NULL
, CreateDate datetime NOT NULL
, Data1 varchar(100) NOT NULL
, Data2 varchar(10) NOT NULL
, Data3 varchar(1000) NOT NULL
, CONSTRAINT PK_Tab
PRIMARY KEY CLUSTERED
(CreateDate, TabID)
) ON [PartScheme](CreateDate);;WITH Ten AS
(
SELECT v.Num
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))v(Num)
)
, Million AS
(
SELECT Num = (t6.Num * POWER(10, 5))
+ (t5.Num * POWER(10, 4))
+ (t4.Num * POWER(10, 3))
+ (t3.Num * POWER(10, 2))
+ (t2.Num * POWER(10, 1))
+ (t1.Num)
FROM Ten t1
CROSS JOIN Ten t2
CROSS JOIN Ten t3
CROSS JOIN Ten t4
CROSS JOIN Ten t5
CROSS JOIN Ten t6
)
INSERT INTO dbo.Tab (TabID, CreateDate, Data1, Data2, Data3)
SELECT m.Num, DATEADD(DAY, m.Num % 1000, N'2012-01-01T00:00:00')
, CONVERT(varchar(100), CRYPT_GEN_RANDOM(100))
, CONVERT(varchar(10), CRYPT_GEN_RANDOM(10))
, CONVERT(varchar(1000), CRYPT_GEN_RANDOM(1000))
FROM Million m;CREATE NONCLUSTERED INDEX IX_Tab
ON dbo.Tab(Data1)
ON [PartScheme](CreateDate);DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';
SELECT Data1
FROM dbo.Tab
WHERE Tab.Data1 = @d;
SELECT Data1
FROM dbo.Tab
WHERE Tab.Data1 = @d
AND tab.CreateDate >= N'2013-08-23T00:00:00'
SELECT Data1
FROM dbo.Tab
WHERE Tab.Data1 = @d
AND tab.CreateDate >= N'2013-08-23T00:00:00'
AND tab.CreateDate <= N'2013-08-29T00:00:00';DECLARE @d varchar(100);
SELECT TOP(1) @d = t.Data1
FROM dbo.Tab t
WHERE t.CreateDate > N'2013-08-24T00:00:00';
SELECT Data1
, Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d;
SELECT Data1
, Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d
AND tab.CreateDate >= N'2013-08-23T00:00:00'
SELECT Data1
, Data2
FROM dbo.Tab
WHERE Tab.Data1 = @d
AND tab.CreateDate >= N'2013-08-23T00:00:00'
AND tab.CreateDate <= N'2013-08-29T00:00:00';Context
StackExchange Database Administrators Q#89188, answer score: 7
Revisions (0)
No revisions yet.