
Getting a realistic query plan when partitioning with low volumes of data

Submitted by: @import:stackexchange-dba

Problem

We're using partitioning to reduce the amount of blocking our OLTP system experiences due to locks, with a partition scheme that splits the working tables into 100 partitions based on a customer id. We're finding during testing, however, that the execution plans aren't being chosen in quite the way we expected.

The test scenario is a single customer with 300,000 contact records (each contact's data is split across two tables), all residing in a single partition, with a query looking for 500 specific rows in that customer's partition. You'd expect something like a hash match to kick in pretty early in the plan to eliminate the unwanted 299,500 rows, but it looks like SQL Server is taking the record count for the entire table and averaging it over all the partitions before considering how many records it's actually going to be dealing with, which causes it to choose a nested loop and eliminate the unwanted records much later in the process. Typically this takes 9 times as long as the same query against non-partitioned tables.

Oddly, adding OPTION (RECOMPILE) to the SELECT gives a sensible plan, but I'm at a loss as to why this makes a difference. This isn't a stored procedure, and during tests we clear the procedure cache before each test run.
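The query text itself isn't preserved in this entry; roughly, it has the shape sketched below. part.Contact and @CustomerPartitionKey appear in the plan output quoted in the solution; #WantedContacts and ContactId are assumed names for the list of 500 specific rows being looked up.

```
-- Sketch only: the original query text isn't preserved in this entry.
-- part.Contact and @CustomerPartitionKey come from the plan output below;
-- #WantedContacts and ContactId are assumed names for the 500-row lookup list.
DECLARE @CustomerPartitionKey int = 3;

SELECT c.*
FROM part.Contact AS c
JOIN #WantedContacts AS w
  ON w.ContactId = c.ContactId
WHERE c.CustomerPartitionKey = @CustomerPartitionKey;
-- Adding OPTION (RECOMPILE) to this SELECT is what produces the sensible plan.
```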

This behaviour isn't seen when the tables involved are not partitioned, i.e. an appropriate plan is picked every time because the estimated number of rows matches the actual number.

Any insights into this behaviour would be appreciated.

Schema setup:

```
USE [Scratch]
GO
CREATE SCHEMA part
GO
CREATE PARTITION FUNCTION ContactPartition (int)
AS RANGE LEFT FOR VALUES (
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
    40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
    60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
    80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98
)  -- 99 boundary values (0-98) with RANGE LEFT => 100 partitions
GO
```
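The listing above stops at the partition function; the scheme and table definitions aren't preserved in this entry. A minimal sketch of what they plausibly looked like follows (only ContactPartition, part.Contact, and CustomerPartitionKey come from the entry; the scheme name, filegroup, and remaining columns are assumptions):

```
-- Sketch only: the original scheme/table definitions are not preserved here.
-- ContactPartition, part.Contact, and CustomerPartitionKey come from the entry;
-- everything else is assumed for illustration.
CREATE PARTITION SCHEME ContactPartitionScheme
AS PARTITION ContactPartition ALL TO ([PRIMARY]);
GO

CREATE TABLE part.Contact
(
    CustomerPartitionKey int NOT NULL,
    ContactId            int NOT NULL,
    CONSTRAINT PK_Contact
        PRIMARY KEY CLUSTERED (CustomerPartitionKey, ContactId)
) ON ContactPartitionScheme (CustomerPartitionKey);
GO
```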

Solution

It looks like SQL Server is generating a parameterized query plan that can work for any value of @CustomerPartitionKey. In order to do so, it seems to treat @CustomerPartitionKey as both a partition identifier and a column you are seeking on. If we take a look at the query plan where we have the bad estimate (3,000 rows estimated, 300,000 actual), we see that there are actually two separate seek predicates on part.Contact related to @CustomerPartitionKey:

Seek Keys[1]: Prefix: PtnId1004, [Test].[part].[Contact].CustomerPartitionKey = Scalar Operator([Expr1008]), Scalar Operator([@CustomerPartitionKey])

I think that the latter ([Test].[part].[Contact].CustomerPartitionKey = Scalar Operator([@CustomerPartitionKey])) is able to get a proper estimate based on parameter sniffing for the value of @CustomerPartitionKey. However, the former (Prefix: PtnId1004 = Scalar Operator([Expr1008])) is likely not able to do so, perhaps because Expr1008 is a complicated expression that performs partition elimination: [Expr1008]=RangePartitionNew([@CustomerPartitionKey],(0),(0),(1),(2),...,(97),(98)).
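As an aside, the key-to-partition mapping that RangePartitionNew computes at runtime is the same one you can observe with the built-in $PARTITION function, so it's easy to sanity-check where a given key lands (assuming the ContactPartition function from the schema setup above):

```
-- Returns the partition number that the value 3 maps to under ContactPartition.
-- With RANGE LEFT boundaries 0-98, value 3 lands in partition 4
-- (partition 1 holds values <= 0, partition 2 holds 1, partition 3 holds 2, ...).
SELECT $PARTITION.ContactPartition(3) AS PartitionNumber;
```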

In this case, there are 100 partitions and the row estimate is exactly 100 times too low: SQL Server isn't able to process the partition elimination in the same smart way that it processes the actual seek on the column (where it can sniff the runtime parameter value of 3), so it appears to average the table's rows across all partitions. This theory is supported by the way the estimate varies if you change the partition count; with 90 partitions instead, the estimate becomes 3333.33 (300000 / 90).
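One way to see the numbers behind that averaging is to look at the per-partition row counts directly. This is a standard catalog query rather than something from the original post:

```
-- Per-partition row counts for part.Contact: with all 300,000 rows in one
-- partition, the all-partition average (300000 / 100 = 3000) matches the
-- bad estimate described above.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'part.Contact')
  AND p.index_id IN (0, 1)  -- heap or clustered index
ORDER BY p.partition_number;
```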

In our own queries, we typically use a literal (e.g., 3 in this case) or OPTION (RECOMPILE) whenever we are writing a query that is going to take advantage of partition elimination. This practice has worked fairly well for us, given that the number of queries on the system is modest and query compilation overhead for queries against large partitioned tables is not a concern for us. Not necessarily a satisfying answer, but it might work for you.
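Concretely, the two workarounds look something like this (a sketch against the assumed query shape from the question; only part.Contact and @CustomerPartitionKey are taken from the entry itself):

```
-- Workaround 1: hard-code the partition key as a literal so the optimizer
-- can resolve partition elimination at compile time.
SELECT c.*
FROM part.Contact AS c
WHERE c.CustomerPartitionKey = 3;

-- Workaround 2: keep the parameter but force a fresh plan for its runtime
-- value, which restores the sensible estimate.
DECLARE @CustomerPartitionKey int = 3;
SELECT c.*
FROM part.Contact AS c
WHERE c.CustomerPartitionKey = @CustomerPartitionKey
OPTION (RECOMPILE);
```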

Context

StackExchange Database Administrators Q#101308, answer score: 4
