patternsqlMinor
using $Partition function to improve query performance
Viewed 0 times
partitionqueryfunctionimproveusingperformance
Problem
I have tables which are partitioned based on a
I see some queries that are using
For example instead of saying:
they have been written like below:
where
I see comments on those queries that this has been done to improve performance, and when I run both queries I see different in execution time and different execution plan. I'm not sure how these two queries are different.
This is a real query:
And below is the execution plan of the real query:
Sorry I can't upload even a sanitized execution plan as uploads are monitored by our network and it might be a policy violation.
Question is: Why execution plans are different and why the second query is faster.
I appreciate if someone can share any idea on this. Just interested to know why they are different. Faster is better though.
This is happening on SQL Server 2014. If I run the same on SQL Server 2012, the result will be different and the first query will perform faster!
INT column.I see some queries that are using
$Partition function to compare partition number instead of comparing the actual field data. For example instead of saying:
select *
from T1
inner join T2 on T2.SnapshotKey = T1.SnapshotKeythey have been written like below:
select *
from T1
inner join T2 on $Partition.PF_Name(T2.SnapshotKey) = $Partition.PF_Name(T1.SnapshotKey)where
PF_Name is the name of partition function.I see comments on those queries that this has been done to improve performance, and when I run both queries I see different in execution time and different execution plan. I'm not sure how these two queries are different.
This is a real query:
-- this takes about 9 seconds
select sum(PrinBal)
from fpc
inner join gl on gl.SnapshotKey = fpc.SnapshotKey
where fpc.SnapshotKey = 201703
-- this takes about 5 seconds
select sum(PrinBal)
from fpc
inner join gl on $Partition.SnapshotKeyPF(gl.SnapshotKey) = $Partition.SnapshotKeyPF(fpc.SnapshotKey)
where fpc.SnapshotKey = 201703And below is the execution plan of the real query:
Sorry I can't upload even a sanitized execution plan as uploads are monitored by our network and it might be a policy violation.
Question is: Why execution plans are different and why the second query is faster.
I appreciate if someone can share any idea on this. Just interested to know why they are different. Faster is better though.
This is happening on SQL Server 2014. If I run the same on SQL Server 2012, the result will be different and the first query will perform faster!
Solution
Why execution plans are different
First query
The optimizer knows:
so it can infer:
Just as if you had written:
The literal value 201703 can also be used by the optimizer to determine the partition id. With both
Going further, with a literal value (201703) for
simplifies to:
Meaning there is no join predicate at all. The result is a logical cross join. Expressing the final execution plan using the closest available T-SQL syntax, it is as if you wrote:
Second query
The optimizer can no longer infer anything about
Indeed, unless it is true that each partition holds only a single
Again, expressing the execution plan produced using the closest available T-SQL syntax:
This time there is no logical cross join. Instead, there is a correlated join (an apply) on the partition id.
why the second query is faster.
This is hard to assess from the information given. Using mock data and tables based on the queries and plan image provided, I found the first query outperformed the second in every case.
The same query expressed using different syntax can often produce a different execution plan, simply because the optimizer started from a different point, and explored options in a different order before it found a suitable execution plan. Plan search is not exhaustive, and not every possible logical transformation is available, so the end result is likely to be different. As noted above, the two queries do not necessarily express the same requirement anyway (at least given the information available to the optimizer).
On a separate note, be aware that the initial columnstore implementation in SQL Server 2012 (and to a somewhat lesser extent, 2014) has many limitations, not least on the optimization side of things. You will likely get better, and more consistent, results by upgrading to a more recent release (ideally the very latest). This is particularly true if you're going to be using partitioning.
I certainly would not recommend you get into the habit of rewriting joins using
That's about all I can say without being able to see the schema or plan detail.
First query
select sum(PrinBal)
from fpc
inner join gl on gl.SnapshotKey = fpc.SnapshotKey
where fpc.SnapshotKey = 201703The optimizer knows:
gl.SnapshotKey = fpc.SnapshotKey; and
fpc.SnapshotKey = 201703
so it can infer:
gl.SnapshotKey = 201703
Just as if you had written:
select sum(PrinBal)
from fpc
inner join gl on gl.SnapshotKey = fpc.SnapshotKey
where fpc.SnapshotKey = 201703
and gl.SnapshotKey = 201703The literal value 201703 can also be used by the optimizer to determine the partition id. With both
SnapshotKey predicates (one given, one inferred) this means the optimizer knows the partition id for both tables.Going further, with a literal value (201703) for
SnapshotKey now available on both tables, the join predicate:gl.SnapshotKey = fpc.SnapshotKey
simplifies to:
201703 = 201703; or simply
true
Meaning there is no join predicate at all. The result is a logical cross join. Expressing the final execution plan using the closest available T-SQL syntax, it is as if you wrote:
SELECT
CASE
WHEN SUM(Q1.c) = 0 THEN NULL
ELSE SUM(Q1.s)
END
FROM
(
SELECT c = COUNT_BIG(*), s = SUM(GL.PrinBal)
FROM dbo.gl AS GL
WHERE GL.SnapshotKey = 201703
AND $PARTITION.PF(GL.SnapshotKey) = $PARTITION.PF(201703)
) AS Q1
CROSS JOIN
(
SELECT Dummy = 1
FROM dbo.fpc AS FPC
WHERE FPC.SnapshotKey = 201703
AND $PARTITION.PF(FPC.SnapshotKey) = $PARTITION.PF(201703)
) AS Q2;Second query
select sum(PrinBal)
from fpc
inner join gl on $Partition.PF(gl.SnapshotKey) = $Partition.PF(fpc.SnapshotKey)
where fpc.SnapshotKey = 201703The optimizer can no longer infer anything about
gl.SnapshotKey, so the simplifications and transformations made for the first query are no longer possible.Indeed, unless it is true that each partition holds only a single
SnapshotKey, the rewrite is not guaranteed to produce the same results.Again, expressing the execution plan produced using the closest available T-SQL syntax:
SELECT
CASE
WHEN SUM(Q2.c) = 0 THEN NULL
ELSE SUM(Q2.s)
END
FROM
(
SELECT
Q1.PtnID,
c = COUNT_BIG(*),
s = SUM(Q1.PrinBal)
FROM
(
SELECT GL.PrinBal, PtnID = $PARTITION.PF(GL.SnapshotKey)
FROM dbo.gl AS GL
) AS Q1
GROUP BY
Q1.PtnID
) AS Q2
CROSS APPLY
(
SELECT Dummy = 1
FROM dbo.fpc AS FPC
WHERE
$PARTITION.PF(FPC.SnapshotKey) = Q2.PtnID
AND FPC.SnapshotKey = 201703
) AS Q3;This time there is no logical cross join. Instead, there is a correlated join (an apply) on the partition id.
why the second query is faster.
This is hard to assess from the information given. Using mock data and tables based on the queries and plan image provided, I found the first query outperformed the second in every case.
The same query expressed using different syntax can often produce a different execution plan, simply because the optimizer started from a different point, and explored options in a different order before it found a suitable execution plan. Plan search is not exhaustive, and not every possible logical transformation is available, so the end result is likely to be different. As noted above, the two queries do not necessarily express the same requirement anyway (at least given the information available to the optimizer).
On a separate note, be aware that the initial columnstore implementation in SQL Server 2012 (and to a somewhat lesser extent, 2014) has many limitations, not least on the optimization side of things. You will likely get better, and more consistent, results by upgrading to a more recent release (ideally the very latest). This is particularly true if you're going to be using partitioning.
I certainly would not recommend you get into the habit of rewriting joins using
$PARTITION, except as a very last resort, and with a very deep understanding of what you are doing.That's about all I can say without being able to see the schema or plan detail.
Code Snippets
select sum(PrinBal)
from fpc
inner join gl on gl.SnapshotKey = fpc.SnapshotKey
where fpc.SnapshotKey = 201703select sum(PrinBal)
from fpc
inner join gl on gl.SnapshotKey = fpc.SnapshotKey
where fpc.SnapshotKey = 201703
and gl.SnapshotKey = 201703SELECT
CASE
WHEN SUM(Q1.c) = 0 THEN NULL
ELSE SUM(Q1.s)
END
FROM
(
SELECT c = COUNT_BIG(*), s = SUM(GL.PrinBal)
FROM dbo.gl AS GL
WHERE GL.SnapshotKey = 201703
AND $PARTITION.PF(GL.SnapshotKey) = $PARTITION.PF(201703)
) AS Q1
CROSS JOIN
(
SELECT Dummy = 1
FROM dbo.fpc AS FPC
WHERE FPC.SnapshotKey = 201703
AND $PARTITION.PF(FPC.SnapshotKey) = $PARTITION.PF(201703)
) AS Q2;select sum(PrinBal)
from fpc
inner join gl on $Partition.PF(gl.SnapshotKey) = $Partition.PF(fpc.SnapshotKey)
where fpc.SnapshotKey = 201703SELECT
CASE
WHEN SUM(Q2.c) = 0 THEN NULL
ELSE SUM(Q2.s)
END
FROM
(
SELECT
Q1.PtnID,
c = COUNT_BIG(*),
s = SUM(Q1.PrinBal)
FROM
(
SELECT GL.PrinBal, PtnID = $PARTITION.PF(GL.SnapshotKey)
FROM dbo.gl AS GL
) AS Q1
GROUP BY
Q1.PtnID
) AS Q2
CROSS APPLY
(
SELECT Dummy = 1
FROM dbo.fpc AS FPC
WHERE
$PARTITION.PF(FPC.SnapshotKey) = Q2.PtnID
AND FPC.SnapshotKey = 201703
) AS Q3;Context
StackExchange Database Administrators Q#186705, answer score: 9
Revisions (0)
No revisions yet.