patternsqlMinor
Partitioned view aggregation query not optimized
Viewed 0 times
queryviewoptimizedpartitionedaggregationnot
Problem
Given the following two tables:
And the following view
The following query uses a
PasteThePlan
Whereas a minor rewrite gives the obviously more performant
PasteThePlan
The compiler can clearly see that the view is partitioned by
The question is: why does the first query force a
Shouldn't the compiler be able to see that a
db<>fiddle
Even stranger, as @MartinSmith has found, if there is no data then the compiler will use the better plan, although without an intermediate aggregation on
I'
CREATE TABLE SalesLedger (
Id int PRIMARY KEY IDENTITY,
Date date NOT NULL,
Total decimal(38,18),
INDEX IX (Date, Total)
);
CREATE TABLE Purchases (
Id int PRIMARY KEY IDENTITY,
Date date NOT NULL,
Total decimal(38,18),
INDEX IX (Date, Total)
);And the following view
CREATE VIEW ViewMetrics
AS
Select
Date,
'Sale' as Metric,
Total as Value
From SalesLedger
UNION ALL
Select
Date,
'Purchase' as Metric,
Total as Value
From Purchases;The following query uses a
Concatenation Sort pair:Select SUM(Value) as Sales, Date
from ViewMetrics
Group By Date;PasteThePlan
Whereas a minor rewrite gives the obviously more performant
Merge ConcatenationSELECT SUM(Sales), Date
FROM (
Select SUM(Value) as Sales, Date
from ViewMetrics
Group By Metric, Date
) t
GROUP BY Date;PasteThePlan
The compiler can clearly see that the view is partitioned by
Metric, as this query shows, no Sort is needed:Select SUM(Value) as Sales, Date
from ViewMetrics
where Metric = 'Sale'
Group By Date;The question is: why does the first query force a
Sort, whereas the second can use a more efficient Merge Concatenation, given that the Metric column has no WHERE predicate in either case?Shouldn't the compiler be able to see that a
Merge would work given that the indexes are sorted on Date already, and the partitioning is on Metric? Or if it cannot see that, why does GROUP BY Metric, Date suddenly give it that ability?db<>fiddle
Even stranger, as @MartinSmith has found, if there is no data then the compiler will use the better plan, although without an intermediate aggregation on
Metric, Date. db<>fiddle On the other hand, the merge without partial aggregation is probably slower than sorting after partial aggregation anyway, because there are more rows to merge. Question is why can't it do both partial aggregation and merge at the same time by default?I'
Solution
The SQL Server optimizer has two main ways to push an aggregate down past a union all.
The first rule is
It can only do this safely if the union is disjoint—that is, if there's something that makes each input completely independent, and that factor appears in the
It doesn't have to be a literal value like in your case, but it does have to be something the optimizer can recognize as completely separating the sets, like non-overlapping ranges. As part of the transformation, the constant part of the grouping clause is removed.
This rule is engaged with your rewrite because the Metric attribute is disjoint and present in the grouping specification.
An example using the Adventure Works sample database:
With the
Without the hint, the top-level aggregate can be moved and copied to each union input:
The second transformation involves a couple of different rules.
First,
The general idea is used in both serial and parallel plans. Sometimes, the local aggregate computes a subtotal local to its own thread, sometimes it performs a subset of the work below a join. In any case, the general idea is the same: do some part of the overall aggregation task as early as possible.
Like many optimizer explorations,
In your case, a rule named
Importantly, the local aggregate is not quite a normal aggregate. It is performing only part of the calculation and may end up on one of many threads in a parallel plan. A global aggregate always runs on a single thread to ensure correct results.
In a parallel plan, a local aggregate may be physically implemented as a Hash Match Partial Aggregate. This operator only gets a small, fixed memory grant and never spills to tempdb. If it runs out of memory, it simply stops aggregating. Results will still be correct thanks to the global aggregate.
The caveats are not limited to this physical operator or parallel plans. In general, you should think of a local aggregate as being a bit different from the normal kind you write in SQL.
This is mostly a consequence of implementation details and the need to preserve the connection between the local and global aggregates. Like many exploration rules,
Anyway, one of the consequences of a local aggregate being a bit different is that it does not come with a uniqueness guarantee associated with its grouping keys. This is important in many instances but particularly so with your desired Merge Concatenation operator.
Merge Concatenation
As the plan operator's name suggests, Merge Concatenation is just a normal Merge Join operator running in a special mode. It requires input sorted on the 'join keys' though exactly which sort order is needed can be affected by the projection column list, global ordering requirement, and any uniqueness guarantees available (see reference below).
A normal aggregate that provides grouping key uniqueness guarantees can allow the Merge Concatenation to require less onerous sorting than is possible with input from a local aggregate.
For your original query, the optimizer did consid
- Global Pushdown
The first rule is
GbAggBelowUniAll. It's a fairly straightforward transform that moves the aggregation onto each of the union inputs.It can only do this safely if the union is disjoint—that is, if there's something that makes each input completely independent, and that factor appears in the
GROUP BY clause.It doesn't have to be a literal value like in your case, but it does have to be something the optimizer can recognize as completely separating the sets, like non-overlapping ranges. As part of the transformation, the constant part of the grouping clause is removed.
This rule is engaged with your rewrite because the Metric attribute is disjoint and present in the grouping specification.
An example using the Adventure Works sample database:
-- Helpful indexes
CREATE INDEX i ON Production.TransactionHistory
(Quantity, TransactionDate);
CREATE INDEX i ON Production.TransactionHistoryArchive
(Quantity, TransactionDate);SELECT
U.Quantity,
MAX(U.TransactionDate)
FROM
(
-- The predicates on quantity make the two union inputs disjoint
SELECT
TH.*
FROM Production.TransactionHistory AS TH
WHERE
TH.Quantity BETWEEN 50 AND 60
UNION ALL
SELECT
THA.*
FROM Production.TransactionHistoryArchive AS THA
WHERE
THA.Quantity BETWEEN 10 AND 20
) AS U
-- Grouping by the disjoint element
GROUP BY
U.Quantity
ORDER BY
U.Quantity
--OPTION (FORCE ORDER)
;With the
FORCE ORDER hint uncommented, the optimizer is prevented from moving the aggregate around:Without the hint, the top-level aggregate can be moved and copied to each union input:
- Local Aggregation
The second transformation involves a couple of different rules.
First,
GenLGAgg splits an aggregate into two parts, a global aggregate and a local aggregate. For example, a COUNT aggregate would be split into a local COUNT aggregate and a global SUM aggregate that adds all the local contributions together to arrive at the correct result.The general idea is used in both serial and parallel plans. Sometimes, the local aggregate computes a subtotal local to its own thread, sometimes it performs a subset of the work below a join. In any case, the general idea is the same: do some part of the overall aggregation task as early as possible.
Like many optimizer explorations,
GenLGAgg produces one or more alternatives that can be further explored by other rules. For example, the new local aggregates may be past joins or matched to an indexed view.In your case, a rule named
LocalAggBelowUniAll is employed to move the local aggregate below the UNION ALL. This is what was happening with the original query.Importantly, the local aggregate is not quite a normal aggregate. It is performing only part of the calculation and may end up on one of many threads in a parallel plan. A global aggregate always runs on a single thread to ensure correct results.
In a parallel plan, a local aggregate may be physically implemented as a Hash Match Partial Aggregate. This operator only gets a small, fixed memory grant and never spills to tempdb. If it runs out of memory, it simply stops aggregating. Results will still be correct thanks to the global aggregate.
The caveats are not limited to this physical operator or parallel plans. In general, you should think of a local aggregate as being a bit different from the normal kind you write in SQL.
This is mostly a consequence of implementation details and the need to preserve the connection between the local and global aggregates. Like many exploration rules,
LocalAggBelowUniAll represents a query rewrite you could perform yourself, but you should not expect it to behave exactly the same in all respects as the closest T-SQL representation. It should also be said that the optimizer has the advantage of being able to dynamically decide which rewrite to use based on current statistics and metadata. This is not generally true for a manual rewrite.Anyway, one of the consequences of a local aggregate being a bit different is that it does not come with a uniqueness guarantee associated with its grouping keys. This is important in many instances but particularly so with your desired Merge Concatenation operator.
Merge Concatenation
As the plan operator's name suggests, Merge Concatenation is just a normal Merge Join operator running in a special mode. It requires input sorted on the 'join keys' though exactly which sort order is needed can be affected by the projection column list, global ordering requirement, and any uniqueness guarantees available (see reference below).
A normal aggregate that provides grouping key uniqueness guarantees can allow the Merge Concatenation to require less onerous sorting than is possible with input from a local aggregate.
For your original query, the optimizer did consid
Code Snippets
-- Helpful indexes
CREATE INDEX i ON Production.TransactionHistory
(Quantity, TransactionDate);
CREATE INDEX i ON Production.TransactionHistoryArchive
(Quantity, TransactionDate);SELECT
U.Quantity,
MAX(U.TransactionDate)
FROM
(
-- The predicates on quantity make the two union inputs disjoint
SELECT
TH.*
FROM Production.TransactionHistory AS TH
WHERE
TH.Quantity BETWEEN 50 AND 60
UNION ALL
SELECT
THA.*
FROM Production.TransactionHistoryArchive AS THA
WHERE
THA.Quantity BETWEEN 10 AND 20
) AS U
-- Grouping by the disjoint element
GROUP BY
U.Quantity
ORDER BY
U.Quantity
--OPTION (FORCE ORDER)
;SELECT
TotalSales = SUM(T.Sales),
T.[Date]
FROM
(
SELECT
Sales = ISNULL(SUM(VM.[Value]), 0.0),
VM.[Date]
FROM dbo.ViewMetrics AS VM
WHERE
VM.[Value] IS NOT NULL
GROUP BY
VM.Metric,
VM.[Date]
) AS T
GROUP BY
T.[Date];Context
StackExchange Database Administrators Q#330663, answer score: 6
Revisions (0)
No revisions yet.