snippetModerate
How can I achieve predicate pushdown in my view
Viewed 0 times
canachievepushdownviewpredicatehow
Problem
I have a reporting table (about 1bn rows), and a tiny dimension table:
... to which I've added a reporting view used by an application for OLTP-style reporting.
```
CREATE OR ALTER VIEW dbo.SalesReport_unpartitioned
AS
SELECT bu.BusinessUnit,
s.[Date],
s.SKU,
s.Quantity,
s.Amount
FROM dbo.BusinessUnits AS bu
CROSS APPLY (
--- Regular sales
SELECT t.BusinessUnit, t.[Date], t.SKU, t.Quantity, t.Amount
FROM dbo.Sales_unpartitioned AS t
WHERE t.BusinessUnit=bu.BusinessUnit
AND t.SKU LIKE 'T%'
UNION ALL
--- This is a special reporting entry. We only
--- want to see today's row. In case of duplicates,
--- get the row with the first "SKU".
SELECT TOP (1) s.BusinessUnit, s.[Date], s.SKU, s.Quantity, s.Amount
FROM dbo.Sales_unpartitioned AS s
WHERE s.BusinessUnit=bu.BusinessUnit
AND s.[Date]=CAST(SYSDATETIME() AS date)
AND s.SKU LIKE 'S%'
ORDER BY s.BusinessUni
CREATE TABLE dbo.Sales_unpartitioned (
BusinessUnit int NOT NULL,
[Date] date NOT NULL,
SKU varchar(8) NOT NULL,
Quantity numeric(10, 2) NOT NULL,
Amount numeric(10, 2) NOT NULL,
CONSTRAINT PK_Sales_unpartitioned PRIMARY KEY CLUSTERED (BusinessUnit, [Date], SKU)
);
--- Demo data:
INSERT INTO dbo.Sales_unpartitioned
SELECT severity AS BusinessUnit,
DATEADD(day, message_id, '2000-01-01') AS [Date],
LEFT([text], 3) AS SKU,
1000.*RAND(CHECKSUM(NEWID())) AS Quantity,
10000.*RAND(CHECKSUM(NEWID())) AS Amount
FROM sys.messages
WHERE [language_id]=1033;
--- Artificially inflate statistics of demo data:
UPDATE STATISTICS dbo.Sales_unpartitioned WITH ROWCOUNT=1000000000;
--- Dimension table:
CREATE TABLE dbo.BusinessUnits (
BusinessUnit int NOT NULL,
SalesManager nvarchar(250) NULL,
PRIMARY KEY CLUSTERED (BusinessUnit)
);
INSERT INTO dbo.BusinessUnits (BusinessUnit)
SELECT DISTINCT BusinessUnit FROM dbo.Sales;... to which I've added a reporting view used by an application for OLTP-style reporting.
```
CREATE OR ALTER VIEW dbo.SalesReport_unpartitioned
AS
SELECT bu.BusinessUnit,
s.[Date],
s.SKU,
s.Quantity,
s.Amount
FROM dbo.BusinessUnits AS bu
CROSS APPLY (
--- Regular sales
SELECT t.BusinessUnit, t.[Date], t.SKU, t.Quantity, t.Amount
FROM dbo.Sales_unpartitioned AS t
WHERE t.BusinessUnit=bu.BusinessUnit
AND t.SKU LIKE 'T%'
UNION ALL
--- This is a special reporting entry. We only
--- want to see today's row. In case of duplicates,
--- get the row with the first "SKU".
SELECT TOP (1) s.BusinessUnit, s.[Date], s.SKU, s.Quantity, s.Amount
FROM dbo.Sales_unpartitioned AS s
WHERE s.BusinessUnit=bu.BusinessUnit
AND s.[Date]=CAST(SYSDATETIME() AS date)
AND s.SKU LIKE 'S%'
ORDER BY s.BusinessUni
Solution
I'm trying to understand if this is a shortcoming of the optimizer, if it's the result of a deliberate costing/optimization mechanism, or if I've simply overlooked something.
It's a little bit of all of those.
There's a lot going on in the query presented — too much really — so to avoid writing half a book about it, I am going to boil it down to the main element that is causing you not to get the plan you are after:
The optimizer does not push predicates down the inner side of an apply.
The rule that operates on relational selections (filters, predicates) above an apply is called, naturally enough,
Sel (A Apply B) -> Sel (Sel A Apply B)
It takes part(s) of a potentially complex selection involving both A and B, and pushes those it can down to the driving table A. No part of the selection is pushed to B. The part(s) of the selection that cannot be pushed down remain behind.
This might sound like a shocking oversight, and counter to experience. That's because it is not the full story.
The optimizer tries to convert an apply to the equivalent join early on in the compilation process (during simplification, before trivial plan and cost-based optimization). It is capable of pushing selections down either side of a join, where it is safe. That join may in turn be transformed into a physical apply during cost-based optimization.
The effect of all this is to make it seem like the optimizer pushed a predicate down the inner side of an apply:
Let me show you an example:
If you look carefully at the plan, you will see the predicate on T2 pushed to the inner side seek, and the nested loop join is an apply (it has outer references). This was only possible because the optimizer was able to rewrite the apply as a join initially, push the predicates, then transform back to an apply later on.
We can disable the apply-to-join transformation using undocumented trace flag 9114:
This means only
Notice the part of the selection on T2.c2 is 'stuck' above the apply, in a filter. The inner side seek is only on the FK/PK equality specified inside the apply.
The optimizer is built on relational principles. It appreciates a relational schema design, and queries that use relational constructs. Apply (lateral join) is a relatively new extension. The optimizer knows a lot more tricks with join than it does with apply, hence the early effort to rewrite.
When you use things like apply, or the (non-relational) Top, you are implicitly taking more responsibility for the final plan shape. In other words, you will more often have to express your query differently (as in your workaround) to get a good outcome.
My preference would be to use the inline table-valued function with explicit predicate placement. If I were to rewrite the view, I might go with:
For the provided test query:
The execution
It's a little bit of all of those.
There's a lot going on in the query presented — too much really — so to avoid writing half a book about it, I am going to boil it down to the main element that is causing you not to get the plan you are after:
The optimizer does not push predicates down the inner side of an apply.
The rule that operates on relational selections (filters, predicates) above an apply is called, naturally enough,
SELonApply. It performs the following logical substitution:Sel (A Apply B) -> Sel (Sel A Apply B)
It takes part(s) of a potentially complex selection involving both A and B, and pushes those it can down to the driving table A. No part of the selection is pushed to B. The part(s) of the selection that cannot be pushed down remain behind.
This might sound like a shocking oversight, and counter to experience. That's because it is not the full story.
The optimizer tries to convert an apply to the equivalent join early on in the compilation process (during simplification, before trivial plan and cost-based optimization). It is capable of pushing selections down either side of a join, where it is safe. That join may in turn be transformed into a physical apply during cost-based optimization.
The effect of all this is to make it seem like the optimizer pushed a predicate down the inner side of an apply:
- Written apply transformed to a join.
- Predicate(s) pushed down either side of the join.
- Join transformed to an apply.
Let me show you an example:
DECLARE @T1 table (pk integer PRIMARY KEY, c1 integer NOT NULL INDEX ic1);
DECLARE @T2 table (fk integer NOT NULL, c2 integer NOT NULL, PRIMARY KEY (fk, c2));
SELECT
T1.*,
T2.*
FROM @T1 AS T1
CROSS APPLY
(
SELECT T2.*
FROM @T2 AS T2
WHERE T2.fk = T1.pk
) AS T2
WHERE
1 = 1
AND T1.c1 = 1
AND T2.c2 = 2;If you look carefully at the plan, you will see the predicate on T2 pushed to the inner side seek, and the nested loop join is an apply (it has outer references). This was only possible because the optimizer was able to rewrite the apply as a join initially, push the predicates, then transform back to an apply later on.
We can disable the apply-to-join transformation using undocumented trace flag 9114:
DECLARE @T1 table (pk integer PRIMARY KEY, c1 integer NOT NULL INDEX ic1);
DECLARE @T2 table (fk integer NOT NULL, c2 integer NOT NULL, PRIMARY KEY (fk, c2));
SELECT
T1.*,
T2.*
FROM @T1 AS T1
CROSS APPLY
(
SELECT T2.*
FROM @T2 AS T2
WHERE T2.fk = T1.pk
) AS T2
WHERE
1 = 1
AND T1.c1 = 1
AND T2.c2 = 2
OPTION (QUERYTRACEON 9114);This means only
SELonApply can be used, which only pushes to the driving table A:Notice the part of the selection on T2.c2 is 'stuck' above the apply, in a filter. The inner side seek is only on the FK/PK equality specified inside the apply.
The optimizer is built on relational principles. It appreciates a relational schema design, and queries that use relational constructs. Apply (lateral join) is a relatively new extension. The optimizer knows a lot more tricks with join than it does with apply, hence the early effort to rewrite.
When you use things like apply, or the (non-relational) Top, you are implicitly taking more responsibility for the final plan shape. In other words, you will more often have to express your query differently (as in your workaround) to get a good outcome.
My preference would be to use the inline table-valued function with explicit predicate placement. If I were to rewrite the view, I might go with:
CREATE OR ALTER VIEW dbo.SalesReport_unpartitioned
AS
--- Regular sales
SELECT
BU.BusinessUnit,
RS.[Date],
RS.SKU,
RS.Quantity,
RS.Amount
FROM dbo.BusinessUnits AS BU
JOIN dbo.Sales_unpartitioned AS RS
ON RS.BusinessUnit = BU.BusinessUnit
WHERE
RS.SKU LIKE 'T%'
UNION ALL
--- This is a special reporting entry.
SELECT
BU.BusinessUnit,
SR.[Date],
SR.SKU,
SR.Quantity,
SR.Amount
FROM dbo.BusinessUnits AS BU
JOIN dbo.Sales_unpartitioned AS SR
ON SR.BusinessUnit = BU.BusinessUnit
WHERE
1 = 1
AND SR.SKU LIKE 'S%'
--- We only want to see today's row.
AND SR.[Date] = CONVERT(date, SYSDATETIME())
--- In case of duplicates, get the row with the first "SKU".
AND SR.SKU =
(
SELECT
MIN(SR2.SKU)
FROM dbo.Sales_unpartitioned AS SR2
WHERE
SR2.BusinessUnit = SR.BusinessUnit
AND SR2.[Date] = SR.[Date]
AND SR2.SKU LIKE 'S%'
);
GOFor the provided test query:
DECLARE @from date='2021-01-01', @to date='2021-12-31';
SELECT *
FROM dbo.SalesReport_unpartitioned
WHERE BusinessUnit=16
AND [Date] BETWEEN @from AND @to
ORDER BY BusinessUnit, [Date], SKU;The execution
Code Snippets
DECLARE @T1 table (pk integer PRIMARY KEY, c1 integer NOT NULL INDEX ic1);
DECLARE @T2 table (fk integer NOT NULL, c2 integer NOT NULL, PRIMARY KEY (fk, c2));
SELECT
T1.*,
T2.*
FROM @T1 AS T1
CROSS APPLY
(
SELECT T2.*
FROM @T2 AS T2
WHERE T2.fk = T1.pk
) AS T2
WHERE
1 = 1
AND T1.c1 = 1
AND T2.c2 = 2;DECLARE @T1 table (pk integer PRIMARY KEY, c1 integer NOT NULL INDEX ic1);
DECLARE @T2 table (fk integer NOT NULL, c2 integer NOT NULL, PRIMARY KEY (fk, c2));
SELECT
T1.*,
T2.*
FROM @T1 AS T1
CROSS APPLY
(
SELECT T2.*
FROM @T2 AS T2
WHERE T2.fk = T1.pk
) AS T2
WHERE
1 = 1
AND T1.c1 = 1
AND T2.c2 = 2
OPTION (QUERYTRACEON 9114);CREATE OR ALTER VIEW dbo.SalesReport_unpartitioned
AS
--- Regular sales
SELECT
BU.BusinessUnit,
RS.[Date],
RS.SKU,
RS.Quantity,
RS.Amount
FROM dbo.BusinessUnits AS BU
JOIN dbo.Sales_unpartitioned AS RS
ON RS.BusinessUnit = BU.BusinessUnit
WHERE
RS.SKU LIKE 'T%'
UNION ALL
--- This is a special reporting entry.
SELECT
BU.BusinessUnit,
SR.[Date],
SR.SKU,
SR.Quantity,
SR.Amount
FROM dbo.BusinessUnits AS BU
JOIN dbo.Sales_unpartitioned AS SR
ON SR.BusinessUnit = BU.BusinessUnit
WHERE
1 = 1
AND SR.SKU LIKE 'S%'
--- We only want to see today's row.
AND SR.[Date] = CONVERT(date, SYSDATETIME())
--- In case of duplicates, get the row with the first "SKU".
AND SR.SKU =
(
SELECT
MIN(SR2.SKU)
FROM dbo.Sales_unpartitioned AS SR2
WHERE
SR2.BusinessUnit = SR.BusinessUnit
AND SR2.[Date] = SR.[Date]
AND SR2.SKU LIKE 'S%'
);
GODECLARE @from date='2021-01-01', @to date='2021-12-31';
SELECT *
FROM dbo.SalesReport_unpartitioned
WHERE BusinessUnit=16
AND [Date] BETWEEN @from AND @to
ORDER BY BusinessUnit, [Date], SKU;Context
StackExchange Database Administrators Q#296111, answer score: 11
Revisions (0)
No revisions yet.