patternsqlModerate
Why doesn't join elimination work with sys.query_store_plan?
Viewed 0 times
whywitheliminationjoindoesnsysworkquery_store_plan
Problem
The following is a simplification of a performance problem encountered with the Query Store:
The
Execution plan
Why is this, and what can be done to obtain join elimination here?
CREATE TABLE #tears
(
plan_id bigint NOT NULL
);
INSERT #tears (plan_id)
VALUES (1);
SELECT
T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
ON QSP.plan_id = T.plan_id;The
plan_id column is documented as being the primary key of sys.query_store_plan, but the execution plan does not use join elimination as would be expected:- No attributes are being projected from the DMV.
- The DMV primary key
plan_idcannot duplicate rows from the temporary table
- A
LEFT JOINis used, so no rows fromTcan be eliminated.
Execution plan
Why is this, and what can be done to obtain join elimination here?
Solution
The documentation is a little misleading. The DMV is a non-materialized view, and does not have a primary key as such. The underlying definitions are a little complex but a simplified definition of
Further,
The indexes on
╔════════════════════════╦══════════════════════════════════════╦═════════════╗
║ index_name ║ index_description ║ index_keys ║
╠════════════════════════╬══════════════════════════════════════╬═════════════╣
║ plan_persist_plan_cidx ║ clustered, unique located on PRIMARY ║ plan_id ║
║ plan_persist_plan_idx1 ║ nonclustered located on PRIMARY ║ query_id(-) ║
╚════════════════════════╩══════════════════════════════════════╩═════════════╝
So
Now,
At heart, the query being executed is therefore equivalent to:
...which does not produce join elimination:
Getting right to the core of the issue, the problem is the inner query:
...clearly the left join might result in rows from
To workaround this, we can explicitly tell the optimizer that we do not require any duplicate
The outer join to
Applying that to the real query:
Equally, we could add
Note that making
sys.query_store_plan is:CREATE VIEW sys.query_store_plan AS
SELECT
PPM.plan_id
-- various other attributes
FROM sys.plan_persist_plan_merged AS PPM
LEFT JOIN sys.syspalvalues AS P
ON P.class = 'PFT'
AND P.[value] = plan_forcing_type;Further,
sys.plan_persist_plan_merged is also a view, though one needs to connect via the Dedicated Administrator Connection to see its definition. Again, simplified:CREATE VIEW sys.plan_persist_plan_merged AS
SELECT
P.plan_id as plan_id,
-- various other attributes
FROM sys.plan_persist_plan P
-- NOTE - in order to prevent potential deadlock
-- between QDS_STATEMENT_STABILITY LOCK and index locks
WITH (NOLOCK)
LEFT JOIN sys.plan_persist_plan_in_memory PM
ON P.plan_id = PM.plan_id;The indexes on
sys.plan_persist_plan are:╔════════════════════════╦══════════════════════════════════════╦═════════════╗
║ index_name ║ index_description ║ index_keys ║
╠════════════════════════╬══════════════════════════════════════╬═════════════╣
║ plan_persist_plan_cidx ║ clustered, unique located on PRIMARY ║ plan_id ║
║ plan_persist_plan_idx1 ║ nonclustered located on PRIMARY ║ query_id(-) ║
╚════════════════════════╩══════════════════════════════════════╩═════════════╝
So
plan_id is constrained to be unique on sys.plan_persist_plan.Now,
sys.plan_persist_plan_in_memory is a streaming table-valued function, presenting a tabular view of data only held in internal memory structures. As such, it does not have any unique constraints.At heart, the query being executed is therefore equivalent to:
DECLARE @t1 table (plan_id integer NOT NULL);
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT
T1.plan_id
FROM @t1 AS T1
LEFT JOIN
(
SELECT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id
) AS Q1
ON Q1.plan_id = T1.plan_id;...which does not produce join elimination:
Getting right to the core of the issue, the problem is the inner query:
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id;...clearly the left join might result in rows from
@t2 being duplicated because @t3 has no uniqueness constraint on plan_id. Therefore, the join cannot be eliminated:To workaround this, we can explicitly tell the optimizer that we do not require any duplicate
plan_id values:DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT DISTINCT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id;The outer join to
@t3 can now be eliminated:Applying that to the real query:
SELECT DISTINCT
T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
ON QSP.plan_id = T.plan_id;Equally, we could add
GROUP BY T.plan_id instead of the DISTINCT. Anyway, the optimizer can now correctly reason about the plan_id attribute all the way down through the nested views, and eliminate both outer joins as desired:Note that making
plan_id unique in the temporary table would not be sufficient to obtain join elimination, since it would not preclude incorrect results. We must explicitly reject duplicate plan_id values from the final result to allow the optimizer to work its magic here.Code Snippets
CREATE VIEW sys.query_store_plan AS
SELECT
PPM.plan_id
-- various other attributes
FROM sys.plan_persist_plan_merged AS PPM
LEFT JOIN sys.syspalvalues AS P
ON P.class = 'PFT'
AND P.[value] = plan_forcing_type;CREATE VIEW sys.plan_persist_plan_merged AS
SELECT
P.plan_id as plan_id,
-- various other attributes
FROM sys.plan_persist_plan P
-- NOTE - in order to prevent potential deadlock
-- between QDS_STATEMENT_STABILITY LOCK and index locks
WITH (NOLOCK)
LEFT JOIN sys.plan_persist_plan_in_memory PM
ON P.plan_id = PM.plan_id;DECLARE @t1 table (plan_id integer NOT NULL);
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT
T1.plan_id
FROM @t1 AS T1
LEFT JOIN
(
SELECT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id
) AS Q1
ON Q1.plan_id = T1.plan_id;DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id;DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT DISTINCT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id;Context
StackExchange Database Administrators Q#216485, answer score: 17
Revisions (0)
No revisions yet.