patternsqlModerate
Persisted computed column causing scan
Viewed 0 times
computedscancolumnpersistedcausing
Problem
Converting a regular column to a persisted computed column is causing this query to not be able to do index seeks. Why?
Tested on several SQL Server versions, including 2016 SP1 CU1.
Repros
The trouble is with
The tables and query are a partial (and simplified) version of the originals. I'm aware the query could be rewritten differently, and for some reason avoid the problem, but we need to avoid touching the code, and the question of why
As Paul White showed (thanks!), the seek is available if forced, so the question is: Why the seek is not chosen by the optimizer, and whether we can do something differently to make the seek happen as it should, without changing the code?
To clarify the problematic part, here's the relevant scan in the bad execution plan:
Tested on several SQL Server versions, including 2016 SP1 CU1.
Repros
- With a computed column
- With a regular column
The trouble is with
table1, col7.The tables and query are a partial (and simplified) version of the originals. I'm aware the query could be rewritten differently, and for some reason avoid the problem, but we need to avoid touching the code, and the question of why
table1 cannot be seeked still stands.As Paul White showed (thanks!), the seek is available if forced, so the question is: Why the seek is not chosen by the optimizer, and whether we can do something differently to make the seek happen as it should, without changing the code?
To clarify the problematic part, here's the relevant scan in the bad execution plan:
Solution
Why the seek is not chosen by the optimizer
TL:DR The expanded computed column definition interferes with the optimizer's ability to reorder joins initially. With a different starting point, cost-based optimization takes a different path through the optimizer, and ends up with a different final plan choice.
Details
For all but the very simplest of queries, the optimizer does not attempt to explore anything like the whole space of possible plans. Instead, it picks a reasonable-looking starting point, then spends a budgeted amount of effort exploring logical and physical variations, in one or more search phases, until it finds a reasonable plan.
The main reason you get different plans (with different final cost estimates) for the two cases is that there are different starting points. Starting from a different place, optimization ends up at a different place (after its limited number of exploration and implementation iterations). I hope this is reasonably intuitive.
The starting point I mentioned, is somewhat based on the textual representation of the query, but changes are made to the internal tree representation as it passes through the parsing, binding, normalization, and simplification stages of query compilation.
Importantly, the exact starting point depends heavily on the initial join order selected by the optimizer. This choice is made before statistics are loaded, and before any cardinality estimations have been derived. The total cardinality (number of rows) in each table is however known, having been obtained from system metadata.
The initial join ordering is therefore based on heuristics. For example, the optimizer tries to rewrite the tree such that smaller tables are joined before larger ones, and inner joins come before outer joins (and cross joins).
The presence of the computed column interferes with this process, most specifically with the optimizer's ability to push outer joins down the query tree. This is because the computed column is expanded into its underlying expression before join reordering occurs, and moving a join past a complex expression is much more difficult than moving it past a simple column reference.
The trees involved are quite large, but to illustrate, the non-computed column initial query tree begins with: (note the two outer joins at the top)
LogOp_Select
LogOp_Apply (x_jtLeftOuter)
LogOp_LeftOuterJoin
LogOp_NAryJoin
LogOp_LeftAntiSemiJoin
LogOp_NAryJoin
LogOp_Get TBL: dbo.table1(alias TBL: a4)
LogOp_Select
LogOp_Get TBL: dbo.table6(alias TBL: a3)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a3].col18
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
LogOp_Select
LogOp_Get TBL: dbo.table1(alias TBL: a1)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a1].col2
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
LogOp_Select
LogOp_Get TBL: dbo.table5(alias TBL: a2)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a2].col2
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a4].col2
ScaOp_Identifier QCOL: [a3].col19
LogOp_Select
LogOp_Get TBL: dbo.table7(alias TBL: a7)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a7].col22
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a4].col2
ScaOp_Identifier QCOL: [a7].col23
LogOp_Select
LogOp_Get TBL: table1(alias TBL: cdc)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [cdc].col6
ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=4)
LogOp_Get TBL: dbo.table5(alias TBL: a5)
LogOp_Get TBL: table2(alias TBL: cdt)
ScaOp_Logical x_lopAnd
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a5].col2
ScaOp_Identifier QCOL: [cdc].col2
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a4].col2
ScaOp_Identifier QCOL: [cdc].col2
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [cdt].col1
ScaOp_Identifier QCOL: [cdc].col1
LogOp_Get TBL: tabl
TL:DR The expanded computed column definition interferes with the optimizer's ability to reorder joins initially. With a different starting point, cost-based optimization takes a different path through the optimizer, and ends up with a different final plan choice.
Details
For all but the very simplest of queries, the optimizer does not attempt to explore anything like the whole space of possible plans. Instead, it picks a reasonable-looking starting point, then spends a budgeted amount of effort exploring logical and physical variations, in one or more search phases, until it finds a reasonable plan.
The main reason you get different plans (with different final cost estimates) for the two cases is that there are different starting points. Starting from a different place, optimization ends up at a different place (after its limited number of exploration and implementation iterations). I hope this is reasonably intuitive.
The starting point I mentioned, is somewhat based on the textual representation of the query, but changes are made to the internal tree representation as it passes through the parsing, binding, normalization, and simplification stages of query compilation.
Importantly, the exact starting point depends heavily on the initial join order selected by the optimizer. This choice is made before statistics are loaded, and before any cardinality estimations have been derived. The total cardinality (number of rows) in each table is however known, having been obtained from system metadata.
The initial join ordering is therefore based on heuristics. For example, the optimizer tries to rewrite the tree such that smaller tables are joined before larger ones, and inner joins come before outer joins (and cross joins).
The presence of the computed column interferes with this process, most specifically with the optimizer's ability to push outer joins down the query tree. This is because the computed column is expanded into its underlying expression before join reordering occurs, and moving a join past a complex expression is much more difficult than moving it past a simple column reference.
The trees involved are quite large, but to illustrate, the non-computed column initial query tree begins with: (note the two outer joins at the top)
LogOp_Select
LogOp_Apply (x_jtLeftOuter)
LogOp_LeftOuterJoin
LogOp_NAryJoin
LogOp_LeftAntiSemiJoin
LogOp_NAryJoin
LogOp_Get TBL: dbo.table1(alias TBL: a4)
LogOp_Select
LogOp_Get TBL: dbo.table6(alias TBL: a3)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a3].col18
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
LogOp_Select
LogOp_Get TBL: dbo.table1(alias TBL: a1)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a1].col2
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
LogOp_Select
LogOp_Get TBL: dbo.table5(alias TBL: a2)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a2].col2
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a4].col2
ScaOp_Identifier QCOL: [a3].col19
LogOp_Select
LogOp_Get TBL: dbo.table7(alias TBL: a7)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a7].col22
ScaOp_Const TI(varchar collate 53256,Var,Trim,ML=16)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a4].col2
ScaOp_Identifier QCOL: [a7].col23
LogOp_Select
LogOp_Get TBL: table1(alias TBL: cdc)
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [cdc].col6
ScaOp_Const TI(smallint,ML=2) XVAR(smallint,Not Owned,Value=4)
LogOp_Get TBL: dbo.table5(alias TBL: a5)
LogOp_Get TBL: table2(alias TBL: cdt)
ScaOp_Logical x_lopAnd
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a5].col2
ScaOp_Identifier QCOL: [cdc].col2
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [a4].col2
ScaOp_Identifier QCOL: [cdc].col2
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [cdt].col1
ScaOp_Identifier QCOL: [cdc].col1
LogOp_Get TBL: tabl
Context
StackExchange Database Administrators Q#165651, answer score: 12
Revisions (0)
No revisions yet.