patternsqlMajor
Optimization issue with user defined function
Viewed 0 times
withuserissuefunctionoptimizationdefined
Problem
I have a problem understanding why SQL server decides to call user defined function for every value in the table even though only one row should be fetched. The actual SQL is a lot more complex, but I was able to reduce the problem down to this:
For this query, SQL Server decides to call GetGroupCode function for every single value that exists in PRODUCT Table, even though the estimate and actual number of rows returned from ORDERLINE is 1 (it's the primary key):
Same plan in plan explorer showing the row counts:
Tables:
The index being used for the scan is:
The function is actually slightly more complex, but the same thing happens with a dummy multi-statement function like this:
I was able to "fix" the performance by forcing SQL server to fetch the top 1 product, although 1 is max that can ever be found:
```
select
S.GROUPCODE,
H.ORDERCAT
from
ORDERLINE L
join ORDERHDR H
on H.ORDERID = M.ORDERID
cross apply (select top 1 P.FACTORY from PRODUCT P where P.PRODUCT = L.PRODUCT) P
cross apply dbo.GetGroupCode (P.FACTORY) S
where
L.ORDERNUMBER = 'XXX/YYY-123456' and
L.RMPHASE = '0' and
L.ORDERLINE = '01'
select
S.GROUPCODE,
H.ORDERCATEGORY
from
ORDERLINE L
join ORDERHDR H on H.ORDERID = L.ORDERID
join PRODUCT P on P.PRODUCT = L.PRODUCT
cross apply dbo.GetGroupCode (P.FACTORY) S
where
L.ORDERNUMBER = 'XXX/YYY-123456' and
L.RMPHASE = '0' and
L.ORDERLINE = '01'For this query, SQL Server decides to call GetGroupCode function for every single value that exists in PRODUCT Table, even though the estimate and actual number of rows returned from ORDERLINE is 1 (it's the primary key):
Same plan in plan explorer showing the row counts:
Tables:
ORDERLINE: 1.5M rows, primary key: ORDERNUMBER + ORDERLINE + RMPHASE (clustered)
ORDERHDR: 900k rows, primary key: ORDERID (clustered)
PRODUCT: 6655 rows, primary key: PRODUCT (clustered)The index being used for the scan is:
create unique nonclustered index PRODUCT_FACTORY on PRODUCT (PRODUCT, FACTORY)The function is actually slightly more complex, but the same thing happens with a dummy multi-statement function like this:
create function GetGroupCode (@FACTORY varchar(4))
returns @t table(
TYPE varchar(8),
GROUPCODE varchar(30)
)
as begin
insert into @t (TYPE, GROUPCODE) values ('XX', 'YY')
return
endI was able to "fix" the performance by forcing SQL server to fetch the top 1 product, although 1 is max that can ever be found:
```
select
S.GROUPCODE,
H.ORDERCAT
from
ORDERLINE L
join ORDERHDR H
on H.ORDERID = M.ORDERID
cross apply (select top 1 P.FACTORY from PRODUCT P where P.PRODUCT = L.PRODUCT) P
cross apply dbo.GetGroupCode (P.FACTORY) S
where
L.ORDERNUMBER = 'XXX/YYY-123456' and
L.RMPHASE = '0' and
L.ORDERLINE = '01'
Solution
There are three main technical reasons you get the plan you do:
Given the small cardinality estimate assigned to the UDF apply, the n-ary join expansion heuristics unfortunately reposition it earlier in the tree than you would wish.
The query also qualifies for search 0 optimization by virtue of having at least three joins (including applies). The final physical plan you get, with the odd-looking scan, is based on that heuristically-deduced join order. It is costed low enough that the optimizer considers the plan "good enough". The low cost estimation and cardinality for the UDF contributes to this early finish.
Search 0 (also known as the Transaction Processing phase) targets low-cardinality OLTP-type queries, with final plans that usually feature nested loops joins. More importantly, search 0 runs only a relatively small subset of the optimizer's exploration abilities. This subset does not include pulling an apply up the query tree over a join (rule
There is also an issue where the optimizer can decide between naive nested loops join (join predicate on the join itself) and a correlated indexed join (apply) where the correlated predicate is applied on the inner side of the join using an index seek. The latter is usually the desired plan shape, but the optimizer is capable of exploring both. With incorrect costing and cardinality estimates, it can choose the non-apply NL join, as in the submitted plans (explaining the scan).
So, there are multiple interacting reasons involving several general optimizer features that normally work well to find good plans in a short period of time without using excessive resources. Avoiding any one of the reasons is enough to produce the 'expected' plan shape for the sample query, even with empty tables:
There is no supported way to avoid search 0 plan selection, early optimizer termination, or to improve the costing of UDFs (aside from the limited enhancements in the SQL Server 2014 CE model for this). This leaves things like plan guides, manual query rewrites (including the
Rewriting
Overall, the right solution for you depends on a variety of factors that we cannot judge for you. I would, however, encourage you to consider solutions that are guaranteed to always work in future, and that work with (rather than against) the optimizer wherever possible.
- The optimizer's costing framework has no real support for non-inline functions. It does not make any attempt to look inside the function definition to see how expensive it might be, it just assigns a very small fixed cost, and estimates the function will produce 1 row of output each time it is called. Both these modelling assumptions are very often completely unsafe. The situation is very slightly improved in 2014 with the new cardinality estimator enabled since the fixed 1-row guess is replaced with a fixed 100-row guess. There is still no support for costing the content of non-inline functions, however.
- SQL Server initially collapses joins and applies into a single internal n-ary logical join. This helps the optimizer reason about join orders later on. Expanding the single n-ary join into candidate join orders comes later, and is largely based on heuristics. For example, inner joins come before outer joins, small tables and selective joins before large tables and less selective joins, and so on.
- When SQL Server performs cost-based optimization, it splits the effort into optional phases to minimize the chances of spending too long optimizing low-cost queries. There are three main phases, search 0, search 1, and search 2. Each phase has entry conditions, and later phases enable more optimizer explorations than earlier ones. Your query happens to qualify for the least-capable search phase, phase 0. A low enough cost plan is found there that later stages are not entered.
Given the small cardinality estimate assigned to the UDF apply, the n-ary join expansion heuristics unfortunately reposition it earlier in the tree than you would wish.
The query also qualifies for search 0 optimization by virtue of having at least three joins (including applies). The final physical plan you get, with the odd-looking scan, is based on that heuristically-deduced join order. It is costed low enough that the optimizer considers the plan "good enough". The low cost estimation and cardinality for the UDF contributes to this early finish.
Search 0 (also known as the Transaction Processing phase) targets low-cardinality OLTP-type queries, with final plans that usually feature nested loops joins. More importantly, search 0 runs only a relatively small subset of the optimizer's exploration abilities. This subset does not include pulling an apply up the query tree over a join (rule
PullApplyOverJoin). This is exactly what is required in the test case to reposition the UDF apply above the joins, to appear last in the sequence of operations (as it were).There is also an issue where the optimizer can decide between naive nested loops join (join predicate on the join itself) and a correlated indexed join (apply) where the correlated predicate is applied on the inner side of the join using an index seek. The latter is usually the desired plan shape, but the optimizer is capable of exploring both. With incorrect costing and cardinality estimates, it can choose the non-apply NL join, as in the submitted plans (explaining the scan).
So, there are multiple interacting reasons involving several general optimizer features that normally work well to find good plans in a short period of time without using excessive resources. Avoiding any one of the reasons is enough to produce the 'expected' plan shape for the sample query, even with empty tables:
There is no supported way to avoid search 0 plan selection, early optimizer termination, or to improve the costing of UDFs (aside from the limited enhancements in the SQL Server 2014 CE model for this). This leaves things like plan guides, manual query rewrites (including the
TOP (1) idea or using intermediate temporary tables) and avoiding poorly-costed 'black boxes' (from a QO point of view) like non-inline functions.Rewriting
CROSS APPLY as OUTER APPLY can also work, as it currently prevents some of the early join-collapsing work, but you have to be careful to preserve the original query semantics (e.g. rejecting any NULL-extended rows that might be introduced, without the optimizer collapsing back to a cross apply). You need to be aware though that this behaviour is not guaranteed to remain stable, so you would need to remember to retest any such observed behaviours each time you patch or upgrade the SQL Server.Overall, the right solution for you depends on a variety of factors that we cannot judge for you. I would, however, encourage you to consider solutions that are guaranteed to always work in future, and that work with (rather than against) the optimizer wherever possible.
Context
StackExchange Database Administrators Q#104192, answer score: 30
Revisions (0)
No revisions yet.