patternsqlMinor
why is this left join faster than an inner join?
Viewed 0 times
thisleftwhythanjoinfasterinner
Problem
So I'm tuning this query, and I am pretty sure that in this instance, I can replace an inner join with a left join without affecting the data. However, I'm not entirely sure why this is faster. Here is the query:
The bottleneck is at the table-value function
SELECT DISTINCT cl.NAME AS company_name,
cl.id AS company_id,
ep.Plan__c AS plan_id,
ep.Employee__c,
ep.id,
do.Subcategory__c,
ep.Plan_Type__c,
pt.SubType,
Sum((pt.[shares] * fvh.[ValuePerShare])) AS TotalValue,
ppe.Deferral_Option__c,
dt.Defer_type_Code,
do.Short_Code__c,
pt.ContributionYear
FROM dbo.ParticipantTrades pt WITH (NOLOCK)
INNER JOIN dbo.PayoutPathElection ppe WITH (NOLOCK)
ON pt.payoutPathElectionID = ppe.Id
INNER JOIN dbo.DeferralOption do WITH (NOLOCK)
ON ppe.Deferral_Option__c = do.id
INNER JOIN dbo.EmployeePlan ep WITH (NOLOCK)
ON pt.employeePlan = ep.Id
LEFT JOIN dbo.DeferralType dt WITH (NOLOCK)
ON pt.deferralType = dt.defID
INNER JOIN dbo.Fnc_lastfundvalue('2019-01-30') AS fvh
ON pt.fund = fvh.Fund
INNER JOIN dbo.Clients cl with (NOLOCK)
ON ep.Company__c = cl.Id
WHERE ep.Company__c = '0017000001WL1HfAAL' AND ep.Plan_Type__c LIKE '%' AND pt.tradeDate <= '2019-01-30'
Group by cl.NAME,
cl.id,
ep.Plan__c,
ep.Employee__c,
ep.id,
do.Subcategory__c,
ep.Plan_Type__c,
pt.SubType,
ppe.Deferral_Option__c,
dt.Defer_type_Code,
do.Short_Code__c,
pt.ContributionYearThe bottleneck is at the table-value function
Solution
It looks as though some of the difference is because the slow plan was run first.
The slow plan was clearly operating against a cold cache as it shows additional physical reads and managed to accumulate an additional 15 seconds of
This looks as though it affected both the execution of the TVF itself (which took
You say in the comments that on second execution it took
The main problem you are experiencing is due to the poor default estimation for multi statement TVFs (fixed guess of
In the inner join case because inner joins are associative and commutative it can be re-ordered flexibly and gets moved to the deepest part of the tree. This would make sense if one row was actually returned as it could whittle down the row count early for the other joins in the plan.
The execution plan is below. The pink annotations are "Actual Elapsed Time (ms)" from the XML.
Because of the 1 row estimate it starts off badly by joining onto
When you add the
Whilst the estimate for the number of rows coming out of the TVF is still
There are still cardinality estimation errors in the outer join plan but not of the same magnitude and it requests a sufficient memory grant to avoid spilling anywhere.
This cardinality estimation issue has been resolved in the most recent version with interleaved execution. In the meantime (as you say in the comments it needs to be 2008 compatible) you can manually interleave it by storing the TVF result into a
The slow plan was clearly operating against a cold cache as it shows additional physical reads and managed to accumulate an additional 15 seconds of
PAGEIOLATCH_SH waits compared to the fast case.This looks as though it affected both the execution of the TVF itself (which took
5.5 seconds compared with 1.35 in the fast case.) and the wider plan using the result of it.You say in the comments that on second execution it took
11 seconds. This is still 3 times slower than the fast plan (3.446 seconds) so doesn't explain all the performance difference.The main problem you are experiencing is due to the poor default estimation for multi statement TVFs (fixed guess of
100 rows in compatibility levels 2014/2016 and 1 in earlier versions). In reality your TVF returns 1,715 rows.In the inner join case because inner joins are associative and commutative it can be re-ordered flexibly and gets moved to the deepest part of the tree. This would make sense if one row was actually returned as it could whittle down the row count early for the other joins in the plan.
The execution plan is below. The pink annotations are "Actual Elapsed Time (ms)" from the XML.
Because of the 1 row estimate it starts off badly by joining onto
dbo.ParticipantTrades and selecting a plan with nested loops and lookups. That nested loops has an elapsed time of 13.976 seconds (presumably most of which was spent waiting on the spilling sort immediately upstream to request rows from it, with 3.26 seconds taken up on the lookups themselves and associated IO waits for the physical reads at that operator).923,646 rows are emitted from the join vs an estimated 1423.18 and this mis-estimate propagates upwards in the plan through 3 sorts and a hash join spilling as it goes (due to the row underestimate)When you add the
LEFT JOIN it can not be as freely re-ordered and the join happens much higher up in the plan (where it would do less damage in the problematic INNER JOIN case). The semantics of outer join help here anyway. Whilst the estimate for the number of rows coming out of the TVF is still
1 this does not adversely affect the estimates for the join it is directly involved in (SQL Server assumes that the cardinality coming out of that join will be the same as that of the other sub tree to the join and this is in fact what happens - an outer join cannot reduce this number as a non joined row would still pass through - just with NULL for the fvh columns).There are still cardinality estimation errors in the outer join plan but not of the same magnitude and it requests a sufficient memory grant to avoid spilling anywhere.
This cardinality estimation issue has been resolved in the most recent version with interleaved execution. In the meantime (as you say in the comments it needs to be 2008 compatible) you can manually interleave it by storing the TVF result into a
#temp table and joining onto that to allow the count of the intermediate result (and column statistics) to be taken into account.Context
StackExchange Database Administrators Q#229165, answer score: 6
Revisions (0)
No revisions yet.