Tags: gotcha, sql, Minor

Why does SQL Server sometimes estimate that joining onto an empty table will increase the row count?

Submitted by: @import:stackexchange-dba

Problem

I recently came across an issue where tSQLt tests were taking a long time to run.

The procedure under test was doing a 38-table (!) join (with 37 faked tables and a table-valued parameter).

Only two of the faked tables and the TVP had any rows inserted.
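
A stripped-down sketch of the shape of the problem (reusing the T1/T2/T3 names that appear in the trace output below; the real tables and columns differ):

-- T1 gets a few rows (the trace below shows CARD=9 for it); T2 stays empty.
CREATE TABLE dbo.T1 (C1 int NULL);
CREATE TABLE dbo.T2 (C1 int NULL);

INSERT INTO dbo.T1 (C1)
VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Each additional left join onto the empty T2 inflates the estimate.
SELECT *
FROM dbo.T1 AS T1
LEFT JOIN dbo.T2 AS T2 ON T2.C1 = T1.C1
LEFT JOIN dbo.T2 AS T3 ON T3.C1 = T2.C1;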

Compilation times were extremely slow.

Trace flag 8675 showed:

End of simplification, time: 0.002 net: 0.002 total: 0 net: 0.002
end exploration, tasks: 549 no total cost time: 0.013 net: 0.013 total: 0 net: 0.015
end search(0),  cost: 13372.9 tasks: 3517 time: 0.012 net: 0.012 total: 0 net: 0.028
end exploration, tasks: 3983 Cost = 13372.9 time: 0 net: 0 total: 0 net: 0.028
end search(1),  cost: 6706.79 tasks: 10187 time: 0.024 net: 0.024 total: 0 net: 0.052
end exploration, tasks: 10188 Cost = 6706.79 time: 0 net: 0 total: 0 net: 0.052
end search(1),  cost: 6706.79 tasks: 61768 time: 0.165 net: 0.165 total: 0 net: 0.218
*** Optimizer time out abort at task 614400 ***
end search(2),  cost: 6706.79 tasks: 614400 time: 12.539 net: 12.539 total: 12 net: 12.758
*** Optimizer time out abort at task 614400 ***
End of post optimization rewrite, time: 0.001 net: 0.001 total: 12 net: 12.759
End of query plan compilation, time: 0.003 net: 0.003 total: 12 net: 12.762
SQL Server parse and compile time: 
   CPU time = 12735 ms, elapsed time = 12770 ms.
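
For reference, per-query output like the above needs the optimizer trace redirected to the client; something along these lines works (undocumented, sysadmin-only trace flags):

-- 3604 sends trace output to the messages tab; 8675 prints the
-- optimizer's phase and timing summary during compilation.
SELECT *
FROM dbo.T1 AS T1
LEFT JOIN dbo.T2 AS T2 ON T2.C1 = T1.C1
LEFT JOIN dbo.T2 AS T3 ON T3.C1 = T2.C1
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8675, RECOMPILE);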


It looks like the estimated rows grow exponentially with each join between empty tables, until at the end the estimated row count was 135,601,000 and the query had a huge estimated cost, justifying the long compile time.

There was one specific table involved in a lot of these joins, and inserting a single row into that table was enough to stop the explosion for the joins it was involved in (the cardinality estimator output indicates that it then uses the stats histogram from that table).
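
In sketch form the fix was nothing more than this (with T2 standing in for the real table):

-- A single row is enough: the estimator then reads T2's statistics
-- histogram instead of falling back to a guess.
INSERT INTO dbo.T2 (C1) VALUES (1);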

The original behaviour seems weird to me. SQL Server knows that the table it is joining onto is empty, and the plan caching white paper indicates that inserting any row into an empty table triggers a recompilation, so the compiled plan should always reflect that the table really is empty.

Solution

I don't know why the 'new' (or default, as Microsoft would have it) cardinality estimator resorts to a 50% selectivity guess when the 'combined distinct count' is 1, but it does:
Begin selectivity computation

Input tree:

LogOp_LeftOuterJoin
    CStCollOuterJoin(ID=40, CARD=9 x_jtLeftOuter)
        CStCollBaseTable(ID=1, CARD=9 TBL: T1)
        CStCollBaseTable(ID=2, CARD=1 TBL: T2)
    CStCollBaseTable(ID=3, CARD=1 TBL: T2 AS TBL: T3)
    ScaOp_Comp x_cmpEq
        ScaOp_Identifier QCOL: [T3].C1
        ScaOp_Identifier QCOL: [Sandpit].[dbo].[T2].C1

Plan for computation:
CSelCalcSimpleJoinWithDistinctCounts (Using base cardinality)
    CDVCPlanJoin
        Plan for non-join columns (Right)
        CDVCPlanLeaf
            0 Multi-Column Stats, 0 Single-Column Stats, 1 Guesses
        CDVCPlanLeaf
            0 Multi-Column Stats, 0 Single-Column Stats, 1 Guesses

Using ambient cardinality 1 to combine distinct counts:
1

Using ambient cardinality 1 to combine distinct counts:
1

Selectivity: 0.5

Stats collection generated:

CStCollOuterJoin(ID=41, CARD=13.3889 x_jtLeftOuter)
    CStCollOuterJoin(ID=40, CARD=9 x_jtLeftOuter)
        CStCollBaseTable(ID=1, CARD=9 TBL: T1)
        CStCollBaseTable(ID=2, CARD=1 TBL: T2)
    CStCollBaseTable(ID=3, CARD=1 TBL: T2 AS TBL: T3)
End selectivity computation
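
The trace above is the output of undocumented trace flag 2363, which reports the new CE's selectivity calculations; to reproduce it against the sketch tables from the question:

-- 3604 redirects the trace to the client; 2363 traces the new
-- cardinality estimator's selectivity computations.
SELECT *
FROM dbo.T1 AS T1
LEFT JOIN dbo.T2 AS T2 ON T2.C1 = T1.C1
LEFT JOIN dbo.T2 AS T3 ON T3.C1 = T2.C1
OPTION (QUERYTRACEON 3604, QUERYTRACEON 2363, RECOMPILE);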


The 'legacy' (or better, as I would have it) cardinality estimator does not have this issue.
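
If changing the test setup isn't an option, the legacy estimator can be requested explicitly with either of these documented switches (whether that suits the original tSQLt suite is a judgment call):

-- Per query, SQL Server 2016 SP1 and later:
SELECT *
FROM dbo.T1 AS T1
LEFT JOIN dbo.T2 AS T2 ON T2.C1 = T1.C1
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

-- Or for the whole database:
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;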

Context

StackExchange Database Administrators Q#321839, answer score: 4
