HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Cardinality estimation problem on inner join

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
problemcardinalityjoinestimationinner

Problem

I'm struggling to understand why row estimation is so terribly wrong, here is my case:

Simple join - using SQL Server 2016 sp2 (same issue on sp1), dbcompatiblity=130.

select Amount_TransactionCurrency_id, CurrencyShareds.id 
from CurrencyShareds 
    INNER JOIN annexes ON Amount_TransactionCurrency_id = CurrencyShareds.Id 
option (QUERYTRACEON 3604, QUERYTRACEON 2363);


SQL estimates 1 row, whereas it's 107131 and chooses to do a nested loop (link to plan). After statistics are updated on CurrencyShareds then estimation is fine and a merge join is chosen (link to new plan). As soon as just one record is added to CurrencyShareds, then statistics become "stale" and sql goes back to wrong estimation.

I wouldn't worry that much about this simple query, but this is just a part of a larger one, and this is the begining of a domino...

Why adding one row to 100 records table causes such a damage?
When looking into the output of cardinality estimation trace, I see this warning WARNING: badly-formed histogram but I couldn't find anything more on this topic.

Here is output the full output from cardinality estimation:

```
Begin selectivity computation
Input tree:

LogOp_Join

CStCollBaseTable(ID=1, CARD=107131 TBL: annexes)

CStCollBaseTable(ID=2, CARD=100 TBL: CurrencyShareds)

ScaOp_Comp x_cmpEq

ScaOp_Identifier QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id

ScaOp_Identifier QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id

Plan for computation:

CSelCalcExpressionComparedToExpression( QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id x_cmpEq QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id )

Loaded histogram for column QCOL: [test.MasterData].[dbo].[Annexes].Amount_TransactionCurrency_id from stats with id 7

Loaded histogram for column QCOL: [test.MasterData].[dbo].[CurrencyShareds].Id from stats with id 1 WARNING: badly-formed histogram

Selectivity: 4.59503e-018

Stats collection generated

Solution

Based on your histograms I was able to repro the issue in 2017 CU6. I wouldn't say that you're doing something wrong. Rather, something is going wrong with cardinality estimation. Here's what I get before inserting a row:

The final cardinality estimate falls quite a bit after inserting a row:

You have a pretty simple repro here so my advice is to file product feedback or to open a support ticket with Microsoft. I was able to find a few workarounds that worked on your sample data and one of the might be acceptable for you.

  • Drop the unique index on CurrencyShareds.Id. I can't get the repro to work without a unique index. The table is small, so maybe you can get by without the index. Of course, you might have very good reasons for keeping it.



  • Materialize the results of the join into a temp table. Based on your question it's important to get a reasonable estimate at this step so the larger query performs well. A temp table is one way to make that happen.



  • Use the legacy CE. I can't get the issue to reproduce with it. Of course, this might have negative consequences on the rest of your query.



  • Trick the query optimizer with silly code. For example, in my testing the following rewrite works great:



.

select Amount_TransactionCurrency_id, CurrencyShareds.id
from CurrencyShareds 
INNER JOIN annexes
ON Amount_TransactionCurrency_id % 9223372036854775809 = CurrencyShareds.Id % 9223372036854775809


I suspect that this works because the CE appears to use the density instead of the histogram. Other similar rewrites may have the same effect. There's no guarantee that type of query will continue to work well in the future. That's why you should contact Microsoft to improve the odds that one day a fix for your issue will make it into the released product.

Code Snippets

select Amount_TransactionCurrency_id, CurrencyShareds.id
from CurrencyShareds 
INNER JOIN annexes
ON Amount_TransactionCurrency_id % 9223372036854775809 = CurrencyShareds.Id % 9223372036854775809

Context

StackExchange Database Administrators Q#207671, answer score: 10

Revisions (0)

No revisions yet.