Actual and Estimated rows differ greatly
Problem
The full Actual plan is here.
Prior to executing the plan (because I'm debugging a poorly functioning plan) I have this block of variable assignments:
DECLARE @Days INT = 180
DECLARE @DateRangeFrom DateTime = DATEADD(d, -@Days, getDate())
DECLARE @DateRangeTo DateTime = getDate()
DECLARE @FacilityID INT = 1010
DECLARE @Answer0 INT = 1879
DECLARE @Answer1 INT = 1949
DECLARE @Answer1SetID INT = 1607
DECLARE @Answer2 INT = 1907
DECLARE @Answer2SetID INT = 1593

My first problem is with the lookup I'm performing on the IRItemAnswer_Info table (Node ID 19). It's spilling to tempdb, which already starts the query off on the wrong foot.
It's referencing the IRItemAnswerInfo_DGItemID_AnswerSourceID index, which is the correct index, as I'm matching on DGItemID and AnswerSourceID and getting back IncidentID. The index is created as:

CREATE NONCLUSTERED INDEX IRItemAnswerInfo_DGItemID_AnswerSourceID
ON dbo.IRItemAnswer_Info (DGItemID, AnswerSourceID)
INCLUDE ([IncidentID], [AnswerBoolean])

However, the Estimated Rows for the query is 53,459 and the Actual Rows is 969,812.
I just finished forcing new statistics via

UPDATE STATISTICS IRItemAnswer_Info IRItemAnswerInfo_DGItemID_AnswerSourceID WITH FULLSCAN

and it made no difference.

DBCC SHOW_STATISTICS ('IRItemAnswer_Info', 'DGItemID') for DGItemID = 1949 has EQ_ROWS as 1,063,536, and DBCC SHOW_STATISTICS ('IRItemAnswer_Info', 'AnswerSourceID') for AnswerSourceID = 1607 has EQ_ROWS as 970,079.

The database is running compatibility level 140 (SQL Server 2017). We would run 2019, but there are issues we need to correct in the stored procedures before we can do that.
What should be the next thing I look at?
I chose the worst-performing output, which uses the most common values.
IRItemAnswer_Info is a table containing user-defined answers to associate to an event, where DGItemID=1949 is one of the most common questions (almost every event has one), and where AnswerSourceID=1607 is …

Solution
As discussed in the related Q & A How does SQL Server know predicates are correlated?, SQL Server assumes by default that predicates are completely independent.
It only has detailed statistical information (histograms) on the single leading column, even where multi-column indexes or statistics are used. The question then is how to combine two statistics histograms from two separate predicates.
For example, say you have a query with WHERE c1 = x AND c2 = y. The selectivity of c1 = x is computed to be 0.2 from histogram information. The selectivity of c2 = y is computed to be 0.1 from a separate histogram.

What is the selectivity of the two predicates together? 0.2? 0.1? 0.2 x 0.1? Somewhere in between?
Without specific additional information, SQL Server has to make an educated guess. The original default was to assume complete independence. The newer cardinality estimation framework uses exponential backoff (the 'somewhere in between' option).
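As a rough illustration (a sketch, not SQL Server's actual code), the two combining strategies can be written out in Python, using the hypothetical 0.2 and 0.1 selectivities from the example above:

```python
def combine_independence(sels):
    """Original CE default: assume full independence and multiply."""
    result = 1.0
    for s in sels:
        result *= s
    return result

def combine_backoff(sels):
    """Newer CE: exponential backoff. The most selective predicate counts
    in full; each subsequent one is dampened by a further square root
    (exponents 1, 1/2, 1/4, 1/8, ...)."""
    result = 1.0
    for i, s in enumerate(sorted(sels)):
        result *= s ** (0.5 ** i)
    return result

sels = [0.2, 0.1]
print(combine_independence(sels))  # independence: 0.2 * 0.1
print(combine_backoff(sels))       # backoff: 0.1 * sqrt(0.2), the 'in between' answer
```

Note how backoff (about 0.045) lands between the independence answer (0.02) and the full-correlation answer (0.1).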
Your case is slightly different, in that you have two equality tests on columns in a multi-column index, which comes with multi-column statistics. These are not so grand as they might sound. We still only get a histogram on the leading column, but the statistics object does contain average density information for multiple columns.
For example an index on (a,b,c) would provide density information for (a), (a,b), and (a,b,c). This frequency information does capture something about correlation, but it is a single number at each level. This means a frequency-based estimate will always produce the same estimate given the same number of columns.
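To make the density idea concrete, here is a small Python sketch (with made-up rows, not SQL Server internals) computing the average density for each index-key prefix, i.e. 1 / (number of distinct values of that prefix), which is the 'All density' figure DBCC SHOW_STATISTICS reports:

```python
def avg_density(rows, cols):
    """1 / number of distinct values of the given column prefix."""
    distinct = {tuple(row[c] for c in cols) for row in rows}
    return 1.0 / len(distinct)

# Hypothetical rows covered by an index on (a, b, c)
rows = [
    {"a": 1, "b": 10, "c": 100},
    {"a": 1, "b": 10, "c": 101},
    {"a": 1, "b": 11, "c": 100},
    {"a": 2, "b": 10, "c": 100},
]

# One density number per prefix: (a), (a, b), (a, b, c)
for prefix in [("a",), ("a", "b"), ("a", "b", "c")]:
    print(prefix, avg_density(rows, prefix))
```

A frequency-based estimate is then just table cardinality times the density for the matched prefix, which is why it gives the same number for any pair of values.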
SQL Server does produce a selectivity estimate from the multi-column frequency information, but it also computes selectivity from the individual column histograms (where available). The histogram estimate assumes independence, and does not use exponential backoff.
The server chooses the histogram-based estimate if it is a higher selectivity than the frequency-based estimate. This appears to be the case in your example.
Plan for computation:
CSelCalcPointPredsFreqBased
Distinct value calculation:
CDVCPlanLeaf
1 Multi-Column Stats, 0 Single-Column Stats, 0 Guesses
Individual selectivity calculations:
CSelCalcColumnInInterval
Column: QCOL: [IIAI].DGItemID
CSelCalcColumnInInterval
Column: QCOL: [IIAI].AnswerSourceID
Loaded histogram for column QCOL: [IIAI].DGItemID from stats with id 2
Loaded histogram for column QCOL: [IIAI].AnswerSourceID from stats with id 3
Cardinality using multi-column statistics 5.45574e-07 and
with independence assumption 0.00231336.
Picking cardinality 0.00231336
Selectivity: 0.00231336
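The 'picking' step in the trace output above is just a comparison of the two candidate selectivities; a minimal sketch:

```python
multi_column_sel = 5.45574e-07  # from the multi-column density information
independence_sel = 0.00231336   # product of the single-column histogram selectivities

# The larger (less selective) candidate wins, matching 'Picking cardinality
# 0.00231336' in the trace output
picked = max(multi_column_sel, independence_sel)
print(picked)  # 0.00231336
```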
From information in the question, the individual selectivities are:
- DGItemID = 1949: 1063536 rows out of 19299400
- AnswerSourceID = 1607: 970079 rows out of 19299400
Assuming independence, for AND we multiply those selectivities, then multiply by the full table cardinality to produce the row estimate:

19299400 * (1063536/19299400) * (970079/19299400) = 53458.3427124
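That arithmetic is easy to check (the EQ_ROWS figures are the ones given in the question):

```python
table_rows = 19299400
dgitem_rows = 1063536        # EQ_ROWS for DGItemID = 1949
answer_source_rows = 970079  # EQ_ROWS for AnswerSourceID = 1607

sel1 = dgitem_rows / table_rows
sel2 = answer_source_rows / table_rows

# Independence assumption: multiply the selectivities, scale by table size
estimate = table_rows * sel1 * sel2
print(estimate)  # roughly 53458.34, matching the plan's 53,459 estimate
```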
There are a number of internal model variations that approach the task in different ways. Only a few are publicly documented and exposed via hints or trace flags.
Ordinarily, it would seem that the following hint could be helpful:
SELECT
COUNT_BIG(*)
FROM [VaxxTracker].[dbo].[IRItemAnswer_Info] AS iiai1
WHERE
iiai1.DGItemID = 1949
AND iiai1.AnswerSourceID = 1607
OPTION (USE HINT('ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'));

Documentation
- 'ASSUME_MIN_SELECTIVITY_FOR_FILTER_ESTIMATES'
Causes SQL Server to generate a plan using minimum selectivity when estimating AND predicates for filters to account for full correlation. This hint name is equivalent to trace flag 4137 when used with cardinality estimation model of SQL Server 2012 (11.x) and earlier versions, and has similar effect when trace flag 9471 is used with cardinality estimation model of SQL Server 2014 (12.x) or higher.
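Under that full-correlation assumption, the combined selectivity is simply the minimum of the individual selectivities. A quick sketch with the numbers from this question shows it would land almost exactly on the actual row count:

```python
table_rows = 19299400
sels = [1063536 / table_rows, 970079 / table_rows]  # DGItemID, AnswerSourceID

# Full correlation: every row matching the more selective predicate is
# assumed to also match the other, so take the minimum selectivity
min_sel_estimate = table_rows * min(sels)
print(round(min_sel_estimate))  # 970079, close to the 969,812 actual rows
```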
Unfortunately, that hint does not apply when cardinality estimation started with a frequency-based calculation, using multi-column statistics.
You might get a better result in your case using the original CE model:
USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')

Context
StackExchange Database Administrators Q#305220, answer score: 6