patternsqlModerate

Estimations seem too low/inaccurate for nested loops operator

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

seemoperatorlowinaccurateloopstoonestedestimationsfor

Problem

This question is a follow-up to somebody else's question: Adding an INNER JOIN ruins query performance due to different execution plan despite updated STATISTICS and RECOMPILE, why?

My question is about a small part of the execution plan.

The Index Seek operator on the Events.nci_wi_... index (Node 22) estimates to return 77191 rows. This is an outer input for the Nested Loops operator (Node 19). The inner input is an index seek on Locations.UC_LocationID (Node 23). The estimated number of executions of this node is 2614.

My naive understanding was that since Nested Loops is being fed 77191 rows on the outer it has no other choice but to execute the inner operators exactly as many times (at least with the inner join logical operator). Why is there a difference here?

The Nested Loops itself estimates to return 40 rows. This is three orders of magnitude less than the number of rows on the outer input.

I understand this is possible when QO needs to estimate the number of matching rows statistically, but in this particular case there exists a foreign key constraint between Events and Location on columns (LocationId, TenantID), also TenantID and LocationID are NOT NULL if Events. I would then expect the optimizer to assume, that for every row in the outer input there exists exactly one row in the inner input, to the estimated number of rows for the Nested Loops should be 77191 rows as well. Why does this not happen here?

I am also curious why Nested Loops was chosen in this scenario and why the spill occurs at the Sort.

Solution

General consistency

SQL Server does not make any general guarantees about the consistency of estimates in execution plans. The estimates do start off being consistent because cardinality estimation is performed on the entire original tree early on. Thereafter, different subtrees (parts of the visible plan) can be re-estimated at different times and in different ways.

As you may know, the cost-based optimizer can explore logically equivalent representations for parts of the tree. When a new logical equivalence is generated for some part of the query tree, the optimizer may need to derive new cardinality estimates and other local properties to work out the cost of the new alternative. Bear in mind, the logical alternative may have quite different operators from the original tree fragment, and more or fewer of them.

Since such derivations are statistical by nature and sensitive to the replacement subtree shape, the new estimates and properties may be quite different from the originals. There's no general way to know which to prefer, even where there is a straight choice to be made.

Ultimately, the optimizer chooses the cheapest option according to its cost model for each part of the tree it considers separately. The final plan is a 'stitched together' patchwork of the cheapest per-group options. There can easily be mismatches in cardinality estimation and other properties at the seams.
This example

Looking at the estimated row counts in Plan Explorer:

You can see the 25 and 41 row estimates are popular. This is a strong indication those numbers featured in the initial round of cardinality estimation.

Focusing on the estimates in the area of interest:

The much larger estimates are indicative of a re-estimation on a new candidate sub-tree, with a different shape and potentially different predicates. This is just the normal operation of a cost-based optimizer with exploration capabilities.

Both sets of estimates will be logically sound and defensible, given the context they were derived in. They disagree with each other, but that's just the nature of such things. One estimate turned out to be closer to reality than the other. The trick would be to know which estimate was better in advance.

As an aside, the low initial estimate of 41 rows caused the optimizer to prefer a nested loops (or a rather an apply) physical implementation in that area of the plan. It also ultimately causes the sort to spill since memory is assigned proportional to cardinality and average row size.
Other details

The index seek at node 22 is subject to an optimized bitmap filter, pushed down from the probe side of the hash join at node 8. Further, it has the IN ROW optimization applied, which means it is pushed right down into the storage engine and evaluated by the access method, before any qualifying rows are surfaced to the query processor.

This is an optimized bitmap filter, so it was present, and its effects considered, during query optimization. This is different from a static bitmap filter introduced heuristically after optimization is complete. The static type of bitmap doesn't affect cardinality estimates for that reason.

However, it is more complex that than. An IN ROW bitmap has two components: a small bitmap capable of being evaluated in the storage engine, and the full bitmap, which is applied by the query processor as a residual. That difference only manifests in post-execution 'actual' row count effects (e.g. the Number of Rows Read) but I mention it for completeness.

The effect of a bitmap filter is usually evaluated at an exchange operator (repartition streams) on the probe side of the hash join. The cost of this alternative helps the optimizer decide whether to use a hash join with bitmaps or not.

As a separate optimization, subsequent processing tries to push this filter as far down the tree from the exchange as possible. This later processing is quite basic and does not attempt to re-derive costs or estimates for dependent operators. It falls into the class of things considered 'always good'.

There happens to be no exchange on the probe side of the hash join in this plan because the optimizer ultimately chose to use Broadcast partitioning for the exchange at node 10. All rows are sent to all threads, so no probe-side partitioning is needed. Nevertheless, the bitmap filter is pushed down from where the exchange would have been.

The optimizer doesn't start with a hash join + bitmaps plan. It's only ever considered as an alternative during a parallel-enabled run of cost-based exploration.

I can't give you a precise answer with all the numbers and transformation details without access to the database producing this plan. Even so, it is a pattern that's familiar to me through experience: Some of the estimates were derived early on and survived the optimization process. Others were derived during cost-based optimization. The final franken-plan includes aspects from each.

You also mention FK relationships.

Code Snippets

SELECT P.ProductID 
FROM Production.Product AS P
JOIN 
(
    SELECT TH.TransactionID, TH.ProductID 
    FROM Production.TransactionHistory AS TH
    UNION ALL 
    SELECT THA.TransactionID, THA.ProductID
    FROM Production.TransactionHistoryArchive AS THA
) AS U
    ON U.ProductID = P.ProductID
OPTION (FORCE ORDER);

Context

StackExchange Database Administrators Q#318844, answer score: 12

Revisions (0)

No revisions yet.