HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Bitmap Creation in Execution Plan Causes bad Estimate on Clustered Index Scan

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
estimateclusteredscanbadplanbitmapindexcreationexecutioncauses

Problem

Given the following simple query on the StackOverflow2010 database:

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
        JOIN Posts p
            ON u.id = p.OwnerUserId
WHERE   u.DisplayName = 'alex' AND
        p.CreationDate >= '2010-01-01' AND
        p.CreationDate <= '2010-03-01'


I was trying to understand why creating an index

CREATE INDEX IX_CreationDate ON Posts
(
    CreationDate
)
INCLUDE (OwnerUserId)


yields a better estimate on Posts.CreationDate

When I run the query with no indexes, I get Plan 1. In this plan, SQL Server estimates 298,910 rows coming out of the CI scan on Posts and actually 552 come back - this estimate is way off.

Once I add the index, I get Plan 2, which causes an index seek and much more accurate estimate.

I was curious as to why adding an index would cause a better estimate, since statistics are created when a column is used in a WHERE predicate, regardless of whether it is indexed or not.

On further inspection, I can see the predicate on Posts.CreationDate is different on plan 1 vs plan 2:

Plan 1 predicate

[StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]>='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]<='2010-03-01 00:00:00.000' AND PROBE([Bitmap1002],[StackOverflow2010].[dbo].[Posts].[OwnerUserId] as [p].[OwnerUserId],N'[IN ROW]')


Plan 2 predicate

Seek Keys[1]: Start: [StackOverflow2010].[dbo].[Posts].CreationDate >= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate <= Scalar Operator('2010-03-01 00:00:00.000')


So i can see Plan 2 is just going to use the histogram to find the number of rows between the two dates but Plan 1 has a slightly more complicated predicate involving a bitmap probe.

That (I think) explains why the estimate on the seek is more accurate but I am now wondering what is the bitmap probe? I can see in the plan that there is a

Solution

A number of factors in play:

The index comes with full scan statistics. The auto-created ones were sampled.

Different cardinality estimation models and execution modes handle the calculation differently. You might be happier with the estimate using the original CE model in this case:

USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')


Bitmap plan operators only appear in row mode parallel plans. I wrote about the details in Bitmap Magic (or… how SQL Server uses bitmap filters)

Bitmaps are possible in both serial and parallel batch mode plans. You have your database set to compatibility mode 130 so batch mode on rowstore is not available to you. Side note: you might want to apply the latest CU for 2019—you're still on RTM.

The estimate formula varies, but usually has its roots in estimating a semi join between filtered rows on the build side of the hash join and the target table using histograms. Sometimes it is a guess. Sometimes the bitmap is not considered at all:

In parallel row mode plans, there are two types of bitmap. The original type of bitmap was added heuristically after query optimization has completed. Since it is not present during optimization, it has no effect on cardinality estimates. These bitmaps are named Bitmapxxxx. Yours is one of these:

It is more difficult to see because the effect of the bitmap is mixed in with the CreationDate predicate. We can separate them out with undocumented trace flag 9130:

SELECT
    U.DisplayName,
    U.Reputation
FROM dbo.Users AS U
JOIN dbo.Posts AS P
    ON U.id = P.OwnerUserId
WHERE
    U.DisplayName = N'alex' 
    AND P.CreationDate >= CONVERT(datetime, '2010-01-01', 121)
    AND P.CreationDate <= CONVERT(datetime, '2010-03-01', 121)
OPTION 
(
    USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130'),
    QUERYTRACEON 9130
);


The bitmap is still applied in-row at the scan, but the predicate on CreationDate is in a later Filter operator:

The estimate on the scan is the full cardinality of the base table, despite the bitmap still being applied there:

If you're interested in seeing a plan without the bitmap to compare estimates, you can enable undocumented trace flag 9498.

The second type of row mode bitmap is a so-called optimized bitmap. These are evaluated as part of cost-based optimization, so they do have an effect on cardinality estimates and final plan shape. These bitmaps are named Opt_Bitmapxxx.

I wrote about the details for batch mode in Batch Mode Bitmaps in SQL Server.

Code Snippets

USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')
SELECT
    U.DisplayName,
    U.Reputation
FROM dbo.Users AS U
JOIN dbo.Posts AS P
    ON U.id = P.OwnerUserId
WHERE
    U.DisplayName = N'alex' 
    AND P.CreationDate >= CONVERT(datetime, '2010-01-01', 121)
    AND P.CreationDate <= CONVERT(datetime, '2010-03-01', 121)
OPTION 
(
    USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130'),
    QUERYTRACEON 9130
);

Context

StackExchange Database Administrators Q#306708, answer score: 10

Revisions (0)

No revisions yet.