patternsqlModerate
Bitmap Creation in Execution Plan Causes bad Estimate on Clustered Index Scan
Viewed 0 times
estimateclusteredscanbadplanbitmapindexcreationexecutioncauses
Problem
Given the following simple query on the StackOverflow2010 database:
I was trying to understand why creating an index
yields a better estimate on
When I run the query with no indexes, I get Plan 1. In this plan, SQL Server estimates 298,910 rows coming out of the CI scan on Posts and actually 552 come back - this estimate is way off.
Once I add the index, I get Plan 2, which causes an index seek and much more accurate estimate.
I was curious as to why adding an index would cause a better estimate, since statistics are created when a column is used in a
On further inspection, I can see the predicate on
Plan 1 predicate
Plan 2 predicate
So i can see Plan 2 is just going to use the histogram to find the number of rows between the two dates but Plan 1 has a slightly more complicated predicate involving a bitmap probe.
That (I think) explains why the estimate on the seek is more accurate but I am now wondering what is the bitmap probe? I can see in the plan that there is a
SELECT u.DisplayName,
u.Reputation
FROM Users u
JOIN Posts p
ON u.id = p.OwnerUserId
WHERE u.DisplayName = 'alex' AND
p.CreationDate >= '2010-01-01' AND
p.CreationDate <= '2010-03-01'I was trying to understand why creating an index
CREATE INDEX IX_CreationDate ON Posts
(
CreationDate
)
INCLUDE (OwnerUserId)yields a better estimate on
Posts.CreationDateWhen I run the query with no indexes, I get Plan 1. In this plan, SQL Server estimates 298,910 rows coming out of the CI scan on Posts and actually 552 come back - this estimate is way off.
Once I add the index, I get Plan 2, which causes an index seek and much more accurate estimate.
I was curious as to why adding an index would cause a better estimate, since statistics are created when a column is used in a
WHERE predicate, regardless of whether it is indexed or not.On further inspection, I can see the predicate on
Posts.CreationDate is different on plan 1 vs plan 2:Plan 1 predicate
[StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]>='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]<='2010-03-01 00:00:00.000' AND PROBE([Bitmap1002],[StackOverflow2010].[dbo].[Posts].[OwnerUserId] as [p].[OwnerUserId],N'[IN ROW]')Plan 2 predicate
Seek Keys[1]: Start: [StackOverflow2010].[dbo].[Posts].CreationDate >= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate <= Scalar Operator('2010-03-01 00:00:00.000')So i can see Plan 2 is just going to use the histogram to find the number of rows between the two dates but Plan 1 has a slightly more complicated predicate involving a bitmap probe.
That (I think) explains why the estimate on the seek is more accurate but I am now wondering what is the bitmap probe? I can see in the plan that there is a
Solution
A number of factors in play:
The index comes with full scan statistics. The auto-created ones were sampled.
Different cardinality estimation models and execution modes handle the calculation differently. You might be happier with the estimate using the original CE model in this case:
Bitmap plan operators only appear in row mode parallel plans. I wrote about the details in Bitmap Magic (or… how SQL Server uses bitmap filters)
Bitmaps are possible in both serial and parallel batch mode plans. You have your database set to compatibility mode 130 so batch mode on rowstore is not available to you. Side note: you might want to apply the latest CU for 2019—you're still on RTM.
The estimate formula varies, but usually has its roots in estimating a semi join between filtered rows on the build side of the hash join and the target table using histograms. Sometimes it is a guess. Sometimes the bitmap is not considered at all:
In parallel row mode plans, there are two types of bitmap. The original type of bitmap was added heuristically after query optimization has completed. Since it is not present during optimization, it has no effect on cardinality estimates. These bitmaps are named
It is more difficult to see because the effect of the bitmap is mixed in with the CreationDate predicate. We can separate them out with undocumented trace flag 9130:
The bitmap is still applied in-row at the scan, but the predicate on CreationDate is in a later Filter operator:
The estimate on the scan is the full cardinality of the base table, despite the bitmap still being applied there:
If you're interested in seeing a plan without the bitmap to compare estimates, you can enable undocumented trace flag 9498.
The second type of row mode bitmap is a so-called optimized bitmap. These are evaluated as part of cost-based optimization, so they do have an effect on cardinality estimates and final plan shape. These bitmaps are named
I wrote about the details for batch mode in Batch Mode Bitmaps in SQL Server.
The index comes with full scan statistics. The auto-created ones were sampled.
Different cardinality estimation models and execution modes handle the calculation differently. You might be happier with the estimate using the original CE model in this case:
USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')Bitmap plan operators only appear in row mode parallel plans. I wrote about the details in Bitmap Magic (or… how SQL Server uses bitmap filters)
Bitmaps are possible in both serial and parallel batch mode plans. You have your database set to compatibility mode 130 so batch mode on rowstore is not available to you. Side note: you might want to apply the latest CU for 2019—you're still on RTM.
The estimate formula varies, but usually has its roots in estimating a semi join between filtered rows on the build side of the hash join and the target table using histograms. Sometimes it is a guess. Sometimes the bitmap is not considered at all:
In parallel row mode plans, there are two types of bitmap. The original type of bitmap was added heuristically after query optimization has completed. Since it is not present during optimization, it has no effect on cardinality estimates. These bitmaps are named
Bitmapxxxx. Yours is one of these:It is more difficult to see because the effect of the bitmap is mixed in with the CreationDate predicate. We can separate them out with undocumented trace flag 9130:
SELECT
U.DisplayName,
U.Reputation
FROM dbo.Users AS U
JOIN dbo.Posts AS P
ON U.id = P.OwnerUserId
WHERE
U.DisplayName = N'alex'
AND P.CreationDate >= CONVERT(datetime, '2010-01-01', 121)
AND P.CreationDate <= CONVERT(datetime, '2010-03-01', 121)
OPTION
(
USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130'),
QUERYTRACEON 9130
);The bitmap is still applied in-row at the scan, but the predicate on CreationDate is in a later Filter operator:
The estimate on the scan is the full cardinality of the base table, despite the bitmap still being applied there:
If you're interested in seeing a plan without the bitmap to compare estimates, you can enable undocumented trace flag 9498.
The second type of row mode bitmap is a so-called optimized bitmap. These are evaluated as part of cost-based optimization, so they do have an effect on cardinality estimates and final plan shape. These bitmaps are named
Opt_Bitmapxxx.I wrote about the details for batch mode in Batch Mode Bitmaps in SQL Server.
Code Snippets
USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')SELECT
U.DisplayName,
U.Reputation
FROM dbo.Users AS U
JOIN dbo.Posts AS P
ON U.id = P.OwnerUserId
WHERE
U.DisplayName = N'alex'
AND P.CreationDate >= CONVERT(datetime, '2010-01-01', 121)
AND P.CreationDate <= CONVERT(datetime, '2010-03-01', 121)
OPTION
(
USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130'),
QUERYTRACEON 9130
);Context
StackExchange Database Administrators Q#306708, answer score: 10
Revisions (0)
No revisions yet.