patternsqlModerate
Cardinality Estimation for >= and > for intra step statistics value
Viewed 0 times
cardinalityvaluestepstatisticsforandestimationintra
Problem
I am trying to understand how SQL Server try to estimate for 'greater than' and 'greater than equal to' where clauses in SQL Server 2014.
I think I do understand the cardinality estimation when it hits the step for example if I do
The cardinality estimation is 6672 which can be easily calculated as
32(EQ_ROWS) + 6624(RANGE_ROWS) + 16 (EQ_ROWS) = 6672 (histogram in below screenshot)
But when I do
(increased the time to 10:48 so its not a step)
the estimate is 4844.13.
How is that calculated?
I think I do understand the cardinality estimation when it hits the step for example if I do
select * from charge where charge_dt >= '1999-10-13 10:47:38.550'The cardinality estimation is 6672 which can be easily calculated as
32(EQ_ROWS) + 6624(RANGE_ROWS) + 16 (EQ_ROWS) = 6672 (histogram in below screenshot)
But when I do
select * from charge where charge_dt >= '1999-10-13 10:48:38.550'(increased the time to 10:48 so its not a step)
the estimate is 4844.13.
How is that calculated?
Solution
The only difficulty is in deciding how to handle the histogram step(s) partially covered by the query predicate interval. Whole histogram steps covered by the predicate range are trivial as noted in the question.
Legacy Cardinality Estimator
The basic idea is to use
Cardinality = EQ_ROWS + (AVG_RANGE_ROWS F DISTINCT_RANGE_ROWS)
The same formula is used for
New Cardinality Estimator
The new CE modifies the previous algorithm slightly to differentiate between
Taking
Cardinality = EQ_ROWS + (AVG_RANGE_ROWS (F (DISTINCT_RANGE_ROWS - 1)))
For
Cardinality = EQ_ROWS + (AVG_RANGE_ROWS ((F (DISTINCT_RANGE_ROWS - 1)) + 1))
The
In the question example,
The result is 0.728219019233034. Plugging that into the formula for
Cardinality = EQ_ROWS + (AVG_RANGE_ROWS ((F (DISTINCT_RANGE_ROWS - 1)) + 1))
= 16 + (16.1956 ((0.728219019233034 (409 - 1)) + 1))
= 16 + (16.1956 ((0.728219019233034 408) + 1))
= 16 + (16.1956 * (297.113359847077872 + 1))
= 16 + (16.1956 * 298.113359847077872)
= 16 + 4828.1247307393343837632
= 4844.1247307393343837632
= 4844.12473073933 (to float precision)
This result agrees with the estimate of 4844.13 shown in the question.
The same query using the legacy CE (e.g. using trace flag 9481) should produce an estimate of:
Cardinality = EQ_ROWS + (AVG_RANGE_ROWS F DISTINCT_RANGE_ROWS)
= 16 + (16.1956 0.728219019233034 409)
= 16 + 4823.72307468722
= 4839.72307468722
Note the estimate would be the same for
Legacy Cardinality Estimator
F = fraction (between 0 and 1) of the step range covered by the query predicate.The basic idea is to use
F (linear interpolation) to determine how many of the intra-step distinct values are covered by the predicate. Multiplying this result by the average number of rows per distinct value (assuming uniformity), and adding the step equal rows gives the cardinality estimate:Cardinality = EQ_ROWS + (AVG_RANGE_ROWS F DISTINCT_RANGE_ROWS)
The same formula is used for
> and >= in the legacy CE.New Cardinality Estimator
The new CE modifies the previous algorithm slightly to differentiate between
> and >=.Taking
> first, the formula is:Cardinality = EQ_ROWS + (AVG_RANGE_ROWS (F (DISTINCT_RANGE_ROWS - 1)))
For
>= it is:Cardinality = EQ_ROWS + (AVG_RANGE_ROWS ((F (DISTINCT_RANGE_ROWS - 1)) + 1))
The
+ 1 reflects that when the comparison involves equality, a match is assumed (the inclusion assumption).In the question example,
F can be calculated as:DECLARE
@Q datetime = '1999-10-13T10:48:38.550',
@K1 datetime = '1999-10-13T10:47:38.550',
@K2 datetime = '1999-10-13T10:51:19.317';
DECLARE
@QR float = DATEDIFF(MILLISECOND, @Q, @K2), -- predicate range
@SR float = DATEDIFF(MILLISECOND, @K1, @K2) -- whole step range
SELECT
F = @QR / @SR;The result is 0.728219019233034. Plugging that into the formula for
>= with the other known values:Cardinality = EQ_ROWS + (AVG_RANGE_ROWS ((F (DISTINCT_RANGE_ROWS - 1)) + 1))
= 16 + (16.1956 ((0.728219019233034 (409 - 1)) + 1))
= 16 + (16.1956 ((0.728219019233034 408) + 1))
= 16 + (16.1956 * (297.113359847077872 + 1))
= 16 + (16.1956 * 298.113359847077872)
= 16 + 4828.1247307393343837632
= 4844.1247307393343837632
= 4844.12473073933 (to float precision)
This result agrees with the estimate of 4844.13 shown in the question.
The same query using the legacy CE (e.g. using trace flag 9481) should produce an estimate of:
Cardinality = EQ_ROWS + (AVG_RANGE_ROWS F DISTINCT_RANGE_ROWS)
= 16 + (16.1956 0.728219019233034 409)
= 16 + 4823.72307468722
= 4839.72307468722
Note the estimate would be the same for
> and >= using the legacy CE.Code Snippets
DECLARE
@Q datetime = '1999-10-13T10:48:38.550',
@K1 datetime = '1999-10-13T10:47:38.550',
@K2 datetime = '1999-10-13T10:51:19.317';
DECLARE
@QR float = DATEDIFF(MILLISECOND, @Q, @K2), -- predicate range
@SR float = DATEDIFF(MILLISECOND, @K1, @K2) -- whole step range
SELECT
F = @QR / @SR;Context
StackExchange Database Administrators Q#148523, answer score: 14
Revisions (0)
No revisions yet.