HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

SQL Server: Accurate Row Estimates without Histogram?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
estimateswithoutsqlhistogramaccurateserverrow

Problem

I'm exploring the optimizer in SQL Server, particularly histograms and I'm seeing some strange, results.

If I execute the query below, where I know there is no histogram on the field, SQL Server outputs a row estimate equal to the actual the result. If I were to run the same query where I use DECLARE @i NUMERIC(19,2) = 30.0 instead of a constant WHERE [Rate] is used or >= as well as 24.0 or 64 format queries where the predicate shows an implict conversion to typemoney.

For WHERE [Rate] > 6.5 the estimate is 28.44 which then stops at rates above 100.

For WHERE [Rate] > 7 where decimals are excluded, the 28.44 estimate runs all the way up to 255

For WHERE [Rate] >= 6.5 the 28.44 estimate runs up to about 100 then changes to 17.7764 up to 1000 before the estimate drops to 1.

So there doesn't seem to be much consistency. It seems like SQL Server is aware of the constraint based on some of the queries I'm experimenting with, but it also ignores its presence at times.

Solution

First, as you noticed, SQL Server will automatically create missing statistics (where possible) when compiling an execution plan, if the database option AUTO_CREATE_STATISTICS is on.

When statistics are not available, the default guess for an unknown inequality predicate is 0.3 (30%). When you use a local variable, the value in the variable cannot be sniffed (unless OPTION (RECOMPILE) is also specified). With a table cardinality of 316, the estimate is 0.3 * 316 = 94.8.

When you use a constant literal, the value is sniffed. Without statistics, it cannot use this sniffed value to check the histogram as it normally would, but it can judge its impact on the CHECK constraint (the one limiting Rate values to between $6.50 and $200).

If the sniffed value does not exclude the check constraint range completely, the estimate is based on the 0.09 (9%) guessed selectivity for a BETWEEN predicate (from the check constraint). 316 * 0.09 = 28.44.

If the sniffed value does exclude the check constraint range completely, the estimate is always 1 row (the cardinality estimator (almost) never produces an estimated number of rows less than 1).

If the query is simple enough to qualify for a trivial plan, and it is considered safe for simple parameterization, the constant literal is replaced with a parameter marker e.g. @1. This happens for a query like:

SELECT * 
FROM HumanResources.EmployeePayHistory AS EPH 
WHERE EPH.Rate > $200;


The execution plan shows an estimate of 1 row:

And the Scan Predicate shows a parameter marker:

If simple parameterization is prevented, e.g. by adding a clause that compares a constant with a constant:

SELECT * 
FROM HumanResources.EmployeePayHistory AS EPH 
WHERE EPH.Rate > $200
AND 1 = 1;


Without parameterization, this plan can never be reused with a different value for the parameter, so the optimizer can statically eliminate table access on the basis that the check constraint prevents any rows being returned. The resuting plan is:

Finally, be careful with types. The type of the Rate column is money, not decimal. Conversions can affect cardinality estimation in complicated ways. Specify a money literal with the $ prefix, or use an explicit CAST or CONVERT.

The internal details of cardinality estimation are not documented publicly (probably to avoid endless ever more detailed questions and complaints when things change), but there are a range of resources out there to aid you in this area. Just remember most of this is unoffical and so not supported by anyone. Some aspects are actually dangerous.

Some apply only to the original (pre-2014) cardinality estimator, some explain general principles applicable to both, and some are applicable to the "new" CE only. The following is not meant as an authoritative or complete list, just the ones that came to mind immediately:

  • The SQL Server 2014 Cardinality Estimator



  • 13 Things You Should Know About Statistics and the Query Optimizer



  • Cardinality Estimation (SQL Server)



  • Constant Folding and Expression Evaluation During Cardinality Estimation



  • Statistics Used by the Query Optimizer



  • Plan Caching and Recompilation



  • Nacho Alonso Portillo's blog (MSFT) (search for cardinality)



  • Dima Piliugin's blog



  • SQLblog.com posts



  • SQLperformance.com posts



  • Microsoft SQL Server Internals (book) Kalen Delaney et al.

Code Snippets

SELECT * 
FROM HumanResources.EmployeePayHistory AS EPH 
WHERE EPH.Rate > $200;
SELECT * 
FROM HumanResources.EmployeePayHistory AS EPH 
WHERE EPH.Rate > $200
AND 1 = 1;

Context

StackExchange Database Administrators Q#125794, answer score: 13

Revisions (0)

No revisions yet.