patternsqlModerate
statistics are up to date, but estimate is incorrect
Viewed 0 times
estimateincorrectarebutdatestatistics
Problem
When I do
For this query:
I get a query plan that makes a Clustered Index Seek on
But what baffles me is the incorrect value for Estimated Number of Rows:
According to this:
When the sample query WHERE clause value is equal to a histogram
RANGE_HI_KEY value, SQL Server will use the EQ_ROWS column in the
histogram to determine the number of rows that are equal to
This is also the way I would expect it to be, however it seems not to be the case in real life. I also tried some other
All in all: Can someone explain me why is
A bit more (possibly helpful) info:
The query seems to load a total of 5 different statistics objects, all of which contain
```
+-------------------------------------------------------------------------+----------+--------------+--------------+---------------------+--------------+------------+----------+---------------------+----------------+
| name
dbcc show_statistics ('Reports_Documents', PK_Reports_Documents) I get the following result for Report ID 18698: For this query:
SELECT *
FROM Reports_Documents
WHERE ReportID = 18698 option (recompile)I get a query plan that makes a Clustered Index Seek on
PK_Reports_Documents as expected. But what baffles me is the incorrect value for Estimated Number of Rows:
According to this:
When the sample query WHERE clause value is equal to a histogram
RANGE_HI_KEY value, SQL Server will use the EQ_ROWS column in the
histogram to determine the number of rows that are equal to
This is also the way I would expect it to be, however it seems not to be the case in real life. I also tried some other
RANGE_HI_KEY values that were present in the histogram provided by show_statistics and experienced the same. This issue in my case seems to cause some queries to use very unoptimal execution plans resulting in an execution time of a few minutes whereas I can get it to run in 1 sec with a query hint.All in all: Can someone explain me why is
EQ_ROWS from the histogram not being used for the Estimated Number of Rows and where does the incorrect estimate come from?A bit more (possibly helpful) info:
- Auto create statistics is on and all the statistics are up to date.
- The table being queried has about 80 million rows.
PK_Reports_Documentsis a combination PK consisting ofReportID INTandDocumentID CHAR(8)
The query seems to load a total of 5 different statistics objects, all of which contain
ReportID + some other columns from the table. They have all been freshly updated. RANGE_HI_KEY in the table below is the highest upper bound column value in the histogram.```
+-------------------------------------------------------------------------+----------+--------------+--------------+---------------------+--------------+------------+----------+---------------------+----------------+
| name
Solution
There is a simple solution to this:
Drop all of the
More information
The particular problem was that there were multiple sets of statistics for the column in question. The extra
As is often the case with sampled statistics, the resulting histograms did not cover the full range of the unerlying data. The query in the question happened to choose a value that was outside the histogram, resulting in a 1-row estimate.
The exact behaviour of the query optimizer when multiple sets of statistics exist for the same column is not fully documented. It does tend to prefer 'full scan' statistics over sampled, but it also prefers more recently-updated statistics to older ones.
Drop all of the
_dta_... statistics and stop blindly applying DTA recommendations.More information
The particular problem was that there were multiple sets of statistics for the column in question. The extra
dta statistics were created by sampling the data (the default behaviour for statistics not associated with an index).As is often the case with sampled statistics, the resulting histograms did not cover the full range of the unerlying data. The query in the question happened to choose a value that was outside the histogram, resulting in a 1-row estimate.
The exact behaviour of the query optimizer when multiple sets of statistics exist for the same column is not fully documented. It does tend to prefer 'full scan' statistics over sampled, but it also prefers more recently-updated statistics to older ones.
Context
StackExchange Database Administrators Q#93900, answer score: 10
Revisions (0)
No revisions yet.