patternsqlMinor
Does SQL Server use (unused)index stats for optimizer
Viewed 0 times
sqlunusedstatsfordoesoptimizerserverindexuse
Problem
I am wondering about unused index in MS SQL Server. By the Index usage DMV I can identify an index which has not been used for seeks, scans or lookups.
However I know from Oracle that an index might not be used in such a way in a execution plan, however it can still contribute statistics/cardinality information to the (Oracle) optimizer. This contribution is not monitored in the same way.
So I am wondering if in MSSQL a Index can have a similar positive effect even when it is not directly used (in a representative time frame)? And specifically, if it can be better than a column statistic (I.e. dropping the index would be harmful).
I haven’t seen this mentioned in any of the index tuning articles I have come along, so I assume MSSQL (up to 2017) does not have this concept, is that correct?
However I know from Oracle that an index might not be used in such a way in a execution plan, however it can still contribute statistics/cardinality information to the (Oracle) optimizer. This contribution is not monitored in the same way.
So I am wondering if in MSSQL a Index can have a similar positive effect even when it is not directly used (in a representative time frame)? And specifically, if it can be better than a column statistic (I.e. dropping the index would be harmful).
I haven’t seen this mentioned in any of the index tuning articles I have come along, so I assume MSSQL (up to 2017) does not have this concept, is that correct?
Solution
Yes, statistics based on indexes can be used to help with query plan creation even if the underlying index isn't used to access data in the plan. Consider that the query optimizer may consider many different query plans and data access paths while creating a query plan. The compiled query plan may end up not using one of the indexes that was considered. That certainly doesn't mean that any query plan that benefited from the statistics of that index needs to be invalidated, right?
An example might be helpful as well. First I'll throw about 6.5 million rows into a heap:
Next I'll create an index on one of the columns and look at the histogram for the statistics object that is automatically created.
Here's the histogram:
Based on the statistics there are 5806440 rows in the table with a value of 1 for
The query optimizer has a few different access paths for the data. It also has a few choices for how to calculate the aggregate. One of the considerations for the picking an algorithm for the agggregate is the cardinality estimate of the data. Here's a screenshot of the query plan:
Note that the estimate matches the histogram exactly even though the index isn't used to access data. Newer versions of SQL Server show which statistics were considered during optimization in the query plan. You can see that the statistic associated with the index was used:
However, the
An example might be helpful as well. First I'll throw about 6.5 million rows into a heap:
DROP TABLE IF EXISTS dbo.A_GOOD_HEAP;
CREATE TABLE dbo.A_GOOD_HEAP (
INDEXED_COLUMN BIGINT NULL,
OTHER_COLUMN BIGINT NULL
);
INSERT INTO dbo.A_GOOD_HEAP WITH (TABLOCK)
SELECT CASE WHEN t.RN % 10 = 0 THEN 0 ELSE 1 END
, RN
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t;Next I'll create an index on one of the columns and look at the histogram for the statistics object that is automatically created.
CREATE INDEX IX ON dbo.A_GOOD_HEAP (INDEXED_COLUMN);
DBCC SHOW_STATISTICS ('A_GOOD_HEAP', 'IX');Here's the histogram:
╔══════════════╦════════════╦═════════╦═════════════════════╦════════════════╗
║ RANGE_HI_KEY ║ RANGE_ROWS ║ EQ_ROWS ║ DISTINCT_RANGE_ROWS ║ AVG_RANGE_ROWS ║
╠══════════════╬════════════╬═════════╬═════════════════════╬════════════════╣
║ 0 ║ 0 ║ 645160 ║ 0 ║ 1 ║
║ 1 ║ 0 ║ 5806440 ║ 0 ║ 1 ║
╚══════════════╩════════════╩═════════╩═════════════════════╩════════════════╝Based on the statistics there are 5806440 rows in the table with a value of 1 for
INDEXED_COLUMN. Now consider this query:SELECT COUNT(DISTINCT OTHER_COLUMN)
FROM dbo.A_GOOD_HEAP
WHERE INDEXED_COLUMN = 1;The query optimizer has a few different access paths for the data. It also has a few choices for how to calculate the aggregate. One of the considerations for the picking an algorithm for the agggregate is the cardinality estimate of the data. Here's a screenshot of the query plan:
Note that the estimate matches the histogram exactly even though the index isn't used to access data. Newer versions of SQL Server show which statistics were considered during optimization in the query plan. You can see that the statistic associated with the index was used:
However, the
sys.dm_db_index_usage_stats dmv doesn't report any end user activity on the index.Code Snippets
DROP TABLE IF EXISTS dbo.A_GOOD_HEAP;
CREATE TABLE dbo.A_GOOD_HEAP (
INDEXED_COLUMN BIGINT NULL,
OTHER_COLUMN BIGINT NULL
);
INSERT INTO dbo.A_GOOD_HEAP WITH (TABLOCK)
SELECT CASE WHEN t.RN % 10 = 0 THEN 0 ELSE 1 END
, RN
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
) t;CREATE INDEX IX ON dbo.A_GOOD_HEAP (INDEXED_COLUMN);
DBCC SHOW_STATISTICS ('A_GOOD_HEAP', 'IX');╔══════════════╦════════════╦═════════╦═════════════════════╦════════════════╗
║ RANGE_HI_KEY ║ RANGE_ROWS ║ EQ_ROWS ║ DISTINCT_RANGE_ROWS ║ AVG_RANGE_ROWS ║
╠══════════════╬════════════╬═════════╬═════════════════════╬════════════════╣
║ 0 ║ 0 ║ 645160 ║ 0 ║ 1 ║
║ 1 ║ 0 ║ 5806440 ║ 0 ║ 1 ║
╚══════════════╩════════════╩═════════╩═════════════════════╩════════════════╝SELECT COUNT(DISTINCT OTHER_COLUMN)
FROM dbo.A_GOOD_HEAP
WHERE INDEXED_COLUMN = 1;Context
StackExchange Database Administrators Q#206614, answer score: 3
Revisions (0)
No revisions yet.