Is there any reason for an index to contain the primary key?
Problem
I inherited a project using SQL Server 2005. Recently there were some performance problems and I started looking at the indexes, and found
_dta_index_survey_25_135059617__K1_K10_K19:
id ASC, sent_date ASC, group_id ASC

and

_dta_index_survey_21_364632442__K18_K1_K2_K9:
group_id ASC, id ASC, campaign_id ASC, sent_date ASC

which appears to be generated by some kind of automatic performance tuning tool. I think SQL Server comes with something like that but I'm not familiar with it.
Anyway, id is the primary key, and what I'm trying to understand here is how it could possibly be beneficial to include the primary key within a secondary index... because you have to have the leftmost pieces of information to use an index, and if you know id then you've got the row anyway.

Can someone explain why a tool would have generated these indexes and if there's any point in keeping them?
Just because the indexes are getting used doesn't mean they're a good design. SQL Server is using the wrong indexes lately, and I'm trying to determine why. More specifically, queries against this table that ran in milliseconds a month ago are now taking multiple seconds. The table is huge (4M rows), and growing steadily, but at the same pace it's been going at for years.
Almost every query joins across group_id or campaign_id. Often both. The application almost never specifically selects or sorts by the PK, but I could see how ID could be useful when comparing or merging lists of IDs within the database engine. ... But id is also an identity column, and an index holds lists of row ids, so it still seems redundant to have the ID in the key when it's also in the index "buckets". And the additional fields could still be covered by the index without being part of the key.

I discovered that (id, sent_date, group_id) hardly ever gets read. Meanwhile (group_id, id, campaign_id, sent_date) is one of the hottest indexes. Next I found that an index...

Solution
You are right to be suspicious of the two indexes that you listed in your question, but there are a few narrow use cases for including the primary key in an index as a key column. By default, SQL Server makes the primary key the clustered index key (unless the table already has a clustered index), and the clustered key is the concept that matters more for performance here, so instead of referring to primary keys I'll refer to clustered keys.
For the index on id, sent_date, and group_id, queries that select only those columns may benefit because the index is a covering index for them. You say that if you know id then you've got the row, but what you really have is a very efficient way to read all of the data in the data pages for that row. Suppose your (rowstore) table has only a clustered index on the id column. Consider the following query:

SELECT id, sent_date, group_id
FROM your_table
ORDER BY id;

SQL Server is likely to perform a clustered index scan to satisfy that query. All of the data in the data pages will be read, including columns in the table that the query doesn't reference.
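One way to measure this on your own data is sketched below. It assumes the table is dbo.your_table (the schema name is an assumption) and uses a hypothetical index name, IX_your_table_id_incl. The index keeps id as its only key column and carries sent_date and group_id as INCLUDE columns (supported since SQL Server 2005); the script then compares the logical reads that SET STATISTICS IO reports for the query above when it is forced onto the new index and onto the clustered index.

```sql
-- Hypothetical covering index: id is the only key column (it supplies the
-- ORDER BY id order); sent_date and group_id live only at the leaf level.
CREATE NONCLUSTERED INDEX IX_your_table_id_incl
    ON dbo.your_table (id)
    INCLUDE (sent_date, group_id);
GO

-- Report logical reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Forced onto the narrow covering index.
SELECT id, sent_date, group_id
FROM dbo.your_table WITH (INDEX (IX_your_table_id_incl))
ORDER BY id;

-- Forced onto the clustered index (INDEX (1) always means the clustered index).
SELECT id, sent_date, group_id
FROM dbo.your_table WITH (INDEX (1))
ORDER BY id;

SET STATISTICS IO OFF;
```

If the table has wide columns, the first statement should report noticeably fewer logical reads; if the numbers are close, the extra index mostly adds write and storage overhead.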
After creating the _dta_index_survey_25_135059617__K1_K10_K19 index, SQL Server will be able to satisfy that query with a scan of the new index. The index may take up fewer pages than the table, so that could be more efficient from an IO perspective. For example, suppose that you had a heavily populated VARCHAR(2000) column. That column is stored in the data pages of the clustered index but not in the nonclustered index, so satisfying the query with the nonclustered index could incur fewer logical reads.

Note that sent_date and group_id aren't useful as key columns for this example; they could be INCLUDE columns for the same benefit. Also note that without the ORDER BY clause, an index on just sent_date and group_id would also be a covering index, because every nonclustered index on a clustered table carries the clustering key (id) as its row locator. So having id as a key column wouldn't be useful for that example without ORDER BY.

That index could also be marginally useful if your clustered key was defined as id DESC (id ASC is the default). SQL Server cannot perform backward index scans in parallel, so with a descending clustered key an ascending ORDER BY id would need a serial backward scan of the clustered index, whereas the following query could run in parallel with a forward scan on _dta_index_survey_25_135059617__K1_K10_K19:

SELECT id
FROM your_table
ORDER BY id;

Again, the key columns on sent_date and group_id aren't useful for this query; only the leading key column, id, matters. I cannot think of a query that benefits from all three key columns.

It is more difficult to think of queries that would benefit from the index on group_id, id, campaign_id, and sent_date. Assuming that the clustered key is defined as id ASC, I believe that ORDER BY group_id, id, campaign_id, sent_date can be satisfied by an index on just group_id: SQL Server adds the clustering key to the key of a non-unique nonclustered index, so that index is stored in group_id, id order, and because id is unique, campaign_id and sent_date can never change that order. As before, you could benefit if the id column in the index is sorted in a different order than the clustering key on id.

I suspect there are also some data distributions that could benefit from the _dta_index_survey_21_364632442__K18_K1_K2_K9 index when filtering on both group_id and id, even though id is a unique clustered key. Suppose that each data page in the table contains 2 rows and half the rows in the table have group_id = 1. In theory, SQL Server could use that index to satisfy the following query:

SELECT group_id, id, campaign_id, sent_date
FROM your_table
WHERE group_id = 1 AND id IN (1,3,5,7,9,...);

and possibly do less IO than using the clustered index. However, I do not know if the query optimizer would actually do this.
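The point above about an index on just group_id can be checked with a sketch like the following. It again assumes dbo.your_table is clustered on id ASC, and the index name IX_your_table_group_id is hypothetical. Because the index is non-unique, SQL Server appends the clustering key id to its key, so it should be able to return rows in group_id, id order; campaign_id and sent_date are INCLUDE columns only so that the query is covered. Whether the optimizer actually avoids a sort for the trailing ORDER BY columns is not guaranteed, so check the actual execution plan.

```sql
-- Hypothetical narrower alternative to _dta_index_survey_21_364632442__K18_K1_K2_K9.
-- Explicit key: group_id. The clustering key id is appended to the key implicitly.
CREATE NONCLUSTERED INDEX IX_your_table_group_id
    ON dbo.your_table (group_id)
    INCLUDE (campaign_id, sent_date);
GO

-- Run with "Include Actual Execution Plan" enabled. If the plan shows no
-- Sort operator, the index order (group_id, id) satisfied the whole ORDER BY.
SELECT group_id, id, campaign_id, sent_date
FROM dbo.your_table WITH (INDEX (IX_your_table_group_id))
ORDER BY group_id, id, campaign_id, sent_date;
```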
In summary, there are some corner cases involving covering indexes and sort orders that could benefit from the indexes you listed in the question. However, in my experience it is pretty unlikely that those indexes will provide a noticeable (or even any) benefit for your workload. If they do provide a benefit, it's likely that smaller indexes could provide the same benefit.
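Before dropping or replacing either index, it helps to compare how often each one is read with how often it has to be maintained. A minimal sketch using sys.dm_db_index_usage_stats (available since SQL Server 2005), assuming the table is dbo.your_table; these counters are cleared when the instance restarts, so they only reflect activity since the last restart.

```sql
-- Reads (seeks, scans, lookups) vs. writes (updates) for every index on the table.
-- LEFT JOIN keeps indexes that have not been touched since the last restart.
SELECT i.name                    AS index_name,
       i.type_desc,
       ISNULL(s.user_seeks, 0)   AS user_seeks,
       ISNULL(s.user_scans, 0)   AS user_scans,
       ISNULL(s.user_lookups, 0) AS user_lookups,
       ISNULL(s.user_updates, 0) AS user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()      -- restrict to the current database
WHERE i.object_id = OBJECT_ID('dbo.your_table')
ORDER BY i.index_id;
```

An index with many user_updates but few seeks, scans, or lookups is a reasonable candidate to drop or to replace with a narrower index.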
Context
StackExchange Database Administrators Q#160314, answer score: 10