HiveBrain v1.2.0

Do I need separate indexes for each type of query, or will one multi-column index work?

Submitted by: @import:stackexchange-dba··

Problem

I somewhat know the answer to this question already, but I always feel as though there is more I need to pick up on the topic.

My basic understanding is that generally speaking, a single index that just includes all the fields you might be querying/sorting on at any given time isn't likely to be useful, yet I have seen this type of thing. As in, someone thought, "Well, if we just put all this stuff in an index, the database can use it to find what it needs", without having ever seen an execution plan for some of the actual queries being run.

Imagine a table like so:

id int pk/uid
name varchar(50)
customerId int (foreign key)
dateCreated datetime


I might see a single index including the name, customerId and dateCreated fields.

But my understanding is that such an index would not be used in a query like, for example:

SELECT [id], [name], [customerId], [dateCreated]
   FROM Representatives WHERE customerId=1 
   ORDER BY dateCreated


For such a query, it seems to me that a better idea would be an index including the customerId and dateCreated fields, with the customerId field being 'first'. This would create an index that would have the data organized in such a way that this query could quickly find what it needs - in the order that it needs.

Another thing I see, perhaps as frequently as the first, is individual indexes on each field; so, one each on name, customerId and dateCreated fields.

Unlike the first example, this type of arrangement seems to me sometimes to at least be partially useful; the query's execution plan may show that at least it's using the index on the customerId to select the records, but it's not using the index with the dateCreated field to sort them.

I know this is a broad question, because the specific answer to any particular query on any particular set of tables is usually to see what the execution plan says it's going to do, and otherwise take the specifics of the table(s) and queries into account.

Solution

You are right in that your example query would not use that index.

The query planner will consider using an index if:

  • all the fields contained in it are referenced in the query, or
  • some of the fields, starting from the first, are referenced.

It will generally not be able to seek into an index that starts with a field the query does not use; at best it can scan the whole index.

So for your example:

SELECT [id], [name], [customerId], [dateCreated]
   FROM Representatives WHERE customerId=1 
   ORDER BY dateCreated


it would consider indexes such as:

[customerId]
[customerId], [dateCreated]
[customerId], [dateCreated], [name]


but not:

[name], [customerId], [dateCreated]
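
As a rough sketch, the four index shapes above could be declared as follows. The table shape comes from the question; the dbo schema and every index name are assumptions made up for illustration, and the foreign key constraint is left out because the question does not name the referenced table.

```sql
-- Sketch only: dbo schema and all IX_Rep_* names are hypothetical.
CREATE TABLE dbo.Representatives (
    [id]          int PRIMARY KEY,
    [name]        varchar(50),
    [customerId]  int,        -- a foreign key in the question; target table not shown there
    [dateCreated] datetime
);

-- Leading column matches the WHERE clause, so these are candidates for the query.
CREATE INDEX IX_Rep_customerId
    ON dbo.Representatives ([customerId]);
CREATE INDEX IX_Rep_customerId_dateCreated
    ON dbo.Representatives ([customerId], [dateCreated]);
CREATE INDEX IX_Rep_customerId_dateCreated_name
    ON dbo.Representatives ([customerId], [dateCreated], [name]);

-- Leading column is [name], which the query does not filter on.
CREATE INDEX IX_Rep_name_customerId_dateCreated
    ON dbo.Representatives ([name], [customerId], [dateCreated]);
```

Comparing the execution plan of the example query with different subsets of these indexes in place is the quickest way to confirm which ones the planner actually picks on your data.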


If it found both [customerId] and [customerId], [dateCreated], [name], its decision to prefer one over the other would depend on the index statistics, which are estimates of how the data is distributed across the fields. If [customerId], [dateCreated] were defined, it should prefer that over the other two unless you give a specific index hint to the contrary.
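
For reference, an index hint in SQL Server's T-SQL looks roughly like the sketch below. It assumes the Representatives table from the question and an index on [customerId] that has been given the hypothetical name IX_Rep_customerId; other database engines use different hint syntax.

```sql
-- Assumption: an index named IX_Rep_customerId (hypothetical name) exists on
-- Representatives ([customerId]). The table hint forces the planner to use it.
SELECT [id], [name], [customerId], [dateCreated]
   FROM Representatives WITH (INDEX (IX_Rep_customerId))
   WHERE customerId = 1
   ORDER BY dateCreated;
```

Hints like this are mostly useful for experiments, such as comparing plans side by side; leaving the choice to the planner is usually the safer default.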

In my experience it is also not uncommon to see one index defined for every field. This is rarely optimal: the extra work needed to update the indexes on every insert/update, and the extra space needed to store them, is wasted when half of them may never get used. Unless your DB sees write-heavy loads, though, the performance is not going to stink badly even with the excess indexes.
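
One way to check which indexes are actually being used, assuming SQL Server, is to read the index usage DMV. The sketch below assumes the table lives in the dbo schema; note that these counters are reset when the server restarts, so a row of NULLs or zeros only means "not used since then".

```sql
-- Lists every index on the table with its read and write counters.
-- Indexes with many user_updates but no seeks, scans or lookups are
-- candidates for removal.
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.Representatives')  -- dbo schema is an assumption
ORDER BY i.index_id;
```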

Specific indexes for frequent queries that would otherwise be slow due to table or index scanning are generally a good idea, though don't overdo it, as you could be exchanging one performance issue for another. If you do define [customerId], [dateCreated] as an index, for example, remember that the query planner will be able to use it for queries that would use an index on just [customerId] if present. Using just [customerId] would be slightly more efficient than using the compound index, but this may be offset by having two indexes competing for space in RAM instead of one (though if your entire normal working set fits easily into RAM, this extra memory competition may not be an issue).
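
To make that concrete, here is a sketch of two queries that a single compound index could serve. The index name is hypothetical, the date literal is made up for illustration, and whether the planner really chooses the index still depends on its statistics.

```sql
-- Assumption: this is the only non-primary-key index on the table.
CREATE INDEX IX_Rep_customerId_dateCreated
    ON Representatives ([customerId], [dateCreated]);

-- Filters on the leading column only: can still seek on the compound index.
SELECT [id], [customerId]
   FROM Representatives
   WHERE customerId = 1;

-- Filters on the leading column and ranges over the second one.
SELECT [id], [customerId], [dateCreated]
   FROM Representatives
   WHERE customerId = 1
     AND dateCreated >= '20240101'
   ORDER BY dateCreated;
```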


Context

StackExchange Database Administrators Q#197, answer score: 28
