HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Order of composite index

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
indexordercomposite

Problem

Suppose I have a query like:

SELECT *
FROM table_a
    JOIN table_b USING (id)
WHERE table_b.column = 1


I have an index on id and an index on column but often I add a composite index with both that can improve efficiency of queries like this. My question is concerning the order of the columns in the index. By trial and error I've found that sometimes the DBMS prefers the joined index first and sometimes it prefers the WHERE index first.

In the above query is there a hard, fast rule I can abide by to know which key order will work best?

Typically I just add both indexes, run EXPLAIN on the query and check to see which is preferred, then remove the other one. But this process feels like it could be improved by a better understanding of the logic involved in determining index order.

Solution

For this query

SELECT *
FROM table_a
    JOIN table_b USING (id)
WHERE table_b.column = 1


The optimal way is execute it is

  • The WHERE clause provides some filtering, so let's make use of it. That is, have an index on table_b starting with column. (Later we'll discuss whether to make it composite.) So, the Optimizer will use that index to find row(s) of table_b.



  • For each of those rows, JOIN to table_a. (Note that JOIN, not LEFT JOIN is being used; LEFT JOIN is a different story.)



  • To reach into table_a, an index starting with id is needed. (Note: USING(id) means table_a.id = table_b.id.)



So far, we have

b:  INDEX(column)
a:  INDEX(id)   -- though it probably exists as PRIMARY KEY(id)


Covering?

We don't know what other columns there are in the two tables. If there are very few columns, then it might be tempting to build a "covering" index. This is an index that contains all the columns needed anywhere in the SELECT. The benefit is some performance speedup by looking only in the index's BTree and not having to touch the data BTree.

For table_b, it would be tempting to say INDEX(column, id). That would be good (and 'covering') if there were only those two columns. But there are probably more columns. So, probably INDEX(column) is all that is worth doing.

For table_a, I assume that id is the PRIMARY KEY (which is, by definition, unique and an index). So nothing more is needed there.

Bottom line: Use the two single-column indexes listed above.

And this example does not exemplify anything about "composite" indexes. For more on that, see

Cardinality & Range

Cardinality and Composite

Single-column index

Indexing cookbook


but often I add a composite index with both that can improve efficiency of queries like this...

Better Example

As I said, your example does not exemplify the question. So, I'll try to answer "When should I use a composite index"? There are many cases (see the links); I'll give you a simple case.

WHERE x = 1
  AND y > 2


The relevant characteristics are:

  • x and y are in the same table. (Can't build an index across two tables.)



  • AND is used. (OR can't be optimized.)



  • One of the tests is with =. (Composite won't help if both are ranges.)



  • y is a "range" (examples: y>2, y LIKE 'm%', y BETWEEN ... AND ...).



The general rule is:

  • Put all the = columns first (x in my example)



  • Put one range column last (y)



That is, you must order it INDEX(x,y).

For WHERE x = 1 AND y = 2 (both =), it does not matter whether you have INDEX(x,y) or INDEX(y,x) .

Another tidbit: With ENGINE=InnoDB, the PRIMARY KEY column(s) are implicitly tacked onto each secondary key. Hence, your INDEX(column) is the same as INDEX(column, id). But this fact does not play a role in this discussion.

I realize that I am disagreeing with other Answers here (and elsewhere), but I stand my ground.

Code Snippets

SELECT *
FROM table_a
    JOIN table_b USING (id)
WHERE table_b.column = 1
b:  INDEX(column)
a:  INDEX(id)   -- though it probably exists as PRIMARY KEY(id)
WHERE x = 1
  AND y > 2

Context

StackExchange Database Administrators Q#205182, answer score: 5

Revisions (0)

No revisions yet.