patternsqlMinor
Order of composite index
Viewed 0 times
indexordercomposite
Problem
Suppose I have a query like:
I have an index on
In the above query is there a hard, fast rule I can abide by to know which key order will work best?
Typically I just add both indexes, run
SELECT *
FROM table_a
JOIN table_b USING (id)
WHERE table_b.column = 1I have an index on
id and an index on column but often I add a composite index with both that can improve efficiency of queries like this. My question is concerning the order of the columns in the index. By trial and error I've found that sometimes the DBMS prefers the joined index first and sometimes it prefers the WHERE index first.In the above query is there a hard, fast rule I can abide by to know which key order will work best?
Typically I just add both indexes, run
EXPLAIN on the query and check to see which is preferred, then remove the other one. But this process feels like it could be improved by a better understanding of the logic involved in determining index order.Solution
For this query
The optimal way is execute it is
So far, we have
Covering?
We don't know what other columns there are in the two tables. If there are very few columns, then it might be tempting to build a "covering" index. This is an index that contains all the columns needed anywhere in the
For
For
Bottom line: Use the two single-column indexes listed above.
And this example does not exemplify anything about "composite" indexes. For more on that, see
Cardinality & Range
Cardinality and Composite
Single-column index
Indexing cookbook
but often I add a composite index with both that can improve efficiency of queries like this...
Better Example
As I said, your example does not exemplify the question. So, I'll try to answer "When should I use a composite index"? There are many cases (see the links); I'll give you a simple case.
The relevant characteristics are:
The general rule is:
That is, you must order it
For
Another tidbit: With
I realize that I am disagreeing with other Answers here (and elsewhere), but I stand my ground.
SELECT *
FROM table_a
JOIN table_b USING (id)
WHERE table_b.column = 1The optimal way is execute it is
- The
WHEREclause provides some filtering, so let's make use of it. That is, have an index ontable_bstarting withcolumn. (Later we'll discuss whether to make it composite.) So, the Optimizer will use that index to find row(s) oftable_b.
- For each of those rows,
JOINtotable_a. (Note thatJOIN, notLEFT JOINis being used;LEFT JOINis a different story.)
- To reach into
table_a, an index starting withidis needed. (Note:USING(id)meanstable_a.id = table_b.id.)
So far, we have
b: INDEX(column)
a: INDEX(id) -- though it probably exists as PRIMARY KEY(id)Covering?
We don't know what other columns there are in the two tables. If there are very few columns, then it might be tempting to build a "covering" index. This is an index that contains all the columns needed anywhere in the
SELECT. The benefit is some performance speedup by looking only in the index's BTree and not having to touch the data BTree.For
table_b, it would be tempting to say INDEX(column, id). That would be good (and 'covering') if there were only those two columns. But there are probably more columns. So, probably INDEX(column) is all that is worth doing.For
table_a, I assume that id is the PRIMARY KEY (which is, by definition, unique and an index). So nothing more is needed there.Bottom line: Use the two single-column indexes listed above.
And this example does not exemplify anything about "composite" indexes. For more on that, see
Cardinality & Range
Cardinality and Composite
Single-column index
Indexing cookbook
but often I add a composite index with both that can improve efficiency of queries like this...
Better Example
As I said, your example does not exemplify the question. So, I'll try to answer "When should I use a composite index"? There are many cases (see the links); I'll give you a simple case.
WHERE x = 1
AND y > 2The relevant characteristics are:
xandyare in the same table. (Can't build an index across two tables.)
ANDis used. (ORcan't be optimized.)
- One of the tests is with
=. (Composite won't help if both are ranges.)
yis a "range" (examples:y>2,y LIKE 'm%',y BETWEEN ... AND ...).
The general rule is:
- Put all the
=columns first (xin my example)
- Put one range column last (
y)
That is, you must order it
INDEX(x,y).For
WHERE x = 1 AND y = 2 (both =), it does not matter whether you have INDEX(x,y) or INDEX(y,x) .Another tidbit: With
ENGINE=InnoDB, the PRIMARY KEY column(s) are implicitly tacked onto each secondary key. Hence, your INDEX(column) is the same as INDEX(column, id). But this fact does not play a role in this discussion.I realize that I am disagreeing with other Answers here (and elsewhere), but I stand my ground.
Code Snippets
SELECT *
FROM table_a
JOIN table_b USING (id)
WHERE table_b.column = 1b: INDEX(column)
a: INDEX(id) -- though it probably exists as PRIMARY KEY(id)WHERE x = 1
AND y > 2Context
StackExchange Database Administrators Q#205182, answer score: 5
Revisions (0)
No revisions yet.