patternModerate
Composite indexes: Most selective column first?
Viewed 0 times
selectivecolumnindexesfirstcompositemost
Problem
I’ve been reading about
In general, you should put the column expected to be used most often first in the index.
However, shortly after it says
create a composite index putting the most selective column first; that is, the column with the most values.
Oracle also says it here in other words
If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.
However, I have found a SO answer that says differently. It says
Arrange the columns with the least selective column first and the most selective column last. In the case of a tie lead with the column which is more likely to be used on its own.
The first documentation I referenced says that you should first go by the most often used whereas the SO answer says that should only be for tie breaking. Then they also differ on the ordering.
This documentation also talks about
Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.
Another article says
The prefix column should be the most discriminating and the most widely used in queries
which I believe most discriminating would mean most distinctive.
All of this research still leads me to the same question; should The most selective column be first or last? Should the first column be the most used and only the most selective on a tie-break?
These articles seem to be contradicting each other, but they do offer some examples. From what I have gathered, it seems to be more efficient for the
composite indexes and I’m slightly confused about ordering. This documentation (little less than half way down) saysIn general, you should put the column expected to be used most often first in the index.
However, shortly after it says
create a composite index putting the most selective column first; that is, the column with the most values.
Oracle also says it here in other words
If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.
However, I have found a SO answer that says differently. It says
Arrange the columns with the least selective column first and the most selective column last. In the case of a tie lead with the column which is more likely to be used on its own.
The first documentation I referenced says that you should first go by the most often used whereas the SO answer says that should only be for tie breaking. Then they also differ on the ordering.
This documentation also talks about
skip scanning and saysSkip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.
Another article says
The prefix column should be the most discriminating and the most widely used in queries
which I believe most discriminating would mean most distinctive.
All of this research still leads me to the same question; should The most selective column be first or last? Should the first column be the most used and only the most selective on a tie-break?
These articles seem to be contradicting each other, but they do offer some examples. From what I have gathered, it seems to be more efficient for the
least selective column to be the first in the ordering if you are anticipating Index Skip Scans. But I'm not really sure if that is correct.Solution
From AskTom
(in 9i, there is a new "index skip scan" -- search for that there to read about that. It makes the index (a,b) OR (b,a) useful in both of the above cases sometimes!)
So, the order of columns in your index depends on HOW YOUR QUERIES are written. You want to be able to use the index for as many queries as you can (so as to cut down on the over all number of indexes you have) -- that will drive the order of the columns. Nothing else (selectivity of a or b does not count at all).
One of the arguments for arranging columns in the composite index in order from the least discriminating(less distinct values) to the most discriminating(more distinct values) is for index key compression.
According to the index statistics, the first index is more compressible.
Another is how the index is used in your queries.
If your queries mostly use
For example, if you have queries like-
-then
If your queries mostly use
-then
If all of your queries always specify both columns then it doesn't matter which column come first in the composite index.
In conclusion,
the key considerations in column ordering of composite index are index key compression and how you are going to use this index in your queries.
References:
(in 9i, there is a new "index skip scan" -- search for that there to read about that. It makes the index (a,b) OR (b,a) useful in both of the above cases sometimes!)
So, the order of columns in your index depends on HOW YOUR QUERIES are written. You want to be able to use the index for as many queries as you can (so as to cut down on the over all number of indexes you have) -- that will drive the order of the columns. Nothing else (selectivity of a or b does not count at all).
One of the arguments for arranging columns in the composite index in order from the least discriminating(less distinct values) to the most discriminating(more distinct values) is for index key compression.
SQL> create table t as select * from all_objects;
Table created.
SQL> create index t_idx_1 on t(owner,object_type,object_name);
Index created.
SQL> create index t_idx_2 on t(object_name,object_type,owner);
Index created.
SQL> select count(distinct owner), count(distinct object_type), count(distinct object_name ), count(*) from t;
COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOBJECT_NAME) COUNT(*)
-------------------- -------------------------- -------------------------- ----------
30 45 52205 89807
SQL> analyze index t_idx_1 validate structure;
Index analyzed.
SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave from index_stats;
BTREE_SPACE PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
5085584 90 2 28
SQL> analyze index t_idx_2 validate structure;
Index analyzed.
SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave from index_stats;
BTREE_SPACE PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
5085584 90 1 14According to the index statistics, the first index is more compressible.
Another is how the index is used in your queries.
If your queries mostly use
col1, For example, if you have queries like-
select * from t where col1 = :a and col2 = :b;
select * from t where col1 = :a;
-then
index(col1,col2) would perform better. If your queries mostly use
col2,select * from t where col1 = :a and col2 = :b;
select * from t where col2 = :b;
-then
index(col2,col1) would perform better.If all of your queries always specify both columns then it doesn't matter which column come first in the composite index.
In conclusion,
the key considerations in column ordering of composite index are index key compression and how you are going to use this index in your queries.
References:
- Column order in Index
- It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?
- Index Skip Scan – Does Index Column Order Matter Any More ? (Warning Sign)
Code Snippets
SQL> create table t as select * from all_objects;
Table created.
SQL> create index t_idx_1 on t(owner,object_type,object_name);
Index created.
SQL> create index t_idx_2 on t(object_name,object_type,owner);
Index created.
SQL> select count(distinct owner), count(distinct object_type), count(distinct object_name ), count(*) from t;
COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOBJECT_NAME) COUNT(*)
-------------------- -------------------------- -------------------------- ----------
30 45 52205 89807
SQL> analyze index t_idx_1 validate structure;
Index analyzed.
SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave from index_stats;
BTREE_SPACE PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
5085584 90 2 28
SQL> analyze index t_idx_2 validate structure;
Index analyzed.
SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave from index_stats;
BTREE_SPACE PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
5085584 90 1 14Context
StackExchange Database Administrators Q#160649, answer score: 14
Revisions (0)
No revisions yet.