HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Composite indexes: Most selective column first?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
selectivecolumnindexesfirstcompositemost

Problem

I’ve been reading about composite indexes and I’m slightly confused about ordering. This documentation (little less than half way down) says


In general, you should put the column expected to be used most often first in the index.

However, shortly after it says


create a composite index putting the most selective column first; that is, the column with the most values.

Oracle also says it here in other words


If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.

However, I have found a SO answer that says differently. It says


Arrange the columns with the least selective column first and the most selective column last. In the case of a tie lead with the column which is more likely to be used on its own.

The first documentation I referenced says that you should first go by the most often used whereas the SO answer says that should only be for tie breaking. Then they also differ on the ordering.

This documentation also talks about skip scanning and says


Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Another article says


The prefix column should be the most discriminating and the most widely used in queries

which I believe most discriminating would mean most distinctive.

All of this research still leads me to the same question; should The most selective column be first or last? Should the first column be the most used and only the most selective on a tie-break?

These articles seem to be contradicting each other, but they do offer some examples. From what I have gathered, it seems to be more efficient for the least selective column to be the first in the ordering if you are anticipating Index Skip Scans. But I'm not really sure if that is correct.

Solution

From AskTom


(in 9i, there is a new "index skip scan" -- search for that there to read about that. It makes the index (a,b) OR (b,a) useful in both of the above cases sometimes!)


So, the order of columns in your index depends on HOW YOUR QUERIES are written. You want to be able to use the index for as many queries as you can (so as to cut down on the over all number of indexes you have) -- that will drive the order of the columns. Nothing else (selectivity of a or b does not count at all).

One of the arguments for arranging columns in the composite index in order from the least discriminating(less distinct values) to the most discriminating(more distinct values) is for index key compression.

SQL> create table t as select * from all_objects;

Table created.

SQL> create index t_idx_1 on t(owner,object_type,object_name);

Index created.

SQL> create index t_idx_2 on t(object_name,object_type,owner);

Index created.

SQL> select count(distinct owner), count(distinct object_type), count(distinct object_name ), count(*)  from t;

COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOBJECT_NAME)      COUNT(*)
-------------------- -------------------------- --------------------------      ----------
                 30                         45                       52205      89807

SQL> analyze index t_idx_1 validate structure; 

Index analyzed.

SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave from index_stats;

BTREE_SPACE   PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
    5085584     90          2           28

SQL> analyze index t_idx_2 validate structure; 

Index analyzed.

SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave  from index_stats; 

BTREE_SPACE   PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
    5085584     90          1           14


According to the index statistics, the first index is more compressible.

Another is how the index is used in your queries.
If your queries mostly use col1,

For example, if you have queries like-

  • select * from t where col1 = :a and col2 = :b;



  • select * from t where col1 = :a;



-then index(col1,col2) would perform better.

If your queries mostly use col2,

  • select * from t where col1 = :a and col2 = :b;



  • select * from t where col2 = :b;



-then index(col2,col1) would perform better.
If all of your queries always specify both columns then it doesn't matter which column come first in the composite index.

In conclusion,
the key considerations in column ordering of composite index are index key compression and how you are going to use this index in your queries.

References:

  • Column order in Index



  • It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?



  • Index Skip Scan – Does Index Column Order Matter Any More ? (Warning Sign)

Code Snippets

SQL> create table t as select * from all_objects;

Table created.

SQL> create index t_idx_1 on t(owner,object_type,object_name);

Index created.

SQL> create index t_idx_2 on t(object_name,object_type,owner);

Index created.

SQL> select count(distinct owner), count(distinct object_type), count(distinct object_name ), count(*)  from t;

COUNT(DISTINCTOWNER) COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOBJECT_NAME)      COUNT(*)
-------------------- -------------------------- --------------------------      ----------
                 30                         45                       52205      89807

SQL> analyze index t_idx_1 validate structure; 

Index analyzed.

SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave from index_stats;

BTREE_SPACE   PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
    5085584     90          2           28

SQL> analyze index t_idx_2 validate structure; 

Index analyzed.

SQL> select btree_space, pct_used, opt_cmpr_count, opt_cmpr_pctsave  from index_stats; 

BTREE_SPACE   PCT_USED OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
----------- ---------- -------------- ----------------
    5085584     90          1           14

Context

StackExchange Database Administrators Q#160649, answer score: 14

Revisions (0)

No revisions yet.