HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Applying indexes on columns used in join condition

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
conditioncolumnsusedjoinindexesapplying

Problem

In the query below,how beneficial it is to use index on columns used in join conditions?

select * from table_a
inner join table_b on
           table_a.col1 = table_b.col1 and 
           table_a.col2 = table_b.col2


In above join, is it beneficial to index table table_a(col1,col2) or table table_b(col1,col2) and does order of columns in index impact performance?

If suppose there is an index on table_b(col1,col2). Does it mean we need to change the query as follows?

select * from table_a
    inner join table_b on
               table_b.col1 = table_a.col1 and 
               table_b.col2 = table_a.col2

Solution

B-tree indexes provide three main benefits:

  • Seek to a specific value or range of values



  • Return rows in index order



  • Smaller row size than the base table (clustered index or heap)



Without useful indexes, the optimizer's main join strategy is hash join, where the join keys from the smaller table are used to build a hash table, then join keys from the larger table are used to probe into that hash table for matches. Hash join requires memory to hold the complete hash table.

With useful indexes, the optimizer may also consider indexed nested loops join or merge join strategies:
Indexed nested loops join

This physical join strategy primarily takes advantage of benefit #1 above. It reads from the smaller table then seeks into an index on the larger table on the join keys. For equi-join, index key order is not significant. Seeking on (col1, col2) is possible whether the index keys are (col1, col2) or (col2, col1).

For your example, an index on (col1, col2) or (col2, col1) for either table would enable an indexed nested loops join strategy. Ideally, the index would be on the larger table.
Merge join (without explicit sorts)

This physical join primarily uses benefit #2 above (index order). The indexes must include all the join columns, in the same key order on both tables. A merge join on (col1, col2) can use indexes on (col1, col2) or (col2, col1), but the key order must be the same for both tables.

Merge join is most efficient when at least one of the inputs is guaranteed unique on the join keys.

For your example, indexes on (col1, col2) or (col2, col1) on both tables would be required. The indexes would need to have the same key order on both, and ideally the index would be unique on those columns for at least one table.

db<>fiddle demo

The textual order of ON clause predicates is not important. The following are exactly equivalent. Which you use is purely a matter of style:

-- I prefer this style
SELECT * 
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
    ON TB.col1 = TA.col1
    AND TB.col2 = TA.col2;

SELECT * 
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
    ON TA.col1 = TB.col1
    AND TA.col2 = TB.col2;


More generally, how indexes impact on physical join choice is only part of the equation. Indexes can be useful for other (non-join) predicates in the query, as well as other operations like aggregation, windowing, and final presentation order.

Balancing index usefulness for the whole query, and all queries in the workload, versus the cost of maintaining those indexes is a significant part of the art of database administration and tuning.

Code Snippets

-- I prefer this style
SELECT * 
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
    ON TB.col1 = TA.col1
    AND TB.col2 = TA.col2;

SELECT * 
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
    ON TA.col1 = TB.col1
    AND TA.col2 = TB.col2;

Context

StackExchange Database Administrators Q#180295, answer score: 13

Revisions (0)

No revisions yet.