patternsqlModerate
Applying indexes on columns used in join condition
Viewed 0 times
conditioncolumnsusedjoinindexesapplying
Problem
In the query below,how beneficial it is to use index on columns used in join conditions?
In above join, is it beneficial to index table
If suppose there is an index on
select * from table_a
inner join table_b on
table_a.col1 = table_b.col1 and
table_a.col2 = table_b.col2In above join, is it beneficial to index table
table_a(col1,col2) or table table_b(col1,col2) and does order of columns in index impact performance?If suppose there is an index on
table_b(col1,col2). Does it mean we need to change the query as follows?select * from table_a
inner join table_b on
table_b.col1 = table_a.col1 and
table_b.col2 = table_a.col2Solution
B-tree indexes provide three main benefits:
Without useful indexes, the optimizer's main join strategy is hash join, where the join keys from the smaller table are used to build a hash table, then join keys from the larger table are used to probe into that hash table for matches. Hash join requires memory to hold the complete hash table.
With useful indexes, the optimizer may also consider indexed nested loops join or merge join strategies:
Indexed nested loops join
This physical join strategy primarily takes advantage of benefit #1 above. It reads from the smaller table then seeks into an index on the larger table on the join keys. For equi-join, index key order is not significant. Seeking on
For your example, an index on
Merge join (without explicit sorts)
This physical join primarily uses benefit #2 above (index order). The indexes must include all the join columns, in the same key order on both tables. A merge join on
Merge join is most efficient when at least one of the inputs is guaranteed unique on the join keys.
For your example, indexes on
The textual order of
More generally, how indexes impact on physical join choice is only part of the equation. Indexes can be useful for other (non-join) predicates in the query, as well as other operations like aggregation, windowing, and final presentation order.
Balancing index usefulness for the whole query, and all queries in the workload, versus the cost of maintaining those indexes is a significant part of the art of database administration and tuning.
- Seek to a specific value or range of values
- Return rows in index order
- Smaller row size than the base table (clustered index or heap)
Without useful indexes, the optimizer's main join strategy is hash join, where the join keys from the smaller table are used to build a hash table, then join keys from the larger table are used to probe into that hash table for matches. Hash join requires memory to hold the complete hash table.
With useful indexes, the optimizer may also consider indexed nested loops join or merge join strategies:
Indexed nested loops join
This physical join strategy primarily takes advantage of benefit #1 above. It reads from the smaller table then seeks into an index on the larger table on the join keys. For equi-join, index key order is not significant. Seeking on
(col1, col2) is possible whether the index keys are (col1, col2) or (col2, col1).For your example, an index on
(col1, col2) or (col2, col1) for either table would enable an indexed nested loops join strategy. Ideally, the index would be on the larger table.Merge join (without explicit sorts)
This physical join primarily uses benefit #2 above (index order). The indexes must include all the join columns, in the same key order on both tables. A merge join on
(col1, col2) can use indexes on (col1, col2) or (col2, col1), but the key order must be the same for both tables.Merge join is most efficient when at least one of the inputs is guaranteed unique on the join keys.
For your example, indexes on
(col1, col2) or (col2, col1) on both tables would be required. The indexes would need to have the same key order on both, and ideally the index would be unique on those columns for at least one table.db<>fiddle demoThe textual order of
ON clause predicates is not important. The following are exactly equivalent. Which you use is purely a matter of style:-- I prefer this style
SELECT *
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
ON TB.col1 = TA.col1
AND TB.col2 = TA.col2;
SELECT *
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
ON TA.col1 = TB.col1
AND TA.col2 = TB.col2;More generally, how indexes impact on physical join choice is only part of the equation. Indexes can be useful for other (non-join) predicates in the query, as well as other operations like aggregation, windowing, and final presentation order.
Balancing index usefulness for the whole query, and all queries in the workload, versus the cost of maintaining those indexes is a significant part of the art of database administration and tuning.
Code Snippets
-- I prefer this style
SELECT *
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
ON TB.col1 = TA.col1
AND TB.col2 = TA.col2;
SELECT *
FROM dbo.table_a AS TA
JOIN dbo.table_b AS TB
ON TA.col1 = TB.col1
AND TA.col2 = TB.col2;Context
StackExchange Database Administrators Q#180295, answer score: 13
Revisions (0)
No revisions yet.