patternsqlCritical
Working of indexes in PostgreSQL
Viewed 0 times
postgresqlworkingindexes
Problem
I have a couple of questions regarding working of indexes in PostgreSQL.
I have a
-
Are these equivalent? If not then why?
-
If I create Primary Key(user_id1,user_id2), does it automatically create indexes for it and
If the indexes in the first question are not equivalent, then which index is created on above primary key command?
I have a
Friends table with the following index:Friends ( user_id1 ,user_id2)user_id1 and user_id2 are foreign keys to user table-
Are these equivalent? If not then why?
Index(user_id1,user_id2) and Index(user_id2,user_id1)-
If I create Primary Key(user_id1,user_id2), does it automatically create indexes for it and
If the indexes in the first question are not equivalent, then which index is created on above primary key command?
Solution
This answer is about (default) B-tree indexes. See this later, related answer about GiST, GIN etc.:
Here are the results of querying a table on the second column of a multicolumn index.
The effects are easy to reproduce for anybody. Try it at home.
I tested with PostgreSQL 9.0.5 on Debian using a medium sized table of a real-life database with 23322 rows. It implements the n:m relationship between the tables
The
The table is clustered on the
Sequential scans for queries on
I ran the queries a couple of times to populate the cache and the picked the best of ten runs to get comparable results.
Output of
Output of
Output of
Output of EXPLAIN ANALYZE:
Output of
Conclusion
As expected, the multi-column index is used for a query on the second column alone.
As expected, it is less efficient, but the query is still 3x faster than without the index.
After disabling index scans, the query planner chooses a bitmap heap scan, which performs almost as fast. Only after disabling that, too, it falls back to a sequential scan.
See other answer for the original quote from the manual.
Updates since Postgres 9.0
Everything still basically true in Postgres 13. Most notable changes:
All in favor of index performance. (Sequential scans got faster, too, though.)
- Why is a GiST index used for filtering on non-leading column?
Here are the results of querying a table on the second column of a multicolumn index.
The effects are easy to reproduce for anybody. Try it at home.
I tested with PostgreSQL 9.0.5 on Debian using a medium sized table of a real-life database with 23322 rows. It implements the n:m relationship between the tables
adr (address) and att (attribute), but that's not relevant here. Simplified schema:CREATE TABLE adratt (
adratt_id serial PRIMARY KEY
, adr_id integer NOT NULL
, att_id integer NOT NULL
, log_up timestamp NOT NULL DEFAULT (now()::timestamp)
, CONSTRAINT adratt_uni UNIQUE (adr_id, att_id)
);
The
UNIQUE constraint effectively implements a unique index. I repeated the test with a plain index to be sure and got identical results as expected.CREATE INDEX adratt_idx ON adratt(adr_id, att_id);The table is clustered on the
adratt_uni index and before the test I ran:CLUSTER adratt;
ANALYZE adratt;Sequential scans for queries on
(adr_id, att_id) are as fast as they can possibly be. The multicolumn index can still be used for a query condition on the second index column alone.I ran the queries a couple of times to populate the cache and the picked the best of ten runs to get comparable results.
- Query using both columns
SELECT *
FROM adratt
WHERE att_id = 90
AND adr_id = 10; adratt_id | adr_id | att_id | log_up
-----------+--------+--------+---------------------
123 | 10 | 90 | 2008-07-29 09:35:54
(1 row)
Output of
EXPLAIN ANALYZE:Index Scan using adratt_uni on adratt (cost=0.00..3.48 rows=1 width=20) (actual time=0.022..0.025 rows=1 loops=1)
Index Cond: ((adr_id = 10) AND (att_id = 90))
Total runtime: 0.067 ms
- Query using first column
SELECT * FROM adratt WHERE adr_id = 10; adratt_id | adr_id | att_id | log_up
-----------+--------+--------+---------------------
126 | 10 | 10 | 2008-07-29 09:35:54
125 | 10 | 13 | 2008-07-29 09:35:54
4711 | 10 | 21 | 2008-07-29 09:35:54
29322 | 10 | 22 | 2011-06-06 15:50:38
29321 | 10 | 30 | 2011-06-06 15:47:17
124 | 10 | 62 | 2008-07-29 09:35:54
21913 | 10 | 78 | 2008-07-29 09:35:54
123 | 10 | 90 | 2008-07-29 09:35:54
28352 | 10 | 106 | 2010-11-22 12:37:50
(9 rows)
Output of
EXPLAIN ANALYZE:Index Scan using adratt_uni on adratt (cost=0.00..8.23 rows=9 width=20) (actual time=0.007..0.023 rows=9 loops=1)
Index Cond: (adr_id = 10)
Total runtime: 0.058 ms
- Query using second column
SELECT * FROM adratt WHERE att_id = 90; adratt_id | adr_id | att_id | log_up
-----------+--------+--------+---------------------
123 | 10 | 90 | 2008-07-29 09:35:54
180 | 39 | 90 | 2008-08-29 15:46:07
...
(83 rows)
Output of
EXPLAIN ANALYZE:Index Scan using adratt_uni on adratt (cost=0.00..818.51 rows=83 width=20) (actual time=0.014..0.694 rows=83 loops=1)
Index Cond: (att_id = 90)
Total runtime: 0.849 ms
- Disable indexscan & bitmapscan
SET enable_indexscan = off;
SELECT * FROM adratt WHERE att_id = 90;Output of EXPLAIN ANALYZE:
Bitmap Heap Scan on adratt (cost=779.94..854.74 rows=83 width=20) (actual time=0.558..0.743 rows=83 loops=1)
Recheck Cond: (att_id = 90)
-> Bitmap Index Scan on adratt_uni (cost=0.00..779.86 rows=83 width=0) (actual time=0.544..0.544 rows=83 loops=1)
Index Cond: (att_id = 90)
Total runtime: 0.894 ms
SET enable_bitmapscan = off;
SELECT * FROM adratt WHERE att_id = 90;Output of
EXPLAIN ANALYZE:Seq Scan on adratt (cost=0.00..1323.10 rows=83 width=20) (actual time=0.009..2.429 rows=83 loops=1)
Filter: (att_id = 90)
Total runtime: 2.680 ms
Conclusion
As expected, the multi-column index is used for a query on the second column alone.
As expected, it is less efficient, but the query is still 3x faster than without the index.
After disabling index scans, the query planner chooses a bitmap heap scan, which performs almost as fast. Only after disabling that, too, it falls back to a sequential scan.
See other answer for the original quote from the manual.
Updates since Postgres 9.0
Everything still basically true in Postgres 13. Most notable changes:
- index-only scans in Postgres 9.2
- True covering indexes with the
INCLUDEkeyword in Postgres 11
- Multiple performance and space improvements (especially for multicolumn indexes) in Postgres 12
- Deduplication in Postgres 13.
All in favor of index performance. (Sequential scans got faster, too, though.)
Code Snippets
CREATE INDEX adratt_idx ON adratt(adr_id, att_id);CLUSTER adratt;
ANALYZE adratt;SELECT *
FROM adratt
WHERE att_id = 90
AND adr_id = 10;SELECT * FROM adratt WHERE adr_id = 10;SELECT * FROM adratt WHERE att_id = 90;Context
StackExchange Database Administrators Q#6115, answer score: 101
Revisions (0)
No revisions yet.