patternsqlMinor
Creating multicolumn index for WHERE and ORDER BY
Viewed 0 times
ordercreatingwheremulticolumnforandindex
Problem
I am trying to create an index to be used in WHERE and ORDER BY clauses at the same time.
Reading Postgres 14 documentation (11.4. Indexes and ORDER BY - https://www.postgresql.org/docs/14/indexes-ordering.html) led me to believe that:
In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honoured without a separate sorting step.
Wow, sound awesome, let's try it! I created a test table, an index that includes both WHERE and ORDER BY columns and populated it with data:
Here is an example of the data. Since the
```
Sort (cost=108.49..108.75 rows=106 width=42) (actual time=2.194..3.555 rows=106 loops=1)
Sort Key: answer_date, question_date"
Sort Method: quicksort Memory: 33kB
-> Bitmap Heap Scan on testdata (cost=5.11..104.92 rows=106 width=42) (actual time=0.057..1.1
Reading Postgres 14 documentation (11.4. Indexes and ORDER BY - https://www.postgresql.org/docs/14/indexes-ordering.html) led me to believe that:
In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honoured without a separate sorting step.
Wow, sound awesome, let's try it! I created a test table, an index that includes both WHERE and ORDER BY columns and populated it with data:
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata
(
question_id TEXT NOT NULL UNIQUE PRIMARY KEY,
answerer_id TEXT NOT NULL,
question_date TIMESTAMPTZ NOT NULL,
answer_date TIMESTAMPTZ NOT NULL
);
DROP INDEX IF EXISTS idx1;
CREATE INDEX idx1 ON testdata (answerer_id, answer_date, question_date);
TRUNCATE testdata;
INSERT INTO testdata(question_id, answerer_id, question_date, answer_date)
SELECT CONCAT('question_', LPAD(i::TEXT, 4, '0')),
CONCAT('answerer_', LPAD(FLOOR(RANDOM() * (99 - 1 + 1) + 1)::TEXT, 2, '0')),
TIMESTAMPTZ '2021-01-01' + RANDOM() * INTERVAL '365 days',
TIMESTAMPTZ '2022-01-01' + RANDOM() * INTERVAL '365 days'
FROM GENERATE_SERIES(1, 9999) AS t(i);
VACUUM (FULL, ANALYZE) testdata;
EXPLAIN ANALYSE
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
question_date;
Here is an example of the data. Since the
answerer_id is a random number from 1 to 99, then ~100 rows out of 10K rows (~10% of all rows) should be returned for this query:EXPLAIN ANALYSE of the query gives me the following:```
Sort (cost=108.49..108.75 rows=106 width=42) (actual time=2.194..3.555 rows=106 loops=1)
Sort Key: answer_date, question_date"
Sort Method: quicksort Memory: 33kB
-> Bitmap Heap Scan on testdata (cost=5.11..104.92 rows=106 width=42) (actual time=0.057..1.1
Solution
For ~ 10 % of all rows it's typically not efficient to run an index scan. (Many factors in play here ...) What you see is a bitmap index scan. Why? See:
Bitmap index scans cannot carry over the index sort order into the result. So a final sort step is required.
You can "disable" alternative query plans to "force" your index scan (only for testing purposes!):
Or you can lower the expected cost for random access with:
Or you can
Either of these can convince the query planner to switch to an index scan with no additional sort step:
db<>fiddle here
For your test case with only few rows and a mildly selective predicate, it's hard to tell whether sequential scan, bitmap index scan or index scan will be faster. Tests with bigger tables are more revealing.
Either way, the query planner strictly bases its decision on the estimated
- Postgres not using index when index scan is much better option
Bitmap index scans cannot carry over the index sort order into the result. So a final sort step is required.
You can "disable" alternative query plans to "force" your index scan (only for testing purposes!):
SET enable_bitmapscan = off;
SET enable_seqscan = off;Or you can lower the expected cost for random access with:
SET random_page_cost = 1; -- or similarOr you can
LIMIT to only few result rows like you added.Either of these can convince the query planner to switch to an index scan with no additional sort step:
Index Scan using idx1 on testdata (cost=0.29..274.08 rows=104 width=42) (actual time=0.014..0.050 rows=104 loops=1)
Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.052 ms
Execution Time: 0.064 ms
db<>fiddle here
For your test case with only few rows and a mildly selective predicate, it's hard to tell whether sequential scan, bitmap index scan or index scan will be faster. Tests with bigger tables are more revealing.
Either way, the query planner strictly bases its decision on the estimated
cost (Setting SET enable_seqscan = off just makes sequential scans seem very expensive.) The plan expected to be cheapest wins. Table and column statistics, server configuration and cost settings should be as valid as possible to get valid estimates - and good query plans.Code Snippets
SET enable_bitmapscan = off;
SET enable_seqscan = off;SET random_page_cost = 1; -- or similarContext
StackExchange Database Administrators Q#302153, answer score: 7
Revisions (0)
No revisions yet.