patternsqlMinor
PostgreSQL not using index during count(*)
Viewed 0 times
postgresqlcountduringusingindexnot
Problem
I have a
This query takes between 30-60 seconds to run and searches millions of records.
However,
How do I speed up this query and make it use the index?
Edit: My Postgres version is 9.3.3. The table has about 20 million records, divided pretty evenly among each source_id, of which another 5 aren't included in the list.
COUNT(*) query in PostgreSQL that runs often, and looks like:SELECT COUNT(*)
FROM customer
WHERE source_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);This query takes between 30-60 seconds to run and searches millions of records.
EXPLAIN ANALYZE shows it's doing a sequential scan, so I created the index:CREATE INDEX customer_by_source ON customer (source_id)
WHERE source_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);However,
EXPLAIN ANALYZE still shows it's doing a sequential scan and isn't using the index.How do I speed up this query and make it use the index?
Edit: My Postgres version is 9.3.3. The table has about 20 million records, divided pretty evenly among each source_id, of which another 5 aren't included in the list.
Solution
You have 21 distinct
Test with
And take note of all
Then (only in your session):
And repeat the procedure. Once more after:
This should tell you why Postgres chooses a sequential scan. Typically, if your index is substantially smaller than the table (index entries substantially smaller than table rows and / or substantially fewer rows) and if conditions for an index-only scan are met, Postgres will choose that route - unless your cost settings are seriously out of touch with the reality of your setup.
Most importantly, the visibility map has to show that whole pages are visible to all transactions.
You can try your query once more immediately after running
If index-only scans are not possible, a sequential scan will most probably be faster than a bitmap index scan, meaning that your index would be of no use.
Update for Postgres 9.6
There was an important restriction for partial indexes, improved in Postgres 9.6. The release notes:
Vondra, Kyotaro Horiguchi)
For example, an index defined by
specifies
this was disallowed because
A related restriction (not affecting this case!) is still in place as of Postgres 14, though. The manual:
However, PostgreSQL's planner is currently not very smart about such cases. It considers a query to be potentially executable by index-only scan only when all columns needed by the query are available from the index.
Can impede expression indexes. See:
source_id with roughly 1 million rows each ("divided pretty evenly among each source_id"). So your query counts roughly 3/4 of the whole table. An index can only help in special cases.Test with
EXPLAIN (just EXPLAIN is enough for this purpose) to see the estimated cost for each variant:EXPLAIN SELECT ... -- your queryAnd take note of all
cost= numbers in the output.Then (only in your session):
SET enable_seqscan = off;And repeat the procedure. Once more after:
SET enable_indexscan = off;This should tell you why Postgres chooses a sequential scan. Typically, if your index is substantially smaller than the table (index entries substantially smaller than table rows and / or substantially fewer rows) and if conditions for an index-only scan are met, Postgres will choose that route - unless your cost settings are seriously out of touch with the reality of your setup.
Most importantly, the visibility map has to show that whole pages are visible to all transactions.
VACUUM updates the visibility map after writes. Read details on the linked Wiki page.You can try your query once more immediately after running
VACUUM ANALYZE customer;.If index-only scans are not possible, a sequential scan will most probably be faster than a bitmap index scan, meaning that your index would be of no use.
Update for Postgres 9.6
There was an important restriction for partial indexes, improved in Postgres 9.6. The release notes:
- Allow use of an index-only scan on a partial index when the index's
WHEREclause references columns that are not indexed (Tomas
Vondra, Kyotaro Horiguchi)
For example, an index defined by
CREATE INDEX tidx_partial ON t(b) WHERE a > 0 can now be used for an index-only scan by a query thatspecifies
WHERE a > 0 and does not otherwise use a. Previouslythis was disallowed because
a is not listed as an index column.A related restriction (not affecting this case!) is still in place as of Postgres 14, though. The manual:
However, PostgreSQL's planner is currently not very smart about such cases. It considers a query to be potentially executable by index-only scan only when all columns needed by the query are available from the index.
Can impede expression indexes. See:
- Query on json / jsonb column super slow. Can I use an index?
Code Snippets
SET enable_seqscan = off;SET enable_indexscan = off;Context
StackExchange Database Administrators Q#126997, answer score: 9
Revisions (0)
No revisions yet.