Postgres 9.4.4 query takes forever
Problem
We're running Postgres 9.4.4 on CentOS 6.5 and have a SELECT query that has worked for years, but stopped working and hangs after we upgraded from 9.2 (it took a while to notice it, so I don't know if it was immediately after we upgraded or not).
Here is the query:

```
SELECT id || ':' || group_number AS uniq_id
FROM   table_one
WHERE  id || ':' || group_number NOT IN (
   SELECT id || ':' || group_number
   FROM   table_two
   )
AND id NOT IN (
   SELECT id
   FROM   table_three
   WHERE  timestamp > NOW() - INTERVAL '30 days'
   AND    client_id > 0
   );
```

In all tables, id is an integer, but is stored as character varying(15) (legacy system). group_number is stored as a smallint.

The sub-query for table_two returns about 2.5 million records. The sub-query for table_three returns about 2,500 records. Both return in about 1 second if run separately, but adding either (or both) as a sub-query causes the query to just hang indefinitely (for days, if we let it run).

I've seen others online with the same problem (the query not returning when using NOT IN). NOT IN seems like such a straightforward sub-query.

We have plenty of hardware (384 GB RAM, 64 Xeon cores, 16-disk 15k RPM RAID 10).
- Why is this happening? (i.e., is this a major ongoing bug in Postgres?)
- How can I fix / debug it in the meantime?
Here are the results of EXPLAIN:

```
                                   QUERY PLAN
Index Only Scan using table_one_id_pk on table_one  (cost=19690.90..64045129699.10 rows=370064 width=9)
  Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1)))
  SubPlan 2
    ->  Bitmap Heap Scan on table_three  (cost=2446.92..19686.74 rows=8159 width=7)
          Recheck Cond: (("timestamp" > (now() - '30 days'::interval)) AND (client_id > 0))
          ->  BitmapAnd  (cost=2446.92..2446.92 rows=8159 width=0)
                ->  Bitmap Index Scan on table_one_timestamp_idx  (cost=0.00..1040.00 rows=79941 width=0)
                      Index Cond: ("timestamp" > (now() - '30 days'::interval))
```
Solution
Assuming you have checked off the usual suspects in the wiki page, as commented by @a_horse.

Also see the spin-off question addressing the discussion of bitmap index scans and the size of work_mem:

- "Recheck Cond:" line in query plans with a bitmap index scan
Query
This rewritten query should be substantially faster:
```
SELECT id || ':' || group_number AS uniq_id
--     id::text || ':' || group_number AS uniq_id  -- with integer
FROM   table_one t1
WHERE  NOT EXISTS (
   SELECT 1
   FROM   table_two t2
   WHERE  t2.id = t1.id
   AND    t2.group_number = t1.group_number
   )
AND NOT EXISTS (
   SELECT 1
   FROM   table_three t3
   WHERE  t3.timestamp > NOW() - interval '30 days'
   AND    t3.client_id > 0
   AND    t3.id = t1.id
   );
```

- The most important problem is comparing a concatenated string between table_one and table_two, which is generally more expensive than necessary and specifically not sargable.

- Storing integer numbers as strings is expensive nonsense. You seem to be aware of that. Convert to integer if at all possible. If you only have valid integer numbers in the varchar column id, all you need to do is:

  ```
  ALTER TABLE table_one ALTER COLUMN id TYPE integer USING id::int;
  ```

  And possibly the same for table_two.

- NOT IN carries a trap for NULL values on either side. That's why NOT EXISTS is almost always better. (Typically it performs better on top of that.)
  - Select rows which are not present in other table
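The NULL trap is easy to reproduce. Below is a minimal sketch using SQLite through Python's standard sqlite3 module, chosen only because it is self-contained; Postgres follows the same standard three-valued logic for NOT IN. The toy schema mirrors the table names in the answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_one (id INTEGER);
    CREATE TABLE table_two (id INTEGER);
    INSERT INTO table_one VALUES (1), (2), (3);
    INSERT INTO table_two VALUES (1), (NULL);  -- a single NULL poisons NOT IN
""")

# "2 NOT IN (1, NULL)" evaluates to NULL (unknown), never TRUE,
# so the NOT IN query silently returns zero rows.
not_in = con.execute(
    "SELECT id FROM table_one WHERE id NOT IN (SELECT id FROM table_two)"
).fetchall()

# The correlated NOT EXISTS simply finds no match for 2 and 3
# and returns them as expected.
not_exists = con.execute(
    "SELECT t1.id FROM table_one t1"
    " WHERE NOT EXISTS (SELECT 1 FROM table_two t2 WHERE t2.id = t1.id)"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

The same surprise applies in reverse: if a row of table_one had a NULL id, NOT IN could never return it either.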
Indexes
Either way, the key to performance is matching indexes.
Be sure to have multicolumn indexes on table_one and table_two:

```
CREATE INDEX t1_foo_idx ON table_one (id, group_number);
CREATE INDEX t2_foo_idx ON table_two (id, group_number);
```

These possibly allow index-only scans. With integer instead of varchar, they would be smaller and more efficient, yet:

- Is a composite index also good for queries on the first field?

And I suggest a partial multicolumn index on table_three:

```
CREATE INDEX t3_foo_idx ON table_three (timestamp, id)
WHERE  client_id > 0
AND    timestamp > '2015-06-07 0:0';
```

Its usefulness deteriorates over time. Recreate the index with an increased lower bound at opportune moments - which takes an exclusive lock on the table, so consider CREATE INDEX CONCURRENTLY. Detailed explanation:

- Index optimization with dates
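Whether a given index actually covers the probe can be checked from the query plan. Here is a small sketch, again using SQLite via Python's stdlib sqlite3 as a stand-in (index and table names follow the answer; in Postgres you would read the output of EXPLAIN rather than SQLite's EXPLAIN QUERY PLAN):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_two (id INTEGER, group_number INTEGER);
    CREATE INDEX t2_foo_idx ON table_two (id, group_number);
""")

# The NOT EXISTS probe touches only the two indexed columns, so the
# lookup can be answered from the index alone (an index-only scan;
# SQLite calls this a "covering index").
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT 1 FROM table_two t2 WHERE t2.id = ? AND t2.group_number = ?",
    (1, 1),
).fetchall()

detail = " ".join(row[3] for row in plan)  # last column holds the plan text
print(detail)
```

If the plan mentions the index (rather than a sequential/full scan), the probe side of the anti-join is as cheap as it can get.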
You need to match the (updated) index condition in your queries. Add the condition even if that seems redundant. Like:
```
...
AND NOT EXISTS (
   SELECT 1
   FROM   table_three t3
   WHERE  t3.timestamp > NOW() - interval '30 days'
   AND    t3.timestamp > '2015-06-07 0:0'  -- match index condition
   AND    t3.client_id > 0
   AND    t3.id = t1.id
   );
```

You can use a function as a pseudo-constant in the partial index and query, and automate the process. Last chapter in this related answer:
- Get latest child per parent from big table - query is too slow
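The effect of repeating the seemingly redundant index condition can be observed directly in the plan. The following is a sketch of the idea using SQLite partial indexes via Python's stdlib (names follow the answer's hypothetical schema; the same principle applies to Postgres partial indexes):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE table_three (id INTEGER, timestamp TEXT, client_id INTEGER);
    CREATE INDEX t3_foo_idx ON table_three (timestamp, id)
        WHERE client_id > 0 AND timestamp > '2015-06-07';
""")

def plan_for(sql):
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # last column holds the plan text

# Without the index predicate in the WHERE clause the planner cannot
# prove the partial index covers all qualifying rows, so it cannot use it.
loose = plan_for(
    "SELECT id FROM table_three WHERE timestamp > '2015-07-01'"
)

# Repeating the index predicate (redundant for correctness, not for the
# planner) makes the partial index usable.
strict = plan_for(
    "SELECT id FROM table_three"
    " WHERE timestamp > '2015-07-01'"
    " AND timestamp > '2015-06-07' AND client_id > 0"
)

print("loose query uses index:", "t3_foo_idx" in loose)
print("strict query uses index:", "t3_foo_idx" in strict)
```

The design point is the same in both systems: a partial index is only considered when the planner can prove the query's WHERE clause implies the index's WHERE clause, and literally repeating the predicate makes that proof trivial.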
SET LOCAL

Like you found yourself, increasing work_mem locally for the query helps if the query needs that much RAM. Consider SET LOCAL:

- work_mem in postgresql on Linux
- Speed up creation of Postgres partial index

With all suggested improvements in place you may not need to increase work_mem any more.

Context
StackExchange Database Administrators Q#106126, answer score: 8