patternsqlMinor
PostgreSQL: ANY (VALUES(...)) in WHERE clause causes drastic slowdown
Viewed 0 times
postgresqlanywhereslowdowndrasticvaluesclausecauses
Problem
Okay, I previously asked a question regarding a large dataset but it was never answered, so I decided to cut it down and ask about a smaller subset of the previous setup and simplify what I am trying to accomplish in a new question - hoping this will be a bit clearer.
I have a single large table (
As you can see, I have several indexes (not all relevant to this question) and have
Yet a simple query like this:
`SELECT drug, reason FROM report_drugs WHERE drug = ANY(VALUES (9557), (17848),
(17880), (18223), (18550), (19020), (19084), (19234), (21295), (21742),
(23085), (26017), (27016), (29317), (33566), (35818), (37394), (39971),
(41505), (42162), (44000), (45168), (47386), (48848), (51472), (51570),
(51802), (52489), (52848), (53663), (54591), (55506), (55922), (57209),
(57671), (59311), (62022), (62532), (63485), (64134), (66236), (67394),
(67586), (68134), (68934), (70035), (70589), (70896), (73466), (75931),
(78686), (78985), (79217), (83294), (83619), (84964), (85831), (88330),
(899
I have a single large table (
report_drugs) that is 1775 MB on disk containing little over 33 million rows. The table layout: Column | Type | Modifiers
---------------+-----------------------------+-----------
rid | integer | not null
drug | integer | not null
created | timestamp without time zone |
reason | text |
duration | integer |
drugseq | integer |
effectiveness | integer |
Indexes:
"report_drugs_drug_idx" btree (drug) CLUSTER
"report_drugs_drug_rid_idx" btree (drug, rid)
"report_drugs_reason_idx" btree (reason)
"report_drugs_reason_rid_idx" btree (reason, rid)
"report_drugs_rid_idx" btree (rid)
As you can see, I have several indexes (not all relevant to this question) and have
CLUSTERed the table on the drug column index as this is mostly used to scope on. The table is also VACUUM ANALYZEd automatically and manually by me before getting any metrics.Yet a simple query like this:
`SELECT drug, reason FROM report_drugs WHERE drug = ANY(VALUES (9557), (17848),
(17880), (18223), (18550), (19020), (19084), (19234), (21295), (21742),
(23085), (26017), (27016), (29317), (33566), (35818), (37394), (39971),
(41505), (42162), (44000), (45168), (47386), (48848), (51472), (51570),
(51802), (52489), (52848), (53663), (54591), (55506), (55922), (57209),
(57671), (59311), (62022), (62532), (63485), (64134), (66236), (67394),
(67586), (68134), (68934), (70035), (70589), (70896), (73466), (75931),
(78686), (78985), (79217), (83294), (83619), (84964), (85831), (88330),
(899
Solution
You're just comparing against a list of values, using IN would be simpler in this case:
Alternatively, if you still use ANY, using an array literal results in the same query plan as IN for me:
I tried this is a smaller test table, and Postgres was able to do an index scan for the version of the query using IN and ANY with an array literal, but not for the query using ANY with VALUES.
The resulting plan is something like the following (but my test table and data is somewhat different):
This should be much faster than the query plan you showed as it scans the index once, while your plan loops as many times as you have drugs in that WHERE clause.
SELECT drug, reason FROM drugs WHERE drug IN (9557,17848,17880,18223,18550);Alternatively, if you still use ANY, using an array literal results in the same query plan as IN for me:
SELECT drug, reason FROM drugs WHERE drug = ANY ('{9557,17848,17880,18223,18550}');I tried this is a smaller test table, and Postgres was able to do an index scan for the version of the query using IN and ANY with an array literal, but not for the query using ANY with VALUES.
The resulting plan is something like the following (but my test table and data is somewhat different):
Index Scan using test_data_id_idx on test_data (cost=0.43..57.43 rows=12 width=8) (actual time=0.014..0.028 rows=12 loops=1)
Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))This should be much faster than the query plan you showed as it scans the index once, while your plan loops as many times as you have drugs in that WHERE clause.
Code Snippets
SELECT drug, reason FROM drugs WHERE drug IN (9557,17848,17880,18223,18550);SELECT drug, reason FROM drugs WHERE drug = ANY ('{9557,17848,17880,18223,18550}');Index Scan using test_data_id_idx on test_data (cost=0.43..57.43 rows=12 width=8) (actual time=0.014..0.028 rows=12 loops=1)
Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))Context
StackExchange Database Administrators Q#171815, answer score: 3
Revisions (0)
No revisions yet.