snippetsqlMinor
How to optimize IN query on indexed column
Viewed 0 times
columnqueryindexedoptimizehow
Problem
I have a table with over 50M records. One of the fields is
I'm noticing that when I run the query below, the execution time is higher
However, the execution time is faster with
Query plan for
Query plan for
```
explain analyze SELECT count(total_amount) FROM mytable
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
COLOR_CODE. I've set an index on the column COLOR_CODE like this:"mytable_colorcode_idx" btree (color_code)I'm noticing that when I run the query below, the execution time is higher
SELECT count(total_amount) FROM mytable
WHERE color_code in ('red','green') and sale_date = '1970'However, the execution time is faster with
OR clause:SELECT count(total_amount) FROM mytable
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'Query plan for
IN explain analyze SELECT count(total_amount) FROM mytable
WHERE color_code in ('red','green') and sale_date = '1970'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
-> Bitmap Heap Scan on mytable (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Rows Removed by Index Recheck: 5067635
Filter: (sale_date = 1970)
Heap Blocks: exact=38679 lossy=496680
-> Bitmap Index Scan on mytable_colorcode_idx (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
Planning time: 0.165 ms
Execution time: 63524.100 ms
(10 rows)Query plan for
OR ```
explain analyze SELECT count(total_amount) FROM mytable
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Solution
You can't compare performance of:
with:
because they are not logically equivalent (will return different results). A trivial example:
However:
In short
For the comparison to be meaningful, you need to change your query to:
My guess is that you won't see much difference performance-wise with that and your original expression.
That said, I would suggest an index like:
WHERE color_code in ('red','green') and sale_date = '1970'with:
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'because they are not logically equivalent (will return different results). A trivial example:
with T (color_code, sale_date) as (
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';
color_code | sale_date
------------+-----------
red | 1970
(1 row)However:
with T (color_code, sale_date) as (
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';
color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)In short
AND has higher precedence than OR so your optimized expression A OR B AND C is evaluated as A OR (B AND C). Your original expression is evaluated as (A OR B) AND C. For the comparison to be meaningful, you need to change your query to:
select * from T
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';My guess is that you won't see much difference performance-wise with that and your original expression.
That said, I would suggest an index like:
CREATE INDEX ... ON ... (sale_date, color_code)Code Snippets
WHERE color_code in ('red','green') and sale_date = '1970'WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'with T (color_code, sale_date) as (
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code in ('green', 'red')
and sale_date = '1970';
color_code | sale_date
------------+-----------
red | 1970
(1 row)with T (color_code, sale_date) as (
values ('red', '1970'), ('green','1969')
)
select * from T
where color_code = 'green' or color_code = 'red'
and sale_date = '1970';
color_code | sale_date
------------+-----------
red | 1970
green | 1969
(2 rows)select * from T
where (color_code = 'green' or color_code = 'red')
and sale_date = '1970';Context
StackExchange Database Administrators Q#243790, answer score: 5
Revisions (0)
No revisions yet.