HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to optimize IN query on indexed column

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
columnqueryindexedoptimizehow

Problem

I have a table with over 50M records. One of the fields is COLOR_CODE. I've set an index on the column COLOR_CODE like this:

"mytable_colorcode_idx" btree (color_code)


I'm noticing that when I run the query below, the execution time is higher

SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'


However, the execution time is faster with OR clause:

SELECT count(total_amount) FROM mytable 
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


Query plan for IN

explain analyze SELECT count(total_amount) FROM mytable 
WHERE color_code in ('red','green') and sale_date = '1970'
                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2074238.07..2074238.08 rows=1 width=8) (actual time=63520.150..63520.150 rows=1 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=53504.73..2069923.27 rows=1725919 width=6) (actual time=3509.920..63080.519 rows=1727037 loops=1)
         Recheck Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
         Rows Removed by Index Recheck: 5067635
         Filter: (sale_date = 1970)
         Heap Blocks: exact=38679 lossy=496680
         ->  Bitmap Index Scan on mytable_colorcode_idx  (cost=0.00..53073.26 rows=1725919 width=0) (actual time=3501.777..3501.777 rows=1727037 loops=1)
               Index Cond: ((color_code)::text = ANY ('{red,green}'::text[]))
 Planning time: 0.165 ms
 Execution time: 63524.100 ms
(10 rows)


Query plan for OR

```
explain analyze SELECT count(total_amount) FROM mytable
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Solution

You can't compare performance of:

WHERE color_code in ('red','green') and sale_date = '1970'


with:

WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'


because they are not logically equivalent (will return different results). A trivial example:

with T (color_code, sale_date) as ( 
     values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
   and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red        | 1970
(1 row)


However:

with T (color_code, sale_date) as ( 
    values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
  and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red        | 1970
 green      | 1969
(2 rows)


In short AND has higher precedence than OR so your optimized expression A OR B AND C is evaluated as A OR (B AND C). Your original expression is evaluated as (A OR B) AND C.

For the comparison to be meaningful, you need to change your query to:

select * from T 
where (color_code = 'green' or color_code = 'red') 
  and sale_date = '1970';


My guess is that you won't see much difference performance-wise with that and your original expression.

That said, I would suggest an index like:

CREATE INDEX ... ON ... (sale_date, color_code)

Code Snippets

WHERE color_code in ('red','green') and sale_date = '1970'
WHERE color_code = 'red' or color_code = 'green' and sale_date = '1970'
with T (color_code, sale_date) as ( 
     values ('red', '1970'), ('green','1969')
 ) 
 select * from T 
 where color_code in ('green', 'red') 
   and sale_date = '1970';

 color_code | sale_date 
------------+-----------
 red        | 1970
(1 row)
with T (color_code, sale_date) as ( 
    values ('red', '1970'), ('green','1969')
) 
select * from T 
where color_code = 'green' or color_code = 'red' 
  and sale_date = '1970';

color_code | sale_date 
------------+-----------
 red        | 1970
 green      | 1969
(2 rows)
select * from T 
where (color_code = 'green' or color_code = 'red') 
  and sale_date = '1970';

Context

StackExchange Database Administrators Q#243790, answer score: 5

Revisions (0)

No revisions yet.