HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMajor

Index not used with = ANY() but used with IN

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withusedanybutindexnot

Problem

Table t has two indexes:

create table t (a int, b int);
create type int_pair as (a int, b int);
create index t_row_idx on t (((a,b)::int_pair));
create index t_a_b_idx on t (a,b);

insert into t (a,b)
select i, i
from generate_series(1, 100000) g(i)
;


No index is used with the ANY operator:

explain analyze
select *
from t
where (a,b) = any(array[(1,1),(1,2)])
;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..1693.00 rows=1000 width=8) (actual time=0.042..126.789 rows=1 loops=1)
   Filter: (ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
   Rows Removed by Filter: 99999
 Planning time: 0.122 ms
 Execution time: 126.836 ms


But one of them is used with the IN operator:

explain analyze
select *
from t
where (a,b) in ((1,1),(1,2))
;
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Only Scan using t_a_b_idx on t  (cost=0.29..8.32 rows=1 width=8) (actual time=0.028..0.029 rows=1 loops=1)
   Index Cond: (a = 1)
   Filter: ((b = 1) OR (b = 2))
   Heap Fetches: 1
 Planning time: 0.161 ms
 Execution time: 0.066 ms


It uses the record index if the record is cast to the correct type:

```
explain analyze
select *
from t
where (a,b)::int_pair = any(array[row(1,1),row(1,2)])
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Scan using t_row_idx on t (cost=0.42..12.87 rows=2 width=8) (actual time=0.106..0.126 rows=1 loops=1)
Index Cond: (ROW(a, b)::int_pair = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
Planning ti

Solution

Internally, there are two separate forms of IN, and also two separate forms of the ANY construct.

One of each, taking a set, is equivalent to the other and expr IN () also leads to the same query plan as expr = ANY() that can use a plain index. Details:

  • IN vs ANY operator in PostgreSQL



Consequently, the following two queries are equivalent and both can use the plain index t_a_b_idx (which can also be the solution if you are trying to get your query to use the index):

EXPLAIN ANALYZE
SELECT *
FROM t
WHERE (a,b) = ANY(VALUES (1,1),(1,2));


Or:

...
WHERE (a,b) IN (VALUES (1,1),(1,2));


Identical for both:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.33..16.71 rows=1 width=8) (actual time=0.101..0.101 rows=0 loops=1)
   ->  Unique  (cost=0.04..0.05 rows=2 width=8) (actual time=0.068..0.070 rows=2 loops=1)
         ->  Sort  (cost=0.04..0.04 rows=2 width=8) (actual time=0.067..0.068 rows=2 loops=1)
               Sort Key: "*VALUES*".column1, "*VALUES*".column2
               Sort Method: quicksort  Memory: 25kB
               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=8) (actual time=0.005..0.005 rows=2 loops=1)
   ->  Index Only Scan using t_plain_idx on t  (cost=0.29..8.32 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=2)
         Index Cond: ((a = "*VALUES*".column1) AND (b = "*VALUES*".column2))
         Heap Fetches: 0
 Planning time: 4.080 ms
 Execution time: 0.202 ms


However, this cannot easily be passed to a function, since there are no "table variables" in Postgres. Which leads to the problem that started this topic:

  • Pass multiple sets or arrays of values to a function



There are various workarounds for that problem. One being the alternative answer I added there. Some others:

  • Insert values from a record variable into a table



  • Declare variable of table type in PL/pgSQL



The second form of each is different: ANY takes an actual array, while IN takes a comma separated list of values.

This has different consequences for typing the input. As we can see in the EXPLAIN output of the question, this form:

WHERE (a,b) = ANY(ARRAY[(1,1),(1,2)]);


is seen as shorthand for:

ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)])


And actual ROW values are compared. Postgres is not currently smart enough to see that the index on the composite type t_row_idx is applicable. Nor does it realize that the simple index t_a_b_idx should be applicable as well.

An explicit cast helps to overcome this lack of smarts:

WHERE (a,b)::int_pair = ANY(ARRAY[(1,1),(1,2)]::int_pair[]);


Casting the right operand (::int_pair[]) is optional (though preferable for performance and to avoid ambiguities). Once the left operand has a well-known type, the right operand is coerced from "anonymous record" to a matching type. Only then, the operator is defined unambiguously. And Postgres picks applicable indexes based on the operator and the left operand. For many operators that define a COMMUTATOR, the query planner can flip operands to bring the indexed expression to the left. But that's not possible with the ANY construct.

Related:

-
Is there a way to usefully index a text column containing regex patterns?

WHERE (a,b) IN ((1,1),(1,2));


.. values are taken as elements and Postgres is able to compare individual integer values as we can see in the EXPLAIN output once more:

Filter: ((b = 1) OR (b = 2))


Hence Postgres finds that the simple index t_a_b_idx can be used.

Consequently, there would be another solution for the particular case in the example: since the custom composite type int_pair in the example happens to be equivalent to the row type of the table t itself, we could simplify:

CREATE INDEX t_row_idx2 ON t ((t.*));


Shorter, equivalent syntax:

CREATE INDEX t_row_idx2 ON t ((t));


But the first variant is safer. The second variant would resolve to the column if a column of the same name should exist.

Then this query would use the index without any more explicit casting:

EXPLAIN ANALYZE
SELECT *
FROM   t
WHERE  t = ANY(ARRAY[(1,1),(1,2)]);


QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=40.59..496.08 rows=1000 width=8) (actual time=0.19
1..0.191 rows=0 loops=1)
   Recheck Cond: (t.* = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
   ->  Bitmap Index Scan on t_row_idx2  (cost=0.00..40.34 rows=1000 width=0) (actual time=0.188..0.188 rows=0 loops=1)
         Index Cond: (t.* = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
 Planning time: 2.575 ms
 Execution time: 0.267 ms


But typical use cases won't be able to utilize the implicitly existing type of the table row.

Code Snippets

EXPLAIN ANALYZE
SELECT *
FROM t
WHERE (a,b) = ANY(VALUES (1,1),(1,2));
...
WHERE (a,b) IN (VALUES (1,1),(1,2));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.33..16.71 rows=1 width=8) (actual time=0.101..0.101 rows=0 loops=1)
   ->  Unique  (cost=0.04..0.05 rows=2 width=8) (actual time=0.068..0.070 rows=2 loops=1)
         ->  Sort  (cost=0.04..0.04 rows=2 width=8) (actual time=0.067..0.068 rows=2 loops=1)
               Sort Key: "*VALUES*".column1, "*VALUES*".column2
               Sort Method: quicksort  Memory: 25kB
               ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=8) (actual time=0.005..0.005 rows=2 loops=1)
   ->  Index Only Scan using t_plain_idx on t  (cost=0.29..8.32 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=2)
         Index Cond: ((a = "*VALUES*".column1) AND (b = "*VALUES*".column2))
         Heap Fetches: 0
 Planning time: 4.080 ms
 Execution time: 0.202 ms
WHERE (a,b) = ANY(ARRAY[(1,1),(1,2)]);
ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)])

Context

StackExchange Database Administrators Q#125413, answer score: 27

Revisions (0)

No revisions yet.