patternsqlMajor
Index not used with = ANY() but used with IN
Viewed 0 times
withusedanybutindexnot
Problem
Table
No index is used with the
But one of them is used with the
It uses the record index if the record is cast to the correct type:
```
explain analyze
select *
from t
where (a,b)::int_pair = any(array[row(1,1),row(1,2)])
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Scan using t_row_idx on t (cost=0.42..12.87 rows=2 width=8) (actual time=0.106..0.126 rows=1 loops=1)
Index Cond: (ROW(a, b)::int_pair = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
Planning ti
t has two indexes:create table t (a int, b int);
create type int_pair as (a int, b int);
create index t_row_idx on t (((a,b)::int_pair));
create index t_a_b_idx on t (a,b);
insert into t (a,b)
select i, i
from generate_series(1, 100000) g(i)
;No index is used with the
ANY operator:explain analyze
select *
from t
where (a,b) = any(array[(1,1),(1,2)])
;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..1693.00 rows=1000 width=8) (actual time=0.042..126.789 rows=1 loops=1)
Filter: (ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
Rows Removed by Filter: 99999
Planning time: 0.122 ms
Execution time: 126.836 msBut one of them is used with the
IN operator:explain analyze
select *
from t
where (a,b) in ((1,1),(1,2))
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_a_b_idx on t (cost=0.29..8.32 rows=1 width=8) (actual time=0.028..0.029 rows=1 loops=1)
Index Cond: (a = 1)
Filter: ((b = 1) OR (b = 2))
Heap Fetches: 1
Planning time: 0.161 ms
Execution time: 0.066 msIt uses the record index if the record is cast to the correct type:
```
explain analyze
select *
from t
where (a,b)::int_pair = any(array[row(1,1),row(1,2)])
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Index Scan using t_row_idx on t (cost=0.42..12.87 rows=2 width=8) (actual time=0.106..0.126 rows=1 loops=1)
Index Cond: (ROW(a, b)::int_pair = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
Planning ti
Solution
Internally, there are two separate forms of
One of each, taking a set, is equivalent to the other and
Consequently, the following two queries are equivalent and both can use the plain index
Or:
Identical for both:
However, this cannot easily be passed to a function, since there are no "table variables" in Postgres. Which leads to the problem that started this topic:
There are various workarounds for that problem. One being the alternative answer I added there. Some others:
The second form of each is different:
This has different consequences for typing the input. As we can see in the
is seen as shorthand for:
And actual ROW values are compared. Postgres is not currently smart enough to see that the index on the composite type
An explicit cast helps to overcome this lack of smarts:
Casting the right operand (
Related:
-
Is there a way to usefully index a text column containing regex patterns?
.. values are taken as elements and Postgres is able to compare individual integer values as we can see in the
Hence Postgres finds that the simple index
Consequently, there would be another solution for the particular case in the example: since the custom composite type
Shorter, equivalent syntax:
But the first variant is safer. The second variant would resolve to the column if a column of the same name should exist.
Then this query would use the index without any more explicit casting:
But typical use cases won't be able to utilize the implicitly existing type of the table row.
IN, and also two separate forms of the ANY construct.One of each, taking a set, is equivalent to the other and
expr IN () also leads to the same query plan as expr = ANY() that can use a plain index. Details:- IN vs ANY operator in PostgreSQL
Consequently, the following two queries are equivalent and both can use the plain index
t_a_b_idx (which can also be the solution if you are trying to get your query to use the index):EXPLAIN ANALYZE
SELECT *
FROM t
WHERE (a,b) = ANY(VALUES (1,1),(1,2));Or:
...
WHERE (a,b) IN (VALUES (1,1),(1,2));Identical for both:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.33..16.71 rows=1 width=8) (actual time=0.101..0.101 rows=0 loops=1)
-> Unique (cost=0.04..0.05 rows=2 width=8) (actual time=0.068..0.070 rows=2 loops=1)
-> Sort (cost=0.04..0.04 rows=2 width=8) (actual time=0.067..0.068 rows=2 loops=1)
Sort Key: "*VALUES*".column1, "*VALUES*".column2
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=8) (actual time=0.005..0.005 rows=2 loops=1)
-> Index Only Scan using t_plain_idx on t (cost=0.29..8.32 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=2)
Index Cond: ((a = "*VALUES*".column1) AND (b = "*VALUES*".column2))
Heap Fetches: 0
Planning time: 4.080 ms
Execution time: 0.202 msHowever, this cannot easily be passed to a function, since there are no "table variables" in Postgres. Which leads to the problem that started this topic:
- Pass multiple sets or arrays of values to a function
There are various workarounds for that problem. One being the alternative answer I added there. Some others:
- Insert values from a record variable into a table
- Declare variable of table type in PL/pgSQL
The second form of each is different:
ANY takes an actual array, while IN takes a comma separated list of values.This has different consequences for typing the input. As we can see in the
EXPLAIN output of the question, this form:WHERE (a,b) = ANY(ARRAY[(1,1),(1,2)]);is seen as shorthand for:
ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)])And actual ROW values are compared. Postgres is not currently smart enough to see that the index on the composite type
t_row_idx is applicable. Nor does it realize that the simple index t_a_b_idx should be applicable as well.An explicit cast helps to overcome this lack of smarts:
WHERE (a,b)::int_pair = ANY(ARRAY[(1,1),(1,2)]::int_pair[]);Casting the right operand (
::int_pair[]) is optional (though preferable for performance and to avoid ambiguities). Once the left operand has a well-known type, the right operand is coerced from "anonymous record" to a matching type. Only then, the operator is defined unambiguously. And Postgres picks applicable indexes based on the operator and the left operand. For many operators that define a COMMUTATOR, the query planner can flip operands to bring the indexed expression to the left. But that's not possible with the ANY construct.Related:
-
Is there a way to usefully index a text column containing regex patterns?
WHERE (a,b) IN ((1,1),(1,2));.. values are taken as elements and Postgres is able to compare individual integer values as we can see in the
EXPLAIN output once more:Filter: ((b = 1) OR (b = 2))Hence Postgres finds that the simple index
t_a_b_idx can be used.Consequently, there would be another solution for the particular case in the example: since the custom composite type
int_pair in the example happens to be equivalent to the row type of the table t itself, we could simplify:CREATE INDEX t_row_idx2 ON t ((t.*));Shorter, equivalent syntax:
CREATE INDEX t_row_idx2 ON t ((t));But the first variant is safer. The second variant would resolve to the column if a column of the same name should exist.
Then this query would use the index without any more explicit casting:
EXPLAIN ANALYZE
SELECT *
FROM t
WHERE t = ANY(ARRAY[(1,1),(1,2)]);QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=40.59..496.08 rows=1000 width=8) (actual time=0.19
1..0.191 rows=0 loops=1)
Recheck Cond: (t.* = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
-> Bitmap Index Scan on t_row_idx2 (cost=0.00..40.34 rows=1000 width=0) (actual time=0.188..0.188 rows=0 loops=1)
Index Cond: (t.* = ANY (ARRAY[ROW(1, 1), ROW(1, 2)]))
Planning time: 2.575 ms
Execution time: 0.267 msBut typical use cases won't be able to utilize the implicitly existing type of the table row.
Code Snippets
EXPLAIN ANALYZE
SELECT *
FROM t
WHERE (a,b) = ANY(VALUES (1,1),(1,2));...
WHERE (a,b) IN (VALUES (1,1),(1,2));QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.33..16.71 rows=1 width=8) (actual time=0.101..0.101 rows=0 loops=1)
-> Unique (cost=0.04..0.05 rows=2 width=8) (actual time=0.068..0.070 rows=2 loops=1)
-> Sort (cost=0.04..0.04 rows=2 width=8) (actual time=0.067..0.068 rows=2 loops=1)
Sort Key: "*VALUES*".column1, "*VALUES*".column2
Sort Method: quicksort Memory: 25kB
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=8) (actual time=0.005..0.005 rows=2 loops=1)
-> Index Only Scan using t_plain_idx on t (cost=0.29..8.32 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=2)
Index Cond: ((a = "*VALUES*".column1) AND (b = "*VALUES*".column2))
Heap Fetches: 0
Planning time: 4.080 ms
Execution time: 0.202 msWHERE (a,b) = ANY(ARRAY[(1,1),(1,2)]);ROW(a, b) = ANY (ARRAY[ROW(1, 1), ROW(1, 2)])Context
StackExchange Database Administrators Q#125413, answer score: 27
Revisions (0)
No revisions yet.