HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Use array_position() function to get the most frequent value for column from pg_stats

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fromthecolumnfunctionvaluegetarray_positionforusefrequent

Problem

I'm trying to execute this simple query in order to check whether some value (1000) belongs to MCV list used by Postgres query optimizer:

SELECT array_position(most_common_vals, 1000) 
FROM pg_stats 
WHERE tablename = 'tenk1' 
AND attname = 'unique1';


But receive the following error message:

ERROR:  function array_position(anyarray, integer) does not exist


How can it be fixed?

array_position() is the standard function as described here and the following statement returns 2 as expected:

SELECT array_position('{1,2,3}', 2);

Solution

Solution

Assuming the data type of your column tenk1.unique1 is integer:

SELECT array_position(most_common_vals::text::int[], 1000) 
FROM   pg_stats 
WHERE  tablename = 'tenk1' 
AND    attname = 'unique1';


Use your actual column type and the corresponding array type.

You get the position, or NULL if the value is not in the MCV list.

The solution is short - unlike the ...

Explanation

The function array_position() is defined to take (anyarray, anyelement) (or (anyarray, anyelement, integer) for the second variant).

The column pg_stats.most_common_vals has the polymorphic data type anyarray to be able to hold arrays of any data type - for obvious reasons.

anyarray and anyelement are not allowed as data types for user-created tables. For users, both are polymorphic pseudo-types. (But Postgres can use them in system tables.)

Multiple polymorphic variables in the same function have to resolve to the same (or corresponding) data types. The manual:


Furthermore, if there are positions declared anyarray and others
declared anyelement, the actual array type in the anyarray positions
must be an array whose elements are the same type appearing in the
anyelement positions.

And:


Thus, when more than one argument position is declared with a
polymorphic type, the net effect is that only certain combinations of
actual argument types are allowed.

Bold emphasis mine.

You have found a corner case where function type resolution fails for the combination of the polymorphic anyarray with an integer - or any non-polymorphic type in second position.

1000 in your expression array_position(most_common_vals, 1000) is a numeric constant resolving to integer. These would fail in similar fashion:

array_position(most_common_vals, '1000')  -- untyped string literal


ERROR:  function array_position(anyarray, unknown) does not exist


array_position(most_common_vals, '1000'::text)


ERROR:  function array_position(anyarray, text) does not exist


Furthermore, there are no casts defined for anyarray, being a pseudo-type in user-land:

SELECT * FROM pg_cast WHERE castsource = 'anyarray'::regtype;  -- nothing found


The workaround is to cast to text as stepping stone, since any type can be cast to text. Then cast to integer[], arriving at the solution above.

In closing, I think this is a shortcoming in function type resolution that could be resolved (easily?). But since the data type anyarray is not supposed to be used like this in user-land to begin with I doubt that any developer will spend time on it ...

Code Snippets

SELECT array_position(most_common_vals::text::int[], 1000) 
FROM   pg_stats 
WHERE  tablename = 'tenk1' 
AND    attname = 'unique1';
array_position(most_common_vals, '1000')  -- untyped string literal
ERROR:  function array_position(anyarray, unknown) does not exist
array_position(most_common_vals, '1000'::text)
ERROR:  function array_position(anyarray, text) does not exist

Context

StackExchange Database Administrators Q#212215, answer score: 5

Revisions (0)

No revisions yet.