patternsqlMinor
Optimize query matching first N items of an array
Viewed 0 times
arrayqueryfirstitemsoptimizematching
Problem
I'm working with a Postgres DB full of chess game data right now, where each game is a row in a 'records' table. The players' moves and the (optional) computer evaluations of those moves each have their own column and are stored as arrays.
I've written a query to retrieve all evaluations of a specified opening sequence of moves. (You'd think the computer evaluation would be consistent - but it's not.) The length of the opening sequence is arbitrary - it could be one move, it could be thirty.
Here's an example query, which finds all games starting with the same ten-move opening sequence and then, for each game with an evaluation, returns the computer's evaluation of that point in the game -
I'm not certain it's relevant, but the move data is always an alphanumeric string from 2-6 characters, and the computer evaluations are largely decimals (both positive and negative) but do include the occasional special character (forced checkmates have an octothorpe for a prefix).
Here's the relevant snippet of the table description -
And here's the query plan from EXPLAIN ANALYZE:
```
Gather (cost=1000.00..736354.70 rows=905 width=516) (actual time=28251.267..28253.139 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on records (cost=0.00..735264.20 rows=377 width=516) (actual time=28243.233..28243.234 rows=0 loops=3)
Filter: ((evaluation IS NOT NULL) AND ((moves[1:10])::text[] = '{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}'::text
I've written a query to retrieve all evaluations of a specified opening sequence of moves. (You'd think the computer evaluation would be consistent - but it's not.) The length of the opening sequence is arbitrary - it could be one move, it could be thirty.
Here's an example query, which finds all games starting with the same ten-move opening sequence and then, for each game with an evaluation, returns the computer's evaluation of that point in the game -
SELECT evaluation[10]
FROM records
WHERE moves[1:10]::text[] = ARRAY['b4', 'e5', 'Bb2', 'd6', 'Nf3', 'Nf6', 'g3', 'Bg4', 'Bg2', 'h5']::text[]
AND evaluation IS NOT NULL;I'm not certain it's relevant, but the move data is always an alphanumeric string from 2-6 characters, and the computer evaluations are largely decimals (both positive and negative) but do include the occasional special character (forced checkmates have an octothorpe for a prefix).
Here's the relevant snippet of the table description -
Column | Type |
-----------------+--------------------------------+-
id | bigint |
moves | character varying(255)[] |
evaluation | character varying(255)[] |
"records_pkey" PRIMARY KEY, btree (id)
Access method: heapAnd here's the query plan from EXPLAIN ANALYZE:
```
Gather (cost=1000.00..736354.70 rows=905 width=516) (actual time=28251.267..28253.139 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on records (cost=0.00..735264.20 rows=377 width=516) (actual time=28243.233..28243.234 rows=0 loops=3)
Filter: ((evaluation IS NOT NULL) AND ((moves[1:10])::text[] = '{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}'::text
Solution
You need index support to be fast. Tricky without re-designing your table. The following solution should perform excellently (microseconds instead of seconds), but requires some skill. Buckle up.
Expression index on
Just take a few leading array elements, say 8. That should be very selective already. More would just make the index bigger, for little extra filtering.
Convert to a string. No separator. That allows false positives, but unlikely enough to matter. Eventually, we filter for exact results anyway.
Only
So we could "fake" an immutable wrapper function. That's possible for any user who can create functions:
Or, better yet, define an honestly
The rest works without superuser privileges.
Related:
Either way, we now have a function
If a major percentage of rows has
Query
For queries with 10 array elements or more:
Use the expression
Then add original exact filters to narrow the results down to exact matches:
Looks redundant, is logically redundant, but allows to use the index to great effect.
For arrays with less than 8 elements (our indexed lead), or generally, use
db<>fiddle here
Similar, with more explanation:
Cluster table rows
It will generally help performance to physically sort your table rows, so that each query can read results from one or few data pages, instead of fetching from all over the place.
Use
Expression index on
IMMUTABLE functionJust take a few leading array elements, say 8. That should be very selective already. More would just make the index bigger, for little extra filtering.
Convert to a string. No separator. That allows false positives, but unlikely enough to matter. Eventually, we filter for exact results anyway.
Only
IMMUTABLE functions are allowed in index expressions. But array_to_string() is only STABLE, not IMMUTABLE, because it takes anyarray and some element types don't have an immutable text representation. We only deal with text (well, varchar(255) for no good reason, but all the same), and that is, in fact, immutable. But array_to_string() doesn't know that.So we could "fake" an immutable wrapper function. That's possible for any user who can create functions:
CREATE OR REPLACE FUNCTION public.f_8moves(text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT array_to_string($1[1:8], '')
$func$;Or, better yet, define an honestly
IMMUTABLE variant of array_to_string() for only text[] input, directly based on the underlying C function. Faster and cleaner. Let's call it array_to_string_immutable() to be clear. That requires superuser privileges:-- SET ROLE postgres; -- you must be superuser
CREATE OR REPLACE FUNCTION public.array_to_string_immutable(text[], text)
RETURNS text
LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'array_to_text';The rest works without superuser privileges.
CREATE OR REPLACE FUNCTION public.f_8moves(text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.array_to_string_immutable($1[1:8], '')
$func$;Related:
- Does PostgreSQL support "accent insensitive" collations?
Either way, we now have a function
public.f_8moves(text[]) to be used in the following index:CREATE INDEX records_8moves_idx ON records (public.f_8moves(moves) COLLATE "C");COLLATE "C" is exactly what we need to allow left-anchored (your expressed requirement) LIKE expressions. See:- Is there a difference between text_pattern_ops and COLLATE "C"?
- Is unique index better than unique constraint when an index with an operator class is required
If a major percentage of rows has
evaluation IS NOT NULL, add that filter to the index, to make it a partial index on top:CREATE INDEX records_8moves_idx ON records (public.f_8moves(moves) COLLATE "C")
WHERE evaluation IS NOT NULL;Query
For queries with 10 array elements or more:
SELECT evaluation[10]
FROM records
WHERE public.f_8moves(moves) = public.f_8moves('{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}') COLLATE "C"
AND moves[1:10] = '{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}'
AND evaluation IS NOT NULL;COLLATE "C" is required to match the index.Use the expression
public.f_8moves(moves) to match the functional index. The right side of the expression can be given as string directly. But I use the same function for convenience.Then add original exact filters to narrow the results down to exact matches:
AND moves[1:10] = '{b4,e5,Bb2,d6,Nf3,Nf6,g3,Bg4,Bg2,h5}'Looks redundant, is logically redundant, but allows to use the index to great effect.
For arrays with less than 8 elements (our indexed lead), or generally, use
LIKE with a left-anchored pattern:SELECT evaluation[10]
FROM records
WHERE public.f_8moves(moves) LIKE (public.f_8moves('{b4,e5}') || '%') COLLATE "C" -- COLLATE "C" is optional for LIKE
AND moves[1:2] = '{b4,e5}'
AND evaluation IS NOT NULL;db<>fiddle here
Similar, with more explanation:
- Index max row size error
- Full-text search in Postgres or CouchDB?
Cluster table rows
It will generally help performance to physically sort your table rows, so that each query can read results from one or few data pages, instead of fetching from all over the place.
Use
CLUSTER or the non-blocking alternatives pg_repack or pg_squeeze. More in the link above.Code Snippets
CREATE OR REPLACE FUNCTION public.f_8moves(text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT array_to_string($1[1:8], '')
$func$;-- SET ROLE postgres; -- you must be superuser
CREATE OR REPLACE FUNCTION public.array_to_string_immutable(text[], text)
RETURNS text
LANGUAGE internal IMMUTABLE PARALLEL SAFE STRICT AS 'array_to_text';CREATE OR REPLACE FUNCTION public.f_8moves(text[])
RETURNS text
LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT AS
$func$
SELECT public.array_to_string_immutable($1[1:8], '')
$func$;CREATE INDEX records_8moves_idx ON records (public.f_8moves(moves) COLLATE "C");CREATE INDEX records_8moves_idx ON records (public.f_8moves(moves) COLLATE "C")
WHERE evaluation IS NOT NULL;Context
StackExchange Database Administrators Q#299039, answer score: 7
Revisions (0)
No revisions yet.