HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to reduce query size with many repeated UNION subqueries?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
repeatedwithsizequeryunionreducemanyhowsubqueries

Problem

I use Postgres 13 and have a table defined with the following DDL:

CREATE TABLE item_codes (
    code    bytea                    NOT NULL,
    item_id bytea                    NOT NULL,
    time    TIMESTAMP WITH TIME ZONE NOT NULL,
    PRIMARY KEY (item_id, code)
);

CREATE INDEX ON item_codes (code, time, item_id);


I use the following query:

SELECT DISTINCT time, item_id
FROM (
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3965623166306238383033393437613338373162313934383034366139653239'
       ORDER BY time, item_id
       LIMIT 100)
       UNION ALL
      (SELECT time, item_id
       FROM item_codes
       WHERE code = '\x3836653432356638366638636338393364373935343938303233343363373561'
       ORDER BY time, item_id
       LIMIT 100)
     ) AS items
ORDER BY time, item_id
LIMIT 100;


The code is generated dynamically and the number of subqueries with UNION ALL depends on how many different code values are needed. It can get pretty long.

Naively rewriting the query to, what I think, is equivalent

SELECT DISTINCT time, item_id
FROM item_codes
WHERE code IN ('\x3965623166306238383033393437613338373162313934383034366139653239',
                  '\x3836653432356638366638636338393364373935343938303233343363373561')
ORDER BY time, item_id
LIMIT 100


makes it many times slower and unacceptable.

Two main questions:

-
Is it possible to rewrite the original query in a more concise way without duplicating subqueries for each code value while still keeping a fast execution plan?

-
Why can Postrgres not optimize the second query? Am I missing something and it's not an equivalent?

Query plan of the original query with UNIONs:

```
Limit (cost=1.12..7.33 rows=100 width=41)
-> Merge Append (cost=1.12..13.53 rows=200 width=41)
Sort Key: btc_tx_addresses.tx_time, btc_tx_addresses.tx_id"
-> Limit (cost=0.56..4.76 rows=100 width=41)
-> Index Only Scan using btc_tx_addr

Solution

Q1. Alternative without repeating subqueries

Possible. Provide a set of input values, and then attach a LATERAL subquery:

SELECT DISTINCT time, item_id
FROM   unnest('{\\x3965623166306238383033393437613338373162313934383034366139653239
              , \\x3836653432356638366638636338393364373935343938303233343363373561}'::bytea[]) c(code)
CROSS  JOIN LATERAL (
   SELECT time, item_id
   FROM   item_codes ic
   WHERE  ic.code = c.code
   ORDER  BY 1, 2
   LIMIT  100
   ) ic
ORDER  BY 1, 2
LIMIT  100;


I am unnesting an input array to provide the set. As an aside: escape \ with \ inside the array literal, or use an ARRAY constructor instead:

ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'
    , '\x3836653432356638366638636338393364373935343938303233343363373561']::bytea[]


Or:

ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
    , '\x3836653432356638366638636338393364373935343938303233343363373561']


Alternatively, a VALUES expression does the trick, too:

SELECT DISTINCT time, item_id
FROM  (
   VALUES
     ('\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
   , ('\x3836653432356638366638636338393364373935343938303233343363373561')
   ) c(code)
CROSS  JOIN LATERAL ( ...


Q2. Why?

Because Postgres just doesn't have this kind of index skip scan implemented as query plan, yet. So we have to shoehorn it in.

Related:

  • Can spatial index help a "range - order by - limit" query



  • Optimize a query with small LIMIT, predicate on one column and order by another



  • Query last N related rows per row

Code Snippets

SELECT DISTINCT time, item_id
FROM   unnest('{\\x3965623166306238383033393437613338373162313934383034366139653239
              , \\x3836653432356638366638636338393364373935343938303233343363373561}'::bytea[]) c(code)
CROSS  JOIN LATERAL (
   SELECT time, item_id
   FROM   item_codes ic
   WHERE  ic.code = c.code
   ORDER  BY 1, 2
   LIMIT  100
   ) ic
ORDER  BY 1, 2
LIMIT  100;
ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'
    , '\x3836653432356638366638636338393364373935343938303233343363373561']::bytea[]
ARRAY['\x3965623166306238383033393437613338373162313934383034366139653239'::bytea
    , '\x3836653432356638366638636338393364373935343938303233343363373561']
SELECT DISTINCT time, item_id
FROM  (
   VALUES
     ('\x3965623166306238383033393437613338373162313934383034366139653239'::bytea)
   , ('\x3836653432356638366638636338393364373935343938303233343363373561')
   ) c(code)
CROSS  JOIN LATERAL ( ...

Context

StackExchange Database Administrators Q#323884, answer score: 9

Revisions (0)

No revisions yet.