HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to optimize PostgreSQL OR query on two indexed columns

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcolumnsqueryindexedtwooptimizehow

Problem

I have a large partitioned table that stores monetary transactions between accounts.
CREATE TABLE "transactions" (
"from" BYTEA NOT NULL -- sender account
,"to" BYTEA NOT NULL -- receiver account
,"type" INTEGER NOT NULL -- type of transfer
,"ts" TIMESTAMP NOT NULL -- timestamp of transfer
) PARTITION BY RANGE ("ts");


"transactions" has indexes on "ts", "from" and "to" (with "ts" for ordering).
CREATE INDEX "transactions_ts_idx" ON "transactions" USING BTREE ("ts");
CREATE INDEX "transactions_from_idx" ON "transactions" USING BTREE ("from", "ts");
CREATE INDEX "transactions_to_idx" ON "transactions" USING BTREE ("to", "ts");


I would like to query all transactions involving a given account, or no account, something like:
-- given an account
SELECT * FROM "transactions"
WHERE "from" = '' OR "to" = ''
ORDER BY "ts" DESC;


and
-- all transactions irrespective of account
SELECT * FROM "transactions"
ORDER BY "ts" DESC;

SELECT * FROM "transactions"
WHERE "from" = '' OR "to" = ''
ORDER BY "ts" DESC;


The subsequent query plan for the first query given an `` involves BITMAP INDEX SCANs on the "from" and "to" indexes followed by BITMAP OR, over each partition, as expected.

...
|  ->  BitmapOr  (cost=535.22..535.22 rows=17744 width=0) (actual time=6.355..6.356 rows=0 loops=1)                                     |
|    ->  Bitmap Index Scan on transactions_2022_12_from_idx  (cost=0.00..185.51 rows=7058 width=0) (actual time=2.742..2.742 rows=0 loops=1)  |
|      Index Cond: (("from")::bytea = '\xc5db3df907e7aa97f2da491e328578be27e9e644'::bytea)                                            |
|    ->  Bitmap Index Scan on transactions_2022_12_to_idx  (cost=0.00..340.84 rows=10686 width=0) (actual time=3.613..3.613 rows=0 loops=1)     |
|      Index Cond: (("to")::bytea = '\xc5db3df907e7aa97f2da491e328578be27e9e644'::bytea)                                              |
...


However I'm concerned about the CPU and

Solution

That query won't become much faster. Note that you won't be able to support both the WHERE condition and the ORDER BY with indexes; you have to decide which one you want. To support the WHERE condition, have indexes on from and to; to support ORDER BY, have an index on ts.

If you are willing to rewrite the query, it could be faster:

((SELECT ts, "from", "to", value
  FROM transactions
  WHERE "from" = '\xc5db3df907e7aa97f2da491e328578be27e9e644'
  ORDER BY ts DESC
  LIMIT 10)
 UNION ALL
 (SELECT ts, "from", "to", value
  FROM transactions
  WHERE "to" = '\xc5db3df907e7aa97f2da491e328578be27e9e644'
  ORDER BY ts DESC
  LIMIT 10))
ORDER BY ts DESC
LIMIT 10;


That query would benefit from two indexes, one on ("from", ts), and one on ("to", ts).

You should avoid column names that are SQL keywords.

Code Snippets

((SELECT ts, "from", "to", value
  FROM transactions
  WHERE "from" = '\xc5db3df907e7aa97f2da491e328578be27e9e644'
  ORDER BY ts DESC
  LIMIT 10)
 UNION ALL
 (SELECT ts, "from", "to", value
  FROM transactions
  WHERE "to" = '\xc5db3df907e7aa97f2da491e328578be27e9e644'
  ORDER BY ts DESC
  LIMIT 10))
ORDER BY ts DESC
LIMIT 10;

Context

StackExchange Database Administrators Q#320878, answer score: 9

Revisions (0)

No revisions yet.