HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Efficient SELECT for array of tuples in PostgreSQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlarrayefficientforselecttuples

Problem

Assume I have a table records with the following structure

  • id (unique int)



  • updated (timestamp)



I have an input_array with values [[id_1, timestamp_1], [id_4, timestamp_4], ...]. I'll refer to each element as tuple_1, tuple_4, etc.

I'm looking for the most efficient query (in PostgreSQL v11.2+) to select [id_1, id_4, ...] from records, but only where tuple_{n}.updated > row{n}.updated. Assume input_array may contains thousands of tuples, and records upwards of a million rows.

I don't even know where to begin with this. Lateral join comes to mind, as does unnest, and where in, but everything I've tried so far fails miserably

Update I'm open to input_array being in any format (tuples, two separate arrays, whatever), and updated being an int

Solution

If you aren't fixed on the array input, you can use a tuple comparison.

select *
from records
where (id, updated) in ( (1, timestamp '2019-01-01 00:00:00'),
                         (2, timestamp '2019-01-02 00:00:00') )


That can make use of a regular btree index on (id, updated)

Note that this uses = for both values and is equivalent to

where (id = 1 and updated = timestamp '2019-01-01 00:00:00')
   or (id = 2 and updated = timestamp '2019-01-02 00:00:00')


But you want to compare the timestamps using >. You can do that if you join against a values clause:

select r.*
from records r
  join (
    values 
       (1, timestamp '2019-01-01 00:00:00'),
       (2, timestamp '2019-01-02 00:00:00') 
  ) as t(id,upd) on  r.id = t.id
where r.updated > t.upd;


Online example: https://rextester.com/GGC83046

Code Snippets

select *
from records
where (id, updated) in ( (1, timestamp '2019-01-01 00:00:00'),
                         (2, timestamp '2019-01-02 00:00:00') )
where (id = 1 and updated = timestamp '2019-01-01 00:00:00')
   or (id = 2 and updated = timestamp '2019-01-02 00:00:00')
select r.*
from records r
  join (
    values 
       (1, timestamp '2019-01-01 00:00:00'),
       (2, timestamp '2019-01-02 00:00:00') 
  ) as t(id,upd) on  r.id = t.id
where r.updated > t.upd;

Context

StackExchange Database Administrators Q#245394, answer score: 12

Revisions (0)

No revisions yet.