HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

How do I get PostgreSQL FDW to push down the LIMIT to the (single) backend server?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlbackendthelimitgetpushdownsinglehowserver

Problem

I've set up a PostgreSQL FDW server with the following table, sharded by user_id over four servers:

CREATE TABLE my_big_table
(
    user_id bigint NOT NULL,
    serial bigint NOT NULL,         -- external, incrementing only
    some_object_id bigint NOT NULL,
    timestamp_ns bigint NOT NULL,
    object_type smallint NOT NULL,
    other_type smallint NOT NULL,
    data bytea
) PARTITION BY HASH (user_id) ;


CREATE SERVER shardA
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '192.168.200.11', port '5432', dbname 'postgres', fetch_size '10000');
 .
 .
 .
CREATE SERVER shardD
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host '192.168.200.14', port '5432', dbname 'postgres', fetch_size '10000');


create foreign table my_big_table_mod4_s0 partition of my_big_table
    FOR VALUES WITH (MODULUS 4, REMAINDER 0) server shardA
    OPTIONS (table_name 'my_big_table_mod4_s0');
 .
 .
 .
create foreign table my_big_table_mod4_s3 partition of my_big_table
    FOR VALUES WITH (MODULUS 4, REMAINDER 3) server shardD
    OPTIONS (table_name 'my_big_table_mod4_s3');


On the backend servers I have set up a table with several indexes, its data CLUSTERed by (user_id, serial) across several partitions. I don't think those details are very relevant for my actual question, though.

The common query against my cluster is in the pattern of:

SELECT * from my_big_table
WHERE
  user_id = 12345     -- only 1 user, always! --> single foreign server.
ORDER BY serial DESC  -- get 'newest' 90% of the time, 10% ASC
LIMIT 1000;           -- max limit 1000, sometimes less


For users with 100.000 records I see the issue leading to poor performance: explain shows LIMIT and sorting happens on FDW, not pushed down. Why?

```
Limit (cost=927393.08..927395.58 rows=1000 width=32)
Output: my_big_table_mod4_s0.serial, my_big_table_mod4_s0.some_object_id, my_big_table_mod4_s0.timestamp_ns, my_big_table_mod4_s0.object_type, my_big_table_mod4_s0.other_type,

Solution

This works as expected since version 12 with commit d50d172e51 indeed, but only for non-partitioned (non-sharded) tables.

Running the query directly against the foreign table name (my_big_table_mod4_s0), the LIMIT is pushed down correctly.

Reported as a bug ; I don't see a technical reason why this should not work with partitioning (partition pruning) involved.

Update: turns out this is not really a bug, given the complexity of the planner and partition pruning in combination with FDW, but more of a feature request. The original author of the aforementioned commit indicates work on this may be done for PostgreSQL 13. :-)

Lesson learned: FDW is not really an efficient query router for all type of queries (yet).

Work-around for now with partitioned (sharded) tables on FDW for me is to create a function in plpgsql to determine the foreign table name based on the declarative partitioning layout (mod(user_id, 4) in my case). (I believe it's getting out of scope to fully include that here.)

Context

StackExchange Database Administrators Q#242358, answer score: 2

Revisions (0)

No revisions yet.