HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

manual execution of specific remote query with postgresql_fdw

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withquerymanualspecificremoteexecutionpostgresql_fdw

Problem

In 9.4b2, postgresql_fdw doesn't know how to "push down" aggregate queries on remote tables, e.g.

> explain verbose select max(col1) from remote_tables.table1;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Aggregate  (cost=605587.30..605587.31 rows=1 width=4)
   Output: max(col1)
   ->  Foreign Scan on remote_tables.table1  (cost=100.00..565653.20 rows=15973640 width=4)
         Output: col1, col2, col3
         Remote SQL: SELECT col1 FROM public.table1


It would obviously be much more efficient to send SELECT max(col1) FROM public.table1 to the remote server and just pull the one row back.

Is there a way to perform this optimization manually? I would be satisfied with something as low-level as (hypothetically speaking)

EXECUTE 'SELECT max(col1) FROM public.table1' ON remote RETURNING (col1 INTEGER);


although of course a higher-level construct would be preferred.

I'm aware that I could do something like this with dblink, but that would involve rewriting a large body of code that already uses foreign tables, so I'd prefer not to.

EDIT: Here's the query plan for Erwin Brandstetter's suggestion:

=> explain verbose select col1 from remote_tables.table1 
-> order by col1 desc nulls last limit 1;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Limit  (cost=645521.40..645521.40 rows=1 width=4)
   Output: url
   ->  Sort  (cost=645521.40..685455.50 rows=15973640 width=4)
         Output: col1
         Sort Key: table1.col1
         ->  Foreign Scan on remote_tables.table1  (cost=100.00..565653.20 rows=15973640 width=4)
               Output: col1
               Remote SQL: SELECT col1 FROM public.table1


This is better, in that it fetches only `c

Solution

For the time being, it seems that the best available option is to create a view on the remote server that encapsulates the query needing to be "pushed down". postgres_fdw is happy to define and use foreign tables backed by views on the remote, and regular old query optimization within the view does the Right Thing. For instance, given

CREATE VIEW id_ranges AS
SELECT 'url_strings'::text AS tbl,
    min(url_strings.id)::bigint AS lo,
    max(url_strings.id)::bigint AS hi
   FROM url_strings
UNION
 SELECT 'captured_pages'::text AS tbl,
    min(captured_pages.url)::bigint AS lo,
    max(captured_pages.url)::bigint AS hi
   FROM captured_pages
UNION
 -- ... several more like that ...


on the remote, and a FOREIGN TABLE of the same name on the local server,

SELECT lo, hi FROM id_ranges WHERE tbl = 'url_strings';


the existing pushdown optimization will send the WHERE constraint to the remote, and the remote will scan only one table (making use of indexes if possible) and send back a single-row result.

Code Snippets

CREATE VIEW id_ranges AS
SELECT 'url_strings'::text AS tbl,
    min(url_strings.id)::bigint AS lo,
    max(url_strings.id)::bigint AS hi
   FROM url_strings
UNION
 SELECT 'captured_pages'::text AS tbl,
    min(captured_pages.url)::bigint AS lo,
    max(captured_pages.url)::bigint AS hi
   FROM captured_pages
UNION
 -- ... several more like that ...
SELECT lo, hi FROM id_ranges WHERE tbl = 'url_strings';

Context

StackExchange Database Administrators Q#77306, answer score: 2

Revisions (0)

No revisions yet.