HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to simplify a nested SELECT with PostgreSQL arrays?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlarrayswithnestedsimplifyhowselect

Problem

I'm trying to speed up and simplify a SQL query against an imported OpenStreetMap database (OSM). The database is stored in a PostgreSQL 9.2.4 server.

This OSM import features two particular tables, planet_osm_rels and planet_osm_ways. In the first table exists an relation for country boundaries, which I can extract by querying an hstore column, tags_hstore. The resulting members attribute then contains an text array giving me a bunch of information, including which ways are part of this relation. The ID of the particular ways are prefixed with w to indicate that it is a way ID, e.g. w23412. To get the actual nodes of the ways, I need to query the planet_osm_ways table with the IDs I obtained, minus the w, of course.

To summarize, I've got the following table structure:

Table "public.planet_osm_rels"
   Column    |   Type   | Modifiers 
-------------+----------+-----------
 id          | bigint   | not null
 way_off     | smallint | 
 rel_off     | smallint | 
 parts       | bigint[] | 
 members     | text[]   | 
 tags        | text[]   | 
 pending     | boolean  | not null
 tags_hstore | hstore   | 
Indexes:
    "planet_osm_rels_pkey" PRIMARY KEY, btree (id)
    "planet_osm_rels_idx" btree (id) WHERE pending
    "planet_osm_rels_parts" gin (parts) WITH (fastupdate=off)
    "planet_osm_rels_tags_hstore_idx" gin (tags_hstore)

   Table "public.planet_osm_ways"
   Column    |   Type   | Modifiers 
-------------+----------+-----------
 id          | bigint   | not null
 nodes       | bigint[] | not null
 tags        | text[]   | 
 pending     | boolean  | not null
 tags_hstore | hstore   | 
Indexes:
    "planet_osm_ways_pkey" PRIMARY KEY, btree (id)
    "planet_osm_ways_idx" btree (id) WHERE pending
    "planet_osm_ways_nodes" gin (nodes) WITH (fastupdate=off)


I've come up with the following query:

```
SELECT nodes
FROM planet_osm_ways
WHERE id IN (
SELECT trim(leading 'w' from unnest)::int
FROM (
SELECT unnest

Solution

IN queries with huge sets are notoriously slow. It's often faster to use a JOIN instead:

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT ltrim(member, 'w')::bigint AS id
   FROM  (
      SELECT unnest(members) AS member
      FROM   planet_osm_rels
      WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
      ) u
   WHERE member LIKE 'w%'
   ) x USING (id);


But that's not the most important problem here. I wonder why the GIN index planet_osm_rels_tags_hstore_idx is not being used. Are you selecting large enough parts of the table planet_osm_rels to justify a sequential scan?

Oh, and id is type bigint. So cast to bigint instead of int for less friction.

If you can extract "way IDs" and save them redundantly in a separate column way_ids bigint[] in your table, your query would become quite a bit simpler and faster, with one less subquery level:

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT unnest(way_ids) AS id
   FROM   planet_osm_rels
   WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
   ) u USING (id);

Code Snippets

SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT ltrim(member, 'w')::bigint AS id
   FROM  (
      SELECT unnest(members) AS member
      FROM   planet_osm_rels
      WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
      ) u
   WHERE member LIKE 'w%'
   ) x USING (id);
SELECT nodes
FROM   planet_osm_ways
JOIN   (
   SELECT unnest(way_ids) AS id
   FROM   planet_osm_rels
   WHERE  (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", ...')
   ) u USING (id);

Context

StackExchange Database Administrators Q#42390, answer score: 6

Revisions (0)

No revisions yet.