HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Extracting nodes from a graph database

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
nodesgraphdatabaseextractingfrom

Problem

I'm new to Postgres and PostGIS, but not to geospatial applications.

I have a table loaded up with graph data in the form of links (edges). The link database has about 60,000,000 rows. I am trying to extract the nodes to allow for quicker searching. For some of my proof-of-concept work I was able to use the link table for the searches, but there will be lots of duplicates, and there's no guarantee that either the source column or the target column contains all nodes.

This is Postgres & PostGIS, so I am also using the table to cache a geometry->geography conversion. Yes I do need to use geography fields. I'm also copying the geometry information "just in case".

Table creation SQL:

-- Recreate table and index
DROP TABLE IF EXISTS nodes;

CREATE TABLE nodes (node integer PRIMARY KEY, geog geography(POINT,4326) );

CREATE INDEX geogIndex ON nodes USING GIST(geog);

SELECT AddGeometryColumn('nodes', 'geom', 4326, 'POINT', 2);

-- Insert all unique nodes from the source column
INSERT INTO nodes (node,geog,geom)
       SELECT DISTINCT ON (source) source,geography( ST_Transform(geom_source,4326)),geom_source 
       FROM view_topo;

-- Insert any nodes in the target column that we don't have already
INSERT INTO nodes (node,geog,geom)
       SELECT DISTINCT ON (target) target,geography( ST_Transform(geom_target,4326)),geom_target 
       FROM view_topo
       WHERE NOT EXISTS( SELECT 1 FROM nodes WHERE nodes.node = view_topo.target);

VACUUM ANALYZE;


I left the first INSERT running overnight and it took about 2-3hrs to run. This resulted in about 40,000,000 unique nodes being added.

I have just enabled the second INSERT and the VACUUM ANALYZE. I am expecting it to take until at least lunchtime.

Luckily this is only a batch job that has to be executed once after I've loaded a new link table, but is there a better way? Is there a faster way?

Solution

Check out PostgreSQL's tips on adding a lot of data into a table. In particular, are you sure that you need that index before INSERTing all that data? It might speed things up if you create the index after all the data has been added to the table.

Context

StackExchange Code Review Q#92, answer score: 5

Revisions (0)

No revisions yet.