HiveBrain v1.2.0

Optimizing an Intersect query between two huge spatial tables

Submitted by: @import:stackexchange-dba

Problem

I am having a hard time trying to speed up an intersection between two spatial tables, and I would welcome any tips about the table designs, the queries, or the DBA configuration.

Tables:

Table teste.recorte_grade currently has 1,655,569 rows, but this is a subsample, made for this test, of a 9-million-row table.

CREATE TABLE teste.recorte_grade
(
  id integer NOT NULL DEFAULT nextval('teste."Recorte_grade_id_seq"'::regclass),
  id_gre character varying(21),
  indice_gre character varying(16),
  the_geom geometry(Polygon),
  CONSTRAINT "Recorte_grade_pkey" PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX sidx_recorte_grade_geom
  ON teste.recorte_grade
  USING gist
  (the_geom);


Table teste2.uso_2012 has 177,888 rows, and this is all the data it will ever have.

CREATE TABLE teste2.uso_2012
(
  id integer NOT NULL,
  gridcode smallint NOT NULL,
  geom geometry(MultiPolygon) NOT NULL,
  CONSTRAINT pk_id_uso_2012 PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);    
CREATE INDEX idx_hash_calsse_uso_2012_teste2
  ON teste2.uso_2012
  USING hash
  (gridcode);    
CREATE INDEX sidx_uso_2012_geom_teste2
  ON teste2.uso_2012
  USING gist
  (geom);


Problem:

All I want is the area and the gridcode of each intersection between both tables, basically, the result of this query:

SELECT grade.id, uso.gridcode,
       ST_Area(ST_Intersection(grade.the_geom, uso.geom)) AS area
FROM teste2.uso_2012 AS uso
    INNER JOIN teste.recorte_grade AS grade
        ON ST_Intersects(grade.the_geom, uso.geom)
ORDER BY grade.id;


However, this query had run for about 16 hours without producing a result when I decided to cancel it. If it takes this long on the subsample, imagine the full data set.

Both tables were VACUUM ANALYZEd beforehand.

EXPLAIN for slow query: http://explain.depesz.com/s/PEV

I thought it might be a good idea to split this into multiple queries, one per gridcode; that is why I created the hash index.
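The per-gridcode idea above could look like the following sketch. This is my illustration, not a query from the original post; the literal gridcode value 1 is an arbitrary example, and a real run would repeat the query once per distinct gridcode.

```sql
-- Sketch: run the heavy intersection one gridcode at a time, so the
-- hash index on gridcode narrows teste2.uso_2012 before the spatial join.
-- The literal gridcode (1) is an arbitrary example value.
SELECT grade.id, uso.gridcode,
       ST_Area(ST_Intersection(grade.the_geom, uso.geom)) AS area
FROM teste2.uso_2012 AS uso
    INNER JOIN teste.recorte_grade AS grade
        ON ST_Intersects(grade.the_geom, uso.geom)
WHERE uso.gridcode = 1
ORDER BY grade.id;
```

Each such query touches a smaller slice of uso_2012, and several slices can be run in parallel from separate sessions.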

This is the data distribution in the `teste2.u

Solution

Update:

Since this was published, the problem has evolved: we now deal with 140M+ rows. However, PostGIS has also evolved, and it is now possible to "fix" the feature table directly. There is no more need for Dice from ArcMap.

I did it using an ST_VoronoiPolygons approach: I created a working gist with a function that breaks the features of the original table into sizes feasible for processing.
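A minimal sketch of that splitting idea, as I understand it, might look like the function below. This is my reconstruction, not the author's actual gist: the function name split_feature is invented, and only the general technique (clipping a polygon against the Voronoi cells built from its own vertices) comes from the text above.

```sql
-- Hypothetical sketch of the ST_VoronoiPolygons splitting approach
-- (not the author's actual gist; the function name is invented).
-- ST_VoronoiPolygons builds a cell per vertex of the input geometry;
-- clipping the polygon against each cell yields smaller pieces that
-- are much cheaper to intersect later.
CREATE OR REPLACE FUNCTION split_feature(g geometry)
RETURNS SETOF geometry AS $$
  SELECT ST_Intersection(g, d.geom)
  FROM ST_Dump(ST_VoronoiPolygons(g)) AS d
  WHERE ST_Intersects(g, d.geom);
$$ LANGUAGE sql IMMUTABLE;
```

In recent PostGIS versions, ST_Subdivide(geom, max_vertices) achieves a similar result directly and may be the simpler way to "fix" the feature table today.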

Original Answer:

For reference, I was able to do this using other tools and by splitting the workload:

  • For everything to work, I had to limit each feature to 800 vertices max and 15 km² max. I tried some tools and even a recursive procedure in PL/pgSQL, but without success. The only thing I tried that split everything correctly was the Dice feature of ArcMap;

  • I divided my 170k+ rows into chunks of 20 rows and ran six instances of the query in parallel, computing the area with a .NET console app;

  • With the areas stored, I was able to do some further processing, also in chunks and in parallel, to compute the value of each of the 9M+ cells.

This process now takes "only" 3 hours to finish.

The ST_Intersects call itself is light. The problem was this calculation:

st_area(st_intersection(grade.the_geom, uso.geom))

Building the intersection of complex features and calculating its area was the demanding task.
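One common way to cut that cost, which I am adding here as a suggestion rather than something from the original answer, is to skip building the intersection geometry whenever a grid cell lies entirely inside a land-use polygon; in that case the cell's own area is already the answer.

```sql
-- Sketch (my suggestion, not part of the original answer): avoid
-- ST_Intersection when a grid cell is fully contained in the land-use
-- polygon, since the cell's own area is then the intersection area.
SELECT grade.id, uso.gridcode,
       CASE
         WHEN ST_Within(grade.the_geom, uso.geom)
           THEN ST_Area(grade.the_geom)
         ELSE ST_Area(ST_Intersection(grade.the_geom, uso.geom))
       END AS area
FROM teste2.uso_2012 AS uso
    INNER JOIN teste.recorte_grade AS grade
        ON ST_Intersects(grade.the_geom, uso.geom)
ORDER BY grade.id;
```

With a fine grid over large land-use polygons, many cells are fully contained, so most rows take the cheap ST_Within branch.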

Code Snippets

st_area(st_intersection(grade.the_geom, uso.geom))

Context

StackExchange Database Administrators Q#133312, answer score: 2

Revisions (0)

No revisions yet.