patternsqlMinor
Optimizing an Intersect query between two huge spatial tables
Viewed 0 times
tablesqueryintersecthugetwobetweenoptimizingspatial
Problem
I am having a hard time trying to improve an intersect between two spatial tables and I would like to receive any tips about the table designs, queries or dba configs.
Tables:
Table
Table
Problem:
All I want is the area and the
However this query ran for about 16 hours without any result when I decided to cancel its execution. If it took this long with the sub sample, imagine with the full data set.
Both tables were vacuum analyzed before.
I thought it might be a good idea to separate this in multiple queries for one
This is the data distribution in the `teste2.u
Tables:
Table
teste.recorte_grade has 1,655,569 rows right now, but this a sub sample made for this test of a 9 million rows table.CREATE TABLE teste.recorte_grade
(
id integer NOT NULL DEFAULT nextval('teste."Recorte_grade_id_seq"'::regclass),
id_gre character varying(21),
indice_gre character varying(16),
the_geom geometry(Polygon),
CONSTRAINT "Recorte_grade_pkey" PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX sidx_recorte_grade_geom
ON teste.recorte_grade
USING gist
(the_geom);Table
teste2.uso_2012 has 177,888 rows and this is all data that it will ever have.CREATE TABLE teste2.uso_2012
(
id integer NOT NULL,
gridcode smallint NOT NULL,
geom geometry(MultiPolygon) NOT NULL,
CONSTRAINT pk_id_uso_2012 PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX idx_hash_calsse_uso_2012_teste2
ON teste2.uso_2012
USING hash
(gridcode);
CREATE INDEX sidx_uso_2012_geom_teste2
ON teste2.uso_2012
USING gist
(geom);Problem:
All I want is the area and the
gridcode of each intersection between both tables, basically, the result of this query:Select grade.id, uso.gridcode, , st_area(st_intersection(grade.the_geom, uso.geom))
from teste2.uso_2012 as uso
inner join teste.recorte_grade as grade on ST_Intersects(grade.the_geom, uso.geom) = 't'
order by grade.idHowever this query ran for about 16 hours without any result when I decided to cancel its execution. If it took this long with the sub sample, imagine with the full data set.
Both tables were vacuum analyzed before.
EXPLAIN for slow query: http://explain.depesz.com/s/PEVI thought it might be a good idea to separate this in multiple queries for one
gridcode each time. That's why I created the hash index.This is the data distribution in the `teste2.u
Solution
Update:
Since this was published, the problem evolve and now we deal with 140kk+ rows. However, Postgis also evolved and it is now possible to "fix" the the feature table. No more need of
I did it using a ST_VoronoiPolygons approach. I created a working gist with a function that breaks features on the original table in feasible sizes for processing.
Original Answer:
For information, I was able to do this using other tools and splitting the workload:
max and 15 km² max. I tried some tools and even a recurring procedure
on plsql, but without success. The only thing that I tried and split
everything correctly was the Dice feature of ArcMap;
instances of the query in parallel to compute the area using a .net console app;
chunks and parallel, to compute the value of each single 9kk+ cell.
This process now takes "only" 3 hours to finish.
The
Build the intersection of complex features and calculate the area was the demanding task.
Since this was published, the problem evolve and now we deal with 140kk+ rows. However, Postgis also evolved and it is now possible to "fix" the the feature table. No more need of
Dice from ArcMap.I did it using a ST_VoronoiPolygons approach. I created a working gist with a function that breaks features on the original table in feasible sizes for processing.
Original Answer:
For information, I was able to do this using other tools and splitting the workload:
- To everything work fine, I had to limit each feature in 800 vertices
max and 15 km² max. I tried some tools and even a recurring procedure
on plsql, but without success. The only thing that I tried and split
everything correctly was the Dice feature of ArcMap;
- I divided my 170k+ rows in chunks of 20 rows and ran six
instances of the query in parallel to compute the area using a .net console app;
- With the area stored, I was able to do some processing, also in
chunks and parallel, to compute the value of each single 9kk+ cell.
This process now takes "only" 3 hours to finish.
The
ST_Intersects is light. The problem was this calculation:st_area(st_intersection(grade.the_geom, uso.geom)).Build the intersection of complex features and calculate the area was the demanding task.
Code Snippets
st_area(st_intersection(grade.the_geom, uso.geom)).Context
StackExchange Database Administrators Q#133312, answer score: 2
Revisions (0)
No revisions yet.