patternsqlMinor
Compound Index on 43 Million PostgreSQL table
Viewed 0 times
postgresqlcompoundmillionindextable
Problem
This question is related to a prior one of I asked: Order of columns in a compound index in PostgreSQL (and query order)
Rather than overload that question, I figure I can sharpen and limit my question here. Given the following query (and EXPLAIN ANALYZE), is the compound index I am creating helping?
This first query was run with only simple indexes (a GIST on outline) and a (BTREE on pid).
The query is:
--
The result was the following (which is faster, but perhaps that is only because the database was warm).
```
"Unique (cost=581.76..581.76 rows=1 width=89) (actual time=110.436..110.655 rows=249 loops=1)"
" -> Sort (cost=581.76..581.76 rows=1 width=89) (actual time=110.434..110.477 rows=1377 loops=1)"
" Sort Key: path"
" Sort Method: quicksort Memory: 242kB"
" -> Bitmap Heap Scan on inventory inv (cost=577.48..581.75 rows=1 width=89) (actual time=39.257..105.878 rows=1377 loops=1)"
" Recheck Cond: ((pid = ANY ('{20010,20046}'::integer[])) AND ('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline))"
" Rows Removed by Index Recheck: 3731"
" Filter: (_st_distance('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography, outline, 0::double precision, false) BitmapAnd (cost=577.48..5
Rather than overload that question, I figure I can sharpen and limit my question here. Given the following query (and EXPLAIN ANALYZE), is the compound index I am creating helping?
This first query was run with only simple indexes (a GIST on outline) and a (BTREE on pid).
The query is:
EXPLAIN ANALYZE SELECT DISTINCT ON (path) oid, pid, product_name, type, path, size
FROM portal.inventory AS inv
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((21.51947021484375 51.55059814453125, 18.9129638671875 51.55059814453125, 18.9129638671875 48.8287353515625, 21.51947021484375 48.8287353515625, 21.51947021484375 51.55059814453125))'), inv.outline)
AND (inv.pid in (20010,20046))--
The result was the following (which is faster, but perhaps that is only because the database was warm).
```
"Unique (cost=581.76..581.76 rows=1 width=89) (actual time=110.436..110.655 rows=249 loops=1)"
" -> Sort (cost=581.76..581.76 rows=1 width=89) (actual time=110.434..110.477 rows=1377 loops=1)"
" Sort Key: path"
" Sort Method: quicksort Memory: 242kB"
" -> Bitmap Heap Scan on inventory inv (cost=577.48..581.75 rows=1 width=89) (actual time=39.257..105.878 rows=1377 loops=1)"
" Recheck Cond: ((pid = ANY ('{20010,20046}'::integer[])) AND ('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline))"
" Rows Removed by Index Recheck: 3731"
" Filter: (_st_distance('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography, outline, 0::double precision, false) BitmapAnd (cost=577.48..5
Solution
In a GiST index, the order of columns has a different significance than in a B-tree index. Per documentation:
A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.
In short: put the most selective columns first.
Your
If most of your (important) queries include conditions on both columns, a multicolumn index may serve you well. Else, single columns may be better overall.
Table layout
This is an educated guess since I don't have complete information.
-
Don't use
-
Don't use the name
-
Create a lookup table for types and only put a small integer
-
I prefer the type
For example:
A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.
In short: put the most selective columns first.
Your
EXPLAIN output shows that the condition on pid is more selective (rows=7836) than the one on outline (rows=63112). If that can be generalized (a single example may be misleading) I suggest this alternative:CREATE INDEX inventory_compound_idx ON portal.inventory USING gist (pid, outline);If most of your (important) queries include conditions on both columns, a multicolumn index may serve you well. Else, single columns may be better overall.
Table layout
This is an educated guess since I don't have complete information.
-
Don't use
oid as column name. It's easy to confuse with the OID.-
Don't use the name
date for a timestamp column. Or rather: don't use the name date for any column, don't use names of base-types for identifiers at all. Can lead to confusing mistakes and error messages.-
Create a lookup table for types and only put a small integer
type_id into the big table. Pack fixed-length types tightly so not to waste space to padding. Details.-
I prefer the type
text (or varchar without length limit) over varchar(n). Details.For example:
CREATE TABLE portal.inventory (
inventory_id bigint PRIMARY KEY
,type_id integer NOT NULL REFERENCES inv_type(type_id)
,pid integer NOT NULL
,size bigint NOT NULL
,ts timestamp NOT NULL
,outline geography(Polygon,4326)
,product_name text
,path text
);Code Snippets
CREATE INDEX inventory_compound_idx ON portal.inventory USING gist (pid, outline);CREATE TABLE portal.inventory (
inventory_id bigint PRIMARY KEY
,type_id integer NOT NULL REFERENCES inv_type(type_id)
,pid integer NOT NULL
,size bigint NOT NULL
,ts timestamp NOT NULL
,outline geography(Polygon,4326)
,product_name text
,path text
);Context
StackExchange Database Administrators Q#61084, answer score: 8
Revisions (0)
No revisions yet.