HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Compound Index on 43 Million PostgreSQL table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlcompoundmillionindextable

Problem

This question is related to a prior one of I asked: Order of columns in a compound index in PostgreSQL (and query order)

Rather than overload that question, I figure I can sharpen and limit my question here. Given the following query (and EXPLAIN ANALYZE), is the compound index I am creating helping?

This first query was run with only simple indexes (a GIST on outline) and a (BTREE on pid).

The query is:

EXPLAIN ANALYZE SELECT DISTINCT ON (path) oid, pid, product_name, type, path, size 
FROM portal.inventory AS inv 
WHERE ST_Intersects(st_geogfromtext('SRID=4326;POLYGON((21.51947021484375 51.55059814453125, 18.9129638671875 51.55059814453125, 18.9129638671875 48.8287353515625, 21.51947021484375 48.8287353515625, 21.51947021484375 51.55059814453125))'), inv.outline) 
AND (inv.pid in (20010,20046))


--

The result was the following (which is faster, but perhaps that is only because the database was warm).

```
"Unique (cost=581.76..581.76 rows=1 width=89) (actual time=110.436..110.655 rows=249 loops=1)"
" -> Sort (cost=581.76..581.76 rows=1 width=89) (actual time=110.434..110.477 rows=1377 loops=1)"
" Sort Key: path"
" Sort Method: quicksort Memory: 242kB"
" -> Bitmap Heap Scan on inventory inv (cost=577.48..581.75 rows=1 width=89) (actual time=39.257..105.878 rows=1377 loops=1)"
" Recheck Cond: ((pid = ANY ('{20010,20046}'::integer[])) AND ('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography && outline))"
" Rows Removed by Index Recheck: 3731"
" Filter: (_st_distance('0103000020E6100000010000000500000000000000FC843540000000007AC6494000000000B8E93240000000007AC6494000000000B8E9324000000000146A484000000000FC84354000000000146A484000000000FC843540000000007AC64940'::geography, outline, 0::double precision, false) BitmapAnd (cost=577.48..5

Solution

In a GiST index, the order of columns has a different significance than in a B-tree index. Per documentation:

A multicolumn GiST index can be used with query conditions that
involve any subset of the index's columns. Conditions on additional
columns restrict the entries returned by the index, but the condition
on the first column is the most important one for determining how much
of the index needs to be scanned. A GiST index will be relatively
ineffective if its first column has only a few distinct values, even
if there are many distinct values in additional columns.

In short: put the most selective columns first.

Your EXPLAIN output shows that the condition on pid is more selective (rows=7836) than the one on outline (rows=63112). If that can be generalized (a single example may be misleading) I suggest this alternative:

CREATE INDEX inventory_compound_idx ON portal.inventory USING gist (pid, outline);


If most of your (important) queries include conditions on both columns, a multicolumn index may serve you well. Else, single columns may be better overall.
Table layout

This is an educated guess since I don't have complete information.

-
Don't use oid as column name. It's easy to confuse with the OID.

-
Don't use the name date for a timestamp column. Or rather: don't use the name date for any column, don't use names of base-types for identifiers at all. Can lead to confusing mistakes and error messages.

-
Create a lookup table for types and only put a small integer type_id into the big table. Pack fixed-length types tightly so not to waste space to padding. Details.

-
I prefer the type text (or varchar without length limit) over varchar(n). Details.

For example:

CREATE TABLE portal.inventory (
   inventory_id bigint PRIMARY KEY
  ,type_id      integer NOT NULL REFERENCES inv_type(type_id)
  ,pid          integer NOT NULL
  ,size         bigint NOT NULL
  ,ts           timestamp NOT NULL
  ,outline      geography(Polygon,4326)
  ,product_name text
  ,path         text
);

Code Snippets

CREATE INDEX inventory_compound_idx ON portal.inventory USING gist (pid, outline);
CREATE TABLE portal.inventory (
   inventory_id bigint PRIMARY KEY
  ,type_id      integer NOT NULL REFERENCES inv_type(type_id)
  ,pid          integer NOT NULL
  ,size         bigint NOT NULL
  ,ts           timestamp NOT NULL
  ,outline      geography(Polygon,4326)
  ,product_name text
  ,path         text
);

Context

StackExchange Database Administrators Q#61084, answer score: 8

Revisions (0)

No revisions yet.