HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Index partitioned table to prevent Postgres from doing a sequential scan

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
preventscanpostgresdoingsequentialindexfrompartitionedtable

Problem

I have a Postgres database with a partitioned table that will contain ~2,000,000,000 entries.

I have defined a partitioned database based on the first letter of the "identifier" - this is split into 37 sub-tables, [0-9, a-z, default (catchall for everything else)].

The database is very simple and straight forward, and is defined below.

create table entries (
id bigserial,
identifier text null,
password text null,
additional_fields jsonb
)
partition by list (lower(left(identifier, 1)));
ALTER DATABASE credentials SET constraint_exclusion=on;

CREATE TABLE entries_0 PARTITION OF entries for values in ('0');
CREATE TABLE entries_1 PARTITION OF entries for values in ('1');
CREATE TABLE entries_2 PARTITION OF entries for values in ('2');
CREATE TABLE entries_3 PARTITION OF entries for values in ('3');
...
CREATE TABLE entries_z PARTITION OF entries for values in ('z');

ALTER TABLE entries_0 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '0');
ALTER TABLE entries_1 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '1');
ALTER TABLE entries_2 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '2');
ALTER TABLE entries_3 ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = '3');
...
ALTER TABLE entries_z ADD CONSTRAINT first_letter CHECK (lower(left(identifier, 1)) = 'z');

CREATE INDEX ident_idx on entries(identifier);


However, when I run an EXPLAIN it says that is it is still doing a sequential scan.

EXPLAIN SELECT * FROM entries where identifier = 'some_identifier_from_subtable_s' LIMIT 1;


Output:

```
Limit (cost=0.00..140.92 rows=1 width=104)
-> Append (cost=0.00..43418531.72 rows=308113 width=104)
-> Seq Scan on entries_0 (cost=0.00..23239.38 rows=2 width=68)
Filter: (identifier = 'some_identifier_from_subtable_s'::text)
-> Seq Scan on entries_1 (cost=0.00..150187.81 rows=6 width=68)
Filter: (identifie

Solution

Try adding the partition key explicitly (redundantly!). Like:

SELECT *
FROM   entries 
WHERE  identifier = 'some_identifier_from_subtable_s'
AND    lower(left(identifier, 1)) = 's'  -- 1st letter of above identifier
LIMIT  1;


This should allow Postgres to understand it can prune all other partitions from the query.

Can be derived from $1 of course:

AND    lower(left(identifier, 1)) = lower(left($1, 1))


You'll still see a sequential scan on the one partition, unless you create an index like you had in mind:

CREATE INDEX ident_idx on entries(identifier);


That works in Postgres 11 or later because, quoting the manual:


When CREATE INDEX is invoked on a partitioned table, the default behavior is to recurse to all partitions to ensure they all have matching indexes.

In Postgres 10 or older you have to create indexes per partition.

You may have to run ANALYZE on the table after creating the index if you follow up with the query immediately, before autovacuum had time to kick in.

Code Snippets

SELECT *
FROM   entries 
WHERE  identifier = 'some_identifier_from_subtable_s'
AND    lower(left(identifier, 1)) = 's'  -- 1st letter of above identifier
LIMIT  1;
AND    lower(left(identifier, 1)) = lower(left($1, 1))
CREATE INDEX ident_idx on entries(identifier);

Context

StackExchange Database Administrators Q#233406, answer score: 4

Revisions (0)

No revisions yet.