
Testing efficiency of a BRIN index in Postgres

Submitted by: @import:stackexchange-dba
Tags: brin, postgres, testing, index, efficiency

Problem

I have found many use cases for BRIN indexes in our OLAP queries, as they are much smaller than B-tree indexes while offering comparable speed, within the same order of magnitude.

Typically I use BRIN indexes for monotonically (or nearly monotonically) increasing timestamp columns. In our OLTP databases those rows end up close together on disk, since they are naturally written sequentially as time passes.

In our warehousing Postgres instance, some parts of the tables are loaded in big batches that are not necessarily ordered by time.

Is there an analytical query that would tell me whether the data is placed too randomly for BRIN indexes, even if the column's value distribution seemingly meets all the prerequisites for creating one?

Solution

BRIN indexes only make sense for big tables (thousands of data pages, or much more). (You already know that; this is just to guide the general public.) But how to tell whether rows are physically sorted enough?
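To put a number on "big", one quick way is to check the table's page count in pg_class; the table name tbl below is a placeholder:

-- relpages is an estimate, refreshed by VACUUM and ANALYZE
SELECT relname, relpages, pg_size_pretty(pg_relation_size(oid)) AS total_size
FROM   pg_class
WHERE  relname = 'tbl';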

The manual offers a hint:

BRIN indexes (a shorthand for Block Range INdexes) store summaries about the values stored in consecutive physical block ranges of a table. Thus, they are most effective for columns whose values are well-correlated with the physical order of the table rows.

Emphasis on "well-correlated" is mine.

ANALYZE gathers the corresponding statistic, stored in pg_statistic. Humans access it most conveniently via the column correlation in the system view pg_stats:

correlation float4

Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is null if the column data type does not have a < operator.)

You might even increase the statistics target for the column temporarily, run ANALYZE, and reset the value, to get a one-time correlation figure with increased validity.
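A minimal sketch of that one-time procedure; the table and column names are placeholders, and 1000 is an arbitrary higher target:

ALTER TABLE tbl ALTER COLUMN my_index_col SET STATISTICS 1000; -- raise from the default of 100
ANALYZE tbl;                                                   -- recompute pg_stats with a bigger sample
-- ... inspect pg_stats as shown below ...
ALTER TABLE tbl ALTER COLUMN my_index_col SET STATISTICS -1;   -- -1 reverts to the system default
ANALYZE tbl;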

So, once the table is analyzed (manually, or by autovacuum), inspect:

SELECT schemaname, tablename, attname, n_distinct, correlation
FROM   pg_stats
WHERE  schemaname = 'public'
AND    tablename  = 'tbl'
AND    attname    = 'my_index_col';


- A correlation close to 0 means a BRIN index will be useless.

- A correlation close to 1 (or -1) means a BRIN index will be excellent.

  • With a low count in n_distinct (relative to the total count) or a negative ratio close to 0, you might increase the setting for pages_per_range (default 128) accordingly; see the sketch after this list.

  • With a high count in n_distinct (relative to the total count) or a negative ratio close to -1, you might decrease the setting for pages_per_range accordingly.

- Everything in between is a grey area. Many factors are at play, and it also depends a lot on the average row size and typical queries. In my experience, values close to 0 or to 1 / -1 are common, which makes the decision easy.
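A minimal sketch of adjusting that storage parameter at index creation; the index name and the value 256 are made up for illustration:

-- a larger pages_per_range summarizes more pages per index entry,
-- shrinking the index at the cost of coarser filtering
CREATE INDEX tbl_my_index_col_brin ON tbl USING brin (my_index_col)
WITH (pages_per_range = 256);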

That said, if the table is not sorted enough, you might make it so with CLUSTER, or with one of the less blocking community tools pg_squeeze or pg_repack. Rewriting the whole table in sort order is expensive for big tables, and the ordering deteriorates over time if there are UPDATE (or DELETE + INSERT) operations. But it can pay off for certain use cases. See:

  • Configuring PostgreSQL for read performance
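A minimal sketch of the CLUSTER route, assuming no suitable B-tree index exists yet; the index name is made up:

CREATE INDEX tbl_my_index_col_btree ON tbl (my_index_col); -- CLUSTER needs an index to sort by
CLUSTER tbl USING tbl_my_index_col_btree;                  -- rewrites the table under an exclusive lock
ANALYZE tbl;                                               -- refresh correlation in pg_stats afterwards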


Context

StackExchange Database Administrators Q#214646, answer score: 5
