HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

How un-clustered is a CLUSTER USING table in PostgreSQL

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlclusteredusinghowtablecluster

Problem

I have some tables which benefit greatly from CLUSTER ON/CLUSTER USING in Postgres SQL:

# CLUSTER table USING index_name;
# ANALYZE VERBOSE table;
# CLUSTER VERBOSE;


A maintenance task periodically runs CLUSTER VERBOSE to keep things fresh. But is there a test I can run to see how fragmented the table is, prior to running CLUSTER VERBOSE? Maybe something like:

# CLUSTER ANALYZE
table 40000 records. 4000 observed clusters, 5000 potential clusters (20% fragmentation)


Note that I use CLUSTER so data accessed at the same time is "defragmented" into a small number of disk blocks. For example I have thousands of attributes that go with each page. a CLUSTER page_attribute USING page_id; puts all the attributes next to each other, greatly reducing disk load.

Solution

I am not entirely sure how much this information helps, but the system table pg_stats contains a correlation column:

select schemaname,tablename,attname,correlation 
from pg_stats 
where schemaname='public' 
order by correlation;


From the manual


Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is null if the column data type does not have a < operator.)

This value is on a per-column basis, so my guess is, that if the correlation for the first index column is near zero, the table is "un-clustered", if near -1 or 1 it is closer to the order of the index and thus "more clustered". This if course can only work for B-Tree indexes that are defined as ASC (I guess).

Code Snippets

select schemaname,tablename,attname,correlation 
from pg_stats 
where schemaname='public' 
order by correlation;

Context

StackExchange Database Administrators Q#42368, answer score: 6

Revisions (0)

No revisions yet.