patternsqlMinor
How un-clustered is a CLUSTER USING table in PostgreSQL
Viewed 0 times
postgresqlclusteredusinghowtablecluster
Problem
I have some tables which benefit greatly from CLUSTER ON/CLUSTER USING in Postgres SQL:
A maintenance task periodically runs CLUSTER VERBOSE to keep things fresh. But is there a test I can run to see how fragmented the table is, prior to running CLUSTER VERBOSE? Maybe something like:
Note that I use CLUSTER so data accessed at the same time is "defragmented" into a small number of disk blocks. For example I have thousands of attributes that go with each page. a
# CLUSTER table USING index_name;
# ANALYZE VERBOSE table;
# CLUSTER VERBOSE;A maintenance task periodically runs CLUSTER VERBOSE to keep things fresh. But is there a test I can run to see how fragmented the table is, prior to running CLUSTER VERBOSE? Maybe something like:
# CLUSTER ANALYZE
table 40000 records. 4000 observed clusters, 5000 potential clusters (20% fragmentation)Note that I use CLUSTER so data accessed at the same time is "defragmented" into a small number of disk blocks. For example I have thousands of attributes that go with each page. a
CLUSTER page_attribute USING page_id; puts all the attributes next to each other, greatly reducing disk load.Solution
I am not entirely sure how much this information helps, but the system table
From the manual
Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is null if the column data type does not have a < operator.)
This value is on a per-column basis, so my guess is, that if the correlation for the first index column is near zero, the table is "un-clustered", if near -1 or 1 it is closer to the order of the index and thus "more clustered". This if course can only work for B-Tree indexes that are defined as ASC (I guess).
pg_stats contains a correlation column:select schemaname,tablename,attname,correlation
from pg_stats
where schemaname='public'
order by correlation;From the manual
Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is null if the column data type does not have a < operator.)
This value is on a per-column basis, so my guess is, that if the correlation for the first index column is near zero, the table is "un-clustered", if near -1 or 1 it is closer to the order of the index and thus "more clustered". This if course can only work for B-Tree indexes that are defined as ASC (I guess).
Code Snippets
select schemaname,tablename,attname,correlation
from pg_stats
where schemaname='public'
order by correlation;Context
StackExchange Database Administrators Q#42368, answer score: 6
Revisions (0)
No revisions yet.