HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Adjust settings to reduce the number of read blocks for large query in postgres

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
numbersettingsadjustthereadblockspostgresqueryreducelarge

Problem

Using postgres 9.6, i don't understand how shared buffers work with indexes.

Setup

  • Postgres 9.6



  • All default settings



  • shared_buffers: 128Mb



  • work_mem: 4Mb



  • block_size: 8192



  • ...



That means that shared_buffer size is 128 1024 1024 / 8192 = 16384 blocks.

Test data

I create a simple table with random data and on index on each column.

DROP TABLE IF EXISTS sandbox;

CREATE TABLE sandbox AS
SELECT generate_series(1, 4000000) AS pk,
       random() AS x;

CREATE INDEX ON sandbox(pk);
CREATE INDEX ON sandbox(x);


I went with 4M rows. Here's the size taken by table & index:

postgres=# SELECT relname AS "relation",
       pg_relation_size(C.oid) / 8192 AS "blocks",
       pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE relname = 'sandbox' ;
 relation | blocks |  size  
----------+--------+--------
 sandbox  |  21622 | 169 MB


So the table size (169 Mb) is greater than the shared buffer (128Mb)

Explain & analyze

I want to retrieve the rows with a condition on x. This will extract ~2% of my data.

postgres=# explain (ANALYZE, buffers, format text)
SELECT *
FROM sandbox
WHERE x   Bitmap Index Scan on sandbox_x_idx  (cost=0.00..24628.43 rows=1333333 width=0) (actual time=20.458..20.458 rows=80218 loops=1)
         Index Cond: (x < '0.02'::double precision)
         Buffers: shared read=222
 Planning time: 0.293 ms
 Execution time: 573.949 ms


One can see that:

  • The Bitmap Index Scan on sandbox_x_idx indicate



  • rows=1333333 for the cost part. So 1/3 of the table size. What is the meaning of that figure?



  • rows=80218 which represents 2% of the table size.



  • Heap blocks is equal to 21113. This is approximately 164Mb so this it stores in the heap all the table right? Why does he have to go through all the table despite the index?



  • Buffers is not used as there are only read (from disk) and no hits (from memory). Even if i rerun the query, i get t

Solution

I agree with @dezso you should either start partitioning, or maybe have a look at the CLUSTER instruction (https://www.postgresql.org/docs/9.4/static/sql-cluster.html). As described, it will physically reorder data based on the index information which seems to be what you're trying to achieve.

Context

StackExchange Database Administrators Q#167333, answer score: 3

Revisions (0)

No revisions yet.