patternsqlMinor
Adjust settings to reduce the number of read blocks for large query in postgres
Viewed 0 times
numbersettingsadjustthereadblockspostgresqueryreducelarge
Problem
Using postgres 9.6, i don't understand how shared buffers work with indexes.
Setup
That means that shared_buffer size is 128 1024 1024 / 8192 = 16384 blocks.
Test data
I create a simple table with random data and on index on each column.
I went with 4M rows. Here's the size taken by table & index:
So the table size (169 Mb) is greater than the shared buffer (128Mb)
Explain & analyze
I want to retrieve the rows with a condition on x. This will extract ~2% of my data.
One can see that:
Setup
- Postgres 9.6
- All default settings
- shared_buffers: 128Mb
- work_mem: 4Mb
- block_size: 8192
- ...
That means that shared_buffer size is 128 1024 1024 / 8192 = 16384 blocks.
Test data
I create a simple table with random data and on index on each column.
DROP TABLE IF EXISTS sandbox;
CREATE TABLE sandbox AS
SELECT generate_series(1, 4000000) AS pk,
random() AS x;
CREATE INDEX ON sandbox(pk);
CREATE INDEX ON sandbox(x);I went with 4M rows. Here's the size taken by table & index:
postgres=# SELECT relname AS "relation",
pg_relation_size(C.oid) / 8192 AS "blocks",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE relname = 'sandbox' ;
relation | blocks | size
----------+--------+--------
sandbox | 21622 | 169 MBSo the table size (169 Mb) is greater than the shared buffer (128Mb)
Explain & analyze
I want to retrieve the rows with a condition on x. This will extract ~2% of my data.
postgres=# explain (ANALYZE, buffers, format text)
SELECT *
FROM sandbox
WHERE x Bitmap Index Scan on sandbox_x_idx (cost=0.00..24628.43 rows=1333333 width=0) (actual time=20.458..20.458 rows=80218 loops=1)
Index Cond: (x < '0.02'::double precision)
Buffers: shared read=222
Planning time: 0.293 ms
Execution time: 573.949 msOne can see that:
- The
Bitmap Index Scan on sandbox_x_idxindicate
- rows=1333333 for the cost part. So 1/3 of the table size. What is the meaning of that figure?
- rows=80218 which represents 2% of the table size.
- Heap blocks is equal to 21113. This is approximately 164Mb so this it stores in the heap all the table right? Why does he have to go through all the table despite the index?
- Buffers is not used as there are only
read(from disk) and nohits(from memory). Even if i rerun the query, i get t
Solution
I agree with @dezso you should either start partitioning, or maybe have a look at the
CLUSTER instruction (https://www.postgresql.org/docs/9.4/static/sql-cluster.html). As described, it will physically reorder data based on the index information which seems to be what you're trying to achieve.Context
StackExchange Database Administrators Q#167333, answer score: 3
Revisions (0)
No revisions yet.