Optimizing select count result in Postgresql
Tags: postgresql, result, select, optimizing, count
Problem
I'm trying to optimize a table containing 80 million+ rows.
It takes 20+ minutes to get the row count results.
I've tried clustering, VACUUM FULL, and REINDEX, but the performance didn't improve.
What do I need to configure or adjust in order to improve data query and retrieval?
I'm using PostgreSQL 12 on Windows Server 2019.
Update info:
Update info:
- Total rows now around 92 million+
- Table column count = 44
- Explain query result using 'select count(*) from doc_details':
Finalize Aggregate (cost=5554120.84..5554120.85 rows=1 width=8) (actual time=1249204.001..1249210.027 rows=1 loops=1)
-> Gather (cost=5554120.63..5554120.83 rows=2 width=8) (actual time=1249203.642..1249210.020 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=5553120.63..5553120.63 rows=1 width=8) (actual time=1249153.615..1249153.616 rows=1 loops=3)
-> Parallel Seq Scan on doc_details (cost=0.00..5456055.30 rows=38826130 width=0) (actual time=3.793..1245165.604 rows=31018949 loops=3)
Planning Time: 1.290 ms
Execution Time: 1249210.115 ms

(I don't know how to get the row size in kb/mb.)
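One way to check this is with PostgreSQL's built-in size functions; a minimal sketch against the doc_details table (the average row size here divides the heap size by the planner's row estimate, so it is approximate):

```
-- Total size (heap + indexes + TOAST), heap size, and a rough average row size.
-- reltuples is the planner's estimated row count, so the average is approximate.
SELECT pg_size_pretty(pg_total_relation_size('public.doc_details')) AS total_size,
       pg_size_pretty(pg_relation_size('public.doc_details'))       AS heap_size,
       pg_size_pretty(pg_relation_size('public.doc_details')
                      / NULLIF(reltuples::bigint, 0))               AS avg_row_size
FROM pg_class
WHERE oid = 'public.doc_details'::regclass;
```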
Machine Info:
- Windows Server 2019 Datacenter
- 32 GB memory
- PostgreSQL 12
Table info :
```
Table "public.doc_details"
Column | Type | Collation | Nullable | Default
-------------------------+--------------------------------+-----------+----------+----------------------------------------------
id | integer | | not null | nextval('doc_details_id_seq'::regclass)
trans_ref_number | character varying(30) | | not null |
outbound_time | timestamp(0) without time zone | | |
lm_tracking | character varying(30) | | not null |
cargo_dealer_tracking | character varying(30) | | not null |
order_sn                | ...
```
Solution
From the PostgreSQL Wiki:
The reason is related to the MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table. PostgreSQL must walk through all rows to determine visibility. This normally results in a sequential scan reading information about every row in the table.
Reference: Slow Counting (PostgreSQL Wiki)
Because of that, there is no faster way (for PostgreSQL) to read the 94 million+ rows. PostgreSQL is going to painstakingly read them row by row, as can be seen in your explain plan.
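If an exact figure is not strictly required, the same wiki page suggests reading the planner's estimate from the catalog instead of scanning the table; a minimal sketch for the table in the question (the estimate is only as fresh as the last VACUUM/ANALYZE):

```
-- Approximate row count from the planner's statistics; instant, but not exact.
-- autovacuum / ANALYZE keeps reltuples reasonably up to date.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.doc_details'::regclass;
```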
Possible Solutions
Increasing the shared_buffers setting in the postgresql.conf file might help alleviate the performance issue slightly, by allowing PostgreSQL to store more data in memory. From the PostgreSQL documentation:
Sets the amount of memory the database server uses for shared memory buffers. The default is typically 128 megabytes (128MB), but might be less if your kernel settings will not support it (as determined during initdb). This setting must be at least 128 kilobytes. However, settings significantly higher than the minimum are usually needed for good performance. If this value is specified without units, it is taken as blocks, that is BLCKSZ bytes, typically 8kB. (Non-default values of BLCKSZ change the minimum value.) This parameter can only be set at server start.
If you have a dedicated database server with 1GB or more of RAM, a reasonable starting value for shared_buffers is 25% of the memory in your system. There are some workloads where even larger settings for shared_buffers are effective, but because PostgreSQL also relies on the operating system cache, it is unlikely that an allocation of more than 40% of RAM to shared_buffers will work better than a smaller amount. Larger settings for shared_buffers usually require a corresponding increase in max_wal_size, in order to spread out the process of writing large quantities of new or changed data over a longer period of time.
On systems with less than 1GB of RAM, a smaller percentage of RAM is appropriate, so as to leave adequate space for the operating system.
Reference: 20.4. Resource Consumption (PostgreSQL Documentation)
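For the 32 GB machine described above, the 25% guideline works out to roughly 8 GB of shared_buffers. A minimal sketch of applying it with ALTER SYSTEM (an alternative to editing postgresql.conf by hand; the values are assumed starting points to be tuned, and shared_buffers only takes effect after a server restart):

```
-- ~25% of the 32 GB of RAM, per the guideline quoted above.
-- Written to postgresql.auto.conf; requires a server restart to take effect.
ALTER SYSTEM SET shared_buffers = '8GB';

-- The documentation also recommends raising max_wal_size alongside a larger
-- shared_buffers; 4GB here is an assumed starting point.
ALTER SYSTEM SET max_wal_size = '4GB';
```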
Context
StackExchange Database Administrators Q#317626, answer score: 6