principlesqlMinor
ANALYZE strategy for big tables in PostgreSQL
Viewed 0 times
postgresqltablesanalyzebigforstrategy
Problem
In our PostgreSQL 9.4.4 database we have a table that receives around 600k new records each day. Each day, nightly, we are performing some ETL exports from the table. If it has not been analyzed before the export, it is really slow. If we run
The second option requires us to specify the auto vacuuming/analyzing settings per table, since the defaults are not working properly for big tables. And even if we do set it properly, there are some edge cases, when it still can cause troubles. For example:
is the default now - so if our table is ~25 million records, it will ANALYZE after 2.5 millions, which is not frequent enough for us (we have ~600k new transactions every day). But if we set it to 0.02 (~500k records) it may happen, that for a day with many transactions (900k for example) we will have run ANALYZE after 500k but 400k will remain unanalysed, which would impact the query performance.
The table structure:
```
Table "public.bet_transactions"
Column | Type | Modifiers
----------------------------+-----------------------------+---------------------------------------------------------------
id | integer | not null default nextval('bet_transactions_id_seq'::regclass)
account_id | integer | not null
amount_cents | integer | not null default 0
money_amount_cents | integer | not null default 0
bonus_amount_cents | integer | not null default 0
wager_amount_cents | integer
ANALYZE, much faster, because the planner uses multicolumn index that has the field we query for. What would be the preferred way to solve the slow query issue? I see three options:ANALYZEright before the export,
- use automatic vacuum/analyze feature,
- add query specific index.
The second option requires us to specify the auto vacuuming/analyzing settings per table, since the defaults are not working properly for big tables. And even if we do set it properly, there are some edge cases, when it still can cause troubles. For example:
autovacuum_analyze_scale_factor = 0.1is the default now - so if our table is ~25 million records, it will ANALYZE after 2.5 millions, which is not frequent enough for us (we have ~600k new transactions every day). But if we set it to 0.02 (~500k records) it may happen, that for a day with many transactions (900k for example) we will have run ANALYZE after 500k but 400k will remain unanalysed, which would impact the query performance.
The table structure:
```
Table "public.bet_transactions"
Column | Type | Modifiers
----------------------------+-----------------------------+---------------------------------------------------------------
id | integer | not null default nextval('bet_transactions_id_seq'::regclass)
account_id | integer | not null
amount_cents | integer | not null default 0
money_amount_cents | integer | not null default 0
bonus_amount_cents | integer | not null default 0
wager_amount_cents | integer
Solution
As already stated in the above comments there are some details hidden. I understand from your question that the query plan changes after an
This may indicate that the statistical data used by the query planner are not reflecting the real distribution of the data.
Much more important seems to me to adjust the size of the sample taken by
ANALYZE.This may indicate that the statistical data used by the query planner are not reflecting the real distribution of the data.
ANALYZE in any case takes only a sample - it does not investigate the whole table. This means tweaking the autovacuum_analyze_threshold makes only sense to me if the new rows would change the distribution in the whole table dramatically. This depends on your use case.Much more important seems to me to adjust the size of the sample taken by
ANALYE. You can influence the sample size for your table by setting the statistics target (unfortunately it is not mentioned in the question). In this blog post it is shown, how the statistics target influences the validity of the sample taken by ANALYZE.Context
StackExchange Database Administrators Q#112722, answer score: 6
Revisions (0)
No revisions yet.