debugsqlMinor
Tiny table causes extreme performance degradation, fixed by forced VACUUM. Why?
Viewed 0 times
whyvacuumtabletinyforceddegradationperformancefixedextremecauses
Problem
I use PostgreSQL 9.6.
I have a query that joins 17 tables, 9 of those having several million rows. The query was running fine but its performance degraded rapidly this week. EXPLAIN's output didn't help (all scans are index scans except for the very small tables) and I had to try and remove tables from the query to isolate one that caused the degradation.
It turns out that an unremarkable table containing 40 rows broke the query: 800 ms without the table vs 30 s with it. I ran VACUUM FULL on the table, which ran in about a second and now the performance is back to normal.
My questions:
In the process of debugging, I made a base backup to another server, so I have two filesystem-level copies of the DB, one of them where I didn't run VACUUM FULL. When I logged on to the unvacuumed copy with pgAdmin, I got the following message:
The estimated rowcount on the table "public.clients" deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.
The unvacuumed table has 40 rows counted and 0 estimated. Here are the rest of its statistics in a screenshot.
I have a query that joins 17 tables, 9 of those having several million rows. The query was running fine but its performance degraded rapidly this week. EXPLAIN's output didn't help (all scans are index scans except for the very small tables) and I had to try and remove tables from the query to isolate one that caused the degradation.
It turns out that an unremarkable table containing 40 rows broke the query: 800 ms without the table vs 30 s with it. I ran VACUUM FULL on the table, which ran in about a second and now the performance is back to normal.
My questions:
- What can explain that a
- How to avoid the same problem in the future?
In the process of debugging, I made a base backup to another server, so I have two filesystem-level copies of the DB, one of them where I didn't run VACUUM FULL. When I logged on to the unvacuumed copy with pgAdmin, I got the following message:
The estimated rowcount on the table "public.clients" deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.
The unvacuumed table has 40 rows counted and 0 estimated. Here are the rest of its statistics in a screenshot.
Solution
The table may be small, but as long as Postgres expects roughly 0 rows, chances are it is going to chose a different query plan than for roughly 40 rows - for which the same query plan is not as efficient.
Since joins multiply result rows rather than just adding to them, the 40 rows in the tiny table can have a massive effect when joined to big tables with several million rows like in your example. The difference can easily explain a factor 30 in execution time.
Or as the manual puts it:
It is important to have reasonably accurate statistics, otherwise poor
choices of plans might degrade database performance.
Default
But for a database holding multiple tables with several millions rows, I would consider tuning per-table settings for selected tables and a manual
The remaining questions
Q1. Why did autovacuum not launch
Q2. Why did
Q2 is simple: While other important statistics are only updated by
Number of rows in the table. This is only an estimate used by the
planner. It is updated by
as
Q1 is more sophisticated.
The manual again:
The daemon schedules
rows inserted or updated; it has no knowledge of whether that will
lead to meaningful statistical changes.
Relevant settings (among others):
Specifies the minimum number of inserted, updated or deleted tuples
needed to trigger an
tuples. This parameter can only be set in the
on the server command line; but the setting can be overridden for
individual tables by changing table storage parameters.
Specifies a fraction of the table size to add to
only be set in the postgresql.conf file or on the server command line;
but the setting can be overridden for individual tables by changing
table storage parameters.
Bold emphasis mine.
Demo
Be sure the test DB is mostly idle to avoid testing artifacts and you are running with default settings:
Most importantly:
Basically, autovacuum checks once every minute whether any table has last_estimate / 100 + 50 rows changed and launches
To understand what happened in your case:
You'll get
Still
To see more details (including the timestamp of the
Related:
Solution
Run
You might also want to audit the settings for some of your big tables for other reasons. Compare:
Also important
You are joining 17 tables, which is well beyond the default setting for
-
Complex view becomes slow when adding ORDER BY in outer query with small LIMIT
-
Can we execute an optimal plan instead of a generic one in the first execution of a PL/pgSQL function?
P.S.:
I think I fou
Since joins multiply result rows rather than just adding to them, the 40 rows in the tiny table can have a massive effect when joined to big tables with several million rows like in your example. The difference can easily explain a factor 30 in execution time.
Or as the manual puts it:
It is important to have reasonably accurate statistics, otherwise poor
choices of plans might degrade database performance.
Default
autovacuum settings are ok for most installations. Consider:- Are regular VACUUM ANALYZE still recommended under 9.1?
But for a database holding multiple tables with several millions rows, I would consider tuning per-table settings for selected tables and a manual
ANALYZE on the whole DB from time to time.The remaining questions
Q1. Why did autovacuum not launch
ANALYZE automatically?Q2. Why did
VACUUM FULL fix the problem?Q2 is simple: While other important statistics are only updated by
ANALYZE, the basic count estimate in pg_class.reltuples is updated more often. The manual:Number of rows in the table. This is only an estimate used by the
planner. It is updated by
VACUUM, ANALYZE, and a few DDL commands suchas
CREATE INDEX.Q1 is more sophisticated.
The manual again:
The daemon schedules
ANALYZE strictly as a function of the number ofrows inserted or updated; it has no knowledge of whether that will
lead to meaningful statistical changes.
Relevant settings (among others):
autovacuum_analyze_threshold (integer)Specifies the minimum number of inserted, updated or deleted tuples
needed to trigger an
ANALYZE in any one table. The default is 50tuples. This parameter can only be set in the
postgresql.conf file oron the server command line; but the setting can be overridden for
individual tables by changing table storage parameters.
autovacuum_analyze_scale_factor (floating point)Specifies a fraction of the table size to add to
autovacuum_analyze_threshold when deciding whether to trigger anANALYZE. The default is 0.1 (10% of table size). This parameter canonly be set in the postgresql.conf file or on the server command line;
but the setting can be overridden for individual tables by changing
table storage parameters.
Bold emphasis mine.
Demo
Be sure the test DB is mostly idle to avoid testing artifacts and you are running with default settings:
SELECT * FROM pg_settings WHERE name ~ '^autovacuum|track_counts';Most importantly:
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_naptime = 60
track_counts = onBasically, autovacuum checks once every minute whether any table has last_estimate / 100 + 50 rows changed and launches
ANALYZE for those.To understand what happened in your case:
CREATE TABLE t50 (id int primary key, foo text);
INSERT INTO t50 SELECT g, 'txt' || g FROM generate_series(1,50) g;
SELECT reltuples FROM pg_class WHERE oid = 't50'::regclass;pg_class.reltuples is the estimated row count of the table. More here:- Fast way to discover the row count of a table in PostgreSQL
You'll get
0. Wait for 2 minutes to make sure we cross the 1 minute delay. Check again. Still 0. Now insert one more row and check again:INSERT INTO t50 VALUES (51, 'txt51 triggers analyze');
SELECT reltuples FROM pg_class WHERE oid = 't50'::regclass;Still
0. Wait another 2 minutes, check again. Tada! We see the updated count of 51. Autovacuum did not kick in until 51 rows were inserted (or updated or deleted).To see more details (including the timestamp of the
last_autoanalyze):SELECT * FROM pg_stat_all_tables WHERE relid = 't50'::regclass;Related:
- When is Postgres autovacuum executed
Solution
Run
ANALYZE on public.clients manually once (or on the whole DB, it's cheap) and use more aggressive per-table autovacuum settings for this important table. Like:ALTER TABLE public.clients SET (autovacuum_analyze_scale_factor = 0.01
, autovacuum_analyze_threshold = 10);You might also want to audit the settings for some of your big tables for other reasons. Compare:
- Debug query on big table that is sometimes slow
Also important
You are joining 17 tables, which is well beyond the default setting for
join_collapse_limit which is 8. You may want to use explicit join syntax (maybe you already do) and rewrite your query to place the most selective tables (or the ones with the most selective predicates) first in the SELECT list. Related:-
Complex view becomes slow when adding ORDER BY in outer query with small LIMIT
-
Can we execute an optimal plan instead of a generic one in the first execution of a PL/pgSQL function?
P.S.:
I think I fou
Code Snippets
SELECT * FROM pg_settings WHERE name ~ '^autovacuum|track_counts';autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_naptime = 60
track_counts = onCREATE TABLE t50 (id int primary key, foo text);
INSERT INTO t50 SELECT g, 'txt' || g FROM generate_series(1,50) g;
SELECT reltuples FROM pg_class WHERE oid = 't50'::regclass;INSERT INTO t50 VALUES (51, 'txt51 triggers analyze');
SELECT reltuples FROM pg_class WHERE oid = 't50'::regclass;SELECT * FROM pg_stat_all_tables WHERE relid = 't50'::regclass;Context
StackExchange Database Administrators Q#181960, answer score: 6
Revisions (0)
No revisions yet.