HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Why wouldn't VACUUM ANALYZE clear all dead tuples?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyvacuumanalyzealldeadwouldncleartuples

Problem

We run a "manual" VACUUM ANALYZE VERBOSE on some of our larger tables after we do major DELETE/INSERT changes to them. This seems to work without issue although sometimes a table's VACUUM job will run for hours (see this post for similar issues and reasoning).

On doing more research I found that we have large tables with a large number of dead tuples even after running VACUUM. For example, here are some of the stats produced from the query in this response.

-[ RECORD 50 ]--+---------------------------
relname         | example_a
last_vacuum     | 2014-09-23 01:43
last_autovacuum | 2014-08-01 01:19
n_tup           |    199,169,568
dead_tup        |    111,048,906
av_threshold    |     39,833,964
expect_av       | *
-[ RECORD 51 ]--+---------------------------
relname         | example_b
last_vacuum     | 2014-09-23 01:48
last_autovacuum | 2014-08-30 12:40
n_tup           |    216,596,624
dead_tup        |    117,224,220
av_threshold    |     43,319,375
expect_av       | *
-[ RECORD 52 ]--+---------------------------
relname         | example_c
last_vacuum     | 2014-09-23 01:55
last_autovacuum | 2014-09-23 18:25
n_tup           |    309,831,136
dead_tup        |    125,047,233
av_threshold    |     61,966,277
expect_av       | *


The last field states that these (and most tables) would meet the threshold for autovacuum. However, having just run VACUUM ANALYZE VEBOSE on each of those tables, shouldn't the dead tuple count be 0 (or close to 0, not 125M of 300M)?

The documentation states:


VACUUM reclaims storage occupied by dead tuples.

Does this mean our VACUUM is not working?

UPDATE

Per request in repsonse here are some logs from the VERBOSE jobs:

```
INFO: vacuuming "public.example_1"
INFO: scanned index "idx_example_1_on_gp_id_and_dd_id" to remove 378386 row versions
DETAIL: CPU 1.83s/3.42u sec elapsed 23.01 sec.
INFO: scanned index "index_example_1_on_q_id" to remove 378386 row versions
DETAIL: CPU 2.10s/3.91u sec elapsed 18.9

Solution

VACUUM can only remove dead tuples which are long-dead, that is, dead to all possible uses. If you have long-lived transactions, they may prevent the recently-dead tuples from being removed.

This is an example of a situation where a long-lived transaction prevented removal:

INFO:  "pgbench_accounts": found 0 removable, 2999042 nonremovable row versions in 49181 out of 163935 pages
DETAIL:  2999000 dead row versions cannot be removed yet.


It is not really long-lived transactions, but long lived snapshots. Certainly a long running select or insert statement will do that. For isolation levels higher than read-committed, the whole transaction will retain the snapshot until it is down, so if some opens a repeatable read transaction and then goes on vacation without committing it, that would be a problem. Hung-up prepared transactions will as well (if you don't know what a prepared transaction is, then you probably aren't using them).

The examples you show don't indicate a problem, but you also say the problem had resolved by then. If this is a recurring problem, you should probably start logging the output of your VACUUM VERBOSE statements, so that you can find the information that covers the period during which the problem exists.

The multiple passes over the index are because of your maintenance_work_mem settings. It can only remove one tuple for every 6 bytes of memory on each pass over the index, and needs to make multiple passes if you need to remove more than that. So increasing maintenance_work_mem will help.

Code Snippets

INFO:  "pgbench_accounts": found 0 removable, 2999042 nonremovable row versions in 49181 out of 163935 pages
DETAIL:  2999000 dead row versions cannot be removed yet.

Context

StackExchange Database Administrators Q#77560, answer score: 14

Revisions (0)

No revisions yet.