HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Does SELECT remove dead rows like VACUUM does?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
rowsvacuumdeadlikeremovedoesselect

Problem

I was fiddling with VACUUM and noticed some unexpected behavior where SELECTing rows from a table seems to reduce the work VACUUM has to do afterwards.

Test Data

Note: autovacuum is disabled

CREATE TABLE numbers (num bigint);
ALTER TABLE numbers SET (
  autovacuum_enabled = 'f',
  toast.autovacuum_enabled = 'f'
);

INSERT INTO numbers SELECT generate_series(1, 5000);


Trial 1

Now we run an update on all the rows,

UPDATE numbers SET num = 0;


And when we run VACUUM (VERBOSE) numbers; we get,

INFO:  vacuuming "public.numbers"
INFO:  "numbers": removed 5000 row versions in 23 pages
INFO:  "numbers": found 5000 removable, 5000 nonremovable row versions in 45 out of 45 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 6585
There were 0 unused item pointers.


Trial 2

Now we issue another UPDATE, but this time we add a SELECT afterward,

UPDATE numbers SET num = 1;
SELECT * FROM numbers;


And when we run VACUUM (VERBOSE) numbers; we get,

INFO:  vacuuming "public.numbers"
INFO:  "numbers": removed 56 row versions in 22 pages
INFO:  "numbers": found 56 removable, 5000 nonremovable row versions in 45 out of 45 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 6586
There were 56 unused item pointers.


What exactly is happening here? Why does the second version I run, after the SELECT remove dead tuples from the pages it visits, quite like VACUUM does?

I am running Postgres 11.3 on macOS 10.14.5.

Solution

From this post on /r/PostgreSQL to an answer by Laurenz Albe it seems that Heap Only Tuples (HOT) updates may be responsible. From the description of HOT updates in src/backend/access/heap/README.HOT


Effectively, space reclamation happens during tuple retrieval when the
page is nearly full (<10% free) and a buffer cleanup lock can be
acquired. This means that UPDATE, DELETE, and SELECT can trigger space
reclamation, but often not during INSERT ... VALUES because it does
not retrieve a row.

The quote is not in the original answer, but the rest is a quote,


To support or refute this theory, run the following query:

SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE schemaname = 'public' AND relname = 'TABLE_NAME';




If n_tup_hot_upd is greater than zero, we have got a case.

Code Snippets

SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE schemaname = 'public' AND relname = 'TABLE_NAME';

Context

StackExchange Database Administrators Q#239890, answer score: 5

Revisions (0)

No revisions yet.