patternsqlMinor
PostgreSQL auto-vacuum: "skipped frozen" pages causing massive bloating
Viewed 0 times
postgresqlskippedvacuummassiveautobloatingfrozencausingpages
Problem
I am having a problem with a table I am using for a forum where people can add their products.
Everytime a user loads a page, the "online timestamp" of the user gets updated (a PHP line prevents it from updating more than 1 time per hour), which triggers the "online timestamp" of his respective products to be updated as well (trigger). Users can add and edit products anytime. Here's the table definition:
There are around 100,000 users, with around 400,000 products. The table gets updated very often (around 2 times per second) due to the "online timestamp" needing to be updated.
Here's the problem: Every week, the database gets so bloated that I have to interrupt the website, dump the database to a .sql file, delete the database, and re-import the dump. The fresh database has a size of around 2 GB, and the bloated database can reach 80 - 100 GB before I have to dump + re-import. The WAL directory (pg_xlog) never goes over 1.1 GB so I don't think WAL has any issues.
Here's a typical query:
```
> EXPLAIN ANALYZE SELECT COUNT(*) FROM products WHERE categoryid = 4;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on products (cost=6405.66..228100.41 rows=63642 width=1193) (actual time=6.100..26.484 rows=
Everytime a user loads a page, the "online timestamp" of the user gets updated (a PHP line prevents it from updating more than 1 time per hour), which triggers the "online timestamp" of his respective products to be updated as well (trigger). Users can add and edit products anytime. Here's the table definition:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
userid INT NOT NULL REFERENCES users(id),
categoryid INT NOT NULL REFERENCES categories (id),
regionid INT NOT NULL REFERENCES regions(id),
title VARCHAR(255) NOT NULL,
description TEXT NOT NULL,
price DECIMAL(7,2) NOT NULL,
create_timestamp BIGINT NOT NULL,
modify_timestamp BIGINT NOT NULL,
online_timestamp BIGINT NOT NULL
);
CREATE INDEX ON products (categoryid);
CREATE INDEX ON products (userid);
CREATE INDEX ON products (regionid);There are around 100,000 users, with around 400,000 products. The table gets updated very often (around 2 times per second) due to the "online timestamp" needing to be updated.
Here's the problem: Every week, the database gets so bloated that I have to interrupt the website, dump the database to a .sql file, delete the database, and re-import the dump. The fresh database has a size of around 2 GB, and the bloated database can reach 80 - 100 GB before I have to dump + re-import. The WAL directory (pg_xlog) never goes over 1.1 GB so I don't think WAL has any issues.
Here's a typical query:
```
> EXPLAIN ANALYZE SELECT COUNT(*) FROM products WHERE categoryid = 4;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on products (cost=6405.66..228100.41 rows=63642 width=1193) (actual time=6.100..26.484 rows=
Solution
Those "frozen" messages refer to pages that are known from the visibility map's freeze bits to contain only frozen tuples. A frozen tuple is one that all current and future transactions can "see", i.e. it committed before the start of any still-running xact. These shouldn't be removed, and are not the problem; they're the data that doesn't change.
Vacuum is doing useful work, see:
It isn't truncating the tables (removing pages), but it doesn't need to... and it shouldn't. Because your load pattern means you'll always need at least twice the real data space to keep track of dead rows from all your
However, the degree of bloat you encounter is surprising. It's clear that something isn't working right. Maybe you found an issue in the new freeze map code, but I'd be looking for other explanations first.
Do you have lots of long-running transactions? If your app fails to close transactions you'll see lots of messages about "non-removable" rows.
Can you supply an
If you enable autovacuum logging, does anything interesting show up?
Is it tables or indexes that get bloated? Just a couple of tables or all of them equally? Just the busy ones? Some indexes worse than others? Etc. Look into this.
Separately, your design is pretty bad TBH. You should really move that activity data to a pair of side-tables that joins on the user and product tables respectively. Update the activity row there, so you're not constantly rewriting your main tables full of data that doesn't change much. It'll help with disk I/O, cache hit rates, and more.
Vacuum is doing useful work, see:
tuples: 170090 removed ...It isn't truncating the tables (removing pages), but it doesn't need to... and it shouldn't. Because your load pattern means you'll always need at least twice the real data space to keep track of dead rows from all your
UPDATE churn.However, the degree of bloat you encounter is surprising. It's clear that something isn't working right. Maybe you found an issue in the new freeze map code, but I'd be looking for other explanations first.
Do you have lots of long-running transactions? If your app fails to close transactions you'll see lots of messages about "non-removable" rows.
Can you supply an
EXPLAIN (BUFFERS, ANALYZE, VERBOSE) from a problem query when the system is getting bloated? As well as a VACUUM (FULL, VERBOSE) on one or more problem tables? (This will lock the table while it's rewriting it).If you enable autovacuum logging, does anything interesting show up?
Is it tables or indexes that get bloated? Just a couple of tables or all of them equally? Just the busy ones? Some indexes worse than others? Etc. Look into this.
Separately, your design is pretty bad TBH. You should really move that activity data to a pair of side-tables that joins on the user and product tables respectively. Update the activity row there, so you're not constantly rewriting your main tables full of data that doesn't change much. It'll help with disk I/O, cache hit rates, and more.
Code Snippets
tuples: 170090 removed ...Context
StackExchange Database Administrators Q#165564, answer score: 6
Revisions (0)
No revisions yet.