HiveBrain v1.2.0

How do you prevent dead rows from hanging around in PostgreSQL?

Submitted by: @import:stackexchange-dba

Problem

I have production and staging RDS instances on Amazon; staging's data is a direct copy of production, so both instances hold the same data.

Doing an EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_col = true; resulted in this:

```
Seq Scan on my_table  (cost=0.00..142775.73 rows=1 width=1436) (actual time=18170.294..18170.294 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360275
```


Whereas in production, it was:

```
Seq Scan on my_table  (cost=0.00..62145.88 rows=1 width=1450) (actual time=282.487..282.487 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 366442
```


When running SELECT pg_total_relation_size('my_table'::regclass);

I found that staging's size was almost double production's. From what I've read, PostgreSQL's MVCC is responsible for this, as it keeps multiple versions of rows around. I manually ran VACUUM FULL and afterwards saw that staging's size had been cut by about two thirds. Running that same EXPLAIN ANALYZE now shows:
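A quick way to quantify this kind of bloat is to look at the live/dead tuple counters that the statistics collector keeps per table. A sketch of such a check (the table name is from the question; run it on both instances and compare):

```
-- Compare live vs. dead tuples and see when (auto)vacuum last ran.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
WHERE relname = 'my_table';
```

A high n_dead_tup relative to n_live_tup, together with an old or NULL last_autovacuum, is the signature of the situation described above.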

```
Seq Scan on my_table  (cost=0.00..56094.75 rows=1 width=1436) (actual time=1987.340..1987.340 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360287
Total runtime: 1987.547 ms
```


Which is great, but what I don't understand is this: the documentation suggests that autovacuum should kick in and clean up these dead rows, yet clearly that was not happening.

I've read several places warn "don't let your indexes get bloated", and I don't quite understand 1) how an index gets bloated, and 2) how to prevent an index from bloating.

How can I prevent this from happening again in the future?

UPDATE

Here are my autovacuum settings:

```
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline
-------------------------------------+-----------+------+------------+-
```

Solution

Auto-vacuuming should eventually get around to cleaning it up (assuming you haven't disabled it), but it may not be doing so soon enough for your purposes. There are many settings which control auto-vacuuming and how/when it's done; see the PostgreSQL documentation on routine vacuuming and on the autovacuum runtime configuration parameters.
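For a single problem table, those thresholds can also be lowered per table rather than globally. A sketch, assuming the my_table from the question (the percentages are illustrative, not a recommendation for any particular workload):

```
-- Make autovacuum trigger sooner on this table:
-- vacuum when ~5% of rows are dead (default 20%),
-- analyze when ~2% of rows have changed (default 10%).
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);

-- Verify the per-table overrides:
SELECT reloptions FROM pg_class WHERE relname = 'my_table';
```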

This can be especially true of tables with high churn. That is, tables with lots of insertions and deletions. Long-running and idle transactions can also be a factor here, as MVCC will kick in and prevent the dead tuples from being reclaimed. The fact that manually doing a VACUUM frees the dead tuples suggests that this isn't the case for you, though, and it may be the former issue instead.
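To check whether long-running or idle-in-transaction sessions are the culprit, something like the following helps (a sketch; exact columns vary slightly across PostgreSQL versions):

```
-- Oldest open transactions first. An old transaction pins the xmin
-- horizon, so dead tuples newer than it cannot be reclaimed by vacuum.
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```

Sessions stuck in the "idle in transaction" state with a large xact_age are the ones worth chasing down.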

In general, it's not recommended to do a VACUUM FULL as routine maintenance, since it takes an exclusive lock on the table while it rewrites it; the main exception is when most rows in a table have been updated/deleted.

From the doc:


The FULL option is not recommended for routine use, but might be
useful in special cases. An example is when you have deleted or
updated most of the rows in a table and would like the table to
physically shrink to occupy less disk space and allow faster table
scans. VACUUM FULL will usually shrink the table more than a plain
VACUUM would.

Is your usage pattern such that this would be the case? You did mention a "direct copy" was involved, but it's not clear exactly how that's being done.

I have had cases with high-churn tables where the default auto-vacuum rate just wasn't enough, and even relatively small numbers of dead tuples would greatly affect query speed (this was a large table which was queried very often, where the query needed to be extremely fast and so was highly sensitive to dead tuples).

To help with this, I set up a manual VACUUM ANALYZE of the table (so it both frees up the tuples and aids the query planner by updating the stats) in a cron job set to run every 5 minutes. Since there weren't that many dead tuples, the VACUUM was pretty fast, and the constant vacuuming kept the dead-tuple count low enough to keep queries of that table fast.
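Such a job can be a one-line crontab entry. A sketch, where the database name mydb is a placeholder and credentials are assumed to come from the environment or .pgpass:

```
# Run VACUUM ANALYZE on my_table every 5 minutes.
*/5 * * * * psql -d mydb -c 'VACUUM ANALYZE my_table;'
```

A plain VACUUM like this only marks dead space as reusable and takes no exclusive lock, so it is safe to run against a live table, unlike VACUUM FULL.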

Edit in response to comment from OP:

In the VACUUM doc, it says that:


VACUUM reclaims storage occupied by dead tuples

The doc then says that (emphasis mine):


VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected
table. This is a handy combination form for routine maintenance
scripts. See ANALYZE for more details about its processing.

So it definitely reclaims dead tuples.

Context

StackExchange Database Administrators Q#106928, answer score: 5
