How do you prevent dead rows from hanging around in PostgreSQL?
Problem
I have production and staging RDS instances on amazon, and staging's data is a direct copy of production so both instances have duplicate data.
Doing a

```
EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_col = true;
```

resulted in this:

```
Seq Scan on my_table  (cost=0.00..142775.73 rows=1 width=1436) (actual time=18170.294..18170.294 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360275
```

Whereas in production, it was:

```
Seq Scan on my_table  (cost=0.00..62145.88 rows=1 width=1450) (actual time=282.487..282.487 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 366442
```

When running

```
SELECT pg_total_relation_size('my_table'::regclass);
```

I found that staging's size was almost double production's. From what I've read, I see that PostgreSQL's MVCC is responsible for this, as it keeps multiple versions of rows around. I manually ran a `VACUUM FULL`, and afterwards saw that staging's size had been cut down by 2/3. Running that same EXPLAIN ANALYZE now shows:

```
Seq Scan on my_table  (cost=0.00..56094.75 rows=1 width=1436) (actual time=1987.340..1987.340 rows=0 loops=1)
  Filter: my_col
  Rows Removed by Filter: 360287
Total runtime: 1987.547 ms
```

Which is great, but what I don't understand is: the documentation suggests that autovacuum should kick in and clean up these dead rows, yet clearly that was not happening.
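(For anyone diagnosing the same thing: you can check whether autovacuum has actually visited a table, and how many dead tuples it is leaving behind, via the standard statistics views. This query is a general diagnostic, not part of the original setup; `my_table` is the table name used above.)

```
-- Dead vs. live tuples and the last manual/automatic vacuum times
-- for one table, from the pg_stat_user_tables statistics view.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'my_table';
```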
I've read several places talk about "don't let your indexes get bloated", and I don't quite understand 1) how an index gets bloated, and 2) how to prevent an index from getting bloated.
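(Context on that point: an index bloats the same way a table does. UPDATEs and DELETEs leave dead entries behind, and a plain VACUUM marks that space reusable but does not shrink the index file on disk. A rough way to watch index size, and to rebuild a badly bloated index, is sketched below; the names are just the examples from this question, and note that plain REINDEX blocks writes to the table while it runs.)

```
-- On-disk size of each index on the table (names are illustrative).
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'my_table';

-- Rebuild the table's indexes from scratch if they have become bloated.
-- Note: this locks out writes to the table for the duration.
REINDEX TABLE my_table;
```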
How can I prevent this from happening again in the future?
UPDATE
Here are my autovacuum settings:
```
name | setting | unit | category | short_desc | extra_desc | context | vartype | source | min_val | max_val | enumvals | boot_val | reset_val | sourcefile | sourceline
-------------------------------------+-----------+------+------------+-
```
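(The `pg_settings` view is very wide, which is why the output above is cut off. A narrower query, an alternative to whatever produced that table, shows just the autovacuum-related names and current values:)

```
-- Show only the autovacuum-related settings and their current values.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name LIKE 'autovacuum%';
```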
Solution
Auto-vacuuming should eventually get around to cleaning it up (assuming you haven't disabled it), but it may not be getting around to it soon enough for your purposes. There are many settings which control auto-vacuuming and how/when it's done, which may be of interest; see the PostgreSQL documentation's "Routine Vacuuming" section and its list of autovacuum runtime settings.
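As a sketch of what such tuning can look like: the thresholds can be lowered globally in `postgresql.conf`, or per table with storage parameters. The values below are illustrative, not recommendations.

```
-- Have autovacuum consider this one table after ~1% of it has changed,
-- rather than the default 20% (autovacuum_vacuum_scale_factor = 0.2).
ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 100
);
```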
This can be especially true of tables with high churn, that is, tables with lots of insertions and deletions. Long-running and idle transactions can also be a factor here, as MVCC prevents dead tuples from being reclaimed while such transactions are open. The fact that manually running a vacuum frees the dead tuples suggests that this isn't the case for you, though, and that it may be the former issue instead.

In general, it's not recommended to do a `VACUUM FULL`, as it takes out an exclusive lock on the table for the duration; its main use is when most rows in a table have been updated or deleted.

From the doc:
The FULL option is not recommended for routine use, but might be
useful in special cases. An example is when you have deleted or
updated most of the rows in a table and would like the table to
physically shrink to occupy less disk space and allow faster table
scans. VACUUM FULL will usually shrink the table more than a plain
VACUUM would.
Is your usage pattern such that this would be the case? You did mention a "direct copy" was involved, but it's not clear exactly how that's being done.
I have had cases with high-churn tables where the default autovacuum rate just wasn't enough, and even relatively small numbers of dead tuples would greatly affect query speed. (This was a large table that was queried very often, where the queries needed to be extremely fast and were therefore highly sensitive to dead tuples.)
To help with this, I set up a manual `VACUUM ANALYZE` of the table (so it both frees up the dead tuples and aids the query planner by updating the stats) in a cron job that runs every 5 minutes. Since there weren't that many dead tuples, the VACUUM was pretty fast, and the constant vacuuming keeps the dead-tuple count low enough to keep queries of that table fast.

Edit in response to comment from OP:
In the VACUUM doc, it says that:
VACUUM reclaims storage occupied by dead tuples
The doc then says that (emphasis mine):
VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected
table. This is a handy combination form for routine maintenance
scripts. See ANALYZE for more details about its processing.
So it definitely reclaims dead tuples.
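The cron job described above might look like the following crontab entry (hypothetical database name and log path; assumes `psql` can authenticate non-interactively, e.g. via `~/.pgpass`):

```
# Run VACUUM ANALYZE on the hot table every 5 minutes.
*/5 * * * * psql -d mydb -c 'VACUUM ANALYZE my_table;' >> /var/log/vacuum_my_table.log 2>&1
```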
Context
StackExchange Database Administrators Q#106928, answer score: 5