How can I request a flush of the postgresql transaction logs?

Problem

I have the following problem: a "vertical" Linux distribution (Sophos UMT) ships with PostgreSQL 9.2 to store its configuration. Unfortunately, since the last update, the transaction logs (WAL) of some instances seem to grow without ever being flushed. This causes the pg_xlog folder to grow several orders of magnitude larger than the base folder.

I'm now in a delicate situation: because of the excessive growth of the WAL files, the disk of one of these machines (a VM) will fill up before Monday. I have already opened a support case with the vendor, but so far they haven't been very helpful (they suggest we rebuild the VM with larger disks).

This database is never backed up, because the software performs backups in a different way (it has its own backup procedure and sends backup files by email), and I suppose this is the reason why the WAL files are growing so much.

I'm afraid I'm far from being a PostgreSQL expert, so this is very likely a silly or obvious question, but what is the procedure for requesting that the WAL files be flushed?

Ideally, I'm looking for a procedure that will allow me to flush these WAL files on the problematic system in order to buy myself enough time to get the vendor to issue a better fix.

Edit:
As requested, here is the output of the SELECT version(); query:

```
PostgreSQL 9.2.4 on i686-pc-linux-gnu, compiled by gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 32-bit
(1 row)
```

And here is the output of the following query:

```sql
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
```

```
hot_standby | on | configuration file
listen_addresses | * | configuration file
log_destination | syslog | configuration file
log_min_duration_statement | -1 | configuration file
log_min_error_statement | error | configuration file
log_
```

Solution

Most likely what you're seeing is the result of a huge checkpoint_segments value and a long checkpoint_timeout; alternatively, they might have set wal_keep_segments to a very large value if the instance is meant to support streaming replication.
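A minimal sketch for checking which of these settings is responsible, assuming you can run queries in psql the same way you ran the queries in the question. The setting names are the PostgreSQL 9.2 ones (checkpoint_segments no longer exists in 9.5 and later). The second query applies the rule of thumb from the 9.2 WAL configuration chapter, which says pg_xlog normally holds no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 or checkpoint_segments + wal_keep_segments + 1 segment files; the 16 MB segment size is an assumption (it is the compiled-in default).

```sql
-- Settings that govern how much WAL PostgreSQL 9.2 keeps in pg_xlog.
SELECT name, setting, unit, source
  FROM pg_settings
 WHERE name IN ('checkpoint_segments',
                'checkpoint_timeout',
                'checkpoint_completion_target',
                'wal_keep_segments')
 ORDER BY name;

-- Rough expected ceiling on pg_xlog size in MB, per the 9.2 manual's rule of thumb.
-- Assumes the default 16 MB WAL segment size.
SELECT GREATEST(
         (2 + current_setting('checkpoint_completion_target')::float8)
           * current_setting('checkpoint_segments')::int + 1,
         current_setting('checkpoint_segments')::int
           + current_setting('wal_keep_segments')::int + 1
       ) * 16 AS approx_max_pg_xlog_mb;
```

Comparing this figure with the actual size of pg_xlog gives a quick sanity check on the hypothesis.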

You can force a checkpoint with the CHECKPOINT command. This may stall the database for some time if it has accumulated a huge amount of WAL and hasn't been writing it out in the background. If checkpoint_completion_target is low (less than 0.8 or 0.9), there's likely to be a big backlog of work to do at checkpoint time. Be prepared for the database to become slow and unresponsive during the checkpoint. Once a checkpoint begins, you cannot abort it by normal means; you can crash the database and restart it, but that just puts you back where you were.
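A sketch of forcing the checkpoint from psql and checking whether pg_xlog shrank, assuming a superuser connection (both pg_ls_dir() and CHECKPOINT require one) and the default 16 MB segment size. The regular expression matches the 24-hex-digit segment file names and skips the archive_status directory and any .history files.

```sql
-- Count WAL segment files in pg_xlog and estimate their total size.
-- pg_ls_dir() is superuser-only; the 16 MB segment size is an assumption.
SELECT count(*) AS wal_segments,
       pg_size_pretty(count(*) * 16 * 1024 * 1024) AS approx_size
  FROM pg_ls_dir('pg_xlog') AS f(name)
 WHERE name ~ '^[0-9A-F]{24}$';

-- Force a checkpoint. The command does not return until the checkpoint finishes.
CHECKPOINT;

-- Afterwards, run the count query again and compare the results.
```

Segments that are no longer needed may be renamed and kept for reuse rather than deleted, so the file count can stay higher than you might expect even after a successful checkpoint.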

I'm not certain, but I suspect a checkpoint could also cause the main database to grow, and do so before any space in the WAL is freed (if any is freed at all). So a checkpoint could potentially cause you to run out of space, which is very hard to recover from without adding more storage, at least temporarily.
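To judge how much headroom you have before forcing a checkpoint, you can compare the size of each database (roughly, the contents of base/) with the free space that df reports for the file system holding the data directory. A minimal sketch; pg_database_size() needs CONNECT privilege on each database, or superuser rights.

```sql
-- On-disk size of each database, largest first.
SELECT datname,
       pg_size_pretty(pg_database_size(datname)) AS size
  FROM pg_database
 ORDER BY pg_database_size(datname) DESC;
```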

Now would be a very good time to get a proper backup of the database - use pg_dump -Fc dbname to dump each database, and pg_dumpall --globals-only to dump user definitions etc.
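If there are several databases, a query along these lines can generate one pg_dump command line per database, which you can then paste into a shell. It is only a sketch: the output file naming is a hypothetical choice, and it assumes the database names contain no spaces or shell metacharacters.

```sql
-- One pg_dump command per database that accepts connections (templates skipped).
-- Hypothetical file naming: <dbname>.dump in the current directory.
SELECT format('pg_dump -Fc -f %s.dump %s', datname, datname) AS dump_command
  FROM pg_database
 WHERE datallowconn
   AND NOT datistemplate
 ORDER BY datname;
```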

If you can afford the downtime, stop the database and take a file-system-level copy of the entire data directory (the folder containing pg_xlog, pg_clog, global, base, etc.). Do not do this while the server is running, and do not omit any files or folders; they are all important (except pg_log, though it's a good idea to keep the text logs anyway).
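To find the data directory to copy, you can ask the running server before you stop it. A sketch, assuming a superuser connection (on 9.2 these settings may be hidden from other users). On some packaged systems the configuration file lives outside the data directory, so its location is worth noting too.

```sql
-- Run before shutting the server down.
SHOW data_directory;
SHOW config_file;
```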

If you'd like more specific comments on the likely cause (and so that I can be more confident in my hypothesis), you can run the following queries and paste their output into your question (in a code-indented block), then comment so I'm notified:

```sql
SELECT version();

SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');
```


It is possible that setting checkpoint_completion_target = 1, then stopping and restarting the DB, might cause it to start aggressively writing out queued-up WAL. It won't free any WAL until it does a checkpoint, but you could force one once write activity slows down (as measured with sar, iostat, etc.). I have not tested whether changing checkpoint_completion_target across a restart affects already-written WAL; consider testing this first on a throwaway PostgreSQL instance created with initdb on another machine.
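A sketch of confirming the change took effect after the restart, assuming you edit checkpoint_completion_target in postgresql.conf (9.2 has no ALTER SYSTEM). The sourcefile column is only shown to superusers. Sampling pg_stat_bgwriter before and after gives a rough view of checkpoint activity to set alongside the sar or iostat figures.

```sql
-- In postgresql.conf (then restart the server):
--   checkpoint_completion_target = 1
-- After the restart, confirm the value and where it was read from:
SELECT name, setting, source, sourcefile
  FROM pg_settings
 WHERE name = 'checkpoint_completion_target';

-- Cumulative checkpoint counters; sample before and after to see progress.
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
  FROM pg_stat_bgwriter;
```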

Backups have nothing to do with WAL retention and growth; this problem isn't backup-related.

See:

  • The PostgreSQL manual on WAL
  • The PostgreSQL manual on WAL configuration
  • depesz on checkpoint_completion_target
  • depesz on WAL

Context

StackExchange Database Administrators Q#43011, answer score: 10
