HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

What to do with WAL files for Postgres Slave reset

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whatpostgreswithslavefilesforwalreset

Problem

So last night our PG Slave ran out of space after a lot of reconfiguring of disk space, new drives etc its now reporting the following error:

FATAL:  could not receive data from WAL stream: FATAL:  requested WAL segment 00000001000018F70000008A has already been removed


From the reading around this I've done, it appears that the only solution is to re-sync the slave with pg_start_backup() et al. Based on this, I have a few questions.

  • Is there a better way of fixing the slave that I've simply missed or overlooked?



  • Do I need to clear out the WAL files on the slave and/or master prior or during the backup?



  • Does pg_start_backup lock the database during this time?



As requested, the log file can be found: http://pastebin.com/9F8vJh6R, have removed the rest of the file as its just 5 hours of the same repeated error

Many thanks

Solution

If you have WAL archiving enabled on the master (archive_command is set and archive_mode is on), set a restore_command in your replica's recovery.conf to allow it to fetch WAL from the WAL archive.

If there's no WAL archive, then there's no record of needed deltas between the master and the replica anymore. So you must resync them.

Typically this is done by making a new pg_basebackup of the replica. If the replica is big, though, it can be helpful to use rsync to resync the replica from the master doing block compares. To do this, you:

  • pg_start_backup() on the master



  • Stop the replica if running



  • rsync the master to the replica



  • pg_stop_backup() on the master



  • Copy any additional files from pg_xlog on the master to the replica, up to the file reported by pg_stop_backup()



  • Start the replica



It's simpler if you have WAL archiving enabled, since you then don't have to manually copy WAL, you just set a restore_command on the replica.

All sound too complicated? Use pg_basebackup.

As for your other questions:

  • NEVER delete WAL from the master. Ever. Extremely bad. Hands off pg_xlog.



  • pg_start_backup doesn't "lock" the database. It does prevent VACUUM from cleaning up dead rows, so it can increase bloat on high write activity tables, but that's about it.

Context

StackExchange Database Administrators Q#98631, answer score: 5

Revisions (0)

No revisions yet.