How to recover Postgres replication slave after fatal error
Problem
On four Debian 8 (Jessie) servers, I have a PostgreSQL 9.4.3 master and 3 slaves. After substantial data changes on the master, the slave logs showed this error:
LOG: started streaming WAL from primary at 182/0 on timeline 1
FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000018200000000 has already been removed
What steps do I need to take to restore/rebuild the slaves?
Solution
If you set up an archive_command and have your WAL segments saved elsewhere, you can just point a restore_command in the recovery.conf on each of the secondaries at your WAL archive, and they should grab the next needed segment and carry on happily.
If you didn't set up an archive_command, you'll need to take a fresh pg_basebackup for each of your secondaries, because without that WAL segment, they can never catch up. Since you're on 9.4, I would recommend also setting up a replication slot, which will prevent the primary from recycling a WAL segment that is still required by the streaming replicas.
You can find out more about this in the replication slot documentation.
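
For the pg_basebackup route, a rough sketch of the rebuild on one standby might look like the following. The host name, role names, slot name and the Debian cluster name "main" are all assumptions rather than details from the question, and the script only prints the commands unless RUN is set to an empty value. It also assumes that max_replication_slots has already been raised above its 9.4 default of 0 on the primary (which needs a restart) and that the commands are run as the postgres user.

```shell
#!/bin/sh
# Sketch: rebuild one standby from the primary with pg_basebackup (PostgreSQL 9.4).
# Every default below is a hypothetical placeholder; adjust to your environment.
PRIMARY_HOST="${PRIMARY_HOST:-primary.example.com}"  # assumed primary host name
REPL_USER="${REPL_USER:-replicator}"                  # assumed role with REPLICATION
SLOT_NAME="${SLOT_NAME:-standby1}"                    # assumed slot name, one per standby
PGDATA="${PGDATA:-/var/lib/postgresql/9.4/main}"      # Debian's default data directory
RUN="${RUN-echo}"  # dry run by default (prints commands); set RUN= to execute them

rebuild_standby() {
    # 1. Stop the broken standby.
    $RUN pg_ctlcluster 9.4 main stop
    # 2. Move the stale data directory aside instead of deleting it.
    $RUN mv "$PGDATA" "$PGDATA.broken"
    # 3. Create a physical replication slot on the primary (superuser assumed).
    $RUN psql -h "$PRIMARY_HOST" -U postgres -c \
        "SELECT pg_create_physical_replication_slot('$SLOT_NAME');"
    # 4. Take a fresh base backup; -X stream also copies the WAL generated
    #    during the backup, and -R writes a minimal recovery.conf.
    $RUN pg_basebackup -h "$PRIMARY_HOST" -U "$REPL_USER" -D "$PGDATA" -X stream -R -P
    # 5. Tell the standby to use the slot when it starts streaming.
    $RUN sh -c "echo \"primary_slot_name = '$SLOT_NAME'\" >> \"$PGDATA/recovery.conf\""
    # 6. Start the rebuilt standby.
    $RUN pg_ctlcluster 9.4 main start
}

rebuild_standby
```

Run it once as-is to review the printed commands before executing anything, and use a different slot name on each standby. Keep in mind that a slot which no standby is consuming makes the primary retain WAL indefinitely, so drop the slots of replicas you retire; the pg_replication_slots view shows each slot's active flag and restart_lsn.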
Otherwise, you need to have an archive_command set that saves the WAL elsewhere so the replicas can catch up from it, or you need to set wal_keep_segments high enough on the primary that it doesn't recycle files the replicas still need under heavy load on your system.
Setting up the archive_command in your postgresql.conf is covered in the Continuous Archiving and Point-in-Time Recovery documentation.
An example of a similar situation to the one you are in, and one potential solution to prevent it, is covered in Offsite Replication Problems and How to Solve Them.
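
To make the archiving and wal_keep_segments options concrete, here is a hedged sketch that writes example settings into a scratch directory for review instead of touching a live config. The archive path /mnt/wal_archive, the host name, the role name, and the values 256 (about 4 GB of 16 MB segments) and 4 are illustrative assumptions; the cp-based commands follow the pattern used in the PostgreSQL documentation.

```shell
#!/bin/sh
# Sketch: generate example config fragments for a 9.4 primary and its standbys.
# Nothing here edits a running cluster; the files are for review and copy/paste.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"               # scratch directory for the fragments
ARCHIVE_DIR="${ARCHIVE_DIR:-/mnt/wal_archive}"   # hypothetical path, reachable from every server
mkdir -p "$OUT_DIR"

# Settings for postgresql.conf on the primary.
cat > "$OUT_DIR/primary-postgresql.conf.fragment" <<EOF
wal_level = hot_standby
archive_mode = on                 # changing this requires a restart
archive_command = 'test ! -f $ARCHIVE_DIR/%f && cp %p $ARCHIVE_DIR/%f'
wal_keep_segments = 256           # assumed value; size it to your write load
max_replication_slots = 4         # assumed value; must be > 0 to use slots
EOF

# Settings for recovery.conf on each standby.
cat > "$OUT_DIR/standby-recovery.conf.fragment" <<EOF
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
restore_command = 'cp $ARCHIVE_DIR/%f %p'
EOF

echo "Example fragments written to $OUT_DIR"
```

With a restore_command in place, a standby that finds a segment missing from the primary can fetch it from the archive and then return to streaming. An archive only helps with future gaps, though: it cannot bring back a segment that was removed before archiving was switched on.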
Hopefully this is enough information for you to get your replicas back to a working state. =)
Context
StackExchange Database Administrators Q#119114, answer score: 6