postgres 9.4 timeline issue
Problem
We've had to move the master around a bit. It started on server 01, then got moved to 02 (which was a slave). We need to move it again, so we built 04 and are trying to slave it off 02, but we are getting the following errors:
2018-02-25 17:00:08 UTC FATAL: highest timeline 3 of the primary is behind recovery timeline 4
2018-02-25 17:00:13 UTC FATAL: highest timeline 3 of the primary is behind recovery timeline 4
2018-02-25 17:00:18 UTC FATAL: highest timeline 3 of the primary is behind recovery timeline 4
The initial dump happened like this:
pg_basebackup --verbose --progress -d "host=10.132.x.x user=backup password=...." -D /var/lib/postgresql/9.4/main/ -l 'instance restore' --xlog-method=stream
The recovery file looks like this:
restore_command = 'if [ -f /srv/postgresql/archive/${DATASET}/%f ]; then cp /srv/postgresql/archive/${DATASET}/%f %p; else aws s3 cp --quiet s3://company-backups/postgresql/${DATASET}/archive/%f %p; fi'
standby_mode = 'on'
primary_conninfo = 'host=10.132.x.x user=backup password=....'
recovery_target_timeline = 'latest'
trigger_file = '/var/lib/postgresql/9.4/main/failover'
Solution
It sounds like you are in a split-brain situation. The original master (01) was never stopped from being a master, so after the promotion of 02 it simply carried on as a second master.
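One way to confirm which servers currently act as masters is to ask each one whether it is in recovery. The following is a minimal sketch, not a definitive tool: it assumes psql is on the PATH, that the connecting user is postgres (an assumption, adjust as needed), and that the host names passed in are placeholders for your own. It relies on the built-in function pg_is_in_recovery(), which returns false on a master.

```python
import subprocess


def is_master(host, run=subprocess.check_output):
    """Return True if the server at `host` is NOT in recovery, i.e. acts as a master.

    `run` is injectable so the function can be exercised without a live server.
    The user name "postgres" is an assumption for illustration.
    """
    out = run([
        "psql", "-h", host, "-U", "postgres",
        "-Atc", "SELECT pg_is_in_recovery()",  # -A unaligned, -t tuples only
    ])
    # psql prints booleans as "t" / "f"; "f" means not in recovery.
    return out.decode().strip() == "f"


def find_masters(hosts, run=subprocess.check_output):
    """Return the subset of `hosts` that report they are not in recovery."""
    return [h for h in hosts if is_master(h, run)]
```

If find_masters returns more than one host, more than one server is accepting writes, which matches the split-brain picture described above.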
Fixing such issues before 9.5 is not easy (pg_rewind became part of the PostgreSQL ecosystem in that version); you will most probably need some manual cleanup. What is certain is that if 01 received writes after the promotion of 02, they will be lost (or the writes on 02 will be, depending on what you choose to do).
I'd start by taking a logical dump from both 01 and 02 (to check whether anything has to be manually replayed from 01 to 02), stop 01 altogether, remove the older timelines' WAL segments from the archive (you can move them somewhere else just in case), and then try to build a slave based on 02 again.
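The archive cleanup step can be sketched as follows. WAL segment file names are 24 hexadecimal digits whose first 8 digits are the timeline ID, and timeline history files are named with those 8 digits plus ".history", so files can be grouped by timeline from their names alone. The function names, the holding directory, and the choice of which timelines to move are all assumptions for illustration; the sketch only handles a local archive directory (not the S3 copy) and defaults to a dry run.

```python
import re
import shutil
from pathlib import Path

# First 8 hex digits of a WAL segment name are the timeline ID; an optional
# suffix covers files such as backup history files or compressed segments.
_WAL_NAME = re.compile(r"^([0-9A-F]{8})[0-9A-F]{16}(\..*)?$")
_HISTORY_NAME = re.compile(r"^([0-9A-F]{8})\.history$")


def timeline_of(filename):
    """Return the timeline ID encoded in a WAL or history file name, else None."""
    m = _WAL_NAME.match(filename) or _HISTORY_NAME.match(filename)
    return int(m.group(1), 16) if m else None


def files_by_timeline(archive_dir):
    """Group the regular files in `archive_dir` by timeline ID."""
    groups = {}
    for path in sorted(Path(archive_dir).iterdir()):
        if path.is_file():
            tli = timeline_of(path.name)
            if tli is not None:
                groups.setdefault(tli, []).append(path)
    return groups


def move_timelines(archive_dir, holding_dir, timelines, dry_run=True):
    """Move all files of the given timelines to `holding_dir`.

    Returns the names of the files selected. With dry_run=True nothing is
    moved, so the selection can be reviewed first.
    """
    holding = Path(holding_dir)
    selected = []
    for tli, paths in files_by_timeline(archive_dir).items():
        if tli not in timelines:
            continue
        for path in paths:
            if not dry_run:
                holding.mkdir(parents=True, exist_ok=True)
                shutil.move(str(path), str(holding / path.name))
            selected.append(path.name)
    return selected
```

Calling files_by_timeline on the archive first shows which timelines are present, which helps explain why a standby using recovery_target_timeline = 'latest' ends up on a timeline that the primary does not have. Moving rather than deleting keeps the files available in case something still needs to be replayed.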
You can also use pg_xlogdump to see which relations (tables, indexes, etc.) have received writes since the split brain started. (Note that from version 10 the utility is named pg_waldump.)
Context
StackExchange Database Administrators Q#198780, answer score: 4