HiveBrain v1.2.0
Get Started
← Back to all entries
debugsqlMinor

Postgres 9.4, replication slots, doesn't work failover master

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresreplicationslotsdoesnmasterworkfailover

Problem

I've installed postgres master and slave nodes by this manual:

At the master (10.0.0.1):

sudo -u postgres initdb -D /data/postgres/main


Change postgresql.conf:

listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 2
logging_collector = on
max_wal_senders = 5


Add record for replication to pg_hba.conf:

host replication replica 10.0.0.2/32 md5


Then start postgres

service postgresql start


Create replication user:

CREATE USER replica WITH REPLICATION ENCRYPTED PASSWORD '123456';


And add replication slot:

SELECT pg_create_physical_replication_slot('standby_slot');


At the slave (10.0.0.2):

pg_basebackup -h 10.0.0.1 -U replica -D /db/postgres/main -X s -P


Change state to standby in postgresql.conf:

hot_standby = on


Add recovery.conf:

standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replica password=123456'
primary_slot_name = 'standby_slot'
trigger_file = 'trigger'


Start standby:

service postgresql start


So, replication is working fine. If I create database, it will appear both at the master and the slave.

If slave is going down and then starting, all changes come there. It all fine too.

But I have problem if master is down. When it stops, I create trigger file at the slave. Then I create another database at the slave, which now is primary. After that I start master, trigger file remove automatically, recovery.conf rename to recovery.done automatically too. Ok. But how to synchronize changes from slave (new primary) to master (old primary)?

I found that I've got to do these four steps:

At the master:

Stop postgres:

service postgresql stop


At the slave:

Switch backup mode on:

SELECT pg_backup_start('backup', true);


Then rsync new data from slave to master.

And then switch backup mode off:

SELECT pg_backup_stop();


After last step I get message

```
NOTICE: WAL archiving is not enabled; you must ensure that all

Solution

You cannot just jump back to the old master, as it is said here:


Once failover to the standby occurs, there is only a single server in
operation. This is known as a degenerate state. The former standby is
now the primary, but the former primary is down and might stay down.
To return to normal operation, a standby server must be recreated,
either on the former primary system when it comes up, or on a third,
possibly new, system.

For this, you need a recovery.conf on the old master. This should be nearly the same as the present one, only the connection has to point elsewhere.

Context

StackExchange Database Administrators Q#102719, answer score: 4

Revisions (0)

No revisions yet.