HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Automating failover in PostgreSQL 9.1

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlautomatingfailover

Problem

How does one setup two identical servers for automatic failover in PostgreSQL 9.1.

OS


Centos 5

PostgreSQL 9.1 compiled from source

The postgres user account exists on both machines and has a ssh passwordless key to connect to both machines.

My Current Setup :

Master server configuration:

postgresql.conf:

listen_address = '*'
wal_level = hot_standby
max_wal_senders = 3
checkpoint_segments = 16    
wal_keep_segments = 8 
archive_mode = on    
archive_command = 'cp "%p" /opt/pgsql91/archive/"%f"'


pg_hba.conf:

host  replication   all   10.0.66.1/32      trust
 host  replication   all   10.0.66.2/32      trust


Standby Server

postgresql.conf and pg_hba.conf are identical to what is configured on the master server.

recovery.conf:

standby_mode = 'on'
 primary_conninfo = 'host=10.0.66.1'
 trigger_file = '/opt/pgsql91/data/trigger.txt'


Thanks to hzRoot, I now understand how to switch the server from standby to master.

Using the following commands, I can synchronize the new slave with the new master and then get replication backup and running.

On the new master (10.0.66.2)

  • su - postgres



  • touch trigger.txt in /opt/pgsql91/data/



  • recovery.conf becomes recovery.done



  • psql -c ";SELECT pg_start_backup('backup', true)";



  • rsync -a -v -e ssh /opt/pgsql91/data/ 10.0.66.1:/opt/pgsql91/data/ --exclude postmaster.pid



  • psql -c ";SELECT pg_stop_backup()";



On the new slave (10.0.66.1)

  • create the recovery.conf : cp recovery.done to recovery.conf



  • vi recovery.conf change ip address : primary_conninfo = 'host=10.0.66.2'



  • start postgresql



So my questions are now :

  • Is this the correct way to switch roles?



  • Has anyone automated this process, if so what did you do?



  • If synchronous replication is enabled, I noticed the new master server won't commit any transactions because it is waiting for the slave to respond. There is no slave however because the other server, the old master is down. Is this correct or do I need

Solution

Check out repmrg:


repmgr is a set of open source tools that helps DBAs and System
administrators manage a cluster of PostgreSQL databases..


By taking advantage of the Hot Standby capability introduced in
PostgreSQL 9, repmgr greatly simplifies the process of setting up and
managing database with high availability and scalability requirements.


repmgr simplifies administration and daily management, enhances
productivity and reduces the overall costs of a PostgreSQL cluster by:



  • monitoring the replication process; allowing DBAs to issue high



  • availability operations such as switch-overs and fail-overs.




It does two things:

  • repmgr: command program that performs tasks on your cluster and then exits



  • repmgrd: management and monitoring daemon that watches the cluster and can automate remote actions.



For automatic failover, repmgrd does the trick and is not a SPOF in your network, like pgPool. However, it is still important to monitor all deamons and bring them back up after failure.

Version 2.0 is about to be released, including RPM's.

Context

StackExchange Database Administrators Q#9587, answer score: 8

Revisions (0)

No revisions yet.