HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to measure or estimate the streaming replication lag in Postgres 9.2

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
estimatethepostgresreplicationlagstreaminghowmeasure

Problem

I've been asked to reverse engineer an app that uses postgresql in the back end.
I can see that there is some database replica
tion going on... based on my reading, I *think it's called streaming replication.
The settings on the master server look like this:

wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 32


My question is this: what impacts the delay between creating a new record on the master, and when it shows up / is replicated to the slave?

From reading the manual, (http://www.postgresql.org/docs/current/static/hot-standby.html) I see that the model it uses is "eventual consistency"... paragraph 4 on that page states:


The data on the standby takes some time to arrive from the primary
server so there will be a measurable delay between primary and
standby. Running the same query nearly simultaneously on both primary
and standby might therefore return differing results. We say that data
on the standby is eventually consistent with the primary.

But is there any pattern / ways to guesstimate how long it'll take? Or is it really just arbitrary?

If you can point me in the right direction, I'd appreciate it.
Thanks.

Solution

My question is this: what impacts the delay between creating a new record on the master, and when it shows up / is replicated to the slave?

What really matters

Replication lag depends on a few things.


The archive command is only invoked on completed WAL segments. Hence, if your server generates only little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To put a limit on how old unarchived data can be, you can set archive_timeout to force the server to switch to a new WAL segment file at least that often. Note that archived files that are archived early due to a forced switch are still the same length as completely full files. It is therefore unwise to set a very short archive_timeout — it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable.

wal_level

First, in order to use streaming replication meaningfully WAL has to be set to hot_standby (now called replica in 9.6+). Logical decoding is useful to make sense of WAL logs if you need to do that. For instance, using logical decoding you can make use of these plugins:

  • test_decoding – the default plugin



  • wal2json – shows the changes in JSON format



  • decoder_raw – reconstructs the query that has applied the change



See this post for more information

Other things mentioned

I don't think these things matter substantially,

  • wal_keep_segments this stores how many past log files you're keeping. If you exceed this limit you fall out of sync. If you transfer fast enough and don't have connect interruptions I'm not sure how much it matters. You can now use replication slots to do this in a better way.



  • max_wal_senders WAL shipping either works or not. You set this on servers exporting wal logs to enable replication and it goes against max_connections.

Context

StackExchange Database Administrators Q#93221, answer score: 2

Revisions (0)

No revisions yet.