patternsqlMinor

Postgres Streaming Replication lagging, using lots of CPU and little I/O

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

postgreslaggingreplicationlotsstreamingusinglittleandcpu

Problem

We have 2 dedicated DB Servers. One is master and other is its streaming slave.

Postgres version 9.0.17.

We have similar hardware on both servers and using 15krpm SAS drive in Raid 10 for Database cluster. The only difference is that on server we have 128 GB ram and on slave we have 64 GB ram.

The issue which we are facing is that our streaming slave is lagging behind.

On the streaming slave we have

postgres=# select pg_last_xlog_receive_location();
 pg_last_xlog_receive_location
-------------------------------
 1F7D/DD3BA000

postgres=# select pg_last_xlog_replay_location();
 pg_last_xlog_replay_location
------------------------------
 1F75/5BE08980

This difference between wal received and replay is increasing constantly.

Please let me know that how this issue can be resolved.

Output of perf top -u postgres is

```
98.30% postgres (.) DropRelFileNodeBuffers
0.20% postgres (.) 0x000000000008b05b
0.14% (kernel) (k) copy_user_generic_string
0.13% postgres (.) hash_seq_search
0.07% postgres (.) hash_search_with_hash_value
0.05% (kernel) (k) _spin_lock
0.05% (kernel) (k) apic_timer_interrupt
0.03% (kernel) (k) do_timer
0.03% postgres (.) RememberFsyncRequest
0.03% (kernel) (k) __sb_start_write
0.02% (jbd2) (k) do_get_write_access
0.02% (kernel) (k) unroll_tree_refs
0.02% (cciss) (k) do_cciss_request
0.02% (ext4) (k) ext4_check_dir_entry
0.02% (kernel) (k) path_walk
0.02% (kernel) (k) scheduler_tick
0.02% postgres (.) LWLockAcquire
0.02% (kernel) (k) dyntick_save_progress_counter
0.02% (kernel) (k) _spin_lock_irqsave
0.02% (kernel) (k) audit_syscall_entry
0.02% (ext4) (k) ext4_journal_start_sb
0.02% (kernel) (k) radix_tree_range_tag_if_tagged
0.01% (kernel) (k) sys_getppid
0.01% (kernel) (k) native_read_tsc
0

Solution

Current (true at least in PostgreSQL 9.4 and older) PostgreSQL releases have single-threaded WAL recovery.

This means that replay of the write-ahead log occurs in only one recovery worker, and is thus able to benefit less from I/O concurrency than a normal running master. This can result in WAL replay lagging behind in cases where the replica and master have similar hardware, even when you'd expect the master (which is generally also under more load) to be the slower server.

(It'd be great to improve this, but wanted it enough to do the work yet, and nobody seems keen to fund it at the moment).

That said, your case with very high CPU use doesn't seem to fit with issues of low I/O concurrency. In your situation I'd be attaching gdb and seeing what the replay proess was doing, or using perf top to examine what the system as a whole is up to, then dig deeper with perf once I had some clues. I'd also be looking closely at iotop, vmstat, iostat, the PostgreSQL logs, dmesg, etc.

After investigation

Profiles revealed that most time was being spent in DropRelFileNodeBuffers.

That does a linear scan through shared_buffers whenever a relfilenode is deleted - caused by truncate, drop table, cluster, drop index, etc. This must be done during WAL replay, as well as on the main node.

So this suggests that:

Your shared_buffers is probably very big; and

You're probably doing lots of operations that delete relfilenodes

Reducing shared_buffers on the replica may well help.

Context

StackExchange Database Administrators Q#62831, answer score: 9

Revisions (0)

No revisions yet.