patternsqlMinor
Postgres Streaming Replication lagging, using lots of CPU and little I/O
Viewed 0 times
postgreslaggingreplicationlotsstreamingusinglittleandcpu
Problem
We have 2 dedicated DB Servers. One is master and other is its streaming slave.
Postgres version 9.0.17.
We have similar hardware on both servers and using 15krpm SAS drive in Raid 10 for Database cluster. The only difference is that on server we have 128 GB ram and on slave we have 64 GB ram.
The issue which we are facing is that our streaming slave is lagging behind.
On the streaming slave we have
This difference between wal received and replay is increasing constantly.
Please let me know that how this issue can be resolved.
Output of
```
98.30% postgres (.) DropRelFileNodeBuffers
0.20% postgres (.) 0x000000000008b05b
0.14% (kernel) (k) copy_user_generic_string
0.13% postgres (.) hash_seq_search
0.07% postgres (.) hash_search_with_hash_value
0.05% (kernel) (k) _spin_lock
0.05% (kernel) (k) apic_timer_interrupt
0.03% (kernel) (k) do_timer
0.03% postgres (.) RememberFsyncRequest
0.03% (kernel) (k) __sb_start_write
0.02% (jbd2) (k) do_get_write_access
0.02% (kernel) (k) unroll_tree_refs
0.02% (cciss) (k) do_cciss_request
0.02% (ext4) (k) ext4_check_dir_entry
0.02% (kernel) (k) path_walk
0.02% (kernel) (k) scheduler_tick
0.02% postgres (.) LWLockAcquire
0.02% (kernel) (k) dyntick_save_progress_counter
0.02% (kernel) (k) _spin_lock_irqsave
0.02% (kernel) (k) audit_syscall_entry
0.02% (ext4) (k) ext4_journal_start_sb
0.02% (kernel) (k) radix_tree_range_tag_if_tagged
0.01% (kernel) (k) sys_getppid
0.01% (kernel) (k) native_read_tsc
0
Postgres version 9.0.17.
We have similar hardware on both servers and using 15krpm SAS drive in Raid 10 for Database cluster. The only difference is that on server we have 128 GB ram and on slave we have 64 GB ram.
The issue which we are facing is that our streaming slave is lagging behind.
On the streaming slave we have
postgres=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
1F7D/DD3BA000
postgres=# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
1F75/5BE08980This difference between wal received and replay is increasing constantly.
Please let me know that how this issue can be resolved.
Output of
perf top -u postgres is ```
98.30% postgres (.) DropRelFileNodeBuffers
0.20% postgres (.) 0x000000000008b05b
0.14% (kernel) (k) copy_user_generic_string
0.13% postgres (.) hash_seq_search
0.07% postgres (.) hash_search_with_hash_value
0.05% (kernel) (k) _spin_lock
0.05% (kernel) (k) apic_timer_interrupt
0.03% (kernel) (k) do_timer
0.03% postgres (.) RememberFsyncRequest
0.03% (kernel) (k) __sb_start_write
0.02% (jbd2) (k) do_get_write_access
0.02% (kernel) (k) unroll_tree_refs
0.02% (cciss) (k) do_cciss_request
0.02% (ext4) (k) ext4_check_dir_entry
0.02% (kernel) (k) path_walk
0.02% (kernel) (k) scheduler_tick
0.02% postgres (.) LWLockAcquire
0.02% (kernel) (k) dyntick_save_progress_counter
0.02% (kernel) (k) _spin_lock_irqsave
0.02% (kernel) (k) audit_syscall_entry
0.02% (ext4) (k) ext4_journal_start_sb
0.02% (kernel) (k) radix_tree_range_tag_if_tagged
0.01% (kernel) (k) sys_getppid
0.01% (kernel) (k) native_read_tsc
0
Solution
Current (true at least in PostgreSQL 9.4 and older) PostgreSQL releases have single-threaded WAL recovery.
This means that replay of the write-ahead log occurs in only one recovery worker, and is thus able to benefit less from I/O concurrency than a normal running master. This can result in WAL replay lagging behind in cases where the replica and master have similar hardware, even when you'd expect the master (which is generally also under more load) to be the slower server.
(It'd be great to improve this, but wanted it enough to do the work yet, and nobody seems keen to fund it at the moment).
That said, your case with very high CPU use doesn't seem to fit with issues of low I/O concurrency. In your situation I'd be attaching
After investigation
Profiles revealed that most time was being spent in
That does a linear scan through
So this suggests that:
Reducing
This means that replay of the write-ahead log occurs in only one recovery worker, and is thus able to benefit less from I/O concurrency than a normal running master. This can result in WAL replay lagging behind in cases where the replica and master have similar hardware, even when you'd expect the master (which is generally also under more load) to be the slower server.
(It'd be great to improve this, but wanted it enough to do the work yet, and nobody seems keen to fund it at the moment).
That said, your case with very high CPU use doesn't seem to fit with issues of low I/O concurrency. In your situation I'd be attaching
gdb and seeing what the replay proess was doing, or using perf top to examine what the system as a whole is up to, then dig deeper with perf once I had some clues. I'd also be looking closely at iotop, vmstat, iostat, the PostgreSQL logs, dmesg, etc.After investigation
Profiles revealed that most time was being spent in
DropRelFileNodeBuffers.That does a linear scan through
shared_buffers whenever a relfilenode is deleted - caused by truncate, drop table, cluster, drop index, etc. This must be done during WAL replay, as well as on the main node.So this suggests that:
- Your
shared_buffersis probably very big; and
- You're probably doing lots of operations that delete relfilenodes
Reducing
shared_buffers on the replica may well help.Context
StackExchange Database Administrators Q#62831, answer score: 9
Revisions (0)
No revisions yet.