patternsqlMinor
Postgresql 11 logical replication - stuck in `catchup` state
Viewed 0 times
postgresqlstuckreplicationstatecatchuplogical
Problem
I'm running two postgresql 11 servers - master and slave (setup with logical replication).
The problem I'm facing is that today after weeks of uninterrupted work slave got out of sync with this error message:
I did see this error message before and my process was to increase
So then I wanted to restore replication however the state of replication is stuck on catchup:
I tried to restart slave a few tim
The problem I'm facing is that today after weeks of uninterrupted work slave got out of sync with this error message:
2019-09-16 07:39:44.332 CEST [30117] ERROR: could not send data to WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
2019-09-16 07:39:44.539 CEST [12932] LOG: logical replication apply worker for subscription "logical_from_master" has started
2019-09-16 07:39:44.542 CEST [27972] LOG: background worker "logical replication worker" (PID 30117) exited with exit code 1I did see this error message before and my process was to increase
wal_sender_timeout on master (more details on this here: logical replication in postgresql - "server closed the connection unexpectedly")So then I wanted to restore replication however the state of replication is stuck on catchup:
master=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state
-------+----------+---------+-------------------+---------------+-----------------+-------------+-------------------------------+--------------+---------+--------------+--------------+--------------+--------------+-----------------+-----------------+-----------------+---------------+------------
86864 | 16680 | my_user | logical_from_master | 10.10.10.10 | | 46110 | 2019-09-16 12:45:56.491325+02 | | catchup | D55/FA04D4B8 | D55/F9E74158 | D55/F9E44CD8 | D55/F9E74030 | 00:00:03.603104 | 00:00:03.603104 | 00:00:03.603104 | 0 | async
(1 row)
I tried to restart slave a few tim
Solution
As you already found out your master database is too busy for single replication worker to handle all of the changes.
You need to cluster your tables - but make sure you do it in such way that tables with foreign keys are handled with the same worker, otherwise you might get into situation where foreign key constraint will prevent data from being inserted into one table because table foreign key points at had not been updated yet.
You need to cluster your tables - but make sure you do it in such way that tables with foreign keys are handled with the same worker, otherwise you might get into situation where foreign key constraint will prevent data from being inserted into one table because table foreign key points at had not been updated yet.
Context
StackExchange Database Administrators Q#248863, answer score: 2
Revisions (0)
No revisions yet.