MongoDB replication: going into maintenance mode with 10333 other maintenance mode tasks in progress
Problem
I have a MongoDB instance where a resync is required. Its log contains the following:
2016-11-07T11:59:23.330+0000 I REPL [ReplicationExecutor] syncing from: x.x.x.x:27017
2016-11-07T11:59:23.354+0000 W REPL [rsBackgroundSync] we are too stale to use x.x.x.x:27017 as a sync source
2016-11-07T11:59:23.354+0000 I REPL [ReplicationExecutor] could not find member to sync from
2016-11-07T11:59:23.354+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] our last optime : (term: 20, timestamp: Oct 4 07:41:29:1)
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] oldest available is (term: 20, timestamp: Oct 17 02:13:33:5)
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2016-11-07T11:59:23.355+0000 I REPL [ReplicationExecutor] going into maintenance mode with 10333 other maintenance mode tasks in progress
What does this line mean?
[ReplicationExecutor] going into maintenance mode with 10333 other maintenance mode tasks in progress
What are maintenance mode tasks? I could not find any MongoDB documentation on them. Why are there 10333 queued? How can I see (list) them? With a search engine I also found log entries with
with 0 other maintenance mode tasks in progress
Solution
What are maintenance mode tasks?
The "maintenance mode tasks" message is referring to a counter of successive calls to the replSetMaintenance command and (as at MongoDB 3.4) isn't associated with specific queued tasks. The replSetMaintenance command is used to keep a secondary in RECOVERING state while some maintenance work is done. A RECOVERING member remains online and potentially syncing, but is excluded from normal read operations (e.g. using secondary read preferences with a driver). Each invocation of replSetMaintenance either increases the task counter (if true) or decreases it (if false). When the counter reaches 0 the member will transition from RECOVERING back into SECONDARY state assuming it is healthy.
As at MongoDB 3.4, changes in maintenance mode are currently only noted in the MongoDB log. This command is generally only used internally by mongod, but you can invoke it manually as well.
Here's an annotated set of log lines and the associated mongo shell commands showing the task counter changing:
// db.adminCommand({replSetMaintenance: 1})
[ReplicationExecutor] going into maintenance mode with 0 other maintenance mode tasks in progress
[ReplicationExecutor] transition to RECOVERING
// db.adminCommand({replSetMaintenance: 1})
[ReplicationExecutor] going into maintenance mode with 1 other maintenance mode tasks in progress
// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] leaving maintenance mode (1 other maintenance mode tasks ongoing)
// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] leaving maintenance mode (0 other maintenance mode tasks ongoing)
[ReplicationExecutor] transition to SECONDARY
// db.adminCommand({replSetMaintenance: 0})
[ReplicationExecutor] Attempted to leave maintenance mode but it is not currently active
Why are there 10333 queued?
In MongoDB 3.2 a replica set member that becomes "too stale" (i.e. doesn't have any oplog entries in common with another healthy member of the replica set) will remain in RECOVERING mode and periodically check if a new valid sync source is available. Each check currently increments the "maintenance task" counter, so this doesn't actually indicate a meaningful number of tasks if the member has become stale.
In theory "too stale" is not a terminal state, as a member with a larger oplog may conceivably be temporarily offline; in practice a "too stale to catch up" error generally means a manual resync is required.
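To make the counter behaviour concrete, here is a minimal sketch in plain JavaScript that models it. This is a toy model under stated assumptions, not mongod's implementation: the class name MaintenanceCounter and its enter/leave methods are hypothetical, and the message strings are simply copied from the annotated log lines above.

```javascript
// Toy model of the maintenance mode counter.
// Hypothetical names for illustration only; this is not mongod source code.
class MaintenanceCounter {
  constructor() {
    this.tasks = 0;            // outstanding maintenance "tasks"
    this.state = "SECONDARY";  // simplified member state
  }

  // Models db.adminCommand({replSetMaintenance: 1})
  enter() {
    const message =
      `going into maintenance mode with ${this.tasks} other maintenance mode tasks in progress`;
    this.tasks += 1;
    this.state = "RECOVERING";
    return message;
  }

  // Models db.adminCommand({replSetMaintenance: 0})
  leave() {
    if (this.tasks === 0) {
      return "Attempted to leave maintenance mode but it is not currently active";
    }
    this.tasks -= 1;
    if (this.tasks === 0) {
      this.state = "SECONDARY"; // assumes the member is otherwise healthy
    }
    return `leaving maintenance mode (${this.tasks} other maintenance mode tasks ongoing)`;
  }
}

const member = new MaintenanceCounter();

// A too-stale member re-enters maintenance mode on every failed
// sync source check and never leaves, so the counter only grows.
for (let check = 0; check < 10333; check++) {
  member.enter();
}

const nextLogLine = member.enter();
console.log(nextLogLine);
// going into maintenance mode with 10333 other maintenance mode tasks in progress
```

The point of the model is that the number in the log line counts unmatched "enter" calls, not a list of distinct jobs, which is why there is nothing to enumerate.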
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] our last optime : (term: 20, timestamp: Oct 4 07:41:29:1)
2016-11-07T11:59:23.354+0000 I REPL [rsBackgroundSync] oldest available is (term: 20, timestamp: Oct 17 02:13:33:5)
In this case the replica set member in question went stale almost two weeks earlier, so the maintenance mode counter has continued to creep up over time. There's a related issue in the MongoDB Jira you can watch/upvote: SERVER-23899: Reset maintenance mode when transitioning from too-stale to valid sync source.
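As a rough check of the two optimes in the log lines above, the following sketch compares them in plain JavaScript. It assumes the year is 2016 (taken from the log line timestamps, since the optimes themselves omit it) and treats both values as UTC; isTooStale is a hypothetical helper that simplifies the real check to a single timestamp comparison.

```javascript
// Optimes copied from the log lines; year 2016 and UTC are assumptions.
const lastOptime = new Date(Date.UTC(2016, 9, 4, 7, 41, 29));       // "our last optime": Oct 4 07:41:29
const oldestAvailable = new Date(Date.UTC(2016, 9, 17, 2, 13, 33)); // "oldest available is": Oct 17 02:13:33

// Hypothetical helper: the member cannot catch up from this sync source if the
// source's oldest oplog entry is newer than the member's last applied entry.
function isTooStale(memberLastOptime, sourceOldestOptime) {
  return sourceOldestOptime.getTime() > memberLastOptime.getTime();
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;
const gapDays = (oldestAvailable.getTime() - lastOptime.getTime()) / MS_PER_DAY;

console.log(isTooStale(lastOptime, oldestAvailable)); // true
console.log(gapDays.toFixed(1));                      // 12.8
```

The gap of roughly 12.8 days between the member's last optime and the oldest entry still available on the sync source means there is no overlap left in the oplog to replay.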
Context
StackExchange Database Administrators Q#154464, answer score: 4