MongoDB replica set stuck in RECOVERING state
Problem
We created a three-member replica set, and two of its members have now been in the RECOVERING state for 48 hours. Initially the data size on the recovering nodes kept growing, but now even that has stopped: they are stuck at about 90 GB of data plus 60+ GB of local data.
How can we get these members out of the RECOVERING state?
Solution
The easy, albeit a bit unsecure way
- Stop the first secondary
- Delete the contents of its dbpath
- Restart the secondary
- Wait for it to catch up with the primary
- Repeat process with the second secondary
This is a bit unsecure because it is unknown why the secondaries entered the RECOVERING state in the first place, so the same thing may happen again.
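The "wait for it to catch up" step can be scripted rather than watched by hand. The following is only a minimal sketch: it assumes you can obtain a document shaped like the output of the replSetGetStatus command (a "members" list whose entries carry "name" and "stateStr"), and the helper names member_state and wait_until_secondary are made up for this illustration.

```python
import time
from typing import Any, Callable, Mapping


def member_state(status: Mapping[str, Any], member_name: str) -> str:
    """Return the stateStr of one member from a replSetGetStatus-style document."""
    for member in status.get("members", []):
        if member.get("name") == member_name:
            return member.get("stateStr", "UNKNOWN")
    raise KeyError(f"member {member_name!r} not found in status")


def wait_until_secondary(
    get_status: Callable[[], Mapping[str, Any]],
    member_name: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the member reports SECONDARY; return False on timeout.

    get_status is a hypothetical callable that returns a fresh status
    document on every call. sleep is injectable so the loop can be tested.
    """
    waited = 0.0
    while True:
        if member_state(get_status(), member_name) == "SECONDARY":
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
```

With pymongo, get_status could be something like lambda: client.admin.command("replSetGetStatus"), where client is a MongoClient you have already connected. Only move on to the second secondary once this returns True for the first one.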
The more secure, but also more intrusive way
As above, but stop your application during the process. This rules out the possibility that your application is inserting more data than the secondaries are able to replicate. However, the problem may then come back once the application is running in production again.
The most secure, but also most intrusive way
- Shut down the whole replica set
- Remove the contents of dbpath on both secondaries
- Copy the contents of the primary's dbpath to both secondaries' dbpath
- Start the old primary.
- Start one of the old secondaries.
- Wait until a new primary is elected.
- Start the remaining secondary.
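The two file-level steps (emptying a secondary's dbpath and copying the primary's data into it) could look roughly like the sketch below. It assumes every mongod has been shut down cleanly and that both directories are reachable from one machine, for example through a mount. In a real deployment the secondaries usually live on other hosts, so a network copy tool would take the place of the local copy. The function names are hypothetical.

```python
import shutil
from pathlib import Path


def clear_directory(dbpath: Path) -> None:
    """Delete everything inside dbpath but keep the directory itself."""
    for entry in dbpath.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def seed_from_primary(primary_dbpath: Path, secondary_dbpath: Path) -> None:
    """Replace the secondary's data files with a copy of the primary's.

    Only safe while no mongod process is using either directory.
    """
    clear_directory(secondary_dbpath)
    # dirs_exist_ok (Python 3.8+) lets copytree write into the emptied directory.
    shutil.copytree(primary_dbpath, secondary_dbpath, dirs_exist_ok=True)
```

Run it once per secondary, then continue with the start-up order from the list above.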
Some notes:
Use MMS. It's free, it's easy to set up, and it gives you good information about your replica set. Try to keep the value for "replication lag" around 0, and do whatever is necessary to make sure your replication lag never exceeds the "replication oplog window". A secondary that falls further behind than the oplog window can no longer catch up from the oplog and needs a full resync.
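If you want to check the same relationship without MMS, the lag can be derived from a replSetGetStatus-style document by comparing each member's optimeDate with the primary's. This is a rough sketch, not what MMS itself computes: the function names are invented here, and the oplog window is assumed to be supplied by you (for instance the "log length start to end" value reported by rs.printReplicationInfo()).

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Mapping


def replication_lag(status: Mapping[str, Any]) -> Dict[str, timedelta]:
    """Map each non-primary member name to how far it is behind the primary."""
    members = [m for m in status["members"] if "optimeDate" in m]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] != "PRIMARY"
    }


def members_at_risk(status: Mapping[str, Any], oplog_window: timedelta) -> List[str]:
    """Names of members whose lag has reached or exceeded the oplog window."""
    lags = replication_lag(status)
    return sorted(name for name, lag in lags.items() if lag >= oplog_window)
```

Members returned by members_at_risk are the ones that will most likely need one of the resync procedures described above.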
Always make sure you have a 1 Gbit/s network and a (sorry) shitload of RAM. The more, the better. Additional rule of thumb: better half the RAM with SSDs than double the RAM without SSDs (as long as the RAM stays within reasonable limits).
Disclaimer: Always make a backup of production data before fiddling with it.
Context
StackExchange Database Administrators Q#77881, answer score: 15