patternsqlMinor
AOAG Database went into NOT SYNC\SUSPECT MODE
Viewed 0 times
aoagintomodedatabasewentsuspectnotsync
Problem
I have SQL Server AOAG 2014 with 3 replicas. 2 Syncronous Commit with Automatic Failover and 1 (BCP) Asynchronos with Manual Failover.
It is a DEV environment and I scheduled log backups every 15 mins.
BCP Replica AOAG Database went into "NOT SNCHRONIZING\SUSPECT" mode.
When I checked the logs it says LOG DRIVE is 100 GB and it is completely occupied with this specific Database Log.
When I checked SYS.DATABASES --> Log_reuse_wait_desc -- I have seen "AVAILABILITY REPLICA".
I tried to backup log of this particular database but still I cannot shrink this particular database.
Finally , I have to re-initialize the database by removing it from AOAG and then by re-adding. then backup log for this particular DB and shrink it . This fixed the issue.
What else can I do for this particular situation.
It is a DEV environment and I scheduled log backups every 15 mins.
BCP Replica AOAG Database went into "NOT SNCHRONIZING\SUSPECT" mode.
When I checked the logs it says LOG DRIVE is 100 GB and it is completely occupied with this specific Database Log.
When I checked SYS.DATABASES --> Log_reuse_wait_desc -- I have seen "AVAILABILITY REPLICA".
I tried to backup log of this particular database but still I cannot shrink this particular database.
Finally , I have to re-initialize the database by removing it from AOAG and then by re-adding. then backup log for this particular DB and shrink it . This fixed the issue.
What else can I do for this particular situation.
Solution
This situation is caused by the transaction logs not being properly hardened on one of your secondary replicas. This could happen is one of them was offline for a period of time, or was unable to connect correctly.
To help identify the replica having the problem you can query
There are some odd things that happen with AGs sometimes, particulary after a machine has been restarted, whereby one of the databases in the AG will just stop syncing (and there will be no logged errors or issues). Rather than remove it from the AG entirely, and then add it back in you can attempt to restart the
To help identify the replica having the problem you can query
sys.dm_hadr_database_replica_states, in particular pay attention to the log_send_queue_size column. There are some odd things that happen with AGs sometimes, particulary after a machine has been restarted, whereby one of the databases in the AG will just stop syncing (and there will be no logged errors or issues). Rather than remove it from the AG entirely, and then add it back in you can attempt to restart the
hadr_endpoint on the secondary that is being the problem, and then watch to see if traffic starts flowing again.Context
StackExchange Database Administrators Q#183952, answer score: 2
Revisions (0)
No revisions yet.