How AlwaysOn Availability Group secondary replica catches up with primary after secondary server long downtime


Problem

Can someone point me to the MS article or blog post that explains in detail how an AlwaysOn Availability Group secondary replica catches up with the primary after a long downtime of the secondary server?

I ran the tests below with an AAG (async, manual failover, read-only configuration).
A) Killed the secondary instance during continuous inserts into the primary and started the secondary instance a few minutes later. The AAG dashboard turned green almost immediately after the secondary restarted, and the secondary started to catch up with the primary until the number of rows was the same in both instances. No transaction log backup was taken.
B) Same as A), but a few transaction log backups were taken on the primary during the test.

Questions are:

1) What is the size of the log cache/messaging framework etc. that is used to hold the transaction log blocks (which are sent to the secondary replica)?

2) Can the sizes of the above structures (log cache/send queue etc., whatever is used as the transport for AAG replication) be configured or increased (similar to increasing the transaction log backup retention period in log shipping, for example)?

3) As I backed up (truncated) the transaction log in test B) and the secondary replica was synchronised with the primary automatically, what was used to find the row difference between primary and secondary (apparently not the transaction log, as it was truncated) and then bring them in sync?

4) How does this automatic catch-up process work and what are its limitations?

Solution

There is no 'cache'. It is just the log. The primary's log is the messaging framework. The primary writes into its log file, and the secondaries receive a copy. While connected, the secondaries receive the log immediately and the primary is free to reuse that part of the log. While a secondary is disconnected, the primary is forbidden from truncating the log and must keep appending to it. Once the secondary reconnects, it receives the log it missed, and the primary is again free to reuse the log and no longer has to append (grow).
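The retention rule described above can be sketched as a toy model. This is only an illustration, not SQL Server's real log structure: the class `ToyPrimaryLog`, its methods, and the integer LSNs are all hypothetical, and a single secondary is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPrimaryLog:
    """Toy model of a primary's transaction log (hypothetical, single secondary)."""
    records: dict = field(default_factory=dict)  # LSN -> log record still held
    next_lsn: int = 1
    acked_lsn: int = 0  # highest LSN the secondary has acknowledged

    def append(self, payload):
        lsn = self.next_lsn
        self.records[lsn] = payload
        self.next_lsn += 1
        return lsn

    def acknowledge(self, lsn):
        self.acked_lsn = max(self.acked_lsn, lsn)

    def truncate(self):
        """Discard only the records the secondary has already acknowledged."""
        for lsn in [l for l in self.records if l <= self.acked_lsn]:
            del self.records[lsn]

    def missing_for_secondary(self):
        """Records the secondary has not acknowledged yet, in LSN order."""
        return [(lsn, self.records[lsn])
                for lsn in sorted(self.records) if lsn > self.acked_lsn]


log = ToyPrimaryLog()

# Connected: each record is acknowledged, so the log space can be reused.
for i in range(3):
    log.acknowledge(log.append(f"insert row {i}"))
log.truncate()
connected_size = len(log.records)

# Disconnected: nothing is acknowledged, truncation frees nothing, the log grows.
for i in range(3, 8):
    log.append(f"insert row {i}")
log.truncate()
disconnected_size = len(log.records)

# Reconnected: the secondary receives what it missed and acknowledges it.
backlog = log.missing_for_secondary()
log.acknowledge(backlog[-1][0])
log.truncate()
reconnected_size = len(log.records)

print(connected_size, disconnected_size, reconnected_size)
```

In this sketch the only "queue" is the log itself, which is why there is no separate size to configure: the backlog is bounded by how far the log file can grow.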

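Taking log backups on the primary does not change anything. When the log must be kept for a disconnected secondary, a backup does not truncate it. In that situation the `log_reuse_wait_desc` column of `sys.databases` reports `AVAILABILITY_REPLICA` for the database.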
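A simplified way to picture why the backup in test B) freed nothing: the log can only be reused up to the oldest point that something still needs. The sketch below assumes just two such factors, the last log backup and one secondary. The function name and the LSN numbers are made up for illustration, and the real engine tracks more reasons than these two.

```python
def truncation_point(last_backed_up_lsn, secondary_acked_lsn):
    """Toy rule: log up to the returned LSN may be reused; later log must be kept."""
    return min(last_backed_up_lsn, secondary_acked_lsn)


# Secondary went down after acknowledging LSN 100; the primary kept inserting
# and a log backup covered everything up to LSN 500.
while_disconnected = truncation_point(last_backed_up_lsn=500, secondary_acked_lsn=100)

# After the secondary reconnects and catches up to LSN 500, the same backup
# position finally allows the log to be reused up to 500.
after_catch_up = truncation_point(last_backed_up_lsn=500, secondary_acked_lsn=500)

print(while_disconnected, after_catch_up)
```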

There is no 'message' to send to the secondaries to keep them updated. The secondary is simply running recovery on the database. That is all it does: it applies the log received from the primary. The write-ahead logging protocol guarantees that whatever the secondary is 'recovering' is going to be identical to whatever the primary has in the database.
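To illustrate why no row comparison is needed, here is a minimal sketch of replaying a log on a stale copy. It assumes a database that is just a key/value dictionary and log records of the form `(lsn, (key, value))`. The function `redo` and this record format are hypothetical and far simpler than real log records.

```python
def redo(database, log_records, last_applied_lsn):
    """Replay log records newer than last_applied_lsn, in LSN order."""
    for lsn, (key, value) in sorted(log_records):
        if lsn <= last_applied_lsn:
            continue  # already applied before the outage
        database[key] = value
        last_applied_lsn = lsn
    return last_applied_lsn


primary_db, primary_log = {}, []
changes = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
for lsn, (key, value) in enumerate(changes, start=1):
    primary_log.append((lsn, (key, value)))  # write-ahead: log record first
    primary_db[key] = value                  # then the data change

# State of the secondary when it went down: it had applied LSN 1 and 2.
secondary_db = {"a": 1, "b": 2}
applied = redo(secondary_db, primary_log, last_applied_lsn=2)

print(secondary_db == primary_db, applied)
```

The secondary never asks "which rows differ?". It only needs to know the last LSN it applied, and replaying everything after that point reproduces the primary's state.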

There are some optimizations (e.g. when connected, the primary usually sends the log from memory, not from disk), and there are a bunch of additional control messages sent outside of the log stream, but these are details that distract from the core issue: the secondary is simply running recovery, applying the log received from the primary, and the primary has to keep that log until it has been acknowledged by the secondary.

Your confusion arises from the belief that the backup truncated the log on the primary and that there must therefore be another mechanism to synchronize. This is incorrect, as the backup did not in fact truncate the log.

Context

StackExchange Database Administrators Q#89670, answer score: 5
