patternsqlMinor
HADR_SYNC_COMMIT waits on SQL Server
Viewed 0 times
sqlwaitsserverhadr_sync_commit
Problem
Let me preface this post by saying that I am missing a few events from my trace, but I have since added them for the next time this occurs.
Lately we have been seeing weird spurts of HADR_SYNC_COMMIT wait types in our environment (~40k tran/s). Today's "incident" occurred at 4:58am:
Before continuing, I must add that we were doing ONLINE index maintenance on a large audit table (audit in the sense that an OLTP table trigger records to this audit table at a high volume) and that index rebuild was blocked itself for ~22 seconds. Clearly, that is playing a role in this particular instance, but I am not quite sure how it relates to HADR_SYNC_COMMIT. Additionally, we have been seeing this occur during the day when we do not do index maintenance.
Looking at the trace, here is what I see on the primary:
...and everything on the secondary:
...and finally back on the primary:
A similar issue happened again 12/1/2023 around 4:11am and I believe I see what is happening. Unfortunately, I do not have extended events for this scenario, but I do have some logging that might paint a clearer picture. Starting at 2023-12-01 04:10:18.5430090, Ola's index maintenance recorded a start time for an Index on the database in question. Ola's reported a completion time of 2023-12-01 04:11:13.9431563, but I believe that the actual REBUILD WITH ONLINE = ON completed much sooner than that.
When reviewing DPA, I noticed that pagelatch_sh and pagelatch_ex waits spiked at 4:11:03-4:11:04am:
Immediately following those waits, that same query started seeing HADR_SYNC_COMMIT and those same wait completely fell off at 4:11:13-4:11:14am, which is exactly when Ola's reported the index finishing. My assumption is that the index REBUILD was committed at 4:11:03am (~45 seconds worth of work), which caused the unrelated INSERT queries that were in the same database to simply wait for all of those log blocks to harden on the secondary. As soon as the index completed, the remaining log blocks were hard
Lately we have been seeing weird spurts of HADR_SYNC_COMMIT wait types in our environment (~40k tran/s). Today's "incident" occurred at 4:58am:
Before continuing, I must add that we were doing ONLINE index maintenance on a large audit table (audit in the sense that an OLTP table trigger records to this audit table at a high volume) and that index rebuild was blocked itself for ~22 seconds. Clearly, that is playing a role in this particular instance, but I am not quite sure how it relates to HADR_SYNC_COMMIT. Additionally, we have been seeing this occur during the day when we do not do index maintenance.
Looking at the trace, here is what I see on the primary:
...and everything on the secondary:
...and finally back on the primary:
A similar issue happened again 12/1/2023 around 4:11am and I believe I see what is happening. Unfortunately, I do not have extended events for this scenario, but I do have some logging that might paint a clearer picture. Starting at 2023-12-01 04:10:18.5430090, Ola's index maintenance recorded a start time for an Index on the database in question. Ola's reported a completion time of 2023-12-01 04:11:13.9431563, but I believe that the actual REBUILD WITH ONLINE = ON completed much sooner than that.
When reviewing DPA, I noticed that pagelatch_sh and pagelatch_ex waits spiked at 4:11:03-4:11:04am:
Immediately following those waits, that same query started seeing HADR_SYNC_COMMIT and those same wait completely fell off at 4:11:13-4:11:14am, which is exactly when Ola's reported the index finishing. My assumption is that the index REBUILD was committed at 4:11:03am (~45 seconds worth of work), which caused the unrelated INSERT queries that were in the same database to simply wait for all of those log blocks to harden on the secondary. As soon as the index completed, the remaining log blocks were hard
Solution
I wanted to say kudos for diving in and investigating using XE, that's definitely a correct step in the eventual solution. Having said that, it's unlikely we're going to be able to directly help in any meaningful way. Let me explain.
HAD_SYNC_COMMIT waits are almost like any other wait in that they are a wait type but they actually reset every ~3 seconds (implementation detail) which is where
When investigating these types of waits you'll unfortunately need a great deal of data including a packet capture of the local and remote hosts, XE for HADR, File/Disk, Scheduling, and ETW traces in Windows for Disk, Filter, CPU. It's a large amount of data and I'm not suggesting you do it here because the issue is sort of in the information you've given.
Any large and long transactions (especially index related activities) aren't a very good fit for how Availability Groups are implemented. Offline index rebuilds are completed as a single behemoth transaction, ONLINE are better as they are smaller transactions at various points, but both operations can really drive CPU and IO - which isn't necessarily a bad thing. If your server has the performance ability to generate a large amount of log in a short period of time where generation is faster than network latency + disk latency, the database (and the instance) will not be able to keep up with the outstanding log generation to send to the other replicas. You'll start hitting sync commit waits on synchronous replicas (since it only applies to synchronous commit partners).
Availability Groups send data via log blocks, which are the smallest unit logical grouping of items at the log level and have a minimum size of 512 bytes (used to be 1 sector size, that has since changed given 4k, 8k, 16k, and even 32k sector sizes) and maximum of 60k, though their actual size depends on whether a commit ends the block or the block reaches the maximum size. The point of this is that the transaction itself is not the unit of transfer and multiple transactions from multiple different sessions can and will be intermingled throughout a log block. Thus if you're generating a large amount of log blocks and transactions are interwoven through those, any transactions committing will need the outstanding log block for its own transaction to be hardened before continuing on, which will likely be part of another log block which may also be much further down the line as things start to back up.
Another point of contention is the network, which again is an implementation detail as part of UCS in SQL Server. Outside of Distributed AGs in SQL Server 2022+ (which has some issues right now), a single connection is used to stream data over TCP to each replica. This means, even with a large bandwidth network you'll be limited by the latency as that will determine the rough number of outstanding requests which may or may not hit the throttling limit in SQL Server (this limit was raised in 2022). This means you'll never be able to seriously fill a large bandwidth pipe with a single instance of SQL Server using Availability Groups and that latency is much more important - thus a sync commit replica halfway around the world with a latency of 150ms is going to severely back up and large log generation workloads - i.e. index rebuilds.
I mentioned the need for scheduling and cpu data, this is due to the fact that the log generation happens, but that needs to be packaged up and sent across the network. This all happens asynchronously across a variety of threads and queues. If the instance is having cpu pressure on a single cpu, this could impact the packaging (compression, encryption, serialization, UCS information [boxcars], message information) and sending (send thread, receive thread) along with secondary replicas which are doing the same thing in the opposite order. This is especially true on readable secondary replicas where the read workload can hinder or otherwise bring down the ability of the AG to harden in a timely manner. Note that hardening requires a write to the disk (which is where the IO data comes in) and does not need to be redone.
This all is to say, when rebuilding indexes, this is expected and I wouldn't waste time delving into it further.
HAD_SYNC_COMMIT waits are almost like any other wait in that they are a wait type but they actually reset every ~3 seconds (implementation detail) which is where
hadr_db_commit_mgr_harden_still_waiting comes in and it will fire for as many rounds of checking as is needed until the log block is marked hardened. Outside of the implementation details, everything else is like other waits and the same issues will occur - remember that SQL Server uses cooperative scheduling and if you get into the details is event driven, such that many events being signaled (such as a wait is completed and the task can move forward) can cause bursting convoy type issues (all threads are waiting, then all threads are "woken up" which causes a cpu/scheduling race, which then causes the convoy again).When investigating these types of waits you'll unfortunately need a great deal of data including a packet capture of the local and remote hosts, XE for HADR, File/Disk, Scheduling, and ETW traces in Windows for Disk, Filter, CPU. It's a large amount of data and I'm not suggesting you do it here because the issue is sort of in the information you've given.
I must add that we were doing ONLINE index maintenance on a large audit table [...]Any large and long transactions (especially index related activities) aren't a very good fit for how Availability Groups are implemented. Offline index rebuilds are completed as a single behemoth transaction, ONLINE are better as they are smaller transactions at various points, but both operations can really drive CPU and IO - which isn't necessarily a bad thing. If your server has the performance ability to generate a large amount of log in a short period of time where generation is faster than network latency + disk latency, the database (and the instance) will not be able to keep up with the outstanding log generation to send to the other replicas. You'll start hitting sync commit waits on synchronous replicas (since it only applies to synchronous commit partners).
Availability Groups send data via log blocks, which are the smallest unit logical grouping of items at the log level and have a minimum size of 512 bytes (used to be 1 sector size, that has since changed given 4k, 8k, 16k, and even 32k sector sizes) and maximum of 60k, though their actual size depends on whether a commit ends the block or the block reaches the maximum size. The point of this is that the transaction itself is not the unit of transfer and multiple transactions from multiple different sessions can and will be intermingled throughout a log block. Thus if you're generating a large amount of log blocks and transactions are interwoven through those, any transactions committing will need the outstanding log block for its own transaction to be hardened before continuing on, which will likely be part of another log block which may also be much further down the line as things start to back up.
Another point of contention is the network, which again is an implementation detail as part of UCS in SQL Server. Outside of Distributed AGs in SQL Server 2022+ (which has some issues right now), a single connection is used to stream data over TCP to each replica. This means, even with a large bandwidth network you'll be limited by the latency as that will determine the rough number of outstanding requests which may or may not hit the throttling limit in SQL Server (this limit was raised in 2022). This means you'll never be able to seriously fill a large bandwidth pipe with a single instance of SQL Server using Availability Groups and that latency is much more important - thus a sync commit replica halfway around the world with a latency of 150ms is going to severely back up and large log generation workloads - i.e. index rebuilds.
I mentioned the need for scheduling and cpu data, this is due to the fact that the log generation happens, but that needs to be packaged up and sent across the network. This all happens asynchronously across a variety of threads and queues. If the instance is having cpu pressure on a single cpu, this could impact the packaging (compression, encryption, serialization, UCS information [boxcars], message information) and sending (send thread, receive thread) along with secondary replicas which are doing the same thing in the opposite order. This is especially true on readable secondary replicas where the read workload can hinder or otherwise bring down the ability of the AG to harden in a timely manner. Note that hardening requires a write to the disk (which is where the IO data comes in) and does not need to be redone.
This all is to say, when rebuilding indexes, this is expected and I wouldn't waste time delving into it further.
Code Snippets
I must add that we were doing ONLINE index maintenance on a large audit table [...]Context
StackExchange Database Administrators Q#333626, answer score: 5
Revisions (0)
No revisions yet.