patternsqlMinor
Misaligned Log IO Issues and Unmatched LUN sector sizes in AlwaysOn
Viewed 0 times
logissuessizessectormisalignedlununmatchedandalwayson
Problem
I'm running into a little bit of a snag on a high log send queue in a 2014 SP1 Asynchronous Mode AlwaysOn Environment with a secondary replica in Azure.
For one of our higher transaction volume DB, we are seeing log send queue sizes up to nearly 2GB for extended periods. Initially we saw a CPU bottleneck on the secondary due to the sizing of the Azure VM, which we corrected. However we were still left with some high log queue sizes and slow send rates, coupled with respective high re-dos.
We traced this back to an issue with IO using the following error log message:
"There have been 112313856 misaligned log IOs which required falling back to synchronous IO. The current IO is on file..." More specifically this has to do with unmatched disk sector sizes between our SAN and Azure LUNs.
This knowledge base article seems to address exactly what my issue is:
https://support.microsoft.com/en-us/kb/2936603
However its a year old and applies to CU5, which is about a year old, and we are about 8 patches past CU5. Additionally I can't find any official documentation from MS on trace flag 1800. Does anybody have an similar experience with either this error message, or information on trace flag 1800? If so was your issue corrected with this flag, or were you forced to match your LUN disk sector sizes between both replicas?
Thanks!
For one of our higher transaction volume DB, we are seeing log send queue sizes up to nearly 2GB for extended periods. Initially we saw a CPU bottleneck on the secondary due to the sizing of the Azure VM, which we corrected. However we were still left with some high log queue sizes and slow send rates, coupled with respective high re-dos.
We traced this back to an issue with IO using the following error log message:
"There have been 112313856 misaligned log IOs which required falling back to synchronous IO. The current IO is on file..." More specifically this has to do with unmatched disk sector sizes between our SAN and Azure LUNs.
This knowledge base article seems to address exactly what my issue is:
https://support.microsoft.com/en-us/kb/2936603
However its a year old and applies to CU5, which is about a year old, and we are about 8 patches past CU5. Additionally I can't find any official documentation from MS on trace flag 1800. Does anybody have an similar experience with either this error message, or information on trace flag 1800? If so was your issue corrected with this flag, or were you forced to match your LUN disk sector sizes between both replicas?
Thanks!
Solution
Since you are running
I guess the root of my question is, does flag 1800 have any affect at all after SQL 2014 CU5?
Yes, TF 1800 is required to be turned ON even if you are on SP1. This is because you have mis-aligned disk sector size.
The disks that store the log files of the primary and secondary replica in an AlwaysOn Availability Group (AG) have different sector sizes. For example:
If you want to understand more about the problem, refer to
Its always a best practice to have same hardware configuration on primary and secondary servers - Disk, CPU, RAM, etc.
SQL Server 2014 SP1 which is build 12.0.4100, it includes all the updates from the lower RTM branch including all the CUs. You still need to enable TF 1800 (as start up parameter with a restart of SQL Server service - this has to be done during maintenance window, since this is a downtime for your application.)I guess the root of my question is, does flag 1800 have any affect at all after SQL 2014 CU5?
Yes, TF 1800 is required to be turned ON even if you are on SP1. This is because you have mis-aligned disk sector size.
The disks that store the log files of the primary and secondary replica in an AlwaysOn Availability Group (AG) have different sector sizes. For example:
- The primary replica log file is located in a disk that has the sector size of 512 bytes. However, the secondary replica log file is located in a disk that has the sector size of four kilobytes (KB).
- The primary replica log file is located in an on-premise local system that has the sector size of 512 bytes. However, the secondary replica is located in Windows Azure Storage that has the sector size of four kilobytes (KB).
If you want to understand more about the problem, refer to
- Storage Spaces/VHDx and 4K Sector Size by Bob Dorr
- This blog post shows how to address the problem by modifying the sector size - Message misaligned log IOs which required falling back to synchronous IO in SQL Server Error Log
Its always a best practice to have same hardware configuration on primary and secondary servers - Disk, CPU, RAM, etc.
Context
StackExchange Database Administrators Q#122457, answer score: 3
Revisions (0)
No revisions yet.