patternsqlMinor
Unexplained log ship slow down
Viewed 0 times
logslowunexplaineddownship
Problem
I have a number of SQL 2008 instances all running Microsoft SQL Server 2008 (SP4) (confirmed with
Two of them exist solely to log ship with Red Gate SQL Backup 7.4.0.23, and I'm having issues with one of them. (It's one of the 2008 R2 servers, if that makes a difference.) I'm using a t-sql job that scrolls through a very long list of databases (dynamically pulled from the other servers) and restores them.
Previously, this job took less than 10 minutes. It's now taking an hour and a half to two and a half hours. There have been no code changes and no radical increases in the number of databases to restore. Its sibling server, with almost identical code, is running this job in under 4 minutes. (The sibling server is one of the non-R2 servers, if that makes a difference.)
The Event Log and SQL Error log show an error of:
Operating system error 0x80770006 (failed to retrieve text for this error. Reason: 317)."
I don't know if this is the cause of the problem or not; Google suggests that this occurs when different versions of SQL Server coexist, or Red Gate SQL Backup 6.x needs a special patch. I don't think either of these are the issue because the error is intermittent, the SQL Server versions are identical, and I'm running Red Gate SQL Backup 7.x, but I could certainly be wrong. Red Gate forums suggested running a query to see if VAS memory was low, since that could cause similar issues.
Other things I've tried to resolve the issue include:
select @@VERSION on the servers in question). They run on either Windows Server 2008 or Windows Server 2008 R2. Two of them exist solely to log ship with Red Gate SQL Backup 7.4.0.23, and I'm having issues with one of them. (It's one of the 2008 R2 servers, if that makes a difference.) I'm using a t-sql job that scrolls through a very long list of databases (dynamically pulled from the other servers) and restores them.
Previously, this job took less than 10 minutes. It's now taking an hour and a half to two and a half hours. There have been no code changes and no radical increases in the number of databases to restore. Its sibling server, with almost identical code, is running this job in under 4 minutes. (The sibling server is one of the non-R2 servers, if that makes a difference.)
The Event Log and SQL Error log show an error of:
Operating system error 0x80770006 (failed to retrieve text for this error. Reason: 317)."
I don't know if this is the cause of the problem or not; Google suggests that this occurs when different versions of SQL Server coexist, or Red Gate SQL Backup 6.x needs a special patch. I don't think either of these are the issue because the error is intermittent, the SQL Server versions are identical, and I'm running Red Gate SQL Backup 7.x, but I could certainly be wrong. Red Gate forums suggested running a query to see if VAS memory was low, since that could cause similar issues.
VAS Total avail mem, KB Max free size, KB
8320072080 8314974784
Other things I've tried to resolve the issue include:
- Cleaning old log files out of "C:\ProgramData\Red Gate\SQL Backup\Log[instancename]", because the last time the job slowed down it was because there were too many log files in that directory.
- Checking for and resolving any memory issues on the server.
- Making sure that antivirus has exclusions fo
Solution
This is bigger than a comment but something to test first and then implement :
You are logshipping 800+ databases. Thats a large amount of databases that you logship every 15 mins.
You should offload some databases to another server. IMHO 800 databases on a single server is a lot!
We had similar problem with logshipping when we logshipped 200+ databases from NY to LD region.
What we did is as below :
-
There was blocking writing to
-
We implemented trace flag TF – 1236. It will introduce Database lock partitioning. Partitioning the DATABASE lock keeps the depth of the lock list manageable in each local partition. This significantly optimizes the access path that is used to obtain a DATABASE lock.
-
create indexes on
You are logshipping 800+ databases. Thats a large amount of databases that you logship every 15 mins.
You should offload some databases to another server. IMHO 800 databases on a single server is a lot!
We had similar problem with logshipping when we logshipped 200+ databases from NY to LD region.
What we did is as below :
-
There was blocking writing to
msdb.dbo.sysjobhistory with (TABLOCKX). The TABLOCK hint means that access to the sysjobhistory is always serialized. And since you have lot of jobs running every 15mins, there will be contention (blocking).-
We implemented trace flag TF – 1236. It will introduce Database lock partitioning. Partitioning the DATABASE lock keeps the depth of the lock list manageable in each local partition. This significantly optimizes the access path that is used to obtain a DATABASE lock.
-
create indexes on
sysjobhistory and log_shipping_monitor_history_detail tables as below :use msdb
go
create nonclustered index [nc_DBA_sysjobhistory] on dbo.sysjobhistory (
,[job_id]
,[step_id]
,[run_date]
)
include (
[instance_id]
,[step_name]
,[sql_message_id]
,[sql_severity]
,[message]
,[run_status]
,[run_time]
,[run_duration]
,[retries_attempted]
,[server]
)
go
use [msdb]
go
----- this will help sys.sp_MSprocesslogshippingretentioncleanup proc (delete from msdb.dbo.log_shipping_monitor_history_detail).. that does the cleanup of logshipping.
create nonclustered index [nc_DBA_LogShipping_monitor_history_detail] on [dbo].[log_shipping_monitor_history_detail] (
[agent_id] asc
,[agent_type] asc
,[log_time_utc] asc
)
goCode Snippets
use msdb
go
create nonclustered index [nc_DBA_sysjobhistory] on dbo.sysjobhistory (
,[job_id]
,[step_id]
,[run_date]
)
include (
[instance_id]
,[step_name]
,[sql_message_id]
,[sql_severity]
,[message]
,[run_status]
,[run_time]
,[run_duration]
,[retries_attempted]
,[server]
)
go
use [msdb]
go
----- this will help sys.sp_MSprocesslogshippingretentioncleanup proc (delete from msdb.dbo.log_shipping_monitor_history_detail).. that does the cleanup of logshipping.
create nonclustered index [nc_DBA_LogShipping_monitor_history_detail] on [dbo].[log_shipping_monitor_history_detail] (
[agent_id] asc
,[agent_type] asc
,[log_time_utc] asc
)
goContext
StackExchange Database Administrators Q#139555, answer score: 4
Revisions (0)
No revisions yet.