SQL Server DB becomes unusable overnight
Problem
Yesterday, my SQL Server database was fine. Today it is almost unusable - it's slowed down by a factor of between five and twenty, depending on when I hit it.
Some data was added to the server in an overnight load process, but nothing like a volume that should impact a database that much. About 50,000 plain text records (no XML or other frippery).
The server was patched this morning before we rebooted it. However, none of our other database servers which also got patched are behaving differently.
Resource Monitor would seem to suggest that it's disk IO that's at fault. It's running at close to 100% of capacity on the .mdf file the whole time, even when there's not much actually happening in the database. Access to templog.ldf is also running quite high.
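To pin down which files are actually suffering, one option (not mentioned in the original question) is to query SQL Server's virtual file stats, which accumulate I/O counts and stall times per database file since the last restart. A minimal sketch using only documented DMVs available in 2008 R2:

```sql
-- Cumulative I/O and average latency per database file since the last restart.
-- High average read latency on the .mdf (and on templog) would match what
-- Resource Monitor is showing here.
SELECT DB_NAME(vfs.database_id)                             AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall_read_ms DESC;
```

Average latencies in the tens of milliseconds or worse point at the storage layer; low latencies despite high volume point back at SQL Server generating too much I/O.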
No-one here is an expert DBA (we're all developers with varying amounts of SQL skill) and we're all baffled by what's happened. We've tried running sp_updatestats and moving some of the big indexes to different discs, to no avail.
I think this must have something to do with the patch - it seems too much of a coincidence. A colleague is convinced that the data load caused the size of the mdf to increase to a point where execution plans became inefficient.
What on earth has caused this? How can we find out, and what can we do to fix it?
EDIT:
Using sp_WhoIsActive reveals nothing out of the ordinary. It registers my own use of the sproc and some commands from a colleague who is currently trying to move another index. That's probably holding up the DB right now, but it was running just as badly before.
It's the Standard version of SQL Server 2008 R2.
SELECT @@VERSION gives:
Microsoft SQL Server 2008 R2 (SP2) - 10.50.4033.0 (X64)
Jul 9 2014 16:04:25
Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1) (Hypervisor)
The server has 72GB of RAM and three quad-core 2GHz processors.
The patching was only applied to Windows.
Solution
It could be that a small amount of extra data pushed SQL Server past some limit and forced a different plan, or something like that; this is not unlikely. But the fact that your disk seems to be so heavily loaded leads me to another conclusion.
There are two possible root causes for your slowdown:
- You upgraded your system and rebooted it
- You loaded a bunch of data into it
Let's take a look at part No. 1
Your SQL Server configuration might be broken. This can cause serious problems with server speed and disk usage.
Please check your basic server settings first. Those basic settings are max server memory, affinity I/O mask, affinity mask and max degree of parallelism. You may need to enable the advanced options using show advanced options. Here is a complete script:
-- enable advanced options
EXEC sp_configure 'show advanced options',1
-- apply configuration
RECONFIGURE
-- how much memory can the sql server allocate?
EXEC sp_configure 'max server memory'
-- which cpu is used to run I/O operations
EXEC sp_configure 'affinity I/O mask'
-- which cpus can run processes?
EXEC sp_configure 'affinity mask'
-- how many threads can work on one query part?
EXEC sp_configure 'max degree of parallelism'
Compare the result with the values documented in your installation steps. Are they still the same?
There may be many reasons why your server behaves so strangely, but I would normally bet that your max server memory is just wrong. This makes SQL Server swap data pages permanently: it can't hold everything in memory, so it has to read a page from disk, update it, and write it straight back. If another update touches the same page, it can't be served from memory and the server has to read it from disk again. Constant swapping.
Another problem can be too high an affinity on disk or processes. If you run a shared server (SQL Server plus other services) with a dedicated disk for SQL Server (a rare case, but it happens), this could be your problem. Such a server might normally use, for example, 3 CPUs for processes and one for I/O, with the other 12 CPUs used by the other services. If the affinity mask is wrong and uses, say, an automatic configuration, your server uses all 16 cores for processes and I/O dynamically. Big processes can then put a load on the disk that it may not handle. In fact, though, I don't believe this is your case: if it applied, the server would get faster (even if just a bit), but yours slowed down.
Another problem may be too high a degree of parallelism, which means too many threads idling on one part of a query. This can also cause a huge slowdown if the parallelism doesn't work as expected, but it wouldn't explain your high I/O.
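If any of these settings turn out to be off, they can be corrected with sp_configure as well. A hedged sketch (the numbers are illustrative examples for this 72 GB, 12-core box, not recommendations from me):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Cap SQL Server's memory so the OS keeps some headroom;
-- 65536 MB on a 72 GB box is an illustrative value, not a prescription.
EXEC sp_configure 'max server memory', 65536;
-- Limit how many threads may cooperate on one query part;
-- 4 matches one quad-core socket here, again purely as an example.
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

Both settings take effect without a restart, so you can observe the disk load before and after the change.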
Now let's take a look at part No. 2.
You loaded a bunch of rows into your system. Even if this is a regular job, it could have crossed a threshold at which your query plans degrade. It could even be that your insert in combination with SQL Server produces this behavior.
You mentioned that you already tried to move your indexes to another disc, which seems to help. That may simply be because you split the load across two different discs.
It may be that your indexes are fragmented, that your plans are stale, or that your statistics are just outdated.
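To check the fragmentation side of that, you can query sys.dm_db_index_physical_stats. This sketch is my addition, not part of the original answer; the 30% / 1000-page thresholds are the commonly cited rule of thumb, not hard limits, and the index name in the rebuild is a placeholder:

```sql
-- Indexes in the current database with notable fragmentation.
SELECT OBJECT_NAME(ips.object_id)        AS table_name,
       i.name                            AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
  AND ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Then, per affected index (hypothetical names):
-- ALTER INDEX IX_YourIndex ON dbo.YourBadTable REBUILD;
```

Note that a rebuild on Standard Edition 2008 R2 is an offline operation, so schedule it outside busy hours.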
- Let's check when the statistics were last updated
You could do this manually through the interface for each single statistics object, which would be a pain. Or you can try this code:
SELECT OBJECT_NAME(object_id) AS tablename,
       name AS indexname,
       STATS_DATE(object_id, index_id) AS StatsUpdated
FROM sys.indexes
This will give you complete information about each index (and heap) and the statistics behind them. Even if you run
sp_updatestats, it doesn't mean the statistics were actually updated. The timing of an update is quite tricky: even if you run sp_updatestats, or even if auto update statistics is enabled, the statistics aren't necessarily updated just in time. Here are some edge points at which an update is needed/generated:
- An empty table gets one or more rows
- A table with more than 500 rows accumulates changes to 20% of its rows plus 500 additional rows, and an insert happens afterwards
- 500 rows were changed in a table that holds fewer than 500 rows
This means your statistics may be outdated even though you ran the update.
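If you are on 2008 R2 SP2 or later (as the @@VERSION output above suggests you are), sys.dm_db_stats_properties exposes the modification counter directly, so you can see how many changes each statistic has absorbed since its last update. This query is my addition, not part of the original answer:

```sql
-- Statistics with pending modifications since their last update,
-- worst offenders first.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE sp.modification_counter > 0
ORDER BY sp.modification_counter DESC;
```

A large modification_counter paired with an old last_updated is exactly the "outdated even after sp_updatestats" case described above.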
Take a look at the query above. If you find some pretty old statistics on some tables, you may want to run a manual statistics update for that table:
UPDATE STATISTICS dbo.YourBadTable WITH FULLSCAN
After that, you may want to give your server a kick in the ass to throw away all old plans:
DBCC FREEPROCCACHE
If you just want to clean all caches, you might want to run this instead:
DBCC FREESYSTEMCACHE ('ALL')
This will clean up all caches, not just the plan cache.
Context
StackExchange Database Administrators Q#105677, answer score: 3