HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Database starts always in recovery mode

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
startsmodealwaysdatabaserecovery

Problem

Everytime I restart my server, the database is always in recovery mode, and it takes about 20 minutes for it to behave as normal. This always and only happens when I restart the server, so I have a few questions...

  • I was told this could be caused by a large log file? Could that be correct? If not then what could be the other causes?



  • I need to lower the space of the log file to prevent recoveries. What is better: shrinking or truncating?



  • How can I shrink or truncate a log file / database to lower the size? What is the syntax?



I am currently using Microsoft SQL Server 2008.

Solution

I have the same issue and I believe I have resolved it but I have not been able to fully test it to confirm.

I believe the issues is related to the number of VLFs you have in your log file and not its size. If you have a large logfile it is likely that it grew organically through auto growth events and that it was not an intentional planned growth. If that is the case the you might have thousands of VLFs inside log files.

Here is a query to see how many VLFs you have that I used from here:

Create Table #stage(
    FileID      int
  , FileSize    bigint
  , StartOffset bigint
  , FSeqNo      bigint
  , [Status]    bigint
  , Parity      bigint
  , CreateLSN   numeric(38));

Create Table #results(
    Database_Name   sysname
  , VLF_count       int 
);

Exec sp_msforeachdb N'Use ?; 
            Insert Into #stage 
            Exec sp_executeSQL N''DBCC LogInfo(?)''; 

            Insert Into #results 
            Select DB_Name(), Count(*) 
            From #stage; 

            Truncate Table #stage;'

Select * 
From #results
Order By VLF_count Desc;

Drop Table #stage;
Drop Table #results;


For a further explanation of what VLFs are see this link.

I believe the issue is that with so many VLFs it takes SQL server a long time to assess their state and then bring the database out of recovery. If you shrink your log file to the smallest size you can, often the size of the first VLF that was created in the log file, then you can immediately intentionally grow it again and thereby have it create the right number of VLFs (something less than 16).

Once this is complete I believe you will be able to see that your database comes out of recovery much faster.

I have not had a chance to test fail over of our production instances after I resolved our own VLF issues so I would be very curious if you can confirm this is the root cause of the issue. Experimentally I have seen the time it takes to come out of restoring in our staging environment dramatically reduced due to this so hopefully that is it.

Code Snippets

Create Table #stage(
    FileID      int
  , FileSize    bigint
  , StartOffset bigint
  , FSeqNo      bigint
  , [Status]    bigint
  , Parity      bigint
  , CreateLSN   numeric(38));

Create Table #results(
    Database_Name   sysname
  , VLF_count       int 
);

Exec sp_msforeachdb N'Use ?; 
            Insert Into #stage 
            Exec sp_executeSQL N''DBCC LogInfo(?)''; 

            Insert Into #results 
            Select DB_Name(), Count(*) 
            From #stage; 

            Truncate Table #stage;'

Select * 
From #results
Order By VLF_count Desc;

Drop Table #stage;
Drop Table #results;

Context

StackExchange Database Administrators Q#23608, answer score: 6

Revisions (0)

No revisions yet.