HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

Why does database recovery take so long?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whytakelongdatabaserecoverydoes

Problem

On my SQL Server 2012 x64 Std instance I had a large transaction fail due to the transaction log filling up yesterday, it couldn't be rolled back and SQL Server has restarted the specific DB to perform recovery.

The DB is around 750GB with the tlog hitting 170GB (very large ETL job was running). The job was only running for a few hours but the recovery so far has taken >24 hours (70% complete, on phase 3 of 3).

What I can't understand it why it's taking so long? There doesn't seem to be any pressure on the disks, sys.dm_exec_requests shows it waiting on PAGEIOLATCH_EX/SH, which I'd expect. I could've restored the entire DB in this time...

If anyone can shed any light, it would be greatly appreciated.

EDIT: As requested, getting the output of the error log:

Recovery of database 'MyDB' (6) is 77% complete (approximately 28100 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
Recovery of database 'MyDB' (6) is 77% complete (approximately 28095 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
Recovery of database 'MyDB' (6) is 77% complete (approximately 28099 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
Recovery of database 'MyDB' (6) is 77% complete (approximately 28102 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
Recovery of database 'MyDB' (6) is 77% complete (approximately 28097 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.


Also, the sys.dm_exec_requestsinfo you're after:

session_id  command status  percent_complete
35  DB STARTUP  background  86.06061


IFI and "Perform volume maintenance tasks" enabled/granted.

Solution

I had a large transaction fail due to the transaction log filling up yesterday, it couldn't be rolled back and SQL Server has restarted the specific DB to perform recovery.

This is not totally correct when a transaction starts in SQL Server it reserves space in transaction log in case the transaction has to rollback. From Transaction Log architecture BOL doc


Rollback operations are also logged. Each transaction reserves space
on the transaction log to make sure that enough log space exists to
support a rollback that is caused by either an explicit rollback
statement or if an error is encountered. The amount of space reserved
depends on the operations performed in the transaction, but generally
it is equal to the amount of space used to log each operation. This
reserved space is freed when the transaction is completed

So I guess you can understand now.


What I can't understand it why it's taking so long?

Query rollback operation is mostly single threaded while when same query executes it can use parallelism/multiple threads to perform same operation making it very fast as compared to the rollback. I suggest you read What Bob Dorr has to say about rollback taking long time. To get more insight about what occurs when rollback is in progress please read This article. Restarting SQL Server service is not going to help much, after restart the rollbak process had to start again. You have to wait and let the rollback process complete.

You can also use sys.dm_exec_requests to track your rollback.

select    
session_id,     
command,     
status,     
percent_complete    
from sys.dm_exec_requests    
where command IN ('killed/rollback','rollback','db_startup')


The other reason which Remus mentioned is too many VLF's. If you have to many VLF's recovery can take lot of time. Also other factor that can delay recovery is if SQL Server service account does not have Perform Volume maintenacce task right or instant file initialization. You can use this link to check if IFI is present or not.

EDIT:

From The output you posted

session_id  command status  percent_complete
35  DB STARTUP  background  86.06061


The startup process is 86 % complete. Just keep monitoring it, eventually it will complete. Also as stated rollback process can get blocked and can incur waits so keep monitoring that as well.


2012 x64 Standard SP2

You have standard edition which cannot take advantage of fast recovery which is present in enterprise edition. With fast recovery database can come online after second phase of recovery(the redo phase) however it is alwasy not the case so please read the article. As an additional reading you can read When is fast recovery used. Standard edition limitation could be one more reason why it is taking database time to come online.


One other thing, the rollback session (35) is itself blocking another system process from performing CHECKPOINT

Yes it can and you cannot do anything as you have no control over system process in terms you cannot kill it.

Code Snippets

select    
session_id,     
command,     
status,     
percent_complete    
from sys.dm_exec_requests    
where command IN ('killed/rollback','rollback','db_startup')
session_id  command status  percent_complete
35  DB STARTUP  background  86.06061

Context

StackExchange Database Administrators Q#126711, answer score: 7

Revisions (0)

No revisions yet.