
I'm performing a post-mortem on an Oracle database that's crashed its host. Where should I begin?


Problem

I have an Oracle database (10.2.0.1.0, I believe) on a Windows 2003 server that I believe has successfully crashed the OS itself twice. Debates on platforms and stability aside, the evidence suggests that the server becomes unresponsive to any remote access (ports open, but services don't respond) because of what I can only guess is a bad state in the Oracle process. The IT staff noted that the process appeared to be preventing the server from rebooting cleanly when they went to power-cycle it.

I don't have physical access to the server. However, since it's up and running again, are there any logs/dumps/etc. that I can check which may point me in the right direction? If you were me, where would you begin? Google hasn't been kind on the subject.

Solution

The first thing to look at would be the database's alert.log file (in 10g, alert_<SID>.log in the directory named by the BACKGROUND_DUMP_DEST initialization parameter). If the database was having problems at the times the server hung, you'll see error messages for those times in the alert.log and, most likely, pointers to detailed trace files.
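If the alert.log is large, a small script can narrow it to the window around each hang. The sketch below is a hypothetical helper, not an Oracle tool: it assumes the 10g plain-text layout, where messages follow a timestamp line such as "Tue Mar 15 10:22:33 2011", ORA- errors appear as "ORA-nnnnn", and trace files are announced as "Errors in file <path>:". The function name scan_alert_log and the sample paths are illustrative only.

```python
import re
from datetime import datetime

# Assumed 10g-style header line, e.g. "Tue Mar 15 10:22:33 2011".
STAMP_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# Any ORA- error code, e.g. "ORA-00600: internal error code, ...".
ORA_RE = re.compile(r"\bORA-\d{5}\b")
# Assumed trace pointer format: "Errors in file d:\...\orcl_pmon_1234.trc:".
TRACE_RE = re.compile(r"Errors in file (.+?):?\s*$")


def scan_alert_log(lines, start, end):
    """Return (timestamp, kind, text) tuples for entries stamped within [start, end].

    kind is "trace" (text is the trace file path) or "error" (text is the ORA- line).
    """
    hits = []
    current = None  # timestamp of the most recent header line seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if STAMP_RE.match(line):
            # Normalize whitespace in case the day of month is space-padded.
            current = datetime.strptime(" ".join(line.split()), "%a %b %d %H:%M:%S %Y")
            continue
        if current is None or not (start <= current <= end):
            continue  # before the first timestamp, or outside the window
        trace = TRACE_RE.search(line)
        if trace:
            hits.append((current, "trace", trace.group(1)))
        elif ORA_RE.search(line):
            hits.append((current, "error", line.strip()))
    return hits
```

To use it, open the log with open(path, encoding="utf-8", errors="replace") and pass the file object along with a datetime window a few minutes either side of each hang; the "trace" entries are the files to read next.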

Are you licensed to use AWR? Is Statspack installed? If none of the database processes crashed, it's possible that the server was unresponsive because the application was issuing runaway SQL and Oracle was crushing the server. An AWR or Statspack report covering the time in question will show whether Oracle was actually doing anything then. Any Windows monitoring data from that period would also be useful. If the Windows performance counters show a pile of activity and Oracle shows none, for example, that would be very interesting.
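The Windows-versus-Oracle comparison can be roughed out from a Perfmon counter log exported to CSV (for example with relog). This is a minimal sketch under stated assumptions: the CSV has a header row, a timestamp in the first column, and one column per counter path; the host name DB1, the core count, and the 90%/10% thresholds are all hypothetical. Because the Process(oracle) "% Processor Time" counter is summed across CPUs (it can reach 100 × cores), the sketch divides it by the core count before comparing it with Processor(_Total).

```python
import csv
import io


def cpu_samples(csv_text, total_col, oracle_col, cores):
    """Return (timestamp, total_cpu_pct, oracle_cpu_pct) for each usable CSV row.

    The Process(oracle) counter is summed over all CPUs, so it is divided
    by cores to put it on the same 0-100 scale as Processor(_Total).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    t_idx = header.index(total_col)
    o_idx = header.index(oracle_col)
    samples = []
    for row in reader:
        try:
            total = float(row[t_idx])
            oracle = float(row[o_idx]) / cores
        except (IndexError, ValueError):
            continue  # skip short rows and blank/non-numeric (missed) samples
        samples.append((row[0], total, oracle))
    return samples


def busy_without_oracle(samples, busy_pct=90.0, oracle_max_pct=10.0):
    """Timestamps where the machine was busy but oracle.exe used little of the CPU.

    The default thresholds are arbitrary assumptions; tune them to the box.
    """
    return [ts for ts, total, oracle in samples
            if total >= busy_pct and oracle <= oracle_max_pct]
```

Timestamps returned by busy_without_oracle point away from Oracle; the opposite pattern (high total CPU with oracle.exe taking most of it) is when the AWR or Statspack report for that interval becomes the thing to read.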

Context

StackExchange Database Administrators Q#1169, answer score: 3
