patternsqlMinor
Who or what initiated the failover
Viewed 0 times
thewhatwhoinitiatedfailover
Problem
We have our clusters set to High Availability and manual failover. I have an alert to email DBAs whenever a failover occurs. What we would like to also see in the email is WHY and/or WHO. We have other teams in there that restart, test and delete things all the time and we would like to know if this is someone working on something vs something wrong. Right now the email is "The availability group database "TESTIN2" is changing roles from "PRIMARY" to "RESOLVING" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required."
Thanks
Thanks
Solution
What we would like to also see in the email is WHY and/or WHO.
Then you have quite a good bit of development work ahead of you. Seriously.
Everyone asks this same question, but it is not an easy one to answer with any more resolution than "It failed over at this time." Things such as virtualization make this even harder as the tools and the hypervisor itself may do things that are outside the purview of your sandbox.
However, to get you started, places to scrub for data:
Best of luck.
We have other teams in there that restart, test and delete things all the time and we would like to know if this is someone working on something vs something wrong.
That's an organizational information sharing/Administration issue and not one you're going to have much help in dealing with through a simple SQL Server Alert. In fact, to do this properly, you'll want to take SQL Server out of the picture!
Most likely you're going to need to write a service that constantly consumes event logs, traces, error logs, etc., and then takes action when issues occur.
Then you have quite a good bit of development work ahead of you. Seriously.
Everyone asks this same question, but it is not an easy one to answer with any more resolution than "It failed over at this time." Things such as virtualization make this even harder as the tools and the hypervisor itself may do things that are outside the purview of your sandbox.
However, to get you started, places to scrub for data:
- SQL Server Errorlog
- Application Event Log
- System Event Log
- Cluster Log/Clustering Event Log
- Hypervisor Logs/Database
- Internal Lists of Downtimes/Upgrades/Patches/Etc.
Best of luck.
We have other teams in there that restart, test and delete things all the time and we would like to know if this is someone working on something vs something wrong.
That's an organizational information sharing/Administration issue and not one you're going to have much help in dealing with through a simple SQL Server Alert. In fact, to do this properly, you'll want to take SQL Server out of the picture!
Most likely you're going to need to write a service that constantly consumes event logs, traces, error logs, etc., and then takes action when issues occur.
Context
StackExchange Database Administrators Q#176199, answer score: 4
Revisions (0)
No revisions yet.