How to implement the four-eyes principle for emergency fixes?
Problem
Consider this scenario (any comparison with real world situations is purely by accident):
- 3:07 am: incoming support call "Something in production went down, I need your help!".
- 3:12 am: connected to the system (logon accepted) ... and no time for coffee.
- 3:15 am: lucky you, right away you could spot the issue via some error message somewhere.
- 3:17 am: use your SCM toolbox to grab the code, fix the issue, test it, great ... my fix works!
- 3:20 am: get in touch with the DevOps-team to ship the fix and to get production running again.
- 3:21 am: red flag ... "To respect four-eyes, we need 2 more eyes to get approval for this fix".
- 3:22 am: ggggrrrreat, now what, who else can we call (= wake up some manager)?
If you implemented some approval procedure similar to my answer to "What are possible implementations (or examples) of the four-eyes principle?", then you're out of luck ... here are your choices:
- Your fix will be stuck (read: production will be down) until 2 more eyes get involved.
- You figure out a way to get around the missing eyes.
So how to implement the four-eyes principle for emergency fixes? ... So that you get production up and running asap, i.e. around 3:25 am ... And so that you can also close the call (and go back to where you came from)?
Solution
In the SCM world, which I'm most familiar with, the above scenario is typically addressed by what's called the "abbreviated approval list" procedure.
Here is a blueprint of it:
- Define your business hours, say from 8 am to 6 pm.
- Define a complete approval list of (say) 3 levels of approval (for roles X, Y and Z).
- Define an abbreviated approval list of (say) only 1 level of approval (only for role X).
- Planned changes always require all approvals from the complete approval list.
- For unplanned changes, the complete approval list is also used to gather the required approvals, provided the approvals are to be issued during the defined business hours.
- For any approvals of unplanned changes that are to be issued outside the defined business hours:
  - Only the approvals from the abbreviated approval list (such as role X above) are required to authorize the change. Once that authorization is given, the change is actually deployed to the target environment.
  - Additional post-approvals are still needed afterwards (within a reasonable number of hours/days), i.e. from all roles in the complete approval list that are not also in the abbreviated approval list (such as roles Y and Z above). If not all post-approvals have been issued within the number of hours/days agreed upfront (e.g. because the fix worked "this" time but was only a temporary fix), the change might be subject to a rollback. While at least 1 post-approval is outstanding, the change is marked as "waiting post approvals".
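To make the blueprint a bit more concrete, here is a minimal Python sketch of the decision logic. It is not taken from any particular SCM tool. The `Change` class, the status strings other than "waiting post approvals", and the 48-hour post-approval window are hypothetical choices for illustration. It also assumes that business hours are judged by the time the change was requested, that the post-approval window starts at that same moment, and it ignores weekends and holidays.

```python
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta

# Hypothetical values, taken from the "say" examples in the blueprint.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)
COMPLETE_LIST = ("X", "Y", "Z")
ABBREVIATED_LIST = ("X",)
# Assumption: the "reasonable amount of hours/days" is 48 hours.
POST_APPROVAL_WINDOW = timedelta(hours=48)


def in_business_hours(moment: datetime) -> bool:
    """True if the moment falls inside the defined business hours."""
    return BUSINESS_START <= moment.time() < BUSINESS_END


@dataclass
class Change:
    planned: bool
    requested_at: datetime
    approvals: set = field(default_factory=set)

    def required_before_deploy(self) -> set:
        # Planned changes, and unplanned ones raised during business
        # hours, need the complete list before deployment.
        if self.planned or in_business_hours(self.requested_at):
            return set(COMPLETE_LIST)
        # Unplanned changes outside business hours only need the
        # abbreviated list before deployment.
        return set(ABBREVIATED_LIST)

    def can_deploy(self) -> bool:
        return self.required_before_deploy() <= self.approvals

    def outstanding_post_approvals(self) -> set:
        return set(COMPLETE_LIST) - self.approvals

    def status(self, now: datetime) -> str:
        if not self.can_deploy():
            return "waiting approvals"
        if not self.outstanding_post_approvals():
            return "approved"
        # Assumption: the window is measured from the request time.
        if now - self.requested_at > POST_APPROVAL_WINDOW:
            return "rollback candidate"
        return "waiting post approvals"


# The 3:20 am emergency fix from the scenario (the date is arbitrary).
fix = Change(planned=False, requested_at=datetime(2024, 1, 10, 3, 20))
fix.approvals.add("X")
print(fix.can_deploy())
print(fix.status(datetime(2024, 1, 10, 3, 23)))
```

In this sketch the emergency fix can be deployed as soon as role X has approved it, and it stays in "waiting post approvals" until roles Y and Z have approved as well, or until the assumed window expires.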
With such a solution in place, the call can be closed around 3:23 am ... since there will be no more red flag at 3:21 am ... ggggrrreat, time for a beer to celebrate my fix to get production going again (instead of coffee) ... and fingers crossed the outstanding post-approvals will come in soon ...
Context
StackExchange DevOps Q#437, answer score: 8