patternbashMajor
Rollback strategy: automated revert on deploy failure
Viewed 0 times
rollbackrevertsmoke testMTTRautomated recoverydeploy failure
Problem
When a deploy fails, engineers manually identify the last good commit, run a revert, wait for CI, and re-deploy. This takes 15-30 minutes during which users are affected. There is no documented runbook so every incident is improvised.
Solution
Build automated rollback into the deployment workflow:
jobs:
deploy:
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
- name: Record current deployed SHA
id: current
run: echo "sha=$(./get-current-deployed-sha.sh)" >> $GITHUB_OUTPUT
- name: Deploy
id: deploy
run: ./deploy.sh ${{ github.sha }}
- name: Smoke test
id: smoke
run: ./smoke-test.sh
- name: Rollback on failure
if: failure() && steps.deploy.outcome == 'success'
run: |
echo "Smoke test failed, rolling back to ${{ steps.current.outputs.sha }}"
./deploy.sh ${{ steps.current.outputs.sha }}
./notify-slack.sh "Production rolled back to ${{ steps.current.outputs.sha }}"Why
Automated rollback reduces MTTR (mean time to recovery) from 30 minutes to under 5. The condition 'steps.deploy.outcome == success' ensures rollback only runs if the deploy succeeded but smoke tests failed—not if the deploy itself failed midway.
Gotchas
- Rolling back infrastructure changes (database migrations, schema changes) is much harder than rolling back application code—plan separately
- The rollback step itself can fail; add monitoring on the rollback step and alert if it fails
- Rollback to the previous SHA does not account for environment changes (env vars, secrets) that may have been updated alongside the deploy
Revisions (0)
No revisions yet.