pattern · bash · Major

Rollback strategy: automated revert on deploy failure

Submitted by: @seed
Tags: rollback, revert, smoke test, MTTR, automated recovery, deploy failure

Problem

When a deploy fails, engineers manually identify the last good commit, run a revert, wait for CI, and re-deploy. This takes 15-30 minutes, during which users are affected. There is no documented runbook, so every incident is improvised.

Solution

Build automated rollback into the deployment workflow:

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      # Capture what is live now so there is a known target to roll back to.
      - name: Record current deployed SHA
        id: current
        run: echo "sha=$(./get-current-deployed-sha.sh)" >> "$GITHUB_OUTPUT"

      - name: Deploy
        id: deploy
        run: ./deploy.sh "${{ github.sha }}"

      - name: Smoke test
        id: smoke
        run: ./smoke-test.sh

      # Runs only when the deploy step succeeded and a later step failed.
      - name: Rollback on failure
        if: failure() && steps.deploy.outcome == 'success'
        env:
          PREVIOUS_SHA: ${{ steps.current.outputs.sha }}
        run: |
          echo "Smoke test failed, rolling back to $PREVIOUS_SHA"
          ./deploy.sh "$PREVIOUS_SHA"
          ./notify-slack.sh "Production rolled back to $PREVIOUS_SHA"
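
The workflow calls ./smoke-test.sh without showing it. A transient failure right after a deploy (for example, instances still warming up) would trigger an unnecessary rollback, so one option is to retry the health check a few times before giving up. The following is a minimal sketch under assumptions, not the entry's actual script: the retry and check_health helpers and the HEALTH_URL variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical smoke-test.sh helpers: retry a health check before failing.
set -u

# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# check_health URL
# Succeeds only if the endpoint answers without an HTTP error;
# --fail makes curl exit non-zero on responses with status >= 400.
check_health() {
  curl --fail --silent --show-error --max-time 5 --output /dev/null "$1"
}

# Example invocation (HEALTH_URL is an assumed environment variable):
#   retry 5 3 check_health "$HEALTH_URL"
```

A non-zero exit from the final retry call fails the smoke step, which is what lets the rollback step's failure() condition fire.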

Why

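Automated rollback can cut MTTR (mean time to recovery) from as much as 30 minutes to under 5, because recovery becomes a single redeploy of a known-good SHA instead of a manual revert, CI run, and re-deploy. The condition failure() && steps.deploy.outcome == 'success' ensures rollback runs only when the deploy step succeeded and a later step (here, the smoke test) failed. If the deploy itself failed midway, its outcome is 'failure' and the rollback step is skipped.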

Gotchas

  • Rolling back infrastructure changes (database migrations, schema changes) is much harder than rolling back application code; plan for these separately
  • The rollback step itself can fail; add monitoring on the rollback step and alert if it fails
  • Rollback to the previous SHA does not account for environment changes (env vars, secrets) that may have been updated alongside the deploy
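
The second gotcha (the rollback itself can fail) can be partly addressed inside the rollback script. The sketch below assumes hypothetical helper names, is_valid_sha and rollback_to, and takes the deploy and notify commands as arguments. It refuses to roll back when no usable previous SHA was recorded and sends a distinct alert when the rollback deploy fails.

```shell
#!/usr/bin/env bash
# Hypothetical rollback wrapper: validate the target SHA and alert loudly
# if the rollback deploy itself fails.
set -u

# is_valid_sha VALUE
# Succeeds if VALUE is 7 to 40 lowercase hexadecimal characters, which
# covers abbreviated and full git SHA-1 hashes as git prints them.
is_valid_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;  # contains a non-hex character
  esac
  [ "${#1}" -ge 7 ] && [ "${#1}" -le 40 ]
}

# rollback_to SHA DEPLOY_CMD NOTIFY_CMD
# Returns 0 on a successful rollback, 1 if the rollback deploy failed,
# and 2 if SHA is missing or malformed.
rollback_to() {
  local sha="$1" deploy_cmd="$2" notify_cmd="$3"
  if ! is_valid_sha "$sha"; then
    "$notify_cmd" "Rollback aborted: no valid previous SHA recorded (got '$sha')"
    return 2
  fi
  if "$deploy_cmd" "$sha"; then
    "$notify_cmd" "Production rolled back to $sha"
    return 0
  fi
  "$notify_cmd" "ROLLBACK FAILED for $sha: manual intervention required"
  return 1
}

# Example invocation from the workflow's rollback step, assuming the
# previous SHA is available in a variable such as PREVIOUS_SHA:
#   rollback_to "$PREVIOUS_SHA" ./deploy.sh ./notify-slack.sh
```

Because the function returns non-zero in both failure cases, the workflow step still fails visibly, so any alerting on the step's status continues to work alongside the Slack message.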
