Execute Rollback Safely
Recover from failed or risky changes using the narrowest effective rollback mode.
Task Outcome
Rollback actions restore service stability while keeping blast radius controlled and evidence complete.
When To Use This Guide
- Deployment introduced regressions or instability.
- Promotion caused unacceptable runtime impact.
- Emergency recovery is needed under approval controls.
Prerequisites
- Incident scope is identified (tenant/environment/service).
- Candidate rollback targets are known.
- Approval and break-glass posture is clear.
Choose Rollback Mode
Full Rollback
- Use when broad recovery is required immediately.
- Highest blast radius; requires the strongest review.
Service Rollback
- Use when impact is isolated to one or a few services.
- Usually the best balance of speed and precision.
Field Rollback
- Use for targeted correction of specific fields.
- Lowest blast radius, highest review effort.
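The choice between full, service, and field rollback can be sketched as a small selection function. This is a minimal illustration, not the product's API: the names `RollbackMode` and `choose_mode` and both input flags are hypothetical, and the rules only encode the guidance given in this guide.

```python
from enum import IntEnum


class RollbackMode(IntEnum):
    """Hypothetical mode names, ordered by blast radius (lowest first)."""
    FIELD = 1
    SERVICE = 2
    FULL = 3


def choose_mode(broad_recovery_needed: bool, fields_identified: bool) -> RollbackMode:
    """Pick the narrowest mode that still covers the incident.

    Assumed inputs (not defined by the source guide):
    - broad_recovery_needed: immediate, wide containment is required.
    - fields_identified: the exact faulty fields are known and reviewed.
    """
    if broad_recovery_needed:
        return RollbackMode.FULL      # highest blast radius, strongest review
    if fields_identified:
        return RollbackMode.FIELD     # lowest blast radius, highest review effort
    return RollbackMode.SERVICE       # default balance of speed and precision


if __name__ == "__main__":
    print(choose_mode(broad_recovery_needed=False, fields_identified=False).name)
```

Because the enum is ordered by blast radius, two candidate modes can be compared directly, for example `RollbackMode.FIELD < RollbackMode.FULL`.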
Step-By-Step Execution
1. Open rollback workspace and choose candidate target commit/version.
2. Run rollback preview and confirm exact impact scope.
3. Select rollback mode based on urgency and blast radius needs.
4. Submit rollback request and complete approvals if required.
5. Execute rollback and monitor runtime recovery signals.
6. Validate drift and service status after recovery.
7. Capture closure evidence and define preventive follow-up.
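The ordering of the steps above matters: for example, execution should never happen before the preview. One way to picture this is a small tracker that rejects out-of-order steps. It is a sketch under assumptions; the step names and the `RollbackWorkflow` class are hypothetical and do not correspond to any real tool.

```python
from typing import List, Optional

# Hypothetical step names mirroring the execution steps above, in order.
STEPS = (
    "select_target",
    "preview",
    "choose_mode",
    "submit_and_approve",
    "execute_and_monitor",
    "validate_drift_and_status",
    "capture_evidence",
)


class RollbackWorkflow:
    """Tracks completed steps and rejects any step taken out of order."""

    def __init__(self) -> None:
        self.completed: List[str] = []

    def next_step(self) -> Optional[str]:
        """Return the next expected step, or None when all steps are done."""
        if len(self.completed) == len(STEPS):
            return None
        return STEPS[len(self.completed)]

    def complete(self, step: str) -> None:
        """Mark a step as done; raise ValueError if it is not the expected one."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected step {expected!r}, got {step!r}")
        self.completed.append(step)


if __name__ == "__main__":
    wf = RollbackWorkflow()
    wf.complete("select_target")
    wf.complete("preview")
    print(wf.next_step())
```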
Decision Guide
- Prefer service rollback first if scope is known.
- Use full rollback only when fast containment is essential.
- Service rollback is typically the default path.
- Keep unrelated services out of recovery scope.
- Field rollback can fix precision issues with low side effects.
- Requires high-confidence review of exact field impact.
- Verify stability window before incident closure.
- Open prevention task to reduce repeat rollback frequency.
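Verifying a stability window before closure can be made concrete with a simple check over health samples. The sketch below assumes health is recorded as `(timestamp, healthy)` pairs; the function name `stable_for_window` and the sample format are hypothetical, and the window length is whatever the team's incident policy defines.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, bool]  # (timestamp, healthy) -- assumed format


def stable_for_window(samples: List[Sample], window: timedelta, now: datetime) -> bool:
    """Return True only if the whole window was observed and every sample in it is healthy."""
    start = now - window
    in_window = [ok for ts, ok in samples if start <= ts <= now]
    # Require at least one sample at or before the window start,
    # so a short observation period cannot pass as a full window.
    covers_window = any(ts <= start for ts, _ in samples)
    return covers_window and bool(in_window) and all(in_window)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    samples = [(now - timedelta(minutes=m), True) for m in (30, 20, 10, 0)]
    print(stable_for_window(samples, timedelta(minutes=30), now))
```

Requiring coverage of the window start is a deliberate choice here: it keeps an incident from being closed on the strength of a few healthy samples taken right after recovery.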
Validation Checklist
- Preview confirms intended rollback scope only.
- Approval requirements are completed and auditable.
- Runtime recovers and remains stable.
- Recovery notes include preventive action owner.
Common Failure Modes
- Executing rollback against the wrong target.
- Choosing full rollback when service rollback was enough.
- Closing incident before drift/runtime re-verification.
- Missing evidence chain for who approved and executed recovery.
Recovery Note Template
Incident reference:
Rollback mode used:
Target selection:
Requester / approver / executor:
Runtime recovery summary:
Residual risks:
Preventive action:
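The template above can also be kept as a structured record so that empty fields are caught before closure. This is an illustrative sketch only: the `RecoveryNote` class and its field names are hypothetical mappings of the template lines, not a format defined by any tool.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RecoveryNote:
    """One attribute per line of the recovery note template (names are assumed)."""
    incident_reference: str = ""
    rollback_mode_used: str = ""
    target_selection: str = ""
    requester_approver_executor: str = ""
    runtime_recovery_summary: str = ""
    residual_risks: str = ""
    preventive_action: str = ""

    def missing(self) -> List[str]:
        """Return the names of fields that are still empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        """A note is complete when no field is left blank."""
        return not self.missing()


if __name__ == "__main__":
    note = RecoveryNote(incident_reference="<incident id>", rollback_mode_used="service")
    print(note.missing())
```

Checking `note.is_complete()` before closing the incident helps keep the evidence chain (requester, approver, executor) from being left blank.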