Rollback Operations
Complete guide to rollback modes, selective rollback, approval workflows, and safe incident recovery.
What Is Rollback
Rollback reverts a deployment to a previous known-good state. When an incident occurs — a bad image tag, a misconfigured variable, a broken manifest — rollback lets you recover quickly by restoring configuration from a specific point in deployment history.
GitOpsHQ supports multiple rollback strategies with different scopes and granularity, so you can match the recovery action to the severity and scope of the incident.
Rollback Is Not Undo
Rollback creates a new commit that reverts state to a previous revision. It does not erase history. Every rollback action is fully traceable in both Git history and the GitOpsHQ audit trail.
Feature Gating
Rollback capabilities are gated behind feature flags: basic_rollback for full mode, rollback for the general rollback framework, selective_rollback for service and field-level modes, and rollback_approval for the approval workflow integration.
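As a rough illustration of the gating described above, the sketch below checks which flags are still missing for a given mode. The flag names come from this page. The mapping from mode to flags (in particular whether the general rollback flag is needed alongside selective_rollback) and the function name missing_flags are assumptions made for the example, not the product's actual logic.

```python
# Flag names are taken from the text; the mode-to-flag mapping is an assumption.
REQUIRED_FLAGS = {
    "full": {"basic_rollback"},
    "service": {"rollback", "selective_rollback"},
    "field": {"rollback", "selective_rollback"},
}


def missing_flags(mode: str, enabled: set[str], needs_approval: bool = False) -> set[str]:
    """Return the flags that must still be enabled before `mode` can be used."""
    if mode not in REQUIRED_FLAGS:
        raise ValueError(f"unknown rollback mode: {mode}")
    required = set(REQUIRED_FLAGS[mode])
    if needs_approval:
        # The approval workflow integration has its own flag.
        required.add("rollback_approval")
    return required - enabled
```

An empty result means the mode is available under these assumptions.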
Rollback Modes
GitOpsHQ offers three rollback modes, each targeting a different scope of recovery:
Full Rollback
Reverts the entire tenant-environment to a previous deployment state. All services, bindings, variables, and configurations are restored to the selected historical revision.
When to use:
- Multiple services are affected by the incident
- You need to restore a known-good state quickly
- The exact failing component is unclear
What it reverts:
- All workload image tags and configurations
- All Kustomize and manifest bindings
- All HQ Variable values (as they were at the target revision)
- All environment-level configuration
Trade-off: Full rollback may revert healthy services alongside the broken one. Use service or field-level rollback if you need surgical precision.
Service Rollback (Pro)
Reverts a single service within a tenant-environment to a previous state. All other services remain at their current revision.
When to use:
- The incident is isolated to a specific microservice
- Other services are healthy and should not be disturbed
- You want to minimize blast radius during recovery
What it reverts:
- The selected service's image tag and configuration
- The selected service's values and bindings
- Service-scoped HQ Variables for this service
Requirement: Pro plan or higher.
Field-Level Rollback (Enterprise)
Reverts specific configuration fields for a service (e.g., only the image tag, only the replica count) while keeping all other fields at their current values.
When to use:
- You know exactly which field caused the incident
- You want maximum granularity — change one value, touch nothing else
- Post-incident, you need to revert a specific variable without affecting other recent changes
What it reverts:
- Only the explicitly selected fields within a service configuration
Requirement: Enterprise plan. The field selection must be non-empty; the backend rejects empty field selections.
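To make the field-level semantics concrete, here is a minimal sketch that restores only the selected fields from a target revision and leaves everything else untouched. The dotted-path convention for field names (for example image.tag) and the function name revert_fields are hypothetical; the page does not specify how fields are addressed.

```python
import copy


def revert_fields(current: dict, target: dict, fields: list[str]) -> dict:
    """Return a copy of `current` with only `fields` restored from `target`.

    Fields are dotted paths such as "image.tag" (a hypothetical convention).
    Raises KeyError if a selected field does not exist in the target revision.
    """
    if not fields:
        # Mirrors the rule that an empty field selection is rejected.
        raise ValueError("selected fields cannot be empty for field mode")
    result = copy.deepcopy(current)
    for path in fields:
        keys = path.split(".")
        src, dst = target, result
        for key in keys[:-1]:
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[keys[-1]] = copy.deepcopy(src[keys[-1]])
    return result
```

For example, reverting only image.tag keeps a newer replica count in place while the image tag returns to its earlier value.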
Mode Comparison
| Aspect | Full | Service (Pro) | Field-Level (Enterprise) |
|---|---|---|---|
| Scope | Entire tenant-environment | Single service | Specific fields within a service |
| Blast radius | High | Medium | Minimal |
| Speed | Fastest (single revert commit) | Fast | Moderate (field-level diffing) |
| Precision | Low | Medium | High |
| Risk | May revert healthy services | Isolated to one service | Surgical, but requires knowing the root cause |
| Plan | All plans | Pro+ | Enterprise |
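The comparison above can be read as a simple decision rule: pick the narrowest mode that still covers the incident. The sketch below encodes that rule under the assumption that you already have a list of affected services and, optionally, the fields you suspect. It ignores plan availability, and the function name is made up for the example.

```python
def smallest_sufficient_mode(affected_services: list[str], suspect_fields: list[str]) -> str:
    """Pick the narrowest rollback mode that covers the incident."""
    if len(affected_services) != 1:
        # Several services are affected, or the failing component is unclear.
        return "full"
    if suspect_fields:
        # Exactly one service and a known root-cause field.
        return "field"
    return "service"
```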
Selective Rollback Wizard
The SelectiveRollbackWizard provides a guided, step-by-step interface for executing rollbacks safely.
Select rollback scope. Choose Full, Service, or Field-Level rollback. The wizard explains the blast radius of each mode before you proceed.
Choose target tenant-environment. Select the tenant and environment where the rollback will be executed. The wizard shows the current deployment state and health status.
Select the target revision. Browse the deployment history for this tenant-environment. Each historical deployment shows its timestamp, changes, deployer, and revision ID. Select the revision you want to roll back to.
Preview the rollback diff. The wizard generates a detailed diff showing exactly what will change:
- Current state vs. target revision state
- Per-service breakdown of affected fields
- For field-level mode: only selected fields are shown
- Impact summary: number of services, bindings, and variables affected
Submit rollback. Add a justification explaining why the rollback is needed. If the environment's approval policy requires it, the rollback enters the approval queue. Otherwise, it executes immediately.
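The preview step can be pictured as a per-service comparison of the current state against the target revision. The following is a simplified sketch, assuming each state is a mapping of service name to a flat dictionary of fields. The real wizard also counts bindings and variables, which this example does not model, and both function names are hypothetical.

```python
def preview_diff(current: dict, target: dict) -> dict:
    """Map each affected service to {field: (current_value, target_value)}."""
    diff = {}
    for service in sorted(set(current) | set(target)):
        now, then = current.get(service, {}), target.get(service, {})
        changed = {
            field: (now.get(field), then.get(field))
            for field in sorted(set(now) | set(then))
            if now.get(field) != then.get(field)
        }
        if changed:
            diff[service] = changed
    return diff


def impact_summary(diff: dict) -> dict:
    """Count affected services and fields in a diff from preview_diff."""
    return {"services": len(diff), "fields": sum(len(f) for f in diff.values())}
```

Services whose fields are identical in both states do not appear in the result, which matches the idea of showing only what will change.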
Deployment History
The DeploymentHistoryPanel shows the full deployment timeline for a tenant-environment, serving as the source of truth for rollback target selection.
| Column | Description |
|---|---|
| Revision | Unique deployment revision identifier |
| Timestamp | When the deployment was executed |
| Author | Who triggered the deployment |
| Changes | Summary of what changed in this deployment |
| Source | Whether this came from a release, promotion, or rollback |
| Status | Deployment result (success, failed, rolled back) |
| Actions | "Rollback to this revision" button, "Compare with current" link |
Comparing Revisions
Click "Compare with current" on any historical revision to see a side-by-side diff between that revision and the current deployed state. This helps you choose the right rollback target — especially when multiple deployments have occurred since the last known-good state.
Rollback Approval Workflow
Rollback requests follow the same approval framework as releases, with some rollback-specific considerations:
Approval Rules
| Rule | Description |
|---|---|
| Standard approval | Rollback requests follow the environment's configured approval policy (same as releases) |
| Self-approval blocking | The person who submitted the rollback cannot approve it (configurable per environment) |
| Quorum support | If the environment requires N approvals, the rollback request needs the same quorum |
| Break-glass override | In emergencies, a break-glass session can bypass the approval requirement |
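The rules in the table above can be sketched as a small review function. This is an illustration only: the RollbackRequest class, its field names, the order of the checks, and the 200 success responses are assumptions. The error statuses and messages are the ones listed in the Error Reference on this page.

```python
from dataclasses import dataclass, field


@dataclass
class RollbackRequest:
    author: str
    required_approvals: int = 1        # quorum configured for the environment
    block_self_approval: bool = True   # configurable per environment
    state: str = "pending"
    approvers: set = field(default_factory=set)
    reviewers: set = field(default_factory=set)  # everyone who approved or rejected


def review(req: RollbackRequest, reviewer: str, approve: bool) -> tuple:
    """Apply one review and return (http_status, message)."""
    if req.state != "pending":
        return 400, "rollback request is not pending approval"
    if req.block_self_approval and reviewer == req.author:
        return 403, "cannot approve/reject your own rollback request"
    if reviewer in req.reviewers:
        return 409, "you have already reviewed this rollback request"
    req.reviewers.add(reviewer)
    if not approve:
        req.state = "rejected"
        return 200, req.state
    req.approvers.add(reviewer)
    if len(req.approvers) >= req.required_approvals:
        req.state = "approved"
    return 200, req.state
```

With a quorum of two, the first approval leaves the request pending and the second moves it to approved.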
Break-Glass for Incidents
During a critical incident, waiting for approval may not be acceptable. Break-glass is not a dedicated rollback feature; it is part of the general project approval flow. When a rollback requires approval, the system calls maybeCreateProjectApprovalRequest, which checks mayUseBreakGlass. If the user has an active break-glass session, the approval requirement is bypassed:
- Open a break-glass session from the project's approval settings (this applies to all protected actions, not just rollback)
- Provide a justification for the emergency override
- Submit the rollback — approval is bypassed automatically
- The break-glass session, justification, and rollback are all recorded in the audit trail
- Post-incident review can examine break-glass actions
Break-Glass Audit
Break-glass bypasses approval but does not bypass audit. Every break-glass rollback is prominently flagged in the audit trail for post-incident review.
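The interaction between approval, break-glass, and audit can be sketched as follows. This is not the implementation of maybeCreateProjectApprovalRequest or mayUseBreakGlass. The BreakGlassSession class, the expiry-based definition of an active session, and the shape of the audit record are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class BreakGlassSession:
    user: str
    justification: str
    expires_at: datetime  # assumption: a session is active until it expires


def submit_rollback(user, policy_requires_approval, session, audit_log, now=None):
    """Return "pending" or "executing" and always append an audit record."""
    now = now or datetime.now(timezone.utc)
    active = session is not None and session.user == user and session.expires_at > now
    bypassed = bool(policy_requires_approval and active)
    outcome = "pending" if policy_requires_approval and not active else "executing"
    # Break-glass skips approval but never skips the audit trail.
    audit_log.append({
        "actor": user,
        "action": "rollback.submit",
        "outcome": outcome,
        "break_glass": bypassed,
        "justification": session.justification if bypassed else None,
    })
    return outcome
```

Every call writes an audit entry, and entries created through a bypass carry a break_glass marker so they can be found during post-incident review.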
Rollback and Git
All rollback operations produce Git commits in the hosted repository, maintaining the GitOps principle that Git is the source of truth.
| Rollback Mode | Git Behavior |
|---|---|
| Full | Creates a single revert commit restoring the entire tenant-environment directory to the target revision's state |
| Service | Creates a targeted commit that reverts only the files belonging to the selected service |
| Field-Level | Creates a patch commit that modifies only the specific fields in the affected files |
Commit Metadata
Every rollback commit includes structured metadata:
Rollback: tenant/acme env/production → revision abc123
Mode: service
Service: api-gateway
Justification: Bad image tag v2.1.0 causing 500 errors
Author: jane@example.com
Approval: Approved by bob@example.com (quorum: 1/1)
Rollback Request ID: rbk-789xyz
The delivery generator regenerates manifests from the reverted state, ensuring that the cluster receives a consistent set of Kubernetes resources.
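A commit message in the shape of the metadata example above could be assembled like this. The function name and parameters are hypothetical; only the line labels and their order are taken from the example on this page.

```python
def rollback_commit_message(tenant, env, revision, mode, justification, author,
                            request_id, service=None, approval=None):
    """Build a rollback commit message; Service and Approval lines are optional."""
    lines = [
        f"Rollback: tenant/{tenant} env/{env} → revision {revision}",
        f"Mode: {mode}",
    ]
    if service:
        lines.append(f"Service: {service}")
    lines.append(f"Justification: {justification}")
    lines.append(f"Author: {author}")
    if approval:
        lines.append(f"Approval: {approval}")
    lines.append(f"Rollback Request ID: {request_id}")
    return "\n".join(lines)
```

Omitting the Service line for full rollbacks is an assumption; the page only shows a service-mode example.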
Rollback Requests (Enterprise)
Enterprise plans support formal rollback requests with structured metadata and governance:
Request Structure
| Field | Description |
|---|---|
| Scope | Full, service, or field-level |
| Target revision | The deployment revision to roll back to |
| Justification | Required text explaining why the rollback is needed |
| Affected services | List of services that will be changed (auto-calculated) |
| Selected fields | For field-level rollback: the specific fields to revert |
Request States
| State | Description |
|---|---|
| pending | Request submitted, awaiting approval |
| approved | Approval quorum met, ready to execute |
| rejected | An approver rejected the request |
| executing | Rollback is in progress |
| completed | Rollback successfully executed, revert commit created |
| failed | Execution failed (see error details) |
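The states above suggest a small state machine. The transition table below is inferred from the state descriptions and is an assumption: it does not model retrying a failed execution or requests that execute without approval, both of which the product may allow.

```python
# Allowed transitions, inferred from the state descriptions (an assumption).
TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"executing"},
    "executing": {"completed", "failed"},
    "rejected": set(),
    "completed": set(),
    "failed": set(),
}


def advance(state: str, new_state: str) -> str:
    """Return `new_state` if the transition is allowed, else raise ValueError."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```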
Post-Rollback Actions
After a rollback is executed, follow these verification steps:
Verify Git state. Check that the revert commit appears in the hosted repository with the correct content.
Check ArgoCD sync status. Confirm that ArgoCD detects the new commit and syncs the reverted manifests to the cluster. The cluster agent reports sync status in the Cluster Detail page.
Verify cluster health. Use the Cluster Detail page's inventory and drift tabs to confirm the rollback took effect. Check pod status, replica counts, and service health.
Review drift detection. Ensure there is no drift between the desired state (Git) and the actual state (cluster). Any drift indicates the rollback may not have fully propagated.
Document the incident. Add post-rollback notes to the rollback request or release. Capture the root cause and corrective actions for team review.
Error Handling
Rollback operations can encounter several error conditions:
| Scenario | Behavior | Resolution |
|---|---|---|
| Concurrent changes | Conflict detected between rollback and in-flight changes | Resolve the conflict manually or wait for the concurrent operation to complete |
| Agent unreachable | Git commit succeeds but cluster sync cannot be confirmed | The rollback is safe in Git; sync will occur when the agent reconnects |
| Invalid target revision | The selected revision does not exist or is corrupted | Choose a different revision from the deployment history |
| Empty field selection | Field-level rollback with no fields selected | Select at least one field to revert |
Error Reference
| Error Code | Message | Meaning |
|---|---|---|
| 400 | targetSha is required | No target revision was specified in the rollback request |
| 400 | rollback request is not pending approval | The request has already been processed (approved, rejected, or executed) |
| 400 | selected fields cannot be empty for field mode | Field-level rollback requires at least one field selection |
| 403 | cannot approve/reject your own rollback request | Self-approval is blocked for this environment |
| 409 | you have already reviewed this rollback request | Each approver can only submit one review per request |
| 500 | approval saved but rollback execution failed | The approval was recorded but the Git commit failed; retry execution |
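Two of the 400 errors above are input validation failures, which a client could catch before submitting. The sketch below assumes a JSON-like payload. The key targetSha is taken from the error message, while mode and selectedFields are hypothetical names chosen for the example.

```python
def validate_rollback_request(payload: dict):
    """Return (400, message) for the first problem found, or None if valid."""
    if not payload.get("targetSha"):
        return 400, "targetSha is required"
    # "mode" and "selectedFields" are hypothetical payload keys.
    if payload.get("mode") == "field" and not payload.get("selectedFields"):
        return 400, "selected fields cannot be empty for field mode"
    return None
```

Checking these conditions client-side does not replace the backend checks; it only avoids a round trip for obviously invalid requests.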
Best Practices
- Start with the smallest scope. If you know which service caused the incident, use service rollback instead of full rollback.
- Always preview before executing. The rollback diff shows exactly what will change — review it even under time pressure.
- Use break-glass responsibly. It exists for real emergencies. Post-incident, review all break-glass actions.
- Document justifications. Rollback justifications are invaluable during post-incident reviews and compliance audits.
- Monitor after rollback. A successful rollback commit does not guarantee a healthy cluster. Verify sync status, pod health, and drift.
- Keep deployment history clean. Frequent rollbacks to the same revision indicate a recurring issue that needs root-cause analysis, not repeated rollbacks.