Rollback Operations

Complete guide to rollback modes, selective rollback, approval workflows, and safe incident recovery.

What Is Rollback

Rollback reverts a deployment to a previous known-good state. When an incident occurs — a bad image tag, a misconfigured variable, a broken manifest — rollback lets you recover quickly by restoring configuration from a specific point in deployment history.

GitOpsHQ supports multiple rollback strategies with different scopes and granularity, so you can match the recovery action to the severity and scope of the incident.

Rollback Is Not Undo

Rollback creates a new commit that reverts state to a previous revision. It does not erase history. Every rollback action is fully traceable in both Git history and the GitOpsHQ audit trail.

Feature Gating

Rollback capabilities are gated behind feature flags: basic_rollback for full mode, rollback for the general rollback framework, selective_rollback for service and field-level modes, and rollback_approval for the approval workflow integration.
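
As a rough illustration of the gating described above, the sketch below maps each rollback mode to the flags it would need. The flag names come from this page; the `REQUIRED_FLAGS` table, the `is_mode_enabled` helper, and the assumption that every mode also needs the general `rollback` flag are hypothetical and not taken from the GitOpsHQ codebase.

```python
# Hypothetical mapping of rollback modes to feature flags.
# Assumption: each mode needs the general "rollback" flag plus its own flag.
REQUIRED_FLAGS = {
    "full": {"rollback", "basic_rollback"},
    "service": {"rollback", "selective_rollback"},
    "field": {"rollback", "selective_rollback"},
}


def is_mode_enabled(mode, enabled_flags):
    """Return True if every flag assumed to gate `mode` is enabled."""
    if mode not in REQUIRED_FLAGS:
        raise ValueError(f"unknown rollback mode: {mode}")
    # Subset check: all required flags must be present.
    return REQUIRED_FLAGS[mode] <= set(enabled_flags)
```

The approval workflow flag (`rollback_approval`) is left out on purpose, since it gates the approval integration rather than a rollback mode.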

Rollback Modes

GitOpsHQ offers three rollback modes, each targeting a different scope of recovery:

Full Rollback

Reverts the entire tenant-environment to a previous deployment state. All services, bindings, variables, and configurations are restored to the selected historical revision.

When to use:

Multiple services are affected by the incident
You need to restore a known-good state quickly
The exact failing component is unclear

What it reverts:

All workload image tags and configurations
All Kustomize and manifest bindings
All HQ Variable values (as they were at the target revision)
All environment-level configuration

Trade-off: Full rollback may revert healthy services alongside the broken one. Use service or field-level rollback if you need surgical precision.

Service Rollback (Pro)

Reverts a single service within a tenant-environment to a previous state. All other services remain at their current revision.

When to use:

The incident is isolated to a specific microservice
Other services are healthy and should not be disturbed
You want to minimize blast radius during recovery

What it reverts:

The selected service's image tag and configuration
The selected service's values and bindings
Service-scoped HQ Variables for this service

Requirement: Pro plan or higher.

Field-Level Rollback (Enterprise)

Reverts specific configuration fields for a service (e.g., only the image tag, only the replica count) while keeping all other fields at their current values.

When to use:

You know exactly which field caused the incident
You want maximum granularity — change one value, touch nothing else
Post-incident, you need to revert a specific variable without affecting other recent changes

What it reverts:

Only the explicitly selected fields within a service configuration

Requirement: Enterprise plan. The field selection must be non-empty; the backend rejects a field-level rollback that selects no fields.
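
To make the field-level behavior concrete, here is a minimal sketch that copies only the selected fields from a target revision onto the current configuration. The nested-dict representation, the dotted field paths such as `image.tag`, and the helper names are assumptions for illustration; the real storage format and field addressing may differ.

```python
import copy


def get_path(config, path):
    """Read a value addressed by a dotted path, e.g. "image.tag"."""
    node = config
    for key in path.split("."):
        node = node[key]  # raises KeyError if the target lacks the field
    return node


def set_path(config, path, value):
    """Write a value at a dotted path, creating parent dicts if needed."""
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def revert_fields(current, target, fields):
    """Return a copy of `current` with only `fields` taken from `target`."""
    if not fields:
        # Mirrors the documented rule that empty selections are rejected.
        raise ValueError("selected fields cannot be empty for field mode")
    result = copy.deepcopy(current)
    for path in fields:
        set_path(result, path, copy.deepcopy(get_path(target, path)))
    return result
```

For example, reverting only `image.tag` leaves a replica count that was changed after the target revision untouched.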

Mode Comparison

Aspect	Full	Service (Pro)	Field-Level (Enterprise)
Scope	Entire tenant-environment	Single service	Specific fields within a service
Blast radius	High	Medium	Minimal
Speed	Fastest (single revert commit)	Fast	Moderate (field-level diffing)
Precision	Low	Medium	High
Risk	May revert healthy services	Isolated to one service	Surgical, but requires knowing the root cause
Plan	All plans	Pro+	Enterprise
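
The comparison above can be read as a decision rule: prefer the narrowest mode that fits what is known about the incident and that the plan allows. The sketch below encodes that rule. The plan name `base` is a placeholder for whatever the entry-level plan is called, and `pick_mode` is a hypothetical helper, not part of the product.

```python
# Placeholder plan ranking; "base" stands in for the entry-level plan name.
PLAN_RANK = {"base": 0, "pro": 1, "enterprise": 2}
MIN_PLAN = {"full": "base", "service": "pro", "field": "enterprise"}


def pick_mode(affected_services, known_fields, plan):
    """Pick the narrowest rollback mode the incident and plan allow."""
    if len(affected_services) == 1:
        # One service is at fault; go field-level only if the fields are known.
        if known_fields:
            preferred = ["field", "service", "full"]
        else:
            preferred = ["service", "full"]
    else:
        # Several services, or an unclear culprit: restore everything.
        preferred = ["full"]
    for mode in preferred:
        if PLAN_RANK[plan] >= PLAN_RANK[MIN_PLAN[mode]]:
            return mode
    return "full"  # not reached: "full" is always last and always allowed
```

On a Pro plan, a known bad image tag on a single service falls back from field-level to service rollback, which is still narrower than a full rollback.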

Selective Rollback Wizard

The SelectiveRollbackWizard provides a guided, step-by-step interface for executing rollbacks safely.

Select rollback scope. Choose between Full, Service, or Field-Level rollback. The wizard explains the blast radius of each mode before you proceed.

Choose target tenant-environment. Select the tenant and environment where the rollback will be executed. The wizard shows the current deployment state and health status.

Select the target revision. Browse the deployment history for this tenant-environment. Each historical deployment shows its timestamp, changes, deployer, and revision ID. Select the revision you want to roll back to.

Preview the rollback diff. The wizard generates a detailed diff showing exactly what will change:

Current state vs. target revision state
Per-service breakdown of affected fields
For field-level mode: only selected fields are shown
Impact summary: number of services, bindings, and variables affected

Submit rollback. Add a justification explaining why the rollback is needed. If the environment's approval policy requires it, the rollback enters the approval queue. Otherwise, it executes immediately.
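
The preview step in the wizard can be pictured as a per-service comparison of two state snapshots. The sketch below is one way to compute such a diff. The snapshot shape (service name mapped to a flat dict of field values) and the `preview_diff` helper are assumptions for illustration, not the wizard's actual data model.

```python
def preview_diff(current, target, selected_fields=None):
    """Return {service: {field: (current_value, target_value)}} for changes.

    When `selected_fields` is given (field-level mode), only those fields
    are considered. A missing field is reported as None.
    """
    diff = {}
    for service in sorted(set(current) | set(target)):
        cur = current.get(service, {})
        tgt = target.get(service, {})
        changes = {}
        for name in sorted(set(cur) | set(tgt)):
            if selected_fields is not None and name not in selected_fields:
                continue
            if cur.get(name) != tgt.get(name):
                changes[name] = (cur.get(name), tgt.get(name))
        if changes:
            diff[service] = changes
    return diff


def impact_summary(diff):
    """Count affected services and fields for the impact summary."""
    return {
        "services": len(diff),
        "fields": sum(len(changes) for changes in diff.values()),
    }
```

Services whose state is identical in both snapshots are omitted, so the result only lists what the rollback would actually touch.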

Deployment History

The DeploymentHistoryPanel shows the full deployment timeline for a tenant-environment, serving as the source of truth for rollback target selection.

Column	Description
Revision	Unique deployment revision identifier
Timestamp	When the deployment was executed
Author	Who triggered the deployment
Changes	Summary of what changed in this deployment
Source	Whether this came from a release, promotion, or rollback
Status	Deployment result (success, failed, rolled back)
Actions	"Rollback to this revision" button, "Compare with current" link

Comparing Revisions

Click "Compare with current" on any historical revision to see a side-by-side diff between that revision and the current deployed state. This helps you choose the right rollback target — especially when multiple deployments have occurred since the last known-good state.
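
When several deployments have landed since the last healthy state, a simple heuristic for a first candidate is the most recent earlier deployment whose status is `success`. The sketch below assumes the history is a newest-first list of records with `revision` and `status` keys, which is a hypothetical shape. It only suggests a candidate; comparing that revision with the current state is still the way to confirm it.

```python
def last_known_good(history):
    """Return the newest earlier revision with status "success", or None.

    `history` is assumed to be ordered newest first; the first entry is
    the currently deployed revision and is skipped.
    """
    for entry in history[1:]:
        if entry["status"] == "success":
            return entry["revision"]
    return None
```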

Rollback Approval Workflow

Rollback requests follow the same approval framework as releases, with some rollback-specific considerations.

Approval Rules

Rule	Description
Standard approval	Rollback requests follow the environment's configured approval policy (same as releases)
Self-approval blocking	The person who submitted the rollback cannot approve it (configurable per environment)
Quorum support	If the environment requires N approvals, the rollback request needs the same quorum
Break-glass override	In emergencies, a break-glass session can bypass the approval requirement
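
The rules in this table can be sketched as a small review function. The status codes and messages are the ones listed in the Error Reference on this page; the `RollbackRequest` class, the order in which the checks run, and the choice that a single rejection rejects the request are assumptions made for illustration.

```python
from dataclasses import dataclass, field


class ReviewError(Exception):
    """Carries an HTTP-style status alongside the error message."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


@dataclass
class RollbackRequest:
    submitter: str
    required_approvals: int
    block_self_approval: bool = True  # configurable per environment
    state: str = "pending"
    reviews: dict = field(default_factory=dict)  # reviewer -> decision


def review(request, reviewer, decision):
    """Record one review and return the resulting request state."""
    if decision not in ("approve", "reject"):
        raise ValueError("decision must be 'approve' or 'reject'")
    if request.state != "pending":
        raise ReviewError(400, "rollback request is not pending approval")
    if request.block_self_approval and reviewer == request.submitter:
        raise ReviewError(403, "cannot approve/reject your own rollback request")
    if reviewer in request.reviews:
        raise ReviewError(409, "you have already reviewed this rollback request")
    request.reviews[reviewer] = decision
    approvals = sum(1 for d in request.reviews.values() if d == "approve")
    if decision == "reject":
        request.state = "rejected"  # assumption: one rejection is final
    elif approvals >= request.required_approvals:
        request.state = "approved"  # quorum met
    return request.state
```

With a quorum of two, the first approval leaves the request `pending` and the second distinct approval moves it to `approved`.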

Break-Glass for Incidents

During a critical incident, waiting for approval may not be acceptable. Break-glass is not a dedicated rollback feature; it is part of the general project approval flow. When a rollback requires approval, the system calls maybeCreateProjectApprovalRequest, which checks mayUseBreakGlass. If the user has an active break-glass session, the approval requirement is bypassed:

Open a break-glass session from the project's approval settings (this applies to all protected actions, not just rollback)
Provide a justification for the emergency override
Submit the rollback — approval is bypassed automatically
The break-glass session, justification, and rollback are all recorded in the audit trail
Post-incident review can examine break-glass actions
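
The flow above amounts to a routing decision made at submit time. The sketch below is a loose Python paraphrase of that decision, not the actual maybeCreateProjectApprovalRequest or mayUseBreakGlass code. The `BreakGlassSession` class, the `route_rollback` helper, and the `break_glass` audit flag are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BreakGlassSession:
    user: str
    justification: str
    active: bool = True


def may_use_break_glass(user, session):
    """True if `user` holds an active session with a justification."""
    return (
        session is not None
        and session.active
        and session.user == user
        and bool(session.justification.strip())
    )


def route_rollback(user, approval_required, session=None):
    """Return (next_step, audit_flags) for a submitted rollback."""
    if not approval_required:
        return "execute", []
    if may_use_break_glass(user, session):
        # Approval is bypassed, but the bypass is flagged for audit.
        return "execute", ["break_glass"]
    return "await_approval", []
```

Returning the audit flag together with the routing decision keeps the bypass and its audit record in one place, in line with the rule that break-glass never skips the audit trail.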

Break-Glass Audit

Break-glass bypasses approval but does not bypass audit. Every break-glass rollback is prominently flagged in the audit trail for post-incident review.

Rollback and Git

All rollback operations produce Git commits in the hosted repository, maintaining the GitOps principle that Git is the source of truth.

Rollback Mode	Git Behavior
Full	Creates a single revert commit restoring the entire tenant-environment directory to the target revision's state
Service	Creates a targeted commit that reverts only the files belonging to the selected service
Field-Level	Creates a patch commit that modifies only the specific fields in the affected files
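
For the full and service modes, the Git behavior in this table resembles a path-scoped restore followed by a commit. The sketch below builds the equivalent plain `git` commands as argument lists. The `tenants/<tenant>/<env>/<service>` directory layout and the `rollback_git_commands` helper are assumptions; the hosted repository may be organized differently, and GitOpsHQ may not shell out to `git` at all.

```python
def rollback_git_commands(mode, revision, tenant, env, message, service=None):
    """Build git commands that restore a path to `revision` and commit."""
    base = f"tenants/{tenant}/{env}"  # hypothetical repository layout
    if mode == "full":
        path = base
    elif mode == "service":
        if not service:
            raise ValueError("service mode requires a service name")
        path = f"{base}/{service}"
    else:
        # Field-level rollback edits values inside files, so a path-level
        # restore does not apply.
        raise ValueError("only 'full' and 'service' are path-level restores")
    return [
        # git restore (Git 2.23+) makes the path match the source revision,
        # including removal of tracked files that revision did not contain.
        ["git", "restore", f"--source={revision}", "--staged", "--worktree", "--", path],
        ["git", "commit", "-m", message],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` inside a clone of the repository. Because the result is a new commit on top of the current history, earlier commits stay intact.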

Commit Metadata

Every rollback commit includes structured metadata:

Rollback: tenant/acme env/production → revision abc123

Mode: service
Service: api-gateway
Justification: Bad image tag v2.1.0 causing 500 errors
Author: jane@example.com
Approval: Approved by bob@example.com (quorum: 1/1)
Rollback Request ID: rbk-789xyz

The delivery generator regenerates manifests from the reverted state, ensuring that the cluster receives a consistent set of Kubernetes resources.
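
The commit message shown above follows a regular layout, so it can be produced by a small formatter. The sketch below reproduces that layout; the function name and parameters are hypothetical, and the `Service:` line is emitted only when a service is given, which is an assumption about how full rollbacks are recorded.

```python
def format_rollback_commit(tenant, env, revision, mode, justification,
                           author, approval, request_id, service=None):
    """Build a rollback commit message in the layout shown on this page."""
    arrow = "\u2192"  # the right-arrow character used in the subject line
    lines = [
        f"Rollback: tenant/{tenant} env/{env} {arrow} revision {revision}",
        "",  # blank line separates the subject from the body
        f"Mode: {mode}",
    ]
    if service is not None:
        lines.append(f"Service: {service}")
    lines.extend([
        f"Justification: {justification}",
        f"Author: {author}",
        f"Approval: {approval}",
        f"Rollback Request ID: {request_id}",
    ])
    return "\n".join(lines)
```

Keeping the body as `Key: value` lines makes the metadata easy to pull back out of `git log` with a line-based parser.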

Rollback Requests (Enterprise)

Enterprise plans support formal rollback requests with structured metadata and governance:

Request Structure

Field	Description
Scope	Full, service, or field-level
Target revision	The deployment revision to roll back to
Justification	Required text explaining why the rollback is needed
Affected services	List of services that will be changed (auto-calculated)
Selected fields	For field-level rollback: the specific fields to revert

Request States

State	Description
`pending`	Request submitted, awaiting approval
`approved`	Approval quorum met, ready to execute
`rejected`	An approver rejected the request
`executing`	Rollback is in progress
`completed`	Rollback successfully executed, revert commit created
`failed`	Execution failed (see error details)
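
These states suggest a simple lifecycle. The sketch below writes it down as a transition table. The state names come from the table above; the allowed transitions are inferred from the descriptions rather than taken from the implementation, and the `failed` to `executing` retry in particular is an assumption based on the "retry execution" guidance in the Error Reference.

```python
# Inferred lifecycle of a rollback request (not confirmed by the source).
TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"executing"},
    "executing": {"completed", "failed"},
    "failed": {"executing"},  # assumption: a failed execution can be retried
    "rejected": set(),        # terminal
    "completed": set(),       # terminal
}


def advance(state, new_state):
    """Return `new_state` if the transition is allowed, else raise."""
    if state not in TRANSITIONS:
        raise ValueError(f"unknown state: {state}")
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"cannot move from {state} to {new_state}")
    return new_state
```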

Post-Rollback Actions

After a rollback is executed, follow these verification steps:

Verify Git state. Check that the revert commit appears in the hosted repository with the correct content.

Check ArgoCD sync status. Confirm that ArgoCD detects the new commit and syncs the reverted manifests to the cluster. The cluster agent reports sync status on the Cluster Detail page.

Verify cluster health. Use the Cluster Detail page's inventory and drift tabs to confirm the rollback took effect. Check pod status, replica counts, and service health.

Review drift detection. Ensure there is no drift between the desired state (Git) and the actual state (cluster). Any drift indicates the rollback may not have fully propagated.

Document the incident. Add post-rollback notes to the rollback request or release. Capture the root cause and corrective actions for team review.

Error Handling

Rollback operations can encounter several blocking conditions:

Scenario	Behavior	Resolution
Concurrent changes	Conflict detected between rollback and in-flight changes	Resolve the conflict manually or wait for the concurrent operation to complete
Agent unreachable	Git commit succeeds but cluster sync cannot be confirmed	The rollback is safe in Git; sync will occur when the agent reconnects
Invalid target revision	The selected revision does not exist or is corrupted	Choose a different revision from the deployment history
Empty field selection	Field-level rollback with no fields selected	Select at least one field to revert

Error Reference

HTTP Status	Message	Meaning
`400`	`targetSha is required`	No target revision was specified in the rollback request
`400`	`rollback request is not pending approval`	The request has already been processed (approved, rejected, or executed)
`400`	`selected fields cannot be empty for field mode`	Field-level rollback requires at least one field selection
`403`	`cannot approve/reject your own rollback request`	Self-approval is blocked for this environment
`409`	`you have already reviewed this rollback request`	Each approver can only submit one review per request
`500`	`approval saved but rollback execution failed`	The approval was recorded but the Git commit failed — retry execution
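
The two request-shape errors in this table can be caught before a request is sent. The sketch below checks a payload and returns the matching status and message. `targetSha` appears in the error message above; the `mode` and `selectedFields` keys and the `validate_rollback_request` helper are hypothetical, so the real request schema may use different names.

```python
def validate_rollback_request(payload):
    """Return (status, message) for a malformed payload, or None if it passes."""
    if not payload.get("targetSha"):
        return 400, "targetSha is required"
    if payload.get("mode") == "field" and not payload.get("selectedFields"):
        return 400, "selected fields cannot be empty for field mode"
    return None
```

The approval-related errors (403, 409) depend on server-side state and cannot be checked this way.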

Best Practices

Start with the smallest scope. If you know which service caused the incident, use service rollback instead of full rollback.
Always preview before executing. The rollback diff shows exactly what will change — review it even under time pressure.
Use break-glass responsibly. It exists for real emergencies. Post-incident, review all break-glass actions.
Document justifications. Rollback justifications are invaluable during post-incident reviews and compliance audits.
Monitor after rollback. A successful rollback commit does not guarantee a healthy cluster. Verify sync status, pod health, and drift.
Keep deployment history clean. Frequent rollbacks to the same revision indicate a recurring issue that needs root-cause analysis, not repeated rollbacks.
