
End-to-End Scenarios

Cross-module product flows that demonstrate complete workflows from setup through deployment to recovery.

How To Use This Page

Each scenario walks through a complete cross-module workflow with concrete steps, expected outcomes, and the roles involved. Use these as reference blueprints when designing team processes, onboarding new members, or troubleshooting handoff gaps between teams.

Scenario 1: New Project Bootstrap

Goal: Go from an empty organization to a fully deployed service on a live cluster.

Roles involved: Platform Engineer, DevOps Engineer, Developer

Create the Organization — The Platform Engineer creates a new organization in GitOpsHQ, setting the org name, billing contact, and default timezone. This establishes the top-level security boundary for all resources.

Define Environments — Navigate to Settings → Environments and create environments that map to your delivery stages. A typical setup includes development, staging, and production. Each environment gets its own color badge and ordering rank for the promotion pipeline.

Register a Cluster — Go to Clusters → Register and provide a cluster name and the target Kubernetes context. GitOpsHQ generates a unique registration token and a one-line install command for the agent.

Install the GitOpsHQ Agent — On the target cluster, run the Helm install command provided during registration. The agent establishes a secure outbound gRPC connection to the GitOpsHQ hub and begins heartbeat reporting. Verify the cluster status shows Connected in the dashboard within 60 seconds.

helm upgrade --install gitopshq-agent \
  oci://ghcr.io/gitopshq-io/charts/gitopshq-agent \
  --namespace gitopshq-system --create-namespace \
  --set registrationToken="ghqa_reg_..." \
  --set hub.address="agent.gitopshq.io:50051"

Create a Project — The DevOps Engineer navigates to Projects → New Project, enters a project name (e.g., payment-service), and selects the environments this project will deploy to. GitOpsHQ initializes a Git repository structure for the project.

Initialize the Repository — Using the built-in Git editor or by pushing to the project's Git remote, create the base directory structure. GitOpsHQ expects a structure like:

environments/
  development/
    values.yaml
  staging/
    values.yaml
  production/
    values.yaml
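
The layout above can be scaffolded locally before the first push. Environment names follow the setup from earlier in this scenario; the `replicaCount` line is only a placeholder value so the files are not empty:

```shell
# Create the expected repo layout with one values.yaml per environment.
for env in development staging production; do
  mkdir -p "repo/environments/$env"
  printf 'replicaCount: 1\n' > "repo/environments/$env/values.yaml"
done
# List what was created, in sorted order.
find repo -type f | sort
```

Commit and push the resulting tree to the project's Git remote as usual.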

Add a Helm Chart to the Registry — Navigate to Registry → Charts → Create and upload your application's Helm chart or reference an OCI chart. Publish as version 1.0.0. This chart becomes available for any project in the organization.

Create a Workload — In the project, go to Workloads → New Workload and select the chart from the registry. Name the workload (e.g., payment-api), choose the chart version, and specify the target namespace in the cluster.

Link Tenant — Navigate to Tenants and bind the workload to a tenant. Tenants represent isolated deployment targets (e.g., a specific customer or region). Map the tenant to the cluster and namespace where it should deploy.

Configure Values — Open the workload's values editor and set environment-specific overrides. Use HQ Variables for values that change across environments (e.g., {{hq.var.DATABASE_HOST}}). The editor provides real-time YAML validation and diff preview.
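
As a rough illustration of how a placeholder resolves, the sketch below simulates the substitution with `sed`. The hostname `db.staging.internal` is a made-up value; the real rendering is performed by GitOpsHQ at release time:

```shell
# A values file containing an HQ Variable placeholder.
cat > values.yaml <<'EOF'
database:
  host: "{{hq.var.DATABASE_HOST}}"
EOF

# Simulate per-environment resolution: the placeholder is replaced by
# that environment's configured value when manifests are rendered.
sed 's/{{hq\.var\.DATABASE_HOST}}/db.staging.internal/' values.yaml
```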

Create a Release — Go to Releases → New Release, select the workload changes to include, add a descriptive title like Initial deployment of payment-api v1.0.0, and submit. If approval policies are configured, the release enters the approval queue.

Deploy and Verify — Once approved (or immediately if no approval is required), the release triggers the delivery generator. GitOpsHQ commits the rendered manifests to the GitOps repository and the cluster agent syncs the changes. Monitor the deployment status in the release timeline until all resources show Healthy.

| Step | Module Used | Outcome |
| --- | --- | --- |
| Create Org | Organization | Security boundary established |
| Define Environments | Environments | Promotion stages configured |
| Register Cluster | Clusters | Cluster token generated |
| Install Agent | Clusters / Agent | Agent connected and reporting |
| Create Project | Projects | Git repository initialized |
| Initialize Repo | Git Editor | Base structure committed |
| Add Chart | Registry | Chart v1.0.0 published |
| Create Workload | Workloads | Chart-to-namespace binding created |
| Link Tenant | Tenants | Deployment target mapped |
| Configure Values | Values / HQ Variables | Environment overrides set |
| Create Release | Releases | Change package submitted |
| Deploy | Delivery Generator | Manifests rendered and synced |

Scenario 2: Multi-Tenant Rollout

Goal: Deploy a new feature across multiple tenants using a staged canary-to-full rollout strategy.

Roles involved: DevOps Engineer, SRE

Identify Change Scope — The DevOps Engineer updates the workload values for the new feature. This might involve bumping the image tag to v2.3.0, adding a new environment variable for a feature flag, and adjusting resource limits. All changes are made in the values editor with diff preview enabled.

Update Canary Tenant Values — Select the canary tenant (e.g., tenant-alpha) and apply the updated values only to this tenant. Use the environment compare view to confirm the canary tenant's values differ from the remaining tenants.

Create a Canary Release — Create a release scoped to the canary tenant only. Title it descriptively: Canary: payment-api v2.3.0 → tenant-alpha. Attach release notes describing the feature and any monitoring instructions.

Approve the Canary Release — The SRE reviews the release diff, confirms the blast radius is limited to tenant-alpha, and approves. The approval record captures the approver, timestamp, and any comments.

Deploy to Canary and Monitor — The release deploys to the canary tenant. Monitor for 30–60 minutes using the dashboard signals: pod health, error rates, and resource utilization. Check the audit log for any drift or unexpected state changes.

Promote to Next Batch — Once the canary is verified healthy, use the Promotion workflow to push the same changes to the next batch of tenants (e.g., tenant-beta, tenant-gamma). The promotion pipeline respects the environment ordering and any approval gates configured for each stage.
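
The "verify, then promote" gate can be sketched as a script. Here `fetch_error_rate` is a hypothetical stand-in for your monitoring query, and the 1.0% threshold is illustrative; the actual promotion is executed through the GitOpsHQ Promotion workflow:

```shell
# Stand-in for a real monitoring query (e.g., an error-rate metric).
fetch_error_rate() { echo "0.4"; }   # percent, stubbed for the sketch

rate=$(fetch_error_rate)
threshold="1.0"

# Promote only when the canary's error rate is below the threshold.
if awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r+0 < t+0) }'; then
  echo "canary healthy (${rate}% < ${threshold}%): promote next batch"
else
  echo "canary degraded (${rate}%): pause promotion"
fi
```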

Batch Approval and Verification — Each batch passes through its own approval gate. The SRE monitors each batch deployment, verifying health signals before allowing the next batch to proceed. If any batch shows degradation, the promotion is paused.

Full Rollout — After all batches are verified, promote the changes to the remaining tenants. The release timeline shows the complete progression from canary through each batch to full rollout, with timestamps and approver details at every stage.


Scenario 3: Incident Response and Recovery

Goal: Detect a production issue introduced by a recent deployment, investigate the root cause, execute a rollback, and document the incident.

Roles involved: SRE, DevOps Engineer

Dashboard Alerts Drift — The SRE notices the dashboard fleet health panel showing a Drift Detected badge on the production environment for checkout-service. The drift indicator shows the live cluster state has diverged from the desired state in the GitOps repository.

Investigate Cluster Status — Navigate to Clusters → [cluster-name] and check the agent status. The cluster shows Connected but the workload sync status shows Degraded. The sync detail reveals that 2 of 5 pods are in CrashLoopBackOff.

View Pod Logs — Use the built-in diagnostics panel to stream logs from the failing pods. The logs reveal a NullReferenceException on startup caused by a missing configuration key that was removed in the latest deployment.

Correlate with Recent Deployment — Open the Audit Log and filter by checkout-service in the last 2 hours. The log shows a release was deployed 45 minutes ago that removed the CACHE_ENDPOINT environment variable. This is the root cause.

Identify the Bad Release — Navigate to Releases and find the offending release. View its diff to confirm the CACHE_ENDPOINT removal. Note the release ID and the commit SHA for the rollback target.

Execute Rollback — Go to Rollback → New Rollback and select the target: roll back checkout-service to the commit immediately before the offending release. Choose Service-scoped rollback to avoid affecting other workloads. Preview the rollback diff to confirm the CACHE_ENDPOINT variable will be restored.
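
Conceptually, the rollback target is the parent of the bad release's commit. The sketch below demonstrates that selection with plain git against a throwaway repository; the release titles mirror this scenario, and GitOpsHQ performs the equivalent selection for you:

```shell
# Build a tiny repo with a good release followed by a bad one.
git init -q demo && cd demo
git -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "REL-2846: good"
git -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "REL-2847: bad (removed CACHE_ENDPOINT)"

# The rollback target is the commit immediately before the bad release.
bad=$(git rev-parse HEAD)
target=$(git rev-parse "${bad}^")
git log --format=%s -1 "$target"
```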

Approve and Apply Rollback — If production rollbacks require approval (recommended), the SRE submits the rollback request. The on-call approver reviews and approves within the SLA. The rollback commits the previous state to the GitOps repository, and the agent syncs the cluster.

Verify Recovery — Monitor the deployment status. Within 2–3 minutes, the pods should transition from CrashLoopBackOff to Running. The drift indicator should clear, and the dashboard should return to Healthy status.

Document in Audit — The rollback is automatically recorded in the audit log with full traceability: who triggered it, who approved it, what changed, and the before/after state. Add a post-incident note linking to the root cause and the preventive action (e.g., add a validation rule that prevents removing required environment variables).

| Phase | Time | Action | Tool |
| --- | --- | --- | --- |
| Detection | T+0 | Drift alert on dashboard | Dashboard |
| Investigation | T+5m | View pod logs and audit trail | Diagnostics, Audit Log |
| Root Cause | T+10m | Identify bad release | Releases |
| Containment | T+12m | Submit rollback request | Rollback |
| Approval | T+15m | Approve rollback | Approvals |
| Recovery | T+18m | Cluster syncs rolled-back state | Agent / Delivery |
| Verification | T+20m | All pods healthy, drift cleared | Dashboard |
| Documentation | T+30m | Incident report filed | Audit Log |

Scenario 4: Policy-Driven Governance Setup

Goal: Configure OPA policies to enforce deployment standards and set up approval rules that gate production changes.

Roles involved: Platform Engineer

Create an OPA Policy — Navigate to Policies → OPA → New Policy. Write a Rego policy that enforces a specific standard. For example, require all production workloads to have resource limits defined:

package gitopshq.workload

deny[msg] {
  input.environment == "production"
  container := input.spec.containers[_]
  not container.resources.limits
  msg := sprintf("Container '%s' in production must have resource limits", [container.name])
}

Test in the OPA Playground — Use the built-in OPA Workbench to test the policy against sample inputs. Provide a test input with a container missing resource limits and verify the policy produces the expected denial message. Then test with a compliant input to verify it passes.
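
Before pasting into the Workbench, you can sanity-check the rule's logic locally. The sketch below writes a non-compliant sample input (its shape is an assumption based on the policy above) and approximates the missing-limits check with `grep`; the Workbench runs the actual Rego evaluation:

```shell
# Sample input: a production container with no resources.limits block.
cat > input.json <<'EOF'
{
  "environment": "production",
  "spec": {"containers": [{"name": "api", "image": "api:2.3.0"}]}
}
EOF

# Rough approximation of the deny rule: flag inputs lacking limits.
if ! grep -q '"limits"' input.json; then
  echo "deny: container in production must have resource limits"
fi
```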

Assign Policy to Environments — Bind the policy to the production and staging environments. Choose the enforcement mode: Warn for staging (shows violations but does not block) and Enforce for production (blocks non-compliant releases).

Create an Approval Policy — Navigate to Policies → Approvals → New Policy. Configure a policy requiring at least 2 approvals from different teams for any production release. Set the quorum to 2, enable the "distinct teams" constraint, and disable self-approval.

Bind Approval Policy to Production — Attach the approval policy to the production environment. Any release targeting production will now require 2 approvals from different teams before it can execute.
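
The quorum rule can be illustrated with a small script. Approver and team names are invented, and the real evaluation is done by GitOpsHQ's approval engine; the sketch shows how self-approval is excluded and only distinct teams count toward the quorum:

```shell
# Approvals as user:team pairs; carol authored the release.
approvals="alice:backend-team bob:backend-team dana:sre-team"
author="carol"

teams=""
count=0
for a in $approvals; do
  user=${a%%:*}; team=${a##*:}
  # Self-approval is disabled by the policy.
  if [ "$user" = "$author" ]; then
    echo "skip: self-approval by $user not allowed"
    continue
  fi
  # Each team may contribute at most one approval toward the quorum.
  case " $teams " in *" $team "*) continue ;; esac
  teams="$teams $team"
  count=$((count + 1))
done

if [ "$count" -ge 2 ]; then
  echo "quorum met: $count distinct teams"
else
  echo "quorum not met: $count of 2"
fi
```

With the sample input, bob's approval is ignored (same team as alice), and alice plus dana satisfy the two-team quorum.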

Test with a Non-Compliant Deployment — Have a developer create a test release that deploys a workload to production without resource limits. The OPA policy evaluation should produce a violation. The release detail page shows the policy violation with a clear explanation of what needs to change.

Resolve the Violation — Update the workload values to include resource limits for all containers. Re-run the policy evaluation (or create a new release) and verify the violation is cleared. The release can now proceed to the approval queue.

Verify End-to-End Governance — The release enters the approval queue, requiring 2 approvals from different teams. Once approved, it deploys successfully. The audit log records the policy evaluation results, the approval chain, and the deployment outcome in a single traceable flow.


Scenario 5: CI/CD Pipeline Integration

Goal: Connect an external CI system (GitHub Actions, GitLab CI, Jenkins) to GitOpsHQ so that image builds automatically trigger deployments.

Roles involved: DevOps Engineer, Platform Engineer

Create a Service Account — Navigate to Settings → Service Accounts → New. Create a service account named ci-deployer and assign it the Release Creator role scoped to the target project. Service accounts are non-human identities designed for automation.

Generate an API Token — For the ci-deployer service account, generate a long-lived API token. Copy the token immediately as it will not be shown again. Store it as a secret in your CI system (e.g., GITOPSHQ_API_TOKEN in GitHub Actions secrets).

Configure Webhook Endpoint in CI — In your CI pipeline, add a step that calls the GitOpsHQ API after a successful image build. The API call updates the image tag in the workload values:

# GitHub Actions example
- name: Update GitOpsHQ
  run: |
    curl -X POST https://api.gitopshq.io/v1/projects/$PROJECT_ID/workloads/$WORKLOAD_ID/values \
      -H "Authorization: Bearer ${{ secrets.GITOPSHQ_API_TOKEN }}" \
      -H "Content-Type: application/json" \
      -d '{"path": "image.tag", "value": "${{ github.sha }}"}'

Set Up Image Update Webhook — Alternatively, configure a webhook in GitOpsHQ that listens for container registry events. Navigate to Settings → Webhooks → New and configure the webhook to trigger when a new image tag matching a pattern (e.g., v*.*.*) is pushed to your registry.
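
The tag pattern behaves like a shell glob. A quick sketch of which tags a `v*.*.*` filter would accept, assuming glob semantics (the exact matching rules in GitOpsHQ may differ, e.g., a stricter semver regex):

```shell
# Glob-style match of image tags against the webhook pattern v*.*.*
matches() { case "$1" in v*.*.*) echo yes ;; *) echo no ;; esac; }

matches v2.4.0       # yes
matches latest       # no
matches v2.4.0-rc1   # yes (globs are loose; tighten with a regex if needed)
```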

CI Builds and Pushes Image — A developer merges a pull request. The CI pipeline triggers, builds the container image, tags it as v2.4.0, and pushes it to the container registry.

Webhook Triggers Value Update — The registry webhook fires, and GitOpsHQ receives the event. It automatically updates the workload's image tag from v2.3.0 to v2.4.0 in the values file.

Release Auto-Created — If auto-release is configured for the project, GitOpsHQ automatically creates a release with the title Auto: image update to v2.4.0. The release includes the diff showing only the image tag change.

Approval Gate — For production environments, the auto-created release still passes through the configured approval policy. The approval request includes the CI build link, commit SHA, and the automated change context.

Deploy and Notify — Once approved, the release deploys. The webhook notification system sends a Slack/Teams message with the deployment status, linking back to the release in GitOpsHQ and the CI build that triggered it.


Scenario 6: Registry Catalog Management

Goal: Create, version, and distribute reusable Helm charts across the organization so that multiple projects can consume standardized templates.

Roles involved: Platform Engineer, DevOps Engineer

Create an Organization Helm Chart — Navigate to Registry → Charts → New Chart. Enter the chart name (e.g., microservice-base), a description, and the chart type (Helm). This creates an empty chart entry in the organization's registry.

Edit Chart Files — Use the built-in chart editor to define the chart structure. Create Chart.yaml, values.yaml, and template files. The microservice-base chart might include templates for a Deployment, Service, HPA, and Ingress with sensible defaults:

# Chart.yaml
apiVersion: v2
name: microservice-base
description: Standard microservice chart with deployment, service, HPA, and ingress
version: 1.0.0

Publish Version 1.0.0 — Once the chart files are ready, publish the chart as version 1.0.0. GitOpsHQ validates the chart structure, runs lint checks, and stores it in the OCI-compliant registry. The chart is now available to all projects in the organization.

Create a Project Workload from the Chart — In any project, create a new workload and select microservice-base@1.0.0 from the registry. Override the default values for the specific service (image, replicas, resource limits). The workload inherits the chart's template structure.

Update the Chart — The Platform Engineer identifies an improvement: add a PodDisruptionBudget template to the chart. Edit the chart files, add the PDB template, and update the values.yaml with a new pdb.enabled flag defaulting to true.

Publish Version 1.1.0 — Publish the updated chart as version 1.1.0. The registry shows both versions with a changelog diff between 1.0.0 and 1.1.0.

Upgrade Workloads Across Projects — Navigate to each project's workload and update the chart version from 1.0.0 to 1.1.0. The diff preview shows what the upgrade will change in the rendered output. Projects can upgrade at their own pace — the old version remains available.

Track Chart Adoption — The registry dashboard shows which projects and workloads are using each chart version, making it easy to identify consumers that haven't upgraded and to plan deprecation of older versions.

| Chart Version | Changes | Consumers |
| --- | --- | --- |
| 1.0.0 | Initial release: Deployment, Service, HPA, Ingress | 12 workloads |
| 1.1.0 | Added PodDisruptionBudget template | 8 workloads (upgraded) |
| 1.0.0 (deprecated) | | 4 workloads (pending upgrade) |

Scenario 7: Team Onboarding with RBAC

Goal: Set up a new team with appropriate roles and permissions, ensuring least-privilege access across projects and environments.

Roles involved: Platform Engineer, Team Lead

Create Teams — Navigate to Settings → Teams → New Team. Create teams that mirror your organizational structure. For example: platform-team, backend-team, frontend-team, sre-team. Each team is a named group that can be referenced in permission grants and approval policies.

Invite Members — For each team, invite members by email. GitOpsHQ sends invitation emails with a secure registration link. Team members can also be provisioned via SSO/OIDC integration if configured.

Assign Organization Roles — Assign each team an organization-level role that defines their baseline access:

| Team | Org Role | Access Level |
| --- | --- | --- |
| platform-team | Admin | Full org management |
| backend-team | Member | Project access via grants |
| frontend-team | Member | Project access via grants |
| sre-team | Operator | Cluster + runtime access |

Create Resource Grants — Use Settings → Resource Grants to assign fine-grained permissions. Grant the backend-team write access to backend projects and read-only access to infrastructure projects. Grant frontend-team write access only to frontend projects.

backend-team → payment-service (write)
backend-team → order-service (write)
backend-team → infra-configs (read)
frontend-team → web-app (write)
frontend-team → mobile-bff (write)
sre-team → all-projects (read + rollback)
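
The grant list above can be read as team:project:action triples. A toy lookup illustrating the allow/deny outcomes (it ignores role inheritance and collapses the sre-team wildcard for simplicity; GitOpsHQ evaluates the real grants):

```shell
# Grants encoded as team:project:action, mirroring the list above.
grants="backend-team:payment-service:write backend-team:order-service:write \
backend-team:infra-configs:read frontend-team:web-app:write \
frontend-team:mobile-bff:write"

# can TEAM PROJECT ACTION -> allow | deny
can() {
  for g in $grants; do
    [ "$g" = "$1:$2:$3" ] && { echo allow; return; }
  done
  echo deny
}

can backend-team payment-service write    # allow
can frontend-team payment-service write   # deny (cross-team boundary)
```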

Configure Project-Level Permissions — Within each project, fine-tune permissions. For example, in the payment-service project, grant the backend-team release creation rights for development and staging, but require SRE approval for production.

Set Up Service Accounts for CI — Create service accounts for each team's CI pipeline. The backend-team CI gets a service account with release creation rights scoped to backend projects only. This ensures CI automation respects the same permission boundaries as human users.

Verify Access with Audit Log — Have a team member from each team perform a test action (e.g., view a project, create a draft release). Check the audit log to verify that access was granted or denied correctly. Confirm that cross-team boundary violations are blocked.

Document the RBAC Model — Record the team structure, role assignments, and permission grants in your organization's runbook. GitOpsHQ's audit log provides a continuous record of who accessed what, but having an explicit RBAC design document helps with periodic access reviews.


Scenario 8: Break-Glass Emergency Procedure

Goal: Handle a critical production incident when normal deployment paths are blocked by an environment freeze or approval bottleneck.

Roles involved: SRE, Platform Engineer (post-incident)

Break-Glass Is for Emergencies Only

Break-glass sessions bypass normal approval gates and are fully audited. Every action taken during a break-glass session is recorded with the justification, actor, timestamp, and scope. Misuse of break-glass should be reviewed in post-incident analysis.

Critical Alert Received — The SRE on-call receives a P1 alert: the checkout-service in production is returning 500 errors at a 40% rate. The monitoring dashboard confirms the issue started 10 minutes ago, correlating with a deployment that completed 15 minutes ago.

Normal Deploy Blocked by Freeze — The SRE attempts to create a rollback release, but production is currently under an environment freeze for a planned maintenance window. The UI shows: "Production environment is frozen. Releases are blocked until the freeze is lifted."

Initiate Break-Glass Session — Navigate to Emergency → Break-Glass → New Session. Select the scope: production environment, checkout-service workload. The break-glass form requires a mandatory justification and a severity classification.

Provide Justification — Enter a detailed justification: "P1 incident: checkout-service returning 500 errors at 40% rate since 14:35 UTC. Root cause identified as bad configuration in release REL-2847. Need to rollback to REL-2846 immediately. Normal path blocked by maintenance freeze." Attach the alert link for context.

Execute Emergency Rollback — With the break-glass session active, the freeze restriction is temporarily lifted for the specified scope. Create and immediately execute a rollback to the previous known-good release. The UI shows a prominent banner: "Break-Glass Session Active — All actions are being recorded."
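
The scope check behind this behavior can be sketched as: a frozen environment blocks requests unless an active break-glass session covers the same scope. The scope strings below are illustrative; the real check is enforced by GitOpsHQ:

```shell
# State: production is frozen, and a break-glass session is open for
# exactly the environment/workload pair named in the incident.
freeze_active=true
bg_session_scope="production/checkout-service"
request_scope="production/checkout-service"

if [ "$freeze_active" = true ] && [ "$bg_session_scope" != "$request_scope" ]; then
  echo "blocked: production environment is frozen"
else
  echo "allowed: active break-glass session covers $request_scope"
fi
```

A rollback request for any other workload would still be blocked by the freeze, keeping the bypass limited to the declared scope.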

Verify the Fix — Monitor the deployment. Within 3 minutes, the rollback completes and the 500 error rate drops to baseline. Verify pod health, check a sample of requests, and confirm the service is operating normally.

Close the Break-Glass Session — Navigate back to the break-glass session and close it. Enter a closing note: "Rollback to REL-2846 successful. Error rate returned to baseline. checkout-service healthy." The session duration, all actions taken, and the scope are sealed in the audit record.

Review the Audit Trail — The Platform Engineer reviews the break-glass audit trail the next business day. The audit shows exactly what was done, by whom, why, and what the outcome was. This review is part of the standard post-incident process.

Post-Incident Documentation — File a post-incident report that includes: the root cause (bad configuration in REL-2847), the detection time, the resolution time, the break-glass justification, and preventive actions. Recommended preventive actions might include: adding a pre-deploy validation for the configuration key, and reviewing the freeze policy to allow emergency rollbacks without break-glass.

| Phase | Time | Action | Break-Glass? |
| --- | --- | --- | --- |
| Alert | T+0 | P1 alert received | No |
| Blocked | T+2m | Freeze blocks rollback | No |
| Escalation | T+3m | Break-glass session initiated | Yes — session opened |
| Rollback | T+5m | Emergency rollback executed | Yes — within session |
| Recovery | T+8m | Service restored | Yes — within session |
| Close | T+10m | Break-glass session closed | Session sealed |
| Review | T+24h | Audit trail reviewed | Post-incident |

Cross-Scenario Access and Approval Checklist

Use this checklist when designing any cross-module workflow to ensure access controls and approval gates are properly configured.

  • Can the user view this module or action given their role?
  • Can the user request an action but not execute it directly?
  • Is the action blocked by a freeze, stale source, or missing prerequisite?
  • Is the blocking reason displayed with an actionable next step?
  • Does the UI differentiate between "not permitted" and "permitted but gated"?
  • Is the request status clearly shown as pending, approved, rejected, cancelled, executed, or failed?
  • Is quorum progress visible (e.g., "1 of 2 approvals received")?
  • Is the distinct-team constraint reflected (e.g., "Needs approval from a different team")?
  • Is self-approval blocked when the policy requires it?
  • Is break-glass state visually distinct and time-scoped?
  • Is the cluster/agent available for the intended operation?
  • Are diagnostic constraints (log tail limits, time ranges) visible before execution?
  • Is post-execution verification linked to the delivery and audit context?
  • Can operators distinguish between authorization failures and runtime failures?
  • Is the rollback path clearly available when a deployment fails?
