Platform Playbook

Governance and infrastructure guide for platform engineers managing organizations, RBAC, clusters, registries, and policies in GitOpsHQ.

Your Role in GitOpsHQ

As a Platform Engineer, you are the architect of your organization's GitOps foundation. You design the security model, manage cluster infrastructure, curate reusable artifacts, author governance policies, and establish the conventions that every other role follows. Your decisions determine the safety, consistency, and velocity of the entire delivery pipeline.

You set the guardrails — DevOps, Developers, and SREs operate within the boundaries you define.

Your Access Scope

You typically hold the Admin role at the organization level, giving you full access to all settings, policies, clusters, and registry operations. Use this power carefully — follow least-privilege principles when granting access to others.

Organization Governance

Setting Up a New Organization

Create the Organization — Navigate to Organizations → New. Choose a name that reflects the business unit or product line (e.g., acme-payments). Configure the default timezone, billing contact, and notification preferences.

Define Environments — Go to Settings → Environments and create the delivery stages your organization uses. The most common pattern is development → staging → production, but you can add more (e.g., qa, performance, canary). Set the ordering rank to define the promotion sequence.

Establish Environment Policies — For each environment, decide on the governance controls:

| Environment | Approval Required | Freeze Support | OPA Enforcement |
|---|---|---|---|
| Development | No | No | Warn only |
| Staging | Optional (1 approver) | Yes | Warn only |
| Production | Required (2 approvers, distinct teams) | Yes | Enforce |

Configure Notification Channels — Set up Slack, Microsoft Teams, or webhook integrations for deployment notifications, approval requests, and alerts. Route different event types to appropriate channels.
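
The environment ordering and per-environment controls described above can be modeled in a few lines. This is a minimal sketch, not GitOpsHQ's data model: the `Environment` class, its field names, and `next_environment` are hypothetical, and staging's optional approval is simplified to a count of 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Environment:
    name: str
    rank: int             # position in the promotion sequence (lower promotes first)
    approvers: int        # approvals needed before a deployment proceeds
    distinct_teams: bool  # whether approvers must come from different teams
    freeze_support: bool
    opa_mode: str         # "warn" or "enforce"


# Values taken from the governance table; the structure itself is an assumption.
ENVIRONMENTS = [
    Environment("development", 1, 0, False, False, "warn"),
    Environment("staging", 2, 1, False, True, "warn"),
    Environment("production", 3, 2, True, True, "enforce"),
]


def next_environment(current: str, envs=ENVIRONMENTS) -> Optional[Environment]:
    """Return the environment that follows `current` in rank order, or None."""
    ordered = sorted(envs, key=lambda e: e.rank)
    names = [e.name for e in ordered]
    idx = names.index(current)  # raises ValueError for an unknown environment
    return ordered[idx + 1] if idx + 1 < len(ordered) else None
```

Keeping the rank explicit rather than relying on list position means an extra stage such as qa can be added later without reordering existing entries.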

RBAC Setup

A well-designed RBAC model balances security with developer productivity. The goal is least-privilege access that does not create friction for routine operations.

Role Hierarchy

| Role | Scope | Typical Holder | Capabilities |
|---|---|---|---|
| Admin | Organization | Platform Engineer | Full access to all settings, policies, clusters, and registry |
| Operator | Organization | SRE | Cluster management, diagnostics, rollback, break-glass |
| Member | Organization | Developer, DevOps | Project access governed by resource grants |

Team Design Principles

  • Mirror your organizational structure — Create teams that match real reporting lines or functional groups. This makes approval policies meaningful (e.g., "requires approval from a different team").
  • Keep teams between 3–10 members — Too small and approval routing becomes a bottleneck. Too large and accountability is diluted.
  • Use service accounts for automation — Never share human credentials with CI/CD systems. Create dedicated service accounts with scoped tokens.

Resource Grant Patterns

Strict Model — Each team has write access only to their own projects. Cross-project access is read-only. Production access requires explicit grants.

backend-team → backend-projects (write)
backend-team → all-other-projects (read)
frontend-team → frontend-projects (write)
frontend-team → all-other-projects (read)
sre-team → all-projects (read + rollback)

Best for: regulated industries, large organizations, compliance-sensitive environments.
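
One way to reason about the strict model is as a lookup table from team to project group to permitted actions. The sketch below is illustrative only: `STRICT_GRANTS`, `is_allowed`, and the `"*"` wildcard standing in for "all other projects" are assumptions, not GitOpsHQ's grant syntax, and write access is assumed to include read.

```python
# Hypothetical representation of the strict-model grants listed above.
STRICT_GRANTS = {
    "backend-team": {"backend-projects": {"read", "write"}, "*": {"read"}},
    "frontend-team": {"frontend-projects": {"read", "write"}, "*": {"read"}},
    "sre-team": {"*": {"read", "rollback"}},
}


def is_allowed(team, project_group, action, grants=STRICT_GRANTS):
    """Return True if `team` may perform `action` on `project_group`."""
    team_grants = grants.get(team, {})
    # An exact grant for the project group takes precedence over the wildcard.
    permissions = team_grants.get(project_group, team_grants.get("*", set()))
    return action in permissions
```

Teams with no entry get an empty permission set, so access is denied by default, which matches the least-privilege goal stated earlier.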

Collaborative Model — Teams have write access to related projects and broader read access. Production still requires approval gates.

product-team → product-projects (write, all envs)
product-team → shared-infra (read)
platform-team → all-projects (admin)

Best for: startups, small teams, fast-moving product organizations.

Cluster Management

Registration Workflow

  1. Register the cluster in GitOpsHQ with a name and description. Choose a name that reflects the cluster's purpose and region (e.g., prod-us-east-1, staging-eu-west).
  2. Install the agent using the generated Helm command. The agent runs in a dedicated namespace (gitopshq-system) and establishes a secure outbound connection.
  3. Verify connectivity: the cluster should show as Connected within 60 seconds. If it does not, check network egress rules and agent logs.
  4. Map to environments: assign the cluster to one or more environments. A cluster can serve multiple environments (via namespace isolation) or be dedicated to a single environment.

Cluster Health Monitoring

Check these signals regularly:

| Signal | Healthy | Investigate |
|---|---|---|
| Agent heartbeat | Last seen < 60s ago | Last seen > 5 minutes ago |
| Sync status | All workloads synced | Any workload out of sync |
| Resource capacity | CPU/memory below 80% | Consistently above 85% |
| Drift count | Zero | Any non-zero count |
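
The thresholds in the table can be turned into a simple classifier for a scheduled health check. This is a sketch under stated assumptions: `ClusterSignals` and `classify` are hypothetical names, the intermediate "watch" state for values between the two thresholds is an addition, and a single sample stands in for "consistently above 85%", which a real check would evaluate over a time window.

```python
from dataclasses import dataclass


@dataclass
class ClusterSignals:
    heartbeat_age_s: float       # seconds since the agent was last seen
    out_of_sync_workloads: int
    cpu_pct: float
    memory_pct: float
    drift_count: int


def _band(value, healthy_below, investigate_above):
    """Map a numeric signal onto healthy / watch / investigate."""
    if value < healthy_below:
        return "healthy"
    if value > investigate_above:
        return "investigate"
    return "watch"  # between thresholds: not in the table, assumed here


def classify(s: ClusterSignals) -> dict:
    """Return a status for each signal in the health table."""
    return {
        "agent_heartbeat": _band(s.heartbeat_age_s, 60, 300),
        "sync_status": "healthy" if s.out_of_sync_workloads == 0 else "investigate",
        # Use the more constrained of CPU and memory.
        "resource_capacity": _band(max(s.cpu_pct, s.memory_pct), 80, 85),
        "drift_count": "healthy" if s.drift_count == 0 else "investigate",
    }
```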

Registry Curation

The registry is your organization's library of reusable deployment templates. A well-curated registry eliminates configuration drift between projects and reduces onboarding time for new services.

Chart Governance

  • Naming convention — Use lowercase with hyphens: microservice-base, cronjob-template, statefulset-postgres.
  • Versioning — Follow semantic versioning strictly. Breaking template changes require a major version bump.
  • Deprecation — When retiring a chart version, mark it as deprecated in the registry. The adoption dashboard shows which projects still use it. Give teams a migration window before removal.
  • Validation — Every chart publish runs lint checks. Consider adding custom OPA policies that enforce chart standards (e.g., all charts must include health checks).
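
The naming and versioning rules above lend themselves to a pre-publish check. The following is a minimal sketch, not the registry's actual lint step: the function names are hypothetical, and the version pattern accepts only plain MAJOR.MINOR.PATCH, ignoring the pre-release and build metadata that full semantic versioning allows.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens.
NAME_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
# Simplified semantic version: MAJOR.MINOR.PATCH only.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def valid_chart_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def parse_version(version: str) -> tuple:
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def valid_bump(old: str, new: str, breaking: bool) -> bool:
    """A new version must be greater; breaking changes must raise MAJOR."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        return False
    if breaking:
        return new_v[0] > old_v[0]
    return True
```

For example, `valid_bump("1.4.2", "1.5.0", breaking=True)` is rejected because a breaking template change needs a major version bump.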

Chart Lifecycle

[Diagram: chart lifecycle]

Policy Design

OPA Policy Strategy

Design OPA policies in layers, from broad organizational standards to environment-specific rules:

  1. Organizational policies — Apply everywhere. Examples: all containers must have resource limits, all images must come from approved registries.
  2. Environment policies — Apply to specific environments. Examples: production replicas must be >= 2, staging must not use latest tag.
  3. Project policies — Apply to specific projects. Examples: this project must use a specific ingress class.
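
The layering can be illustrated without OPA itself. The sketch below uses plain Python functions rather than Rego, purely to show how organizational, environment, and project rules stack; the manifest shape, the policy function names, and `evaluate` are all hypothetical.

```python
def require_resource_limits(manifest):
    """Organizational rule: every container declares resource limits."""
    return [
        f"container {c['name']} has no resource limits"
        for c in manifest["containers"]
        if not c.get("limits")
    ]


def min_two_replicas(manifest):
    """Production rule: at least two replicas."""
    return [] if manifest.get("replicas", 1) >= 2 else ["replicas must be >= 2"]


def no_latest_tag(manifest):
    """Staging rule: images must not use the latest tag."""
    return [
        f"container {c['name']} uses the latest tag"
        for c in manifest["containers"]
        if c["image"].endswith(":latest")
    ]


ORG_POLICIES = [require_resource_limits]
ENV_POLICIES = {"production": [min_two_replicas], "staging": [no_latest_tag]}
PROJECT_POLICIES = {}  # e.g. {"payments-api": [some_project_rule]}


def evaluate(manifest, environment, project):
    """Run the broad layers first, then the narrower ones; collect violations."""
    layers = (
        ORG_POLICIES
        + ENV_POLICIES.get(environment, [])
        + PROJECT_POLICIES.get(project, [])
    )
    violations = []
    for policy in layers:
        violations.extend(policy(manifest))
    return violations
```

Evaluating the broad layer first means a manifest that fails an organizational standard reports that violation in every environment, while environment rules only appear where they apply.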

Approval Policy Design

| Scenario | Recommended Policy |
|---|---|
| Development deployment | No approval required |
| Staging deployment | 1 approval from any team |
| Production deployment | 2 approvals from distinct teams |
| Production rollback | 1 approval (fast-path) |
| Break-glass session | Self-authorized with mandatory justification |
| Policy change | Admin approval required |
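
The "distinct teams" requirement is the part of these policies most easily misread, so a small quorum check may help. This is a sketch, not GitOpsHQ's approval engine: `quorum_met` is a hypothetical function, and excluding the change author from their own quorum is an assumption rather than documented behavior.

```python
def quorum_met(approvals, required, distinct_teams=False, author=None):
    """Check whether a set of approvals satisfies a quorum.

    approvals: iterable of (approver, team) pairs.
    required: number of approvals (or distinct teams) needed.
    """
    # Count each approver once and ignore approval by the change author.
    by_approver = {approver: team for approver, team in approvals if approver != author}
    if distinct_teams:
        return len(set(by_approver.values())) >= required
    return len(by_approver) >= required
```

Under this check, two approvals from the same team do not satisfy the production policy, while one approval from any team satisfies the staging and rollback policies.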

Environment Strategy

Multi-Environment Patterns

development → staging → production

Best for: most organizations. Simple, well-understood, sufficient for typical delivery workflows.

development → qa → performance → staging → production

Best for: organizations with dedicated QA or performance testing phases.

development → staging-tenant-A → staging-tenant-B → production

Best for: multi-tenant SaaS products where tenant-specific validation is required before production rollout.

Team Onboarding

When onboarding a new team to GitOpsHQ, follow this checklist:

  1. Create the team in Settings → Teams with an appropriate name.
  2. Invite members via email or SSO provisioning.
  3. Assign the organization role (typically Member for development teams).
  4. Create resource grants scoped to the projects the team will work on.
  5. Set up a service account for the team's CI pipeline.
  6. Walk through the Developer or DevOps playbook depending on the team's role.
  7. Verify access by having a team member perform a test action and checking the audit log.
  8. Document the team's access model in your organization's RBAC documentation.

Governance Anti-Patterns

Avoid these common mistakes:

| Anti-Pattern | Why It Is Harmful | Better Approach |
|---|---|---|
| Granting Admin to avoid designing RBAC | Eliminates accountability, creates security risk | Invest time in proper resource grants |
| Using break-glass for routine operations | Erodes trust in the emergency system | Fix the underlying permission or policy gap |
| Production without quorum requirements | Single point of failure for bad deployments | Require at least 2 approvals from distinct teams |
| No environment freeze capability | Cannot safely plan maintenance windows | Configure freeze support for staging and production |
| Stale service accounts | Unused tokens are security risks | Rotate tokens quarterly, remove unused accounts |
| Overly broad OPA policies | Teams work around policies instead of following them | Start narrow, expand based on real incidents |

Weekly Review Checklist

  • Review the audit log for unusual patterns (unexpected access, repeated failures)
  • Check service account token ages and rotate any older than 90 days
  • Review pending resource grant requests
  • Check registry chart adoption and nudge teams lagging on upgrades
  • Review approval queue latency — if consistently high, consider adjusting quorum or adding approvers
  • Verify cluster agent health across all registered clusters
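
The 90-day token check in this list is easy to script. The sketch below assumes you can export each service account's token creation time; the `tokens_to_rotate` function and the input mapping are hypothetical, not a GitOpsHQ API.

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_AGE = timedelta(days=90)


def tokens_to_rotate(tokens, now=None):
    """Return the names of service accounts whose token is older than 90 days.

    tokens: mapping of service-account name -> timezone-aware creation time.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name for name, created in tokens.items() if now - created > MAX_TOKEN_AGE
    )


# Example with made-up account names and dates.
example = {
    "ci-backend": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "ci-frontend": datetime(2024, 5, 1, tzinfo=timezone.utc),
}
stale = tokens_to_rotate(example, now=datetime(2024, 6, 1, tzinfo=timezone.utc))
```

Passing `now` explicitly keeps the check deterministic, which makes it straightforward to test alongside the rest of the weekly review tooling.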
