Platform Playbook
Governance and infrastructure guide for platform engineers managing organizations, RBAC, clusters, registries, and policies in GitOpsHQ.
Your Role in GitOpsHQ
As a Platform Engineer, you are the architect of your organization's GitOps foundation. You design the security model, manage cluster infrastructure, curate reusable artifacts, author governance policies, and establish the conventions that every other role follows. Your decisions determine the safety, consistency, and velocity of the entire delivery pipeline.
You set the guardrails — DevOps, Developers, and SREs operate within the boundaries you define.
Your Access Scope
You typically hold the Admin role at the organization level, giving you full access to all settings, policies, clusters, and registry operations. Use this power carefully — follow least-privilege principles when granting access to others.
Organization Governance
Setting Up a New Organization
Create the Organization — Navigate to Organizations → New. Choose a name that reflects the business unit or product line (e.g., acme-payments). Configure the default timezone, billing contact, and notification preferences.
Define Environments — Go to Settings → Environments and create the delivery stages your organization uses. The most common pattern is development → staging → production, but you can add more (e.g., qa, performance, canary). Set the ordering rank to define the promotion sequence.
Establish Environment Policies — For each environment, decide on the governance controls:
| Environment | Approval Required | Freeze Support | OPA Enforcement |
|---|---|---|---|
| Development | No | No | Warn only |
| Staging | Optional (1 approver) | Yes | Warn only |
| Production | Required (2 approvers, distinct teams) | Yes | Enforce |
Configure Notification Channels — Set up Slack, Microsoft Teams, or webhook integrations for deployment notifications, approval requests, and alerts. Route different event types to appropriate channels.
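The environment policy table above can be captured as plain data, which makes the controls easy to review and test. The following is a minimal sketch, not GitOpsHQ's actual configuration format: `EnvironmentPolicy`, `OpaMode`, `POLICIES`, and `deployment_blocked` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class OpaMode(Enum):
    WARN = "warn"        # violations are reported but do not block
    ENFORCE = "enforce"  # violations block the deployment


@dataclass(frozen=True)
class EnvironmentPolicy:
    min_approvals: int
    distinct_teams: bool
    freeze_supported: bool
    opa_mode: OpaMode


# Values mirror the table above. Staging approval is optional there;
# this sketch assumes it is switched on with one approver.
POLICIES = {
    "development": EnvironmentPolicy(0, False, False, OpaMode.WARN),
    "staging": EnvironmentPolicy(1, False, True, OpaMode.WARN),
    "production": EnvironmentPolicy(2, True, True, OpaMode.ENFORCE),
}


def deployment_blocked(env: str, opa_violations: int, frozen: bool = False) -> bool:
    """Return True when a freeze or an enforced OPA violation blocks the deploy."""
    policy = POLICIES[env]
    if frozen and policy.freeze_supported:
        return True
    return opa_violations > 0 and policy.opa_mode is OpaMode.ENFORCE
```

For example, `deployment_blocked("staging", 3)` is `False` because staging only warns, while the same call for `"production"` returns `True`.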
RBAC Setup
A well-designed RBAC model balances security with developer productivity. The goal is least-privilege access that does not create friction for routine operations.
Role Hierarchy
| Role | Scope | Typical Holder | Capabilities |
|---|---|---|---|
| Admin | Organization | Platform Engineer | Full access to all settings, policies, clusters, and registry |
| Operator | Organization | SRE | Cluster management, diagnostics, rollback, break-glass |
| Member | Organization | Developer, DevOps | Project access governed by resource grants |
Team Design Principles
- Mirror your organizational structure — Create teams that match real reporting lines or functional groups. This makes approval policies meaningful (e.g., "requires approval from a different team").
- Keep teams between 3–10 members — Too small and approval routing becomes a bottleneck. Too large and accountability is diluted.
- Use service accounts for automation — Never share human credentials with CI/CD systems. Create dedicated service accounts with scoped tokens.
Resource Grant Patterns
Strict Model — Each team has write access only to their own projects. Cross-project access is read-only. Production access requires explicit grants.
backend-team → backend-projects (write)
backend-team → all-other-projects (read)
frontend-team → frontend-projects (write)
frontend-team → all-other-projects (read)
sre-team → all-projects (read + rollback)
Best for: regulated industries, large organizations, compliance-sensitive environments.
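To make the strict model concrete, the grants listed above can be sketched as a lookup table with a wildcard fallback. This is an illustrative model only: the `GRANTS` structure, the `"*"` wildcard, and the `allowed` helper are assumptions, not GitOpsHQ's grant API. It also assumes that a write grant implies read.

```python
from typing import Dict, Set, Tuple

# (team, project group) -> permitted actions; "*" stands for "all other projects".
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("backend-team", "backend-projects"): {"read", "write"},
    ("backend-team", "*"): {"read"},
    ("frontend-team", "frontend-projects"): {"read", "write"},
    ("frontend-team", "*"): {"read"},
    ("sre-team", "*"): {"read", "rollback"},
}


def allowed(team: str, project_group: str, action: str) -> bool:
    """Check a specific grant first, then fall back to the team's wildcard grant."""
    specific = GRANTS.get((team, project_group), set())
    fallback = GRANTS.get((team, "*"), set())
    return action in (specific | fallback)
```

A team with no entry at all is denied everything, which keeps the default closed.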
Collaborative Model — Teams have write access to related projects and broader read access. Production still requires approval gates.
product-team → product-projects (write, all envs)
product-team → shared-infra (read)
platform-team → all-projects (admin)
Best for: startups, small teams, fast-moving product organizations.
Cluster Management
Registration Workflow
Choose a descriptive cluster name (e.g., prod-us-east-1, staging-eu-west).
The agent is installed into the cluster (gitopshq-system) and establishes a secure outbound connection.
Cluster Health Monitoring
Check these signals regularly:
| Signal | Healthy | Investigate |
|---|---|---|
| Agent heartbeat | Last seen < 60s ago | Last seen > 5 minutes ago |
| Sync status | All workloads synced | Any workload out of sync |
| Resource capacity | CPU/memory below 80% | Consistently above 85% |
| Drift count | Zero | Any non-zero count |
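The "Investigate" column of the table can be turned into a small check that runs against whatever metrics you already collect. The sketch below assumes you can obtain the heartbeat age, the out-of-sync workload count, a series of recent utilization percentages, and the drift count; the function name and parameters are hypothetical. "Consistently above 85%" is interpreted here as every recent sample exceeding 85.

```python
from datetime import timedelta
from typing import List, Sequence


def signals_to_investigate(
    heartbeat_age: timedelta,
    out_of_sync: int,
    utilization_samples: Sequence[float],
    drift_count: int,
) -> List[str]:
    """Return the names of the signals that cross the 'Investigate' thresholds."""
    findings: List[str] = []
    if heartbeat_age > timedelta(minutes=5):
        findings.append("agent heartbeat")
    if out_of_sync > 0:
        findings.append("sync status")
    # All samples above 85% counts as "consistently" high in this sketch.
    if utilization_samples and min(utilization_samples) > 85.0:
        findings.append("resource capacity")
    if drift_count > 0:
        findings.append("drift count")
    return findings
```

Values that fall between the two columns (for example, a heartbeat two minutes old) are not flagged by this check; how to treat that middle band is left to your alerting policy.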
Registry Curation
The registry is your organization's library of reusable deployment templates. A well-curated registry eliminates configuration drift between projects and reduces onboarding time for new services.
Chart Governance
- Naming convention — Use lowercase with hyphens: microservice-base, cronjob-template, statefulset-postgres.
- Versioning — Follow semantic versioning strictly. Breaking template changes require a major version bump.
- Deprecation — When retiring a chart version, mark it as deprecated in the registry. The adoption dashboard shows which projects still use it. Give teams a migration window before removal.
- Validation — Every chart publish runs lint checks. Consider adding custom OPA policies that enforce chart standards (e.g., all charts must include health checks).
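The naming and versioning rules above lend themselves to a pre-publish check. The following sketch is one possible way to express them and is not the registry's built-in lint: `valid_chart_name` and `publish_allowed` are hypothetical helpers, and the version parser handles only plain `MAJOR.MINOR.PATCH` strings (no pre-release or build metadata).

```python
import re
from typing import Tuple

CHART_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
CORE_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def valid_chart_name(name: str) -> bool:
    """Lowercase letters and digits, separated by single hyphens."""
    return CHART_NAME.fullmatch(name) is not None


def parse_version(version: str) -> Tuple[int, ...]:
    match = CORE_VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(part) for part in match.groups())


def publish_allowed(previous: str, candidate: str, breaking: bool) -> bool:
    """A new version must be higher; a breaking change must raise the major number."""
    old, new = parse_version(previous), parse_version(candidate)
    if new <= old:
        return False
    if breaking:
        return new[0] > old[0]
    return True
```

With these rules, publishing `1.5.0` after `1.4.2` is rejected when the change is marked breaking, and accepted as `2.0.0`.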
Chart Lifecycle
Policy Design
OPA Policy Strategy
Design OPA policies in layers, from broad organizational standards to environment-specific rules:
- Organizational policies — Apply everywhere. Examples: all containers must have resource limits, all images must come from approved registries.
- Environment policies — Apply to specific environments. Examples: production replicas must be >= 2, staging must not use latest tag.
- Project policies — Apply to specific projects. Examples: this project must use a specific ingress class.
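The layering described above can be illustrated with a small evaluator that runs organizational rules first, then environment rules, then project rules. Real OPA policies are written in Rego; this Python sketch only models the layering idea. The manifest shape (`replicas`, `containers`, `image`, `limits`) and all function names are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Optional

Manifest = Dict[str, Any]
Rule = Callable[[Manifest], Optional[str]]  # returns a violation message or None


def require_resource_limits(manifest: Manifest) -> Optional[str]:
    for container in manifest["containers"]:
        if not container.get("limits"):
            return f"container {container['name']} has no resource limits"
    return None


def min_two_replicas(manifest: Manifest) -> Optional[str]:
    if manifest["replicas"] < 2:
        return "replicas must be >= 2"
    return None


def no_latest_tag(manifest: Manifest) -> Optional[str]:
    # Simplification: an image with no tag at all is not detected here.
    for container in manifest["containers"]:
        if container["image"].endswith(":latest"):
            return f"container {container['name']} uses the latest tag"
    return None


ORG_RULES: List[Rule] = [require_resource_limits]
ENV_RULES: Dict[str, List[Rule]] = {
    "production": [min_two_replicas],
    "staging": [no_latest_tag],
}
PROJECT_RULES: Dict[str, List[Rule]] = {}  # e.g. an ingress-class rule per project


def evaluate(manifest: Manifest, environment: str, project: str) -> List[str]:
    """Apply the layers in order, broadest first, and collect violations."""
    rules = ORG_RULES + ENV_RULES.get(environment, []) + PROJECT_RULES.get(project, [])
    violations: List[str] = []
    for rule in rules:
        message = rule(manifest)
        if message is not None:
            violations.append(message)
    return violations
```

Keeping each layer in its own collection makes it clear which rules every deployment inherits and which ones are local exceptions.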
Approval Policy Design
| Scenario | Recommended Policy |
|---|---|
| Development deployment | No approval required |
| Staging deployment | 1 approval from any team |
| Production deployment | 2 approvals from distinct teams |
| Production rollback | 1 approval (fast-path) |
| Break-glass session | Self-authorized with mandatory justification |
| Policy change | Admin approval required |
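The quorum rules in the table reduce to counting unique approvers or, for production, unique approving teams. The sketch below shows that logic under stated assumptions: the `Approval` record, the scenario keys, and `quorum_met` are hypothetical and cover only the first four rows of the table.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Approval:
    approver: str
    team: str


# scenario -> (required approvals, must come from distinct teams)
RULES: Dict[str, Tuple[int, bool]] = {
    "development": (0, False),
    "staging": (1, False),
    "production": (2, True),
    "production-rollback": (1, False),
}


def quorum_met(scenario: str, approvals: Iterable[Approval]) -> bool:
    required, distinct_teams = RULES[scenario]
    # Collapse duplicate approvals from the same person.
    team_by_approver = {a.approver: a.team for a in approvals}
    if distinct_teams:
        return len(set(team_by_approver.values())) >= required
    return len(team_by_approver) >= required
```

Two approvals from the same team do not satisfy the production rule, which is the point of the distinct-teams requirement.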
Environment Strategy
Multi-Environment Patterns
development → staging → production
Best for: most organizations. Simple, well-understood, sufficient for typical delivery workflows.
development → qa → performance → staging → production
Best for: organizations with dedicated QA or performance testing phases.
development → staging-tenant-A → staging-tenant-B → production
Best for: multi-tenant SaaS products where tenant-specific validation is required before production rollout.
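Whichever pattern you choose, the promotion sequence follows from each environment's ordering rank. A minimal sketch of that relationship, assuming ranks are integers where a lower value means an earlier stage (the rank values and helper names are illustrative):

```python
from typing import Dict, List, Optional


def promotion_sequence(ranks: Dict[str, int]) -> List[str]:
    """Order environments from lowest rank (first stage) to highest."""
    return sorted(ranks, key=lambda name: ranks[name])


def next_environment(ranks: Dict[str, int], current: str) -> Optional[str]:
    """Return the stage after `current`, or None when it is the last one."""
    sequence = promotion_sequence(ranks)
    position = sequence.index(current)
    if position + 1 < len(sequence):
        return sequence[position + 1]
    return None
```

Leaving gaps between rank values (10, 20, 30) makes it easier to insert a stage such as qa later without renumbering the rest.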
Team Onboarding
When onboarding a new team to GitOpsHQ, follow this checklist:
Governance Anti-Patterns
Avoid these common mistakes:
| Anti-Pattern | Why It Is Harmful | Better Approach |
|---|---|---|
| Granting Admin to avoid designing RBAC | Eliminates accountability, creates security risk | Invest time in proper resource grants |
| Using break-glass for routine operations | Erodes trust in the emergency system | Fix the underlying permission or policy gap |
| Production without quorum requirements | Single point of failure for bad deployments | Require at least 2 approvals from distinct teams |
| No environment freeze capability | Cannot safely plan maintenance windows | Configure freeze support for staging and production |
| Stale service accounts | Unused tokens are security risks | Rotate tokens quarterly, remove unused accounts |
| Overly broad OPA policies | Teams work around policies instead of following them | Start narrow, expand based on real incidents |
Weekly Review Checklist
- Review the audit log for unusual patterns (unexpected access, repeated failures)
- Check service account token ages and rotate any older than 90 days
- Review pending resource grant requests
- Check registry chart adoption and nudge teams lagging on upgrades
- Review approval queue latency — if consistently high, consider adjusting quorum or adding approvers
- Verify cluster agent health across all registered clusters
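The token-age item in this checklist is easy to automate once you can export issue dates for your service account tokens. The sketch below assumes such an export exists as a mapping from account name to issue time; `tokens_due_for_rotation` and the account names in the usage note are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

MAX_TOKEN_AGE = timedelta(days=90)


def tokens_due_for_rotation(issued_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return account names whose token is older than 90 days, sorted by name."""
    return sorted(
        name for name, issued in issued_at.items() if now - issued > MAX_TOKEN_AGE
    )
```

Called with a token for `ci-deployer` issued in mid-January and one for `chart-publisher` issued in May, a check run on 1 June would list only `ci-deployer`.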