Clusters & Agents
Complete guide to cluster registration, agent installation, runtime visibility, inventory, log streaming, and command operations.
Cluster Concept
A cluster in GitOpsHQ represents a registered Kubernetes cluster with an active agent connection. Clusters are the deployment targets for your project deliveries — the delivery generator commits manifests to Git, and the cluster agent ensures those manifests are synced and running.
Clusters also provide runtime visibility: inventory browsing, resource inspection, drift detection, log streaming, and operational commands — all from the GitOpsHQ UI without requiring direct kubectl access.
Agent-First Design
The agent initiates all connections outbound to the GitOpsHQ hub. No inbound ports or firewall rules are required on the cluster side. This makes the agent safe to deploy in locked-down environments, private networks, and air-gapped setups with outbound-only proxy access.
Cluster Registration
Register a cluster from the Clusters page (/clusters). The backend API is POST /api/v2/clusters.
1. Click "Register Cluster." Provide a cluster name, display name, provider (AWS EKS, GKE, AKS, on-prem), region, and environment. These map to the agent.clusterName, agent.displayName, agent.provider, agent.region, and agent.environment Helm values.
2. Receive the registration token and install command. The API returns a registrationToken (prefixed ghqa_reg_, 64 hex characters) and a ready-to-use installCommand. The token expires in 1 hour and is single-use -- copy it immediately.
3. Install the GitOpsHQ Agent. On the target cluster, run the Helm install command provided during registration. The agent establishes a secure outbound gRPC connection to the GitOpsHQ hub and completes registration by calling the agent.v1.AgentHub/Register RPC. On success, the hub issues a long-lived agent token (prefixed ghqa_) that the agent persists automatically.
4. Verify the connection. The cluster status shows Connected in the dashboard within 60 seconds. If the original token expires, re-issue one from the Cluster Detail page via POST /api/v2/clusters/{id}/join-token.
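If you script the registration flow, you can sanity-check the token shape before passing it to the install command. A minimal sketch -- the token here is generated locally purely for illustration; a real token comes only from the registration API:

```shell
# Stand-in token with the documented shape: the ghqa_reg_ prefix
# followed by 64 hex characters (32 random bytes).
TOKEN="ghqa_reg_$(openssl rand -hex 32)"

# Validate the shape before using it in helm install.
if printf '%s' "$TOKEN" | grep -Eq '^ghqa_reg_[0-9a-f]{64}$'; then
  echo "token format OK"
else
  echo "unexpected token format" >&2
  exit 1
fi
```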
Cluster Labels
Labels are key-value pairs attached to clusters for organization and filtering:
| Label Example | Use Case |
|---|---|
| region:eu-west-1 | Filter clusters by geographic region |
| tier:production | Distinguish production from staging/dev clusters |
| provider:eks | Group by cloud provider |
| team:platform | Track cluster ownership |
Labels are used in binding rules and approval policies to target specific cluster groups.
Agent Installation
The agent is a lightweight Go binary that runs as a Kubernetes Deployment in the target cluster.
Installation Methods
The GitOpsHQ Agent chart is distributed as an OCI artifact. No helm repo add is needed.
```shell
helm upgrade --install gitopshq-agent \
  oci://ghcr.io/gitopshq-io/charts/gitopshq-agent \
  --namespace gitopshq-system \
  --create-namespace \
  --set registrationToken="ghqa_reg_a1b2c3d4..." \
  --set hub.address="agent.gitopshq.io:50051" \
  --set capabilities.observe=true \
  --set capabilities.argocdRead=true
```
Copy from the UI
You do not need to build this command manually. The cluster registration dialog returns a ready-to-paste installCommand with your token and hub address pre-filled. The command above is shown for reference.
Core values:
| Value | Default | Description |
|---|---|---|
| registrationToken | "" | One-time registration token (ghqa_reg_*). Expires in 1 hour. |
| hub.address | gitopshq.example.internal:50051 | GitOpsHQ hub gRPC endpoint |
| hub.statusIntervalSeconds | 30 | Heartbeat interval in seconds |
| agent.clusterName | default | Logical cluster identifier |
| agent.displayName | "" | Human-readable name shown in the UI |
| agent.provider | "" | Cloud provider (e.g., eks, gke, aks, on-prem) |
| agent.region | "" | Region label |
| agent.environment | "" | Environment label (e.g., production, staging) |
| agent.tokenPath | /var/lib/gitopshq/identity/agent.json | Path where the agent persists its identity |
Identity persistence:
| Value | Default | Description |
|---|---|---|
| persistence.enabled | true | Enable identity persistence |
| persistence.type | secret | secret (K8s Secret) or pvc (PersistentVolumeClaim) |
| persistence.secretName | "" | Explicit Secret name (auto-generated if empty) |
| persistence.size | 128Mi | PVC size (only when type: pvc) |
Capabilities (each toggles a feature flag):
| Value | Default | Description |
|---|---|---|
| capabilities.observe | true | Inventory browsing and resource inspection |
| capabilities.diagnosticsRead | true | Namespace-scoped pod diagnostics and log streaming |
| capabilities.argocdRead | true | Read ArgoCD application status |
| capabilities.argocdWrite | false | Sync and rollback ArgoCD applications |
| capabilities.directDeploy | false | Apply manifests directly (non-ArgoCD mode) |
| capabilities.kubernetesRestart | false | Rolling restart of Deployments |
| capabilities.kubernetesScale | false | Scale Deployment replicas |
| capabilities.credentialSync | false | Sync secrets from hub to cluster |
| capabilities.tokenRotate | true | Allow hub-initiated token rotation |
RBAC, Security, and Networking:
| Value | Default | Description |
|---|---|---|
| rbac.create | true | Create ClusterRole and binding |
| rbac.profile | readonly | RBAC profile (readonly or readwrite) |
| serviceAccount.create | true | Create a dedicated ServiceAccount |
| tls.insecure | false | Skip TLS verification (dev only) |
| proxy.httpProxy | "" | HTTP proxy for outbound connections |
| proxy.httpsProxy | "" | HTTPS proxy for outbound connections |
| proxy.noProxy | "" | No-proxy exclusion list |
Optional integrations:
| Value | Default | Description |
|---|---|---|
| argocd.server | "" | ArgoCD API server address |
| argocd.token | "" | ArgoCD API token |
| argocd.insecure | false | Skip TLS for ArgoCD |
| diagnostics.allowedNamespaces | [] | Namespaces the agent can read diagnostics from |
| credentialSync.mode | mirrored | Credential sync strategy |
| credentialSync.targets | [] | Target namespaces for credential sync |
| directDeploy.defaultNamespace | "" | Default namespace for direct deploys |
| directDeploy.fieldManager | gitopshq-agent | Server-side apply field manager |
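The --set flags can also be captured in a values file, which is easier to review and version. A sketch assembled from the values documented above -- the token, names, and addresses are placeholders:

```yaml
# values.yaml -- sketch built from the tables above
registrationToken: "ghqa_reg_a1b2c3d4..."   # one-time token from registration

hub:
  address: "agent.gitopshq.io:50051"
  statusIntervalSeconds: 30

agent:
  clusterName: prod-eu-west-1
  displayName: "Production EU West"
  provider: eks
  region: eu-west-1
  environment: production

capabilities:
  observe: true
  diagnosticsRead: true
  argocdRead: true
  argocdWrite: false

persistence:
  enabled: true
  type: secret

rbac:
  profile: readonly
```

Then install with helm upgrade --install gitopshq-agent oci://ghcr.io/gitopshq-io/charts/gitopshq-agent --namespace gitopshq-system --create-namespace -f values.yaml.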
For environments where Helm is not available, you can deploy the agent manually. Create the namespace, inject the registration token as an environment variable, and apply the agent Deployment.
```shell
kubectl create namespace gitopshq-system

kubectl create secret generic gitopshq-agent-registration \
  --namespace gitopshq-system \
  --from-literal=GITOPSHQ_REGISTRATION_TOKEN="ghqa_reg_a1b2c3d4..."

kubectl create secret generic gitopshq-agent-hub \
  --namespace gitopshq-system \
  --from-literal=GITOPSHQ_HUB_ADDRESS="agent.gitopshq.io:50051"
```
Then create a Deployment that references these Secrets as environment variables. The agent binary reads its configuration entirely from environment variables:
| Env Variable | Required | Description |
|---|---|---|
| GITOPSHQ_HUB_ADDRESS | Yes | Hub gRPC endpoint (e.g., agent.gitopshq.io:50051) |
| GITOPSHQ_REGISTRATION_TOKEN | Yes (first run) | One-time registration token |
| GITOPSHQ_IDENTITY_STORE_MODE | No | file or secret (default: file) |
| GITOPSHQ_AGENT_TOKEN_PATH | No | Identity file path (default: /tmp/gitopshq-agent-token) |
| GITOPSHQ_CLUSTER_NAME | No | Logical cluster name (default: default) |
| GITOPSHQ_CAPABILITIES | No | Comma-separated capabilities (default: observe) |
| GITOPSHQ_HUB_INSECURE | No | Skip TLS verification (default: false) |
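A sketch of such a Deployment, injecting the two Secrets from the manual installation step via envFrom. The image reference, resource names, and extra env values are illustrative assumptions, not part of any published manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitopshq-agent
  namespace: gitopshq-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitopshq-agent
  template:
    metadata:
      labels:
        app: gitopshq-agent
    spec:
      serviceAccountName: gitopshq-agent   # must be bound to the RBAC rules described below
      containers:
        - name: agent
          image: ghcr.io/gitopshq-io/gitopshq-agent:latest   # illustrative image reference
          envFrom:
            - secretRef:
                name: gitopshq-agent-registration   # GITOPSHQ_REGISTRATION_TOKEN
            - secretRef:
                name: gitopshq-agent-hub            # GITOPSHQ_HUB_ADDRESS
          env:
            - name: GITOPSHQ_CLUSTER_NAME
              value: "prod-eu-west-1"
            - name: GITOPSHQ_CAPABILITIES
              value: "observe"
```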
Helm is strongly recommended
The Helm chart handles RBAC, identity Secret management, capability flags, and secure defaults automatically. Manual deployment requires you to maintain all of these yourself.
Agent RBAC Requirements
The Helm chart creates RBAC resources based on the rbac.profile value and enabled capabilities:
readonly profile (default) -- cluster-wide read access for inventory and inspection:
| Resource | Verbs | Capability |
|---|---|---|
| pods, deployments, services, configmaps, secrets, statefulsets, daemonsets, jobs, cronjobs, ingresses, replicasets | get, list, watch | observe |
| pods/log | get | diagnosticsRead |
| events | get, list, watch | observe |
| namespaces | get, list | observe |
| applications.argoproj.io | get, list, watch | argocdRead |
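The readonly rules above correspond roughly to a ClusterRole like the following sketch. The resource grouping and object name are assumptions and slightly simplified; the chart's generated manifests are authoritative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitopshq-agent-readonly
rules:
  # observe: inventory browsing and resource inspection
  - apiGroups: ["", "apps", "batch", "networking.k8s.io"]
    resources: ["pods", "services", "configmaps", "secrets", "namespaces",
                "events", "deployments", "statefulsets", "daemonsets",
                "replicasets", "jobs", "cronjobs", "ingresses"]
    verbs: ["get", "list", "watch"]
  # diagnosticsRead: pod diagnostics and log streaming
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  # argocdRead: ArgoCD application status
  - apiGroups: ["argoproj.io"]
    resources: ["applications"]
    verbs: ["get", "list", "watch"]
```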
readwrite profile -- adds write verbs for operational commands:
| Resource | Extra Verbs | Capability |
|---|---|---|
| deployments | patch | kubernetesRestart, kubernetesScale |
| applications.argoproj.io | patch, update | argocdWrite |
| secrets | create, get, update, patch | credentialSync |
| All core resources | create, update, patch, delete | directDeploy |
Identity Secret RBAC -- when persistence.type: secret, the chart grants create, get, update, patch on the identity Secret so the agent can persist its token after registration.
Agent Architecture
The agent is designed for security, reliability, and minimal footprint:
Connection Model
| Aspect | Detail |
|---|---|
| Protocol | gRPC over HTTP/2 with TLS. Default port 50051 (configurable via AGENT_GRPC_PUBLIC_ADDRESS on the hub). |
| Direction | Agent initiates outbound -- no inbound ports or firewall rules required. |
| Registration | Unary RPC: agent.v1.AgentHub/Register. Authenticates with Authorization: Bearer <registrationToken>. |
| Ongoing stream | Bidirectional RPC: agent.v1.AgentHub/Connect. Authenticates with Authorization: Bearer <agentToken>. |
| Heartbeat | Agent sends status every hub.statusIntervalSeconds (default: 30s). Hub marks the cluster disconnected if no heartbeat arrives within the timeout window. |
| Reconnection | Automatic exponential backoff on disconnect. The agent re-establishes the Connect stream using its persisted agent token. |
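The exact backoff parameters are not documented here; the following sketch only illustrates how a capped exponential backoff schedule typically behaves, with an assumed 1-second base and 60-second cap:

```shell
# Capped exponential backoff: double the delay after each failed
# reconnect attempt, up to a fixed cap (base and cap are assumptions).
base=1
cap=60
delay=$base
for attempt in 1 2 3 4 5 6 7 8; do
  echo "attempt ${attempt}: retry in ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
done
```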
Token Lifecycle
| Phase | Token Prefix | Format | Lifetime |
|---|---|---|---|
| Registration | ghqa_reg_ | 64 hex characters (32 random bytes) | Expires in 1 hour. Single use -- consumed on first successful Register RPC. |
| Agent (post-registration) | ghqa_ | 64 hex characters (32 random bytes) | Long-lived. Stored as SHA-256 hash on the hub. |
| Rotation | ghqa_ | Same format | Hub issues a new agent token via POST /api/v2/clusters/{id}/rotate-token. Old token invalidated after grace period. |
Token Security
Registration tokens are displayed once during cluster registration and expire in 1 hour. If lost or expired, issue a new one from the Cluster Detail page (POST /api/v2/clusters/{id}/join-token). Agent tokens are never displayed in the UI -- the agent persists them locally (as a Kubernetes Secret or file) and rotates them automatically when capabilities.tokenRotate is enabled.
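The token format and hub-side hashing from the lifecycle table can be reproduced with standard tools. A sketch -- whether the hub hashes the full prefixed token or only the hex body is an assumption:

```shell
# 64 hex characters = 32 random bytes, as documented for both token types.
body="$(openssl rand -hex 32)"
token="ghqa_${body}"

# The hub stores a SHA-256 hash of the token, never the token itself
# (assumption: the full prefixed token is the hash input).
hash="$(printf '%s' "$token" | sha256sum | awk '{print $1}')"

echo "body length: ${#body}"   # 64
echo "hash length: ${#hash}"   # 64
```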
Cluster Detail Page
The Cluster Detail page (/clusters/:id) is the operational hub for a registered cluster. It provides real-time visibility into the cluster's state, workloads, and health.
Health Status
The cluster header always shows the current connection health:
| Status | Indicator | Meaning |
|---|---|---|
| Connected | Green | Agent is connected and heartbeat is current |
| Disconnected | Red | Agent has not sent a heartbeat within the timeout window (default: 90 seconds) |
| Lagging | Yellow | Agent is connected but heartbeat is delayed (network instability or agent load) |
Cluster Tabs
Inventory
Browse the cluster's namespaces and resources in real time.
- Namespace list with resource counts per type (pods, deployments, services, configmaps, secrets)
- Resource table filtered by namespace and type
- Resource summary: name, status, age, labels
- Click any resource to open the Resource Inspection view
The inventory is refreshed periodically by the agent, and on demand when you navigate to the tab.
ArgoCD Applications
If ArgoCD is running in the cluster, this tab shows all ArgoCD Application resources:
| Column | Description |
|---|---|
| Name | Application name |
| Sync Status | Synced, OutOfSync, Unknown |
| Health Status | Healthy, Degraded, Progressing, Missing |
| Source | Git repository and path |
| Destination | Target namespace |
| Last Sync | Timestamp of last successful sync |
Quick actions: Force Sync, Hard Refresh, View in ArgoCD UI (if ArgoCD dashboard URL is configured).
Drift Findings
Drift detection compares the desired state (from Git) with the actual state (from the cluster) for each resource managed by GitOpsHQ deliveries.
- Per-resource drift detail with field-level differences
- Drift severity classification (cosmetic vs. functional)
- "Resolve" action: force sync from Git to cluster
- Drift history: when drift was first detected, how long it persisted
Bindings
Shows which project deliveries target this cluster:
| Column | Description |
|---|---|
| Project | The project that generates manifests for this cluster |
| Tenant | Which tenants are bound to this cluster |
| Environment | Which environments target this cluster |
| Last Delivery | Timestamp and revision of the last deployed delivery |
| Sync Status | Whether the latest delivery is synced to the cluster |
Commands
History of all commands sent to the cluster and their execution results.
| Column | Description |
|---|---|
| Command | The operation type (sync, rollback, restart, scale) |
| Target | The resource targeted by the command |
| Requester | Who initiated the command |
| Status | pending, executing, completed, failed, rejected |
| Timestamp | When the command was issued and completed |
| Output | Command execution output or error message |
Filter by command type, status, or date range to investigate operational history.
Log Streaming
Real-time pod log streaming powered by xterm.js.
1. Select a namespace from the dropdown.
2. Select a pod from the filtered pod list.
3. Select a container (for multi-container pods).
4. Start streaming. Logs appear in real time in the terminal viewer. Previous logs (tail) are loaded first, then the stream follows new log lines.
Features:
- Search within logs: Ctrl+F to find specific log entries
- Pause/resume: Pause the stream to inspect a specific section
- Download: Export the current log buffer as a text file
- Timestamps: Toggle timestamp display on each log line
- Follow mode: Auto-scroll to latest log entries (enabled by default)
Credential Sync
Status of credential synchronization from GitOpsHQ to the cluster (e.g., image pull secrets, TLS certificates):
| Column | Description |
|---|---|
| Credential | The credential name and type |
| Target Namespace | Where the credential is synced in the cluster |
| Sync Status | synced, pending, failed |
| Last Synced | Timestamp of last successful sync |
| Error | Error message if sync failed |
Resource Inspection
Deep-dive into any Kubernetes resource:
- YAML view: Full resource YAML with syntax highlighting
- Events: Kubernetes events related to the resource (sorted by timestamp)
- Related resources: Linked resources (e.g., a Deployment's ReplicaSets and Pods)
- Labels and annotations: Searchable label/annotation display
- Conditions: Resource condition table (for Deployments, Nodes, Pods)
Cluster Commands
Commands are operational actions sent to the cluster agent for execution. They provide controlled cluster operations without requiring direct kubectl access.
Available Commands
| Command | Description | Target | Default Approval |
|---|---|---|---|
| Sync | Force ArgoCD to reconcile an application with its Git source | ArgoCD Application | Configurable |
| Rollback | Revert an ArgoCD application to a previous sync revision | ArgoCD Application | Usually required |
| Restart | Trigger a rolling restart of a Deployment (updates pod template annotation) | Deployment | Configurable |
| Scale | Change the replica count of a Deployment | Deployment | Configurable |
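The Restart command's pod template annotation update is the same mechanism kubectl rollout restart uses; a sketch of the equivalent strategic-merge patch (the timestamp value is illustrative):

```yaml
# Updating this annotation changes the pod template, which causes the
# Deployment controller to roll out fresh pods without changing intent.
spec:
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2025-01-15T10:30:00Z"
```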
Command Execution Flow
A command moves through the statuses shown in the Commands tab: it is created as pending, held for approval when the cluster's policy requires it (and rejected if an approver declines), dispatched to the agent over the Connect stream, marked executing while the agent performs the operation, and finally recorded as completed or failed, with its output captured in the command history.
Command Approval
Commands can require approval based on cluster approval policies:
- Bind an approval policy to a cluster from the Cluster Detail settings
- Configure which command types require approval per cluster
- Production clusters typically require approval for Rollback and Scale
- Development clusters can auto-approve all commands
- Command approval requests appear in the unified Approvals inbox (/approvals)
Cluster Bindings
Bindings connect project deliveries to clusters — they define where generated manifests are deployed.
Binding Types
| Binding | Description |
|---|---|
| Agent binding | Links a project's delivery output to a specific cluster. The cluster receives manifests for the bound tenant-environments. |
| Multi-project | Multiple projects can target the same cluster. Each project's delivery output is scoped to its own namespace/path. |
| Multi-cluster | A single project can target multiple clusters (e.g., for multi-region deployments). Each cluster receives the same or region-specific manifests. |
Binding Configuration
Bindings are configured at the project level under Settings → Targets:
- Select the cluster to bind
- Map tenant-environments to the cluster
- Configure the target namespace pattern (e.g., {{ .tenant }}-{{ .environment }}, which renders a tenant acme in environment production to the namespace acme-production)
- Optionally set cluster-specific overrides (e.g., region-specific storage class)
Cluster Approval Policies
Approval policies can be bound directly to clusters to control operational actions:
| Policy Setting | Description |
|---|---|
| Command approvals | Which command types (sync, rollback, restart, scale) require approval on this cluster |
| Required approvers | Number of approvals needed for gated commands |
| Break-glass | Allow authorized operators to bypass approval in emergencies |
| Auto-approve | List of command types that execute immediately without approval |
Cluster approval requests appear in the unified Approvals inbox alongside release and promotion approvals.
Agent Health Monitoring
Monitor agent health across all clusters from the Clusters list page:
| Metric | Description |
|---|---|
| Connection uptime | Percentage of time the agent has been connected over the last 24h/7d/30d |
| Heartbeat latency | Average time between heartbeat signals |
| Last seen | Timestamp of the most recent agent heartbeat |
| Agent version | Currently running agent version |
| Capabilities | Reported capability flags: observe, diagnostics.read, argocd.read, argocd.write, direct.deploy, kubernetes.restart, kubernetes.scale, credential.sync, token.rotate |
Version Drift
If the agent version is significantly behind the latest release, some features may not work correctly. The Clusters page highlights outdated agents with a version warning badge.
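Version drift can also be checked from a script using sort -V semantic-version ordering; a sketch with illustrative version numbers:

```shell
# Compare the reported agent version against the latest release
# (both values are illustrative).
running="1.4.2"
latest="1.6.0"

lowest="$(printf '%s\n%s\n' "$running" "$latest" | sort -V | head -n1)"
if [ "$lowest" = "$running" ] && [ "$running" != "$latest" ]; then
  echo "agent ${running} is behind latest ${latest}"
else
  echo "agent is up to date"
fi
```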
Error Reference
Best Practices
Monitor agent health proactively. Set up notifications for agent disconnection events. A disconnected agent means no runtime visibility and no command execution — but Git-based deployments continue to work via ArgoCD.
Rotate agent tokens on schedule. Enable automatic token rotation (configured in the Helm chart) and review rotation logs periodically.
Use approval policies on production clusters. Require approval for Rollback and Scale commands on production clusters. Auto-approve Sync commands if your ArgoCD configuration is reliable.
Keep agent versions current. Update the agent Helm chart regularly to receive new capabilities, security patches, and performance improvements.
Use log streaming for incident investigation. Real-time log streaming eliminates the need for direct cluster access during incidents, reducing the number of people who need kubectl credentials.
Label clusters consistently. A consistent labeling scheme makes it easy to filter clusters, configure binding rules, and apply approval policies across cluster groups.