Clusters & Agents
Complete guide to cluster registration, agent installation, runtime visibility, inventory, log streaming, and command operations.
Cluster Concept
A cluster in GitOpsHQ represents a registered Kubernetes cluster with an active agent connection. Clusters are the deployment targets for your project deliveries — the delivery generator commits manifests to Git, and the cluster agent ensures those manifests are synced and running.
Clusters also provide runtime visibility: inventory browsing, resource inspection, drift detection, log streaming, and operational commands — all from the GitOpsHQ UI without requiring direct kubectl access.
Agent-First Design
The agent initiates all connections outbound to the GitOpsHQ hub. No inbound ports or firewall rules are required on the cluster side. This makes the agent safe to deploy in locked-down environments, private networks, and air-gapped setups with outbound-only proxy access.
Cluster Registration
Register a cluster from the Clusters page (/clusters). The backend API is POST /api/v2/clusters.
1. Click "Register Cluster." Provide a cluster name, display name, provider (AWS EKS, GKE, AKS, on-prem), region, and environment. These map to the agent.clusterName, agent.displayName, agent.provider, agent.region, and agent.environment Helm values.
2. Receive the registration token and install command. The API returns a registrationToken (prefixed ghqa_reg_, 64 hex characters) and a ready-to-use installCommand. The token expires in 1 hour and is single-use -- copy it immediately.
3. Install the GitOpsHQ Agent. On the target cluster, run the Helm install command provided during registration. The agent establishes a secure outbound gRPC connection to the GitOpsHQ hub and completes registration by calling the agent.v1.AgentHub/Register RPC. On success, the hub issues a long-lived agent token (prefixed ghqa_) that the agent persists automatically.
4. Verify the connection. The cluster status shows Connected in the dashboard within 60 seconds. If the original token expires, re-issue one from the Cluster Detail page via POST /api/v2/clusters/{id}/join-token.
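If you script the registration flow, you can sanity-check the token shape before passing it to the install command. A minimal sketch -- the token here is generated locally purely for illustration; a real token comes only from the registration API:

```shell
# Stand-in token with the documented shape: the ghqa_reg_ prefix
# followed by 64 hex characters (32 random bytes).
TOKEN="ghqa_reg_$(openssl rand -hex 32)"

# Validate the shape before using it in helm install.
if printf '%s' "$TOKEN" | grep -Eq '^ghqa_reg_[0-9a-f]{64}$'; then
  echo "token format OK"
else
  echo "unexpected token format" >&2
  exit 1
fi
```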
Cluster Labels
Labels are key-value pairs attached to clusters for organization and filtering:
| Label Example | Use Case |
|---|---|
| region:eu-west-1 | Filter clusters by geographic region |
| tier:production | Distinguish production from staging/dev clusters |
| provider:eks | Group by cloud provider |
| team:platform | Track cluster ownership |
Labels are used in binding rules and approval policies to target specific cluster groups.
Agent Installation
The agent is a lightweight Go binary that runs as a Kubernetes Deployment in the target cluster.
Installation Methods
The GitOpsHQ Agent chart is distributed as an OCI artifact. No helm repo add is needed.
```shell
helm upgrade --install gitopshq-agent \
  oci://ghcr.io/gitopshq-io/charts/gitopshq-agent \
  --namespace gitopshq-system \
  --create-namespace \
  --set registrationToken="ghqa_reg_a1b2c3d4..." \
  --set hub.address="agent.gitopshq.io:50051" \
  --set capabilities.observe=true \
  --set capabilities.argocdRead=true
```
Copy from the UI
You do not need to build this command manually. The cluster registration dialog returns a ready-to-paste installCommand with your token and hub address pre-filled. The command above is shown for reference.
Core values:
| Value | Default | Description |
|---|---|---|
| registrationToken | "" | One-time registration token (ghqa_reg_*). Expires in 1 hour. |
| hub.address | gitopshq.example.internal:50051 | GitOpsHQ hub gRPC endpoint |
| hub.statusIntervalSeconds | 30 | Heartbeat interval in seconds |
| agent.clusterName | default | Logical cluster identifier |
| agent.displayName | "" | Human-readable name shown in the UI |
| agent.provider | "" | Cloud provider (e.g., eks, gke, aks, on-prem) |
| agent.region | "" | Region label |
| agent.environment | "" | Environment label (e.g., production, staging) |
| agent.tokenPath | /var/lib/gitopshq/identity/agent.json | Path where the agent persists its identity |
Identity persistence:
| Value | Default | Description |
|---|---|---|
| persistence.enabled | true | Enable identity persistence |
| persistence.type | secret | secret (K8s Secret) or pvc (PersistentVolumeClaim) |
| persistence.secretName | "" | Explicit Secret name (auto-generated if empty) |
| persistence.size | 128Mi | PVC size (only when type: pvc) |
Capabilities (each toggles a feature flag):
| Value | Default | Description |
|---|---|---|
| capabilities.observe | true | Inventory browsing and resource inspection |
| capabilities.diagnosticsRead | true | Namespace-scoped pod diagnostics and log streaming |
| capabilities.argocdRead | true | Read ArgoCD application status |
| capabilities.argocdWrite | false | Sync and rollback ArgoCD applications |
| capabilities.directDeploy | false | Apply manifests directly (non-ArgoCD mode) |
| capabilities.kubernetesRestart | false | Rolling restart of Deployments |
| capabilities.kubernetesScale | false | Scale Deployment replicas |
| capabilities.credentialSync | false | Sync secrets from hub to cluster |
| capabilities.tokenRotate | true | Allow hub-initiated token rotation |
RBAC, Security, and Networking:
| Value | Default | Description |
|---|---|---|
| rbac.create | true | Create ClusterRole and binding |
| rbac.profile | readonly | RBAC profile (readonly or readwrite) |
| serviceAccount.create | true | Create a dedicated ServiceAccount |
| tls.insecure | false | Skip TLS verification (dev only) |
| proxy.httpProxy | "" | HTTP proxy for outbound connections |
| proxy.httpsProxy | "" | HTTPS proxy for outbound connections |
| proxy.noProxy | "" | No-proxy exclusion list |
Optional integrations:
| Value | Default | Description |
|---|---|---|
| argocd.server | "" | ArgoCD API server address |
| argocd.token | "" | ArgoCD API token |
| argocd.insecure | false | Skip TLS for ArgoCD |
| diagnostics.allowedNamespaces | [] | Namespaces the agent can read diagnostics from |
| credentialSync.mode | mirrored | Credential sync strategy |
| credentialSync.targets | [] | Target namespaces for credential sync |
| directDeploy.defaultNamespace | "" | Default namespace for direct deploys |
| directDeploy.fieldManager | gitopshq-agent | Server-side apply field manager |
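The --set flags can also be captured in a values file, which is easier to review and version. A sketch assembled from the values documented above -- the token, names, and addresses are placeholders:

```yaml
# values.yaml -- sketch built from the tables above
registrationToken: "ghqa_reg_a1b2c3d4..."   # one-time token from registration

hub:
  address: "agent.gitopshq.io:50051"
  statusIntervalSeconds: 30

agent:
  clusterName: prod-eu-west-1
  displayName: "Production EU West"
  provider: eks
  region: eu-west-1
  environment: production

capabilities:
  observe: true
  diagnosticsRead: true
  argocdRead: true
  argocdWrite: false

persistence:
  enabled: true
  type: secret

rbac:
  profile: readonly
```

Then install with helm upgrade --install gitopshq-agent oci://ghcr.io/gitopshq-io/charts/gitopshq-agent --namespace gitopshq-system --create-namespace -f values.yaml.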
For environments where Helm is not available, you can deploy the agent manually. Create the namespace, inject the registration token as an environment variable, and apply the agent Deployment.
```shell
kubectl create namespace gitopshq-system

kubectl create secret generic gitopshq-agent-registration \
  --namespace gitopshq-system \
  --from-literal=GITOPSHQ_REGISTRATION_TOKEN="ghqa_reg_a1b2c3d4..."

kubectl create secret generic gitopshq-agent-hub \
  --namespace gitopshq-system \
  --from-literal=GITOPSHQ_HUB_ADDRESS="agent.gitopshq.io:50051"
```
Then create a Deployment that references these Secrets as environment variables. The agent binary reads its configuration entirely from environment variables:
| Env Variable | Required | Description |
|---|---|---|
| GITOPSHQ_HUB_ADDRESS | Yes | Hub gRPC endpoint (e.g., agent.gitopshq.io:50051) |
| GITOPSHQ_REGISTRATION_TOKEN | Yes (first run) | One-time registration token |
| GITOPSHQ_IDENTITY_STORE_MODE | No | file or secret (default: file) |
| GITOPSHQ_AGENT_TOKEN_PATH | No | Identity file path (default: /tmp/gitopshq-agent-token) |
| GITOPSHQ_CLUSTER_NAME | No | Logical cluster name (default: default) |
| GITOPSHQ_CAPABILITIES | No | Comma-separated capabilities (default: observe) |
| GITOPSHQ_HUB_INSECURE | No | Skip TLS verification (default: false) |
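A sketch of such a Deployment, injecting the two Secrets from the manual installation step via envFrom. The image reference, resource names, and extra env values are illustrative assumptions, not part of any published manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitopshq-agent
  namespace: gitopshq-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gitopshq-agent
  template:
    metadata:
      labels:
        app: gitopshq-agent
    spec:
      serviceAccountName: gitopshq-agent   # must be bound to the RBAC rules described below
      containers:
        - name: agent
          image: ghcr.io/gitopshq-io/gitopshq-agent:latest   # illustrative image reference
          envFrom:
            - secretRef:
                name: gitopshq-agent-registration   # GITOPSHQ_REGISTRATION_TOKEN
            - secretRef:
                name: gitopshq-agent-hub            # GITOPSHQ_HUB_ADDRESS
          env:
            - name: GITOPSHQ_CLUSTER_NAME
              value: "prod-eu-west-1"
            - name: GITOPSHQ_CAPABILITIES
              value: "observe"
```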
Helm is strongly recommended
The Helm chart handles RBAC, identity Secret management, capability flags, and secure defaults automatically. Manual deployment requires you to maintain all of these yourself.
Agent RBAC Requirements
The Helm chart creates RBAC resources based on the rbac.profile value and enabled capabilities:
readonly profile (default) -- cluster-wide read access for inventory and inspection:
| Resource | Verbs | Capability |
|---|---|---|
| pods, deployments, services, configmaps, secrets, statefulsets, daemonsets, jobs, cronjobs, ingresses, replicasets | get, list, watch | observe |
| pods/log | get | diagnosticsRead |
| events | get, list, watch | observe |
| namespaces | get, list | observe |
| applications.argoproj.io | get, list, watch | argocdRead |
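The readonly rules above correspond roughly to a ClusterRole like the following sketch. The resource grouping and object name are assumptions and slightly simplified; the chart's generated manifests are authoritative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitopshq-agent-readonly
rules:
  # observe: inventory browsing and resource inspection
  - apiGroups: ["", "apps", "batch", "networking.k8s.io"]
    resources: ["pods", "services", "configmaps", "secrets", "namespaces",
                "events", "deployments", "statefulsets", "daemonsets",
                "replicasets", "jobs", "cronjobs", "ingresses"]
    verbs: ["get", "list", "watch"]
  # diagnosticsRead: pod diagnostics and log streaming
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
  # argocdRead: ArgoCD application status
  - apiGroups: ["argoproj.io"]
    resources: ["applications"]
    verbs: ["get", "list", "watch"]
```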
readwrite profile -- adds write verbs for operational commands:
| Resource | Extra Verbs | Capability |
|---|---|---|
| deployments | patch | kubernetesRestart, kubernetesScale |
| applications.argoproj.io | patch, update | argocdWrite |
| secrets | create, get, update, patch | credentialSync |
| All core resources | create, update, patch, delete | directDeploy |
Identity Secret RBAC -- when persistence.type: secret, the chart grants create, get, update, patch on the identity Secret so the agent can persist its token after registration.
Agent Architecture
The agent is designed for security, reliability, and minimal footprint:
Connection Model
| Aspect | Detail |
|---|---|
| Protocol | gRPC over HTTP/2 with TLS. Default port 50051 (configurable via AGENT_GRPC_PUBLIC_ADDRESS on the hub). |
| Direction | Agent initiates outbound -- no inbound ports or firewall rules required. |
| Registration | Unary RPC: agent.v1.AgentHub/Register. Authenticates with Authorization: Bearer <registrationToken>. |
| Ongoing stream | Bidirectional RPC: agent.v1.AgentHub/Connect. Authenticates with Authorization: Bearer <agentToken>. |
| Heartbeat | Agent sends status every hub.statusIntervalSeconds (default: 30s). Hub marks the cluster disconnected if no heartbeat arrives within the timeout window. |
| Reconnection | Automatic exponential backoff on disconnect. The agent re-establishes the Connect stream using its persisted agent token. |
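The exact backoff parameters are not documented here; the following sketch only illustrates how a capped exponential backoff schedule typically behaves, with an assumed 1-second base and 60-second cap:

```shell
# Capped exponential backoff: double the delay after each failed
# reconnect attempt, up to a fixed cap (base and cap are assumptions).
base=1
cap=60
delay=$base
for attempt in 1 2 3 4 5 6 7 8; do
  echo "attempt ${attempt}: retry in ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
done
```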
Token Lifecycle
| Phase | Token Prefix | Format | Lifetime |
|---|---|---|---|
| Registration | ghqa_reg_ | 64 hex characters (32 random bytes) | Expires in 1 hour. Single use -- consumed on first successful Register RPC. |
| Agent (post-registration) | ghqa_ | 64 hex characters (32 random bytes) | Long-lived. Stored as SHA-256 hash on the hub. |
| Rotation | ghqa_ | Same format | Hub issues a new agent token via POST /api/v2/clusters/{id}/rotate-token. Old token invalidated after grace period. |
Token Security
Registration tokens are displayed once during cluster registration and expire in 1 hour. If lost or expired, issue a new one from the Cluster Detail page (POST /api/v2/clusters/{id}/join-token). Agent tokens are never displayed in the UI -- the agent persists them locally (as a Kubernetes Secret or file) and rotates them automatically when capabilities.tokenRotate is enabled.
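The token format and hub-side hashing from the lifecycle table can be reproduced with standard tools. A sketch -- whether the hub hashes the full prefixed token or only the hex body is an assumption:

```shell
# 64 hex characters = 32 random bytes, as documented for both token types.
body="$(openssl rand -hex 32)"
token="ghqa_${body}"

# The hub stores a SHA-256 hash of the token, never the token itself
# (assumption: the full prefixed token is the hash input).
hash="$(printf '%s' "$token" | sha256sum | awk '{print $1}')"

echo "body length: ${#body}"   # 64
echo "hash length: ${#hash}"   # 64
```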
Cluster Detail Page
The Cluster Detail page (/clusters/:id) is the operational hub for a registered cluster. It provides real-time visibility into the cluster's state, workloads, and health.
Health Status
The cluster header always shows the current connection health:
| Status | Indicator | Meaning |
|---|---|---|
| Connected | Green | Agent is connected and heartbeat is current |
| Disconnected | Red | Agent has not sent a heartbeat within the timeout window (default: 90 seconds) |
| Lagging | Yellow | Agent is connected but heartbeat is delayed (network instability or agent load) |
Cluster Tabs
Inventory
Browse the cluster's namespaces and resources in real time.
- Namespace list with resource counts per type (pods, deployments, services, configmaps, secrets)
- Resource table filtered by namespace and type
- Resource summary: name, status, age, labels
- Click any resource to open the Resource Inspection view
The inventory is refreshed periodically by the agent, and on demand when you navigate to the tab.
ArgoCD Applications
If ArgoCD is running in the cluster, this tab shows all ArgoCD Application resources:
| Column | Description |
|---|---|
| Name | Application name |
| Sync Status | Synced, OutOfSync, Unknown |
| Health Status | Healthy, Degraded, Progressing, Missing |
| Source | Git repository and path |
| Destination | Target namespace |
| Last Sync | Timestamp of last successful sync |
Quick actions: Force Sync, Hard Refresh, View in ArgoCD UI (if ArgoCD dashboard URL is configured).
Drift Findings
Drift detection compares the desired state (from Git) with the actual state (from the cluster) for each resource managed by GitOpsHQ deliveries.
- Per-resource drift detail with field-level differences
- Drift severity classification (cosmetic vs. functional)
- "Resolve" action: force sync from Git to cluster
- Drift history: when drift was first detected, how long it persisted
Bindings
Shows which project deliveries target this cluster:
| Column | Description |
|---|---|
| Project | The project that generates manifests for this cluster |
| Tenant | Which tenants are bound to this cluster |
| Environment | Which environments target this cluster |
| Last Delivery | Timestamp and revision of the last deployed delivery |
| Sync Status | Whether the latest delivery is synced to the cluster |
Commands
History of all commands sent to the cluster and their execution results.
| Column | Description |
|---|---|
| Command | The operation type (sync, rollback, restart, scale) |
| Target | The resource targeted by the command |
| Requester | Who initiated the command |
| Status | pending, executing, completed, failed, rejected |
| Timestamp | When the command was issued and completed |
| Output | Command execution output or error message |
Filter by command type, status, or date range to investigate operational history.
Log Streaming
Real-time pod log streaming powered by xterm.js.
1. Select a namespace from the dropdown.
2. Select a pod from the filtered pod list.
3. Select a container (for multi-container pods).
4. Start streaming. Logs appear in real time in the terminal viewer. Previous logs (tail) are loaded first, then the stream follows new log lines.
Features:
- Search within logs: Ctrl+F to find specific log entries
- Pause/resume: Pause the stream to inspect a specific section
- Download: Export the current log buffer as a text file
- Timestamps: Toggle timestamp display on each log line
- Follow mode: Auto-scroll to latest log entries (enabled by default)
Credential Sync
Status of credential synchronization from GitOpsHQ to the cluster (e.g., image pull secrets, TLS certificates):
| Column | Description |
|---|---|
| Credential | The credential name and type |
| Target Namespace | Where the credential is synced in the cluster |
| Sync Status | synced, pending, failed |
| Last Synced | Timestamp of last successful sync |
| Error | Error message if sync failed |
Resource Inspection
Deep-dive into any Kubernetes resource:
- YAML view: Full resource YAML with syntax highlighting
- Events: Kubernetes events related to the resource (sorted by timestamp)
- Related resources: Linked resources (e.g., a Deployment's ReplicaSets and Pods)
- Labels and annotations: Searchable label/annotation display
- Conditions: Resource condition table (for Deployments, Nodes, Pods)
Cluster Commands
Commands are operational actions sent to the cluster agent for execution. They provide controlled cluster operations without requiring direct kubectl access.
Available Commands
| Command | Description | Target | Default Approval |
|---|---|---|---|
| Sync | Force ArgoCD to reconcile an application with its Git source | ArgoCD Application | Configurable |
| Rollback | Revert an ArgoCD application to a previous sync revision | ArgoCD Application | Usually required |
| Restart | Trigger a rolling restart of a Deployment (updates pod template annotation) | Deployment | Configurable |
| Scale | Change the replica count of a Deployment | Deployment | Configurable |
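The Restart command's pod template annotation update is the same mechanism kubectl rollout restart uses; a sketch of the equivalent strategic-merge patch (the timestamp value is illustrative):

```yaml
# Updating this annotation changes the pod template, which causes the
# Deployment controller to roll out fresh pods without changing intent.
spec:
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2025-01-15T10:30:00Z"
```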
Command Execution Flow
A command moves through the statuses shown in the Commands tab: it is created as pending, held for approval when the cluster's policy requires it (and rejected if an approver declines), dispatched to the agent over the Connect stream, marked executing while the agent performs the operation, and finally recorded as completed or failed, with its output captured in the command history.
Command Approval
Commands can require approval based on cluster approval policies:
- Bind an approval policy to a cluster from the Cluster Detail settings
- Configure which command types require approval per cluster
- Production clusters typically require approval for Rollback and Scale
- Development clusters can auto-approve all commands
- Command approval requests appear in the unified Approvals inbox (/approvals)
Cluster Bindings
Bindings connect project deliveries to clusters — they define where generated manifests are deployed.
Binding Types
| Binding | Description |
|---|---|
| Agent binding | Links a project's delivery output to a specific cluster. The cluster receives manifests for the bound tenant-environments. |
| Multi-project | Multiple projects can target the same cluster. Each project's delivery output is scoped to its own namespace/path. |
| Multi-cluster | A single project can target multiple clusters (e.g., for multi-region deployments). Each cluster receives the same or region-specific manifests. |
Binding Configuration
Bindings are configured at the project level under Settings → Targets:
- Select the cluster to bind
- Map tenant-environments to the cluster
- Configure the target namespace pattern (e.g., {{ .tenant }}-{{ .environment }}, which renders a tenant acme in environment production to the namespace acme-production)
- Optionally set cluster-specific overrides (e.g., region-specific storage class)
Cluster Approval Policies
Approval policies can be bound directly to clusters to control operational actions:
| Policy Setting | Description |
|---|---|
| Command approvals | Which command types (sync, rollback, restart, scale) require approval on this cluster |
| Required approvers | Number of approvals needed for gated commands |
| Break-glass | Allow authorized operators to bypass approval in emergencies |
| Auto-approve | List of command types that execute immediately without approval |
Cluster approval requests appear in the unified Approvals inbox alongside release and promotion approvals.
Agent Health Monitoring
Monitor agent health across all clusters from the Clusters list page:
| Metric | Description |
|---|---|
| Connection uptime | Percentage of time the agent has been connected over the last 24h/7d/30d |
| Heartbeat latency | Average time between heartbeat signals |
| Last seen | Timestamp of the most recent agent heartbeat |
| Agent version | Currently running agent version |
| Capabilities | Reported capability flags: observe, diagnostics.read, argocd.read, argocd.write, direct.deploy, kubernetes.restart, kubernetes.scale, credential.sync, token.rotate |
Version Drift
If the agent version is significantly behind the latest release, some features may not work correctly. The Clusters page highlights outdated agents with a version warning badge.
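Version drift can also be checked from a script using sort -V semantic-version ordering; a sketch with illustrative version numbers:

```shell
# Compare the reported agent version against the latest release
# (both values are illustrative).
running="1.4.2"
latest="1.6.0"

lowest="$(printf '%s\n%s\n' "$running" "$latest" | sort -V | head -n1)"
if [ "$lowest" = "$running" ] && [ "$running" != "$latest" ]; then
  echo "agent ${running} is behind latest ${latest}"
else
  echo "agent is up to date"
fi
```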
Error Reference
Best Practices
Monitor agent health proactively. Set up notifications for agent disconnection events. A disconnected agent means no runtime visibility and no command execution — but Git-based deployments continue to work via ArgoCD.
Rotate agent tokens on schedule. Enable automatic token rotation (configured in the Helm chart) and review rotation logs periodically.
Use approval policies on production clusters. Require approval for Rollback and Scale commands on production clusters. Auto-approve Sync commands if your ArgoCD configuration is reliable.
Keep agent versions current. Update the agent Helm chart regularly to receive new capabilities, security patches, and performance improvements.
Use log streaming for incident investigation. Real-time log streaming eliminates the need for direct cluster access during incidents, reducing the number of people who need kubectl credentials.
Label clusters consistently. A consistent labeling scheme makes it easy to filter clusters, configure binding rules, and apply approval policies across cluster groups.