My connector is offline

A connector normally flips to Online within 30 seconds of install. If it does not, or if a previously healthy connector drops to Offline, Degraded, or Needs recovery, use this runbook.

What the status means

Offline: the connector is not currently connected, but it may recover on its own after a transient network interruption or process restart.
Degraded: at least one runtime path is unhealthy, but not every connector path is down.
Needs recovery: the connector is no longer in a normal reconnect path. Treat this as durable identity loss, invalid bootstrap fallback, or another bounded recovery case. The connector does not silently retry the original bootstrap token in this state; it fails closed and waits for an explicit operator-initiated recovery flow.

Autonomous mode vs guided recovery

How the connector recovers from a restart depends on whether durable identity storage is present.

Mode	Storage posture	Runtime outcome	Operator action
Autonomous mode	Durable `/data` or `PAM_AGENT_DATA_DIR` is mounted and intact	The connector loads its identity bundle on startup and performs mTLS reconnect without operator intervention	None
Guided recovery	No persistent `/data`, or the identity volume was lost or recreated empty	The connector cannot reconnect normally and enters `needs_recovery` while preserving connector configuration in the control plane	Mint a fresh single-use enrollment token and re-pair the connector

In autonomous mode, the connector loads its identity bundle on startup and reconnects over mTLS with no operator action. This is the default for correctly configured VM, Docker, and Kubernetes deployments: restarts, rescheduled pods, and recreated containers all reconnect automatically as long as the durable storage survives.
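Here is a minimal sketch of the autonomous-mode posture on Docker. The function starts the connector with a named volume at `/data`, so a recreated container finds the same identity bundle. The image reference, the volume name `vaultpam-connector-data`, and the function name `run_connector` are hypothetical placeholders. How the first-boot enrollment token reaches the connector is deployment-specific and is not shown.

```shell
# Sketch only: $1 is a placeholder for your connector image reference, and the
# volume name vaultpam-connector-data is illustrative (hypothetical).
# Enrollment-token delivery is deployment-specific and intentionally omitted.
run_connector() {
  local image=$1
  # A named volume mounted at /data survives container recreation, so the
  # identity bundle written on first pairing is still present on the next start.
  # Docker creates the named volume on first use and reuses it afterwards.
  docker run -d \
    --name vaultpam-connector \
    --restart unless-stopped \
    -v vaultpam-connector-data:/data \
    "$image"
}
```

If your deployment sets `PAM_AGENT_DATA_DIR` to a different path, mount the volume at that path instead.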

In guided recovery, the connector has no local identity and cannot reconnect. The connector configuration (org assignment, endpoint ID, and allowed networks) is preserved in the control plane, but a new enrollment is required. This is the explicit `needs_recovery` state.

Bootstrap retirement occurs after the first successful enrollment and the first confirmed online heartbeat. Once the connector identity bundle is persisted to durable storage and the control plane records that heartbeat, the bootstrap token is consumed and retired. All subsequent restarts must use the persisted identity for mTLS reconnect. In neither mode does a previously enrolled connector silently fall back to the bootstrap token as a normal restart credential.
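The startup rules above can be summarized as a small decision function. This is an illustrative model in POSIX sh, not the connector's actual code. The argument encoding and the output labels (`mtls-reconnect`, `guided-reenroll`, and so on) are hypothetical. The property it encodes is that once the bootstrap token is retired, losing the identity never leads back to the original token. The only ways forward are the persisted identity or a freshly minted token.

```shell
# Illustrative model of the documented startup decision (hypothetical labels).
#   $1 identity_present     1 if a readable identity bundle exists, else 0
#   $2 previously_enrolled  1 if the bootstrap token was already retired, else 0
#   $3 token                none | bootstrap (original token) | fresh (newly minted)
startup_path() {
  local identity_present=$1 previously_enrolled=$2 token=${3:-none}
  if [ "$identity_present" = 1 ]; then
    # Autonomous mode: the persisted identity wins; any token is ignored.
    echo "mtls-reconnect"
  elif [ "$previously_enrolled" = 1 ]; then
    if [ "$token" = fresh ]; then
      # Operator-initiated guided recovery with a newly minted token.
      echo "guided-reenroll"
    else
      # Identity lost after retirement: fail closed, never reuse the old token.
      echo "needs_recovery"
    fi
  elif [ "$token" != none ]; then
    # First boot: the single-use enrollment token is the only credential.
    echo "bootstrap-enroll"
  else
    echo "awaiting-token"
  fi
}
```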

Quick checks

Is the host up? SSH or RDP into the connector machine and confirm that Docker or the native service is running.
- Docker: `docker ps | grep vaultpam-connector`; the container status should be Up.
- systemd: `systemctl status vaultpam-connector`.
Can the host reach the control plane? From the connector machine:
```
curl -v https://<control-plane-host>/healthz
```
If this fails, you have a network, DNS, or firewall issue. Outbound 443/tcp (control plane HTTPS) must be open, plus 51820/udp if you use the VPN reverse tunnel.
Is the enrollment token still valid? Tokens expire after 4 hours by default. If the connector has never come online and the token has expired, generate a new one from Connectors → pick connector → Regenerate token.
Did the connector lose its durable identity storage? A previously paired connector should restart from the persisted identity under `/data` or the directory named by `PAM_AGENT_DATA_DIR`.
- Docker: confirm the container still mounts the same named volume or bind mount at `/data`.
- Native / VM: confirm the service still points at the same durable host path and that the connector user can read it.
- Kubernetes: confirm the pod still mounts the expected PVC and was not redeployed onto `emptyDir` or another ephemeral layer.
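The quick checks above can be scripted as a set of POSIX sh functions. This is a sketch that relies on three assumptions:

- the container and the systemd unit are both named `vaultpam-connector`, as in this runbook;
- the data directory is `/data` unless `PAM_AGENT_DATA_DIR` is set;
- the 4-hour expiry is counted from the time the token was generated.

The helper names are hypothetical.

```shell
# Quick-check helpers (hypothetical names). Assumptions: container and systemd
# unit are both named vaultpam-connector; data dir is /data unless
# PAM_AGENT_DATA_DIR is set. Nothing runs until you call a function.

# 0 if `docker ps` reports the container as "Up ..." (name filter is a substring match).
container_is_up() {
  case $(docker ps --filter "name=vaultpam-connector" --format '{{.Status}}') in
    Up*) return 0 ;;
    *)   return 1 ;;
  esac
}

# 0 only when the native service is active.
service_is_active() {
  systemctl is-active --quiet vaultpam-connector
}

# 0 if https://$1/healthz answers; -f makes curl fail on HTTP status >= 400.
control_plane_reachable() {
  curl -fsS --max-time 10 -o /dev/null "https://$1/healthz"
}

# 0 if a token issued at $1 (Unix seconds) is at or past its TTL at $2 (default: now).
# Default TTL is the documented 4 hours: 4 * 3600 = 14400 seconds.
token_expired() {
  local issued=$1 now=${2:-$(date +%s)} ttl=${3:-14400}
  [ $((now - issued)) -ge "$ttl" ]
}

# 0 if the identity directory exists, is readable, and is not empty.
identity_dir_ok() {
  local dir=${1:-${PAM_AGENT_DATA_DIR:-/data}}
  [ -d "$dir" ] && [ -r "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

Two example calls:

- `control_plane_reachable <control-plane-host> && echo reachable`
- `token_expired <issued-unix-seconds> && echo "regenerate the token"`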

Identify the failure class

1. First-boot bootstrap failure

Use this branch if the connector never paired successfully.

Common signals:

the connector never became Online
logs show token expiry, token revoke, CSR validation failure, or CA trust failure
the UI still shows an onboarding or activation state rather than ready

Recovery:

Fix the trust, network, or token problem.
Generate a fresh enrollment token if the original one expired or was revoked.
Retry pairing.
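For a first-boot failure, it helps to separate CA trust problems from plain network problems before regenerating a token. The sketch below assumes the CA bundle path `/etc/vaultpam/ca.crt` from the error table in this runbook. It uses curl's documented exit codes (6, 7, 22, 28, 60) to name the likely cause. The function name and output labels are hypothetical.

```shell
# Assumptions: CA bundle at /etc/vaultpam/ca.crt (path from this runbook's
# error table); the control plane serves /healthz. Hypothetical helper name.
check_bootstrap_prereqs() {
  local host=$1 ca=${2:-/etc/vaultpam/ca.crt} rc
  if [ ! -r "$ca" ]; then
    echo "ca-missing"
    return 1
  fi
  # --cacert makes curl verify the server chain against this CA file.
  curl -fsS --max-time 10 --cacert "$ca" -o /dev/null "https://${host}/healthz"
  rc=$?
  # Map curl exit codes to the problem to fix first.
  case $rc in
    0)    echo "ok" ;;
    6)    echo "dns" ;;         # could not resolve host
    7|28) echo "network" ;;     # connection failed or timed out
    22)   echo "http-error" ;;  # HTTP status >= 400 (because of -f)
    60)   echo "ca-trust" ;;    # server certificate not verifiable with this CA
    *)    echo "other:$rc" ;;
  esac
  return $rc
}
```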

2. Normal reconnect failure

Use this branch if the connector paired before and still has its durable identity.

Common signals:

temporary Offline after reboot, deploy, or network flap
`cp_tunnel_heartbeat_timeouts_total` alert activity
the local data directory is still present and readable

Recovery:

Restore outbound HTTPS reachability to the control plane.
Confirm the connector still has its local identity bundle on durable storage.
Restart the connector process or pod once.
Validate that the connector returns to Online without using a new enrollment token.
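Restarting once, rather than in a loop, can be enforced with a marker file. This sketch assumes a Docker deployment with the container named `vaultpam-connector`. The marker path and the `restart_once` name are hypothetical. After the restart, it waits the normal 30-second window and then checks the post-restart logs for `ENROLLMENT_TOKEN_INVALID`. Finding that error moves the incident to the identity-loss branch instead.

```shell
# Assumptions: Docker deployment, container named vaultpam-connector.
# The marker-file path is hypothetical; it only prevents restart loops.
restart_once() {
  local marker=${1:-/tmp/vaultpam-connector.restarted} wait_s=${2:-30} since
  if [ -e "$marker" ]; then
    echo "already restarted once; escalate instead of restarting again" >&2
    return 1
  fi
  since=$(date +%s)
  docker restart vaultpam-connector >/dev/null || return 1
  touch "$marker"
  # Give the connector the normal window to come back Online.
  sleep "$wait_s"
  # A bootstrap-token error after the restart means branch 3, not a reconnect issue.
  if docker logs --since "$since" vaultpam-connector 2>&1 | grep -q 'ENROLLMENT_TOKEN_INVALID'; then
    echo "identity-loss"
    return 2
  fi
  echo "restarted"
}
```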

3. Durable identity loss or invalid bootstrap fallback

Use this branch if the connector paired before, but now behaves like a brand-new connector.

Common signals:

UI or API shows Needs recovery
logs show `401 ENROLLMENT_TOKEN_INVALID` after a previously successful pairing
alerts on 401 responses from the enrollment endpoint spike after a restart
the durable storage mount was replaced, removed, or recreated empty

Recovery:

Stop the connector or scale the pod to zero before changing credentials.
Try to reattach the original durable storage first.
If the original identity is gone, treat this as controlled reprovision:
- revoke the stale bootstrap token if it still exists
- mint a fresh single-use enrollment token from the control plane
- ensure durable storage is mounted correctly before starting again
- pair exactly once with the new token
Validate that subsequent restarts reuse the persisted identity instead of asking for another token.
Confirm that preserved connector configuration is still present; if configuration is missing, escalate as a separate deployment issue.
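Before minting a fresh token, verify that `/data` is backed by durable storage. Otherwise, the new pairing will be lost again on the next restart. This Docker-only sketch assumes the container is named `vaultpam-connector` and the identity lives at `/data`. It stops the container and reads the mount type from `docker inspect`. The function name is hypothetical.

```shell
# Assumptions: Docker deployment; identity stored at /data. Hypothetical helper.
preflight_reprovision() {
  local name=${1:-vaultpam-connector} mount_type
  # Stop the connector before touching credentials.
  docker stop "$name" >/dev/null || return 1
  # .Mounts lists volume, bind, and tmpfs mounts; an empty result means /data
  # lives in the container's writable layer and will not survive recreation.
  mount_type=$(docker inspect -f '{{range .Mounts}}{{if eq .Destination "/data"}}{{.Type}}{{end}}{{end}}' "$name")
  case $mount_type in
    volume|bind) echo "durable:$mount_type" ;;
    *) echo "no durable mount at /data (got '${mount_type}'); fix storage before pairing" >&2
       return 1 ;;
  esac
}
```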

Do not treat repeated bootstrap retries as a normal restart path.
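The branch selection in this section can be condensed into a triage helper. The inputs are answers you collect by hand:

- Was the connector ever Online?
- Is the durable data directory still present?
- What status does the UI show?

The function name and output labels are hypothetical.

```shell
# Triage helper (hypothetical). Inputs are answers gathered by hand:
#   $1 ever_paired       yes if the connector was Online at least once
#   $2 identity_present  yes if the durable data dir is still mounted and non-empty
#   $3 ui_status         status shown in the UI, e.g. Offline, Degraded, "Needs recovery"
failure_class() {
  local ever_paired=$1 identity_present=$2 ui_status=$3
  if [ "$ever_paired" != yes ]; then
    echo "1-first-boot-bootstrap"
  elif [ "$ui_status" = "Needs recovery" ] || [ "$identity_present" != yes ]; then
    echo "3-identity-loss"
  else
    echo "2-normal-reconnect"
  fi
}
```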

Logs to inspect

Docker: `docker logs -n 200 vaultpam-connector`.
Native: `/var/log/vaultpam/connector.log` (Linux) or `%PROGRAMDATA%\VaultPAM\connector.log` (Windows).

Common errors:

Log fragment	Meaning	Fix
`x509: certificate signed by unknown authority`	Host does not trust the CA	Import the CA cert bundle (`/etc/vaultpam/ca.crt`); see the runbook printed during install
`connection refused`	Network blocked	Check corporate firewall outbound allowlist
`enrolment token revoked`	The token was revoked in the UI	Generate a new token
`401 ENROLLMENT_TOKEN_INVALID` after the connector was already paired once	The connector lost its persisted identity and fell back to bootstrap	Reattach the original durable data directory if available. If the identity is gone, generate a fresh token and pair again on durable storage
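To check a log tail against this table, the sketch below looks for each fragment in table order and prints a short category name. The patterns are copied verbatim from the table, but actual log wording may differ between connector versions. The category labels and function name are hypothetical.

```shell
# Reads a log file ($1) or stdin and prints a category for the first table
# fragment found, checking fragments in table order. Hypothetical labels.
# token-invalid means identity loss if the connector paired before; otherwise
# it points at a bad or expired enrollment token.
classify_log() {
  local text
  if [ "$#" -gt 0 ]; then
    text=$(cat -- "$1")
  else
    text=$(cat)
  fi
  case $text in
    *"x509: certificate signed by unknown authority"*) echo "ca-trust" ;;
    *"connection refused"*)                            echo "network-blocked" ;;
    *"enrolment token revoked"*)                       echo "token-revoked" ;;
    *"ENROLLMENT_TOKEN_INVALID"*)                      echo "token-invalid" ;;
    *)                                                 echo "unknown"; return 1 ;;
  esac
}
```

For example, run `docker logs -n 200 vaultpam-connector 2>&1 | classify_log`, or `classify_log /var/log/vaultpam/connector.log` on a native Linux install.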

Post-recovery validation

After any fix:

Confirm the connector reaches Online.
Restart the connector one more time.
Confirm it reconnects without a fresh enrollment token.
Confirm the durable storage path still contains the connector state after the restart.
Capture the evidence for audit:
- connector status before and after
- the log line that proves reconnect or renewal succeeded
- any `endpoint.cert_renewed` or `gateway.enrollment_failed` audit entries tied to the incident
- the alert name and time window if monitoring fired
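Part of the audit evidence can be gathered from the host in one pass. This sketch assumes a Docker deployment with the container named `vaultpam-connector`. The output file name and function name are hypothetical. You still need to export the connector status shown in the UI and the control-plane audit entries from the control plane separately.

```shell
# Assumptions: Docker deployment, container vaultpam-connector. UI status and
# audit entries (endpoint.cert_renewed, gateway.enrollment_failed) must still
# be exported from the control plane. Hypothetical helper and file name.
capture_evidence() {
  local out=${1:-./connector-evidence-$(date -u +%Y%m%dT%H%M%SZ).txt}
  {
    echo "== captured_at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "== container status"
    docker ps -a --filter "name=vaultpam-connector" --format '{{.Names}} {{.Status}}'
    echo "== mounts (durable storage must still back /data)"
    docker inspect -f '{{range .Mounts}}{{.Type}} {{.Source}} -> {{.Destination}}{{"\n"}}{{end}}' vaultpam-connector
    echo "== last 200 log lines"
    docker logs -n 200 vaultpam-connector 2>&1
  } > "$out"
  echo "$out"
}
```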

Still stuck

Contact support and include the connector version, the log tail, and the output of `curl -v https://<control-plane-host>/healthz`.
