My connector is offline
A connector normally flips to Online within 30 seconds of install. If it does not, or if a previously healthy connector drops to Offline, Degraded, or Needs recovery, use this runbook.
What the status means
- Offline: the connector is not currently connected, but it may recover on its own after a transient network interruption or process restart.
- Degraded: at least one runtime path is unhealthy, but not every connector path is down.
- Needs recovery: the connector is no longer in a normal reconnect path. Treat this as durable identity loss, invalid bootstrap fallback, or another bounded recovery case. The connector does not silently retry the original bootstrap token in this state; it fails closed and waits for an explicit operator-initiated recovery flow.
Autonomous mode vs guided recovery
How the connector recovers from a restart depends on whether durable identity storage is present.
| Mode | Storage posture | Runtime outcome | Operator action |
|---|---|---|---|
| Autonomous mode | Durable /data or PAM_AGENT_DATA_DIR is mounted and intact | The connector loads its identity bundle on startup and performs mTLS reconnect without operator intervention | None |
| Guided recovery | No persistent /data or the identity volume was lost / recreated empty | The connector cannot reconnect normally and enters needs_recovery while preserving connector configuration in the control plane | Mint a fresh single-use enrollment token and re-pair the connector |
In autonomous mode, the connector loads its identity bundle on startup and reconnects over mTLS without operator action. This is the default for correctly configured VM, Docker, and Kubernetes deployments. Restarts, rescheduled pods, and container recreation all succeed automatically as long as durable storage survives.
In guided recovery, the connector has no local identity and cannot reconnect. The connector configuration -- org assignment, endpoint ID, and allowed networks -- is preserved in the control plane, but a new enrollment is required. This is the explicit needs_recovery state.
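Before restarting a connector, it can help to check which of the two modes is likely to apply. The following is a minimal sketch, not the connector's own logic: it assumes that a readable, non-empty data directory means an identity bundle is present (the bundle's actual file layout is not described in this runbook), and the function name recovery_mode is hypothetical.

```shell
# Sketch: guess the recovery mode from the state of the data directory.
# Assumption: any entry in the directory counts as "identity present";
# the real bundle file names are not specified in this runbook.
recovery_mode() {
    # Use the argument if given, else PAM_AGENT_DATA_DIR, else /data.
    dir="${1:-${PAM_AGENT_DATA_DIR:-/data}}"
    if [ -d "$dir" ] && [ -r "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "autonomous"
    else
        echo "guided_recovery"
    fi
}
```

Calling recovery_mode /data on the connector host prints autonomous or guided_recovery. An empty or missing directory is reported as guided_recovery, which matches the "lost / recreated empty" case in the table.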
Bootstrap retirement occurs after the first successful enrollment and the first confirmed online heartbeat. Once the connector identity bundle is persisted to durable storage and the control plane records that heartbeat, the bootstrap token is consumed and retired. All subsequent restarts must use the persisted identity for mTLS reconnect. There is no path -- in either mode -- where a previously enrolled connector silently falls back to the bootstrap token as a normal restart credential.
Quick checks
- Is the host up? SSH or RDP into the connector machine; confirm Docker / the native service is running.
  - Docker: docker ps | grep vaultpam-connector; the container should be Up.
  - systemd: systemctl status vaultpam-connector.
- Can the host reach the control plane? From the connector machine, run curl -v https://<control-plane-host>/healthz. If this fails, you have a network, DNS, or firewall issue. Port 443 (control plane HTTPS) and optionally 51820/udp (VPN reverse tunnel) must be open outbound.
- Is the enrollment token still valid? Tokens expire after 4 hours by default. If the connector has never come online and the token expired, generate a new one from Connectors → pick connector → Regenerate token.
- Did the connector lose its durable identity storage? A previously paired connector should restart from the persisted identity under /data or PAM_AGENT_DATA_DIR.
  - Docker: confirm the container still mounts the same named volume or bind mount at /data.
  - Native / VM: confirm the service still points at the same durable host path and that the connector user can read it.
  - Kubernetes: confirm the pod still mounts the expected PVC and was not redeployed onto emptyDir or another ephemeral layer.
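
The reachability and token-age checks above can be scripted. This is a rough sketch under stated assumptions: the function names are hypothetical, the token issue time has to be supplied by the operator as epoch seconds (this runbook does not describe a way to query it), and the 4-hour lifetime is only the default quoted above.

```shell
# Sketch of two quick checks. Function names are placeholders.

# Return 0 if a token issued at $1 (epoch seconds) has expired by $2
# (epoch seconds, default: now), given a lifetime of $3 hours (default: 4).
token_expired() {
    issued="$1"
    now="${2:-$(date +%s)}"
    ttl_hours="${3:-4}"
    [ $(( now - issued )) -ge $(( ttl_hours * 3600 )) ]
}

# Return 0 if the health endpoint on host $1 answers without an HTTP error
# (curl --fail exits non-zero on status 400 or above, and on connection,
# DNS, or TLS failures).
control_plane_reachable() {
    curl --fail --silent --show-error --max-time 10 \
        "https://$1/healthz" >/dev/null
}
```

For example, control_plane_reachable <control-plane-host> || echo "check network, DNS, or firewall" reproduces the manual curl check with a 10-second timeout.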
Identify the failure class
1. First-boot bootstrap failure
Use this branch if the connector never paired successfully.
Common signals:
- the connector never became Online
- logs show token expiry, token revocation, CSR validation failure, or CA trust failure
- the UI still shows an onboarding or activation state rather than ready
Recovery:
- Fix the trust, network, or token problem.
- Generate a fresh enrollment token if the original one expired or was revoked.
- Retry pairing.
2. Normal reconnect failure
Use this branch if the connector paired before and still has its durable identity.
Common signals:
- temporary Offline after reboot, deploy, or network flap
- cp_tunnel_heartbeat_timeouts_total alert activity
- the local data directory is still present and readable
Recovery:
- Restore outbound HTTPS reachability to the control plane.
- Confirm the connector still has its local identity bundle on durable storage.
- Restart the connector process or pod once.
- Validate that the connector returns to Online without using a new enrollment token.
3. Durable identity loss or invalid bootstrap fallback
Use this branch if the connector paired before, but now behaves like a brand-new connector.
Common signals:
- UI or API shows Needs recovery
- logs show 401 ENROLLMENT_TOKEN_INVALID after a previously successful pairing
- enrollment endpoint 401 alerts spike after restart
- the durable storage mount was replaced, removed, or recreated empty
Recovery:
- Stop the connector or scale the pod to zero before changing credentials.
- Try to reattach the original durable storage first.
- If the original identity is gone, treat this as a controlled reprovision:
  - revoke the stale bootstrap token if it still exists
  - mint a fresh single-use enrollment token from the control plane
  - ensure durable storage is mounted correctly before starting again
  - pair exactly once with the new token
- Validate that subsequent restarts reuse the persisted identity instead of asking for another token.
- Confirm that preserved connector configuration is still present; if configuration is missing, escalate as a separate deployment issue.
Do not treat repeated bootstrap retries as a normal restart path.
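
The three branches above reduce to two questions: did the connector ever pair, and is its durable identity still present? The sketch below encodes that triage as a small helper. The function name and the yes/no inputs are illustrative, and the answers still have to come from the checks described earlier.

```shell
# Sketch: map two observations to one of the three failure classes.
#   $1: "yes" if the connector ever paired successfully, otherwise "no"
#   $2: "yes" if the durable identity is still present and readable,
#       otherwise "no"
failure_class() {
    paired="$1"
    identity="$2"
    if [ "$paired" != "yes" ]; then
        echo "first_boot_bootstrap_failure"
    elif [ "$identity" = "yes" ]; then
        echo "normal_reconnect_failure"
    else
        echo "durable_identity_loss"
    fi
}
```

For example, failure_class yes no prints durable_identity_loss, which points to branch 3 and an explicit re-pairing rather than another restart.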
Logs to inspect
- Docker: docker logs -n 200 vaultpam-connector.
- Native: /var/log/vaultpam/connector.log (Linux) or %PROGRAMDATA%\VaultPAM\connector.log (Windows).
Common errors:
| Log fragment | Meaning | Fix |
|---|---|---|
| x509: certificate signed by unknown authority | Host does not trust the CA | Import the CA cert bundle (/etc/vaultpam/ca.crt); see the runbook printed during install |
| connection refused | Network blocked | Check corporate firewall outbound allowlist |
| enrolment token revoked | Someone clicked Revoke in the UI | Generate a new token |
| 401 ENROLLMENT_TOKEN_INVALID after the connector was already paired once | The connector lost its persisted identity and fell back to bootstrap | Reattach the original durable data directory if available. If the identity is gone, generate a fresh token and pair again on durable storage |
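
To scan a log tail for the fragments in this table, a filter along the following lines may help. It is a sketch only: the function name and the hint wording are illustrative, and it matches just the four fragments listed above.

```shell
# Sketch: read connector log lines on stdin and print a hint for each known
# fragment from the table. Hint wording is illustrative.
# The ENROLLMENT_TOKEN_INVALID hint only applies if the connector had
# already paired once; this filter cannot tell that from the log alone.
diagnose_log() {
    while IFS= read -r line; do
        case "$line" in
            *"x509: certificate signed by unknown authority"*)
                echo "ca_trust: import the CA cert bundle" ;;
            *"connection refused"*)
                echo "network: check the outbound firewall allowlist" ;;
            *"enrolment token revoked"*)
                echo "token_revoked: generate a new token" ;;
            *"401 ENROLLMENT_TOKEN_INVALID"*)
                echo "identity_lost: reattach durable storage or re-pair" ;;
        esac
    done
}
```

A typical invocation on a Docker host would be docker logs -n 200 vaultpam-connector 2>&1 | diagnose_log | sort -u, which collapses repeated matches into one hint per failure type.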
Post-recovery validation
After any fix:
- Confirm the connector reaches Online.
- Restart the connector one more time.
- Confirm it reconnects without a fresh enrollment token.
- Confirm the durable storage path still contains the connector state after the restart.
- Capture the evidence for audit:
  - connector status before and after
  - the log line that proves reconnect or renewal succeeded
  - any endpoint.cert_renewed or gateway.enrollment_failed audit entries tied to the incident
  - the alert name and time window if monitoring fired
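
One way to confirm that the durable storage path keeps its contents across the validation restart is to record the file list beforehand and compare afterwards. The helpers below are a hypothetical sketch: the function names are made up, and they only check that previously present files still exist, not that their contents are valid.

```shell
# Sketch: record which files exist in the data directory before a restart,
# then confirm none of them disappeared afterwards.

# Print the sorted list of regular files under directory $1.
snapshot_state() {
    ( cd "$1" && find . -type f | sort )
}

# Return 0 if every path listed in snapshot file $1 still exists under $2.
state_survived() {
    missing=0
    while IFS= read -r path; do
        if [ ! -f "$2/$path" ]; then
            echo "missing after restart: $path" >&2
            missing=1
        fi
    done < "$1"
    [ "$missing" -eq 0 ]
}

# Example use on a Docker host (paths are placeholders):
#   snapshot_state /data > /tmp/before.txt
#   docker restart vaultpam-connector
#   state_survived /tmp/before.txt /data && echo "state intact"
```

The snapshot file can also be attached to the audit evidence alongside the status and log captures.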
Still stuck
Contact support and include the connector version, the log tail, and the output of curl -v https://<control-plane-host>/healthz.