2026 OpenClaw on macOS: After Rotating the Gateway Token, Every Client Returns 401 — Stale OPENCLAW_GATEWAY_TOKEN Inside LaunchAgent, gateway install “Already Installed,” and a Force-Reinstall Runbook (Headless Mac)

During the 2026.4.x release train, teams repeatedly hit the same outage shape: operators rotated or re-issued a Gateway token through the control plane or openclaw doctor, exported the new secret in an interactive shell, maybe even tweaked Docker Compose, yet every consumer (Telegram bots, enterprise IM bridges, browser UI, automation clients) kept returning 401 unauthorized. Locally, openclaw gateway status occasionally still looked “green” because it exercises a shorter code path than real channel traffic. The culprit is usually a stale LaunchAgent plist: the last openclaw gateway install wrote OPENCLAW_GATEWAY_TOKEN into the plist’s EnvironmentVariables dictionary, and launchd keeps spawning the gateway with that copy until you either rewrite the plist and reload the service or perform a forced reinstall. Worse, a plain gateway install can print “already installed” and skip updating the embedded token, which looks like success while nothing changes on disk. This runbook covers, in order: plist discovery, a three-source token comparison, safe bootout/bootstrap, the optional --force reinstall, remote headless verification, and post-incident hard gates. When the blast radius spans more than one machine, pair it with our companion articles on breaking upgrades and device-auth v2, Docker + WS token compose, migration + launchd repair, silent gateway diagnostics, and systemd / launchd baselines.

1. Mental model: why “every client is unauthorized” can coexist with a living process

First, separate transport health from credential alignment. A gateway PID can remain bound to a port while rejecting Authorization headers that no longer match the control plane. Channel plugins and browser sessions surface those rejects immediately; lightweight probes may not. Second, a launchd-managed gateway inherits its environment variables from the plist, not from whatever you typed into zsh five minutes ago. That split is the classic “two sources of truth” bug. Third, token rotation often revokes the previous secret server-side. If launchd still injects the old string, every path that depends on the gateway as a broker fails in lockstep until the plist matches the new canonical value. Finally, CLI installers are idempotent by design: detecting an existing LaunchAgent and leaving it alone is correct behavior for a routine install, but the wrong behavior when your intent is to refresh secrets. In that case you must opt into overwrite semantics.

2. Symptom → evidence matrix

What you see	First hypothesis	How to prove it
All channels + UI fail with 401 at the same minute	Gateway still authenticates with a revoked/previous token	Correlate failure timestamps with rotation ticket; capture one failing HTTP response body.
launchd instance fails, manual foreground `openclaw gateway run` works	Stale plist environment block	`plutil -p ~/Library/LaunchAgents/openclaw.plist` (adjust the filename to your label).
`gateway install` succeeds textually but incidents continue	No plist rewrite occurred	Compare modification time before/after; reach for `--force` per CLI help.
Only the remote Mac mini misbehaves	SSH user ≠ GUI user, or wrong domain	`launchctl print gui/$(id -u)` and verify which user actually owns the plist.

3. Anatomy: how LaunchAgents amplify a missed secret rotation

Think of a LaunchAgent as declarative infrastructure-as-code for your laptop: once OPENCLAW_GATEWAY_TOKEN is serialized into XML/plist, it will survive reboots, disconnections, and shell restarts exactly as written. Unlike a script that re-reads .env every invocation, launchd memoizes the environment at service load time. Operationally, treat token rotation like a miniature software release: update the secret in your vault → update every materialization (plist, compose files, secret managers) → restart the gateway → verify with a non-probe transaction (for example, sending a test message through a real channel). Skipping the plist step is how you get a clean bill of health in chat ops while customers see red banners everywhere.

4. Seven-step runbook (laptop or production ticket)

Step 01 — Freeze and annotate

Capture OpenClaw CLI and gateway versions, the LaunchAgent label, rotation timestamps, and who else might concurrently restart services. Parallel unmanaged restarts turn an hour of work into a day of confounded logs.

Step 02 — Locate the real plist

Most installs land in ~/Library/LaunchAgents/ with a label containing openclaw or your custom string. Multi-user Mac minis need you to repeat the hunt per Unix account — the gateway might be running as a dedicated automation user.

# Example discovery (adjust the label/glob to your install)
ls ~/Library/LaunchAgents/*openclaw* 2>/dev/null
launchctl list | grep -i openclaw

Step 03 — Compare three token views

Gather (A) the plist’s OPENCLAW_GATEWAY_TOKEN, (B) any interactive shell export (for forensic context only), and (C) the authoritative record in your control plane / vault, compared via salted fingerprint — never paste live secrets into Slack or tickets. Any mismatch stops the investigation: you already know what to rewrite.
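The three-way comparison can be scripted so that no literal secret reaches terminal scrollback or a ticket. The sketch below is an illustration, not an official OpenClaw tool: plist_token and fingerprint are hypothetical helper names, the salt is any per-incident string you choose, and the awk extraction assumes an XML-format plist (a binary plist would first need converting, on a copy, with plutil -convert xml1).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 03; names and salt handling are assumptions.

# Hex SHA-256 of stdin, using whichever tool is installed.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# fingerprint SALT TOKEN -> first 12 hex chars of sha256("SALT:TOKEN").
fingerprint() {
  printf '%s:%s' "$1" "$2" | sha256_hex | cut -c1-12
}

# plist_token FILE -> value of OPENCLAW_GATEWAY_TOKEN in an XML plist.
plist_token() {
  awk '
    {
      key = "<key>OPENCLAW_GATEWAY_TOKEN</key>"
      if (!found && (i = index($0, key))) {
        found = 1
        $0 = substr($0, i + length(key))   # keep only the text after the key
      }
      if (found && match($0, /<string>[^<]*<\/string>/)) {
        # Strip the 8-char <string> and 9-char </string> wrappers.
        print substr($0, RSTART + 8, RLENGTH - 17)
        exit
      }
    }
  ' "$1"
}

# Usage (paths and variable names are placeholders):
#   salt="incident-1234"
#   fingerprint "$salt" "$(plist_token "$HOME/Library/LaunchAgents/<your-label>.plist")"
#   fingerprint "$salt" "$OPENCLAW_GATEWAY_TOKEN"   # shell export, forensic context only
#   fingerprint "$salt" "$VAULT_TOKEN"              # however you fetch the vault record
```

Only the 12-character fingerprints belong in the ticket; if (A) and (C) differ, the plist is the thing to rewrite.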

Step 04 — Boot out, edit, bootstrap

Use launchctl bootout gui/$UID/… with the domain/label pair documented for your install, edit the plist (or replace the file wholesale), then launchctl bootstrap it again (launchctl load is the legacy subcommand). If your organization prefers infra-as-code, check the updated plist into git with secrets stripped, referencing a secret manager handle instead of literals.
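A minimal sketch of the reload sequence follows, assuming the plist has already been edited or replaced on disk. The label com.example.openclaw.gateway and the helper names run and reload_agent are placeholders, not OpenClaw defaults. Setting DRY_RUN=1 prints the commands instead of executing them, which is a convenient way to review the exact domain and label before touching a production Mac.

```shell
#!/usr/bin/env bash
# Sketch of Step 04. The label and plist path are placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# reload_agent LABEL PLIST
reload_agent() {
  local label=$1 plist=$2
  local domain="gui/$(id -u)"
  # Refuse to reload a plist that does not parse.
  run plutil -lint "$plist" || return 1
  # bootout fails when the service is not loaded; that is acceptable here.
  run launchctl bootout "$domain/$label" || true
  run launchctl bootstrap "$domain" "$plist"
}

# Usage:
#   DRY_RUN=1 reload_agent com.example.openclaw.gateway \
#     "$HOME/Library/LaunchAgents/com.example.openclaw.gateway.plist"
```

Linting before bootout is a deliberate choice in this sketch: a malformed hand edit is caught while the old service is still running, rather than after it has been unloaded.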

Step 05 — When you need CLI assistance, pass force

If hand-editing XML feels risky, rerun openclaw gateway install --force (flag name subject to your pinned CLI version — read --help). The critical acceptance test is filesystem metadata: the plist must change; otherwise you still have the stale token.
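The filesystem acceptance test can be made mechanical with a marker file created just before the reinstall. This is a sketch under assumptions: plist_rewritten is a hypothetical helper name, the plist path is a placeholder, and the --force flag is the one named in the text above, to be confirmed against your CLI’s --help.

```shell
#!/usr/bin/env bash
# Sketch of the Step 05 acceptance test; file paths are placeholders.

# plist_rewritten MARKER PLIST -> exit 0 only if PLIST is newer than MARKER.
plist_rewritten() {
  [ "$2" -nt "$1" ]
}

# Usage:
#   marker=$(mktemp)                    # timestamp taken before the install
#   openclaw gateway install --force    # flag name per your CLI's --help
#   if plist_rewritten "$marker" "$HOME/Library/LaunchAgents/<your-label>.plist"; then
#     echo "plist rewritten"
#   else
#     echo "plist untouched: token is still stale"
#   fi
```

A changed modification time only proves the file was rewritten; comparing the token fingerprint against the vault record is still needed to prove it was rewritten with the right value.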

Step 06 — Layered validation

Watch gateway logs until you see a healthy readiness line, then execute a channel-native smoke test (not merely curling localhost). If 401 persists, hunt for a second plist, a Dockerized duplicate gateway, or skipped doctor steps described in the silent gateway article.

Step 07 — Security hygiene

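Finish the rotation in both directions: issue the new token and also revoke the retired one in your vault. Encrypt offline backups that still contain literals, and prefer referencing the macOS keychain or a file readable only by the gateway’s own user (mode 0600) rather than world-readable plist copies in shared Desktop folders. A per-user LaunchAgent runs as that user, so a root-owned 0600 file would be unreadable to it.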
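One way to audit for over-permissive copies is a small permission check. The helper name is_exposed is hypothetical, and the sketch only reports; whether to tighten a given file is left to the operator, who should confirm that launchd still loads the agent afterwards.

```shell
#!/usr/bin/env bash
# is_exposed FILE -> exit 0 if group or others have read permission on FILE.
is_exposed() {
  [ -n "$(find "$1" -maxdepth 0 \( -perm -040 -o -perm -004 \) -print 2>/dev/null)" ]
}

# Usage (glob is a placeholder; adjust to your label):
#   for f in "$HOME"/Library/LaunchAgents/*openclaw*; do
#     if is_exposed "$f"; then echo "readable beyond owner: $f"; fi
#   done
```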

5. Remote headless Mac: ten SSH checks

(1) Confirm you SSH as the same Unix user that owns the active plist.
(2) Verify $HOME matches that user’s LaunchAgents path.
(3) Inspect launchctl print gui/$(id -u) for duplicate labels.
(4) Ensure ProgramArguments resolves to the intended Node/binary path (Node version managers love to drift).
(5) After workspace migrations, confirm data directories were not left pointing at an old SSD path.
(6) Lock down firewall rules to the minimal inbound set.
(7) Remember Aqua vs. non-GUI sessions: some teams require an actual GUI login before agents load.
(8) Check log directory permissions; launchd may silently fail to append if ownership is wrong.
(9) When Docker coexists, reconcile precedence between compose env files and launchd per our Docker pairing guide.
(10) Attach redacted log excerpts and probe output to the ticket, not raw secrets.
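Checks (1) and (3) lend themselves to a quick script over SSH. The sketch below uses hypothetical helper names and reads the launchctl list output already used in Step 02 rather than launchctl print, because its three-column layout (PID, Status, Label) is easier to filter.

```shell
#!/usr/bin/env bash
# Sketch for checks (1) and (3). Function names are illustrative.

# owned_by_me FILE -> exit 0 if the current (SSH) user owns FILE.
owned_by_me() {
  [ -O "$1" ]
}

# openclaw_labels: reads `launchctl list` output (PID, Status, Label) on stdin
# and prints every label containing "openclaw", case-insensitively.
openclaw_labels() {
  awk 'tolower($3) ~ /openclaw/ { print $3 }'
}

# Usage on the remote Mac:
#   owned_by_me "$HOME/Library/LaunchAgents/<your-label>.plist" || echo "wrong user"
#   launchctl list | openclaw_labels    # more than one line deserves a closer look
```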

Incident hard gates

Gate A: within ten minutes of rotation, the plist modification time must change or your ticket must document the explicit --force reinstall. Gate B: at least one non-synthetic client transaction returns HTTP 200 / business success. Gate C: the superseded token is marked revoked in the vault so rollback attempts cannot accidentally resurrect it.
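Gate A is simple enough to check automatically. A sketch, assuming you record the rotation time as epoch seconds; file_mtime and gate_a are hypothetical names, and the 600-second window mirrors the ten minutes stated above.

```shell
#!/usr/bin/env bash
# Sketch of Gate A; the 600-second window mirrors the ten minutes in the text.

# file_mtime FILE -> modification time in epoch seconds (GNU stat, then BSD stat).
file_mtime() {
  stat -c %Y "$1" 2>/dev/null || stat -f %m "$1"
}

# gate_a ROTATION_EPOCH PLIST -> exit 0 if PLIST changed within 10 minutes
# after the rotation; exit 2 if the file cannot be inspected.
gate_a() {
  local rotated=$1 mtime
  mtime=$(file_mtime "$2") || return 2
  [ "$mtime" -ge "$rotated" ] && [ $((mtime - rotated)) -le 600 ]
}

# Usage:
#   rotated=$(date +%s)    # record this at the moment of rotation
#   ...rewrite the plist or run the forced reinstall...
#   gate_a "$rotated" "$HOME/Library/LaunchAgents/<your-label>.plist" || echo "Gate A failed"
```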

6. Common misconceptions

Editing only .env while launchd owns the gateway buys you nothing. Seeing the installer print success without touching the plist is not alignment. Curling loopback with a manually supplied header does not prove that Telegram webhooks share the same path. Document these traps in your internal wiki; future you will be grateful during the 2 a.m. rotation.

7. Boundary with upgrades and migrations

Breaking upgrades frequently retune device identity and Gateway attestation rules. If you run doctor checks from the upgrade article but skip plist refresh, you may observe “half green” clusters: some RPCs succeed while channel auth remains wedged. Migration playbooks must converge on a single secret distribution pipeline; otherwise old and new Macs each hold half the state described in migration + launchd repair.

8. Incident timeline: first 30 minutes after rotation

T+0–2 min: freeze. Ban parallel uncoordinated installs; pin the suspected LaunchAgent label and OpenClaw build hash in the incident channel.
T+3–8 min: evidence. Collect one failing HTTP body with a trace id plus the matching gateway auth log line. If clients log 401 but the gateway shows silence, you are likely hitting a different reverse proxy or stale upstream.
T+9–15 min: source reconciliation. In parallel: (A) plist fingerprint, (B) Docker / Compose env if present, (C) vault “current” record. Any mismatch is configuration drift, not “control plane down.”
T+16–25 min: rewrite & roll. bootout → edit plist or --force install → bootstrap. Choose either consumer-stop-first or gateway-first ordering and stick to it for the whole event.
T+26–30 min: gate closure. Do not declare victory until Gates A–C pass with redacted proof attached to the ticket.

Writing the clock out loud keeps remote operators honest: Mac minis in a cage do not benefit from shoulder taps, only auditable steps.

9. Double gateways, ports, and proxies: you meant A, you hit B

A frequent fork: an engineer foreground-runs a gateway for debugging while launchd still owns another instance on a different loopback port; rotation updates only one path. Another fork: TLS terminates on nginx while proxy_pass still aims at last year’s standby host — curl succeeds locally but webhooks fail. Mitigation: define a unique build or commit endpoint and curl it twice after rotation (raw loopback and public hostname). Divergent hashes mean networking duplication, not token magic. Docker Desktop + Colima hybrids can add a third source; either consolidate on one supervisor (launchd or compose) or require a dual-write checklist in every RFC.

10. Printable triangle: plist ↔ launchctl ↔ process

Layer	Healthy signal	Broken pattern
File	plist mtime moves inside the change window; `plutil` passes	Hand-edited XML missing closure; perms drift from template
launchctl	Program path in `launchctl print gui/$UID/…` matches `which openclaw`	Stale Node path from removed nvm prefix; rapid respawn loops
Process env	`OPENCLAW_GATEWAY_TOKEN` fingerprint tracks vault	Process starts yet env missing — implies alternate config loader
Logs	401 waves correlate with auth rejects	CPU pegged without logs — pivot to jsonl/bootstrap freezes before token churn

systemd habits do not transplant verbatim: Linux units often use EnvironmentFile; macOS LaunchAgents typically bake literals into XML. Treat the triangle above as mandatory evidence rather than intuition.
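For the process-env corner of the triangle, one option is to read the running gateway’s environment with ps and hash the token without ever printing it. This is a sketch under assumptions: token_from_ps_env is a hypothetical helper, and it relies on macOS ps exposing the environment (-E) for processes you own, which is worth confirming on your release.

```shell
#!/usr/bin/env bash
# token_from_ps_env: reads one `ps` line that includes the environment on stdin
# and prints the value of OPENCLAW_GATEWAY_TOKEN, or nothing if it is absent.
token_from_ps_env() {
  tr ' ' '\n' | sed -n 's/^OPENCLAW_GATEWAY_TOKEN=//p' | head -n 1
}

# Usage (take the PID from launchctl; pipe straight into a hash, never to the screen):
#   ps -E -ww -o command= -p "$pid" | token_from_ps_env | shasum -a 256 | cut -c1-12
```

Hash the plist and vault values through the identical pipeline so the three fingerprints are comparable. An empty result for a running gateway matches the “env missing” row in the table.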

11. FAQ: sneaky branches that still return 401

Q: I edited the plist but launchd still serves the old secret. A: Confirm you edited the path that launchctl print references; iCloud-synced duplicates are a classic trap.

Q: Can I revoke first, restart later? A: Risky; rolling processes before revocation (or accepting a short overlap) avoids half-online clients.

Q: Doctor is all green, so why do channels die? A: Doctor paths can be shorter than real channel handshakes; require a production-style message loop.

Q: Multi-user Mac mini isolation? A: Never share one Gateway plist across Unix accounts; permissions regress silently.

Q: Should secrets move from plist to keychain? A: Worth doing, but only after lab validation that launchd can read the helper; avoid Friday cutovers.

Q: Rotate SSH keys at the same time? A: Only if your threat model demands it; otherwise keep change surfaces small.

12. Closing

On macOS, mysterious universal 401s after OpenClaw token work are rarely exotic TLS bugs. They are almost always configuration drift between what humans think they edited and what launchd actually injects. Treat LaunchAgent plists as first-class artifacts in every rotation checklist. If you need always-on Apple Silicon to host these gateways without buying idle hardware, review MACGPU pricing and say hi in Telegram (keyword MACGPU). MACGPU authored this field note to shorten outage minutes; always defer to the official CLI documentation for flag names shipped in your version.