1. Pain: the control plane redraws approvals and supply chain
(1) Policy vs velocity: without explicit approval categories, requests stall in a stuck state machine instead of failing loudly. (2) Plugins: fail-closed rejects “latest ClawHub” installs, and a successful CLI install can still be denied by runtime policy. (3) Gateway auth: mixed tokens create two conflicting sources of truth, env and JSON. (4) Remote Mac: openclaw logs run over SSH may not match the Gateway, which inherits its environment from launchd.
2. Symptom matrix
| Signal | Likely root | First move |
|---|---|---|
| Pending / awaiting approval forever | Category not whitelisted | Export semantic category diff; read-only openclaw doctor |
| Unsafe / blocked install | Supply-chain pin mismatch | Verify digest vs allowlist; ban default --dangerously-force-unsafe-install in prod |
| 401 / device auth | Auth v2 + token drift | Reconcile env vs JSON per Docker + upgrade guides |
| Channel alive, child task silent | Routing after trigger-surface shrink | openclaw channels probe then time-windowed logs |
3. Five-step rollout
- Snapshot + read-only doctor: archive ~/.openclaw and workspace; capture baseline.
- Freeze OPENCLAW truth: inventory launchd plist, compose env, shell rc; delete duplicate exports.
- Approval table review: classify disk writes, shell exec, config mutations; ticket each change.
- Skill drill in staging: minimal pack under fail-closed; record hashes in CMDB.
- Layered evidence gate: no model routing edits until probe passes; keep gateway/channel/tool log windows.
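
The "freeze OPENCLAW truth" step can be sketched as a small inventory script. This is a minimal sketch, not an OpenClaw tool: it assumes each source has already been flattened to KEY=value text (a launchd plist's EnvironmentVariables would need converting first, for example with plistlib), and the source names and port values in the usage note are hypothetical.

```python
import re
from collections import defaultdict

# Matches `OPENCLAW_FOO=bar` and `export OPENCLAW_FOO=bar`; commented-out lines do not match.
ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?(OPENCLAW_[A-Z0-9_]+)=(.*)$")


def parse_assignments(text):
    """Extract OPENCLAW_* assignments from shell-rc or env-file style text."""
    found = {}
    for line in text.splitlines():
        match = ASSIGN_RE.match(line)
        if match:
            # Drop surrounding whitespace and simple quoting around the value.
            found[match.group(1)] = match.group(2).strip().strip("\"'")
    return found


def find_duplicates(sources):
    """Return {variable: {source_name: value}} for variables defined in 2+ sources.

    `sources` maps a label you choose (hypothetical, e.g. "shell_rc") to its text.
    """
    seen = defaultdict(dict)
    for name, text in sources.items():
        for var, value in parse_assignments(text).items():
            seen[var][name] = value
    return {var: where for var, where in seen.items() if len(where) > 1}
```

Calling `find_duplicates({"shell_rc": "export OPENCLAW_GATEWAY_PORT=18789", "compose_env": "OPENCLAW_GATEWAY_PORT=18790"})` would flag the port as defined twice, which is the list of exports to delete or reconcile.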
4. Thresholds
- More than one writable source for gateway.auth.token is a P0 blocker until collapsed to a single truth.
- After any policy change, run a high-risk write drill within 24h and archive log excerpts; otherwise “tested” is invalid.
- On remote Mac, if logs show a different OPENCLAW_GATEWAY_PORT than the supervision unit by more than one hop, fix supervision first.
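
The first threshold lends itself to an automated check. The sketch below counts where a Gateway token is defined. It makes several assumptions: that the config file is plain JSON with the token at gateway.auth.token, and that the environment variable is named OPENCLAW_GATEWAY_TOKEN. That name is used here only for illustration; substitute whatever your deployment actually reads.

```python
import json


def token_sources(json_text, env, env_var="OPENCLAW_GATEWAY_TOKEN"):
    """List the places where a gateway token is currently defined.

    json_text: contents of the config file (assumed to be plain JSON).
    env: a mapping such as os.environ or a parsed compose env file.
    env_var: assumed variable name, not confirmed by the article.
    """
    sources = []
    config = json.loads(json_text)
    token = config.get("gateway", {}).get("auth", {}).get("token")
    if token:
        sources.append("json:gateway.auth.token")
    if env.get(env_var):
        sources.append(f"env:{env_var}")
    return sources


def is_p0_blocker(sources):
    """More than one source for the token blocks the rollout."""
    return len(sources) > 1
```

Running this against each host's config and effective environment gives a yes/no answer per node before any policy change is attempted.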
5. FAQ
- Docker token OK but 401? Add device auth v2 + node-trigger limits per upgrade guide.
- Global approval first? Staging only; prod stays fail-closed with category allowlists.
- Pin skills? Yes: digest, tag, install on one ticket with attack-surface cadence.
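
Pinning by digest can be sketched with the standard library alone. This is an illustration of a fail-closed allowlist check, not OpenClaw's own verification logic. The allowlist shape (skill name mapped to a SHA-256 hex digest) is an assumption.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a skill archive's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_pinned(name: str, data: bytes, allowlist: dict) -> bool:
    """Return True only if `name` is allowlisted and its digest matches.

    allowlist: {skill_name: expected_sha256_hex} (assumed format).
    Unknown skills are rejected, which mirrors fail-closed behaviour.
    """
    expected = allowlist.get(name)
    if expected is None:
        return False
    return hmac.compare_digest(sha256_hex(data), expected)
```

Recording the same digest on the install ticket and in the CMDB keeps the staging drill and the production allowlist tied to one artifact.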
6. Case: two-week hardening
CI pulled latest skills while prod shared a dev workspace. The result: approval pileups overnight and silent denies by morning. Fix: staging installs with pinned hashes, split approval tables, and OPENCLAW_* set via launchd only. Rule: PRs touching gateway.auth must update the compose env matrix.
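
The PR rule from this case could be enforced in CI roughly as follows. The matrix path docs/compose-env-matrix.md is a hypothetical location, and the check is a plain substring match on diff text, so treat it as a starting point rather than a complete policy.

```python
def violates_auth_rule(changed, matrix_path="docs/compose-env-matrix.md",
                       marker="gateway.auth"):
    """Return True if a PR touches gateway.auth without updating the matrix.

    changed: {file_path: diff_text} for every file in the PR.
    matrix_path: hypothetical location of the compose env matrix.
    """
    touches_auth = any(
        marker in diff
        for path, diff in changed.items()
        if path != matrix_path
    )
    return touches_auth and matrix_path not in changed
```

A CI job would build `changed` from the PR diff and fail the build when the function returns True.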
7. Closing vs Linux hosts and MACGPU
Limits: Linux VPS differ on permissions and channels; scripts rot without policy-as-code. Remote Mac: launchd supervision you already know. MACGPU: trial remote nodes for 24/7 Gateway—plans/help via CTA. Gate: within 24h post-upgrade, probe plus high-risk drill; logs must carry request ids and policy versions.
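
The closing gate requires logs to carry request ids and policy versions. Archived excerpts can be checked for both with a sketch like the one below, assuming they are JSON lines and that the fields are named request_id and policy_version. Both names are assumptions; adjust them to your actual log schema.

```python
import json

# Assumed field names; not confirmed by any OpenClaw log format.
REQUIRED_FIELDS = ("request_id", "policy_version")


def lines_missing_fields(log_text, required=REQUIRED_FIELDS):
    """Return 1-based line numbers that lack any required field.

    Blank lines are skipped; lines that are not JSON objects count as failures.
    """
    bad = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(record, dict) or any(not record.get(k) for k in required):
            bad.append(number)
    return bad
```

An empty result means the excerpt is usable as evidence for the 24h post-upgrade drill. Anything else points at the lines to investigate.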