1. Pain split: when exec becomes “all denied”, the model is usually fine—defaults moved
(1) Silent cron / sub-agent failures: releases in the 2026.4.x family tightened tool execution and sandbox defaults; older configs with missing fields or permissive placeholders now fall through to deny. The UI can look healthy while jobs stop producing side effects.
(2) Dual truth: exec-approvals.json vs tools.exec in openclaw.json: treat them as two windows on the same gate, not unrelated files; changing only one is a classic source of “it fixed itself yesterday”.
(3) Sandbox registry drift: upgrades may recreate container names; if containers.json disagrees with Docker reality you get recreate loops or stale profiles.
(4) Remote Mac Gateway: when launchd env differs from your SSH shell, openclaw logs may show denials that do not match the JSON the Gateway process actually loaded.
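Since missing fields are what fall through to deny, a quick check for implicit defaults can help. The following is a minimal sketch, assuming openclaw.json nests the keys as tools → exec → security / ask (the nesting is inferred from the dotted names, not confirmed), and using "off" only as a sample value.

```python
import json

# Dotted keys named in this article; the nested JSON layout is an assumption.
REQUIRED_KEYS = ("tools.exec.security", "tools.exec.ask")


def lookup(config, dotted):
    """Walk a nested dict by dotted path; return None if any segment is absent."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_exec_fields(config):
    """List the required keys that are not set explicitly in the parsed config."""
    return [key for key in REQUIRED_KEYS if lookup(config, key) is None]


# Sample config with only `ask` set; the value "off" is illustrative.
sample = json.loads('{"tools": {"exec": {"ask": "off"}}}')
print(missing_exec_fields(sample))
```

Feeding it the parsed contents of the config the Gateway actually loads, rather than the copy in your SSH shell's home, keeps the check aligned with point (4).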
2. Symptom → hypothesis matrix (layer before you edit)
| Signal / log keyword | Likely cause | First move |
|---|---|---|
| exec denied / allowlist miss | tools.exec tier + approvals + sandbox defaults intersect on deny | Read-only openclaw doctor; compare tools.exec.security / ask with exec-approvals.json |
| Cron “fires” but no effect | Command blocked inside sandbox or output dropped | Correlate openclaw logs with the Gateway unit; probe with whoami / date |
| Docker name conflict / cannot rm | Registry vs real containers | Follow backup-first cleanup for containers and ~/.openclaw/sandbox/containers.json per release notes |
| CLI vs Gateway mismatch | Dual env sources / multiple config search paths | Use the Docker WS + token alignment checklist |
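The keyword rows of the matrix can be encoded as a small lookup so that triage is repeatable. This is only a sketch: the keyword strings and the sample log line are illustrative assumptions, not verbatim OpenClaw output, and the two behavioural rows (cron with no effect, CLI vs Gateway mismatch) are left out because they are not keyword signals.

```python
# Keyword strings are assumptions; adjust them to match your real log lines.
MATRIX = (
    (
        ("exec denied", "allowlist miss"),
        "tools.exec tier, approvals, and sandbox defaults intersect on deny",
        "run openclaw doctor read-only; compare tools.exec.security / ask "
        "with exec-approvals.json",
    ),
    (
        ("name conflict", "cannot rm"),
        "sandbox registry disagrees with real containers",
        "follow backup-first cleanup for containers and containers.json",
    ),
)


def triage(log_line):
    """Return (likely cause, first move) pairs whose keywords occur in the line."""
    lowered = log_line.lower()
    return [
        (cause, move)
        for keywords, cause, move in MATRIX
        if any(keyword in lowered for keyword in keywords)
    ]


# Made-up sample line, used only to exercise the lookup.
for cause, move in triage("exec denied: example-job"):
    print(f"{cause} -> {move}")
```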
3. Five-step recovery runbook
- Snapshot: archive ~/.openclaw, workspace, openclaw.json, exec-approvals.json, and sandbox inventory.
- Freeze truth sources: list every OPENCLAW_* from launchd, Docker, and shell; mark which set the running Gateway uses (openclaw gateway status + process env).
- Align tools.exec: set tools.exec.security and tools.exec.ask explicitly in openclaw.json to match policy; avoid implicit defaults.
- Align exec-approvals.json: validate minimal profiles for single-user vs multi-agent paths; ticket every change with rollback text.
- Sandbox + logs gate: dry-run destructive cleanup; then openclaw channels probe → layered openclaw logs. Make no routing changes before the probe passes.
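The "freeze truth sources" step can be sketched as a diff over environment maps. This assumes you have already captured each source's environment as a plain dict (how you export it from launchd or Docker is up to you); the source names and paths below are hypothetical.

```python
def openclaw_vars(env):
    """Keep only variables with the OPENCLAW_ prefix."""
    return {key: value for key, value in env.items() if key.startswith("OPENCLAW_")}


def diff_sources(sources):
    """Given {source_name: env_dict}, return the OPENCLAW_* variables whose
    values differ between sources, as {var: {source_name: value_or_None}}."""
    filtered = {name: openclaw_vars(env) for name, env in sources.items()}
    all_vars = sorted(set().union(*(env.keys() for env in filtered.values())))
    drift = {}
    for var in all_vars:
        values = {name: env.get(var) for name, env in filtered.items()}
        if len(set(values.values())) > 1:
            drift[var] = values
    return drift


# Hypothetical captures; the paths are placeholders.
captures = {
    "launchd": {"OPENCLAW_STATE_DIR": "/Users/svc/.openclaw", "PATH": "/usr/bin"},
    "ssh_shell": {"OPENCLAW_STATE_DIR": "/Users/me/.openclaw"},
}
for var, values in diff_sources(captures).items():
    print(var, values)
```

An empty result means the sources agree on every OPENCLAW_* variable; a variable set in one source and absent in another shows up with None for the source that lacks it.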
4. Citeable thresholds
- If more than one source can change the effective tools.exec tier (JSON + undocumented env + CI injectors), block production until converged to a single truth.
- Within 24 hours of an upgrade, run both a minimal exec probe and a single cron tick; keep ≥3 log lines each or mark the rollout unverified.
- On remote Mac hosts, if the Gateway unit disagrees with your SSH shell on OPENCLAW_STATE_DIR (or equivalent), fix supervision first; otherwise you edit approvals in the wrong directory.
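These thresholds are simple enough to encode as a gate in a rollout script. The sketch below is one possible encoding; the RolloutEvidence type and its field names are hypothetical, and collecting the evidence (counting sources, saving log lines) is left to the operator.

```python
from dataclasses import dataclass


@dataclass
class RolloutEvidence:
    tier_sources: int       # places that can change the effective tools.exec tier
    probe_log_lines: int    # log lines kept from the minimal exec probe
    cron_log_lines: int     # log lines kept from the single cron tick
    gateway_state_dir: str  # OPENCLAW_STATE_DIR seen by the Gateway unit
    shell_state_dir: str    # OPENCLAW_STATE_DIR seen by the SSH shell


def blocking_problems(evidence):
    """Return the reasons this rollout should not proceed (empty list = pass)."""
    problems = []
    if evidence.tier_sources > 1:
        problems.append("converge tools.exec to a single source of truth")
    if evidence.probe_log_lines < 3 or evidence.cron_log_lines < 3:
        problems.append("rollout unverified: keep at least 3 log lines per check")
    if evidence.gateway_state_dir != evidence.shell_state_dir:
        problems.append("fix supervision: state dir differs between Gateway and shell")
    return problems


# Hypothetical evidence for one host.
evidence = RolloutEvidence(2, 3, 1, "/Users/svc/.openclaw", "/Users/me/.openclaw")
for problem in blocking_problems(evidence):
    print(problem)
```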
5. How this intersects upgrade, Task Brain, and Docker guides
Q: I already followed the upgrade / auth v2 checklist—why is exec still denied? That guide focuses on directory moves and device auth; 4.x exec adds sandbox defaults + approvals validation—use the matrix here first.
Q: Do I still need exec after Task Brain? Yes. Task Brain rollout covers control plane + skills policy; commands still traverse tools/exec with different log tokens—triage separately.
Q: Must I rebuild Docker? Not always—start with Docker WS + token parity, then decide if the sandbox sidecar must be recreated.
6. FAQ: rollback, fleets, least privilege
Q: Can I globally set ask: off? Single-user homelab ≠ production multi-tenant risk; if you must relax temporarily, time-box it with an automatic revert—do not bake it into the repo forever.
Q: Does sandbox cleanup lose state? It can drop local artifacts and warm caches. Back up the workspace and registry JSON first, then order the steps as Gateway down → remove containers → prune registry rows to reduce races.
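Before pruning anything, it helps to see exactly where the registry and Docker disagree. The sketch below is read-only and assumes you extract the container names from containers.json yourself, since its schema is not described here; the sample names are hypothetical. The docker ps -a --format "{{.Names}}" call is standard Docker CLI.

```python
import subprocess


def docker_container_names():
    """Names of all containers (running or stopped) known to the local Docker."""
    output = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in output.splitlines() if line.strip()}


def reconcile(registry_names, docker_names):
    """Compare registry rows with real containers without changing either."""
    registry, docker = set(registry_names), set(docker_names)
    return {
        "stale_registry_rows": sorted(registry - docker),
        "untracked_containers": sorted(docker - registry),
    }


# Hypothetical names; in practice pass docker_container_names() as the second set.
report = reconcile({"sandbox-a", "sandbox-b"}, {"sandbox-b", "sandbox-c"})
print(report)
```

Note that untracked_containers will also include containers unrelated to OpenClaw, so filter by whatever naming scheme your sandbox uses before removing anything.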
Q: Should I re-run install audit? Cross-check install.sh + security audit so widening exec does not accidentally widen listeners.
7. Analysis: “painless upgrades” must include exec preflight
Continuity for agents rests on sessions, memory, and executable tools. The industry bias in 2026 is safer defaults; exec and sandbox knobs will keep changing. If you only smoke-test channels and model routing, Monday morning complaints that “the assistant got dumb” are often silent tool denial, not reasoning regressions.
For always-on remote Mac Gateways, exec issues amplify with macOS updates and toolchain paths: missing binaries inside the sandbox look like random denials. A tiny exec health probe in launchd is an order of magnitude cheaper than an hour-long log archaeology session.
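A health probe of this kind can be very small. The sketch below runs whoami and date directly in the supervising environment; it does not go through OpenClaw's exec path or the sandbox, so it only shows whether basic binaries resolve and run under the same launchd environment. The function names and the timeout are assumptions.

```python
import subprocess

# The two commands the matrix suggests for probing.
PROBE_COMMANDS = (["whoami"], ["date"])


def run_probe(commands=PROBE_COMMANDS, timeout=5.0):
    """Run each command; return (name, reason) for every one that fails."""
    failures = []
    for argv in commands:
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binary, permission problem, or a hang.
            failures.append((argv[0], repr(exc)))
            continue
        if result.returncode != 0:
            failures.append((argv[0], f"exit {result.returncode}"))
    return failures


def main():
    """Return a process exit code: 0 if every probe ran, 1 otherwise."""
    failures = run_probe()
    for name, reason in failures:
        print(f"probe failed: {name}: {reason}")
    return 1 if failures else 0
```

Ending the script with raise SystemExit(main()) lets launchd record a non-zero exit when a probe fails, which is easier to alert on than parsing logs.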
Culturally, treat exec and approvals edits like schema migrations—reviewed, scripted rollback, two-person rule—otherwise you get the classic “one-line config change stopped every cron” incident whose hidden cost dwarfs renting a staging remote Mac.
8. Close: even after local recovery, isolate production Gateway
(1) Limits: exec couples to versioned defaults; dual truth and registry drift create long tails; mixing laptop and server homes makes path assumptions brittle.
(2) Why remote Mac helps: isolate staging vs dev; fixed topology and unified launchd units simplify probes and rollback.
(3) MACGPU fit: rent a remote Mac for upgrade rehearsal + exec probes instead of experimenting on production laptops.