2026 OpenClaw v2026.5.20 Mac Upgrade Runbook: Multiple Node Installs, LaunchAgent Drift, gateway status JSON & Remote Rollback

Around OpenClaw v2026.5.20, a failure mode that is often misread as a fake upgrade or a broken channel token is actually simpler and nastier: the Node binary baked into your LaunchAgent no longer matches the Node your shell resolves via which node, because openclaw update rewrote the service unit using whatever Node sat on PATH during follow-up steps. After the next gateway restart, launchd may exit every ninety seconds with code 1 while the CLI still prints the new semver. Release 2026.5.20 (PR #84043) fixes this by forcing post-install work through the managed Gateway service Node and exposing runningVersion in openclaw gateway status --json so you can spot CLI/Gateway protocol skew before channels go dark. This article delivers a symptom matrix, decision table, six-step runbook, three acceptance gates, a field case study, and FAQ—cross-linked with our posts on fake upgrades and gateway status, LaunchAgent token expiry, and invalid config and doctor—so you can run 7×24 acceptance and rollback on a remote Apple Silicon Gateway without guessing.

1. Pain breakdown: not a version string mismatch—a Node binary drift

First, multiple Node installs on one Mac are normal, not an edge case. A typical Apple Silicon workstation might simultaneously carry Homebrew Node at /opt/homebrew/opt/node/bin/node, an nvm default on 22.x under ~/.nvm/versions/node/..., and an fnm global alias on 20.x. When you originally ran openclaw gateway install, the tool captured the absolute path of whichever node was active and wrote it into ~/Library/LaunchAgents/ai.openclaw.gateway.plist as the first element of ProgramArguments. That path stays frozen until something rewrites the plist, often silently during an upgrade.
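To see which Node a plist has frozen, you can read the first ProgramArguments entry directly. The following is a minimal sketch, not OpenClaw tooling: plist_node_path is a hypothetical helper name, it prefers macOS's /usr/libexec/PlistBuddy when present, and its awk fallback assumes an XML (not binary) plist with one <string> per line.

```shell
# Print the first ProgramArguments entry (the baked Node path) of a LaunchAgent plist.
# Hypothetical helper; assumes an XML plist when PlistBuddy is unavailable.
plist_node_path() (
  plist="$1"
  if [ -x /usr/libexec/PlistBuddy ]; then
    /usr/libexec/PlistBuddy -c 'Print :ProgramArguments:0' "$plist"
  else
    # Fallback: first <string> element that follows the ProgramArguments key.
    awk '
      /<key>ProgramArguments<\/key>/ { in_args = 1; next }
      in_args && /<string>/ {
        sub(/.*<string>/, ""); sub(/<\/string>.*/, ""); print; exit
      }
    ' "$plist"
  fi
)
```

Running plist_node_path ~/Library/LaunchAgents/ai.openclaw.gateway.plist next to command -v node in the service account's shell shows at a glance whether the two paths have diverged.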

Second, PATH switching triggers drift. You SSH in, run nvm use 25 or open an fnm-enabled shell, then execute openclaw update. On builds before 5.20, follow-up hooks (doctor, plugin repair, gateway restart) ran under the Node found on the new PATH, not the Node launchd was still configured to use. If the global npm prefix moved with that Node, the plist may now point at an nvm-managed binary while your production Gateway was installed under Homebrew’s prefix. The tarball version can be correct; the supervised process cannot start.

Third, distinguish this from a fake upgrade. Fake upgrade means CLI reports 5.20 while the running Gateway process is still 5.12—the PID never switched builds. Node drift means CLI and gateway status --json may both say 5.20, yet launchd crashes because engines.node requirements fail on the wrong major, native modules mismatch, or the openclaw entry script lives under a prefix the baked node cannot resolve. Symptoms overlap: WebSocket handshake failures, RPC timeouts, spurious unauthorized—but the fix is runtime alignment, not another npm install -g from your laptop shell.
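The distinction can be written down as a small triage function. This is an illustrative sketch only: classify_upgrade and its four labels are hypothetical names, not OpenClaw output, and the strict version equality ignores the short restart skew window that 5.20 tolerates.

```shell
# Classify an upgrade incident from four observed values.
# Usage: classify_upgrade CLI_VERSION RUNNING_VERSION PLIST_NODE SERVICE_SHELL_NODE
# Labels are illustrative assumptions; Node alignment is checked first on purpose.
classify_upgrade() {
  if [ "$3" != "$4" ]; then
    echo "node-drift"            # baked runtime differs from the service shell's Node
  elif [ -z "$2" ]; then
    echo "no-running-version"    # Gateway did not report runningVersion at all
  elif [ "$1" != "$2" ]; then
    echo "fake-upgrade"          # CLI moved, running process did not
  else
    echo "aligned"
  fi
}
```

For example, classify_upgrade 2026.5.20 2026.5.20 /Users/ops/.nvm/.../node /opt/homebrew/opt/node/bin/node prints node-drift even though both version strings agree.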

Fourth, protocol skew in 5.20 adds nuance. The release keeps a health check when the CLI is one minor patch ahead of Gateway during restart. If Node drift prevents Gateway from booting at all, you never reach that check—you see crash loops instead of a clean version line diff. Always capture plist evidence before blaming channel tokens.

Fifth, watch for legacy updater LaunchAgents. Issue #82167 documents ai.openclaw.update.* jobs that relaunch repeatedly and SIGTERM the Gateway every few minutes. Starting in 5.20, openclaw update best-effort disables those stale jobs; your runbook must still include launchctl list | grep openclaw inventory so you do not confuse updater thrash with Node drift—or fix one while the other keeps killing the process.

Sixth, remote Mac amplifies the mistake. Operators validate upgrades on a MacBook with nvm defaults, then SSH to a Mac Studio that runs Gateway under a dedicated service account with Homebrew-only PATH. Copy-pasting plist fragments or running update from the laptop session without freezing PATH is how Friday-night “success” becomes Saturday-morning total channel outage. Treat the plist node path, npm root -g, and which node as a single evidence bundle on every host.

2. Decision matrix: align Node first or roll back the package?

Field signal	Preferred action	Do not
`openclaw --version` and plist node point at different Node majors	Freeze PATH → rerun update with service Node or `gateway install --force`	Hand-edit ProgramArguments without backing up plist
`gateway status --json` missing runningVersion	`gateway restart --wait`, then inspect launchd last exit code	Delete entire `~/.openclaw` tree
Post-upgrade Telegram flaps only	Rule out Node drift before channel-layer token work	Mix with 5.2 HTTP timeout fixes from unrelated runbooks
Laptop OK, remote Mac fully down	Diff plist node vs `npm root -g` on each host separately	Copy nvm paths from laptop plist into Studio plist
Change audit requires proof	Rehearse six steps + 30-minute probe on control Mac first	Upgrade production Friday peak without rollback window

Use the matrix as a triage ladder, not a substitute for evidence. When two rows seem to apply—say, missing runningVersion plus Telegram flaps—execute Node alignment (rows one and two) before opening channel credentials. Rolling back npm without fixing plist leaves you one restart away from the same crash. Conversely, if plist node matches which node in a minimal login shell but Gateway still fails, pivot to the invalid-config runbook: fail-closed schema and doctor moves are orthogonal but can stack after a bad upgrade night.

3. Six-step field runbook

Step 1 Freeze the evidence triplet

Before any fix, archive four facts in one ticket: the triplet of openclaw --version, which node with node -v, and the first two ProgramArguments entries from ~/Library/LaunchAgents/ai.openclaw.gateway.plist (node path + openclaw gateway entry), plus the full JSON from openclaw gateway status --json, highlighting runningVersion on 5.20+. Screenshot or paste them into the change record. If the paths already diverge, label the incident Node drift and skip token rotation until Step 3 completes.

Step 2 Parse engines.node requirements

Open the installed package’s package.json under your global prefix (typically $(npm root -g)/openclaw/package.json) and read engines.node alongside the release notes for 2026.5.20. If the service Node’s major is below the floor, upgrade that Node tree first (Homebrew bump, nvm install, fnm install) before touching openclaw again. Doctor may pass under an interactive shell while launchd still invokes an older baked binary; engines.node is the contract launchd must satisfy unattended.
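The floor check itself is simple arithmetic on major versions. The helpers below are a rough sketch with hypothetical names; they assume engines.node is a single lower bound such as >=22.0.0 and compare majors only, so compound ranges or minor-version floors need a real semver parser.

```shell
# Extract the major version from a `node -v` style string, e.g. v22.3.0 -> 22.
node_major() {
  printf '%s\n' "$1" | sed 's/^v//; s/\..*$//'
}

# Extract the first integer from a simple engines.node range such as ">=22.0.0".
# Assumption: one lower bound only; compound ranges are not handled.
engines_floor_major() {
  printf '%s\n' "$1" | sed 's/^[^0-9]*//; s/[^0-9].*$//'
}

# Return 0 when the Node version's major meets the floor's major, 1 otherwise.
# Usage: meets_floor NODE_VERSION ENGINES_RANGE
meets_floor() {
  [ "$(node_major "$1")" -ge "$(engines_floor_major "$2")" ]
}
```

Feed it the version printed by the plist's Node binary, not by the interactive shell's node: meets_floor "$(/path/from/plist/node -v)" '>=22.0.0' returns non-zero when launchd would start a runtime below the floor.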

Step 3 Align the managed Gateway Node (5.20 recommended path)

On 2026.5.20+, run openclaw update from a shell where PATH is intentionally minimal—or use the service account profile that matches production. The updater should prefer the LaunchAgent’s baked node for follow-up. If you already drifted, run openclaw gateway install --force with your intended node active: one Node manager, one global prefix, one plist. Confirm npm root -g matches the prefix implied by plist ProgramArguments. Re-run Step 1 snapshot; paths must match before restart.
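One way to make PATH intentionally minimal is to start each command from an emptied environment. This is a sketch under assumptions: run_frozen and FROZEN_PATH are hypothetical names, and the Homebrew-first PATH shown here is only an example of a production service profile.

```shell
# Minimal PATH for update and realignment commands.
# Assumption: the service Node lives under Homebrew; adjust to your production account.
FROZEN_PATH="/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin"

# Run a command with an emptied environment so nvm/fnm hooks cannot leak in.
run_frozen() {
  env -i HOME="$HOME" USER="${USER:-}" PATH="$FROZEN_PATH" "$@"
}
```

With this in place, run_frozen openclaw update and run_frozen openclaw gateway install --force both resolve node from the frozen PATH rather than from whatever the SSH session activated.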

Step 4 Clean legacy updater LaunchAgents

Run launchctl list | grep openclaw. If ai.openclaw.update.* appears with high PID churn or rapid relaunch counts, 5.20’s update should disable it; on older builds, manually run launchctl bootout gui/$UID/<label> once for each matching ai.openclaw.update.* label after confirming no update job is mid-flight. launchctl does not expand wildcards in a service target, so pass each label in full. Cross-check Issue #82167 symptoms: a Gateway SIGTERM on a three-minute cadence often traces to updater jobs fighting the main Gateway unit, not to model API limits.
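The inventory and cleanup can be scripted label by label. The sketch below uses hypothetical helper names and assumes the three-column PID / Status / Label layout of launchctl list; confirm no update is mid-flight before calling the second function.

```shell
# Read `launchctl list` output on stdin and print labels of legacy updater jobs.
# Columns are PID, Status, Label; the ai.openclaw.update. prefix follows Issue #82167.
updater_labels() {
  awk '$3 ~ /^ai\.openclaw\.update\./ { print $3 }'
}

# Boot out each legacy updater job by its exact label in the current user's GUI domain.
bootout_updaters() {
  launchctl list | updater_labels | while IFS= read -r label; do
    echo "booting out $label"
    launchctl bootout "gui/$(id -u)/$label"
  done
}
```

Running launchctl list | updater_labels on its own is a safe dry run: it only prints what bootout_updaters would target.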

Step 5 Ordered Gateway restart and JSON probes

Execute openclaw gateway restart --force --wait. Then run openclaw gateway status --json three times at ten-second intervals. Require a stable runningVersion matching the target semver, RPC success within your SLO, and no rising launchd exit count. On a remote Mac without an interactive session, use launchctl kickstart -k gui/$UID/ai.openclaw.gateway and repeat the probes from a bastion host. Attach the JSON outputs to the ticket as machine-readable proof.
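The three-probe loop can be sketched as follows. The helper names are hypothetical, and the sed extraction assumes runningVersion is a plain string field somewhere in the JSON; a JSON parser such as jq would be more robust. The sketch checks only the version, not RPC latency or launchd exit counts.

```shell
# Pull runningVersion out of gateway status JSON read from stdin.
# Assumption: the field is a plain string; the exact JSON layout is not known here.
extract_running_version() {
  tr -d '\n' | sed -n 's/.*"runningVersion"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Probe COUNT times, INTERVAL seconds apart; fail as soon as a probe misses TARGET.
# Usage: probe_gateway TARGET [COUNT] [INTERVAL]
probe_gateway() (
  target="$1"; count="${2:-3}"; interval="${3:-10}"
  i=1
  while [ "$i" -le "$count" ]; do
    seen="$(openclaw gateway status --json | extract_running_version)"
    echo "probe $i: runningVersion=${seen:-<missing>}"
    [ "$seen" = "$target" ] || exit 1
    [ "$i" -lt "$count" ] && sleep "$interval"
    i=$((i + 1))
  done
)
```

Calling probe_gateway 2026.5.20 uses the document's defaults of three probes ten seconds apart, and its output can be pasted into the ticket alongside the raw JSON.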

Step 6 Remote 7×24 control run and rollback window

On a control node—ideally a MACGPU remote Mac with clean PATH and no nvm in launchd context—repeat Steps 1–5 identically. Only after the control node passes a 30-minute openclaw channels status --probe window should you touch production. If production still fails, pin the previous openclaw tarball and restore the previous plist node pair from backup; diff npm prefix and plist before closure. Never declare success on CLI version alone.
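Restoring "the previous plist node pair from backup" presumes a backup exists. A minimal sketch of a timestamped copy, with backup_plist as a hypothetical helper name and the backup directory left to you:

```shell
# Copy a plist into BACKUP_DIR with a UTC timestamp suffix and print the new path.
# Usage: backup_plist PLIST_PATH BACKUP_DIR
backup_plist() (
  src="$1"; dir="$2"
  [ -f "$src" ] || { echo "missing plist: $src" >&2; exit 1; }
  mkdir -p "$dir" || exit 1
  dest="$dir/$(basename "$src").$(date -u +%Y%m%dT%H%M%SZ)"
  cp -p "$src" "$dest" && echo "$dest"
)
```

Take the backup before Step 3 rewrites anything. A rollback then amounts to copying the file back over ~/Library/LaunchAgents/ai.openclaw.gateway.plist and reloading the agent, for instance with launchctl bootout followed by launchctl bootstrap in the gui/$UID domain.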

# Evidence triplet snapshot (Step 1)
openclaw --version
which node && node -v
plutil -p ~/Library/LaunchAgents/ai.openclaw.gateway.plist | head -20
openclaw gateway status --json
# Global prefix alignment and engines.node check (Step 2)
npm root -g
ls -la "$(npm root -g)/openclaw/package.json"
grep -A2 '"engines"' "$(npm root -g)/openclaw/package.json"
# Realign the plist to the active Node (Step 3)
openclaw gateway install --force
# Inventory legacy updater jobs (Step 4)
launchctl list | grep openclaw
# Ordered restart, then probe (Step 5)
openclaw gateway restart --force --wait
openclaw gateway status --json

4. Three self-check gates

Gate A (Node): The plist node path exists on disk, and executing it yields a node -v that satisfies engines.node. The same path is what which node resolves to in the service account’s login shell with nvm/fnm hooks disabled, unless those hooks are the deliberate production choice encoded in the plist.

Gate B (Version): gateway status --json reports a runningVersion aligned with openclaw --version within the skew window 5.20 documents. The CLI may lead the Gateway by one patch during the restart health check, but both must be on the same Node runtime.

Gate C (Channels): openclaw channels status --probe shows no red rows; for 30 minutes after restart, the launchd last exit code stays zero and no “immediate exit after boot” pattern appears in Gateway logs. If any gate fails, hold production traffic and roll back both the plist and the package pin.
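Gate A is the easiest of the three to automate. The following sketch uses a hypothetical gate_a helper; it takes the floor as a plain major number that you read from your own tarball's engines.node, and it does not check the login-shell half of the gate.

```shell
# Gate A: the baked Node must exist, be executable, and meet a major-version floor.
# Usage: gate_a NODE_PATH FLOOR_MAJOR   (FLOOR_MAJOR comes from your engines.node)
gate_a() (
  node_path="$1"; floor="$2"
  [ -x "$node_path" ] || { echo "Gate A fail: $node_path is not executable" >&2; exit 1; }
  major="$("$node_path" -v | sed 's/^v//; s/\..*$//')"
  case "$major" in
    ''|*[!0-9]*) echo "Gate A fail: cannot parse Node version" >&2; exit 1 ;;
  esac
  if [ "$major" -lt "$floor" ]; then
    echo "Gate A fail: Node $major is below floor $floor" >&2
    exit 1
  fi
  echo "Gate A pass: $node_path (major $major)"
)
```

Pass it the path taken from the plist's ProgramArguments rather than the output of which node, since the point of the gate is to test what launchd will actually execute.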

5. Deep case: “update succeeded, Gateway offline by morning”

“Ops ran openclaw update from a MacBook with nvm default Node 22 over SSH to a Mac Studio whose Gateway was installed under Homebrew Node 25. Friday logs showed success. Saturday’s plist pointed at /Users/ops/.nvm/.../node while plugins and native deps expected /opt/homebrew/...; launchd exited every 90s with code 1.”

The team opened the fake-upgrade runbook first because plugins still loaded intermittently when manually started from SSH. CLI and JSON both read 5.20—skew looked impossible until someone diffed plist ProgramArguments against which node on the Studio service account. Node drift, not token expiry. They rehearsed on a MACGPU control Mac with a single Homebrew Node, ran the six-step ladder, exported passing JSON probes, then applied only the plist node pair + gateway install --force on production—channels green within thirty minutes. Post-incident rule: never run openclaw update from an nvm-enabled interactive shell against a remote launchd Gateway; use service profile, env -i minimal PATH, or a CI runner with frozen Node.

Secondary lesson: attach plist diff and engines.node screenshot to every upgrade ticket. Auditors increasingly ask for runtime pin proof, not semver screenshots alone. The control-node rehearsal paid for itself by preventing a second blind plist paste from the laptop.

6. Industry view: runtime pinning and macOS daemon practice

By 2026, Node fragmentation—nvm, fnm, Volta, Homebrew, asdf—means “global CLI” and “daemon runtime” are two truths on the same UID. OpenClaw 5.20 encoding managed service Node into the update chain marks a broader shift: agent Gateway ops move from “it installed” to pin, diff, rollback. Enterprise change templates now include plist node path, engines.node validation, and gateway status JSON attachments alongside semver.

macOS launchd’s absolute path semantics make drift harsher than on Linux systemd units that sometimes inherit a cleaner Environment block. One wrong node string equals total Gateway outage—not degraded mode. Remote Apple Silicon hosts—Mac Studio, Mac mini—are popular 7×24 Gateway planes because unified memory, Metal-adjacent tooling, and desktop channel stacks coexist on one machine. That same popularity makes them expensive when PATH hygiene fails: there is no hypervisor layer to hide a bad plist.

Renting a MACGPU remote Mac as a golden control plane isolates rehearsal from your laptop’s nvm profile. Snapshot disk, run the six steps, hold the thirty-minute probe, then promote the exact plist/npm prefix pair to production. This is operational insurance, not marketing: it converts “works in my SSH session” into “works when nobody is logged in.”

Windows and Linux Gateways face multi-Node pain too, but macOS LaunchAgent immutability until rewrite teaches a stricter habit: treat gateway install --force as the supported realignment tool, not hand-edited XML. Teams that internalize that reduce repeat incidents after every May micro-release train.

7. Citeable numeric thresholds

① Plist node path ≠ which node in service shell → do not declare upgrade success.

② engines.node requires Node ≥22 (verify your tarball); service Node below floor → upgrade Node before openclaw.

③ More than three launchd non-zero exits within 30 minutes post-upgrade → default rollback with plist/npm diff preserved.

④ Three consecutive gateway status --json calls without runningVersion → classify Gateway Unhealthy.

⑤ Different global npm prefix on remote vs laptop → forbid cross-machine plist fragment copy; regenerate with gateway install --force on each host.
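Threshold ③ reduces to a count. In this sketch, should_roll_back is a hypothetical helper that reads one launchd exit code per line on stdin; how you collect those codes over the 30-minute window (for example by sampling the last exit code that launchctl print reports for the Gateway job) is left as an assumption.

```shell
# Threshold 3: return 0 (roll back) when more than MAX non-zero exit codes were seen.
# Usage: <exit codes, one per line> | should_roll_back [MAX]   (MAX defaults to 3)
should_roll_back() (
  max="${1:-3}"
  # Count non-empty lines whose value is not zero.
  nonzero="$(awk '$1 != "" && $1 != 0 { n++ } END { print n + 0 }')"
  echo "non-zero exits: $nonzero (rollback when more than $max)" >&2
  [ "$nonzero" -gt "$max" ]
)
```

Exactly three non-zero exits does not trip the rule; the fourth does, matching the "more than three" wording.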

Add operational hygiene: serialize global npm installs per UID with a change lock; export ten minutes of Gateway logs before and after restart; store plist backup with timestamp in the ticket. These seconds prevent hours of channel downtime and keep model providers off the blame list until Node evidence is exhausted.

8. FAQ

How is this different from a fake upgrade? In a fake upgrade the CLI moved but the process did not, so you compare version strings and PID age. In Node drift the versions agree but launchd cannot execute the baked runtime, so you compare binary paths and engines.node.

Must we reach 5.20? The managed-service-Node fix lands in 5.20 (PR #84043). On older builds, manually run gateway install --force after every update from a frozen PATH—or upgrade to 5.20 first.

What about Docker? Container images usually ship one Node; drift is rare. Watch image tag vs mounted volume state instead of plist paths.

Can I hand-edit the plist? Possible with backup, but gateway install --force is supported and regenerates consistent ProgramArguments and environment blocks.

What does a MACGPU node provide? Isolated PATH, 7×24 Apple Silicon, reproducible control-run for the six-step ladder and rollback window—it does not replace your change approval process.

Does doctor --fix help? Doctor fixes configuration schema; it does not realign Node paths. Run doctor after Node gates pass if fail-closed errors persist—see the invalid-config runbook for ordering.