1. Pain Decomposition: Installed ≠ Operable
(1) Path split = config split: the global CLI reads ~/.openclaw, the container mounts another tree, and the repo still has its own openclaw.json, so you patch the wrong file and wonder why the reload never lands.
(2) Node 22 is a contract: guides stuck on 18/20 break when native addons or pnpm workspaces meet a different Node version on CI than on the laptop.
(3) Upgrade order matters: bumping the global package before retagging compose, or rolling the image back without a volume snapshot, leaves Gateway half-migrated on disk.
(4) Observability and privilege drift: mixing sudo global installs with user-level npm prefixes, or mis-aligning launchd UserName with the account that owns the log/state dirs, produces “process up but can’t persist state” failures. Your runbook must pin runtime user, working directory, and umask.
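To make the path split in point (1) visible before anyone edits a file, a small script can list every config candidate that exists on the host. This is a minimal sketch, not part of OpenClaw: the function names are hypothetical, and the candidate paths (including the file name inside ~/.openclaw and the bind-mount location) are assumptions to replace with your own layout.

```python
from pathlib import Path


def find_config_candidates(candidates):
    """Return the candidate config files that actually exist, in the given order."""
    expanded = (Path(p).expanduser() for p in candidates)
    return [p for p in expanded if p.is_file()]


def describe_split(found):
    """Summarize whether more than one file could be mistaken for the live config."""
    if not found:
        return "no config found"
    if len(found) == 1:
        return f"single config: {found[0]}"
    listed = ", ".join(str(p) for p in found)
    return f"config split ({len(found)} files): {listed}"


if __name__ == "__main__":
    # Hypothetical locations; adjust to your own layout.
    candidates = [
        "~/.openclaw/openclaw.json",    # global CLI tree (file name assumed)
        "./openclaw.json",              # repo checkout
        "./docker/data/openclaw.json",  # host side of a compose bind mount (assumed)
    ]
    print(describe_split(find_config_candidates(candidates)))
```

If the script reports more than one file, record in the runbook which one the running Gateway actually reads before changing anything.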
2. Three-Way Matrix (2026)
| Axis | npm global | Docker Compose | pnpm source |
|---|---|---|---|
| Time to first success | Fastest for solo trials | Medium; needs volumes & networks | Slowest; fork/audit friendly |
| Isolation / multi-instance | Weak; single global version | Strong; multiple projects | Medium; per-workspace trees |
| Persistent data | User home; document backup | Explicit bind mounts required | Tied to how you run binaries |
| Upgrade | npm i -g openclaw@x; mind PATH/permissions | Retag image + compose pull | git pull + pnpm i + build; lock in git |
| Rollback | Pin previous global or use nvm-style managers | Image tag back + volume snapshot | Git tag/branch + reinstall deps |
3. Five Steps: Truth First, 24/7 Second
- Freeze runtime: Node 22+ (or the README’s stated LTS band) everywhere, with an explicit “unsupported range” note for contractors; on Docker, freeze the compose file name, image reference style (tag vs digest), and who may bump base images.
- Map four directories: config, secrets, session/skill state, logs — for compose, annotate host bind paths in comments so on-call does not guess from container-only paths during incidents.
- One golden path: onboard → minimal channel smoke → foreground Gateway → daemon once — capture exact commands, expected log lines, and “good enough” health signals in the internal runbook (not only prose).
- Upgrade SOP: change ticket → backup volume/dir (or storage snapshot) → upgrade → openclaw doctor → channels probe; skipping backup should fail the checklist, not the conscience.
- Rollback SOP: roll back package/image and data together; after rollback, run openclaw status plus the doctor ladder from the silent-channel runbook before declaring green.
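One way to make a skipped backup fail the checklist mechanically is to validate the recorded steps against the SOP order. The sketch below is illustrative only: the step labels are hypothetical names for the stages in the upgrade SOP, and it checks a log of what was run rather than invoking any OpenClaw command.

```python
# Hypothetical step labels mirroring the upgrade SOP order.
UPGRADE_STEPS = ["change_ticket", "backup", "upgrade", "doctor", "channels_probe"]


def check_upgrade_log(completed):
    """Return a list of problems; an empty list means the checklist passes.

    `completed` is the ordered list of step names that were actually run.
    """
    problems = []
    for step in UPGRADE_STEPS:
        if step not in completed:
            problems.append(f"missing step: {step}")
    # Order matters: the steps that were run must follow the SOP sequence.
    ran = [s for s in completed if s in UPGRADE_STEPS]
    expected = [s for s in UPGRADE_STEPS if s in ran]
    if ran != expected:
        problems.append("steps ran out of order")
    return problems


if __name__ == "__main__":
    # An upgrade that skipped the backup is rejected.
    print(check_upgrade_log(["change_ticket", "upgrade", "doctor", "channels_probe"]))
```

A CI job or ticket hook could refuse to close the change while this list is non-empty.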
4. Citeable Planning Numbers
- Maintaining two install shapes without a written truth source for >14 days usually costs 2–4 hours/week in “which Gateway is live?” meetings.
- Production changes should ship with at least one recoverable volume/dir snapshot; blind rollbacks fail more often on stateful agents.
- On a remote Mac, align launchd WorkingDirectory with CLI defaults to halve “config changed but daemon ignored it” tickets; see the systemd/launchd runbook.
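The launchd alignment can be checked rather than remembered. The sketch below parses a job definition with Python's standard plistlib and compares the UserName and WorkingDirectory keys against the values pinned in the runbook. The function name and the expected values are assumptions; the plist content comes from wherever your job file lives.

```python
import plistlib


def check_launchd_plist(plist_bytes, expected_user, expected_workdir):
    """Compare a launchd job definition against the runbook's pinned values."""
    job = plistlib.loads(plist_bytes)
    problems = []
    if job.get("UserName") != expected_user:
        problems.append(
            f"UserName is {job.get('UserName')!r}, expected {expected_user!r}"
        )
    if job.get("WorkingDirectory") != expected_workdir:
        problems.append(
            f"WorkingDirectory is {job.get('WorkingDirectory')!r}, "
            f"expected {expected_workdir!r}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical job definition, built in memory for illustration.
    sample = plistlib.dumps({
        "Label": "example.gateway",
        "UserName": "gateway",
        "WorkingDirectory": "/Users/gateway",
    })
    print(check_launchd_plist(sample, "gateway", "/Users/gateway"))
```

A missing key is reported as a mismatch, which is usually what you want: an unset WorkingDirectory is itself a reason the daemon may ignore a config change.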
5. When to Prefer a Dedicated Remote Mac
| Signal | Action |
|---|---|
| Need 24/7 Gateway but laptops sleep | VPS or remote Mac + launchd; keep local as admin CLI only |
| Container vs host DNS/NTP drift causes channel jitter | Pick one topology for prod (all container or all bare metal) |
| Apple creative toolchain + OpenClaw on same SLA | Evaluate remote Apple Silicon to reduce cross-OS scripting |
Change hygiene: bind install shape, volume identifiers, and on-call ownership into a single ticket field or config-management record. Without that triple, postmortems degrade to “something was running somewhere.” On remote Macs, split alert routes for SSH bastion, screen-sharing, and Gateway health so one noisy channel does not hide real outages. Run a quarterly “restore from snapshot + channels smoke” drill; it is cheaper than an all-nighter during a real regression.
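The triple described above (install shape, volume identifiers, on-call owner) can be modeled as a single record that refuses to be incomplete. This is a sketch under assumptions: the class, field names, and shape labels are hypothetical and would map onto whatever ticket field or config-management schema you already use.

```python
from dataclasses import dataclass

# Illustrative labels for the three install shapes compared in this article.
INSTALL_SHAPES = ("npm-global", "docker-compose", "pnpm-source")


@dataclass(frozen=True)
class ChangeRecord:
    install_shape: str
    volume_ids: tuple
    on_call_owner: str

    def problems(self):
        """Return what is missing or invalid; an empty list means the triple is complete."""
        found = []
        if self.install_shape not in INSTALL_SHAPES:
            found.append(f"unknown install shape: {self.install_shape!r}")
        if not self.volume_ids:
            found.append("no volume or directory identifiers recorded")
        if not self.on_call_owner.strip():
            found.append("no on-call owner recorded")
        return found


if __name__ == "__main__":
    record = ChangeRecord("docker-compose", ("gateway_state",), "alice")
    print(record.problems())
```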
6. FAQ
Global CLI talking to containerized Gateway? Fine if you document remote URL, tokens, and config paths in one table; otherwise you get split brain.
Rootless Docker? Follow corporate baseline; rootless is stricter on volume UID mapping and often needs explicit user mapping in compose.
pnpm in CI? Use pnpm i --frozen-lockfile, cache build artifacts separately from prod state paths, and block merges if lockfile drift is detected.
Silent channels after upgrade? Run doctor/pairing before reinstalling three times; see the diagnostic ladder.
Should compose define healthchecks? Yes: give Gateway a minimal liveness signal (HTTP/TCP or process-level) and document restart policy with backoff caps so a half-started container does not thrash forever.
Share snapshot format across dev/stage/prod? Share the procedure, never the secrets; rehearse restores per environment so stage tokens never leak into prod volumes.
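For the healthcheck answer, a TCP-level liveness probe and a capped restart schedule are small enough to sketch. The code below is a generic illustration, not an OpenClaw or Compose API: the function names are hypothetical, and the host, port, base delay, and cap are placeholders you would take from your own Gateway configuration.

```python
import socket


def tcp_alive(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def backoff_delays(base=1.0, cap=60.0, attempts=8):
    """Exponential restart delays in seconds, doubling from `base` and capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


if __name__ == "__main__":
    # Placeholder address; use the port your Gateway actually listens on.
    print(tcp_alive("127.0.0.1", 8080))
    print(backoff_delays())
```

A TCP check only proves the port accepts connections; if the Gateway exposes an HTTP health endpoint, probing that gives a stronger signal. The cap keeps a half-started service from restarting in a tight loop while still retrying.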
7. Case Study: Install Shape Sets Ops Radius
In 2026 OpenClaw ships channels, skills, and routing changes quickly; teams stall on a missing single source of truth, not on missing features. npm global optimizes for speed; Docker trades host coupling for volume discipline; pnpm source maximizes auditability at the cost of build ownership and supply-chain hygiene.
On Docker, publish one ops page that states image tagging (floating latest for labs only; prod uses immutable tags or digests), backup windows aligned with channel quiet hours, and compose project naming so “who edited which stack” stays auditable. On pnpm source, treat corepack, .npmrc, and private registry mirrors as code-reviewed assets — otherwise nightly builds silently resolve different trees.
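The tagging rule for prod can be enforced with a small lint over the image references in a compose file. The sketch below classifies a reference string as a digest, a pinned tag, or a floating reference; the function names and example image names are hypothetical, and extracting the references from your compose file is left to whatever YAML tooling you already use.

```python
def classify_image_ref(ref):
    """Classify a container image reference as 'digest', 'pinned-tag', or 'floating'."""
    if "@sha256:" in ref:
        return "digest"
    # The tag, if any, follows the last ':' in the final path segment, so a
    # registry port such as localhost:5000/app is not mistaken for a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "floating"  # no tag means the runtime defaults to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return "floating" if tag == "latest" else "pinned-tag"


def prod_violations(image_refs):
    """Return the references that should not appear in a prod compose file."""
    return [r for r in image_refs if classify_image_ref(r) == "floating"]


if __name__ == "__main__":
    refs = ["example/gateway:1.4.2", "example/gateway:latest", "example/gateway"]
    print(prod_violations(refs))
```

Note that a version tag is only immutable if the registry or team policy forbids re-pushing it; a digest is the only reference that is content-addressed.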
For support automation, predictable rollback and reproducible environments beat chasing latest. A bad upgrade without a snapshot often costs days — higher than renting a dedicated remote Mac for a month in SLA terms. Think of install shape as ops radius: wider radius tolerates parallel changes, rotations, and handovers; narrow radius turns every minor bump into a tightrope walk.
Many MACGPU-style teams eventually move long-running Gateway and tool-heavy workloads to rented remote Macs, keeping laptops as light CLIs and debug consoles. That separation cuts unified-memory contention with Final Cut, Xcode, and browser stacks — it complements Docker/npm rather than replacing them. A common pattern is prod Gateway on a fixed topology remote node (bare-metal launchd or a single compose stack), while laptops only handle channel pairing and skill iteration, eliminating “lid closed, automation dead” incidents by design.
8. Closing: Pick One Path for Production
(1) Limits of the paths you already chose: global installs fight system Node, permissions, and sometimes endpoint security or MDM policies; Docker needs volume and image discipline — DNS/NTP drift between host and container turns debugging exponential; source builds need lockfile and CI rigor or you get “builds on my machine” Gateway binaries. Mixing shapes without written truth multiplies drift and “which openclaw.json is authoritative?” debates.
(2) Why remote Mac often wins for 24/7: Apple Silicon, unified memory, and native creative/automation stacks align for long-running Gateway; decoupling launchd from interactive GUI sessions reduces channel jitter and surprise modal interruptions, sharpening the ops boundary.
(3) MACGPU handoff: If you want a low-friction trial of a dedicated remote Mac to prove “single install path + launchd” before cloud capex, MACGPU rents nodes and provides help entry points; the CTA below links public plans without forcing login.