1. Pain split: memory is not “more Markdown”
- Boundary drift: dumping logs, scratch notes, and stable prefs into MEMORY.md retrieves stale assumptions as facts; mixing workspace product docs into "persona memory" poisons the layer.
- Retrieval noise: naive keyword matching or coarse chunking merges similar wording with different decisions, so the model "remembers" the wrong span.
- Token bloat: system prompts, channel rubrics, tool JSON, MCP schemas, and memory spans share one budget; latency jumps often hide in prefixes, not in user-visible chat. If doctor/channels look healthy yet latency rises, audit context before swapping models (see the ladder in the silent Gateway article).
- Remote path skew: on a remote Mac Gateway, ~/.openclaw vs workspace may differ from your laptop mental model; the result is classic false amnesia after edits under the wrong user (same failure class as migration).
2. Layering: what belongs where
| Layer | Hold | Anti-patterns |
|---|---|---|
| Long-lived prefs / glossary | Stable facts, org terms, approval boundaries | Promoting one-off conclusions; no version or date |
| Project workspace docs | Versioned design, API contracts, runbooks | Secrets, cookies, webhook secrets in plaintext |
| Session / short buffer | Thread goals, open questions, tool intermediates | Unbounded growth without summarization or TTL |
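One way to make the layering table enforceable is to put layer, scope, and a last-verified date on every long-lived row. A minimal sketch (field names are illustrative, not an OpenClaw schema):

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str
    layer: str          # "prefs" | "workspace" | "session"
    scope: str          # channel or project this entry applies to
    last_verified: str  # ISO date; entries past a staleness window get re-checked, not trusted
```

Rows missing `scope` or `last_verified` are exactly the "promoted one-off conclusions" the anti-pattern column warns about.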
3. Five-step rollout
- Publish a MEMORY contract: what may be auto-written vs human-gated; each long-term entry carries scope (channel/project) and last verified date.
- Fix retrieval gates: filter channel/directory first, then vector/keyword; ban whole-library default sweeps.
- Version rolling summaries: summaries carry generation + hash; after upgrades, diff for duplicate injection.
- Narrow tool surfaces: expose only tools needed for the task—trim schema/examples prefix cost (MCP runbook).
- Align remote env: launchd sets HOME, PATH, and secret paths explicitly; after restart, run a memory read/write smoke test (onboard guide).
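The "filter first, then rank" gate from step 2 can be sketched in a few lines. This assumes spans arrive with a scope label and a precomputed similarity score from your vector or keyword backend; the threshold and top-K values are placeholders to tune:

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    channel: str  # scope metadata written at memory-commit time
    score: float  # similarity score from the retrieval backend (assumed precomputed)

def gated_retrieve(spans, channel, top_k=4, min_score=0.35):
    """Hard scope filter first, then rank; never default to a whole-library sweep."""
    scoped = [s for s in spans if s.channel == channel]          # metadata gate
    ranked = sorted(scoped, key=lambda s: s.score, reverse=True) # then similarity
    return [s for s in ranked[:top_k] if s.score >= min_score]   # drop low-score "just in case" chunks
```

The point of the ordering is that a wrong-scope span with a high score can never out-rank a right-scope span; the score only breaks ties inside the allowed scope.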
4. Citeable thresholds
Numbers you can put in a memo (re-measure on your logs):
- When tool returns + memory spans together routinely exceed ~8k tokens (tune to model window) and p95 latency spikes, trim tools or stage retrieval before adding memory rows.
- If rolling summaries inject the same conclusion three or more times per turn family, you likely lack dedupe or carry two summary generations.
- If you spend over three hours/week on "wrong memory / context explosion / upgrade amnesia", promote memory + gateway config to release gates instead of hand-editing MEMORY forever.
5. Token-bloat diagnostic ladder
| Step | Inspect | Common root cause |
|---|---|---|
| 1) Prefix profile | System prompt, channel rules, fixed disclaimers | Copy-pasted multi-channel blocks duplicated |
| 2) Tools & MCP | Per-call payload size, nested JSON | No pagination, no field projection, wide schemas |
| 3) Memory retrieval | Top-K and per-span caps | Injecting low-score chunks “to be safe” |
| 4) Session summaries | Growth vs turn count | No truncation, merge, or expiry policy |
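Step 1 of the ladder, prefix profiling, is just segmenting the context and counting. A rough sketch using a ~4 chars/token heuristic (swap in your model's real tokenizer in production):

```python
def profile_prefix(segments):
    """Estimate tokens per prefix segment and rank them largest-first.
    `segments` maps a label (system prompt, channel rules, ...) to its text."""
    est = {name: max(1, len(text) // 4) for name, text in segments.items()}
    ranked = sorted(est.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, sum(est.values())
```

Duplicated multi-channel blocks show up immediately as two near-equal entries at the top of the ranking.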
6. FAQ: self-improve, channels, remote Mac
Q: Auto-apply self-improve writes? Prefer a human gate, or split low-risk auto-apply from high-risk review; otherwise mistakes become "org memory".
Q: One memory pool for all channels? Split by compliance and noise; support vs engineering should not share one vector space without metadata filters.
Q: Paths on remote Mac? Trust the Gateway process user’s HOME, not whichever account you SSH with.
Q: Amnesia after upgrade? Diff state dir vs workspace against plist/container moves—see migration and Gateway rollback.
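For the remote-Mac path question: the reliable move is to resolve HOME from the passwd database for the Gateway's service account, not from the environment of whatever shell you SSH'd in with. A sketch (Unix-only; the account name is an assumption):

```python
import pwd

def home_of(user: str) -> str:
    """Resolve a user's HOME from the passwd database,
    not from $HOME of the shell you happen to be in."""
    return pwd.getpwnam(user).pw_dir
```

Compare `home_of("<gateway-service-user>")` against where you actually edited MEMORY; a mismatch explains most "false amnesia" reports.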
7. Depth: from chat to operations
Enterprise agents in 2026 are judged on auditable memory and predictable context. Security asks which rows are personal vs organizational, and whether they can be deleted or exported—without scope and retention in the contract, you patch by deleting files.
Engineering-wise, memory blurs with RAG: Markdown on one side, vectors on the other. A frequent failure is dual-write skew—MEMORY updated but the index not rebuilt, so retrieval pulls stale spans. Reviews should demand a single source of truth or a rebuild runbook.
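Dual-write skew is cheap to detect if the index records a hash of the Markdown it was built from. A sketch, assuming your index metadata is a dict you control:

```python
import hashlib

def source_hash(markdown_text: str) -> str:
    """Content hash of the MEMORY source the index was built from."""
    return hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()

def index_is_stale(markdown_text: str, index_meta: dict) -> bool:
    """True when MEMORY changed but the vector index was not rebuilt."""
    return index_meta.get("source_hash") != source_hash(markdown_text)
```

Run the check at Gateway startup and in the release gate; a stale index is grounds to trigger the rebuild runbook, not to serve retrieval.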
Remote Macs as 24/7 Gateway hosts add disk and backup: snapshots must cover ~/.openclaw and workspace; after restore, decide whether to rebuild memory indexes—same stability logic as remote deployment.
At the gateway, cap max memory rows and per-row bytes, and define the degradation path (on retrieval timeout, fall back to session summary only) so tail latency is explainable.
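Those caps and the timeout fallback fit in one wrapper. A sketch (limits are placeholders; `retrieve` is any callable hitting your memory backend):

```python
import concurrent.futures

def retrieve_with_fallback(retrieve, session_summary, timeout_s=0.25,
                           max_rows=6, max_bytes=2000):
    """Bound retrieval: cap rows and per-row bytes; on timeout,
    degrade to the session summary only so tail latency stays explainable."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
        future = ex.submit(retrieve)
        try:
            rows = future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return [session_summary]  # degradation path: summary only
    return [r[:max_bytes] for r in rows[:max_rows]]
```

Log which path was taken per request, so degraded answers are distinguishable from genuinely empty memory.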
8. Observability
Log per request: injected memory count and tokens, empty-hit rate, tool payload p95 by name, and summary rewrite count. If all four drift together, suspect config drift; if latency rises alone while memory counts stay stable, look at tools/MCP.
| Signal | How | Suspect |
|---|---|---|
| Memory-inject tokens | Structured per-request log | Top-K too wide, spans too long, no dedupe |
| Retrieval hit rate | Hourly golden questions | Stale index, wrong scope filter |
| Tool payload size | Percentiles by tool | No pagination, trace logs in responses |
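A structured per-request log line covering the four signals can be this small (field names are illustrative):

```python
import json

def request_log(mem_count, mem_tokens, empty_hit, tool_p95_ms, summary_rewrites):
    """One JSON line per request; stable keys make drift queries trivial."""
    return json.dumps({
        "mem_inject_count": mem_count,
        "mem_inject_tokens": mem_tokens,
        "empty_hit": empty_hit,
        "tool_payload_p95_ms": tool_p95_ms,
        "summary_rewrites": summary_rewrites,
    }, sort_keys=True)
```

Keeping the keys stable and sorted means the hourly golden-question job can diff logs across upgrades without a parser change.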
9. Evidence pack
Beyond screenshots: MEMORY contract version, retrieval parameter table, pre/post-upgrade prefix diff, failure threads with expected memory. Reviews without failure cases rarely survive week one of real traffic.
10. Close: dev laptops are forgiving; production needs predictability
(1) Limits: default memory policies accumulate noise easily; tools/MCP inflate prefixes; multi-channel setups and remote paths drift.
(2) Remote Mac upside: fixed user + plist, unified sleep/backup posture, same macOS behavior as our other OpenClaw guides.
(3) MACGPU: rentable Apple Silicon remote nodes and public help entry if you want Gateway hosting without juggling odd VPS stacks—CTA below links to plans/help without login.
11. Field note: subagents and schedules
With subagents or schedules, define parent vs branch session write ownership to avoid concurrent MEMORY corruption; offload heavy retrieval to workers and keep Gateway orchestration with a narrow tool surface. Pair with our webhook/unattended articles for trigger design.
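Write ownership can be enforced mechanically with an advisory file lock, so a parent and a branch session never interleave writes to the same MEMORY file. A Unix-only sketch:

```python
import fcntl
import os

def exclusive_write(path: str, text: str) -> None:
    """Serialize MEMORY writes: only one session (parent or branch)
    holds the exclusive lock at a time; others block until it is released."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another writer holds the lock
        os.ftruncate(fd, 0)              # replace, never append blindly
        os.write(fd, text.encode("utf-8"))
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

This only protects cooperating processes on one host; if subagents run on separate workers, put the write behind the Gateway instead of sharing the file.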