1. Pain split: memory is not “more Markdown”
- Boundary drift: dumping logs, scratch notes, and stable prefs into MEMORY.md retrieves stale assumptions as facts; mixing workspace product docs into "persona memory" poisons the layer.
- Retrieval noise: naive keyword matching or coarse chunking merges similar wording with different decisions, so the model "remembers" the wrong span.
- Token bloat: system prompts, channel rubrics, tool JSON, MCP schemas, and memory spans share one budget; latency jumps often hide in prefixes, not in user-visible chat. If doctor/channels look healthy yet latency rises, audit context before swapping models (see the ladder in the silent Gateway article).
- Remote path skew: on a remote Mac Gateway, ~/.openclaw vs workspace may differ from your laptop mental model; the result is classic false amnesia after edits under the wrong user (same failure class as migration).
2. Layering: what belongs where
| Layer | Hold | Anti-patterns |
|---|---|---|
| Long-lived prefs / glossary | Stable facts, org terms, approval boundaries | Promoting one-off conclusions; no version or date |
| Project workspace docs | Versioned design, API contracts, runbooks | Secrets, cookies, webhook secrets in plaintext |
| Session / short buffer | Thread goals, open questions, tool intermediates | Unbounded growth without summarization or TTL |
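One way to make the layering table enforceable is to put layer, scope, and a last-verified date on every long-lived row. A minimal sketch (field names are illustrative, not an OpenClaw schema):

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    text: str
    layer: str          # "prefs" | "workspace" | "session"
    scope: str          # channel or project this entry applies to
    last_verified: str  # ISO date; entries past a staleness window get re-checked, not trusted
```

Rows missing `scope` or `last_verified` are exactly the "promoted one-off conclusions" the anti-pattern column warns about.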
3. Five-step rollout
- Publish a MEMORY contract: what may be auto-written vs human-gated; each long-term entry carries scope (channel/project) and last verified date.
- Fix retrieval gates: filter channel/directory first, then vector/keyword; ban whole-library default sweeps.
- Version rolling summaries: summaries carry generation + hash; after upgrades, diff for duplicate injection.
- Narrow tool surfaces: expose only tools needed for the task—trim schema/examples prefix cost (MCP runbook).
- Align remote env: launchd sets HOME, PATH, and secret paths explicitly; after restart, run a memory read/write smoke test (onboard guide).
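The "filter first, then rank" gate from step 2 can be sketched in a few lines. This assumes spans arrive with a scope label and a precomputed similarity score from your vector or keyword backend; the threshold and top-K values are placeholders to tune:

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    channel: str  # scope metadata written at memory-commit time
    score: float  # similarity score from the retrieval backend (assumed precomputed)

def gated_retrieve(spans, channel, top_k=4, min_score=0.35):
    """Hard scope filter first, then rank; never default to a whole-library sweep."""
    scoped = [s for s in spans if s.channel == channel]          # metadata gate
    ranked = sorted(scoped, key=lambda s: s.score, reverse=True) # then similarity
    return [s for s in ranked[:top_k] if s.score >= min_score]   # drop low-score "just in case" chunks
```

The point of the ordering is that a wrong-scope span with a high score can never out-rank a right-scope span; the score only breaks ties inside the allowed scope.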
4. Citeable thresholds
Numbers you can put in a memo (re-measure on your logs):
- When tool returns + memory spans together routinely exceed ~8k tokens (tune to model window) and p95 latency spikes, trim tools or stage retrieval before adding memory rows.
- If rolling summaries inject the same conclusion three or more times per turn family, you likely lack dedupe or carry two summary generations.
- If you spend over three hours/week on "wrong memory / context explosion / upgrade amnesia", promote memory + gateway config to release gates instead of hand-editing MEMORY forever.
5. Token-bloat diagnostic ladder
| Step | Inspect | Common root cause |
|---|---|---|
| 1) Prefix profile | System prompt, channel rules, fixed disclaimers | Copy-pasted multi-channel blocks duplicated |
| 2) Tools & MCP | Per-call payload size, nested JSON | No pagination, no field projection, wide schemas |
| 3) Memory retrieval | Top-K and per-span caps | Injecting low-score chunks “to be safe” |
| 4) Session summaries | Growth vs turn count | No truncation, merge, or expiry policy |
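Step 1 of the ladder, prefix profiling, is just segmenting the context and counting. A rough sketch using a ~4 chars/token heuristic (swap in your model's real tokenizer in production):

```python
def profile_prefix(segments):
    """Estimate tokens per prefix segment and rank them largest-first.
    `segments` maps a label (system prompt, channel rules, ...) to its text."""
    est = {name: max(1, len(text) // 4) for name, text in segments.items()}
    ranked = sorted(est.items(), key=lambda kv: kv[1], reverse=True)
    return ranked, sum(est.values())
```

Duplicated multi-channel blocks show up immediately as two near-equal entries at the top of the ranking.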
6. FAQ: self-improve, channels, remote Mac
Q: Auto-apply self-improve writes? Prefer a human gate, or split low-risk auto-apply from high-risk review; otherwise mistakes become "org memory".
Q: One memory pool for all channels? Split by compliance and noise; support vs engineering should not share one vector space without metadata filters.
Q: Paths on remote Mac? Trust the Gateway process user’s HOME, not whichever account you SSH with.
Q: Amnesia after upgrade? Diff state dir vs workspace against plist/container moves—see migration and Gateway rollback.
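For the remote-Mac path question: the reliable move is to resolve HOME from the passwd database for the Gateway's service account, not from the environment of whatever shell you SSH'd in with. A sketch (Unix-only; the account name is an assumption):

```python
import pwd

def home_of(user: str) -> str:
    """Resolve a user's HOME from the passwd database,
    not from $HOME of the shell you happen to be in."""
    return pwd.getpwnam(user).pw_dir
```

Compare `home_of("<gateway-service-user>")` against where you actually edited MEMORY; a mismatch explains most "false amnesia" reports.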
7. Depth: from chat to operations
Enterprise agents in 2026 are judged on auditable memory and predictable context. Security asks which rows are personal vs organizational, and whether they can be deleted or exported—without scope and retention in the contract, you patch by deleting files.
Engineering-wise, memory blurs with RAG: Markdown on one side, vectors on the other. A frequent failure is dual-write skew—MEMORY updated but the index not rebuilt, so retrieval pulls stale spans. Reviews should demand a single source of truth or a rebuild runbook.
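Dual-write skew is cheap to detect if the index records a hash of the Markdown it was built from. A sketch, assuming your index metadata is a dict you control:

```python
import hashlib

def source_hash(markdown_text: str) -> str:
    """Content hash of the MEMORY source the index was built from."""
    return hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()

def index_is_stale(markdown_text: str, index_meta: dict) -> bool:
    """True when MEMORY changed but the vector index was not rebuilt."""
    return index_meta.get("source_hash") != source_hash(markdown_text)
```

Run the check at Gateway startup and in the release gate; a stale index is grounds to trigger the rebuild runbook, not to serve retrieval.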
Remote Macs as 24/7 Gateway hosts add disk and backup: snapshots must cover ~/.openclaw and workspace; after restore, decide whether to rebuild memory indexes—same stability logic as remote deployment.
At the gateway, cap max memory rows and per-row bytes, and define the degradation path (on retrieval timeout, fall back to session summary only) so tail latency is explainable.
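Those caps and the timeout fallback fit in one wrapper. A sketch (limits are placeholders; `retrieve` is any callable hitting your memory backend):

```python
import concurrent.futures

def retrieve_with_fallback(retrieve, session_summary, timeout_s=0.25,
                           max_rows=6, max_bytes=2000):
    """Bound retrieval: cap rows and per-row bytes; on timeout,
    degrade to the session summary only so tail latency stays explainable."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as ex:
        future = ex.submit(retrieve)
        try:
            rows = future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return [session_summary]  # degradation path: summary only
    return [r[:max_bytes] for r in rows[:max_rows]]
```

Log which path was taken per request, so degraded answers are distinguishable from genuinely empty memory.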
8. Observability
Log per request: injected memory count and tokens, empty-hit rate, tool payload p95 by name, and summary rewrite count. If all four drift together, suspect config drift; if latency rises alone while memory counts stay stable, look at tools/MCP.
| Signal | How | Suspect |
|---|---|---|
| Memory-inject tokens | Structured per-request log | Top-K too wide, spans too long, no dedupe |
| Retrieval hit rate | Hourly golden questions | Stale index, wrong scope filter |
| Tool payload size | Percentiles by tool | No pagination, trace logs in responses |
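A structured per-request log line covering the four signals can be this small (field names are illustrative):

```python
import json

def request_log(mem_count, mem_tokens, empty_hit, tool_p95_ms, summary_rewrites):
    """One JSON line per request; stable keys make drift queries trivial."""
    return json.dumps({
        "mem_inject_count": mem_count,
        "mem_inject_tokens": mem_tokens,
        "empty_hit": empty_hit,
        "tool_payload_p95_ms": tool_p95_ms,
        "summary_rewrites": summary_rewrites,
    }, sort_keys=True)
```

Keeping the keys stable and sorted means the hourly golden-question job can diff logs across upgrades without a parser change.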
9. Evidence pack
Beyond screenshots: MEMORY contract version, retrieval parameter table, pre/post-upgrade prefix diff, failure threads with expected memory. Reviews without failure cases rarely survive week one of real traffic.
10. Close: dev laptops are forgiving; production needs predictability
(1) Limits: default memory policies accumulate noise easily; tools/MCP inflate prefixes; multi-channel setups and remote paths drift.
(2) Remote Mac upside: fixed user + plist, unified sleep/backup posture, same macOS behavior as our other OpenClaw guides.
(3) MACGPU: rentable Apple Silicon remote nodes and public help entry if you want Gateway hosting without juggling odd VPS stacks—CTA below links to plans/help without login.
11. Field note: subagents and schedules
With subagents or schedules, define parent vs branch session write ownership to avoid concurrent MEMORY corruption; offload heavy retrieval to workers and keep Gateway orchestration with a narrow tool surface. Pair with our webhook/unattended articles for trigger design.
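Write ownership can be enforced mechanically with an advisory file lock, so a parent and a branch session never interleave writes to the same MEMORY file. A Unix-only sketch:

```python
import fcntl
import os

def exclusive_write(path: str, text: str) -> None:
    """Serialize MEMORY writes: only one session (parent or branch)
    holds the exclusive lock at a time; others block until it is released."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another writer holds the lock
        os.ftruncate(fd, 0)              # replace, never append blindly
        os.write(fd, text.encode("utf-8"))
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

This only protects cooperating processes on one host; if subagents run on separate workers, put the write behind the Gateway instead of sharing the file.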