OpenClaw Multi-Channel JSONL Bootstrap

Multi-channel gateway operations center

Running Telegram, enterprise IM, and bots behind one OpenClaw Gateway sounds efficient until every channel goes mute while HTTP still returns 200. The 2026.4.x line adds structured RPC such as channels.start so operators can boot named channel accounts. Yet if ~/.openclaw/agents/main/sessions/*.jsonl balloons to tens of megabytes from cron jobs plus long-running agents, Gateway may log Bootstrap: Session Queue Acquired QueueKey=agent:main:main and then stall while parsing, pinning the Node event loop: CPU is pegged, no fresh traces appear, and handlers starve. Separately, historical announce-queue defects looked similar when multiple channels existed, so learn to distinguish JSONL bootstrap stalls from delivery-path bugs. Cross-read the companion runbooks: WebSocket handshake / Ed25519, sessions + OAuth + cron jsonl recovery, and Gateway systemd/launchd hardening.

1. Pain breakdown

1) RPC success ≠ healthy loop: channels.start may return started:true while the supervisor thread still starves.
2) Bootstrap parsing is a synchronous hot path: oversized JSONL blocks scheduling even though sockets look fine.
3) Multi-channel amplifies symptom drift: announce-queue misroutes resemble mute channels but need a different fix.
4) Remote launchd drift: when the plist HOME differs from the interactive shell's HOME, you rotate the wrong file.
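
Point 2 can be illustrated with a small analogy. The sketch below uses Python's asyncio rather than Node, so it is not Gateway code; it only shows the general mechanism by which one synchronous call on a single-threaded event loop prevents every other handler from running. The names heartbeat, blocking_parse, and demo are hypothetical.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.01):
    # Records a timestamp every `interval` seconds while the loop is free.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


def blocking_parse(seconds):
    # Stand-in for a synchronous parse of an oversized file (assumption:
    # the real work is CPU-bound; sleep merely holds the thread the same way).
    time.sleep(seconds)


async def demo(block_seconds=0.2):
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.05)        # loop is healthy, heartbeat ticks
    start = time.monotonic()
    blocking_parse(block_seconds)    # loop is pinned, nothing else runs
    end = time.monotonic()
    task.cancel()
    before = [t for t in ticks if t <= start]
    during = [t for t in ticks if start < t < end]
    return len(before), len(during)
```

Calling asyncio.run(demo()) returns the number of heartbeat ticks seen before and during the blocking call; the second number stays at zero, which matches the "logs freeze, sockets look fine" symptom.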

2. Symptom matrix

Signal | Suspect first | Evidence
Logs freeze after Bootstrap line | Monster jsonl | ls -lhS sessions/*.jsonl
High CPU, flat memory | Busy-loop JSON parse | Perf trace without channel ticks
Announce/subagent only | Historical queue routing bug | Diff release notes vs mute-all
Remote-only | plist/env drift | launchctl print env block

3. Five-step rescue ladder

Step 01 Freeze blast radius

Stop concurrent restarts and config edits; capture Gateway hash + OS patch level.
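
One way to capture that evidence is sketched below. It assumes "Gateway hash" means a SHA-256 digest of whichever build artifact you deploy; the source does not say which file that is, so artifact_path and the function names are hypothetical.

```python
import hashlib
import platform
import time


def file_sha256(path, chunk_size=1 << 20):
    # Streams the file so a large artifact is never loaded whole.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_snapshot(artifact_path):
    # artifact_path is an assumption: point it at the deployed Gateway build.
    return {
        "captured_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "artifact": str(artifact_path),
        "sha256": file_sha256(artifact_path),
        "os": platform.platform(),
        "os_release": platform.release(),
    }
```

Storing the returned dictionary next to the incident notes lets later steps confirm that nobody upgraded or patched the host mid-rescue.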

Step 02 Use channels.start deliberately

Pass channel plus an optional accountId; do not trust an isolated green probe while bootstrap is wedged.
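
The sketch below shows the shape of that call and why its result needs a second signal. Only the method name channels.start, the channel and accountId parameters, and the started field come from this article; the id/method/params envelope, the "telegram" identifier in the usage note, and both function names are assumptions, not the documented wire format.

```python
def build_channels_start(channel, account_id=None, request_id=1):
    # Envelope layout is hypothetical; adapt it to the Gateway's real RPC framing.
    if not channel:
        raise ValueError("channel is required")
    params = {"channel": channel}
    if account_id is not None:
        params["accountId"] = account_id   # optional, omitted when unset
    return {"id": request_id, "method": "channels.start", "params": params}


def loop_looks_healthy(rpc_result, seconds_since_last_log, max_silence=120):
    # started:true alone is not proof; also require recent log activity.
    started = bool(rpc_result.get("started"))
    return started and seconds_since_last_log <= max_silence
```

For example, build_channels_start("telegram", "ops") yields a request whose params carry both fields, while loop_looks_healthy({"started": True}, 300) is false because the log has been silent longer than the 120 s gate used later in this article.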

Step 03 Identify largest jsonl + backup

Copy out the giant file with timestamps before anything destructive.

ls -lhS ~/.openclaw/agents/main/sessions/*.jsonl | head -5
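
A scripted equivalent of the listing plus a timestamped copy might look like the following minimal sketch. The backup directory and the function names are hypothetical; the sessions path is the one used throughout this article.

```python
import shutil
import time
from pathlib import Path

SESSIONS_DIR = Path.home() / ".openclaw" / "agents" / "main" / "sessions"


def largest_jsonl(sessions_dir=SESSIONS_DIR, top=5):
    # Same intent as `ls -lhS ... | head -5`: biggest files first.
    files = [p for p in Path(sessions_dir).glob("*.jsonl") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_size, reverse=True)[:top]


def backup_with_timestamp(src, backup_dir):
    # Copies (never moves) so the original stays in place for this step.
    src = Path(src)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest = backup_dir / f"{src.stem}.{stamp}.jsonl"
    shutil.copy2(src, dest)   # copy2 also preserves the modification time
    return dest
```

Keeping the UTC stamp in the file name makes it easy to match a backup to the incident timeline when you rehearse a restore.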

Step 04 Stop → move aside → clear locks → cold start

Move the monster JSONL out of the hot path; remove stale *.lock files; restart and confirm that logs proceed past bootstrap.
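
The move-aside and lock cleanup can be sketched as below, to be run only after the Gateway process is stopped. Treating a lock as stale when it is older than five minutes is an assumption made for illustration, as are the function names and the archive directory.

```python
import shutil
import time
from pathlib import Path


def move_aside(jsonl_path, archive_dir):
    # Relocates the file so bootstrap no longer sees it.
    src = Path(jsonl_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest = archive_dir / f"{src.name}.{stamp}"
    shutil.move(str(src), str(dest))
    return dest


def remove_stale_locks(sessions_dir, min_age_seconds=300, now=None):
    # Assumption: a lock untouched for min_age_seconds has no live owner.
    now = time.time() if now is None else now
    removed = []
    for lock in Path(sessions_dir).glob("*.lock"):
        if now - lock.stat().st_mtime >= min_age_seconds:
            lock.unlink()
            removed.append(lock.name)
    return sorted(removed)
```

The age check is a guard against deleting a lock that a still-running process holds; with the Gateway confirmed stopped you could lower min_age_seconds to zero.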

Step 05 Layered validation

Run openclaw gateway status, then a single-channel smoke test, then cron. On remote Mac hosts, align the plist environment with the shell environment.
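
Checking the plist environment against the shell can be sketched as follows. EnvironmentVariables is the standard launchd plist key; which plist file holds the Gateway job is not stated here, so the caller supplies its bytes, and the function names are hypothetical.

```python
import os
import plistlib


def plist_env(plist_bytes):
    # Returns the EnvironmentVariables dictionary, or {} when the job sets none.
    data = plistlib.loads(plist_bytes)
    return dict(data.get("EnvironmentVariables", {}))


def env_drift(plist_vars, shell_vars=None, keys=("HOME", "PATH")):
    # Maps each differing key to (plist value, shell value).
    shell_vars = dict(os.environ) if shell_vars is None else shell_vars
    drift = {}
    for key in keys:
        service_value = plist_vars.get(key)
        shell_value = shell_vars.get(key)
        if service_value != shell_value:
            drift[key] = (service_value, shell_value)
    return drift
```

A None on the plist side means the job does not set that variable explicitly and inherits whatever launchd provides, which is worth confirming with launchctl print before rotating any file under a HOME-relative path.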

4. Decision matrix

Evidence | Primary | Fallback | Avoid
Bootstrap stall + file >50MB | Archive jsonl + cron budget | Split agents | Infinite restart roulette
Announce-only anomalies | Upgrade with queue fixes | Temporarily disable noisy external deliver | Disabling audit trails blindly
Remote plist only | HOME/volume alignment | Dedicated service UID | Copying identity blindly

Numeric gates: a single jsonl over 40 MB for three consecutive audits → mandatory archival policy; bootstrap silence over 120 s → P0; two silent freezes per week on a remote Mac → migrate the data volume and chart jsonl growth in Grafana.
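
The gates translate directly into checks. This is a minimal sketch: the thresholds come from the paragraph above, while the function names and the choice of 1 MB = 1024 × 1024 bytes are assumptions.

```python
MB = 1024 * 1024   # assumption: binary megabytes


def archival_required(audit_sizes_bytes, limit_mb=40, streak=3):
    # True only when the most recent `streak` audits all exceed the limit.
    recent = audit_sizes_bytes[-streak:]
    return len(recent) == streak and all(s > limit_mb * MB for s in recent)


def bootstrap_priority(silence_seconds, limit_seconds=120):
    # Returns "P0" once log silence after the Bootstrap line passes the limit.
    return "P0" if silence_seconds > limit_seconds else None


def migrate_volume(freezes_this_week, limit=2):
    # Two or more silent freezes in a week trips the migration gate.
    return freezes_this_week >= limit
```

Because archival_required looks only at the trailing streak, one audit under the limit resets the count, which matches "three audits straight".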

5. FAQ

Does moving JSONL lose chat history? You lose embedded transcript detail unless it is restored from backup. Prioritize availability, and rehearse the restore in staging.

channels.start vs gateway restart? They operate on different planes: a restart that does not relocate giant files rarely heals a bootstrap stall.

6. Case note

«OAuth looked broken — actually main.jsonl hit 80 MB; after archival Gateway woke up within seconds.»

Teams blamed provider outages until filesystem metrics proved that file growth dominated CPU; afterward they charted hourly byte deltas separately from the OAuth dashboards.
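
Hourly byte deltas of the kind described here can be derived from periodic size samples. The sketch below assumes you already collect (hour, size_bytes) pairs for a session file; both function names are hypothetical.

```python
def hourly_deltas(samples):
    # samples: iterable of (hour, size_bytes); returns (hour, growth) pairs.
    ordered = sorted(samples)
    return [(t2, s2 - s1) for (t1, s1), (t2, s2) in zip(ordered, ordered[1:])]


def hours_until(limit_bytes, current_bytes, bytes_per_hour):
    # Linear projection; None when the file is not growing.
    if bytes_per_hour <= 0:
        return None
    return max(0.0, (limit_bytes - current_bytes) / bytes_per_hour)
```

A negative delta usually marks an archival or rotation event rather than an error, so it is worth plotting rather than filtering out.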

7. Insight & close

The next SLA after RPC completeness is session storage velocity. VPS-only stacks validate quickly; when Apple toolchain debugging matters, park Gateway on a stable remote Mac with monitored disks. MACGPU hourly nodes align capex with mute-incident frequency.

Contrast: silence often means the event loop is starved by session storage, not that the model is down. Inspect JSONL before touching handshake configs; renting remote Mac capacity separates production uptime from laptop thermals.