OPENCLAW MULTI-CHANNEL JSONL BOOTSTRAP
Running Telegram, enterprise IM, and bots behind one OpenClaw Gateway sounds efficient until every channel goes mute while HTTP still returns 200. The 2026.4.x line adds structured RPC such as channels.start so operators can boot named channel accounts. Yet if ~/.openclaw/agents/main/sessions/*.jsonl balloons to tens of megabytes from cron jobs plus long-running agents, the Gateway may log Bootstrap: Session Queue Acquired QueueKey=agent:main:main and then stall while parsing. That pins the Node event loop: CPU pegged, no fresh traces, handlers starved.
Separately, historical announce-queue defects produced similar symptoms on multi-channel setups, so learn to distinguish JSONL bootstrap stalls from delivery-path bugs. Cross-read the WebSocket handshake / Ed25519 runbook, sessions + OAuth + cron JSONL recovery, and Gateway systemd/launchd hardening.
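A cheap way to tell a bootstrap stall from a quiet but healthy Gateway is to check two things after a silence window:

- whether the log's last line is still the bootstrap marker;
- how long ago the log was last written.

The following is a minimal Python sketch. It assumes you know which file your service manager writes Gateway logs to; the `log_path` argument is hypothetical. It uses file mtime as a proxy for "no fresh traces", and the 120 s default mirrors the P0 gate in the decision matrix. Only the marker string comes from this article; everything else about the log format is assumed.

```python
import time
from pathlib import Path
from typing import Optional

# Marker quoted in this article; everything else about the log format is an assumption.
BOOTSTRAP_MARKER = "Bootstrap: Session Queue Acquired"


def last_nonempty_line(path: Path) -> str:
    """Return the last non-blank line (reads the whole file; fine for a sketch)."""
    lines = [ln for ln in path.read_text(errors="replace").splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def looks_wedged_at_bootstrap(log_path: Path, silence_s: float = 120.0,
                              now: Optional[float] = None) -> bool:
    """True when the log ends on the bootstrap marker and has not been written for > silence_s."""
    now = time.time() if now is None else now
    quiet_for = now - log_path.stat().st_mtime  # mtime stands in for "last trace written"
    return BOOTSTRAP_MARKER in last_nonempty_line(log_path) and quiet_for > silence_s
```

A `True` result only says "stuck right after bootstrap". It does not prove the JSONL is the cause, which is what the symptom matrix is for.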
1. Pain breakdown
1) RPC success ≠ healthy loop: channels.start may return started:true while the channel supervisor is still starved.
2) Bootstrap parsing is a synchronous hot path: an oversized JSONL blocks scheduling even though sockets look fine.
3) Multiple channels amplify symptom drift: announce-queue misroutes resemble mute channels but need a different fix.
4) Remote launchd drift: when the plist's HOME differs from your interactive shell's, you rotate the wrong file.
2. Symptom matrix
| Signal | Suspect first | Evidence |
|---|---|---|
| Logs freeze right after the Bootstrap line | Oversized JSONL | ls -lhS sessions/*.jsonl |
| High CPU, flat memory | Busy-loop JSON parse | Perf trace without channel ticks |
| Only announce/subagent deliveries affected | Historical queue-routing bug | Check release notes; confirm not every channel is mute |
| Remote host only | plist/env drift | Environment block in launchctl print output |
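The matrix can be encoded as an ordered triage function, so on-call engineers check the most specific signal first. This is a minimal Python sketch. The `Observation` fields are hypothetical operator inputs. The priority order (bootstrap line first, remote drift last) is one reasonable reading of the table, not an official procedure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Observation:
    # Hypothetical operator inputs, one per row of the symptom matrix.
    log_ends_at_bootstrap: bool   # last log line is "Bootstrap: Session Queue Acquired ..."
    cpu_high: bool
    memory_flat: bool
    announce_only: bool           # announce/subagent paths fail, other replies still flow
    remote_only: bool             # same build behaves on a local machine


def first_suspect(o: Observation) -> Tuple[str, str]:
    """Return (suspect, next evidence to collect), checking the most specific signal first."""
    if o.log_ends_at_bootstrap:
        return "oversized JSONL", "ls -lhS sessions/*.jsonl"
    if o.cpu_high and o.memory_flat:
        return "busy-loop JSON parse", "perf trace: look for missing channel ticks"
    if o.announce_only:
        return "announce-queue routing bug", "check release notes; confirm other channels still reply"
    if o.remote_only:
        return "plist/env drift", "compare the launchctl print environment block with your shell"
    return "unknown", "widen evidence: gateway status, then a single-channel smoke test"
```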
3. Five-step rescue ladder
Step 01 Freeze blast radius
Stop concurrent restarts and config edits; record the Gateway build hash and OS patch level.
Step 02 Use channels.start deliberately
Pass the channel and, optionally, an accountId; do not trust a green isolated probe while bootstrap is wedged.
Step 03 Identify largest jsonl + backup
Copy the oversized file out, preserving its timestamps, before doing anything destructive.
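Here is Step 03 as a Python sketch, with these assumptions:

- **Sessions path:** the default from the introduction (`~/.openclaw/agents/main/sessions`). On a remote host, use the HOME your launchd job actually sees.
- **Timestamps:** `shutil.copy2` preserves the modification time.
- **Integrity:** the SHA-256 comparison catches a copy taken while something is still appending.
- **Naming:** the `<name>.<UTC stamp>.bak` scheme is an arbitrary choice.

```python
import hashlib
import shutil
import time
from pathlib import Path


def default_sessions_dir() -> Path:
    # Path from the introduction; under launchd, HOME may differ from your shell's.
    return Path.home() / ".openclaw" / "agents" / "main" / "sessions"


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):  # 1 MiB reads
            h.update(block)
    return h.hexdigest()


def largest_jsonl(sessions_dir: Path) -> Path:
    """Python equivalent of taking the first entry of `ls -lhS sessions/*.jsonl`."""
    files = list(sessions_dir.glob("*.jsonl"))
    if not files:
        raise FileNotFoundError(f"no *.jsonl files in {sessions_dir}")
    return max(files, key=lambda p: p.stat().st_size)


def backup_with_timestamp(src: Path, backup_dir: Path) -> Path:
    """Copy src to backup_dir/<name>.<UTC stamp>.bak, keeping mtime, then verify by hash."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dst = backup_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dst)  # copy2 preserves modification/access times
    if sha256_of(src) != sha256_of(dst):
        raise OSError(f"hash mismatch: {src} changed during the copy")
    return dst
```

A hash mismatch usually means cron or an agent is still writing. In that case, freeze writers first (Step 01) rather than retrying the copy in a loop.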
Step 04 Stop → move aside → clear locks → cold start
Move the oversized JSONL out of the hot path; remove stale *.lock files; cold-start and confirm the logs progress past the Bootstrap line.
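Step 04's file moves, sketched in Python, rest on these assumptions:

- The Gateway is already stopped.
- The stale `*.lock` files live in the same sessions directory. Adjust the glob if yours sit elsewhere.

`max_bytes` and `archive_dir` are hypothetical parameters. The function refuses to overwrite an earlier archive, so a second run cannot clobber a backup.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def move_aside_and_clear_locks(sessions_dir: Path, archive_dir: Path,
                               max_bytes: int) -> Dict[str, List[str]]:
    """Run ONLY with the Gateway stopped (locks cannot be judged stale otherwise).

    Moves every *.jsonl larger than max_bytes into archive_dir and deletes *.lock
    files in sessions_dir. Returns the names it touched, for the incident log.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved: List[str] = []
    removed: List[str] = []
    for p in sorted(sessions_dir.glob("*.jsonl")):
        if p.stat().st_size > max_bytes:
            dst = archive_dir / p.name
            if dst.exists():
                raise FileExistsError(f"{dst} already archived; refusing to overwrite")
            shutil.move(str(p), str(dst))
            moved.append(p.name)
    for lock in sorted(sessions_dir.glob("*.lock")):
        lock.unlink()
        removed.append(lock.name)
    return {"moved": moved, "removed_locks": removed}
```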
Step 05 Layered validation
Run openclaw gateway status, then a single-channel smoke test, then cron jobs. On remote Mac hosts, align the plist environment with your interactive shell's.
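Step 05's ordering can be expressed as gates that stop at the first failure, so a cron problem is never diagnosed while bootstrap is still broken. This is a Python sketch, and each helper carries its own assumption:

- `gateway_status_ok` shells out to `openclaw gateway status` and assumes a non-zero exit code means unhealthy.
- The smoke and cron checks are placeholders you would supply.
- `env_drift` compares the service's environment (for example, as read from `launchctl print`) with your shell's. Parsing that output is left out.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Layer = Tuple[str, Callable[[], bool]]


def first_failing_layer(layers: List[Layer]) -> Optional[str]:
    """Run checks in order; return the name of the first failing one, or None if all pass."""
    for name, check in layers:
        if not check():
            return name  # later layers are not run
    return None


def gateway_status_ok() -> bool:
    # Assumption: a non-zero exit code from `openclaw gateway status` means unhealthy.
    try:
        result = subprocess.run(["openclaw", "gateway", "status"],
                                capture_output=True, timeout=30)
        return result.returncode == 0
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False


def env_drift(service_env: Dict[str, str], shell_env: Dict[str, str],
              keys: Iterable[str] = ("HOME", "PATH")) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Keys whose values differ between the service environment and the interactive shell."""
    return {k: (service_env.get(k), shell_env.get(k))
            for k in keys if service_env.get(k) != shell_env.get(k)}


# Example wiring; the two lambdas are hypothetical placeholders for your own checks.
LAYERS: List[Layer] = [
    ("gateway status", gateway_status_ok),
    ("single-channel smoke", lambda: True),
    ("cron", lambda: True),
]
```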
4. Decision matrix
| Evidence | Primary | Fallback | Avoid |
|---|---|---|---|
| Bootstrap stall + file >50 MB | Archive the JSONL and set a cron budget | Split agents | Endless restart roulette |
| Announce-only anomalies | Upgrade to a release with the queue fixes | Temporarily disable noisy external delivery | Blindly disabling audit trails |
| Remote plist drift only | Align HOME and the data volume | Dedicated service UID | Blindly copying identity |
Numeric gates: a single JSONL above 40 MB in three consecutive audits → mandatory archival policy; bootstrap silence over 120 s → P0; two silent freezes per week on a remote Mac → migrate the data volume and chart JSONL growth in Grafana.
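Here are the numeric gates, plus the first decision-matrix row, as a Python function. The assumptions are:

- Sizes are tracked for the same file across audits.
- "MB" means 10^6 bytes; use 1024**2 if you read sizes from `ls -lh`.
- A "bootstrap stall" is the same >120 s silence used for the P0 gate.

The action strings are illustrative labels.

```python
from dataclasses import dataclass
from typing import List

MB = 1_000_000  # assumption: decimal megabytes; use 1024 ** 2 to match ls -lh


@dataclass
class Audit:
    sizes_by_audit: List[int]      # bytes of ONE jsonl file, oldest audit first
    bootstrap_silence_s: float     # seconds since the Bootstrap line with no further log output
    silent_freezes_this_week: int
    remote_mac: bool


def gate_actions(a: Audit) -> List[str]:
    actions: List[str] = []
    last3 = a.sizes_by_audit[-3:]
    stalled = a.bootstrap_silence_s > 120
    if len(last3) == 3 and all(size > 40 * MB for size in last3):
        actions.append("adopt mandatory archival policy")
    if stalled:
        actions.append("declare P0")
    # Decision-matrix row 1: bootstrap stall + file > 50 MB.
    if stalled and a.sizes_by_audit and a.sizes_by_audit[-1] > 50 * MB:
        actions.append("archive the JSONL and set a cron budget")
    if a.remote_mac and a.silent_freezes_this_week >= 2:
        actions.append("migrate the data volume and chart JSONL growth in Grafana")
    return actions
```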
5. FAQ
Does moving JSONL lose chat history? You lose the embedded transcript detail unless you restore it from backup. Prioritize availability, and rehearse the restore in staging.
channels.start vs. gateway restart? They act on different planes. A restart that leaves the giant files in place rarely heals a bootstrap stall, because the Gateway meets the same oversized JSONL on the next boot.
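Rehearsing the restore in staging can be as simple as two steps:

1. Copy a backup into a scratch sessions directory.
2. Confirm every line still parses.

A truncated final line is the usual casualty of copying a file mid-write. Below is a Python sketch with hypothetical names; it never touches the production sessions directory.

```python
import json
import shutil
from pathlib import Path
from typing import List, Tuple


def validate_jsonl(path: Path) -> Tuple[int, List[int]]:
    """Return (valid_record_count, 1-based numbers of lines that fail json.loads)."""
    ok, bad = 0, []
    with path.open("r", encoding="utf-8", errors="replace") as f:
        for n, line in enumerate(f, 1):
            if not line.strip():
                continue  # blank lines are skipped, not counted as errors
            try:
                json.loads(line)
                ok += 1
            except json.JSONDecodeError:
                bad.append(n)
    return ok, bad


def rehearse_restore(backup: Path, staging_sessions: Path, name: str) -> Tuple[int, List[int]]:
    """Copy a backup into a *staging* sessions dir under its original name, then validate it."""
    staging_sessions.mkdir(parents=True, exist_ok=True)
    dst = staging_sessions / name
    shutil.copy2(backup, dst)
    return validate_jsonl(dst)
```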
6. Case note
«OAuth looked broken — actually main.jsonl hit 80 MB; after archival Gateway woke up within seconds.»
Teams misattributed the silence to provider outages until filesystem metrics showed that JSONL growth dominated CPU; afterward they charted hourly byte deltas separately from their OAuth dashboards.
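Charting hourly byte deltas only needs periodic size samples. This is a Python sketch, assuming you sample `st_size` on a schedule (cron or your metrics agent). Each delta is credited to the hour of the later sample. An archival shows up as a negative delta rather than being hidden.

```python
import time
from pathlib import Path
from typing import Dict, List, Tuple


def sample_size(path: Path) -> Tuple[float, int]:
    """One (unix_ts, size_bytes) sample; schedule it hourly or more often."""
    return time.time(), path.stat().st_size


def hourly_byte_deltas(samples: List[Tuple[float, int]]) -> Dict[int, int]:
    """Sum size changes per clock hour; each delta is credited to the hour of the later sample."""
    deltas: Dict[int, int] = {}
    ordered = sorted(samples)
    for (_, s0), (t1, s1) in zip(ordered, ordered[1:]):
        hour = int(t1 // 3600) * 3600  # start of the UTC hour containing t1
        deltas[hour] = deltas.get(hour, 0) + (s1 - s0)
    return deltas
```

For example, samples `[(0, 0), (1800, 100), (3600, 300), (5400, 350)]` yield `{0: 100, 3600: 250}`.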
7. Insight & close
The next SLA after RPC completeness is session-storage velocity. VPS-only stacks validate quickly; when Apple toolchain debugging matters, park the Gateway on a stable remote Mac with monitored disks. MACGPU hourly nodes align cost with mute-incident frequency.
Contrast: silence often means session storage is starving the event loop, not that the model is down. Inspect JSONL sizes before touching handshake configs; renting remote Mac capacity decouples production uptime from laptop thermals.