Summary
Running AutoResearchClaw with provider: acp accumulates orphan child processes across pipeline stages. On a 32GB machine, memory hits ~27GB by Stage 12-15, causing OOM crashes. API mode (provider: anthropic) has zero process accumulation and completes cleanly.
Environment
- AutoResearchClaw v0.3.2
- acpx 0.3.0
- Ubuntu 22.04, 32GB RAM
- Tested with Claude Opus and Sonnet via ACP
Root Cause
Each acpx LLM call spawns ~10 child processes (claude-agent-acp, claude, MCP servers). When the call completes, acpx exits but children are reparented to PID 1 and never terminated. closeSession() only kills the queue-owner process, not the agent or MCP children.
Over 46+ LLM calls across 23 stages: 46 × ~10 processes × ~58MB = ~27GB of orphans.
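The estimate above can be checked with a few lines of arithmetic. This is only a sketch using the approximate figures from this report; the variable names are illustrative, and 1GB is taken as 1000MB to match the rounded total.

```javascript
// Approximate figures taken from this report.
const llmCalls = 46;         // LLM calls across the pipeline
const childrenPerCall = 10;  // ~10 child processes spawned per acpx call
const mbPerProcess = 58;     // ~58MB resident memory per orphan

const orphanCount = llmCalls * childrenPerCall;
const orphanMb = orphanCount * mbPerProcess;
const orphanGb = orphanMb / 1000; // decimal GB, matching the rounded figure

console.log(`${orphanCount} orphans, ~${orphanGb.toFixed(1)}GB`);
// prints "460 orphans, ~26.7GB"
```

At roughly 460 leaked processes, the per-process footprint matters less than the count: even much smaller children would add up to several gigabytes.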
Key Evidence
We ran 12 test configurations:
- API mode (Sonnet): 25/25 stages, 7.9G flat, 0 orphans. Clean.
- ACP mode (Opus, no fix): 15 stages then OOM. Memory climbed 9G → 25G linearly.
- ACP mode (Opus, various consumer-side fixes): Tried exec mode, PGID kill, session rotation, sidecar cleaners. None reliably prevent accumulation — processes escape cleanup due to reparenting race conditions.
Memory trajectory comparison:
API: 7.4G → 7.5G → 7.5G → 7.8G → 7.9G (flat across 25 stages)
ACP: 9.7G → 13G → 18G → 22G → crash (linear climb)
Consumer-Side Fixes We Tried (None Worked Reliably)
- acpx exec instead of persistent sessions: still leaks, because exec also spawns the full process tree
- start_new_session=True + PGID scanner: race condition, children reparent to PID 1 before the scan
- --ttl 30: irrelevant to the root cause
- Session rotation: made it worse (more sessions = more orphans)
- Sidecar cleaner (timestamp gate + idle detection): killed 0 processes because all are in the active acpx process group
Suggested Fix
The fix needs to be in acpx itself. When an exec call or session completes, acpx should kill its entire process tree:
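// Sketch: signal acpx's whole process group (negative PID) on shutdown.
// Assumes acpx is the leader of its own process group.
process.on("exit", () => {
  // "exit" listeners must be synchronous: Node abandons pending timers once
  // they return, so a setTimeout-based SIGKILL escalation would never fire here.
  // Escalation after a grace period has to run before exit, e.g. when a
  // session or exec call completes.
  try { process.kill(-process.pid, "SIGTERM"); } catch {}
});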
Related upstream: openclaw/openclaw#35886, openclaw/acpx#47 (fixed session records in #70, but not child processes).
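For illustration, here is a minimal sketch of how a parent process could tear down a child's process group with a grace period before escalating. It is not acpx's actual code: `spawnInOwnGroup`, `groupAlive`, and `killGroup` are hypothetical helper names. It assumes a POSIX system and that the children stay in the group created by `detached: true`; any child that moves to its own session or group would escape it, and this report does not establish whether the agent or MCP servers do that.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: start a command as the leader of a new process
// group, so the whole tree can be signalled through the negative PID.
function spawnInOwnGroup(command, args) {
  return spawn(command, args, { detached: true, stdio: "ignore" });
}

// Returns true while at least one process in the group still exists.
function groupAlive(pgid) {
  try {
    process.kill(-pgid, 0); // signal 0 only checks for existence
    return true;
  } catch (err) {
    return err.code === "EPERM"; // exists, but we may not signal it
  }
}

// Send SIGTERM to the group, poll for up to graceMs, then SIGKILL survivors.
async function killGroup(pgid, graceMs = 2000) {
  try { process.kill(-pgid, "SIGTERM"); } catch { return; } // group already gone
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    if (!groupAlive(pgid)) return;
    await new Promise((resolve) => setTimeout(resolve, 50));
  }
  try { process.kill(-pgid, "SIGKILL"); } catch {}
}
```

Because the escalation timer runs while the parent is still alive, it is not subject to the restriction on asynchronous work inside an "exit" listener. Reparenting to PID 1 does not change a process's group ID, so the group signal can still reach children that have already been orphaned, provided they never left the group.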
Reproduction
researchclaw run --config config.yaml --auto-approve &
watch -n5 'ps aux | grep -E "claude|hop-mcp|probe.*mcp|cass-cm" | grep -v grep | wc -l'
Process count climbs ~10 per stage and never decreases.
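To separate true orphans from children of a still-running acpx, the count can be restricted to processes whose parent is PID 1. The sketch below is an assumption-laden helper rather than part of either project: `summarizeOrphans` and `readPs` are hypothetical names, the name pattern is copied from the watch command above, and it assumes orphans reparent to PID 1 as described in this report (a subreaper would change that).

```javascript
const { execFileSync } = require("node:child_process");

// Process-name pattern copied from the watch command in this report.
const NAME_PATTERN = /claude|hop-mcp|probe.*mcp|cass-cm/;

// Read a process table with PID, parent PID, resident set size (KB), command.
function readPs() {
  return execFileSync("ps", ["-eo", "pid,ppid,rss,args"], { encoding: "utf8" });
}

// Count matching processes whose parent is PID 1 and sum their RSS.
function summarizeOrphans(psOutput, pattern = NAME_PATTERN) {
  let count = 0;
  let rssKb = 0;
  for (const line of psOutput.split("\n")) {
    // The header row and blank lines do not match and are skipped.
    const match = line.trim().match(/^(\d+)\s+(\d+)\s+(\d+)\s+(.*)$/);
    if (!match) continue;
    const [, , ppid, rss, args] = match;
    if (ppid === "1" && pattern.test(args)) {
      count += 1;
      rssKb += Number(rss);
    }
  }
  return { count, rssMb: Math.round(rssKb / 1024) };
}
```

Logging `summarizeOrphans(readPs())` every few seconds during a run gives both the orphan count and their combined resident memory, which can be compared against the memory trajectory above.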