Bug: Aegis Server OOM Crash Loop (4GB heap limit)
Environment
- Aegis version: latest develop
- Node.js: v22.22.1
- OS: Linux 6.17.0-23-generic (x64)
- Restart counter: 595 (systemd auto-restart)
- OOM frequency: 8 crashes in the last hour
Description
The Aegis server process crashes with "FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory" at ~4GB of heap. Systemd auto-restarts the service, but the leak is chronic: each new process accumulates memory again and hits the same limit.
Evidence
May 22 13:58:45 node[1435024]: [1435024:0x3bbc3000] 191663 ms: Scavenge 4066.5 (4091.1) -> 4061.1 (4093.3) MB
May 22 13:58:45 node[1435024]: FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
May 22 13:58:47 systemd: aegis.service: Main process exited, code=dumped, status=6/ABRT
May 22 13:58:53 systemd: aegis.service: Scheduled restart job, restart counter is at 595.
Impact
- P0/Critical: Cannot use Aegis for development (sessions die on OOM)
- Session data lost on each crash
- Telegram topic mappings grow (105-106 restored each restart)
- 220 total sessions tracked
Steps to Reproduce
- Run Aegis server with normal workload
- Observe heap growth over ~3 minutes
- Crash at ~4GB heap
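To put numbers on the heap growth described in the steps above, a small sampler could be loaded into the server process. This is a minimal sketch, not part of Aegis: the function names (`sampleHeap`, `growthRate`, `startHeapLogger`) and the 10-second interval are assumptions for illustration. It relies only on Node's built-in `v8.getHeapStatistics()`.

```javascript
// Minimal sketch: sample V8 heap usage so growth can be correlated with workload.
// All names here are hypothetical helpers, not existing Aegis code.
const v8 = require('node:v8');

const MB = 1024 * 1024;

// Returns current used heap and the configured heap limit, both in MB.
function sampleHeap() {
  const stats = v8.getHeapStatistics();
  return {
    usedMB: stats.used_heap_size / MB,
    limitMB: stats.heap_size_limit / MB,
  };
}

// Average growth in MB per second between the first and last sample.
function growthRate(samples) {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const seconds = (last.timeMs - first.timeMs) / 1000;
  return seconds > 0 ? (last.usedMB - first.usedMB) / seconds : 0;
}

// Logs one line per interval; keeps at most 60 samples so the sampler stays bounded.
function startHeapLogger(intervalMs = 10_000, log = console.log) {
  const samples = [];
  const timer = setInterval(() => {
    const { usedMB, limitMB } = sampleHeap();
    samples.push({ timeMs: Date.now(), usedMB });
    if (samples.length > 60) samples.shift();
    const rate = growthRate(samples).toFixed(2);
    log(`heap used=${usedMB.toFixed(1)}MB limit=${limitMB.toFixed(0)}MB rate=${rate}MB/s`);
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}
```

Calling `startHeapLogger()` once at startup would be enough; the logged rate makes it easier to tell a steady leak from a single large allocation.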
Suspected Causes
- Memory leak in ACP session tracking or state store
- Growing in-memory structures not being GC'd (session maps, transcript caches)
- Possible leak in ACP local storage JSON parsing/writing
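If one of the suspected structures (a session map or transcript cache) turns out to be unbounded, one common remedy is to cap its size. The sketch below is purely illustrative: `BoundedMap` is a hypothetical class, and nothing here is confirmed to match how Aegis stores sessions. It uses the insertion order of the built-in `Map` to evict the least recently used entry.

```javascript
// Hypothetical bounded map: evicts the least recently used entry
// once more than maxEntries keys are stored.
class BoundedMap {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  // Reading a key moves it to the most-recently-used position.
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  // Writing a key refreshes its position, then evicts the oldest key if over capacity.
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
    return this;
  }

  has(key) {
    return this.map.has(key);
  }

  get size() {
    return this.map.size;
  }
}
```

Whether eviction is acceptable depends on whether the evicted state can be reloaded from disk; that is an open question for the actual fix.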
Immediate Mitigation
- Increase Node.js heap:
NODE_OPTIONS=--max-old-space-size=8192
- Root cause: still unknown; profile memory allocation to find the leak, since a larger heap only delays the crash
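For the profiling step, one option is to capture a heap snapshot shortly before the limit is reached and inspect it in Chrome DevTools. The sketch below is an assumption about how that could be wired up, not existing Aegis code: `isNearHeapLimit` and `createSnapshotTrigger` are hypothetical names and the 0.8 threshold is arbitrary. It uses Node's built-in `v8.writeHeapSnapshot()`, which is synchronous and needs substantial extra memory while it runs. Node also provides the `--heapsnapshot-near-heap-limit` flag, which may be simpler if changing code is not an option.

```javascript
// Minimal sketch: write a single heap snapshot once heap usage nears the V8 limit.
// Helper names and the default threshold are assumptions for illustration.
const v8 = require('node:v8');

// True once used heap reaches the given fraction of the heap limit.
function isNearHeapLimit(usedBytes, limitBytes, fraction = 0.8) {
  return usedBytes >= limitBytes * fraction;
}

// Returns a check() function that writes at most one snapshot.
// check() returns the snapshot file name when it writes, otherwise null.
function createSnapshotTrigger(fraction = 0.8, write = () => v8.writeHeapSnapshot()) {
  let written = false;
  return function check() {
    const { used_heap_size, heap_size_limit } = v8.getHeapStatistics();
    if (written || !isNearHeapLimit(used_heap_size, heap_size_limit, fraction)) {
      return null;
    }
    written = true;
    return write(); // blocks the event loop while the snapshot is written
  };
}
```

A possible usage is `setInterval(createSnapshotTrigger(0.8), 5000).unref()`. Given that the heap fills in about 3 minutes, a short polling interval is needed for the check to fire before the crash.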