
Daemon unresponsive: reindexMemoryArtifacts synchronous full-scan blocks event loop with thousands of memory artifacts #512

@Ostico

Description


Signet version: 0.99.3
Environment: Linux, Node v22.22.1, Bun 1.3.12
Related issues: N/A


Summary

The daemon becomes unresponsive after startup when thousands of memory artifact files exist. Instrumented profiling (resource-monitor, branch ostico/fd-diagnostic) confirmed the root cause: importExistingMemoryFiles() does synchronous I/O on ~7,900 files in a tight loop, blocking the event loop for ~1.9s and triggering chokidar to accumulate ~7,932 persistent read-only FDs that are never released — even on watcher.close().


Observed Behavior

  • Daemon becomes unresponsive to HTTP requests including /health
  • Event loop lag of 1,856ms confirmed by resource monitor
  • ls -la /proc/<pid>/fd | wc -l shows ~8,000+ FDs, almost all pointing to .md artifact files
  • watcher.close() releases zero memory FDs (chokidar v4 leak on Bun)

Expected Behavior

  • Daemon should not watch immutable artifact files
  • Memory import should be async and batched to avoid event loop blocking
  • Daemon should remain responsive indefinitely under normal operation

Instrumented FD Progression (empirical, 2026-04-15)

Resource monitor snapshots at each daemon lifecycle point:

| Stage | Total FDs | Memory .md FDs | Sockets | RSS |
| --- | --- | --- | --- | --- |
| post-db-init | 20 | 0 | 4 | 208 MB |
| post-watcher | 20 | 0 | 4 | 215 MB |
| post-pipeline | 284 | 240 | 8 | 244 MB |
| server-ready | 376 | 320 | 9 | 249 MB |
| /health (~8s later) | 8,025 | 7,932 | 13 | 263 MB |
| cleanup-start | 8,021 | 7,932 | 9 | 249 MB |
| pre-cleanup-watcher | 8,012 | 7,932 | 6 | 248 MB |
| post-cleanup-watcher | 8,012 | 7,932 | 6 | 252 MB |

Key observation: FDs jump from 376 → 8,025 in ~8 seconds after importExistingMemoryFiles() runs. watcher.close() releases 0 of 7,932 memory FDs.


Root Cause Analysis (corrected from instrumented data)

Corrected mechanism

The original issue described inotify watches as the FD source. Instrumented profiling disproved this — only 1 inotify instance exists. The actual mechanism:

  • Chokidar v4 on Linux/Bun opens read-only FDs (lr-x) for every watched file, not inotify watches
  • These FDs accumulate as importExistingMemoryFiles() does readFileSync on each file, which triggers chokidar's watcher to open persistent handles
  • The user's FD limit is 1,048,576 (not the assumed 8,192), so FD exhaustion alone doesn't crash the daemon at current file counts
  • The event loop blocking from ~7,900 synchronous reads is the primary responsiveness killer

Vector 1: importExistingMemoryFiles() (PRIMARY)

daemon.ts onListening callback → importExistingMemoryFiles() → iterates ALL ~7,916 .md files via readdirSync + readFileSync in a synchronous loop. Blocks the event loop for ~1.9s. Each file access triggers chokidar to open a persistent read-only FD.
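For reference, the blocking pattern looks roughly like this (an illustrative reconstruction, not the actual source — the function name and directory layout are taken from the issue, the body is assumed). Every read happens synchronously on the event loop thread, so HTTP requests and timers cannot run until the whole loop finishes:

```typescript
// Illustrative reconstruction of the blocking pattern described above.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

export function importExistingMemoryFilesSync(dir: string): number {
  let count = 0;
  for (const f of readdirSync(dir)) {
    if (!f.endsWith(".md")) continue;
    // ~7,900 blocking reads in a row ≈ the observed ~1.9 s event loop stall
    readFileSync(join(dir, f), "utf8");
    count++;
  }
  return count;
}
```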

Vector 2: Chokidar FD accumulation (SECONDARY)

Chokidar watches ~/.agents/memory/, including ~7,932 immutable artifact files. The ignored callback reduces this from 7,940 → ~880 FDs (an 87% reduction when tested) but does not prevent all FD creation. Additionally, watcher.close() leaks ~7,478 of ~7,940 FDs on Bun.

Vector 3: reindexMemoryArtifacts() (TERTIARY)

memory-lineage.ts:538-591 — full DELETE + synchronous re-read of all artifact files. Called from renderMemoryProjection() on synthesis. Now instrumented with logger.time() for ongoing monitoring.


Suggested Fix

Fix 1: Exclude artifact files from chokidar watcher

In watcher-ignore.ts, add the canonical artifact filename pattern:

/^\d{4}-\d{2}-\d{2}T.*--[a-z2-7]{16}--(summary|transcript|compaction|manifest)\.md$/

Also exclude MEMORY.backup-*.md files (309 files, not useful to watch). Tested reduction: 7,940 → ~880 FDs.
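A minimal sketch of the ignore predicate, using the pattern above (the shouldIgnore name and export shape are assumptions; chokidar v4 accepts a function for its ignored option):

```typescript
// Sketch of an ignore predicate for watcher-ignore.ts (name assumed).
import { basename } from "node:path";

// Canonical immutable artifact filenames: ISO date prefix, 16-char base32 id,
// and one of the known artifact kinds.
const ARTIFACT_RE =
  /^\d{4}-\d{2}-\d{2}T.*--[a-z2-7]{16}--(summary|transcript|compaction|manifest)\.md$/;
// Backup snapshots are also not useful to watch.
const BACKUP_RE = /^MEMORY\.backup-.*\.md$/;

export function shouldIgnore(filePath: string): boolean {
  const name = basename(filePath);
  return ARTIFACT_RE.test(name) || BACKUP_RE.test(name);
}

// Usage with chokidar v4:
// chokidar.watch(memoryDir, { ignored: (p) => shouldIgnore(p) });
```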

Fix 2: Make importExistingMemoryFiles() async with batching

Replace the synchronous loop with await-based batched processing (e.g., 50 files per tick) to keep the event loop responsive during startup.
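One possible shape for the batched version (a sketch under assumptions: the per-file import call is hypothetical, and 50 is the batch size suggested above, not a measured optimum):

```typescript
// Hedged sketch: batched async import that yields between batches.
import { readdir, readFile } from "node:fs/promises";
import { join } from "node:path";

const BATCH_SIZE = 50;

export async function importExistingMemoryFiles(dir: string): Promise<number> {
  const files = (await readdir(dir)).filter((f) => f.endsWith(".md"));
  let imported = 0;
  for (let i = 0; i < files.length; i += BATCH_SIZE) {
    const batch = files.slice(i, i + BATCH_SIZE);
    await Promise.all(
      batch.map(async (f) => {
        const content = await readFile(join(dir, f), "utf8");
        // importOne(f, content); // hypothetical per-file import step
        imported++;
      }),
    );
    // Yield to the event loop between batches so /health stays responsive.
    await new Promise((r) => setImmediate(r));
  }
  return imported;
}
```

Because the reads are async, chokidar never sees a long synchronous burst, and the event loop stays free to serve requests during startup.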

Fix 3: Replace full-scan reindex with incremental approach

Replace the DELETE + full-read loop in reindexMemoryArtifacts() with an incremental strategy: query existing artifact hashes, compare against directory listing, only read/upsert new or changed files.
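Since artifacts are immutable, the diff reduces to a set comparison: upsert files the index has not seen, drop rows for files deleted from disk. A sketch, with the SQLite table stood in by a Map (the function name, return shape, and Map abstraction are all assumptions — the real code uses bun:sqlite):

```typescript
// Hedged sketch of an incremental reindex over immutable artifact files.
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

export function reindexIncremental(
  dir: string,
  index: Map<string, string>, // filename -> content hash (stand-in for the SQLite table)
): { added: number; removed: number } {
  const onDisk = new Set(readdirSync(dir).filter((f) => f.endsWith(".md")));
  let added = 0;
  let removed = 0;
  // Read and hash only files the index has not seen; immutable artifacts never change.
  for (const name of onDisk) {
    if (!index.has(name)) {
      const hash = createHash("sha256")
        .update(readFileSync(join(dir, name)))
        .digest("hex");
      index.set(name, hash);
      added++;
    }
  }
  // Drop index rows for files deleted from disk.
  for (const name of [...index.keys()]) {
    if (!onDisk.has(name)) {
      index.delete(name);
      removed++;
    }
  }
  return { added, removed };
}
```

On a steady-state run this touches zero file contents, versus the current DELETE + full re-read of every artifact.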


Impact

  • Daemon becomes unresponsive, requiring manual restart
  • All harness integrations lose memory injection
  • No graceful degradation — process alive at 0% CPU but cannot accept connections
  • Scales linearly with artifact count — gets worse over time

Diagnostic Tooling Added

Branch ostico/fd-diagnostic adds permanent resource monitoring:

  • resource-monitor.ts — FD snapshot, event loop lag detection, periodic polling
  • Lifecycle snapshots at: post-db-init, post-watcher, post-pipeline, server-ready, cleanup stages
  • /health endpoint extended with additive resources key
  • reindexMemoryArtifacts() instrumented with logger.time()
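The lag detection in resource-monitor.ts amounts to measuring how late a periodic timer fires (a minimal sketch; the function name, interval, and threshold values here are assumptions, not the actual implementation):

```typescript
// Minimal event-loop-lag monitor: a timer that should fire every intervalMs
// reports how far behind schedule it actually ran.
export function startLagMonitor(
  onLag: (lagMs: number) => void,
  intervalMs = 500,
  thresholdMs = 100,
): () => void {
  let expected = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const lag = Date.now() - expected; // how late the timer fired
    if (lag > thresholdMs) onLag(lag);
    expected = Date.now() + intervalMs;
  }, intervalMs);
  return () => clearInterval(timer); // stop function
}
```

A synchronous burst like the ~7,900 readFileSync calls shows up directly as one large lag sample (the observed 1,856 ms).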

Environment Details

  • Signet: 0.99.3
  • OS: Linux
  • Node: v22.22.1
  • Bun: 1.3.12
  • DB: SQLite (bun:sqlite)
  • Date observed: 2026-04-15
  • Profiled with: resource-monitor.ts on branch ostico/fd-diagnostic

Working on the fix.
