perf(tracing): no per-session span cap — long-lived sessions grow ses_<id>.json unbounded with O(n^2) snapshot write-amplification #903

@anandgupta42

Description

Found during the v0.8.4 release review (Chaos Gremlin persona). Deferred — needs a retention/segmentation design, >30 min.

Problem

#895 correctly stopped per-turn truncation so traces persist across turns. But snapshot() rewrites the entire spans array on every span completion, and there is no per-trace span cap. MAX_TRACES=100 bounds the number of trace files, not the size of any one file. For a deliberately long-lived session (the stated design goal) with thousands of tool calls, ses_<id>.json grows monotonically and is fully serialized on every event. Each snapshot therefore costs O(n) in the number of spans recorded so far, so the cumulative bytes written over the session grow as O(n²). The file itself is also a disk-growth concern over a multi-day session.
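
To put rough numbers on the O(n²) claim, here is a back-of-the-envelope sketch. It assumes exactly one full rewrite per span completion and ignores any debouncing the tracer already does; the function names are made up for illustration and are not from the codebase.

```python
def spans_serialized_full_rewrite(n: int) -> int:
    """Total spans serialized if every completion rewrites the whole array.

    The k-th snapshot serializes k spans, so the total is 1 + 2 + ... + n.
    """
    return n * (n + 1) // 2


def spans_serialized_append_only(n: int) -> int:
    """Total spans serialized if each completion writes only the new span."""
    return n


for n in (100, 1_000, 10_000):
    full = spans_serialized_full_rewrite(n)
    delta = spans_serialized_append_only(n)
    # Amplification = how many times each span is written on average.
    print(f"n={n}: full-rewrite={full}, append-only={delta}, amplification={full / delta:.1f}x")
# n=100: full-rewrite=5050, append-only=100, amplification=50.5x
# n=1000: full-rewrite=500500, append-only=1000, amplification=500.5x
# n=10000: full-rewrite=50005000, append-only=10000, amplification=5000.5x
```

Under these assumptions, a 10k-span session serializes roughly 50 million span records in total, versus 10k for a delta-only scheme.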

Proposal (pick one, to be designed)

  • Per-session span cap with head/tail retention (keep first N + last M, summarize the middle), or
  • Append-only segmented trace files + a manifest, so a snapshot doesn't rewrite the whole array, or
  • Debounce/coalesce snapshots more aggressively for large traces and write deltas.
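
As a starting point for discussion, the first option (head/tail retention) could look roughly like the sketch below. `HeadTailSpanBuffer`, its parameters, and the shape of the summary marker are all hypothetical and not taken from the existing tracer. It only shows that memory and snapshot size stay bounded by N + M + 1 entries.

```python
from collections import deque
from typing import Any

Span = dict[str, Any]


class HeadTailSpanBuffer:
    """Keep the first `head_max` spans and the most recent `tail_max` spans.

    Spans that fall out of the tail are counted, not stored. (Hypothetical
    sketch; a real design might aggregate durations or error counts instead.)
    """

    def __init__(self, head_max: int, tail_max: int) -> None:
        if head_max < 0 or tail_max < 1:
            raise ValueError("head_max must be >= 0 and tail_max >= 1")
        self.head_max = head_max
        self.head: list[Span] = []
        self.tail: deque[Span] = deque(maxlen=tail_max)
        self.dropped = 0

    def add(self, span: Span) -> None:
        if len(self.head) < self.head_max:
            self.head.append(span)
            return
        if len(self.tail) == self.tail.maxlen:
            # Appending to a full deque evicts the oldest tail span.
            self.dropped += 1
        self.tail.append(span)

    def snapshot(self) -> list[Span]:
        """Return at most head_max + tail_max + 1 entries."""
        middle: list[Span] = []
        if self.dropped:
            middle.append({"type": "summary", "dropped_spans": self.dropped})
        return [*self.head, *middle, *self.tail]


buf = HeadTailSpanBuffer(head_max=2, tail_max=3)
for i in range(10):
    buf.add({"id": i})
print(buf.snapshot())
```

This bounds the size of each snapshot but not the number of snapshots, so it would probably still need to be paired with debouncing to reduce per-event write cost.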

Acceptance

  • A session emitting 10k+ spans has bounded ses_<id>.json size and sub-linear per-event write cost.
  • Viewer still reconstructs the (possibly summarized) trace.
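
To make the criteria concrete for the append-only option, here is a minimal sketch of segmented writes plus a reader that reconstructs the trace from a manifest. The directory layout, file names (`manifest.json`, `segment_NNNNN.jsonl`), and class name are assumptions for illustration, not the current on-disk format.

```python
import json
import tempfile
from pathlib import Path


class SegmentedTraceWriter:
    """Append each span as one JSON line; roll to a new segment every N spans."""

    def __init__(self, root: Path, session_id: str, spans_per_segment: int = 1000) -> None:
        self.dir = root / f"ses_{session_id}"
        self.dir.mkdir(parents=True, exist_ok=True)
        self.spans_per_segment = spans_per_segment
        self.count = 0
        self.segments: list[str] = []

    def append(self, span: dict) -> None:
        if self.count % self.spans_per_segment == 0:
            # Open a new segment; the manifest is rewritten only here,
            # not on every span.
            self.segments.append(f"segment_{len(self.segments):05d}.jsonl")
            self._write_manifest()
        with open(self.dir / self.segments[-1], "a", encoding="utf-8") as f:
            f.write(json.dumps(span) + "\n")  # cost depends on this span only
        self.count += 1

    def _write_manifest(self) -> None:
        manifest = {"segments": self.segments}
        (self.dir / "manifest.json").write_text(json.dumps(manifest), encoding="utf-8")


def read_trace(session_dir: Path) -> list[dict]:
    """What a viewer might do: follow the manifest and concatenate segments."""
    manifest = json.loads((session_dir / "manifest.json").read_text(encoding="utf-8"))
    spans: list[dict] = []
    for name in manifest["segments"]:
        with open(session_dir / name, encoding="utf-8") as f:
            spans.extend(json.loads(line) for line in f if line.strip())
    return spans


with tempfile.TemporaryDirectory() as tmp:
    writer = SegmentedTraceWriter(Path(tmp), "demo", spans_per_segment=10)
    for i in range(25):
        writer.append({"id": i})
    restored = read_trace(writer.dir)
    print(len(writer.segments), len(restored))
```

In this sketch the per-event write is one line regardless of trace length, which addresses the write-cost criterion. It does not bound total disk usage on its own; that would still need a retention policy over old segments.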

Refs: #895
