RFC: Track compaction events with inferred timestamps #55

Description

@evansenter

Summary

Session compaction events ("type": "summary") exist in Claude Code JSONL logs but lack timestamps and aren't meaningfully ingested. This RFC proposes inferring timestamps from surrounding events and exposing compaction rate queries to answer questions like "how often do my sessions compact?"

Problem / Motivation

When asked "how many compactions per hour during active sessions?", we had to write ad-hoc Python scripts to:

  1. Grep for "type":"summary" in raw JSONL files
  2. Infer timing from surrounding timestamped events
  3. Calculate rates manually

This data should be queryable via MCP tools like any other session metric.

Context

  • Discovered during: User question about compaction frequency
  • Relevant files:
    • src/session_analytics/ingest.py:366-381 - Currently creates Event with datetime.now() fallback (wrong - uses ingestion time, not compaction time)
    • src/session_analytics/queries.py - Needs new compaction queries
    • src/session_analytics/server.py - Needs new MCP tools
  • Related issues/PRs: None
  • Raw data format: {"type": "summary", "summary": "...", "leafUuid": "..."}

Proposed Solution

1. Fix Timestamp Inference in Ingestion

Summary events have no timestamp. Infer from the last timestamped event before the summary:

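import json

# Track the last seen timestamp during file parsing
last_timestamp = None
for line in file:
    entry = json.loads(line)
    if entry.get("timestamp"):
        last_timestamp = entry["timestamp"]
    if entry.get("type") == "summary":
        # Use last_timestamp, not datetime.now(). It is still None when the
        # summary is the first event in the file (no prior timestamp).
        entry["timestamp"] = last_timestamp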
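
The loop above could be packaged as a small, testable helper. The sketch below is one possible shape, not the existing ingest.py code: the function name infer_summary_timestamps and the timestamp_inferred flag are hypothetical, and it assumes timestamps are kept as the raw strings found in the JSONL. A summary that appears before any timestamped entry keeps a None timestamp rather than falling back to datetime.now().

```python
import json
from typing import Iterable, Iterator


def infer_summary_timestamps(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSONL entries, giving each summary an inferred timestamp.

    Hypothetical helper: a summary inherits the timestamp of the closest
    preceding timestamped entry, or None if there is none.
    """
    last_timestamp = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if entry.get("timestamp"):
            last_timestamp = entry["timestamp"]
        elif entry.get("type") == "summary":
            entry["timestamp"] = last_timestamp
            entry["timestamp_inferred"] = True  # hypothetical marker field
        yield entry


# Made-up sample lines covering the "summary as first event" edge case.
sample = [
    '{"type": "summary", "summary": "early", "leafUuid": "a"}',
    '{"type": "user", "timestamp": "2025-01-01T10:00:00Z"}',
    '{"type": "summary", "summary": "later", "leafUuid": "b"}',
]
entries = list(infer_summary_timestamps(sample))
```

Marking inferred timestamps explicitly would let later queries distinguish them from timestamps recorded in the log, if that distinction turns out to matter.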

2. Add Compaction-Specific Fields

Store the summary text and leafUuid for drill-down:

ALTER TABLE events ADD COLUMN compaction_summary TEXT;
ALTER TABLE events ADD COLUMN leaf_uuid TEXT;
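
A migration like this is easier to re-run safely if it checks which columns already exist. The sketch below assumes the events table lives in SQLite, which this RFC does not state, and uses a hypothetical helper name add_compaction_columns. The project's real migration mechanism (the "migration v6" mentioned below) may look different.

```python
import sqlite3

# Columns proposed in this RFC.
NEW_COLUMNS = {"compaction_summary": "TEXT", "leaf_uuid": "TEXT"}


def add_compaction_columns(conn: sqlite3.Connection) -> list[str]:
    """Add any missing compaction columns to `events`; return those added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(events)")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE events ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Demonstration against an in-memory database with a stand-in schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, timestamp TEXT)")
first_run = add_compaction_columns(conn)
second_run = add_compaction_columns(conn)  # nothing left to add
columns = [row[1] for row in conn.execute("PRAGMA table_info(events)")]
```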

3. New Query Functions

def get_compaction_stats(storage, days=7, project=None) -> dict:
    """Returns compaction frequency metrics."""
    return {
        "total_compactions": int,
        "sessions_with_compactions": int,
        "avg_per_hour": float,
        "median_messages_between": float,
        "by_session": [{"session_id": str, "count": int, "rate_per_hour": float}]
    }
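
To make the metric definitions concrete, here is a rough in-memory sketch of the aggregation. It works on a plain list of event dicts instead of the storage object and omits the days and project filters. The function name compaction_stats and the event shape are hypothetical. It also assumes definitions this RFC leaves open: session duration is the span between a session's first and last timestamp, avg_per_hour divides total compactions by the summed duration of all sessions, and "messages between" counts non-summary events since the previous compaction or session start.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def compaction_stats(events: list[dict]) -> dict:
    """Aggregate compaction metrics from events listed in log order.

    Each event is a dict with "session_id", "type" and "timestamp" (datetime).
    """
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event)

    rows, gaps, total_hours = [], [], 0.0
    for session_id, session_events in by_session.items():
        times = [e["timestamp"] for e in session_events if e["timestamp"]]
        hours = (max(times) - min(times)).total_seconds() / 3600 if times else 0.0
        total_hours += hours
        count = since_last = 0
        for e in session_events:
            if e["type"] == "summary":
                count += 1
                gaps.append(since_last)  # events since previous compaction
                since_last = 0
            else:
                since_last += 1
        if count:
            rows.append({
                "session_id": session_id,
                "count": count,
                "rate_per_hour": count / hours if hours else 0.0,
            })

    total = sum(r["count"] for r in rows)
    return {
        "total_compactions": total,
        "sessions_with_compactions": len(rows),
        "avg_per_hour": total / total_hours if total_hours else 0.0,
        "median_messages_between": float(median(gaps)) if gaps else 0.0,
        "by_session": rows,
    }


def _at(minute: int) -> datetime:
    """Build a made-up timestamp on a fixed day for the sample data."""
    return datetime(2025, 1, 1, 10, minute)


events = [
    {"session_id": "s1", "type": "user", "timestamp": _at(0)},
    {"session_id": "s1", "type": "assistant", "timestamp": _at(10)},
    {"session_id": "s1", "type": "summary", "timestamp": _at(10)},
    {"session_id": "s1", "type": "user", "timestamp": _at(20)},
    {"session_id": "s1", "type": "summary", "timestamp": _at(20)},
    {"session_id": "s1", "type": "user", "timestamp": _at(30)},
    {"session_id": "s2", "type": "user", "timestamp": _at(0)},
    {"session_id": "s2", "type": "user", "timestamp": _at(30)},
]
stats = compaction_stats(events)
```

In this sample, session s1 compacts twice in half an hour and s2 never compacts, so the per-session rate and the overall average differ. Which denominator avg_per_hour should use is worth settling before implementation.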

4. New MCP Tool

@mcp.tool()
def get_compaction_stats(days: int = 7, project: str | None = None) -> dict:
    """Get compaction frequency and timing metrics."""

Assumptions

| Assumption | Confidence | Impact if Wrong |
| --- | --- | --- |
| Summary events always follow timestamped events | High | Could have null timestamps for edge cases |
| leafUuid uniquely identifies compaction point | Medium | May need to track line number as fallback |
| Compaction rate is useful without message content | High | Users might want pre/post compaction context |

Open Questions

  1. Should we track "messages since last compaction" as a running metric in sessions table?
  2. Should compaction events link to the last N events before compaction (for context)?

Actionable Requirements

| # | Requirement | Owner | Blocked By |
| --- | --- | --- | --- |
| 1 | Track last_timestamp during file parsing in ingest.py | Claude | - |
| 2 | Add compaction_summary, leaf_uuid columns (migration v6) | Claude | - |
| 3 | Implement get_compaction_stats() in queries.py | Claude | 1, 2 |
| 4 | Add MCP tool in server.py | Claude | 3 |
| 5 | Add CLI formatter in cli.py | Claude | 3 |
| 6 | Update guide.md | Claude | 4 |

Test Requirements

  • Unit: test_ingest.py - summary event timestamp inference
  • Integration: test_queries.py - compaction stats aggregation
  • Edge cases:
    • Session with no compactions
    • Summary as first event (no prior timestamp)
    • Multiple compactions in quick succession

Implementation Checklist

  • Schema migration v6 for new columns
  • Fix timestamp inference in parse_entry()
  • Add get_compaction_stats() query
  • Add MCP tool
  • Add CLI command + formatter
  • Update guide.md
  • Self-play test: can reach actionable compaction insights via MCP only?
