RFC: Add flexible event querying primitives #56

Description

@evansenter

Summary

Ad-hoc session analysis (like calculating compaction rates) currently requires writing Python scripts against raw JSONL. The API lacks flexible querying primitives that would let LLMs perform arbitrary analysis via MCP tools.

Problem / Motivation

When analyzing compaction frequency, we couldn't use MCP tools. We had to:

  1. Write Python to iterate raw JSONL files
  2. Track state across entries (last timestamp, event counts between markers)
  3. Compute custom aggregations (turns per compaction, rate per hour)

This pattern repeats for any analysis not pre-built into the API. The friction prevents exploratory analysis.
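
For reference, the manual approach described above looks roughly like the following. This is a minimal sketch, not the actual script: the `type` field name and the inline sample lines are assumptions made for illustration, and the real JSONL schema may differ.

```python
import json
from typing import Iterable


def turns_per_compaction(lines: Iterable[str]) -> list[int]:
    """Count 'user' entries seen before each 'summary' (compaction) entry."""
    counts: list[int] = []
    user_turns = 0  # state carried across entries
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        entry_type = entry.get("type")  # hypothetical field name
        if entry_type == "user":
            user_turns += 1
        elif entry_type == "summary":
            counts.append(user_turns)
            user_turns = 0
    return counts


# Hypothetical sample standing in for a raw JSONL session file.
sample = [
    '{"type": "user"}',
    '{"type": "assistant"}',
    '{"type": "user"}',
    '{"type": "summary"}',
    '{"type": "user"}',
    '{"type": "summary"}',
]
print(turns_per_compaction(sample))
```

Every new question needs another loop like this one, which is the friction the primitives below aim to remove.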

Context

Proposed Solution

1. Entry Type Filtering

Currently, get_session_events() filters by tool. Add an entry_type filter:

def get_session_events(..., entry_type: str | None = None):
    # entry_type is one of: 'user', 'assistant', 'tool_use', 'tool_result', 'summary'
    ...
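
The filter semantics could be sketched as below. This assumes events are plain dicts with an `entry_type` key; `filter_by_entry_type` is a hypothetical helper for illustration, not part of the existing API. Unknown types return an empty list rather than raising, in line with the "handle unknown types gracefully" assumption later in this RFC.

```python
from __future__ import annotations

# Entry types listed in this RFC; the real set may grow.
KNOWN_ENTRY_TYPES = ("user", "assistant", "tool_use", "tool_result", "summary")


def filter_by_entry_type(events: list[dict], entry_type: str | None = None) -> list[dict]:
    """Return events matching entry_type, or all events when it is None."""
    if entry_type is None:
        return list(events)
    # Unknown types are not an error: they simply match nothing.
    return [e for e in events if e.get("entry_type") == entry_type]
```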

2. Event Context Windows

Get N events before/after a specific event (useful for understanding what happened around compaction, errors, etc.):

def get_event_context(
    event_uuid: str,
    before: int = 5,
    after: int = 5
) -> dict:
    """Return events surrounding a specific event."""
    return {
        "target": Event,
        "before": [Event, ...],
        "after": [Event, ...]
    }
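
A minimal sketch of the windowing logic, assuming events are dicts with a `uuid` key held in an ordered in-memory list. The extra `events` parameter and the name `event_context_from_list` are hypothetical; the real tool would read from whatever store backs the API.

```python
from __future__ import annotations


def event_context_from_list(
    events: list[dict],
    event_uuid: str,
    before: int = 5,
    after: int = 5,
) -> dict:
    """Return the target event plus up to `before`/`after` neighbours."""
    index = next((i for i, e in enumerate(events) if e["uuid"] == event_uuid), None)
    if index is None:
        raise KeyError(f"no event with uuid {event_uuid!r}")
    return {
        "target": events[index],
        # Clamp at 0 so a target near the start yields a shorter window.
        "before": events[max(0, index - before):index],
        "after": events[index + 1:index + 1 + after],
    }
```

Windows are truncated at session boundaries rather than padded, so callers should not assume `before` and `after` always contain the requested number of events.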

3. Custom Grouping/Bucketing

Allow grouping by arbitrary time buckets:

def get_event_counts(
    days: int = 7,
    bucket: str = "hour",  # "minute", "hour", "day", "session"
    entry_type: str | None = None,
    tool: str | None = None
) -> list[dict]:
    """Count events by time bucket with optional filters."""
    return [{"bucket": "2026-01-06T14:00", "count": 42}, ...]
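
The bucketing itself could work roughly as follows. This sketch assumes each event is a dict with an ISO 8601 `timestamp` string and a `session_id`; both field names, and the helper name `count_by_bucket`, are hypothetical. The `days`, `entry_type`, and `tool` filters are left out to keep the example short.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime

# strftime patterns that truncate a timestamp to the start of its bucket.
_BUCKET_FORMATS = {
    "minute": "%Y-%m-%dT%H:%M",
    "hour": "%Y-%m-%dT%H:00",
    "day": "%Y-%m-%d",
}


def count_by_bucket(events: list[dict], bucket: str = "hour") -> list[dict]:
    """Count events per time bucket (or per session), sorted by bucket key."""
    if bucket != "session" and bucket not in _BUCKET_FORMATS:
        raise ValueError(f"unsupported bucket: {bucket!r}")
    counts: Counter = Counter()
    for event in events:
        if bucket == "session":
            key = event["session_id"]
        else:
            ts = datetime.fromisoformat(event["timestamp"])
            key = ts.strftime(_BUCKET_FORMATS[bucket])
        counts[key] += 1
    return [{"bucket": key, "count": counts[key]} for key in sorted(counts)]
```

Buckets with zero events are omitted here; whether the real tool should emit empty buckets is a design choice this sketch does not settle.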

4. Marker-Based Aggregation

Count events between markers (e.g., user turns between compactions):

def get_events_between_markers(
    marker_type: str = "summary",  # entry_type that defines boundaries
    count_type: str = "user",      # entry_type to count
    days: int = 7
) -> dict:
    """Count events of one type between occurrences of another type."""
    return {
        "marker_count": 50,
        "counted_events": 3500,
        "avg_per_marker": 70.0,
        "median_per_marker": 65.0,
        "distribution": [{"marker_index": 1, "count": 42}, ...]
    }
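
One way the aggregation might be computed, assuming events are dicts with an `entry_type` key in chronological order. The helper name and the `events` parameter are hypothetical. Note one interpretation baked into this sketch: events after the last marker are not attributed to any marker and are excluded from `counted_events`.

```python
from __future__ import annotations

from statistics import median


def summarize_between_markers(
    events: list[dict],
    marker_type: str = "summary",
    count_type: str = "user",
) -> dict:
    """Count `count_type` events preceding each `marker_type` event."""
    per_marker: list[int] = []
    running = 0
    for event in events:
        if event["entry_type"] == count_type:
            running += 1
        elif event["entry_type"] == marker_type:
            per_marker.append(running)
            running = 0
    total = sum(per_marker)
    return {
        "marker_count": len(per_marker),
        "counted_events": total,
        # Guard against division by zero when no markers were seen.
        "avg_per_marker": total / len(per_marker) if per_marker else 0.0,
        "median_per_marker": float(median(per_marker)) if per_marker else 0.0,
        "distribution": [
            {"marker_index": i, "count": c} for i, c in enumerate(per_marker, start=1)
        ],
    }
```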

5. Raw Event Iteration (Optional)

For truly ad-hoc analysis, expose the raw event stream with pagination:

def iterate_events(
    session_id: str | None = None,
    entry_types: list[str] | None = None,
    cursor: str | None = None,
    limit: int = 100
) -> dict:
    """Low-level event iteration for custom analysis."""
    return {
        "events": [...],
        "next_cursor": str | None
    }
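
A minimal sketch of cursor-based pagination over an in-memory list. It assumes the cursor is simply a stringified list offset, which is an illustrative choice only; a real implementation would more likely use an opaque token tied to the storage layer. The `events` parameter and the name `iterate_events_sketch` are hypothetical, and `session_id` filtering is omitted.

```python
from __future__ import annotations


def iterate_events_sketch(
    events: list[dict],
    entry_types: list[str] | None = None,
    cursor: str | None = None,
    limit: int = 100,
) -> dict:
    """Return up to `limit` matching events starting at `cursor`."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    position = int(cursor) if cursor is not None else 0
    page: list[dict] = []
    while position < len(events) and len(page) < limit:
        event = events[position]
        position += 1
        if entry_types is None or event["entry_type"] in entry_types:
            page.append(event)
    # A None cursor signals that the stream is exhausted.
    next_cursor = str(position) if position < len(events) else None
    return {"events": page, "next_cursor": next_cursor}
```

A caller keeps passing `next_cursor` back in until it is `None`. Because filtering happens after the cursor position, a page can come back with fewer than `limit` events, or even empty, without meaning the stream has ended.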

Assumptions

| Assumption | Confidence | Impact if Wrong |
| --- | --- | --- |
| LLMs can compose primitives into complex queries | High | May need higher-level convenience functions |
| Entry types are a stable set | Medium | May need to handle unknown types gracefully |
| Time bucketing covers common use cases | High | Could add custom bucket expressions later |

Open Questions

  1. Should iterate_events return full Event objects or just essential fields (to reduce payload)?
  2. Should marker-based aggregation support multiple marker types (e.g., "between compaction OR session start")?
  3. Performance concern: context windows require an efficient index on uuid. Is it worth adding one?
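
On question 3, one low-cost option to weigh is an in-memory index built once per loaded session. The sketch below assumes events are dicts with a unique `uuid` key; `build_uuid_index` is a hypothetical helper, and whether a persistent index is warranted depends on how events are actually stored.

```python
from __future__ import annotations


def build_uuid_index(events: list[dict]) -> dict[str, int]:
    """Map each event uuid to its position for O(1) context-window lookups."""
    return {event["uuid"]: position for position, event in enumerate(events)}


# Hypothetical usage: locate an event, then slice its neighbours.
events = [{"uuid": "a"}, {"uuid": "b"}, {"uuid": "c"}]
index = build_uuid_index(events)
position = index["b"]
neighbours = events[max(0, position - 1):position + 2]
```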

Actionable Requirements

| # | Requirement | Owner | Blocked By |
| --- | --- | --- | --- |
| 1 | Add entry_type filter to get_session_events() | Claude | #55 |
| 2 | Implement get_event_context() query + MCP tool | Claude | #55 |
| 3 | Implement get_event_counts() with bucketing | Claude | #55 |
| 4 | Implement get_events_between_markers() | Claude | #55 |
| 5 | (Optional) Implement iterate_events() | Claude | #55 |
| 6 | Update guide.md with new primitives | Claude | 1-4 |

Test Requirements

  • Unit: Each new query function with various filter combinations
  • Integration: Compose primitives to answer "user turns per compaction" via MCP only
  • Edge cases:
    • Empty result sets
    • Single-event sessions
    • Markers at session boundaries

Implementation Checklist

  • Add entry_type filter (low-hanging fruit)
  • Add get_event_context()
  • Add get_event_counts() with bucketing
  • Add get_events_between_markers()
  • Self-play test: replicate compaction analysis using only MCP tools
  • Update guide.md
