Summary
Ad-hoc session analysis (like calculating compaction rates) currently requires writing Python scripts against raw JSONL. The API lacks flexible querying primitives that would let LLMs perform arbitrary analysis via MCP tools.
Problem / Motivation
When analyzing compaction frequency, we couldn't use MCP tools. We had to:
- Write Python to iterate raw JSONL files
- Track state across entries (last timestamp, event counts between markers)
- Compute custom aggregations (turns per compaction, rate per hour)
This pattern repeats for any analysis not pre-built into the API. The friction prevents exploratory analysis.
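For illustration, the kind of one-off script described above might look like the sketch below. It is not the script that was actually written: the `type` and `timestamp` field names, the sample lines, and the helper name `compaction_rate_per_hour` are all assumptions made for the example.

```python
import json
from datetime import datetime

# Hypothetical JSONL content; the "type" and "timestamp" field names are assumptions.
RAW_JSONL = """\
{"type": "user", "timestamp": "2026-01-06T14:00:00"}
{"type": "assistant", "timestamp": "2026-01-06T14:01:00"}
{"type": "summary", "timestamp": "2026-01-06T14:30:00"}
{"type": "user", "timestamp": "2026-01-06T15:00:00"}
{"type": "summary", "timestamp": "2026-01-06T16:00:00"}
"""


def compaction_rate_per_hour(jsonl_text: str) -> float:
    """Count 'summary' entries and divide by the elapsed time in hours."""
    first_ts = last_ts = None
    compactions = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)
        ts = datetime.fromisoformat(entry["timestamp"])
        first_ts = first_ts or ts  # state carried across entries
        last_ts = ts
        if entry["type"] == "summary":
            compactions += 1
    if first_ts is None or last_ts == first_ts:
        return 0.0  # no elapsed time to divide by
    hours = (last_ts - first_ts).total_seconds() / 3600
    return compactions / hours


rate = compaction_rate_per_hour(RAW_JSONL)
```

Every new question needs another script of this shape, which is the friction the primitives below aim to remove.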
Context
queries.py (current query patterns), server.py (MCP tools)
Proposed Solution
1. Entry Type Filtering
Currently get_session_events() filters by tool. Add an entry_type filter:
    def get_session_events(..., entry_type: str | None = None):
        # Existing parameters are elided as "...".
        # entry_type is one of: 'user', 'assistant', 'tool_use', 'tool_result', 'summary'
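A minimal sketch of how the combined filter could behave, written against an in-memory list rather than the real storage layer. The helper name `filter_events` and the `type` and `tool` keys are assumptions for illustration, not the existing API.

```python
from __future__ import annotations

from typing import Any


def filter_events(
    events: list[dict[str, Any]],
    tool: str | None = None,
    entry_type: str | None = None,
) -> list[dict[str, Any]]:
    """Keep events that match every filter that is not None."""
    return [
        e
        for e in events
        if (tool is None or e.get("tool") == tool)
        and (entry_type is None or e.get("type") == entry_type)
    ]


# Hypothetical events; the "type" and "tool" keys are assumed field names.
events = [
    {"type": "user"},
    {"type": "tool_use", "tool": "Bash"},
    {"type": "tool_use", "tool": "Read"},
    {"type": "summary"},
]
bash_calls = filter_events(events, tool="Bash", entry_type="tool_use")
summaries = filter_events(events, entry_type="summary")
```

Treating `None` as "no constraint" keeps the new parameter backward compatible with callers that only pass `tool`.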
2. Event Context Windows
Get N events before/after a specific event (useful for understanding what happened around compaction, errors, etc.):
    def get_event_context(
        event_uuid: str,
        before: int = 5,
        after: int = 5,
    ) -> dict:
        """Return events surrounding a specific event."""
        return {
            "target": Event,
            "before": [Event, ...],
            "after": [Event, ...],
        }
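One way the window logic could work is sketched below, assuming events are available as a list ordered by time and that each carries a `uuid` key. The extra `events` parameter stands in for the storage layer and is not part of the proposed signature.

```python
from __future__ import annotations

from typing import Any


def get_event_context(
    events: list[dict[str, Any]],  # assumption: stand-in for the real event store
    event_uuid: str,
    before: int = 5,
    after: int = 5,
) -> dict[str, Any]:
    """Return the target event plus up to `before`/`after` neighbours."""
    index = next(
        (i for i, e in enumerate(events) if e["uuid"] == event_uuid), None
    )
    if index is None:
        raise KeyError(f"no event with uuid {event_uuid!r}")
    return {
        "target": events[index],
        # max() clamps the window at the start of the list
        "before": events[max(0, index - before):index],
        # slicing past the end of a list is safe in Python
        "after": events[index + 1:index + 1 + after],
    }


events = [{"uuid": f"e{i}", "type": "user"} for i in range(6)]
ctx = get_event_context(events, "e1", before=2, after=2)
```

The linear scan is what makes the uuid index raised under Open Questions relevant: with an index, the lookup would no longer need to walk the whole list.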
3. Custom Grouping/Bucketing
Allow grouping by a selectable bucket (minute, hour, day, or session):
    def get_event_counts(
        days: int = 7,
        bucket: str = "hour",  # "minute", "hour", "day", "session"
        entry_type: str | None = None,
        tool: str | None = None,
    ) -> list[dict]:
        """Count events by time bucket with optional filters."""
        return [{"bucket": "2026-01-06T14:00", "count": 42}, ...]
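The time-based buckets could be implemented by truncating each timestamp to a bucket label, as in this sketch. It assumes ISO 8601 `timestamp` strings and a `type` key, uses a hypothetical helper name `count_by_bucket`, and leaves out the `"session"` bucket, the `tool` filter, and the `days` window for brevity.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime
from typing import Any

# strftime patterns that truncate a timestamp to its bucket label
BUCKET_FORMATS = {
    "minute": "%Y-%m-%dT%H:%M",
    "hour": "%Y-%m-%dT%H:00",
    "day": "%Y-%m-%d",
}


def count_by_bucket(
    events: list[dict[str, Any]],
    bucket: str = "hour",
    entry_type: str | None = None,
) -> list[dict[str, Any]]:
    """Count events per time bucket, optionally filtered by entry type."""
    fmt = BUCKET_FORMATS[bucket]
    counts: Counter[str] = Counter()
    for e in events:
        if entry_type is not None and e["type"] != entry_type:
            continue
        label = datetime.fromisoformat(e["timestamp"]).strftime(fmt)
        counts[label] += 1
    # ISO-style labels sort chronologically when sorted as strings
    return [{"bucket": label, "count": counts[label]} for label in sorted(counts)]


events = [
    {"type": "user", "timestamp": "2026-01-06T14:05:00"},
    {"type": "user", "timestamp": "2026-01-06T14:40:00"},
    {"type": "assistant", "timestamp": "2026-01-06T15:10:00"},
    {"type": "user", "timestamp": "2026-01-06T15:20:00"},
]
hourly = count_by_bucket(events, bucket="hour")
hourly_user = count_by_bucket(events, bucket="hour", entry_type="user")
```

Buckets with no events are simply absent from the result; whether the real API should emit zero-count buckets is left open here.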
4. Marker-Based Aggregation
Count events between markers (e.g., user turns between compactions):
    def get_events_between_markers(
        marker_type: str = "summary",  # entry_type that defines boundaries
        count_type: str = "user",  # entry_type to count
        days: int = 7,
    ) -> dict:
        """Count events of one type between occurrences of another type."""
        return {
            "marker_count": 50,
            "counted_events": 3500,
            "avg_per_marker": 70.0,
            "median_per_marker": 65.0,
            "distribution": [{"marker_index": 1, "count": 42}, ...],
        }
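A sketch of the aggregation over an ordered, in-memory event list follows. The helper name `count_between_markers` and the `type` key are assumptions, and so is the boundary rule: each marker closes the segment before it, and events after the last marker are not counted.

```python
from __future__ import annotations

from statistics import median
from typing import Any


def count_between_markers(
    events: list[dict[str, Any]],
    marker_type: str = "summary",
    count_type: str = "user",
) -> dict[str, Any]:
    """Count `count_type` events in each segment that ends at a marker."""
    per_marker: list[int] = []
    running = 0
    for e in events:
        if e["type"] == marker_type:
            per_marker.append(running)  # the marker closes the current segment
            running = 0
        elif e["type"] == count_type:
            running += 1
    # Assumption: events after the final marker are left out of the totals.
    total = sum(per_marker)
    return {
        "marker_count": len(per_marker),
        "counted_events": total,
        "avg_per_marker": total / len(per_marker) if per_marker else 0.0,
        "median_per_marker": float(median(per_marker)) if per_marker else 0.0,
        "distribution": [
            {"marker_index": i, "count": c}
            for i, c in enumerate(per_marker, start=1)
        ],
    }


types = ["user", "user", "assistant", "summary", "user", "summary", "user"]
stats = count_between_markers([{"type": t} for t in types])
```

How the trailing segment and session boundaries should be treated is a design decision for the real implementation; this sketch picks the simplest rule.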
5. Raw Event Iteration (Optional)
For truly ad-hoc analysis, expose the raw event stream with pagination:
    def iterate_events(
        session_id: str | None = None,
        entry_types: list[str] | None = None,
        cursor: str | None = None,
        limit: int = 100,
    ) -> dict:
        """Low-level event iteration for custom analysis."""
        return {
            "events": [...],
            "next_cursor": str | None,
        }
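The pagination contract could be sketched as follows, assuming the cursor is simply a stringified offset into the filtered list. A real implementation may well prefer an opaque cursor; the `events` parameter and the `type` key are likewise stand-ins for illustration.

```python
from __future__ import annotations

from typing import Any


def iterate_events(
    events: list[dict[str, Any]],  # assumption: stand-in for the real event store
    entry_types: list[str] | None = None,
    cursor: str | None = None,
    limit: int = 100,
) -> dict[str, Any]:
    """Return one page of events plus a cursor for the next page."""
    start = int(cursor) if cursor is not None else 0
    matching = [
        e for e in events if entry_types is None or e["type"] in entry_types
    ]
    page = matching[start:start + limit]
    next_start = start + len(page)
    return {
        "events": page,
        # None signals that there are no further pages
        "next_cursor": str(next_start) if next_start < len(matching) else None,
    }


events = [
    {"type": "user" if i % 2 == 0 else "assistant", "seq": i} for i in range(5)
]

# A caller drains the stream by following next_cursor until it is None.
collected: list[dict[str, Any]] = []
cursor = None
while True:
    page = iterate_events(events, cursor=cursor, limit=2)
    collected.extend(page["events"])
    cursor = page["next_cursor"]
    if cursor is None:
        break
```

Returning `None` as the final cursor gives an LLM client an unambiguous stop condition without needing to compare page sizes.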
Assumptions
| Assumption | Confidence | Impact if Wrong |
|---|---|---|
| LLMs can compose primitives into complex queries | High | May need higher-level convenience functions |
| Entry types are a stable set | Medium | May need to handle unknown types gracefully |
| Time bucketing covers common use cases | High | Could add custom bucket expressions later |
Open Questions
- Should iterate_events return full Event objects or just essential fields (to reduce payload)?
- Should marker-based aggregation support multiple marker types (e.g., "between compaction OR session start")?
- Performance: context windows need an efficient index on uuid. Is it worth adding one?
Actionable Requirements
| # | Requirement | Owner | Blocked By |
|---|---|---|---|
| 1 | Add entry_type filter to get_session_events() | Claude | #55 |
| 2 | Implement get_event_context() query + MCP tool | Claude | #55 |
| 3 | Implement get_event_counts() with bucketing | Claude | #55 |
| 4 | Implement get_events_between_markers() | Claude | #55 |
| 5 | (Optional) Implement iterate_events() | Claude | #55 |
| 6 | Update guide.md with new primitives | Claude | 1-4 |
Test Requirements
- Unit: Each new query function with various filter combinations
- Integration: Compose primitives to answer "user turns per compaction" via MCP only
- Edge cases:
  - Empty result sets
  - Single-event sessions
  - Markers at session boundaries
Implementation Checklist
- entry_type filter (low-hanging fruit)
- get_event_context()
- get_event_counts() with bucketing
- get_events_between_markers()