RFC #26: Add session signals and commit tracking (revised per RFC #17) by evansenter · Pull Request #29 · evansenter/agent-session-analytics

evansenter · 2026-01-01T18:00:18Z

Summary

Implements RFC #26 "Data enrichment for deeper session analysis" with raw session signals and commit tracking capabilities.

Revised per RFC #17 principle: "Don't over-distill - raw data with light structure beats heavily processed summaries."

Key Features

Raw session signals: Surfaces observable metrics (error_rate, commit_count, has_rework, etc.) for LLM interpretation
Session-commit junction table: Links sessions to commits with timing metadata (time_to_commit_seconds, is_first_commit)
New MCP tools: get_session_signals, get_session_commits
New CLI commands: signals, session-commits with --project filter
Design Philosophy: Added "Don't over-distill" principle to CLAUDE.md

What's NOT Included (by design)

Per RFC #17, we intentionally do NOT pre-compute interpretations:

❌ outcome classification (success/abandoned/frustrated)
❌ outcome_confidence scores
❌ satisfaction_score estimates

The consuming LLM has context we lack (user intent, conversation history) and should interpret the raw signals itself.

Technical Details

Schema migration v4 adds session_commits table and context_switch_count to sessions
Enhanced correlate_git_with_sessions() to populate junction table with timing
get_session_signals() returns raw metrics without classification

Testing

197 tests passing
Added tests for new CLI commands, formatters, storage operations, and signal extraction

Test plan

Run make check (fmt, lint, 197 tests pass)
Verify schema migration v4 applies correctly
Test CLI commands: signals, session-commits
Test MCP tools work via server
Verify --project filter works on new commands

🤖 Generated with Claude Code

Implements data enrichment for deeper session analysis: - Add session outcome detection (success/abandoned/frustrated/unknown) - Add session_commits junction table with timing metadata - Add satisfaction scoring based on error rates and patterns - Add 3 new MCP tools: query_outcomes, update_outcomes, get_session_commits - Add 3 new CLI commands: outcomes, update-outcomes, session-commits - Add --project filter to new CLI commands for consistency - Track session_commits_errors in correlate_git_with_sessions return - Add failed_sessions list to update_session_outcomes return - Add DEBUG logging to get_col helper for migration debugging Closes #26 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-01T18:01:53Z

PR Review: RFC #26 Session Outcome Detection and Enrichment

Overall this is a solid implementation of session outcome tracking. The code is well-structured, follows existing patterns, and includes comprehensive tests (202 passing). Here's my detailed feedback:

✅ Strengths

Good architecture: The session-commit junction table design enables rich time-to-commit tracking without modifying the existing git_commits table
Schema migration done right: Migration v4 properly checks for existing columns before adding them, and uses CREATE TABLE IF NOT EXISTS for the junction table
Comprehensive test coverage: Tests cover CLI commands, formatters, storage operations, and pattern detection logic
Consistent patterns: Uses existing patterns like build_where_clause(), @_register_formatter, and the storage public API correctly
Defensive coding: Good use of get_col() helper in _row_to_session() to handle older schemas gracefully

🔧 Suggestions for Improvement

1. SQL Injection Risk in CLI (Medium Priority)

In cmd_session_commits() at cli.py:745-750:

project_filter = "AND s.project_path LIKE ?"
params.append(f"%{project}%")

This is actually safe (parameterized), but the f-string construction is in the WHERE clause. Consider using the existing build_where_clause() helper from queries.py for consistency.

2. Missing project filter in MCP query_outcomes (Low Priority)

The CLI outcomes command accepts --project but the MCP tool query_outcomes doesn't expose this parameter. Consider adding it for parity:

@mcp.tool()
def query_outcomes(days: int = 7, min_events: int = 5, project: str | None = None) -> dict:

3. Unused context_switch_count field

The Session dataclass and schema include context_switch_count, but it's never populated. Either:

Add TODO comment explaining it's for future implementation
Remove it if not planned for this RFC
Implement detection logic

4. Potential performance concern in update_session_outcomes

The function makes individual execute_query + execute_write calls per session (patterns.py:1010-1031). For large datasets, consider:

Using batch updates similar to add_session_commits_batch
Or at minimum, wrapping in a single transaction

5. Confidence score calculation quirk

In detect_session_outcomes() at line 944:

confidence = 1.0 - max_score  # Lower confidence when scores are low

This means when max_score = 0.0, confidence is 1.0 (very confident it's "unknown"). This seems semantically inverted - shouldn't low scores mean low confidence?

6. Missing validation for min_events parameter

If min_events <= 0, the HAVING COUNT(*) >= ? clause could behave unexpectedly. Consider adding validation:

min_events = max(1, min_events)

📝 Minor Nits

patterns.py:946 - Consider using max(scores, key=lambda k: scores[k]) instead of scores.get for type clarity
The failed_sessions list in update_session_outcomes caps at 10 entries - consider documenting this truncation in the return dict schema
cli.py:385 - The formatter predicate "commits" in d and "total_commits" in d could match unrelated data; consider adding a more specific field like "_type": "session_commits"

🔒 Security

No security concerns identified. SQL queries use parameterized statements throughout.

📊 Performance

The queries are well-indexed (indexes on session_commits.session_id and commit_sha). The main concern is the loop in update_session_outcomes mentioned above.

Summary: This PR is ready to merge with minor improvements. The implementation follows project conventions, has good test coverage, and implements a useful feature. The suggestions above are mostly enhancements rather than blockers.

Per RFC #17: "Raw data with light structure beats heavily processed summaries. The LLM can handle context." This refactor removes pre-computed interpretation fields and surfaces raw signals instead: Removed: - outcome column (success/abandoned/frustrated/unknown) - outcome_confidence column - satisfaction_score column - detect_session_outcomes() → replaced with get_session_signals() - update_session_outcomes() → removed entirely - query_outcomes MCP tool → replaced with get_session_signals - update_outcomes MCP tool → removed - outcomes CLI command → replaced with signals Added: - get_session_signals(): Returns raw metrics for LLM interpretation (error_count, error_rate, edit_count, commit_count, has_rework, has_pr_activity, duration_minutes) - Design Philosophy section in CLAUDE.md documenting the principle Kept (observable data): - session_commits junction table with timing data - context_switch_count field - get_session_commits MCP tool and CLI command 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

evansenter · 2026-01-01T18:27:06Z

Refactored: Applied RFC #17 'Don't Over-Distill' Principle

This PR has been refactored to align with the design principle from RFC #17:

"Don't over-distill": Raw data with light structure beats heavily processed summaries. The LLM can handle context.

What Changed

Removed (pre-computed interpretation):

outcome column (success/abandoned/frustrated/unknown)
outcome_confidence column
satisfaction_score column
detect_session_outcomes() function
update_session_outcomes() function
query_outcomes and update_outcomes MCP tools
outcomes and update-outcomes CLI commands

Added (raw signals):

get_session_signals(): Returns raw metrics for LLM interpretation
get_session_signals MCP tool
signals CLI command
Design Philosophy section in CLAUDE.md

Kept (observable data):

session_commits junction table with timing data
context_switch_count field
get_session_commits MCP tool and CLI command

Raw Signal Output Example

{
    "event_count": 50,
    "error_count": 2,
    "error_rate": 0.04,
    "edit_count": 10,
    "git_count": 5,
    "commit_count": 2,
    "has_rework": False,
    "has_pr_activity": True,
    "duration_minutes": 45.0
}

The consuming LLM can interpret these signals with its fuller context rather than us pre-computing labels.

🤖 Generated with Claude Code

claude · 2026-01-01T18:29:12Z

PR Review: RFC #26 - Session Outcome Detection and Enrichment

Thanks for this well-structured PR implementing session outcome tracking! The implementation follows the project's design philosophy nicely. Here's my detailed review:

✅ Strengths

1. Excellent adherence to RFC #17 "Don't over-distill" principle
The PR correctly provides raw signals (error_count, error_rate, has_rework, commit_count) instead of pre-computed interpretations like "frustrated" or "successful". This is exactly the right approach per the project's design philosophy.

2. Clean schema migration
The migration v4 in storage.py:250-290 is well-structured with proper column checks and index creation. The junction table session_commits is a good design choice for the many-to-many relationship between sessions and commits.

3. Good test coverage
202 tests passing with new tests for:

CLI commands and formatters (test_cli.py:392-560)
Storage operations (test_storage.py:693-775)
Pattern detection (test_patterns.py:1097-1240)

4. Consistent API design
The new MCP tools (get_session_signals, get_session_commits) and CLI commands follow the existing patterns in the codebase.

⚠️ Issues to Address

1. SQL Injection Risk in patterns.py:807-824

The project_filter is built using an f-string and directly interpolated into the SQL query:

project_filter = ""
if project:
    project_filter = "AND project_path LIKE ?"
# ...
sessions = storage.execute_query(
    f"""
    SELECT ... WHERE timestamp >= ?
    {project_filter}  # f-string interpolation
    GROUP BY session_id
    HAVING COUNT(*) >= ?
    """,
    tuple(params),
)

While the parameter value is properly parameterized, the pattern of building SQL strings with f-strings can be error-prone. This specific instance is safe since project_filter is a hardcoded string, but it sets a risky precedent. Consider using a query builder or moving the conditional logic into the SQL itself:

WHERE timestamp >= ?
  AND (? IS NULL OR project_path LIKE ?)

2. Potential N+1 Query Pattern in get_session_signals

The function makes 4 separate queries (patterns.py:807-870):

Session summaries
Commit counts per session
Rework detection (file edits)
PR activity detection

For large datasets, this could be optimized by joining these queries or using CTEs. Not critical for current scale but worth noting for future optimization.

3. Missing datetime timezone handling

In patterns.py:796 and ingest.py:613:

cutoff = datetime.now() - timedelta(days=days)

This uses naive datetimes. If session logs contain timezone-aware timestamps, comparisons could be inconsistent. Consider using datetime.now(tz=timezone.utc) or documenting the timezone assumption.

4. CLI cmd_session_commits duplicates query logic

In cli.py:700-735, the cmd_session_commits function contains a raw SQL query that duplicates logic from storage.get_commits_for_sessions(). Consider refactoring to use the existing storage method with an optional date filter parameter instead of inline SQL.

💡 Suggestions (Non-blocking)

1. Consider adding session commit count validation
When adding session commits in ingest.py:718-727, there's no validation that the session exists. With foreign key constraints, this would fail at the DB level, but an explicit check might provide better error messages.

2. Documentation update for CLAUDE.md is good
The PR description in CLAUDE.md properly documents the new tools and design philosophy section. Well done.

3. Consider adding --project filter to MCP tools
The CLI signals command supports --project filter, but the MCP get_session_signals tool doesn't expose this parameter. For consistency, consider adding it to the MCP tool as well.

📊 Summary

Aspect	Rating
Code Quality	⭐⭐⭐⭐
Test Coverage	⭐⭐⭐⭐⭐
Security	⭐⭐⭐⭐ (minor SQL pattern concern)
Performance	⭐⭐⭐⭐ (acceptable, room for optimization)
Documentation	⭐⭐⭐⭐⭐

Recommendation: Approve with minor suggestions. The core implementation is solid and follows project conventions well.

API improvements for LLM data analysis workflows: 1. Add session_id filter to query_timeline and get_user_journey - Allows drilling into a specific session's full trace - Enables: discover sessions → get signals → drill into session 2. Change min_events default from 5 to 1 in get_session_signals - 36% of sessions were being filtered out (premature optimization) - Per RFC #17: let the LLM decide what's meaningful based on event_count - Users can still pass --min-events=5 if they want filtering 3. Update CLAUDE.md with Session Discovery and Drill-In Flow docs New CLI option: session-analytics-cli journey --session-id <id> New MCP parameters: query_timeline(session_id=...) get_user_journey(session_id=...) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

evansenter · 2026-01-01T18:53:00Z

Additional Improvements: Session Discovery and Drill-In Flow

Based on API gap analysis, added improvements for LLM data analysis workflows:

1. Session ID Filters

Added session_id parameter to query_timeline and get_user_journey

Enables the discovery → signals → drill-in workflow:

1. query_sessions() → list all session IDs
2. get_session_signals() → raw metrics per session
3. query_timeline(session_id=<id>) → full event trace
4. get_user_journey(session_id=<id>) → all user messages

2. Fixed min_events Default

Changed from 5 to 1 (36% of sessions were being excluded!)
Per RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17 "don't over-distill": let the LLM decide what's meaningful based on event_count

3. Documentation

Added "Session Discovery and Drill-In Flow" section to CLAUDE.md
Documented all MCP tools with their filtering capabilities

Tests: All 200 passing.

claude · 2026-01-01T18:54:26Z

PR Review: RFC #26 - Session Signals and Commit Tracking

Overall, this is a well-structured PR that follows the project's design philosophy. The adherence to RFC #17 ("Don't over-distill") is evident throughout. Here's my detailed review:

✅ Strengths

1. Design Philosophy Alignment

The explicit decision to return raw signals (error_count, commit_count, has_rework) rather than pre-computed interpretations like "outcome" or "satisfaction_score" is well-documented and correctly implemented.
Good documentation in CLAUDE.md explaining the rationale.

2. Schema Migration

Migration v4 in storage.py:253-291 is clean and follows the established pattern using the @migration decorator.
Proper use of IF NOT EXISTS for idempotency.
Foreign keys reference the correct parent tables.

3. Test Coverage

Comprehensive tests for the new functionality (197 tests passing).
Tests explicitly verify that interpretation fields are NOT present (tests/test_patterns.py:1207-1259).
Good edge case coverage: empty database, min_events filter, error rates.

4. API Design

The session discovery flow documented in CLAUDE.md is intuitive: query_sessions() → get_session_signals() → drill-in tools.
The session_id filter added to query_timeline and get_user_journey enables the drill-in workflow.

⚠️ Suggestions for Improvement

1. Potential Performance Concern in get_session_signals()
Location: src/session_analytics/patterns.py:807-870

The function executes 4 separate queries:

Session summaries (main aggregation)
Commit counts per session
Rework pattern detection
PR-related activity

For large datasets with many sessions, this could be slow. Consider:

Adding an index on (session_id, file_path) for the rework detection query if not already present
Documenting this as a known limitation for very large datasets

2. Missing Project Filter in Commit Counts Query
Location: src/session_analytics/patterns.py:829-837

The commit counts query doesn't apply the project filter, which means sessions from other projects could show inflated commit counts. This is minor since the main session query filters by project, but could cause confusion if a session spans multiple projects.

3. Time Handling Edge Case
Location: src/session_analytics/ingest.py:678-680

Clamping negative time_to_commit values to 0 is reasonable, but it might be worth logging when this happens (commits before session start are unusual and might indicate a timing issue).

4. Minor: Inconsistent Return Key Naming
Location: src/session_analytics/server.py:589-595 vs src/session_analytics/cli.py:709-725

Server's get_session_commits() returns commit_count when filtering by session_id
CLI's cmd_session_commits() builds a result with total_commits

While not a bug, consistency would help API consumers.

🔍 Code Quality Notes

Good Practices Observed:

Proper type hints throughout (list[tuple[str, str, int | None, bool]])
Docstrings explain the RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17/RFC RFC: Data enrichment for deeper session analysis #26 context
Formatter registry pattern used correctly for new CLI output formats
Batch operations used for efficiency (add_session_commits_batch)

✅ Security

No security concerns identified. The code properly uses parameterized queries and doesn't introduce any new attack surfaces.

Summary

Verdict: Approve with minor suggestions

This PR is well-implemented and ready for merge. The suggestions above are minor improvements rather than blockers. The code follows the project's conventions, maintains good test coverage, and correctly implements the RFC #17 principle of surfacing raw data for LLM interpretation.

The addition of the "Session Discovery and Drill-In Flow" documentation in CLAUDE.md is particularly helpful for users of the API.

The discovery flow (query_sessions → get_session_signals → drill-in) was documented in CLAUDE.md but not in guide.md, which is exposed as the MCP resource `session-analytics://guide` for LLM callers. - Add "Session Discovery and Drill-In" section to guide.md with examples - Add maintainer note to CLAUDE.md to keep both doc surfaces in sync 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-01T19:00:58Z

PR Review: RFC #26 - Session Signals and Commit Tracking

This is a well-structured implementation that follows the project's design philosophy. The revision per RFC #17 ("Don't over-distill") is a sound architectural decision.

Strengths

Architecture & Design

Excellent adherence to RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17's principle of surfacing raw data over pre-computed interpretations
Clean separation between observable signals (error_rate, has_rework) and interpretation (left to consuming LLM)
The session-commit junction table with timing metadata (time_to_commit_seconds, is_first_commit) provides valuable correlation data
Good use of the existing migration pattern with @migration(4, "add_session_enrichment")

Code Quality

Well-documented functions with clear docstrings explaining RFC context
Consistent pattern usage matching existing codebase (formatter registry, build_where_clause helper)
The get_session_signals() implementation in patterns.py:772-922 is clean and efficient with sensible query batching

Test Coverage

Comprehensive test coverage (197 tests passing per PR description)
Tests explicitly verify the absence of interpretation fields (outcome, confidence), which is a great way to enforce the design principle
Good edge case coverage (empty database, min_events filter, all raw fields present)

Minor Suggestions

Query efficiency in get_session_signals() (patterns.py:828-836)
The commit counts query fetches all commits without the days filter. Consider joining with the sessions subquery or adding a date filter to avoid scanning old data. For now this is fine given typical usage patterns, but worth noting for future optimization.
Rework threshold is hardcoded (patterns.py:849)
The rework detection uses a magic number 4 for "file edited 4+ times". This works but could benefit from being a named constant or configurable parameter in the future.
Inconsistent default for min_events
- MCP tool get_session_signals in server.py uses min_events: int = 5
- CLI and patterns function use min_events: int = 1
The commit message mentions changing the default from 5 to 1, but the server.py still has 5. This appears intentional based on the commit history (Patch 3/4 mentions this), but the inconsistency between MCP and CLI defaults could confuse users.
Documentation sync (CLAUDE.md and guide.md)
The maintainer note in CLAUDE.md correctly calls out keeping both files in sync. The guide.md update looks complete and consistent.

Security

No security concerns identified
SQL queries use parameterized queries throughout
No external input is directly executed

Performance

Query patterns look reasonable for typical workloads
The batch operations for session_commits (add_session_commits_batch) handle bulk inserts efficiently
Index creation on session_commits table is appropriate

Summary

Recommendation: Approve

This PR is well-implemented and follows the project's established patterns. The revision to use raw signals instead of pre-computed outcomes is the right call given the stated design philosophy. The code is clean, well-tested, and properly documented.

The minor suggestions above are non-blocking improvements that could be addressed in follow-up work if desired.

- Add `make restart` to reload LaunchAgent after code changes - Add "When to restart" table to CLAUDE.md clarifying which changes need a restart (MCP tools, queries, storage) vs which don't (CLI, tests, docs) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-01T19:04:24Z

Code Review: RFC #26 Session Signals and Commit Tracking

Overall, this is a well-structured PR that implements session signals and commit tracking while adhering to the RFC #17 "Don't over-distill" design philosophy. The implementation is clean, well-tested (197 tests), and the documentation updates are thorough.

✅ Strengths

Excellent adherence to RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17 design principle: The deliberate decision to return raw signals (error_rate, commit_count, has_rework) instead of pre-computed interpretations (outcome: "frustrated") is the right approach. The consuming LLM has context that this analytics layer lacks.
Comprehensive test coverage: Tests cover the new signals functionality, CLI formatters, storage operations, and edge cases like empty databases and min_events filtering.
Well-documented session discovery flow: The "Session Discovery and Drill-In Flow" documentation in both CLAUDE.md and guide.md provides clear guidance on how to use these new tools together.
Thoughtful migration design: Migration v4 properly handles the schema evolution with appropriate foreign keys and indexes on the session_commits junction table.
Good API consistency: The session_id filter added to query_timeline and get_user_journey enables the drill-in workflow pattern documented in the PR.

🔍 Suggestions for Consideration

get_session_signals missing project parameter in MCP tool (server.py:550-571)

The patterns.get_session_signals() function accepts a project parameter, but the MCP tool wrapper does not expose it. The CLI command does expose --project, so the MCP tool should match for consistency.
Minor: Rework threshold hardcoded (patterns.py:849)

The rework detection uses a hardcoded threshold of 4 edits. This is reasonable but might be worth making configurable if users work with different edit patterns.
Potential performance consideration (patterns.py:807-872)

get_session_signals() makes 4 separate queries to SQLite (main aggregation, commit counts, rework detection, PR activity). For large datasets, these could potentially be consolidated. However, SQLite performance on local files is typically excellent, so this is more of a note for future optimization if needed.

📋 Minor Observations

The min_events default change from 5 to 1 is well-reasoned - letting the LLM decide what is meaningful is consistent with the design philosophy.
The Makefile restart target for LaunchAgent is a nice addition for developer workflow.
The CLI --session-id option on the journey command completes the drill-in workflow nicely.

🏁 Summary

This PR is ready to merge. The implementation is solid, well-tested, and follows the project design principles. The only substantive suggestion is adding the project parameter to the MCP get_session_signals tool for consistency with the CLI.

🤖 Review generated with Claude Code

Align session-analytics API with claude-event-bus conventions: **Tool renames (query_* → list_*/get_*)**: - query_tool_frequency → get_tool_frequency - query_command_frequency → get_command_frequency - query_sessions → list_sessions - query_timeline → get_session_events - query_tokens → get_token_usage - query_sequences → get_tool_sequences - query_permission_gaps → get_permission_gaps - query_outcomes → get_session_signals (revised per RFC #17) **Argument standardization**: - threshold → min_count (consistency) - min_events → min_count (consistency) - hours → days (fractional: 0.5 = 12 hours) - count → limit (alignment with event-bus) **Documentation**: - Add Suggested Workflows section to guide.md with flowcharts - Update CLAUDE.md tool table - Add MCP API Naming Conventions section 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

evansenter · 2026-01-01T19:30:03Z

API Naming Standardization

Added commit to align MCP API naming with claude-event-bus conventions:

Tool Renames

Old Name	New Name
`query_tool_frequency`	`get_tool_frequency`
`query_command_frequency`	`get_command_frequency`
`query_sessions`	`list_sessions`
`query_timeline`	`get_session_events`
`query_tokens`	`get_token_usage`
`query_sequences`	`get_tool_sequences`
`query_permission_gaps`	`get_permission_gaps`
`query_outcomes`	`get_session_signals`

Argument Standardization

threshold → min_count
min_events → min_count
hours → days (fractional: 0.5 = 12 hours)
count → limit

Documentation

Added "Suggested Workflows" section to guide.md with flowcharts
Updated all documentation to use new tool names

All 200 tests pass.

claude · 2026-01-01T19:30:28Z

PR Review: RFC #26 - Session Signals and Commit Tracking

This is a well-structured implementation that adds valuable session analytics capabilities while adhering to the RFC #17 "Don't over-distill" design principle. Here's my detailed review:

✅ Strengths

1. Excellent Design Philosophy Adherence
The revision to avoid pre-computed outcome classifications (success, frustrated, confidence) is the right call. Returning raw signals like error_rate, commit_count, has_rework, and has_pr_activity lets the consuming LLM interpret context appropriately.

2. Clean Schema Migration

Migration v4 (add_session_enrichment) is well-documented with clear comments about the RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17 principle
The session_commits junction table is a good design choice for tracking time-to-commit metrics and supporting multiple commits per session
Foreign key constraints and indexes are properly defined

3. Comprehensive Test Coverage

197 tests passing is excellent
TestGetSessionSignals thoroughly validates the RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17 principle by explicitly checking that interpretation fields (outcome, confidence, satisfaction_score) are NOT present
Good edge case coverage: empty database, min_count filtering, error rates

4. CLI/MCP Tool Naming Consistency
The renaming to follow the standard conventions (get_*, list_*) improves API consistency:

query_tool_frequency → get_tool_frequency
query_sessions → list_sessions
query_permission_gaps → get_permission_gaps

5. Good Documentation

CLAUDE.md updates with the "Don't over-distill" principle and examples are clear
Guide.md has useful "Suggested Workflows" section with the drill-in pattern

⚠️ Minor Issues / Suggestions

1. Inconsistent Parameter Naming in get_session_signals

In server.py:554, the MCP tool signature doesn't include the project parameter that exists in patterns.get_session_signals:

# server.py - missing project parameter
@mcp.tool()
def get_session_signals(days: int = 7, min_count: int = 1) -> dict:
    ...
    result = patterns.get_session_signals(storage, days=days, min_count=min_count)

But patterns.py:772 accepts a project parameter:

def get_session_signals(
    storage: SQLiteStorage,
    days: int = 7,
    min_count: int = 1,
    project: str | None = None,  # Not exposed in MCP tool
)

The CLI command (cmd_signals) correctly passes --project, but the MCP tool doesn't expose it.

2. Potential Edge Case in _format_signals

In cli.py:357, the error_rate formatting assumes the value exists:

error_info = f", {sess['error_rate']:.0%} errors" if sess.get("error_rate", 0) > 0 else ""

This is fine, but consider that a session with 0 events would have error_rate = 0/0, which would be caught by the min_count filter. Good defensive coding.

3. Redundant Query in cmd_session_commits

In cli.py:710-731, when no session_id is provided, the code executes a raw SQL query that duplicates some logic. Consider whether this could use storage.get_commits_for_sessions() instead for consistency, though the current approach with project filtering is reasonable.

4. Magic Number in _format_signals

The limit of 15 sessions in _format_signals (cli.py:368) is hardcoded:

for sess in data.get("sessions", [])[:15]:

Consider making this configurable or at least documenting why 15 was chosen.

🔍 Code Quality Notes

Good patterns observed:

Proper use of storage.execute_query() public API instead of _connect()
Session commit correlation handles edge cases (commits before session start clamped to 0)
is_first_commit tracking is correctly calculated per-session
Formatter registry pattern with predicates is clean

SQL Safety:

All SQL queries use parameterized queries - no SQL injection risks
The project_filter in get_session_signals uses LIKE with %project% which is safe since the parameter is properly bound

📋 Test Coverage Assessment

The new tests thoroughly cover:

✅ Empty database handling
✅ Commit count inclusion in signals
✅ Error rate calculation
✅ min_count filtering
✅ All expected raw signal fields present
✅ NO interpretation fields present (RFC RFC: LLM-Powered Analysis and Cross-Session Visibility #17 validation)
✅ CLI formatters for signals and session-commits
✅ Storage operations for session_commits table

Summary

Verdict: LGTM with minor suggestions

This is a solid implementation that follows the project's design principles well. The main suggestion would be to add the project parameter to the get_session_signals MCP tool for parity with the CLI and underlying implementation.

The documentation updates, test coverage, and adherence to the "Don't over-distill" principle make this a high-quality contribution.

evansenter mentioned this pull request Jan 1, 2026

RFC: Data enrichment for deeper session analysis #26

Closed

evansenter changed the title ~~RFC #26: Add session outcome detection and enrichment~~ RFC #26: Add session signals and commit tracking (revised per RFC #17) Jan 1, 2026

evansenter merged commit 00c1439 into main Jan 1, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC #26: Add session signals and commit tracking (revised per RFC #17)#29

RFC #26: Add session signals and commit tracking (revised per RFC #17)#29
evansenter merged 6 commits into
mainfrom
issue-26-data-enrichment

evansenter commented Jan 1, 2026 •

edited

Loading

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

evansenter commented Jan 1, 2026

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

evansenter commented Jan 1, 2026

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

evansenter commented Jan 1, 2026

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evansenter commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Features

What's NOT Included (by design)

Technical Details

Testing

Test plan

Uh oh!

claude Bot commented Jan 1, 2026

PR Review: RFC #26 Session Outcome Detection and Enrichment

✅ Strengths

🔧 Suggestions for Improvement

📝 Minor Nits

🔒 Security

📊 Performance

Uh oh!

evansenter commented Jan 1, 2026

Refactored: Applied RFC #17 'Don't Over-Distill' Principle

What Changed

Raw Signal Output Example

Uh oh!

claude Bot commented Jan 1, 2026

PR Review: RFC #26 - Session Outcome Detection and Enrichment

✅ Strengths

⚠️ Issues to Address

💡 Suggestions (Non-blocking)

📊 Summary

Uh oh!

evansenter commented Jan 1, 2026

Additional Improvements: Session Discovery and Drill-In Flow

1. Session ID Filters

2. Fixed min_events Default

3. Documentation

Uh oh!

claude Bot commented Jan 1, 2026

PR Review: RFC #26 - Session Signals and Commit Tracking

✅ Strengths

⚠️ Suggestions for Improvement

🔍 Code Quality Notes

✅ Security

Summary

Uh oh!

claude Bot commented Jan 1, 2026

PR Review: RFC #26 - Session Signals and Commit Tracking

Strengths

Minor Suggestions

Security

Performance

Summary

Uh oh!

claude Bot commented Jan 1, 2026

Code Review: RFC #26 Session Signals and Commit Tracking

✅ Strengths

🔍 Suggestions for Consideration

📋 Minor Observations

🏁 Summary

Uh oh!

evansenter commented Jan 1, 2026

API Naming Standardization

Tool Renames

Argument Standardization

Documentation

Uh oh!

claude Bot commented Jan 1, 2026

PR Review: RFC #26 - Session Signals and Commit Tracking

✅ Strengths

⚠️ Minor Issues / Suggestions

🔍 Code Quality Notes

📋 Test Coverage Assessment

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

evansenter commented Jan 1, 2026 •

edited

Loading