
RFC: Data enrichment for deeper session analysis #26

Description

@evansenter

Summary

Add data collection and enrichment capabilities that enable deeper analysis of session patterns. These are the foundational data gaps identified during the RFC #25 analysis process.

Motivation

While analyzing session data for RFC #25 (enhance /status-report), several data gaps limited the depth of insights possible. The current infrastructure captures tool usage and timing, but lacks semantic understanding, outcome tracking, and cross-system correlation.

Proposed Capabilities

1. Semantic Clustering of User Messages

Priority: High

Currently: FTS5 search on keywords
Needed: Automatic clustering by topic/intent

Would enable: "50% of tasks are refactoring, 30% are features, 20% are debugging"

Implementation options:

  • Embed messages with a local model (e.g., sentence-transformers)
  • Cluster with k-means or HDBSCAN
  • Store cluster assignments in events table
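
To make the storage step concrete, here is a minimal sketch of how cluster assignments could be stored and turned into a topic breakdown. It assumes a hypothetical `cluster_label` column on a simplified `events` table (the real schema is not shown in this issue), and it hard-codes the labels instead of running an embedding model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified events table; `cluster_label` is an assumed column.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, message TEXT, cluster_label TEXT)"
)
# Toy rows: in practice the label would come from the clustering step.
conn.executemany(
    "INSERT INTO events (message, cluster_label) VALUES (?, ?)",
    [
        ("rename helper and split module", "refactoring"),
        ("extract this class", "refactoring"),
        ("add an export button", "feature"),
        ("why does this test fail", "debugging"),
    ],
)


def cluster_shares(conn):
    """Return {cluster_label: percent of labelled messages}."""
    total = conn.execute(
        "SELECT COUNT(*) FROM events WHERE cluster_label IS NOT NULL"
    ).fetchone()[0]
    if total == 0:
        return {}
    cur = conn.execute(
        "SELECT cluster_label, COUNT(*) FROM events "
        "WHERE cluster_label IS NOT NULL GROUP BY cluster_label"
    )
    return {label: 100.0 * count / total for label, count in cur}


shares = cluster_shares(conn)
```

Storing a label rather than a numeric cluster id keeps reports readable, but labels would need to be re-derived whenever the clustering is re-run.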

2. Task Outcome Tracking

Priority: High

Currently: We know only that errors occurred during execution
Needed: Track whether the overall task succeeded

Signals to capture:

  • User explicitly says "done", "thanks", "perfect"
  • Session ends with commit/PR creation
  • User abandons mid-task (long gap, then new topic)
  • Explicit frustration signals ("this isn't working", "never mind")

Schema addition:

ALTER TABLE sessions ADD COLUMN outcome TEXT; -- 'success', 'abandoned', 'frustrated', 'unknown'
ALTER TABLE sessions ADD COLUMN outcome_confidence REAL;
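
As a rough illustration of how some of the signals above could populate `outcome` and `outcome_confidence`, the following sketch applies keyword heuristics to the last user message. The function name, the phrase lists, and the confidence values are placeholders chosen for the example, not tuned or agreed values, and the 'abandoned' case is left out because it needs gap and topic information.

```python
import re

# Phrase lists taken from the signals listed above; real detection would need more care.
SUCCESS = re.compile(r"\b(done|thanks|perfect)\b", re.IGNORECASE)
FRUSTRATION = re.compile(r"this isn't working|never ?mind", re.IGNORECASE)


def classify_outcome(user_messages, ended_with_commit=False):
    """Return (outcome, confidence) for one session.

    Confidence values are arbitrary placeholders for illustration.
    """
    last = user_messages[-1] if user_messages else ""
    if FRUSTRATION.search(last):
        return ("frustrated", 0.7)
    if ended_with_commit:
        # A commit or PR at the end is treated as the strongest success signal.
        return ("success", 0.9)
    if SUCCESS.search(last):
        return ("success", 0.6)
    return ("unknown", 0.0)
```

For example, `classify_outcome(["fix the bug", "perfect, thanks"])` yields `("success", 0.6)` under these placeholder weights.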

3. Time-to-Completion by Task Type

Priority: Medium

Track duration from first user message to task completion, segmented by:

  • Task type (from semantic clustering)
  • Session classification (debugging, development, etc.)
  • Project

Would enable: "Debugging sessions take 2.3x longer than feature work"
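
A query along these lines could produce that kind of comparison. This is a sketch only: the `task_type`, `first_message_at`, and `completed_at` columns are assumed for illustration, timestamps are plain epoch seconds, and the rows are made-up toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the actual sessions schema is not shown in this issue.
conn.execute(
    "CREATE TABLE sessions ("
    "id TEXT PRIMARY KEY, task_type TEXT, "
    "first_message_at INTEGER, completed_at INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("s1", "debugging", 0, 2300),
        ("s2", "debugging", 0, 2300),
        ("s3", "feature", 0, 1000),
        ("s4", "feature", 100, 1100),
    ],
)

# Average seconds from first user message to completion, per task type.
avg_seconds = dict(
    conn.execute(
        "SELECT task_type, AVG(completed_at - first_message_at) "
        "FROM sessions WHERE completed_at IS NOT NULL GROUP BY task_type"
    )
)
ratio = avg_seconds["debugging"] / avg_seconds["feature"]
```

Sessions without a known completion time are excluded here; whether to exclude or cap them is a design choice that affects the averages.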

4. Git-Session Correlation

Priority: Medium

Currently: Commits ingested but not linked to sessions
Needed: Explicit session→commit relationships

CREATE TABLE session_commits (
    session_id TEXT,
    commit_sha TEXT,
    time_to_commit_seconds INTEGER,
    PRIMARY KEY (session_id, commit_sha)
);

Would enable:

  • "Sessions that produce commits vs. sessions that don't"
  • "Average time from session start to first commit"
  • "Commits per session by project"
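
One simple way to populate `session_commits` is to link each commit to any session whose time window contains it. The sketch below assumes hypothetical `sessions` and `commits` tables with epoch-second timestamps; a real linker would probably also match on project or repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical minimal tables; only session_commits comes from this RFC.
    CREATE TABLE sessions (id TEXT PRIMARY KEY, started_at INTEGER, ended_at INTEGER);
    CREATE TABLE commits (sha TEXT PRIMARY KEY, committed_at INTEGER);
    CREATE TABLE session_commits (
        session_id TEXT,
        commit_sha TEXT,
        time_to_commit_seconds INTEGER,
        PRIMARY KEY (session_id, commit_sha)
    );
    INSERT INTO sessions VALUES ('s1', 1000, 2000), ('s2', 3000, 4000);
    INSERT INTO commits VALUES ('abc123', 1600), ('def456', 2500);
    """
)

# Link a commit to a session when it lands inside the session's time window.
conn.execute(
    """
    INSERT OR IGNORE INTO session_commits
    SELECT s.id, c.sha, c.committed_at - s.started_at
    FROM sessions AS s
    JOIN commits AS c
      ON c.committed_at BETWEEN s.started_at AND s.ended_at
    """
)
links = conn.execute(
    "SELECT session_id, commit_sha, time_to_commit_seconds FROM session_commits"
).fetchall()
```

Here only `abc123` is linked (600 seconds into `s1`); `def456` falls between the two sessions and stays unlinked.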

5. Context Switch Detection

Priority: Medium

Currently: Parallel sessions are visible via detect_parallel_sessions()
Needed: Detect mid-task context switches within a session

Signals:

  • Topic change in user messages
  • Long gap followed by different file/tool patterns
  • Explicit "actually, let's do X instead"
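
The "long gap followed by different file patterns" signal could be approximated as below. This is a sketch under stated assumptions: events are `(timestamp_seconds, file_path)` pairs sorted by time, and `find_context_switches`, the gap threshold, overlap threshold, and window size are all hypothetical names and defaults.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_context_switches(events, gap_seconds=600, max_overlap=0.2, window=3):
    """Return indices of events that appear to start a new context.

    An index is flagged when the pause before it is at least `gap_seconds`
    and the files touched in the `window` events before and after the pause
    overlap by at most `max_overlap`.
    """
    switches = []
    for i in range(1, len(events)):
        gap = events[i][0] - events[i - 1][0]
        if gap < gap_seconds:
            continue
        before = [path for _, path in events[max(0, i - window):i]]
        after = [path for _, path in events[i:i + window]]
        if jaccard(before, after) <= max_overlap:
            switches.append(i)
    return switches


events = [
    (0, "api/a.py"),
    (60, "api/a.py"),
    (120, "api/b.py"),
    (2000, "docs/x.md"),
    (2060, "docs/y.md"),
]
switch_points = find_context_switches(events)
```

A pause alone is not flagged: returning to the same files after a break keeps the overlap high, so it reads as a resumed task rather than a switch.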

6. Session Purpose Summarization

Priority: Medium

Currently: Infer purpose from tool patterns
Needed: LLM-generated summary stored per session

ALTER TABLE sessions ADD COLUMN purpose_summary TEXT;
ALTER TABLE sessions ADD COLUMN purpose_generated_at TIMESTAMP;

Could run as a background job or on demand via a summarize_session(session_id) tool.
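
For the on-demand path, a sketch of a cached `summarize_session` might look like this. The extra `conn` and `summarizer` parameters, the simplified `events` table, and the stub summarizer are assumptions for the example; the stub stands in for the LLM call so the sketch runs without network access.

```python
import sqlite3
from datetime import datetime, timezone


def summarize_session(conn, session_id, summarizer):
    """Return the stored purpose summary, generating and caching it if missing."""
    row = conn.execute(
        "SELECT purpose_summary FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    if row is None:
        raise KeyError(session_id)
    if row[0] is not None:
        return row[0]  # already generated, skip the expensive call
    messages = [
        message
        for (message,) in conn.execute(
            "SELECT message FROM events WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
    ]
    summary = summarizer(messages)
    conn.execute(
        "UPDATE sessions SET purpose_summary = ?, purpose_generated_at = ? WHERE id = ?",
        (summary, datetime.now(timezone.utc).isoformat(), session_id),
    )
    return summary


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical minimal tables for the example.
    CREATE TABLE sessions (
        id TEXT PRIMARY KEY, purpose_summary TEXT, purpose_generated_at TIMESTAMP
    );
    CREATE TABLE events (id INTEGER PRIMARY KEY, session_id TEXT, message TEXT);
    INSERT INTO sessions (id) VALUES ('s1');
    INSERT INTO events (session_id, message) VALUES
        ('s1', 'fix the login redirect'), ('s1', 'add a regression test');
    """
)

calls = []


def stub_summarizer(messages):
    """Stand-in for an LLM call; records how often it is invoked."""
    calls.append(len(messages))
    return "Topics: " + "; ".join(messages)


first = summarize_session(conn, "s1", stub_summarizer)
second = summarize_session(conn, "s1", stub_summarizer)
```

The second call returns the cached value without invoking the summarizer again, which is the behavior a background job and an on-demand tool could share.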

7. User Satisfaction Signals

Priority: Low

There is currently no way to capture subjective experience. Options:

  • Parse sentiment from user messages
  • Detect frustration patterns (repeated attempts, undo sequences)
  • Eventually: explicit user feedback mechanism

Implementation Plan

  1. Phase 1: Task outcome tracking + git-session correlation (high impact, moderate effort)
  2. Phase 2: Semantic clustering + session summarization (requires embedding model decision)
  3. Phase 3: Context switch detection + satisfaction signals (refinement)

Open Questions

  1. Should embedding/clustering run locally or use the Claude API?
  2. How should outcome tracking handle sessions that span multiple conversations?
  3. What are the storage implications for embeddings (vector DB vs. SQLite with numpy)?
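
To help size the SQLite option in question 3, here is a sketch that stores an embedding as a float32 BLOB. It uses the stdlib `array` module instead of numpy so it runs without extra dependencies, and the `message_embeddings` table is hypothetical. At 4 bytes per dimension, a 384-dimensional vector, for example, would take 1,536 bytes per message.

```python
import sqlite3
from array import array


def to_blob(vector):
    """Pack a list of floats into float32 bytes."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


conn = sqlite3.connect(":memory:")
# Hypothetical table; one row per embedded message.
conn.execute(
    "CREATE TABLE message_embeddings ("
    "event_id INTEGER PRIMARY KEY, dim INTEGER, vector BLOB)"
)

vector = [0.5, -1.25, 2.0]  # toy 3-dimensional embedding
conn.execute(
    "INSERT INTO message_embeddings VALUES (?, ?, ?)",
    (1, len(vector), to_blob(vector)),
)
blob = conn.execute(
    "SELECT vector FROM message_embeddings WHERE event_id = 1"
).fetchone()[0]
restored = from_blob(blob)
```

This keeps everything in one database file, but similarity search would mean loading vectors into memory; a vector DB trades that simplicity for indexed nearest-neighbor queries.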

Worklog Reference

The need for this RFC was identified during the RFC #25 analysis process. See Notion worklog: https://www.notion.so/2db0eedcd74e80838d7eee59515fd439


🤖 Generated with Claude Code
