feat: Data quality fixes + multi-project git ingestion #76

Merged
evansenter merged 3 commits into main from feat/data-quality-fixes-and-multi-project-git on Jan 10, 2026
Conversation

@evansenter

Owner

Summary

  • Fix warmup error inflation (Issue #75, "analyze_failures() counts warmup events as errors, inflating error metrics"): 83% of "errors" were warmup Task tool exits, not real errors
  • Fix compaction detection: it was finding 0 entries because it looked at the wrong entry type
  • Add multi-project git ingestion: git-ingest-all command scans all known projects (5 repos, 247 commits vs previous 2 repos, 132 commits)
  • Add smoke test suite: 10 tests validating assumptions against real database data

Test plan

  • All 360 tests pass (8 new tests for decode_project_path)
  • Verified git-ingest-all finds all 5 git repos
  • Verified warmup events no longer marked as errors
  • Verified compaction detection working
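
The commit message for this PR says decode_project_path() handles hyphenated directory names. A minimal sketch of why that is hard, assuming project directories are encoded by replacing each `/` in the absolute path with `-` (an assumption, not confirmed by this PR): a hyphen in the encoded name may be either a path separator or a literal hyphen, so the decoder has to check candidates against the filesystem. The function name, signature, and `is_dir` parameter below are hypothetical and are not the project's actual implementation.

```python
import os
from typing import Callable, Optional


def decode_project_path_sketch(
    encoded: str,
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> Optional[str]:
    """Resolve an encoded name such as '-Users-alice-my-project' to a real path.

    Hypothetical stand-in, not the project's decode_project_path().
    Assumes '/' was replaced by '-' when the name was encoded.
    """
    if not encoded.startswith("-"):
        return None
    tokens = encoded[1:].split("-")

    def search(index: int, base: str) -> Optional[str]:
        if index == len(tokens):
            return base
        # Try progressively longer hyphen-joined names as the next path component,
        # backtracking when a choice leads to a dead end.
        for end in range(index + 1, len(tokens) + 1):
            name = "-".join(tokens[index:end])
            candidate = base.rstrip("/") + "/" + name
            if is_dir(candidate):
                found = search(end, candidate)
                if found is not None:
                    return found
        return None

    return search(0, "/")
```

Passing `is_dir` as a parameter keeps the sketch testable without touching the real filesystem; a set's `__contains__` method works as a fake.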

🤖 Generated with Claude Code

Fixes several data quality issues discovered via new smoke tests:

- Fix warmup events incorrectly marked as errors (Issue #75)
  - 83% of "errors" (8,046/9,663) were warmup Task tool exits
  - Updated ingest.py to not mark warmup as errors
  - Added migration 12 to backfill existing data

- Fix compaction detection finding 0 entries
  - Detection was looking at 'summary' entries but markers appear in 'user' entries
  - Added migrations 10, 11 to backfill existing data

- Add multi-project git ingestion (git-ingest-all)
  - New decode_project_path() handles hyphenated directory names
  - Scans all known projects from events table
  - Now finding 5 repos/247 commits (up from 2 repos/132 commits)

- Add smoke test suite (tests/test_smoke_real_data.py)
  - 10 tests validating assumptions against real database
  - Run with SESSION_ANALYTICS_SMOKE_TEST=1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
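
The smoke tests described above run only when SESSION_ANALYTICS_SMOKE_TEST=1 is set. One way to gate a suite on that variable is sketched below using the standard-library unittest module; the real tests/test_smoke_real_data.py may use a different runner and mechanism. The helper `smoke_enabled`, the class name, and the row fields `is_warmup` and `is_error` are hypothetical.

```python
import os
import unittest


def smoke_enabled(env=os.environ) -> bool:
    """Return True only when the opt-in variable is set to exactly '1'."""
    return env.get("SESSION_ANALYTICS_SMOKE_TEST") == "1"


@unittest.skipUnless(smoke_enabled(), "set SESSION_ANALYTICS_SMOKE_TEST=1 to run against real data")
class SmokeRealDataSketch(unittest.TestCase):
    def test_warmup_events_are_not_errors(self):
        # A real smoke test would query the live database here; this
        # hypothetical row list only stands in for that query result.
        rows = [{"is_warmup": True, "is_error": False}]
        self.assertFalse(any(r["is_warmup"] and r["is_error"] for r in rows))
```

Gating on an explicit opt-in keeps tests that depend on a local database out of ordinary runs, where that data may not exist.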
@claude

claude Bot commented Jan 10, 2026


Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR fixes data quality issues by correcting warmup error inflation (Issue #75), fixing compaction detection to look at user entries, and adds multi-project git ingestion with a new git-ingest-all command. It also includes a comprehensive smoke test suite for validating assumptions against real data.

Issues Found

Critical

None

Important

  • src/session_analytics/guide.md - The new ingest_git_history_all_projects MCP tool is not documented in guide.md. Per CLAUDE.md line 75: "Documentation in guide.md" is required when adding endpoints. Add an entry under the "Git Integration" table.

  • tests/ - No tests for cmd_git_ingest_all CLI command or ingest_git_history_all_projects function with mocked storage. While the decode_project_path function has excellent test coverage (8 tests), the main ingestion function lacks unit tests covering error paths, empty project list, and git repo detection logic.

Suggestions

  • src/session_analytics/cli.py:1359-1361 - The comment explaining skipped benchmark tools should be updated to explicitly mention git-ingest-all / ingest_git_history_all_projects now that it exists: # Skipped tools (require specific data or modify DB): ingest_logs, ingest_git_history, ingest_git_history_all_projects, correlate_git_with_sessions, ingest_bus_events

  • src/session_analytics/ingest.py:862-878 - Skipped projects increment projects_skipped, but entries are only appended to per_project_results for projects with git repos. This is fine but could cause confusion when debugging, since skipped projects have no entry in per_project.

Verdict

REQUEST_CHANGES - Missing documentation in guide.md for the new MCP tool (required per CLAUDE.md), and the new ingest_git_history_all_projects function lacks unit tests for its core logic paths.


Automated review by Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Jan 10, 2026


Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR fixes data quality issues (warmup error inflation per Issue #75, compaction detection looking at wrong entry type), adds multi-project git ingestion via git-ingest-all command, and includes a comprehensive smoke test suite for validating assumptions against real database data.

Issues Found

Critical

None

Important

  • src/session_analytics/guide.md - The new ingest_git_history_all_projects MCP tool is not documented in guide.md. Per CLAUDE.md line 75: "Documentation in guide.md" is required when adding endpoints. Add an entry in the "Git Integration" table.

  • tests/ - No unit tests for ingest_git_history_all_projects function. While decode_project_path has excellent coverage (8 tests), the main ingestion function lacks tests covering: empty project list, handling of decode failures, git repo detection logic, and error accumulation. Consider adding tests with mocked storage.

Suggestions

  • src/session_analytics/cli.py:1359-1360 - The benchmark skipped tools comment should be updated to include the new tool: # Skipped tools (require specific data or modify DB): ingest_logs, ingest_git_history, ingest_git_history_all_projects, correlate_git_with_sessions, ingest_bus_events

  • src/session_analytics/ingest.py:862-878 - The per_project_results list only includes projects that have git repos (those that reach the ingest_git_history call), but projects_skipped counts those without valid paths or git dirs. Consider documenting this behavior or making it more explicit so the returned counts are easier to interpret.
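
The relationship between projects_skipped and the per-project list can be illustrated with a small sketch. It is not the code in ingest.py: the function name, its parameters, and the `projects_scanned` and `errors` fields are assumptions made for illustration; only `projects_skipped` and `per_project` come from the review comment.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def ingest_all_sketch(
    encoded_projects: Iterable[str],
    decode: Callable[[str], Optional[str]],
    ingest_one: Callable[[str], dict],
    has_git: Callable[[str], bool] = lambda p: (Path(p) / ".git").exists(),
) -> dict:
    """Hypothetical aggregate over all known projects (not the real function)."""
    result = {"projects_scanned": 0, "projects_skipped": 0, "per_project": [], "errors": []}
    for encoded in encoded_projects:
        result["projects_scanned"] += 1
        path = decode(encoded)
        # Projects that cannot be decoded or have no git repo are only counted,
        # so they never get an entry in per_project.
        if path is None or not has_git(path):
            result["projects_skipped"] += 1
            continue
        try:
            stats = ingest_one(path)
        except Exception as exc:  # accumulate errors instead of aborting the scan
            result["errors"].append({"project": path, "error": str(exc)})
            continue
        result["per_project"].append({"project": path, **stats})
    return result
```

In this sketch every scanned project lands in exactly one bucket, so `projects_scanned == projects_skipped + len(per_project) + len(errors)`, which is the kind of invariant a docstring could state to make the returned counts easier to interpret.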

Verdict

REQUEST_CHANGES - Missing documentation in guide.md for the new MCP tool (required per CLAUDE.md), and unit tests needed for the ingest_git_history_all_projects function.


Automated review by Claude Code

- Add ingest_git_history_all_projects to guide.md Git Integration section
- Add 5 unit tests for ingest_git_history_all_projects covering:
  - Empty project list
  - Projects without .git
  - Projects with .git
  - Decode failures
  - Result structure
- Update benchmark skipped tools comment
- Improve docstring with detailed return value documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@evansenter

Owner Author

Feedback Addressed

Implemented

  • [Important] Added ingest_git_history_all_projects documentation to guide.md Git Integration section
  • [Important] Added 5 unit tests for ingest_git_history_all_projects covering empty lists, decode failures, git detection, and result structure
  • [Suggestion] Updated benchmark skipped tools comment to include new tool
  • [Suggestion] Improved docstring with detailed return value documentation clarifying per_project behavior

@claude

claude Bot commented Jan 10, 2026


Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR fixes data quality issues (warmup error inflation per Issue #75, compaction detection looking at wrong entry type), adds multi-project git ingestion via git-ingest-all command, and includes a comprehensive smoke test suite for validating assumptions against real database data.

Issues Found

Critical

None

Important

None

Suggestions

None

All previously raised issues have been addressed in the "Feedback Addressed" comment:

  • Documentation added to guide.md for ingest_git_history_all_projects ✓
  • 5 unit tests added for ingest_git_history_all_projects covering empty lists, decode failures, git detection, and result structure ✓
  • Benchmark skipped tools comment updated ✓
  • Docstring improved with detailed return value documentation ✓

Verdict

APPROVE - All feedback has been addressed. The implementation is thorough with proper test coverage, documentation, and migration backfills.


Automated review by Claude Code

@evansenter evansenter merged commit aea01ff into main Jan 10, 2026
3 checks passed
evansenter added a commit to evansenter/dotfiles that referenced this pull request Jan 10, 2026
- Add Phase 0 to ingest logs, git commits from all projects, and correlate
- Use ingest_git_history_all_projects() for cross-repo git correlation
- Add get_error_details() call to drill into specific failing patterns
- Note that warmup events no longer count as errors (fixed in session-analytics)

Ref: evansenter/agent-session-analytics#76

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
evansenter added a commit to evansenter/dotfiles that referenced this pull request Jan 10, 2026
…ils (#196)

- Add Phase 0 to ingest logs, git commits from all projects, and correlate
- Use ingest_git_history_all_projects() for cross-repo git correlation
- Add get_error_details() call to drill into specific failing patterns
- Note that warmup events no longer count as errors (fixed in session-analytics)

Ref: evansenter/agent-session-analytics#76

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
evansenter added a commit that referenced this pull request Jan 10, 2026
Resolve conflicts after main updated with PR #76 changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>