Add search_messages MCP tool and CLI command by evansenter · Pull Request #23 · evansenter/agent-session-analytics

evansenter · 2025-12-31T18:24:16Z

Summary

Exposes existing FTS5 search capability via MCP tool and CLI command
Adds search_messages(query, limit) MCP tool for full-text search on user messages
Adds session-analytics-cli search <query> CLI command
Also fixes .gitignore to exclude .worktrees/ directory

Motivation

The FTS5 infrastructure was added in Phase 1 (PR #20), but wasn't exposed to consumers. This enables LLMs to search for specific topics, decisions, or patterns across session messages.

Example usage:

# Find discussions about authentication
session-analytics-cli search "authentication"

# Find decision-related messages
session-analytics-cli search "skip OR defer OR implement"

Test plan

Added test for search_messages MCP tool
All 174 tests pass

🤖 Generated with Claude Code

Exposes the existing FTS5 search capability via: - MCP tool: search_messages(query, limit) - CLI command: session-analytics-cli search <query> This completes the user message search functionality added in Phase 1, enabling consumers to find discussions about specific topics across sessions using full-text search. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2025-12-31T18:25:41Z

PR Review: Add search_messages MCP tool and CLI command

Overall, this is a well-structured PR that exposes the existing FTS5 infrastructure to consumers. The implementation follows the project's established patterns. Here's my detailed feedback:

✅ What's Good

Follows existing patterns: The MCP tool and CLI command follow the same structure as other tools in the codebase (e.g., get_user_journey).
Proper use of ensure_fresh_data: The MCP tool correctly calls queries.ensure_fresh_data(storage) to auto-refresh stale data before searching.
Good documentation: The docstring for search_messages clearly explains FTS5 query syntax with examples.
Formatter registry pattern: Uses the @_register_formatter decorator correctly for CLI output formatting.
Clean .gitignore addition: The .worktrees/ exclusion is appropriate housekeeping.

🔍 Suggestions for Improvement

1. Missing CLI test coverage

The PR adds a test for the MCP tool but no dedicated test for the CLI cmd_search function. Consider adding a test in tests/test_cli.py to verify:

JSON output format
Human-readable output formatting
Edge cases (empty results, special characters in queries)

2. Error handling for malformed FTS5 queries

FTS5 queries can fail with syntax errors (e.g., unbalanced quotes, invalid operators). The current implementation will propagate SQLite exceptions directly. Consider:

# In server.py or cli.py
try:
    results = storage.search_user_messages(query, limit=limit)
except sqlite3.OperationalError as e:
    if "fts5" in str(e).lower() or "syntax" in str(e).lower():
        return {"status": "error", "error": f"Invalid FTS5 query syntax: {e}"}
    raise

3. Truncation in formatter could lose context

In _format_search_results, messages are truncated at 60 characters:

text = msg.get("message", "")[:60] if msg.get("message") else ""

Consider adding ... suffix when truncated to indicate there's more content:

text = msg.get("message", "") if msg.get("message") else ""
if len(text) > 60:
    text = text[:57] + "..."

4. Minor inconsistency in default limit

storage.search_user_messages() defaults to limit=100
search_messages() MCP tool defaults to limit=50
cmd_search() CLI defaults to limit=50

This is fine, but worth noting. The lower defaults for the exposed interfaces make sense for typical use.

5. Consider adding `--project` filter to CLI

Other CLI commands support --project filtering. Consider adding it to search for consistency:

session-analytics-cli search "authentication" --project /path/to/project

This could be a follow-up enhancement.

📝 Documentation

The CLAUDE.md should be updated to include the new tool and CLI command in the tables:

MCP Tools table:
| search_messages | Full-text search on user messages |

CLI Commands section:

session-analytics-cli search <query>     # FTS on user messages

🧪 Test Coverage

The existing test (test_search_messages) validates the happy path. Consider adding tests for:

Empty query string behavior
Special characters in query
Query that returns zero results
Large result sets (verify limit is respected)

Summary

This is a solid PR that completes the FTS5 feature by exposing it to consumers. The code quality is good and follows project conventions. The main suggestions are around error handling for invalid queries and adding CLI test coverage.

Recommendation: Approve with minor suggestions. The error handling and CLI tests could be addressed in a follow-up PR if preferred.

🤖 Generated with Claude Code

- Add error handling for malformed FTS5 queries (catches sqlite3.OperationalError) - Update CLAUDE.md with search_messages and get_user_journey tools - Add ... suffix when truncating messages in formatters - Add CLI test coverage for search command (4 new tests) - Add --project filter to search command (CLI, MCP, and storage layer) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2025-12-31T18:53:46Z

PR Review: Add search_messages MCP tool and CLI command

Overall, this is a well-structured PR that exposes the existing FTS5 infrastructure in a clean and consistent way. Here is my detailed feedback:

Strengths

Consistent patterns: The implementation follows the established codebase conventions - formatter registry pattern in CLI, error handling in both server and CLI, proper test structure.
Good error handling: Both the MCP tool and CLI gracefully handle malformed FTS5 queries by catching sqlite3.OperationalError and returning structured error responses.
Documentation updated: CLAUDE.md is updated with the new tool and CLI command.
Test coverage: Good coverage including happy path, no results, JSON output, and malformed query scenarios.

Issues & Suggestions

1. Missing `project` attribute in CLI Args test classes (Minor Bug)

In tests/test_cli.py, the Args classes for search tests do not include a project attribute. The cmd_search function uses getattr(args, "project", None) which handles this gracefully, but for consistency with actual argparse behavior and explicitness, consider adding project = None to the test Args classes.

2. Consider adding timestamp ordering option

Currently results are ordered by FTS5 rank (relevance). For some use cases, chronological ordering might be more useful. Consider adding an optional order_by parameter. This is a minor enhancement suggestion, not a blocking issue.

3. SQL query duplication in storage.py

The two SQL queries in search_user_messages are nearly identical. This could be simplified using query building with a base query and conditionally appending the project filter. However, the current approach is explicit and readable, so this is a style preference rather than a requirement.

4. Potential for LIKE pattern edge case (Low Risk)

The project filter uses LIKE ? with parameterization, which is safe from SQL injection. However, the %{project}% pattern means special SQL LIKE characters (%, _) in project names could cause unexpected matching. Consider escaping these if precise matching is needed, though this is low priority given typical project paths.

Minor Observations

Line 504 in cli.py: getattr(args, "project", None) - This works but suggests the argparse setup might be inconsistent. Confirmed that --project is properly added to the subparser, so this is just defensive coding.
Test file additions: The new test events for user messages are a good addition to the populated_storage fixture - they enable FTS testing.
.gitignore addition: The .worktrees/ exclusion is unrelated to the main feature but is a reasonable housekeeping item.

Security

No security concerns. The FTS5 MATCH uses parameterized queries, and error messages do not leak sensitive information.

Performance

Using FTS5 with proper indexing is the right approach for text search. The ORDER BY rank leverages SQLite built-in relevance scoring efficiently.

Verdict: This PR is ready to merge. The minor suggestions above are optional improvements that could be addressed in follow-up PRs if desired. Good work on exposing the FTS5 functionality!

claude · 2025-12-31T18:56:37Z

PR Review: Add search_messages MCP tool and CLI command

Overall, this is a well-structured PR that properly exposes the FTS5 search capability. The implementation follows the existing patterns in the codebase and includes good test coverage. Here's my detailed review:

Code Quality ✅

Strengths:

Consistent with existing patterns (formatter registry, MCP tool structure, CLI command structure)
Good docstrings with clear examples for FTS5 query syntax
Proper error handling for malformed FTS5 queries in both MCP tool and CLI
Documentation updated in CLAUDE.md

Minor suggestion:

In storage.py:947-969, the duplicated SQL query could be consolidated using query building, similar to how get_git_commits uses build_where_clause. However, the current approach is clear and readable, so this is optional.

Potential Issues ⚠️

Missing project attribute in test Args classes (tests/test_cli.py:310-368):
The test classes for cmd_search don't include a project attribute. The CLI code uses getattr(args, "project", None) which provides a safe fallback, so this won't cause test failures - but for completeness, the test Args should include project = None to mirror the actual argparse namespace.
FTS index on user_message_text only: The search is limited to user_message_text. This is appropriate for the current use case, but consider documenting this limitation in the tool description (e.g., "searches user messages only, not assistant responses").

Performance Considerations ✅

FTS5 is the correct choice for full-text search - very efficient
ORDER BY rank properly uses FTS5's relevance ranking
LIMIT is applied at the database level, not in Python
The ensure_fresh_data() call ensures data is up-to-date before searching

Security Considerations ✅

Query parameters are properly parameterized (no SQL injection risk)
The LIKE %project% pattern is parameterized correctly
FTS5 query syntax errors are caught and returned as error responses rather than exposing raw exceptions

Test Coverage ✅

Good coverage with tests for:

Basic search functionality
No results case
JSON output mode
Malformed query handling

Suggestion: Consider adding a test for the --project filter to ensure the project filtering works correctly in the CLI.

Documentation ✅

CLAUDE.md updates are complete and helpful.

Verdict: Approve 👍

This is a clean, well-implemented feature that follows the project's established patterns. The minor suggestions above are optional improvements that could be addressed in a follow-up PR if desired.

- Add docstring note that search_messages only searches user messages, not assistant responses - Add explicit project = None to all test Args classes for completeness - Add test for --project filter in cmd_search 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-01T16:22:12Z

PR Review: Add search_messages MCP tool and CLI command

Overall this is a well-structured PR that properly exposes the existing FTS5 infrastructure. The code follows the project's established patterns and includes good test coverage.

✅ Strengths

Follows existing patterns well: The search_messages MCP tool matches the structure of other tools like get_user_journey with consistent error handling, ensure_fresh_data() calls, and response formatting.
Good error handling: The FTS5 syntax error handling is thoughtful - catching sqlite3.OperationalError and returning a structured error response rather than crashing is the right approach for invalid user queries.
Comprehensive tests: The test coverage is good, including happy path, empty results, JSON output, malformed query handling, and project filtering scenarios.
Documentation updated: CLAUDE.md was properly updated with the new tool and CLI command.

💡 Suggestions

Formatter predicate specificity (cli.py:177): The predicate lambda d: "query" in d and "messages" in d and "count" in d is fairly generic. Consider adding a more unique key (like "search_query" or similar) to avoid potential collisions with other formatters. Low priority since this works as-is.
Consider adding --json flag in error case (cli.py:503-510): When cmd_search encounters an FTS5 error, the error output structure differs from success - it has status: "error" but no messages key. The tests verify this works, but you might consider documenting this behavior for CLI consumers.
SQL query duplication (storage.py:947-969): The two branches (with/without project) could potentially use a query builder pattern like build_where_clause() mentioned in CLAUDE.md, though for only two simple variants this is fine.

⚠️ Minor Issues

Inconsistent default limits: The MCP tool defaults to limit=50 while the storage method defaults to limit=100. The CLI also defaults to 50. This is fine functionally, but documenting why (if intentional) would be helpful.

🔒 Security

The FTS5 query is passed directly to SQLite which handles sanitization. FTS5's query syntax itself prevents SQL injection, so this is secure.
The project filter uses parameterized LIKE which is also safe.

📊 Performance

The ORDER BY rank is efficient since FTS5 computes rank during the match.
Adding the project filter doesn't use an index on project_path, so for very large databases with broad FTS matches, this could be slow. For typical usage this should be fine.

Verdict

LGTM 👍 - This is a clean, well-tested addition that properly exposes existing functionality. The minor suggestions above are non-blocking.

Merge branch 'main' into add-search-messages-tool

c417e10

evansenter merged commit a558016 into main Jan 1, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add search_messages MCP tool and CLI command#23

Add search_messages MCP tool and CLI command#23
evansenter merged 4 commits into
mainfrom
add-search-messages-tool

evansenter commented Dec 31, 2025

Uh oh!

claude Bot commented Dec 31, 2025

Uh oh!

claude Bot commented Dec 31, 2025

Uh oh!

claude Bot commented Dec 31, 2025

Uh oh!

claude Bot commented Jan 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evansenter commented Dec 31, 2025

Summary

Motivation

Test plan

Uh oh!

claude Bot commented Dec 31, 2025

PR Review: Add search_messages MCP tool and CLI command

✅ What's Good

🔍 Suggestions for Improvement

1. Missing CLI test coverage

2. Error handling for malformed FTS5 queries

3. Truncation in formatter could lose context

4. Minor inconsistency in default limit

5. Consider adding --project filter to CLI

📝 Documentation

🧪 Test Coverage

Summary

Uh oh!

claude Bot commented Dec 31, 2025

PR Review: Add search_messages MCP tool and CLI command

Strengths

Issues & Suggestions

1. Missing project attribute in CLI Args test classes (Minor Bug)

2. Consider adding timestamp ordering option

3. SQL query duplication in storage.py

4. Potential for LIKE pattern edge case (Low Risk)

Minor Observations

Security

Performance

Uh oh!

claude Bot commented Dec 31, 2025

PR Review: Add search_messages MCP tool and CLI command

Code Quality ✅

Potential Issues ⚠️

Performance Considerations ✅

Security Considerations ✅

Test Coverage ✅

Documentation ✅

Uh oh!

claude Bot commented Jan 1, 2026

PR Review: Add search_messages MCP tool and CLI command

✅ Strengths

💡 Suggestions

⚠️ Minor Issues

🔒 Security

📊 Performance

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

5. Consider adding `--project` filter to CLI

1. Missing `project` attribute in CLI Args test classes (Minor Bug)