Skip to content

Add search_messages MCP tool and CLI command#23

Merged
evansenter merged 4 commits into
mainfrom
add-search-messages-tool
Jan 1, 2026
Merged

Add search_messages MCP tool and CLI command#23
evansenter merged 4 commits into
mainfrom
add-search-messages-tool

Conversation

@evansenter

Copy link
Copy Markdown
Owner

Summary

  • Exposes existing FTS5 search capability via MCP tool and CLI command
  • Adds search_messages(query, limit) MCP tool for full-text search on user messages
  • Adds session-analytics-cli search <query> CLI command
  • Also fixes .gitignore to exclude .worktrees/ directory

Motivation

The FTS5 infrastructure was added in Phase 1 (PR #20), but wasn't exposed to consumers. This enables LLMs to search for specific topics, decisions, or patterns across session messages.

Example usage:

# Find discussions about authentication
session-analytics-cli search "authentication"

# Find decision-related messages
session-analytics-cli search "skip OR defer OR implement"

Test plan

  • Added test for search_messages MCP tool
  • All 174 tests pass

🤖 Generated with Claude Code

Exposes the existing FTS5 search capability via:
- MCP tool: search_messages(query, limit)
- CLI command: session-analytics-cli search <query>

This completes the user message search functionality added in
Phase 1, enabling consumers to find discussions about specific
topics across sessions using full-text search.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Dec 31, 2025

Copy link
Copy Markdown

PR Review: Add search_messages MCP tool and CLI command

Overall, this is a well-structured PR that exposes the existing FTS5 infrastructure to consumers. The implementation follows the project's established patterns. Here's my detailed feedback:

✅ What's Good

  1. Follows existing patterns: The MCP tool and CLI command follow the same structure as other tools in the codebase (e.g., get_user_journey).

  2. Proper use of ensure_fresh_data: The MCP tool correctly calls queries.ensure_fresh_data(storage) to auto-refresh stale data before searching.

  3. Good documentation: The docstring for search_messages clearly explains FTS5 query syntax with examples.

  4. Formatter registry pattern: Uses the @_register_formatter decorator correctly for CLI output formatting.

  5. Clean .gitignore addition: The .worktrees/ exclusion is appropriate housekeeping.


🔍 Suggestions for Improvement

1. Missing CLI test coverage

The PR adds a test for the MCP tool but no dedicated test for the CLI cmd_search function. Consider adding a test in tests/test_cli.py to verify:

  • JSON output format
  • Human-readable output formatting
  • Edge cases (empty results, special characters in queries)

2. Error handling for malformed FTS5 queries

FTS5 queries can fail with syntax errors (e.g., unbalanced quotes, invalid operators). The current implementation will propagate SQLite exceptions directly. Consider:

# In server.py or cli.py
try:
    results = storage.search_user_messages(query, limit=limit)
except sqlite3.OperationalError as e:
    if "fts5" in str(e).lower() or "syntax" in str(e).lower():
        return {"status": "error", "error": f"Invalid FTS5 query syntax: {e}"}
    raise

3. Truncation in formatter could lose context

In _format_search_results, messages are truncated at 60 characters:

text = msg.get("message", "")[:60] if msg.get("message") else ""

Consider adding ... suffix when truncated to indicate there's more content:

text = msg.get("message", "") if msg.get("message") else ""
if len(text) > 60:
    text = text[:57] + "..."

4. Minor inconsistency in default limit

  • storage.search_user_messages() defaults to limit=100
  • search_messages() MCP tool defaults to limit=50
  • cmd_search() CLI defaults to limit=50

This is fine, but worth noting. The lower defaults for the exposed interfaces make sense for typical use.

5. Consider adding --project filter to CLI

Other CLI commands support --project filtering. Consider adding it to search for consistency:

session-analytics-cli search "authentication" --project /path/to/project

This could be a follow-up enhancement.


📝 Documentation

The CLAUDE.md should be updated to include the new tool and CLI command in the tables:

MCP Tools table:
| search_messages | Full-text search on user messages |

CLI Commands section:

session-analytics-cli search <query>     # FTS on user messages

🧪 Test Coverage

The existing test (test_search_messages) validates the happy path. Consider adding tests for:

  • Empty query string behavior
  • Special characters in query
  • Query that returns zero results
  • Large result sets (verify limit is respected)

Summary

This is a solid PR that completes the FTS5 feature by exposing it to consumers. The code quality is good and follows project conventions. The main suggestions are around error handling for invalid queries and adding CLI test coverage.

Recommendation: Approve with minor suggestions. The error handling and CLI tests could be addressed in a follow-up PR if preferred.

🤖 Generated with Claude Code

- Add error handling for malformed FTS5 queries (catches sqlite3.OperationalError)
- Update CLAUDE.md with search_messages and get_user_journey tools
- Add ... suffix when truncating messages in formatters
- Add CLI test coverage for search command (4 new tests)
- Add --project filter to search command (CLI, MCP, and storage layer)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Dec 31, 2025

Copy link
Copy Markdown

PR Review: Add search_messages MCP tool and CLI command

Overall, this is a well-structured PR that exposes the existing FTS5 infrastructure in a clean and consistent way. Here is my detailed feedback:

Strengths

  1. Consistent patterns: The implementation follows the established codebase conventions - formatter registry pattern in CLI, error handling in both server and CLI, proper test structure.

  2. Good error handling: Both the MCP tool and CLI gracefully handle malformed FTS5 queries by catching sqlite3.OperationalError and returning structured error responses.

  3. Documentation updated: CLAUDE.md is updated with the new tool and CLI command.

  4. Test coverage: Good coverage including happy path, no results, JSON output, and malformed query scenarios.

Issues & Suggestions

1. Missing project attribute in CLI Args test classes (Minor Bug)

In tests/test_cli.py, the Args classes for search tests do not include a project attribute. The cmd_search function uses getattr(args, "project", None) which handles this gracefully, but for consistency with actual argparse behavior and explicitness, consider adding project = None to the test Args classes.

2. Consider adding timestamp ordering option

Currently results are ordered by FTS5 rank (relevance). For some use cases, chronological ordering might be more useful. Consider adding an optional order_by parameter. This is a minor enhancement suggestion, not a blocking issue.

3. SQL query duplication in storage.py

The two SQL queries in search_user_messages are nearly identical. This could be simplified using query building with a base query and conditionally appending the project filter. However, the current approach is explicit and readable, so this is a style preference rather than a requirement.

4. Potential for LIKE pattern edge case (Low Risk)

The project filter uses LIKE ? with parameterization, which is safe from SQL injection. However, the %{project}% pattern means special SQL LIKE characters (%, _) in project names could cause unexpected matching. Consider escaping these if precise matching is needed, though this is low priority given typical project paths.

Minor Observations

  1. Line 504 in cli.py: getattr(args, "project", None) - This works but suggests the argparse setup might be inconsistent. Confirmed that --project is properly added to the subparser, so this is just defensive coding.

  2. Test file additions: The new test events for user messages are a good addition to the populated_storage fixture - they enable FTS testing.

  3. .gitignore addition: The .worktrees/ exclusion is unrelated to the main feature but is a reasonable housekeeping item.

Security

No security concerns. The FTS5 MATCH uses parameterized queries, and error messages do not leak sensitive information.

Performance

Using FTS5 with proper indexing is the right approach for text search. The ORDER BY rank leverages SQLite built-in relevance scoring efficiently.


Verdict: This PR is ready to merge. The minor suggestions above are optional improvements that could be addressed in follow-up PRs if desired. Good work on exposing the FTS5 functionality!

@claude

claude Bot commented Dec 31, 2025

Copy link
Copy Markdown

PR Review: Add search_messages MCP tool and CLI command

Overall, this is a well-structured PR that properly exposes the FTS5 search capability. The implementation follows the existing patterns in the codebase and includes good test coverage. Here's my detailed review:

Code Quality ✅

Strengths:

  • Consistent with existing patterns (formatter registry, MCP tool structure, CLI command structure)
  • Good docstrings with clear examples for FTS5 query syntax
  • Proper error handling for malformed FTS5 queries in both MCP tool and CLI
  • Documentation updated in CLAUDE.md

Minor suggestion:

  • In storage.py:947-969, the duplicated SQL query could be consolidated using query building, similar to how get_git_commits uses build_where_clause. However, the current approach is clear and readable, so this is optional.

Potential Issues ⚠️

  1. Missing project attribute in test Args classes (tests/test_cli.py:310-368):
    The test classes for cmd_search don't include a project attribute. The CLI code uses getattr(args, "project", None) which provides a safe fallback, so this won't cause test failures - but for completeness, the test Args should include project = None to mirror the actual argparse namespace.

  2. FTS index on user_message_text only: The search is limited to user_message_text. This is appropriate for the current use case, but consider documenting this limitation in the tool description (e.g., "searches user messages only, not assistant responses").

Performance Considerations ✅

  • FTS5 is the correct choice for full-text search - very efficient
  • ORDER BY rank properly uses FTS5's relevance ranking
  • LIMIT is applied at the database level, not in Python
  • The ensure_fresh_data() call ensures data is up-to-date before searching

Security Considerations ✅

  • Query parameters are properly parameterized (no SQL injection risk)
  • The LIKE %project% pattern is parameterized correctly
  • FTS5 query syntax errors are caught and returned as error responses rather than exposing raw exceptions

Test Coverage ✅

Good coverage with tests for:

  • Basic search functionality
  • No results case
  • JSON output mode
  • Malformed query handling

Suggestion: Consider adding a test for the --project filter to ensure the project filtering works correctly in the CLI.

Documentation ✅

CLAUDE.md updates are complete and helpful.


Verdict: Approve 👍

This is a clean, well-implemented feature that follows the project's established patterns. The minor suggestions above are optional improvements that could be addressed in a follow-up PR if desired.

- Add docstring note that search_messages only searches user messages,
  not assistant responses
- Add explicit project = None to all test Args classes for completeness
- Add test for --project filter in cmd_search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude

claude Bot commented Jan 1, 2026

Copy link
Copy Markdown

PR Review: Add search_messages MCP tool and CLI command

Overall this is a well-structured PR that properly exposes the existing FTS5 infrastructure. The code follows the project's established patterns and includes good test coverage.

✅ Strengths

  1. Follows existing patterns well: The search_messages MCP tool matches the structure of other tools like get_user_journey with consistent error handling, ensure_fresh_data() calls, and response formatting.

  2. Good error handling: The FTS5 syntax error handling is thoughtful - catching sqlite3.OperationalError and returning a structured error response rather than crashing is the right approach for invalid user queries.

  3. Comprehensive tests: The test coverage is good, including happy path, empty results, JSON output, malformed query handling, and project filtering scenarios.

  4. Documentation updated: CLAUDE.md was properly updated with the new tool and CLI command.

💡 Suggestions

  1. Formatter predicate specificity (cli.py:177): The predicate lambda d: "query" in d and "messages" in d and "count" in d is fairly generic. Consider adding a more unique key (like "search_query" or similar) to avoid potential collisions with other formatters. Low priority since this works as-is.

  2. Consider adding --json flag in error case (cli.py:503-510): When cmd_search encounters an FTS5 error, the error output structure differs from success - it has status: "error" but no messages key. The tests verify this works, but you might consider documenting this behavior for CLI consumers.

  3. SQL query duplication (storage.py:947-969): The two branches (with/without project) could potentially use a query builder pattern like build_where_clause() mentioned in CLAUDE.md, though for only two simple variants this is fine.

⚠️ Minor Issues

  1. Inconsistent default limits: The MCP tool defaults to limit=50 while the storage method defaults to limit=100. The CLI also defaults to 50. This is fine functionally, but documenting why (if intentional) would be helpful.

🔒 Security

  • The FTS5 query is passed directly to SQLite which handles sanitization. FTS5's query syntax itself prevents SQL injection, so this is secure.
  • The project filter uses parameterized LIKE which is also safe.

📊 Performance

  • The ORDER BY rank is efficient since FTS5 computes rank during the match.
  • Adding the project filter doesn't use an index on project_path, so for very large databases with broad FTS matches, this could be slow. For typical usage this should be fine.

Verdict

LGTM 👍 - This is a clean, well-tested addition that properly exposes existing functionality. The minor suggestions above are non-blocking.

@evansenter evansenter merged commit a558016 into main Jan 1, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant