feat: implement lexical index search#11
Conversation
📝 WalkthroughWalkthroughAdds search capability to the docs-buddy system. New ChangesIndex Search Feature
Sequence Diagram(s)sequenceDiagram
participant Caller
participant search_index
participant DocumentIndex
participant WhooshDocumentIndex
Caller->>search_index: search_index(query: Query, index, max_results)
alt max_results < 1
search_index-->>Caller: raises SearchIndexError
else valid
search_index->>DocumentIndex: search(query, max_results)
DocumentIndex->>WhooshDocumentIndex: MultifieldParser.parse(query.text)
WhooshDocumentIndex->>WhooshDocumentIndex: searcher.search(parsed, limit=max_results)
WhooshDocumentIndex-->>DocumentIndex: hits → QueryResult list (JSON metadata)
DocumentIndex-->>search_index: list[QueryResult]
search_index-->>Caller: list[QueryResult]
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/docs_buddy/adapters/whoosh_index.py (1)
54-72:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInitialize searchable state after
fitso a fitted instance can be queried.
fitcommits data but leavesself._indexunset, sosearchon the same instance always raisesWhooshIndexErroreven after successful indexing.Proposed fix
def fit( self, chunks: Iterator[domain.DocumentChunk], destination: PathLike ) -> None: @@ writer.commit() + self._index = ix + self._query_parser = qparser.MultifieldParser( + self._SEARCH_FIELDS, + schema=self._SCHEMA, + group=qparser.OrGroup, + )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs_buddy/adapters/whoosh_index.py` around lines 54 - 72, The fit method creates and commits data to the Whoosh index but fails to store the index instance in self._index, causing subsequent search calls on the same instance to fail. After the writer.commit() call in the fit method, assign the created index instance (ix) to self._index so that the searchable state is properly initialized and search queries can use the indexed data.
🧹 Nitpick comments (1)
src/docs_buddy/adapters/__init__.py (1)
265-272: ⚡ Quick winAlign
FakeIndex.searchwith real adapter search fields to avoid contract drift in tests.The fake currently matches only
chunkcontent, whileWhooshDocumentIndexsearches across content/metadata/path keywords. This can give false confidence in unit tests.Proposed refactor
def search(self, query, max_results): """Return results from the existing chunks""" chunks = self._pipeline.destination_content + needle = str(query).lower() return [ domain.QueryResult(c.chunk, c.path, c.metadata) for c in chunks - if str(query).lower() in c.chunk.lower() + if needle in c.chunk.lower() + or needle in c.path.lower() + or needle in json.dumps(c.metadata).lower() ][:max_results]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/docs_buddy/adapters/__init__.py` around lines 265 - 272, The FakeIndex.search method in the search method only searches within the chunk field, while the real WhooshDocumentIndex searches across multiple fields including chunk content, metadata, and path. Update the filter condition that checks str(query).lower() in c.chunk.lower() to instead check if the query appears in any of the three fields: chunk, metadata (as a string), and path. This ensures the fake adapter's search behavior aligns with the real adapter's contract and prevents false confidence in unit tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/docs_buddy/adapters/whoosh_index.py`:
- Around line 54-72: The fit method creates and commits data to the Whoosh index
but fails to store the index instance in self._index, causing subsequent search
calls on the same instance to fail. After the writer.commit() call in the fit
method, assign the created index instance (ix) to self._index so that the
searchable state is properly initialized and search queries can use the indexed
data.
---
Nitpick comments:
In `@src/docs_buddy/adapters/__init__.py`:
- Around line 265-272: The FakeIndex.search method in the search method only
searches within the chunk field, while the real WhooshDocumentIndex searches
across multiple fields including chunk content, metadata, and path. Update the
filter condition that checks str(query).lower() in c.chunk.lower() to instead
check if the query appears in any of the three fields: chunk, metadata (as a
string), and path. This ensures the fake adapter's search behavior aligns with
the real adapter's contract and prevents false confidence in unit tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4e32b699-27e4-4bc9-8fe3-76655af4ccba
📒 Files selected for processing (9)
README.mdsrc/docs_buddy/adapters/__init__.pysrc/docs_buddy/adapters/whoosh_index.pysrc/docs_buddy/domain/__init__.pysrc/docs_buddy/services/use_cases.pytests/integration/test_adapters.pytests/unit/test_domain.pytests/unit/test_services.pytodo.md
This allows retrieving result from the index matching a user's query.
This PR implements the search index use-case and required dependencies.
Summary by CodeRabbit
New Features
Documentation