feat(graphrag-vectors): add filtering, timestamps, and CRUD operations (microsoft#2236)
* feat(graphrag-vectors): add filtering, timestamps, and CRUD operations
Implement the vector store enhancements from the graphrag-vectors-design spec:
New modules:
- filtering.py: Pydantic-based filter expression system with F builder,
operator overloads, JSON serialization, client-side evaluate(), and
per-backend compilation (SQL for LanceDB/CosmosDB, OData for Azure AI Search)
- timestamp.py: ISO 8601 timestamp explosion into filterable component fields
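
To make the filter expression idea concrete, here is a minimal sketch of an F-style builder with operator overloads, a client-side evaluate(), and compilation to a SQL-like string. It is not the code in filtering.py: the real module is Pydantic-based, while this sketch uses stdlib dataclasses, and every name apart from F, FieldRef, and evaluate() (Condition, And, Or, Not, to_sql, the operator keys) is an assumption made for illustration.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical operator tables; the real Operator enum may differ.
_PY_OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
           "gte": operator.ge, "lt": operator.lt, "lte": operator.le}
_SQL_OPS = {"eq": "=", "ne": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def _literal(value: Any) -> str:
    """Render a Python value as a SQL literal (strings quoted and escaped)."""
    if isinstance(value, bool):  # bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


class _Expr:
    """Shared combinators: & builds And, | builds Or, ~ builds Not."""

    def __and__(self, other: _Expr) -> And:
        return And((self, other))

    def __or__(self, other: _Expr) -> Or:
        return Or((self, other))

    def __invert__(self) -> Not:
        return Not(self)


@dataclass(frozen=True)
class Condition(_Expr):
    field: str
    op: str
    value: Any

    def evaluate(self, doc: dict) -> bool:
        # A missing field never matches in this sketch.
        return self.field in doc and _PY_OPS[self.op](doc[self.field], self.value)

    def to_sql(self) -> str:
        return f"{self.field} {_SQL_OPS[self.op]} {_literal(self.value)}"


@dataclass(frozen=True)
class And(_Expr):
    parts: tuple

    def evaluate(self, doc: dict) -> bool:
        return all(p.evaluate(doc) for p in self.parts)

    def to_sql(self) -> str:
        return "(" + " AND ".join(p.to_sql() for p in self.parts) + ")"


@dataclass(frozen=True)
class Or(_Expr):
    parts: tuple

    def evaluate(self, doc: dict) -> bool:
        return any(p.evaluate(doc) for p in self.parts)

    def to_sql(self) -> str:
        return "(" + " OR ".join(p.to_sql() for p in self.parts) + ")"


@dataclass(frozen=True)
class Not(_Expr):
    inner: Any

    def evaluate(self, doc: dict) -> bool:
        return not self.inner.evaluate(doc)

    def to_sql(self) -> str:
        return f"NOT ({self.inner.to_sql()})"


class FieldRef:
    """Comparison operators on a field reference return Condition nodes."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: Any) -> Condition:  # type: ignore[override]
        return Condition(self.name, "eq", value)

    def __ne__(self, value: Any) -> Condition:  # type: ignore[override]
        return Condition(self.name, "ne", value)

    def __gt__(self, value: Any) -> Condition:
        return Condition(self.name, "gt", value)

    def __ge__(self, value: Any) -> Condition:
        return Condition(self.name, "gte", value)

    def __lt__(self, value: Any) -> Condition:
        return Condition(self.name, "lt", value)

    def __le__(self, value: Any) -> Condition:
        return Condition(self.name, "lte", value)


class _FBuilder:
    """F.some_field yields FieldRef("some_field")."""

    def __getattr__(self, name: str) -> FieldRef:
        if name.startswith("__"):
            raise AttributeError(name)
        return FieldRef(name)


F = _FBuilder()

# Parentheses are required: & binds tighter than == and >= in Python.
expr = (F.category == "news") & (F.year >= 2024)
```

The same expression tree can be evaluated against a plain dict with expr.evaluate(...) or rendered with expr.to_sql(); an OData backend would add a second renderer over the same nodes.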
Enhanced VectorStoreDocument:
- data: dict for user-defined metadata fields
- create_date / update_date: automatic ISO 8601 timestamps
Enhanced VectorStore base class:
- fields config for typed metadata columns
- insert / count / remove / update CRUD methods
- select, filters, include_vectors params on search methods
- Automatic timestamp explosion on insert/update
- User-defined date field explosion
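
As a rough illustration of the CRUD methods and timestamp explosion listed above, the sketch below uses an in-memory dict in place of a real backend. The component field names (create_date_year and so on), the now parameter, and the InMemoryStore class are assumptions; the fields timestamp.py actually emits are not spelled out in this message.

```python
from __future__ import annotations

from datetime import datetime, timezone


def explode_timestamp(prefix: str, iso: str) -> dict[str, int]:
    """Split an ISO 8601 timestamp into integer fields that are easy to filter on."""
    dt = datetime.fromisoformat(iso)
    return {
        f"{prefix}_year": dt.year,
        f"{prefix}_month": dt.month,
        f"{prefix}_day": dt.day,
        f"{prefix}_hour": dt.hour,
    }


class InMemoryStore:
    """Hypothetical stand-in for a VectorStore backend (no vectors, rows only)."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def insert(self, doc_id: str, data: dict, now: str | None = None) -> None:
        # `now` is injectable so the example is deterministic.
        stamp = now or datetime.now(timezone.utc).isoformat()
        row = {**data, "id": doc_id, "create_date": stamp, "update_date": stamp}
        row.update(explode_timestamp("create_date", stamp))
        row.update(explode_timestamp("update_date", stamp))
        self._rows[doc_id] = row

    def update(self, doc_id: str, data: dict, now: str | None = None) -> None:
        # Only update_date and its exploded fields change; create_date is kept.
        stamp = now or datetime.now(timezone.utc).isoformat()
        row = self._rows[doc_id]
        row.update(data)
        row["update_date"] = stamp
        row.update(explode_timestamp("update_date", stamp))

    def remove(self, doc_ids: list[str]) -> None:
        for doc_id in doc_ids:
            self._rows.pop(doc_id, None)

    def count(self) -> int:
        return len(self._rows)

    def get(self, doc_id: str) -> dict:
        return self._rows[doc_id]


store = InMemoryStore()
store.insert("doc-1", {"title": "first"}, now="2024-03-05T10:30:00+00:00")
store.update("doc-1", {"title": "revised"}, now="2025-01-02T08:00:00+00:00")
```

Exploding at write time means a query such as "updated in 2025" becomes an integer comparison on update_date_year rather than a string or date parse inside each backend.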
Backend implementations (LanceDB, Azure AI Search, CosmosDB):
- Full filter compilation to native query languages
- Typed schema creation with user-defined fields
- All new CRUD operations
Breaking changes:
- search_by_id raises IndexError when document not found
- Updated indexer_adapters.py caller to handle the new exception
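
For callers affected by this change, the general pattern is to catch IndexError wherever a missing document used to be handled some other way. The sketch below is hypothetical: it models the store as a plain dict, and lookup_or_none is an invented helper; how indexer_adapters.py actually handles the exception is not described here.

```python
from __future__ import annotations


def search_by_id(rows: dict[str, dict], doc_id: str) -> dict:
    """Return the document with the given id, or raise IndexError if absent."""
    try:
        return rows[doc_id]
    except KeyError:
        raise IndexError(f"document {doc_id!r} not found") from None


def lookup_or_none(rows: dict[str, dict], doc_id: str) -> dict | None:
    """Caller-side adaptation: turn the new exception into an optional result."""
    try:
        return search_by_id(rows, doc_id)
    except IndexError:
        return None


rows = {"a": {"id": "a", "text": "hello"}}
found = lookup_or_none(rows, "a")
missing = lookup_or_none(rows, "zzz")
```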
Tests:
- 54 unit tests for filtering and timestamp modules
- 28 LanceDB integration tests covering CRUD, filters, timestamps, select,
include_vectors, and user-defined date field explosion
* fix: resolve CI build failures (formatting, lint, pyright, test mocks)
- Fix ruff formatting and lint errors across all changed files
- Refactor filtering.py: move operator overloads from monkey-patching to
direct class methods for pyright visibility
- Use validation_alias/serialization_alias with populate_by_name for
Pydantic AND/OR/NOT models (pyright + runtime compatible)
- Use Operator enum members instead of string literals in FieldRef
- Add missing abstract methods (insert, count, remove, update) to test
mock VectorStore classes
- Update mock method signatures to match base class (select, filters,
include_vectors params)
- Add docstrings to FieldRef magic methods (ruff D105)
- Fix noqa:S608 placement in cosmosdb.py
* feat: add top-level vector_size to VectorStoreConfig
Add a vector_size field (default 3072) to VectorStoreConfig so users
can set it once instead of on every individual index schema. The value
is propagated to new IndexSchema entries during validation.
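
A minimal sketch of that propagation, using stdlib dataclasses in place of the real Pydantic models. It assumes that an index schema with no vector_size of its own inherits the top-level value and that an explicit per-index value is left alone; the index_schema attribute name and the None sentinel are also assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IndexSchema:
    # None means "not set on this index" in this sketch.
    vector_size: Optional[int] = None


@dataclass
class VectorStoreConfig:
    vector_size: int = 3072
    index_schema: Dict[str, IndexSchema] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Stand-in for a Pydantic validator: fill in unset sizes from the top level.
        for schema in self.index_schema.values():
            if schema.vector_size is None:
                schema.vector_size = self.vector_size


config = VectorStoreConfig(
    vector_size=1536,
    index_schema={
        "text_unit_text": IndexSchema(),              # inherits 1536
        "custom_index": IndexSchema(vector_size=768),  # keeps its own value
    },
)
```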
* chore: add semversioner patch entry
* chore: add ismatch and ftype to spellcheck dictionary
* Add example notebooks for LanceDB, Azure AI Search, and CosmosDB vector stores
- Three notebooks demonstrating: document loading, similarity search, metadata
filtering with F builder, timestamp filtering, document update/removal
- Sample data files (text_units.parquet, embeddings.text_unit_text.parquet)
- Add CPY001, SLF001, DTZ005 to notebook lint ignores in pyproject.toml
* refactor: extract model/tokenizer creation from generate_text_embeddings into callers