feat: add ltree filtering operators to v2 vector store #289

Open
airbag_deer (airbagdeer) wants to merge 4 commits into langchain-ai:main from airbagdeer:feat/ltree-filter-operators

@airbagdeer airbag_deer (airbagdeer) commented Mar 19, 2026

Adds 5 PostgreSQL `ltree` operator filters to the v2 vector store metadata filtering system.

Summary

Extends the MongoDB-style filter system with operators that map directly to PostgreSQL's built-in `ltree` extension, enabling hierarchical path queries over tree-structured label data such as category taxonomies, org charts, and file paths.

Motivation

Many retrieval systems organize their documents in a tree-structured hierarchy, such as a folder layout or a category taxonomy.
The existing filter system has no way to query `ltree`-typed columns or hierarchical path data stored in the JSON metadata column. Applications that classify documents with paths like `Top.Science.Astronomy` need ancestor/descendant traversal and pattern-matching that the `ltree` extension provides natively — none of which is expressible with `$eq`, `$like`, or the other existing operators.

Changes

  • `langchain_postgres/v2/async_vectorstore.py`: Added `LTREE_OPERATORS` constant, extended `SUPPORTED_OPERATORS`, and implemented the `elif operator in LTREE_OPERATORS` branch in `_handle_field_filter` with per-operator input validation and SQL generation
  • `tests/unit_tests/fixtures/metadata_filtering_data.py`: Added `LTREE_METADATAS`, `LTREE_FILTERING_TEST_CASES` (12 cases), and `LTREE_NEGATIVE_TEST_CASES` (6 cases)
  • `tests/unit_tests/v2/test_pg_vectorstore_search.py`: Added `TestLtreeFiltering` class with fixtures for both storage modes and 30 parametrized tests

New Operators

| Operator | PostgreSQL | Description |
| --- | --- | --- |
| `$ancestor` | `field @> CAST(val AS ltree)` | Field is an ancestor of (or equal to) the value |
| `$descendant` | `field <@ CAST(val AS ltree)` | Field is a descendant of (or equal to) the value |
| `$lquery` | `field ~ CAST(val AS lquery)` | Field matches an lquery wildcard pattern |
| `$lquery_any` | `field ? ARRAY[...]` | Field matches any pattern in a list of lqueries |
| `$ltxtquery` | `field @ CAST(val AS ltxtquery)` | Field matches an ltxtquery full-text expression |
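The direction of `$ancestor` and `$descendant` is easy to get backwards, so here is a minimal pure-Python sketch of the label-wise comparison that `@>` and `<@` perform. The helper names `is_ancestor` and `is_descendant` are hypothetical and not part of this PR; the sketch only models plain dot-separated paths and ignores ltree's label character rules.

```python
def is_ancestor(field_path: str, value_path: str) -> bool:
    """Model of `field @> value`: field is an ancestor of (or equal to) value."""
    field_labels = field_path.split(".")
    value_labels = value_path.split(".")
    # Compare whole labels, not characters: "Top.Sci" is NOT an ancestor
    # of "Top.Science", even though it is a string prefix of it.
    return value_labels[: len(field_labels)] == field_labels


def is_descendant(field_path: str, value_path: str) -> bool:
    """Model of `field <@ value`: field is a descendant of (or equal to) value."""
    return is_ancestor(value_path, field_path)


paths = ["Top", "Top.Science", "Top.Science.Astronomy", "Top.Technology"]

# Rows a `$ancestor: "Top.Science.Astronomy"` filter would keep in this model.
ancestors = [p for p in paths if is_ancestor(p, "Top.Science.Astronomy")]

# Rows a `$descendant: "Top.Science"` filter would keep in this model.
descendants = [p for p in paths if is_descendant(p, "Top.Science")]
```

The label-wise comparison is also why a string-prefix operator such as `$like` is not an equivalent substitute: it would treat `Top.Sci` as a prefix of `Top.Science`.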

Usage Examples

```python
# $ancestor: find categories that lie on the path up to "Top.Science.Astronomy"
# ("Top" and "Top.Science" are ancestors of "Top.Science.Astronomy")
results = await vs.asimilarity_search(
    "query", filter={"category": {"$ancestor": "Top.Science.Astronomy"}}
)

# $descendant: find all documents under "Top.Science" (inclusive)
results = await vs.asimilarity_search(
    "query", filter={"category": {"$descendant": "Top.Science"}}
)

# $lquery: match direct children of Top only
results = await vs.asimilarity_search(
    "query", filter={"category": {"$lquery": "Top.*{1}"}}
)

# $lquery_any: match either of two exact paths
results = await vs.asimilarity_search(
    "query", filter={"category": {"$lquery_any": ["Top.Science", "Top.Technology"]}}
)

# $ltxtquery: match any path containing the label word "Science"
results = await vs.asimilarity_search(
    "query", filter={"category": {"$ltxtquery": "Science"}}
)
```

Implementation Details

  • Works with both dedicated `ltree` columns and JSON-stored metadata (the `langchain_metadata` JSONB column). For JSON fields, the `->>` extracted text is automatically cast to `ltree` before the operator is applied.
  • Uses `CAST(:param AS type)` parameter syntax rather than `:param::type` to avoid a psycopg3 driver issue where `::` immediately following a named placeholder prevents parameter substitution.
  • Requires the `ltree` PostgreSQL extension. The test fixtures call `CREATE EXTENSION IF NOT EXISTS ltree`; the pgvector Docker image ships with it pre-installed.
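As a rough illustration of the SQL shapes described above, the following sketch builds a clause and its bind parameters for each operator. It is not the code in `_handle_field_filter`: the function `build_ltree_clause`, the parameter naming scheme, and the use of `CAST(... AS ltree)` on the JSON-extracted text are assumptions made for this example, and values are assumed to have been validated already.

```python
import re
from typing import Any, Optional

# Operator -> (SQL operator, cast type for the bound value), per the table in this PR.
SCALAR_LTREE_SQL = {
    "$ancestor": ("@>", "ltree"),
    "$descendant": ("<@", "ltree"),
    "$lquery": ("~", "lquery"),
    "$ltxtquery": ("@", "ltxtquery"),
}

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_ltree_clause(
    field: str, operator: str, value: Any, *, json_column: Optional[str] = None
) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: return (sql_fragment, bind_params) for one filter."""
    for name in (field, json_column):
        # Names are interpolated into SQL text, so only accept plain identifiers.
        if name is not None and not _IDENTIFIER.match(name):
            raise ValueError(f"Invalid identifier: {name!r}")

    # JSON mode: extract the value as text, then cast it to ltree (assumed form).
    lhs = f"CAST({json_column} ->> '{field}' AS ltree)" if json_column else field

    if operator == "$lquery_any":
        params = {f"{field}_lquery_{i}": v for i, v in enumerate(value)}
        items = ", ".join(f"CAST(:{name} AS lquery)" for name in params)
        return f"{lhs} ? ARRAY[{items}]", params

    if operator not in SCALAR_LTREE_SQL:
        raise ValueError(f"Unsupported ltree operator: {operator}")

    sql_op, cast_type = SCALAR_LTREE_SQL[operator]
    param = f"{field}_ltree"
    # CAST(:param AS type) keeps `::` away from the named placeholder.
    return f"{lhs} {sql_op} CAST(:{param} AS {cast_type})", {param: value}


column_sql, column_params = build_ltree_clause("category", "$descendant", "Top.Science")
json_sql, json_params = build_ltree_clause(
    "category",
    "$lquery_any",
    ["Top.Science", "Top.Technology"],
    json_column="langchain_metadata",
)
```

Every user-supplied value travels as a bind parameter; only the validated field and column names are interpolated into the SQL text.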

Test Coverage

  • 12 positive test cases × 2 storage modes (dedicated `ltree` column + JSON metadata) = 24 passing tests
  • 6 negative tests confirming that wrong value types raise `ValueError`
  • All 44 existing metadata filter tests continue to pass — no regressions
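The negative cases can be pictured with a small validation sketch. The exact rules enforced by the PR are not spelled out here, so this example assumes that the four single-value operators require a string and that `$lquery_any` requires a non-empty list of strings; `validate_ltree_value` is a hypothetical name.

```python
from typing import Any

SCALAR_LTREE_OPERATORS = {"$ancestor", "$descendant", "$lquery", "$ltxtquery"}


def validate_ltree_value(operator: str, value: Any) -> None:
    """Raise ValueError when `value` has the wrong type for `operator` (assumed rules)."""
    if operator in SCALAR_LTREE_OPERATORS:
        if not isinstance(value, str):
            raise ValueError(
                f"{operator} expects a string, got {type(value).__name__}"
            )
    elif operator == "$lquery_any":
        if not isinstance(value, (list, tuple)) or not value:
            raise ValueError("$lquery_any expects a non-empty list of strings")
        if not all(isinstance(item, str) for item in value):
            raise ValueError("$lquery_any expects every pattern to be a string")
    else:
        raise ValueError(f"Unsupported ltree operator: {operator}")


def raises_value_error(operator: str, value: Any) -> bool:
    """Small helper for the checks below: True if validation rejects the input."""
    try:
        validate_ltree_value(operator, value)
    except ValueError:
        return True
    return False
```

With a test runner such as pytest, the same checks would typically be written as parametrized cases, one per operator and bad value type.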

Breaking Changes

None — fully backward compatible.

Checklist

  • Tests added and passing (30 new + 44 regression)
  • Type hints correct (`mypy` clean)
  • Linting passes (`ruff check`, `ruff format`)
  • Spelling check passes (`codespell`)
  • No breaking changes

Add 5 PostgreSQL ltree-specific filter operators to the metadata
filtering system: $ancestor (@>), $descendant (<@), $lquery (~),
$lquery_any (?), and $ltxtquery (@). Supports both dedicated ltree
columns and JSON-stored metadata fields via automatic ::ltree cast.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace :param::type casts with CAST(:param AS type) to prevent
psycopg3's parameter parser from misinterpreting the :: PostgreSQL
cast operator immediately following a named parameter placeholder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The inline comments incorrectly referenced `value::type` cast syntax.
The actual SQL uses `CAST(value AS type)` to avoid a psycopg3 bug where
`::` immediately after a named placeholder breaks parameter substitution.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@airbagdeer
Author

Please review my pull request
