Multi-match query with WHERE predicate downstream filter/projection fails #117

@prrao87

There's a bug in the planner that over-eagerly prunes aliases that are required in WHERE clauses for filtering a query's results. I noticed it when writing multi-match queries (queries that contain more than two match clauses) combined with a predicate filter.

In the planning stage, aliases that are referenced in the WHERE clause (but not in RETURN) can get pruned from the logical plan before the filter is built. When the planner then tries to translate WHERE alias.prop into a filter, the schema no longer contains the alias__prop column, so queries that are perfectly reasonable (and work in Kuzu/Ladybug/Neo4j) fail in lance-graph.
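To make the suspected failure mode concrete, here is a toy model of column pruning. It is not lance-graph's planner code: the helpers `referenced_columns` and `prune` and the flat schema are hypothetical, and only the `alias__prop` column naming is taken from the error message in this issue. The sketch assumes pruning keeps just the columns that some later step references, and shows what goes missing when the WHERE clause is left out of that set.

```python
import re

# Toy model only; this is NOT how lance-graph is implemented.
# Columns are named "<alias>__<prop>", as in the error message below.
ALIAS_PROP = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def referenced_columns(expr: str) -> set[str]:
    """Return the alias__prop columns that an expression refers to."""
    return {f"{alias}__{prop}" for alias, prop in ALIAS_PROP.findall(expr)}


def prune(schema: set[str], where: str, returns: str, include_where: bool) -> set[str]:
    """Keep only the columns that the remaining plan steps need."""
    needed = referenced_columns(returns)
    if include_where:
        # The filter is built after pruning, so its columns must survive too.
        needed |= referenced_columns(where)
    return schema & needed


# Hypothetical flattened schema for the pattern used in the repro.
schema = {"p__id", "c__id", "c__name", "co__id", "co__name", "h__id", "h__name"}
where = 'co.name = "France" AND h.name = "Chess"'
returns = "p.id AS id"

buggy = prune(schema, where, returns, include_where=False)
fixed = prune(schema, where, returns, include_where=True)

# Columns the filter needs but the over-pruned schema no longer has.
missing = referenced_columns(where) - buggy
print(sorted(missing))
print(sorted(fixed))
```

In this sketch, pruning on RETURN alone drops `co__name` and `h__name`, which are exactly the columns the filter needs. The real planner evidently prunes differently (the error still suggests `c__name`), so treat this only as an illustration of the ordering problem.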

Repro

The following code reproduces the issue.

import pyarrow as pa

from lance_graph import CypherQuery, GraphConfig

cfg = (
    GraphConfig.builder()
    .with_node_label("Person", "id")
    .with_node_label("City", "id")
    .with_node_label("Country", "id")
    .with_node_label("Hobby", "id")
    .with_relationship("livesIn", "src", "dst")
    .with_relationship("hasHobby", "src", "dst")
    .with_relationship("inCountry", "src", "dst")
    .build()
)

datasets = {
    "Person": pa.table({"id": [1]}),
    "City": pa.table({"id": [10], "name": ["Paris"]}),
    "Country": pa.table({"id": [100], "name": ["France"]}),
    "Hobby": pa.table({"id": [20], "name": ["Chess"]}),
    "livesIn": pa.table({"src": [1], "dst": [10]}),
    "hasHobby": pa.table({"src": [1], "dst": [20]}),
    "inCountry": pa.table({"src": [10], "dst": [100]}),
}

query = """
    MATCH (c:City)-[:inCountry]->(co:Country),
          (p:Person)-[:livesIn]->(c),
          (p)-[:hasHobby]->(h:Hobby)
    WHERE co.name = "France" AND h.name = "Chess"
    RETURN p.id AS id
"""

result = CypherQuery(query).with_config(cfg).execute(datasets)
print(result)

Gives:

Traceback (most recent call last):
  File "/Users/prrao/code/graph-benchmark-ldbc-snb/lance_graph/t.py", line 35, in <module>
    result = CypherQuery(query).with_config(cfg).execute(datasets)
ValueError: Query planning error: Failed to build filter: Schema error: No field named co__name. Did you mean 'c__name'?.

This can happen regardless of whether the alias is on the left or right side of a pattern. The common factor in reproducing the issue is a multi-match query whose WHERE clause filters on an alias before the results are returned.

Environment

The following environment was used to test this:

lance-graph 0.4.0
Python 3.13
macOS Tahoe 26.2
