Scope schema-constraint validators to diffed nodes#9276

Draft
fatih-acar wants to merge 4 commits into fac-fix-pc-validators from fac-fix-pc-validators-2

@fatih-acar fatih-acar commented May 18, 2026

Summary

Stacked on top of fac-fix-pc-validators (which scoped property-constraint selection to diffed kinds via d846e73). This branch narrows the scan within each touched kind, and closes one cross-kind correctness gap surfaced along the way.

Four commits:

  • 86eafcd6 scope uniqueness query to diffed node values — pre-fetch diffed nodes' uniqueness-relevant values via NodeManager.get_many and route through the existing _with_value Cypher subqueries. No new Cypher. Value-anchored, so collisions with untouched peers still surface.
  • 691e6dc1 scope attribute checkers to diffed node uuids — plumb node_uuids through SchemaConstraintValidatorRequest and the shared SchemaValidatorQuery base; gate each MATCH (n:Kind) with WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids across kind, regex, length, min_max, optional, enum, choices, number_pool.
  • 90927aa0 trigger uniqueness re-check on cross-kind peer changes — when a rel__peer_attr path's peer attribute changes, the dependent kind itself isn't in the diff. New ConstraintValidatorDeterminer.find_cross_kind_uniqueness_dependents plus a min_count_required=0 reuse of the existing query resolves affected dependent UUIDs and merges them into the constraint.
  • 62696cb7 scope relationship checkers to diffed node uuids — same UUID-scoping pattern for relationship.optional, relationship.count, relationship.peer, relationship.common_parent.

Why

On a 400k-node kind, opening a PC that changes one attribute on one node previously triggered kind-wide label scans across every constraint validator that fired for the kind. With this stack the scan is bounded by the diffed node set (or, for the cross-kind case, the set of dependent nodes pointing to diffed peers). Schema-diff-origin constraints (e.g. flipping optional or adding unique) keep node_uuids=None and continue to full-scan — correctness is preserved.

Test plan

  • uv run pytest backend/tests/component/core/constraint_validators/ — 320 passed, 6 pre-existing skips (verified after each commit).
  • uv run ruff check and uv run mypy — clean on every modified file.
  • End-to-end on the 400k-DcimInterfaceL2 DB: open a PC that touches one name; capture the executed Cypher and PROFILE it to confirm the planner anchors on :AttributeValueIndexed(value) instead of a kind label scan. Time the validator — target sub-second.
  • End-to-end cross-kind: rename a DcimDevice referenced by a DcimInterface rel__peer_attr uniqueness constraint; verify the PC reports any resulting collision.

Out of scope (follow-ups)

  • attribute.unique.update — group-by-value query; needs the same value pre-fetch as the main uniqueness checker.
  • Cross-kind relationship.common_parent.update — when a peer's parent changes, the start kind isn't in the diff. Same shape as the uniqueness gap closed in 90927aa0.

🤖 Generated with Claude Code


Summary by cubic

Scopes all schema-constraint validators to only the nodes changed in a proposed change and fixes cross-kind uniqueness rechecks. This removes kind-wide scans on large datasets while preserving full scans when the schema itself changes.

  • Refactors

    • Added optional node UUID scoping to validator requests and queries; gated each initial MATCH with WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids across attribute and relationship validators.
    • Uniqueness checker now prefetches constraint-relevant values for diffed nodes and issues value-anchored queries to surface collisions with untouched peers.
  • Bug Fixes

    • Re-check uniqueness on peer attribute changes across kinds (e.g., rel__peer_attr paths) by resolving affected dependent node UUIDs and merging them into the constraint scope; schema-diff constraints still full-scan.
    • Fixed value checks so falsy-but-set values (0, "", false) are treated as set in value-scoped queries.

Written for commit 62696cb. Summary will update on new commits. Review in cubic

fatih-acar and others added 4 commits May 14, 2026 01:06
The data-diff path of schema_validate_migrations runs the uniqueness
checker against the entire kind, doing a kind-wide label scan in Cypher
(MATCH (start_node:Kind)-[:HAS_ATTRIBUTE]->...). On a branch that
changes one attribute on one node out of 400k, this still scans all
400k nodes.

Pre-fetch the diffed nodes' current uniqueness-relevant values via
NodeManager.get_many, then emit valued QueryAttributePath /
QueryRelationshipAttributePath entries. The existing query routes
those through its _with_value subqueries, which can anchor on the
:AttributeValueIndexed(value) index and back-traverse to peers — so
violations involving untouched nodes that share the diffed value are
still surfaced.
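
To illustrate why anchoring on values still catches collisions with untouched peers, here is a toy in-memory model. It is only a sketch: `build_value_index` and `find_collisions` are hypothetical names, and the dict stands in for the `:AttributeValueIndexed(value)` index and the `_with_value` subqueries rather than reproducing them.

```python
from collections import defaultdict


def build_value_index(values_by_uuid: dict[str, str]) -> dict[str, set[str]]:
    """Map each attribute value to the UUIDs holding it (stand-in for a value index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for uuid, value in values_by_uuid.items():
        index[value].add(uuid)
    return dict(index)


def find_collisions(
    values_by_uuid: dict[str, str],
    index: dict[str, set[str]],
    diffed_uuids: set[str],
) -> dict[str, set[str]]:
    """Look up only the diffed nodes' values, then back-traverse to every holder."""
    collisions: dict[str, set[str]] = {}
    for uuid in diffed_uuids:
        value = values_by_uuid[uuid]
        holders = index.get(value, set())
        if len(holders) > 1:
            # Holders may include nodes that are not part of the diff.
            collisions[value] = set(holders)
    return collisions
```

Only the diffed nodes' values are looked up, yet an untouched node sharing one of those values still appears in the result.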

Schema-diff-driven constraints (e.g. adding unique:true to an existing
attribute) leave node_uuids=None and continue to use the full-scan
path, since the constraint itself changed and every node must be
re-checked. Dedup of constraints favors the full-scan version when
both sources emit the same constraint.

Also fix `if value:` → `if value is not None:` in the query so falsy
but set values (False, 0, "") activate the _with_value subqueries.
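
A minimal sketch of the difference between the two checks; both function names are hypothetical and only model the condition, not the surrounding query code.

```python
def activates_with_truthiness(value: object) -> bool:
    """Old check: `if value:` treats False, 0 and "" as if no value were set."""
    return bool(value)


def activates_with_none_check(value: object) -> bool:
    """New check: `if value is not None:` only skips values that are really unset."""
    return value is not None
```

For `False`, `0`, and `""` the first function returns `False` and the second returns `True`; for `None` both return `False`.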

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every attribute validator (kind, regex, length, min_max, optional,
enum, choices, number_pool) runs a `MATCH (n:Kind)` kind-wide label
scan to read every node's current value and validate it against a
schema property. On a 400k-node kind, opening a PC that touches a
single attribute value still walks all 400k nodes.

These checkers fire from two origins, just like the uniqueness one:

  - schema-diff: the property itself changed (e.g. kind Text→Integer,
    new regex, tighter min/max). Every existing value must be
    re-validated — full scan stays correct.
  - data-diff: a node's value changed. The determiner over-eagerly
    emits a constraint for every property of the touched attribute,
    even though only the diffed node's new value needs re-checking
    against the unchanged schema.

Forward `request.node_uuids` (already plumbed for uniqueness) into
the shared `SchemaValidatorQuery` base and gate the initial
`MATCH (n:Kind)` with
`WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids`. None preserves
the schema-diff full-scan behavior; a populated set narrows the scan
to the diffed nodes.
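
The gate can be sketched as follows. This is an illustration under assumptions, not the actual `SchemaValidatorQuery` code: `build_scoped_match` and `passes_gate` are hypothetical helpers, and the `RETURN` clause is invented to make the query complete.

```python
from typing import Any, Optional


def build_scoped_match(
    node_kind: str, node_uuids: Optional[set[str]]
) -> tuple[str, dict[str, Any]]:
    """Build a gated MATCH plus its parameters; None means 'no scoping'."""
    query = (
        f"MATCH (n:{node_kind})\n"
        "WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids\n"
        "RETURN n.uuid"  # hypothetical tail, only here to complete the query
    )
    params = {"node_uuids": sorted(node_uuids) if node_uuids is not None else None}
    return query, params


def passes_gate(uuid: str, node_uuids: Optional[set[str]]) -> bool:
    """Python mirror of the WHERE predicate, handy for reasoning about it."""
    return node_uuids is None or uuid in node_uuids
```

With `node_uuids=None` every node passes, which is the schema-diff full scan; with a populated set only the listed nodes pass.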

Not covered: `attribute/unique.py` (different group-by-value shape,
needs the same value pre-fetch as the main UniquenessChecker) and
the relationship/* validators (same pattern, follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A uniqueness path like `device__name` makes kind S's constraint value
depend on the peer kind's `name` attribute. When the peer's `name`
changes, S's effective constraint values change too — but S has no
entry in the diff, so the determiner (since d846e73) skipped its
re-validation. A PC could ship a fresh collision.

Reproducer: DcimInterface has `[name, device__name]` with I1(eth0→D1
named "r1") and I2(eth0→D2 named "r2"). Rename D1 to "r2": diff
contains only DcimDevice, DcimInterface uniqueness is never run, and
I1's effective (eth0, r2) collides with I2 undetected.
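
The reproducer can be modelled with plain dicts. This is a sketch of the scenario only; `effective_keys` and `colliding_keys` are hypothetical helpers, not part of the codebase.

```python
from collections import Counter


def effective_keys(
    interfaces: dict[str, dict[str, str]], device_names: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Resolve each interface's effective (name, device__name) tuple."""
    return {
        uuid: (iface["name"], device_names[iface["device"]])
        for uuid, iface in interfaces.items()
    }


def colliding_keys(keys: dict[str, tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every effective tuple held by more than one interface."""
    counts = Counter(keys.values())
    return {key for key, count in counts.items() if count > 1}


interfaces = {
    "I1": {"name": "eth0", "device": "D1"},
    "I2": {"name": "eth0", "device": "D2"},
}
devices = {"D1": "r1", "D2": "r2"}
before = colliding_keys(effective_keys(interfaces, devices))

devices["D1"] = "r2"  # the rename touches only DcimDevice
after = colliding_keys(effective_keys(interfaces, devices))
```

No interface record changes between `before` and `after`, yet `after` contains `("eth0", "r2")`, which is why a diff-driven trigger keyed on DcimInterface alone never fires.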

Add `ConstraintValidatorDeterminer.find_cross_kind_uniqueness_dependents`
to walk every schema's uniqueness constraints, parse each path on `__`,
and return `(dependent_kind, rel_name) → {diffed peer UUIDs}` for the
cases where the peer kind's referenced attribute is in the diff.
Generic peers fan out via `used_by` so concrete diffed kinds match.
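
A simplified sketch of that walk, assuming dict-shaped inputs. The function and parameter names below are hypothetical, and the fan-out of generic peers via `used_by` is deliberately omitted, so this is not the real `find_cross_kind_uniqueness_dependents`.

```python
def find_cross_kind_dependents(
    uniqueness_paths: dict[str, list[str]],
    relationship_peers: dict[tuple[str, str], str],
    diffed_attributes: dict[str, set[str]],
    diffed_uuids: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Return (dependent_kind, rel_name) -> diffed peer UUIDs."""
    result: dict[tuple[str, str], set[str]] = {}
    for kind, paths in uniqueness_paths.items():
        for path in paths:
            parts = path.split("__")
            if len(parts) < 2:
                continue  # plain attribute path, no peer involved
            rel_name, attr_name = parts[0], parts[1]
            peer_kind = relationship_peers.get((kind, rel_name))
            if peer_kind is None:
                continue  # first segment is not a relationship on this kind
            if attr_name not in diffed_attributes.get(peer_kind, set()):
                continue  # the referenced peer attribute did not change
            result.setdefault((kind, rel_name), set()).update(
                diffed_uuids.get(peer_kind, set())
            )
    return result
```

For the DcimInterface example, a diff that changes `name` on device `D1` yields `{("DcimInterface", "device"): {"D1"}}`.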

In `_get_proposed_change_schema_integrity_constraints`, resolve the
affected dependent-kind UUIDs by reusing `NodeUniqueAttributeConstraintQuery`
with `min_count_required=0` — no new Cypher — and merge them into the
dependent kind's `node.uniqueness_constraints.update` constraint.
That flows through the existing scoped pre-fetch path, where the
value-anchored subquery surfaces collisions with untouched dependent
nodes (the I2 case).

Schema-diff origins keep `node_uuids=None` (full scan); the merge
helper preserves that and only unions concrete UUID sets.
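
The merge rule can be sketched like this; `merge_node_uuids` is a hypothetical name, and the real helper may differ in shape.

```python
from typing import Optional


def merge_node_uuids(
    existing: Optional[set[str]], incoming: Optional[set[str]]
) -> Optional[set[str]]:
    """None means 'full scan' and always wins; otherwise union the UUID sets."""
    if existing is None or incoming is None:
        return None
    return existing | incoming
```

Treating `None` as absorbing keeps a schema-diff constraint from being silently narrowed by a data-diff contribution for the same kind.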

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four relationship validator queries (optional, count, peer,
common_parent) ran kind-wide MATCH (n:Kind) label scans. On a PC
touching one node, they still walked every node of the kind.

The node_uuids plumbing from prior commits is already in place on
SchemaConstraintValidatorRequest and the shared SchemaValidatorQuery
base. Stamp $node_uuids and gate each initial MATCH (n:Kind) with
`WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids` — the same idiom
used by the attribute checkers and the uniqueness query.

For RelationshipOptionalUpdateValidatorQuery the gate is applied to
all three kind-scanning MATCHes in the same query: the active-node
collection, the with-rel collection, and the violator-set match.

Schema-diff origins (e.g. flipping optional, tightening min_count)
keep node_uuids=None and continue to full-scan unchanged.

Out of scope: cross-kind trigger for relationship.common_parent.update
when a peer's parent changes — same shape as the uniqueness gap fixed
in 90927aa; tracked as follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the group/backend Issue related to the backend (API Server, Git Agent) label May 18, 2026
@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 20 files

Confidence score: 2/5

  • High-confidence validator logic issues in core paths make this a high-risk merge: both findings are severity 7+/10 with concrete regression potential in uniqueness enforcement.
  • In backend/infrahub/core/validators/determiner.py, cross-kind uniqueness checks can be skipped because property_name includes suffixed names (for example __value) that do not match _attribute_element_map base attributes, which can allow invalid duplicates through.
  • In backend/infrahub/core/validators/relationship/peer.py, limiting relationship.common_parent.update to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes, weakening relationship validation coverage.
  • Pay close attention to backend/infrahub/core/validators/determiner.py and backend/infrahub/core/validators/relationship/peer.py - uniqueness and cross-kind validation paths may be bypassed in realistic update scenarios.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/infrahub/core/validators/relationship/peer.py">

<violation number="1" location="backend/infrahub/core/validators/relationship/peer.py:40">
P1: Scoping `relationship.common_parent.update` to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes.</violation>
</file>

<file name="backend/infrahub/core/validators/determiner.py">

<violation number="1" location="backend/infrahub/core/validators/determiner.py:91">
P1: Cross-kind uniqueness constraint checks will erroneously skip validation because `property_name` includes property suffixes (like `__value`) which do not match the base attribute names in `_attribute_element_map`.</violation>
</file>

Shadow auto-approve: would not auto-approve because issues were found.


# ruff: noqa: E501
query = """
MATCH (n:%(node_kind)s)
WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids

P1: Scoping relationship.common_parent.update to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/core/validators/relationship/peer.py, line 40:

<comment>Scoping `relationship.common_parent.update` to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes.</comment>

<file context>
@@ -32,10 +32,12 @@ async def query_init(self, db: InfrahubDatabase, **kwargs: dict[str, Any]) -> No
         # ruff: noqa: E501
         query = """
         MATCH (n:%(node_kind)s)
+        WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids
         CALL (n) {
             MATCH path = (root:Root)<-[rroot:IS_PART_OF]-(n)
</file context>

peer_concrete_kinds = self._derived_kinds(rel_schema.peer)
diffed_peer_uuids: set[str] = set()
for peer_kind in peer_concrete_kinds:
    if property_name not in self._attribute_element_map.get(peer_kind, set()):

P1: Cross-kind uniqueness constraint checks will erroneously skip validation because property_name includes property suffixes (like __value) which do not match the base attribute names in _attribute_element_map.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/core/validators/determiner.py, line 91:

<comment>Cross-kind uniqueness constraint checks will erroneously skip validation because `property_name` includes property suffixes (like `__value`) which do not match the base attribute names in `_attribute_element_map`.</comment>

<file context>
@@ -38,6 +39,63 @@ def _index_node_diffs(self, node_diffs: list[NodeDiffFieldSummary]) -> None:
+                    peer_concrete_kinds = self._derived_kinds(rel_schema.peer)
+                    diffed_peer_uuids: set[str] = set()
+                    for peer_kind in peer_concrete_kinds:
+                        if property_name not in self._attribute_element_map.get(peer_kind, set()):
+                            continue
+                        diffed_peer_uuids.update(self._node_uuid_map.get(peer_kind, set()))
</file context>
Suggested change
- if property_name not in self._attribute_element_map.get(peer_kind, set()):
+ if property_name.split("__")[0] not in self._attribute_element_map.get(peer_kind, set()):
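
A small sketch of what the suggested change does, assuming `property_name` may carry a suffix such as `__value` as the review comment claims. `base_attribute_name` is a hypothetical helper used only for illustration.

```python
def base_attribute_name(property_name: str) -> str:
    """Drop any `__suffix` so the name can be compared against base attribute names."""
    return property_name.split("__")[0]


diffed_attributes = {"name"}  # stand-in for one entry of _attribute_element_map
raw_match = "name__value" in diffed_attributes
stripped_match = base_attribute_name("name__value") in diffed_attributes
```

Here `raw_match` is `False` and `stripped_match` is `True`; a name without a suffix is returned unchanged, so the stripped comparison also works when no suffix is present.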
