Scope schema-constraint validators to diffed nodes#9276
Draft
fatih-acar wants to merge 4 commits into
Draft
Conversation
The data-diff path of schema_validate_migrations runs the uniqueness checker against the entire kind, doing a kind-wide label scan in Cypher (MATCH (start_node:Kind)-[:HAS_ATTRIBUTE]->...). On a branch that changes one attribute on one node out of 400k, this still scans all 400k nodes. Pre-fetch the diffed nodes' current uniqueness-relevant values via NodeManager.get_many, then emit valued QueryAttributePath / QueryRelationshipAttributePath entries. The existing query routes those through its _with_value subqueries, which can anchor on the :AttributeValueIndexed(value) index and back-traverse to peers — so violations involving untouched nodes that share the diffed value are still surfaced. Schema-diff-driven constraints (e.g. adding unique:true to an existing attribute) leave node_uuids=None and continue to use the full-scan path, since the constraint itself changed and every node must be re-checked. Dedup of constraints favors the full-scan version when both sources emit the same constraint. Also fix `if value:` → `if value is not None:` in the query so falsy but set values (False, 0, "") activate the _with_value subqueries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every attribute validator (kind, regex, length, min_max, optional,
enum, choices, number_pool) runs a `MATCH (n:Kind)` kind-wide label
scan to read every node's current value and validate it against a
schema property. On a 400k-node kind, opening a PC that touches a
single attribute value still walks all 400k nodes.
These checkers fire from two origins, just like the uniqueness one:
- schema-diff: the property itself changed (e.g. kind Text→Integer,
new regex, tighter min/max). Every existing value must be
re-validated — full scan stays correct.
- data-diff: a node's value changed. The determiner over-eagerly
emits a constraint for every property of the touched attribute,
even though only the diffed node's new value needs re-checking
against the unchanged schema.
Forward `request.node_uuids` (already plumbed for uniqueness) into
the shared `SchemaValidatorQuery` base and gate the initial
`MATCH (n:Kind)` with
`WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids`. None preserves
the schema-diff full-scan behavior; a populated set narrows the scan
to the diffed nodes.
Not covered: `attribute/unique.py` (different group-by-value shape,
needs the same value pre-fetch as the main UniquenessChecker) and
the relationship/* validators (same pattern, follow-up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A uniqueness path like `device__name` makes kind S's constraint value depend on the peer kind's `name` attribute. When the peer's `name` changes, S's effective constraint values change too — but S has no entry in the diff, so the determiner (since d846e73) skipped its re-validation. A PC could ship a fresh collision. Reproducer: DcimInterface has `[name, device__name]` with I1(eth0→D1 named "r1") and I2(eth0→D2 named "r2"). Rename D1 to "r2": diff contains only DcimDevice, DcimInterface uniqueness is never run, and I1's effective (eth0, r2) collides with I2 undetected. Add `ConstraintValidatorDeterminer.find_cross_kind_uniqueness_dependents` to walk every schema's uniqueness constraints, parse each path on `__`, and return `(dependent_kind, rel_name) → {diffed peer UUIDs}` for the cases where the peer kind's referenced attribute is in the diff. Generic peers fan out via `used_by` so concrete diffed kinds match. In `_get_proposed_change_schema_integrity_constraints`, resolve the affected dependent-kind UUIDs by reusing `NodeUniqueAttributeConstraintQuery` with `min_count_required=0` — no new Cypher — and merge them into the dependent kind's `node.uniqueness_constraints.update` constraint. That flows through the existing scoped pre-fetch path, where the value-anchored subquery surfaces collisions with untouched dependent nodes (the I2 case). Schema-diff origins keep `node_uuids=None` (full scan); the merge helper preserves that and only unions concrete UUID sets. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All four relationship validator queries (optional, count, peer, common_parent) ran kind-wide MATCH (n:Kind) label scans. On a PC touching one node, they still walked every node of the kind. The node_uuids plumbing from prior commits is already in place on SchemaConstraintValidatorRequest and the shared SchemaValidatorQuery base. Stamp $node_uuids and gate each initial MATCH (n:Kind) with `WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids` — the same idiom used by the attribute checkers and the uniqueness query. For RelationshipOptionalUpdateValidatorQuery the gate is applied to all three kind-scanning MATCHes in the same query: the active-node collection, the with-rel collection, and the violator-set match. Schema-diff origins (e.g. flipping optional, tightening min_count) keep node_uuids=None and continue to full-scan unchanged. Out of scope: cross-kind trigger for relationship.common_parent.update when a peer's parent changes — same shape as the uniqueness gap fixed in 90927aa; tracked as follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
2 issues found across 20 files
Confidence score: 2/5
- High-confidence validator logic issues in core paths make this a high-risk merge: both findings are severity 7+/10 with concrete regression potential in uniqueness enforcement.
- In
backend/infrahub/core/validators/determiner.py, cross-kind uniqueness checks can be skipped becauseproperty_nameincludes suffixed names (for example__value) that do not match_attribute_element_mapbase attributes, which can allow invalid duplicates through. - In
backend/infrahub/core/validators/relationship/peer.py, limitingrelationship.common_parent.updateto diffed start-node UUIDs can miss cross-kind violations when only the peer side changes, weakening relationship validation coverage. - Pay close attention to
backend/infrahub/core/validators/determiner.pyandbackend/infrahub/core/validators/relationship/peer.py- uniqueness and cross-kind validation paths may be bypassed in realistic update scenarios.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/infrahub/core/validators/relationship/peer.py">
<violation number="1" location="backend/infrahub/core/validators/relationship/peer.py:40">
P1: Scoping `relationship.common_parent.update` to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes.</violation>
</file>
<file name="backend/infrahub/core/validators/determiner.py">
<violation number="1" location="backend/infrahub/core/validators/determiner.py:91">
P1: Cross-kind uniqueness constraint checks will erroneously skip validation because `property_name` includes property suffixes (like `__value`) which do not match the base attribute names in `_attribute_element_map`.</violation>
</file>
Shadow auto-approve: would not auto-approve because issues were found.
Re-trigger cubic
| # ruff: noqa: E501 | ||
| query = """ | ||
| MATCH (n:%(node_kind)s) | ||
| WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids |
Contributor
There was a problem hiding this comment.
P1: Scoping relationship.common_parent.update to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/core/validators/relationship/peer.py, line 40:
<comment>Scoping `relationship.common_parent.update` to diffed start-node UUIDs can miss cross-kind violations when only the peer side changes.</comment>
<file context>
@@ -32,10 +32,12 @@ async def query_init(self, db: InfrahubDatabase, **kwargs: dict[str, Any]) -> No
# ruff: noqa: E501
query = """
MATCH (n:%(node_kind)s)
+ WHERE $node_uuids IS NULL OR n.uuid IN $node_uuids
CALL (n) {
MATCH path = (root:Root)<-[rroot:IS_PART_OF]-(n)
</file context>
| peer_concrete_kinds = self._derived_kinds(rel_schema.peer) | ||
| diffed_peer_uuids: set[str] = set() | ||
| for peer_kind in peer_concrete_kinds: | ||
| if property_name not in self._attribute_element_map.get(peer_kind, set()): |
Contributor
There was a problem hiding this comment.
P1: Cross-kind uniqueness constraint checks will erroneously skip validation because property_name includes property suffixes (like __value) which do not match the base attribute names in _attribute_element_map.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/core/validators/determiner.py, line 91:
<comment>Cross-kind uniqueness constraint checks will erroneously skip validation because `property_name` includes property suffixes (like `__value`) which do not match the base attribute names in `_attribute_element_map`.</comment>
<file context>
@@ -38,6 +39,63 @@ def _index_node_diffs(self, node_diffs: list[NodeDiffFieldSummary]) -> None:
+ peer_concrete_kinds = self._derived_kinds(rel_schema.peer)
+ diffed_peer_uuids: set[str] = set()
+ for peer_kind in peer_concrete_kinds:
+ if property_name not in self._attribute_element_map.get(peer_kind, set()):
+ continue
+ diffed_peer_uuids.update(self._node_uuid_map.get(peer_kind, set()))
</file context>
Suggested change
| if property_name not in self._attribute_element_map.get(peer_kind, set()): | |
| if property_name.split("__")[0] not in self._attribute_element_map.get(peer_kind, set()): |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacked on top of
fac-fix-pc-validators(which scoped property-constraint selection to diffed kinds viad846e73). This branch narrows the scan within each touched kind, and closes one cross-kind correctness gap surfaced along the way.Four commits:
86eafcd6scope uniqueness query to diffed node values — pre-fetch diffed nodes' uniqueness-relevant values viaNodeManager.get_manyand route through the existing_with_valueCypher subqueries. No new Cypher. Value-anchored, so collisions with untouched peers still surface.691e6dc1scope attribute checkers to diffed node uuids — plumbnode_uuidsthroughSchemaConstraintValidatorRequestand the sharedSchemaValidatorQuerybase; gate eachMATCH (n:Kind)withWHERE $node_uuids IS NULL OR n.uuid IN $node_uuidsacross kind, regex, length, min_max, optional, enum, choices, number_pool.90927aa0trigger uniqueness re-check on cross-kind peer changes — when arel__peer_attrpath's peer attribute changes, the dependent kind itself isn't in the diff. NewConstraintValidatorDeterminer.find_cross_kind_uniqueness_dependentsplus amin_count_required=0reuse of the existing query resolves affected dependent UUIDs and merges them into the constraint.62696cb7scope relationship checkers to diffed node uuids — same UUID-scoping pattern forrelationship.optional,relationship.count,relationship.peer,relationship.common_parent.Why
On a 400k-node kind, opening a PC that changes one attribute on one node previously triggered kind-wide label scans across every constraint validator that fired for the kind. With this stack the scan is bounded by the diffed node set (or, for the cross-kind case, the set of dependent nodes pointing to diffed peers). Schema-diff-origin constraints (e.g. flipping
optionalor addingunique) keepnode_uuids=Noneand continue to full-scan — correctness is preserved.Test plan
uv run pytest backend/tests/component/core/constraint_validators/— 320 passed, 6 pre-existing skips (verified after each commit).uv run ruff checkanduv run mypy— clean on every modified file.DcimInterfaceL2DB: open a PC that touches onename; capture the executed Cypher andPROFILEit to confirm the planner anchors on:AttributeValueIndexed(value)instead of a kind label scan. Time the validator — target sub-second.DcimDevicereferenced by aDcimInterfacerel__peer_attruniqueness constraint; verify the PC reports any resulting collision.Out of scope (follow-ups)
attribute.unique.update— group-by-value query; needs the same value pre-fetch as the main uniqueness checker.relationship.common_parent.update— when a peer's parent changes, the start kind isn't in the diff. Same shape as the uniqueness gap closed in90927aa0.🤖 Generated with Claude Code
Summary by cubic
Scopes all schema-constraint validators to only the nodes changed in a proposed change and fixes cross-kind uniqueness rechecks. This removes kind-wide scans on large datasets while preserving full scans when the schema itself changes.
Refactors
Bug Fixes
Written for commit 62696cb. Summary will update on new commits. Review in cubic