fix: filter infrastructure nodes from connect path traversal by aaronsb · Pull Request #339 · aaronsb/knowledge-graph-system

aaronsb · 2026-02-19T00:29:29Z

Summary

graph_accel loaded all node/edge types by default, causing 80% of edges to be provenance plumbing (APPEARS, EVIDENCED_BY, FROM_SOURCE, etc.)
Yen's k-shortest paths traversed Source/Instance/Ontology nodes, producing ghost paths with blank IDs and "Grounding: Unexplored"
4 of 5 returned paths were co-occurrence noise; only 1 carried semantic signal

Fix

Three-layer defense in graph_facade.py:

GUC filtering at load time — _set_accel_gucs() sets node_labels=Concept and dynamically builds an edge_types include list excluding 6 infrastructure types. Loaded edges drop from 19,206 → 3,904.
Post-filter — paths containing nodes without app_id are skipped (defense-in-depth for stale graphs)
GUC timing — SET calls moved after graph_accel_status() (which loads the shared library and registers GUCs) but before graph_accel_load() (which reads them)

Before / After

	Before	After
Loaded edges	19,206	3,904
Ghost paths (trust → complexity)	4 of 5	0 of 5
Semantic paths returned	1	5
Path quality	co-occurrence noise	CONTAINS, SUPPORTS, REQUIRES, APPLIES_TO chains

Test plan

Reproduced defect with MCP concept connect (trust→complexity, conway→adaptation, cognitive load→platform team)
Verified fix: all 3 test cases return clean semantic paths, zero ghost nodes
Unit tests pass (63/63 graph_facade + path tests)
Confirmed load stats: 1367 nodes / 3904 edges (367 semantic / 373 total edge types)

graph_accel was loading all node types and edge types (defaults to *), causing 80% of loaded edges (15,302/19,206) to be provenance edges (APPEARS, EVIDENCED_BY, FROM_SOURCE, SCOPED_BY, HAS_SOURCE, IMAGES). Yen's k-shortest paths traversed these to Source/Instance/Ontology nodes, producing ghost paths that wasted the path budget and rendered as blank entries with no ID or description. Three-layer fix: - Set node_labels=Concept and build dynamic edge_types exclude list at graph load time, reducing loaded edges to 3,904 semantic-only - Post-filter paths containing nodes without app_id (defense-in-depth) - Move GUC SET after graph_accel_status() which loads the shared library and registers GUCs, before graph_accel_load() reads them

aaronsb · 2026-02-19T04:40:41Z

Code Review -- PR #339

What this changes: Fixes a defect where concept connect path traversal returned paths through infrastructure nodes (Source, Instance, Ontology) instead of semantic Concept-to-Concept paths. The fix operates at two layers: (1) GUC-based filtering at graph load time to exclude infrastructure node labels and edge types, and (2) post-filtering in the Python facade as defense-in-depth.

Assessment: Solid fix, well-layered

The root cause analysis is correct. The graph_accel extension was loading all node types and all edge types by default (GUC defaults to *), which meant pathfinding could traverse through Source/Instance/Ontology nodes via APPEARS, EVIDENCED_BY, etc. The fix addresses this at the right layer -- at load time via GUCs, not just in post-processing.

What works well:

Defense-in-depth pattern: The GUC layer prevents infrastructure data from entering the in-memory graph (primary fix), while the app_id null checks in _accel_path_rows_to_dict and _find_paths_accel catch any residual phantom references (secondary guard). This is the right approach for a correctness fix.
_INFRA_EDGE_TYPES as a frozenset class constant: Clean, documented, and efficient for set subtraction. The comment explaining why these edges cause phantom paths is useful context.
Dynamic edge type discovery: The _set_accel_gucs method queries ag_catalog.ag_label for all edge types and subtracts the infrastructure set, rather than hardcoding the semantic types. This means new relationship types added via graph edit (ADR-089) are automatically included without code changes. Open/Closed principle well-applied.
GUC lifecycle ordering: The sequencing comment ("Called after graph_accel_status() has loaded the shared library ... but before graph_accel_load()") correctly captures a subtle ordering requirement. The Rust side reads GUCs inside do_load() at load.rs:42-46, so SET must happen before LOAD.
paths = paths[:max_paths] addition (line 347): Good catch -- the relationship_type post-filter could reduce the count below max_paths, but it could also leave more paths than requested if the filter was a no-op. Truncating after filtering is the correct behavior.

Findings

1. Auto-reload GUC persistence -- verified safe

Location: graph_facade.py:1062 / load.rs:42-46 / generation.rs:199

I traced the auto-reload path carefully. When ensure_fresh() detects a stale graph, it calls do_load(), which re-reads GUCs via guc::get_string(&guc::NODE_LABELS) and guc::get_string(&guc::EDGE_TYPES). Since GUCs are session-level state on the pinned _accel_conn connection, the SET values persist across reloads within that backend. This is correct. No action needed.

2. Connection reset path does not re-apply GUCs

Location: graph_facade.py:1084-1092 (exception handler in _execute_sql)

When a SQL error occurs, _accel_conn is closed and set to None. On the next call, _get_accel_connection() creates a fresh connection and sets graph_accel.node_id_property, but does not call _set_accel_gucs(). The GUC re-application only happens inside _execute_sql when backend_status == 'not_loaded'. Since a fresh connection means a fresh backend (no loaded graph), this condition will be true on the next query, triggering _set_accel_gucs() before graph_accel_load(). So this is actually safe -- but it is safe by coincidence of the not_loaded check, not by explicit design.

Suggestion: Consider adding a brief comment in the exception handler or _get_accel_connection noting that GUCs are re-applied via the not_loaded path in _execute_sql, not in _get_accel_connection. This makes the implicit safety contract explicit for future maintainers. Not blocking.

3. Empty semantic_types edge case

Location: graph_facade.py:1024-1026

If semantic_types is empty after subtracting _INFRA_EDGE_TYPES (meaning every edge type in the graph is infrastructure), the GUC is never set, leaving it at the default *. This would load all edge types -- exactly the problem this PR fixes. In practice this scenario is impossible (IMPLIES, SUPPORTS, etc. always exist), but the code silently degrades rather than logging a warning.

Suggestion: Add an else branch with a warning log: "No semantic edge types found after filtering infrastructure types -- graph may contain phantom paths." This makes the failure mode visible in logs. Not blocking.

4. _accel_path_rows_to_dict return type change is a subtle API contract shift

Location: graph_facade.py:399 (return type Optional[Dict] -> was Dict)

The return type changed from Dict[str, Any] to Optional[Dict[str, Any]]. The only caller (_find_path_accel at line 397) already checks if not rows and returns None, and the caller of that (find_path at line 304) already does if result is not None. So the None propagation is handled correctly through the call chain. Good.

5. Duplicate post-filter logic in two methods

Location: graph_facade.py:409 and graph_facade.py:483

The any(not row.get('app_id') for row in ...) check appears in both _accel_path_rows_to_dict (for single-path) and _find_paths_accel (for multi-path). These two methods have slightly different structures -- one converts a flat list of rows, the other groups by path_index first -- so extracting a shared helper is not trivial. The duplication is minor and justified by the structural differences. No action needed, but worth noting for future refactoring if the conversion logic grows.

6. Dead fallback code preserved

Location: graph_facade.py:433-438 and graph_facade.py:494-499

The else branches that build placeholder nodes with concept_id: nid or '' are now unreachable when the GUC filter is active and the app_id null check returns None/skips. However, keeping them is reasonable as defensive fallback if hydration fails for a valid concept ID (e.g., deleted between path query and hydration). The code is harmless, though the original comment ("Non-Concept node -- use AGE label") was correctly removed since that is no longer the expected case.

7. Missing test coverage for the new behavior

Location: tests/unit/lib/test_graph_facade.py

There are no tests exercising the new None return from _accel_path_rows_to_dict when rows contain app_id: None, or the skip behavior in _find_paths_accel. Adding two unit tests would solidify the contract:

test_accel_path_filters_infrastructure_nodes: Feed rows with app_id: None to _accel_path_rows_to_dict, assert it returns None.
test_accel_multi_path_skips_infrastructure_paths: Feed multi-path rows where one path has app_id: None, assert it is excluded from results.

This is worth adding before merge.

8. _set_accel_gucs is not tested

Location: graph_facade.py:1008-1031

The GUC-setting logic queries ag_catalog and builds the edge type filter dynamically. A unit test mocking the cursor to return a set of edge labels and verifying the SET statements would prevent regressions if the infrastructure edge list changes. Worth considering but not blocking since the method is straightforward.

Summary

Category	Verdict
Correctness	Addresses root cause at the right layer
Defense-in-depth	GUC filter + post-filter is well-layered
SOLID compliance	Good -- dynamic discovery follows Open/Closed
Edge cases	Empty semantic_types edge case is low-risk but worth a warning log
Test coverage	Two unit tests recommended for the new filtering behavior
Security	No concerns -- no user input in the GUC values
Risks	Low -- GUC persistence through reconnection is safe by design

Recommendation: Add the two unit tests for the post-filter behavior (findings #7), then merge. The other suggestions are non-blocking improvements.

AI-assisted review via Claude

Address code review findings: - Add test for single-path phantom node filtering (app_id=None → None) - Add test for multi-path phantom filtering (mixed paths → only clean ones) - Add warning log when no semantic edge types found (empty exclude result)

aaronsb merged commit 7752294 into main Feb 19, 2026
3 checks passed

aaronsb mentioned this pull request Feb 19, 2026

fix: NaN confidence filter silently rejected all graph_accel traversals #340

Merged

5 tasks

aaronsb deleted the fix/connect-filter-infrastructure-nodes branch February 19, 2026 16:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

fix: filter infrastructure nodes from connect path traversal#339

fix: filter infrastructure nodes from connect path traversal#339
aaronsb merged 2 commits intomainfrom
fix/connect-filter-infrastructure-nodes

aaronsb commented Feb 19, 2026

Uh oh!

aaronsb commented Feb 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

aaronsb commented Feb 19, 2026

Summary

Fix

Before / After

Test plan

Uh oh!

aaronsb commented Feb 19, 2026

Code Review -- PR #339

Assessment: Solid fix, well-layered

Findings

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant