fix: filter infrastructure nodes from connect path traversal#339
graph_accel was loading all node types and edge types (defaults to *), causing 80% of loaded edges (15,302/19,206) to be provenance edges (APPEARS, EVIDENCED_BY, FROM_SOURCE, SCOPED_BY, HAS_SOURCE, IMAGES). Yen's k-shortest paths traversed these to Source/Instance/Ontology nodes, producing ghost paths that wasted the path budget and rendered as blank entries with no ID or description.

Three-layer fix:
- Set node_labels=Concept and build a dynamic edge_types exclude list at graph load time, reducing loaded edges to 3,904 semantic-only
- Post-filter paths containing nodes without app_id (defense-in-depth)
- Move GUC SET after graph_accel_status(), which loads the shared library and registers GUCs, and before graph_accel_load(), which reads them
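The load-time part of the fix can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `execute` callable, the GUC names `graph_accel.node_labels` and `graph_accel.edge_types`, the exact SQL used to call `graph_accel_status()` and `graph_accel_load()`, and the sample edge types `IMPLIES` and `SUPPORTS` are all assumptions. Only the six infrastructure edge types and the status → SET → load ordering come from the commit message.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)

# The six provenance edge types named in the commit message.
INFRA_EDGE_TYPES = frozenset(
    {"APPEARS", "EVIDENCED_BY", "FROM_SOURCE", "SCOPED_BY", "HAS_SOURCE", "IMAGES"}
)


def semantic_edge_types(all_edge_types: Iterable[str]) -> List[str]:
    """Return every edge type that is not an infrastructure type, sorted."""
    return sorted(set(all_edge_types) - INFRA_EDGE_TYPES)


def load_semantic_graph(
    execute: Callable[[str], None], all_edge_types: Iterable[str]
) -> List[str]:
    """Issue status -> SET -> load in the order the fix describes.

    `execute` is a hypothetical callable that runs one SQL statement.
    The GUC names and SQL call shapes below are assumptions for illustration.
    """
    include = semantic_edge_types(all_edge_types)

    # 1. Status first: per the commit message, this loads the shared
    #    library and registers the GUCs.
    execute("SELECT * FROM graph_accel_status()")

    # 2. The GUCs now exist, so the SET statements can take effect.
    execute("SET graph_accel.node_labels = 'Concept'")
    if include:
        execute("SET graph_accel.edge_types = '" + ",".join(include) + "'")
    else:
        # Without an include list the default (*) would load every edge type.
        logger.warning("no semantic edge types found; edge_types left at default")

    # 3. Load last: it reads the GUCs set above.
    execute("SELECT * FROM graph_accel_load()")
    return include


# Usage: record the statements instead of running them against a database.
statements: List[str] = []
included = load_semantic_graph(
    statements.append, ["IMPLIES", "APPEARS", "SUPPORTS", "IMAGES"]
)
```

Passing a recording callable such as `statements.append` makes the statement order easy to assert on in a unit test without a live connection.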
Code Review -- PR #339

What this changes: Fixes a defect where

Assessment: Solid fix, well-layered

The root cause analysis is correct. The graph_accel extension was loading all node types and all edge types by default (GUC defaults to

What works well:
Findings

1. Auto-reload GUC persistence -- verified safe
Location:
I traced the auto-reload path carefully. When

2. Connection reset path does not re-apply GUCs
Location:
When a SQL error occurs,
Suggestion: Consider adding a brief comment in the exception handler or

3. Empty
Location:
If
Suggestion: Add an

4.
Location:
The return type changed from

5. Duplicate post-filter logic in two methods
Location:
The

6. Dead fallback code preserved
Location:
The

7. Missing test coverage for the new behavior
Location:
There are no tests exercising the new
This is worth adding before merge.

8.
Location:
The GUC-setting logic queries

Summary
Recommendation: Add the two unit tests for the post-filter behavior (finding #7), then merge. The other suggestions are non-blocking improvements.

AI-assisted review via Claude
Address code review findings:
- Add test for single-path phantom node filtering (app_id=None → None)
- Add test for multi-path phantom filtering (mixed paths → only clean ones)
- Add warning log when no semantic edge types found (empty exclude result)
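The two post-filter tests listed above might look something like this. It is a sketch under stated assumptions: the path shape (a list of node dicts with an `app_id` key) and the helper names `is_clean`, `filter_single`, and `filter_many` are hypothetical and are not taken from the PR. Only the behavior (a single path with a phantom node becomes `None`, and mixed path lists keep only the clean paths) comes from the commit message.

```python
from typing import Any, Dict, List, Optional

# Hypothetical shape: a path is a list of node dicts.
Path = List[Dict[str, Any]]


def is_clean(path: Path) -> bool:
    """A path is clean when every node carries a non-None app_id."""
    return all(node.get("app_id") is not None for node in path)


def filter_single(path: Optional[Path]) -> Optional[Path]:
    """Single-path case: a path containing a phantom node becomes None."""
    if path is None or not is_clean(path):
        return None
    return path


def filter_many(paths: List[Path]) -> List[Path]:
    """Multi-path case: keep only the clean paths, preserving order."""
    return [p for p in paths if is_clean(p)]


def test_single_path_with_phantom_node_returns_none() -> None:
    phantom = [{"app_id": "c1"}, {"app_id": None}, {"app_id": "c2"}]
    assert filter_single(phantom) is None


def test_multi_path_keeps_only_clean_paths() -> None:
    clean = [{"app_id": "c1"}, {"app_id": "c2"}]
    phantom = [{"app_id": "c1"}, {"label": "Source"}]  # node with no app_id key
    assert filter_many([clean, phantom, clean]) == [clean, clean]


test_single_path_with_phantom_node_returns_none()
test_multi_path_keeps_only_clean_paths()
```

Using `node.get("app_id")` treats a missing key and an explicit `None` the same way, which matches the "nodes without app_id" wording.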
Summary
Fix
Three-layer defense in graph_facade.py:
- _set_accel_gucs() sets node_labels=Concept and dynamically builds an edge_types include list excluding 6 infrastructure types. Loaded edges drop from 19,206 → 3,904.
- Paths containing nodes without app_id are skipped (defense-in-depth for stale graphs)
- GUC SET runs after graph_accel_status() (which loads the shared library and registers GUCs) but before graph_accel_load() (which reads them)

Before / After
Test plan
concept connect(trust→complexity, conway→adaptation, cognitive load→platform team)