diff --git a/.claude/ways/kg/api/way.md b/.claude/ways/kg/api/way.md
index 40a1cfdbe..a806b8498 100644
--- a/.claude/ways/kg/api/way.md
+++ b/.claude/ways/kg/api/way.md
@@ -35,10 +35,24 @@ client.facade.count_concepts()
 client._execute_cypher("MATCH (n) RETURN n")
 ```
 
+## GraphFacade (graph_accel integration)
+
+`GraphFacade` in `api/app/lib/graph_facade.py` wraps graph_accel with
+Cypher fallback. Key patterns:
+
+- **Dedicated connection**: `_accel_conn` is pinned so the in-memory graph
+  persists across requests. Don't use the regular AGEClient connection.
+- **Optional params → NULL, not NaN**: Rust `Option` maps `NULL` to
+  `None` (skip filter). `float('nan')` maps to `Some(NaN)` which silently
+  rejects all comparisons (`x >= NaN` is always false in IEEE 754).
+- **GUC lifecycle**: GUCs are set once on first load (`_set_accel_gucs`),
+  then `ensure_fresh()` handles generation-based reloads using the
+  session-level GUCs already in place.
+
 ## After API Changes
 
 ```bash
-./operator.sh restart api
+./operator.sh restart api # or hot-reload (dev mode watches for changes)
 ```
 
 ## Testing Endpoints
diff --git a/.claude/ways/kg/graph-accel/way.md b/.claude/ways/kg/graph-accel/way.md
new file mode 100644
index 000000000..6c912e0a4
--- /dev/null
+++ b/.claude/ways/kg/graph-accel/way.md
@@ -0,0 +1,99 @@
+---
+match: regex
+pattern: \bgraph.accel\b|\bgraph_accel\b|\.so\b.*pgrx|pgrx|traversal\.rs|bfs_neighborhood|graph_accel_load|graph_accel_neighborhood|graph_accel_path|graph_accel_subgraph|graph_accel_degree
+files: graph-accel/
+commands: cargo\s+(test|build|pgrx)
+---
+# graph_accel Way
+
+Rust pgrx PostgreSQL extension for in-memory graph acceleration.
+
+## Architecture
+
+- **Core** (`graph-accel/core/`): Pure Rust graph algorithms (BFS, k-shortest, degree, subgraph). No pgrx dependency — testable with `cargo test`.
+- **Ext** (`graph-accel/ext/`): pgrx wrapper — SPI loading, GUC handling, thread-local state. Needs `cargo pgrx test pg17`.
+- **State**: Per-backend (thread_local). Each PostgreSQL connection has its own in-memory graph copy.
+- **Generation**: Monotonic counter in `graph_accel.generation` table drives cache invalidation via `ensure_fresh()`.
+
+## Build & Deploy
+
+```bash
+./graph-accel/build-in-container.sh # Canonical build (ABI-safe)
+./graph-accel/tests/deploy-option0.sh # Copy .so into running container
+```
+
+Output: `dist/pg17/{amd64,arm64}/graph_accel.so`
+
+## Testing
+
+```bash
+cd graph-accel && cargo test # Core unit tests (fast, no PG)
+cd graph-accel && cargo pgrx test pg17 # Full pgrx tests (needs PG)
+```
+
+## Debugging with SQL
+
+Always use `operator.sh query` (not docker exec):
+
+```bash
+# Check status
+./operator.sh query "SELECT * FROM graph_accel_status()"
+
+# Load with specific GUCs
+./operator.sh query "
+SET graph_accel.node_id_property = 'concept_id';
+SET graph_accel.node_labels = 'Concept';
+SET graph_accel.edge_types = 'SUBSUMES,REQUIRES';
+SELECT * FROM graph_accel_load('knowledge_graph');
+"
+```
+
+**Long GUC values**: Build in SQL, not shell variables. Shell interpolation
+can silently truncate long strings:
+
+```sql
+-- GOOD: build in SQL
+DO $$
+DECLARE edge_csv text;
+BEGIN
+    SELECT string_agg(name, ',') INTO edge_csv FROM ag_catalog.ag_label ...;
+    EXECUTE format('SET graph_accel.edge_types = %L', edge_csv);
+END $$;
+
+-- BAD: shell variable (can mangle 4000+ char strings)
+./operator.sh query "SET graph_accel.edge_types = '$SHELL_VAR'"
+```
+
+## Parameter Passing Pitfalls
+
+**NULL vs NaN for Optional parameters**: graph_accel SQL functions use
+`Option` for optional thresholds (min_confidence, etc.).
+
+| Python value | SQL wire | Rust `Option` | Behavior |
+|---|---|---|---|
+| `None` | `NULL` | `None` | Filter skipped (correct) |
+| `float('nan')` | `'NaN'::float8` | `Some(NaN)` | `x >= NaN` is always false — **rejects all edges** |
+| `0.5` | `0.5` | `Some(0.5)` | Normal threshold filter |
+
+Never use `float('nan')` as a "no filter" sentinel. Pass `None`.
+
+## GUCs
+
+| GUC | Default | Purpose |
+|-----|---------|---------|
+| `graph_accel.source_graph` | (none) | AGE graph name |
+| `graph_accel.node_labels` | `*` | Comma-separated vertex labels to load |
+| `graph_accel.edge_types` | `*` | Comma-separated edge types to load |
+| `graph_accel.node_id_property` | (none) | Property for app-level ID index |
+| `graph_accel.auto_reload` | `on` | Auto-reload on generation mismatch |
+| `graph_accel.max_memory_mb` | `4096` | Per-backend memory cap |
+
+## Python Facade Integration
+
+`GraphFacade` in `api/app/lib/graph_facade.py` manages graph_accel via a
+dedicated pinned connection (`_accel_conn`). The load sequence:
+
+1. `graph_accel_status()` — triggers library loading, registers GUCs
+2. `_set_accel_gucs()` — sets node_labels, edge_types filters
+3. `graph_accel_load()` — loads filtered graph into backend memory
+4. Query functions — `ensure_fresh()` handles generation-based reload
diff --git a/.claude/ways/kg/testing/way.md b/.claude/ways/kg/testing/way.md
index 08fa9d108..02c370885 100644
--- a/.claude/ways/kg/testing/way.md
+++ b/.claude/ways/kg/testing/way.md
@@ -22,6 +22,12 @@ Tests run inside containers with live mounts. Platform must be running in dev mo
 cd cli && npm test
 ```
 
+**Rust tests** (graph-accel core, runs from host):
+```bash
+cd graph-accel && cargo test # Core algorithms (fast, no PG)
+cd graph-accel && cargo pgrx test pg17 # pgrx extension tests (needs PG)
+```
+
 ## Test Structure
 
 | Directory | What | Framework |
diff --git a/api/app/lib/graph_facade.py b/api/app/lib/graph_facade.py
index b9e54c57f..9686b9729 100644
--- a/api/app/lib/graph_facade.py
+++ b/api/app/lib/graph_facade.py
@@ -194,14 +194,13 @@ def _neighborhood_accel(
         min_confidence: Optional[float]
     ) -> List[Dict[str, Any]]:
         """graph_accel fast path for neighborhood."""
-        # NaN sentinel means "no filter" in graph_accel
-        conf_param = min_confidence if min_confidence is not None else float('nan')
-
+        # Pass NULL for no confidence filter — Rust Option maps
+        # None → skip filter, Some(NaN) → reject all (NaN >= x is false).
         rows = self._execute_sql(
             "SELECT app_id, label, distance, path_types "
             "FROM graph_accel_neighborhood(%s, %s, %s, %s) "
             "WHERE label = 'Concept' AND distance > 0",
-            (concept_id, max_depth, direction, conf_param)
+            (concept_id, max_depth, direction, min_confidence)
         )
 
         # graph_accel 'label' is the AGE vertex label ("Concept"), not
@@ -383,12 +382,10 @@ def _find_path_accel(
         min_confidence: Optional[float]
     ) -> Optional[Dict[str, Any]]:
         """graph_accel fast path for shortest path."""
-        conf_param = min_confidence if min_confidence is not None else float('nan')
-
         rows = self._execute_sql(
             "SELECT step, app_id, label, rel_type, direction "
             "FROM graph_accel_path(%s, %s, %s, %s, %s)",
-            (from_id, to_id, max_hops, direction, conf_param)
+            (from_id, to_id, max_hops, direction, min_confidence)
         )
 
         if not rows:
@@ -802,11 +799,10 @@ def subgraph(
             List of dicts with from_app_id, to_app_id, rel_type
         """
         if self._accel_ready:
-            conf_param = min_confidence if min_confidence is not None else float('nan')
             rows = self._execute_sql(
                 "SELECT from_app_id, from_label, to_app_id, to_label, rel_type "
                 "FROM graph_accel_subgraph(%s, %s, %s, %s)",
-                (start_id, max_depth, direction, conf_param)
+                (start_id, max_depth, direction, min_confidence)
             )
             return [dict(r) for r in rows]
 
diff --git a/graph-accel/core/src/traversal.rs b/graph-accel/core/src/traversal.rs
index d7cd85a6b..090ad7442 100644
--- a/graph-accel/core/src/traversal.rs
+++ b/graph-accel/core/src/traversal.rs
@@ -1536,4 +1536,56 @@ mod tests {
         let paths = k_shortest_paths(&g, 999, 0, 10, 5, TraversalDirection::Both, None);
         assert!(paths.is_empty());
     }
+
+    // --- Two-phase loading tests (mimics ext load_vertices + load_edges) ---
+
+    #[test]
+    fn test_two_phase_loading_bfs() {
+        // Reproduce the ext loading path: add_node first, then add_edge
+        let mut g = Graph::new();
+
+        // Phase 1: load vertices (like ext load_vertices)
+        g.add_node(100, "Concept".into(), Some("concept_a".into()));
+        g.add_node(200, "Concept".into(), Some("concept_b".into()));
+        g.add_node(300, "Concept".into(), Some("concept_c".into()));
+
+        // Phase 2: load edges (like ext load_edges)
+        let rt = g.intern_rel_type("SUBSUMES");
+        g.add_edge(100, 200, rt, Edge::NO_CONFIDENCE);
+        g.add_edge(100, 300, rt, Edge::NO_CONFIDENCE);
+
+        // Verify degree sees edges
+        assert_eq!(g.neighbors_out(100).len(), 2);
+        assert_eq!(g.neighbors_in(200).len(), 1);
+
+        // Verify BFS finds neighbors
+        let result = bfs_neighborhood(&g, 100, 2, TraversalDirection::Both, None);
+        assert_eq!(
+            result.neighbors.len(), 2,
+            "BFS should find 2 neighbors from node 100, found {}",
+            result.neighbors.len()
+        );
+    }
+
+    #[test]
+    fn test_two_phase_loading_app_id_resolution() {
+        let mut g = Graph::new();
+
+        // Phase 1: vertices
+        g.add_node(100, "Concept".into(), Some("concept_a".into()));
+        g.add_node(200, "Concept".into(), Some("concept_b".into()));
+
+        // Phase 2: edges
+        let rt = g.intern_rel_type("SUBSUMES");
+        g.add_edge(100, 200, rt, Edge::NO_CONFIDENCE);
+
+        // Resolve app_id → internal NodeId
+        let resolved = g.resolve_app_id("concept_a").unwrap();
+        assert_eq!(resolved, 100);
+
+        // BFS via resolved ID
+        let result = bfs_neighborhood(&g, resolved, 1, TraversalDirection::Both, None);
+        assert_eq!(result.neighbors.len(), 1);
+        assert_eq!(result.neighbors[0].node_id, 200);
+    }
 }