16 changes: 15 additions & 1 deletion .claude/ways/kg/api/way.md
@@ -35,10 +35,24 @@ client.facade.count_concepts()
client._execute_cypher("MATCH (n) RETURN n")
```

## GraphFacade (graph_accel integration)

`GraphFacade` in `api/app/lib/graph_facade.py` wraps graph_accel with
Cypher fallback. Key patterns:

- **Dedicated connection**: `_accel_conn` is pinned so the in-memory graph
persists across requests. Don't use the regular AGEClient connection.
- **Optional params → NULL, not NaN**: Rust `Option<f64>` maps `NULL` to
`None` (skip filter). `float('nan')` maps to `Some(NaN)` which silently
rejects all comparisons (`x >= NaN` is always false in IEEE 754).
- **GUC lifecycle**: GUCs are set once on first load (`_set_accel_gucs`),
then `ensure_fresh()` handles generation-based reloads using the
session-level GUCs already in place.

## After API Changes

```bash
-./operator.sh restart api
+./operator.sh restart api # or hot-reload (dev mode watches for changes)
```

## Testing Endpoints
99 changes: 99 additions & 0 deletions .claude/ways/kg/graph-accel/way.md
@@ -0,0 +1,99 @@
---
match: regex
pattern: \bgraph.accel\b|\bgraph_accel\b|\.so\b.*pgrx|pgrx|traversal\.rs|bfs_neighborhood|graph_accel_load|graph_accel_neighborhood|graph_accel_path|graph_accel_subgraph|graph_accel_degree
files: graph-accel/
commands: cargo\s+(test|build|pgrx)
---
# graph_accel Way

Rust pgrx PostgreSQL extension for in-memory graph acceleration.

## Architecture

- **Core** (`graph-accel/core/`): Pure Rust graph algorithms (BFS, k-shortest, degree, subgraph). No pgrx dependency — testable with `cargo test`.
- **Ext** (`graph-accel/ext/`): pgrx wrapper — SPI loading, GUC handling, thread-local state. Needs `cargo pgrx test pg17`.
- **State**: Per-backend (thread_local). Each PostgreSQL connection has its own in-memory graph copy.
- **Generation**: Monotonic counter in `graph_accel.generation` table drives cache invalidation via `ensure_fresh()`.
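
The generation-based invalidation described above can be illustrated with a toy model in Python. This is only a sketch of the idea (compare a stored generation with the current one, reload on mismatch); `CachedGraph`, its fields, and the `reload` callback are hypothetical names, and the real `ensure_fresh()` in the extension may differ in detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedGraph:
    """Toy stand-in for one backend's in-memory graph (hypothetical)."""
    loaded_generation: Optional[int] = None
    reloads: int = 0

    def ensure_fresh(self, current_generation: int, reload: Callable[[], None]) -> bool:
        """Reload when the stored generation differs; return True if a reload ran."""
        if self.loaded_generation == current_generation:
            return False
        reload()
        self.loaded_generation = current_generation
        self.reloads += 1
        return True


cache = CachedGraph()
first = cache.ensure_fresh(1, lambda: None)    # nothing loaded yet, so reload
second = cache.ensure_fresh(1, lambda: None)   # same generation, no reload
third = cache.ensure_fresh(2, lambda: None)    # counter advanced, so reload
```

Because state is per-backend, each connection would hold its own copy of something like `CachedGraph` and reload independently.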

## Build & Deploy

```bash
./graph-accel/build-in-container.sh # Canonical build (ABI-safe)
./graph-accel/tests/deploy-option0.sh # Copy .so into running container
```

Output: `dist/pg17/{amd64,arm64}/graph_accel.so`

## Testing

```bash
cd graph-accel && cargo test # Core unit tests (fast, no PG)
cd graph-accel && cargo pgrx test pg17 # Full pgrx tests (needs PG)
```

## Debugging with SQL

Always use `operator.sh query` (not docker exec):

```bash
# Check status
./operator.sh query "SELECT * FROM graph_accel_status()"

# Load with specific GUCs
./operator.sh query "
SET graph_accel.node_id_property = 'concept_id';
SET graph_accel.node_labels = 'Concept';
SET graph_accel.edge_types = 'SUBSUMES,REQUIRES';
SELECT * FROM graph_accel_load('knowledge_graph');
"
```

**Long GUC values**: Build in SQL, not shell variables. Shell interpolation
can silently truncate long strings:

```sql
-- GOOD: build in SQL
DO $$
DECLARE edge_csv text;
BEGIN
SELECT string_agg(name, ',') INTO edge_csv FROM ag_catalog.ag_label ...;
EXECUTE format('SET graph_accel.edge_types = %L', edge_csv);
END $$;

-- BAD: shell variable (can mangle 4000+ char strings)
./operator.sh query "SET graph_accel.edge_types = '$SHELL_VAR'"
```
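
When the value originates in application code rather than in the catalog, another way to keep it out of the shell is to pass it as a bound parameter. The sketch below is an assumption, not something this repo is known to do: `build_edge_types_statement` is a hypothetical helper, and it relies on PostgreSQL's built-in `set_config(name, value, is_local)` function as the parameterizable form of `SET`.

```python
from typing import Iterable, Tuple


def build_edge_types_statement(edge_types: Iterable[str]) -> Tuple[str, Tuple[str, str]]:
    """Return (sql, params) that sets graph_accel.edge_types for the session."""
    csv = ",".join(edge_types)
    # is_local = false keeps the setting for the whole session, like plain SET.
    sql = "SELECT set_config(%s, %s, false)"
    return sql, ("graph_accel.edge_types", csv)


# Hypothetical input: enough edge type names to exceed 4000 characters.
edge_types = [f"REL_TYPE_{i}" for i in range(500)]
sql, params = build_edge_types_statement(edge_types)
```

With a DB-API cursor that uses `%s` placeholders, this would be run as `cursor.execute(sql, params)`, so the long string never passes through shell quoting.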

## Parameter Passing Pitfalls

**NULL vs NaN for Optional parameters**: graph_accel SQL functions use
`Option<f64>` for optional thresholds (min_confidence, etc.).

| Python value | SQL wire | Rust `Option<f64>` | Behavior |
|---|---|---|---|
| `None` | `NULL` | `None` | Filter skipped (correct) |
| `float('nan')` | `'NaN'::float8` | `Some(NaN)` | `x >= NaN` is always false — **rejects all edges** |
| `0.5` | `0.5` | `Some(0.5)` | Normal threshold filter |

Never use `float('nan')` as a "no filter" sentinel. Pass `None`.
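
The table can be reproduced in plain Python, since Python floats follow the same IEEE 754 comparison rules. This is a minimal sketch: `passes_filter` is a hypothetical stand-in for an optional threshold check, not graph_accel's code, and `to_sql_param` is a hypothetical defensive helper.

```python
import math
from typing import Optional


def passes_filter(confidence: float, threshold: Optional[float]) -> bool:
    """Toy model of an optional threshold: None skips the filter entirely."""
    if threshold is None:
        return True
    # With threshold = NaN this comparison is False for every confidence.
    return confidence >= threshold


def to_sql_param(min_confidence: Optional[float]) -> Optional[float]:
    """Hypothetical guard: turn an accidental NaN into None (sent as SQL NULL)."""
    if min_confidence is not None and math.isnan(min_confidence):
        return None
    return min_confidence


edges = [0.2, 0.6, 0.9]
kept_none = [c for c in edges if passes_filter(c, None)]          # all three kept
kept_nan = [c for c in edges if passes_filter(c, float("nan"))]   # nothing kept
kept_half = [c for c in edges if passes_filter(c, 0.5)]           # 0.6 and 0.9 kept
```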

## GUCs

| GUC | Default | Purpose |
|-----|---------|---------|
| `graph_accel.source_graph` | (none) | AGE graph name |
| `graph_accel.node_labels` | `*` | Comma-separated vertex labels to load |
| `graph_accel.edge_types` | `*` | Comma-separated edge types to load |
| `graph_accel.node_id_property` | (none) | Property for app-level ID index |
| `graph_accel.auto_reload` | `on` | Auto-reload on generation mismatch |
| `graph_accel.max_memory_mb` | `4096` | Per-backend memory cap |

## Python Facade Integration

`GraphFacade` in `api/app/lib/graph_facade.py` manages graph_accel via a
dedicated pinned connection (`_accel_conn`). The load sequence:

1. `graph_accel_status()` — triggers library loading, registers GUCs
2. `_set_accel_gucs()` — sets node_labels, edge_types filters
3. `graph_accel_load()` — loads filtered graph into backend memory
4. Query functions — `ensure_fresh()` handles generation-based reload
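
The sequence above could be sketched roughly as follows. `load_accel` and its `execute_sql` callback are hypothetical names for illustration, not the facade's actual methods, and applying the GUCs through `set_config` is an assumption; the real `_set_accel_gucs()` may issue plain `SET` statements instead.

```python
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical callback type: runs one statement on the pinned connection.
ExecuteSql = Callable[[str, Optional[Sequence[Any]]], Any]


def load_accel(execute_sql: ExecuteSql, graph: str, node_labels: str, edge_types: str) -> None:
    """Run the load steps in order on a single (pinned) connection."""
    # 1. Touch the extension so the library loads and its GUCs are registered.
    execute_sql("SELECT * FROM graph_accel_status()", None)
    # 2. Apply session-level filters before loading.
    execute_sql("SELECT set_config('graph_accel.node_labels', %s, false)", (node_labels,))
    execute_sql("SELECT set_config('graph_accel.edge_types', %s, false)", (edge_types,))
    # 3. Load the filtered graph into this backend's memory.
    execute_sql("SELECT * FROM graph_accel_load(%s)", (graph,))


# Record the statements instead of hitting a database.
calls: List[str] = []
load_accel(lambda sql, params: calls.append(sql), "knowledge_graph", "Concept", "SUBSUMES,REQUIRES")
```

All statements must go through the same connection, because the GUCs are session-level and the loaded graph lives in that backend's memory.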
6 changes: 6 additions & 0 deletions .claude/ways/kg/testing/way.md
@@ -22,6 +22,12 @@ Tests run inside containers with live mounts. Platform must be running in dev mo
cd cli && npm test
```

**Rust tests** (graph-accel, run from host):
```bash
cd graph-accel && cargo test # Core algorithms (fast, no PG)
cd graph-accel && cargo pgrx test pg17 # pgrx extension tests (needs PG)
```

## Test Structure

| Directory | What | Framework |
14 changes: 5 additions & 9 deletions api/app/lib/graph_facade.py
@@ -194,14 +194,13 @@ def _neighborhood_accel(
         min_confidence: Optional[float]
     ) -> List[Dict[str, Any]]:
         """graph_accel fast path for neighborhood."""
-        # NaN sentinel means "no filter" in graph_accel
-        conf_param = min_confidence if min_confidence is not None else float('nan')
-
+        # Pass NULL for no confidence filter — Rust Option<f64> maps
+        # None → skip filter, Some(NaN) → reject all (NaN >= x is false).
         rows = self._execute_sql(
             "SELECT app_id, label, distance, path_types "
             "FROM graph_accel_neighborhood(%s, %s, %s, %s) "
             "WHERE label = 'Concept' AND distance > 0",
-            (concept_id, max_depth, direction, conf_param)
+            (concept_id, max_depth, direction, min_confidence)
         )

         # graph_accel 'label' is the AGE vertex label ("Concept"), not
@@ -383,12 +382,10 @@ def _find_path_accel(
         min_confidence: Optional[float]
     ) -> Optional[Dict[str, Any]]:
         """graph_accel fast path for shortest path."""
-        conf_param = min_confidence if min_confidence is not None else float('nan')
-
         rows = self._execute_sql(
             "SELECT step, app_id, label, rel_type, direction "
             "FROM graph_accel_path(%s, %s, %s, %s, %s)",
-            (from_id, to_id, max_hops, direction, conf_param)
+            (from_id, to_id, max_hops, direction, min_confidence)
         )

         if not rows:
@@ -802,11 +799,10 @@ def subgraph(
             List of dicts with from_app_id, to_app_id, rel_type
         """
         if self._accel_ready:
-            conf_param = min_confidence if min_confidence is not None else float('nan')
             rows = self._execute_sql(
                 "SELECT from_app_id, from_label, to_app_id, to_label, rel_type "
                 "FROM graph_accel_subgraph(%s, %s, %s, %s)",
-                (start_id, max_depth, direction, conf_param)
+                (start_id, max_depth, direction, min_confidence)
             )
             return [dict(r) for r in rows]

52 changes: 52 additions & 0 deletions graph-accel/core/src/traversal.rs
@@ -1536,4 +1536,56 @@ mod tests {
        let paths = k_shortest_paths(&g, 999, 0, 10, 5, TraversalDirection::Both, None);
        assert!(paths.is_empty());
    }

    // --- Two-phase loading tests (mimics ext load_vertices + load_edges) ---

    #[test]
    fn test_two_phase_loading_bfs() {
        // Reproduce the ext loading path: add_node first, then add_edge
        let mut g = Graph::new();

        // Phase 1: load vertices (like ext load_vertices)
        g.add_node(100, "Concept".into(), Some("concept_a".into()));
        g.add_node(200, "Concept".into(), Some("concept_b".into()));
        g.add_node(300, "Concept".into(), Some("concept_c".into()));

        // Phase 2: load edges (like ext load_edges)
        let rt = g.intern_rel_type("SUBSUMES");
        g.add_edge(100, 200, rt, Edge::NO_CONFIDENCE);
        g.add_edge(100, 300, rt, Edge::NO_CONFIDENCE);

        // Verify degree sees edges
        assert_eq!(g.neighbors_out(100).len(), 2);
        assert_eq!(g.neighbors_in(200).len(), 1);

        // Verify BFS finds neighbors
        let result = bfs_neighborhood(&g, 100, 2, TraversalDirection::Both, None);
        assert_eq!(
            result.neighbors.len(), 2,
            "BFS should find 2 neighbors from node 100, found {}",
            result.neighbors.len()
        );
    }

    #[test]
    fn test_two_phase_loading_app_id_resolution() {
        let mut g = Graph::new();

        // Phase 1: vertices
        g.add_node(100, "Concept".into(), Some("concept_a".into()));
        g.add_node(200, "Concept".into(), Some("concept_b".into()));

        // Phase 2: edges
        let rt = g.intern_rel_type("SUBSUMES");
        g.add_edge(100, 200, rt, Edge::NO_CONFIDENCE);

        // Resolve app_id → internal NodeId
        let resolved = g.resolve_app_id("concept_a").unwrap();
        assert_eq!(resolved, 100);

        // BFS via resolved ID
        let result = bfs_neighborhood(&g, resolved, 1, TraversalDirection::Both, None);
        assert_eq!(result.neighbors.len(), 1);
        assert_eq!(result.neighbors[0].node_id, 200);
    }
}