
Commit 4923303

docs(competitive): add CodeGraphContext as Tier 1 competitor (#675)
* perf(build): native Rust/rusqlite for roles classification and edge insertion (6.12)

  Roles: move classifyNodeRolesFull/Incremental SQL + classification logic to Rust (roles_db.rs). Single rusqlite connection runs fan-in/fan-out queries, computes medians, classifies roles, and batch-updates nodes; eliminates ~10 JS<->SQLite round-trips.

  Edges: add bulk_insert_edges (edges_db.rs) that writes computed edges directly to SQLite via rusqlite instead of marshaling back to JS. Restructure buildEdges to run edge computation in a better-sqlite3 transaction, then native insert outside to avoid connection contention.

  1-file regression fix: skip native call-edge path for small incremental builds (≤3 files) where napi-rs marshaling overhead exceeds savings.

  Both paths fall back gracefully to JS when native is unavailable.

* fix(rust): use usize for raw_bind_parameter index, remove unused params import

* fix(rust): port file-path dead-entry detection from JS to native classify_dead_sub_role (#658)

* fix(build): add optional-chaining guard for classifyRolesIncremental call (#658)

* fix(build): correct crash-atomicity comment for native edge insert path (#658)

  The comment claimed barrel-edge deletion and re-insertion were atomic, but with the native rusqlite path the insertion happens in Phase 2 on a separate connection. Updated the comment to accurately describe the atomicity guarantee: JS path is fully atomic; native path has a transient gap that self-heals on next incremental rebuild.

* fix(rust): reduce edge insert CHUNK from 200 to 199 for SQLite bind param safety (#658)

  200 rows × 5 params = 1000 bind parameters, which exceeds the legacy SQLITE_MAX_VARIABLE_NUMBER default of 999. While bundled SQLite 3.43+ raises the limit, reducing to 199 (995 params) removes the risk for any SQLite build with the old default.

* fix(build): add debug log when native bulkInsertEdges falls back to JS (#658)

  The native edge insert fallback path was silent, making it hard to diagnose when the native path fails. Added a debug() call so the fallback is visible in verbose/debug output.

* docs(competitive): add CodeGraphContext as Tier 1 #11 competitor (score 3.8)

  Add CodeGraphContext/CodeGraphContext (2,664 stars, Python, MIT) to the competitive analysis. Tree-sitter + graph DB (KuzuDB/FalkorDB/Neo4j), 14 languages, CLI + MCP, bundle registry, 10+ IDE setup wizard. Strong community traction but shallow analysis depth vs codegraph.

* docs(roadmap): mark Phase 6 steps 6.8–6.15 as complete

  6.8 sub-100ms incremental rebuilds (#644), 6.9 AST bulk insert (#651), 6.10 CFG/dataflow bulk insert (#653), 6.11 native insert-nodes (#654), 6.12 native roles/edges (#658), 6.13 NativeDatabase class (#666), 6.14 native read queries (#671), 6.15 native write ops (#669). 6.16 (Dynamic SQL) and 6.17 (better-sqlite3 isolation) remain open.

* fix(roadmap): correct 6.13 body and add #644 to 6.8 Key PRs (#675)

  Section 6.13 heading was marked complete but body still read "Not started." Updated body to reflect PR #666 delivery. Section 6.8 body credited PR #644 for sub-100ms rebuilds but Key PRs list omitted it; added #644 to the list.
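The CHUNK fix in the message above comes down to simple arithmetic. At 5 bound parameters per edge row, 200 rows need 1,000 placeholders, one over the legacy `SQLITE_MAX_VARIABLE_NUMBER` default of 999, while 199 rows need 995. The TypeScript sketch below shows the general chunked multi-row INSERT pattern under that limit. It is illustrative only: the real code is Rust (`edges_db.rs`), and the `EdgeInsertRow` shape, the `buildEdgeInserts` helper, and the `edges` column names are assumptions, not codegraph's actual schema.

```typescript
// Hypothetical edge row: five bound values per row. Column names are
// illustrative assumptions, not codegraph's actual `edges` schema.
interface EdgeInsertRow {
  sourceId: number;
  targetId: number;
  kind: string;
  confidence: number;
  dynamic: number;
}

interface InsertBatch {
  sql: string;
  params: (string | number)[];
}

// Legacy SQLITE_MAX_VARIABLE_NUMBER default used by older SQLite builds.
const LEGACY_MAX_VARIABLES = 999;
const PARAMS_PER_ROW = 5;

// Largest row count whose placeholders fit under the limit:
// floor(999 / 5) = 199 rows, i.e. 995 bound parameters.
const CHUNK = Math.floor(LEGACY_MAX_VARIABLES / PARAMS_PER_ROW);

// Split rows into CHUNK-sized slices and emit one multi-row INSERT
// (SQL text plus a flat parameter list) per slice.
function buildEdgeInserts(rows: EdgeInsertRow[]): InsertBatch[] {
  const batches: InsertBatch[] = [];
  for (let start = 0; start < rows.length; start += CHUNK) {
    const chunk = rows.slice(start, start + CHUNK);
    const placeholders = chunk.map(() => "(?, ?, ?, ?, ?)").join(", ");
    const params: (string | number)[] = [];
    for (const r of chunk) {
      params.push(r.sourceId, r.targetId, r.kind, r.confidence, r.dynamic);
    }
    batches.push({
      sql: `INSERT INTO edges (source_id, target_id, kind, confidence, dynamic) VALUES ${placeholders}`,
      params,
    });
  }
  return batches;
}
```

Every full chunk produces identical SQL text, so an implementation can prepare that statement once and reuse it, preparing a separate statement only for the shorter final chunk.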
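The message above describes three routing rules: use the native path when the addon is loaded, stay on JS for small incremental builds (≤3 files) where napi-rs marshaling costs more than it saves, and emit a debug line when a native edge insert falls back to JS. A minimal sketch of that routing follows, with hypothetical names throughout: `writeEdges`, `NativeEdgeAddon`, and a boolean-returning `bulkInsertEdges` are not codegraph's confirmed API. The commit applies the ≤3-file rule to the native call-edge computation, while this sketch folds both checks into a single insert function for brevity.

```typescript
// Hypothetical shapes: codegraph's real napi-rs export names, arguments,
// and return types may differ from this sketch.
interface Edge {
  sourceId: number;
  targetId: number;
  kind: string;
}

interface NativeEdgeAddon {
  // Assumed to return false (or throw) when the native insert cannot complete.
  bulkInsertEdges(dbPath: string, edges: Edge[]): boolean;
}

// Incremental builds touching this many files or fewer stay on the JS path,
// where per-call napi-rs marshaling would cost more than the native path saves.
const SMALL_INCREMENTAL_MAX_FILES = 3;

interface WriteEdgesOptions {
  edges: Edge[];
  dbPath: string;
  incremental: boolean;
  changedFiles: number;
  native: NativeEdgeAddon | undefined; // undefined when the native addon is unavailable
  insertEdgesJs: (edges: Edge[]) => void;
  debug: (msg: string) => void;
}

// Returns which path actually wrote the edges.
function writeEdges(opts: WriteEdgesOptions): "native" | "js" {
  const smallIncremental =
    opts.incremental && opts.changedFiles <= SMALL_INCREMENTAL_MAX_FILES;
  if (opts.native && !smallIncremental) {
    try {
      if (opts.native.bulkInsertEdges(opts.dbPath, opts.edges)) return "native";
      opts.debug("native bulkInsertEdges returned false; falling back to JS insert");
    } catch (err) {
      opts.debug(`native bulkInsertEdges threw (${String(err)}); falling back to JS insert`);
    }
  }
  // JS path: used when native is unavailable, skipped, or failed.
  opts.insertEdgesJs(opts.edges);
  return "js";
}
```

Logging only on the failure branches keeps normal runs quiet while making an unexpected fallback visible in debug output, which is the diagnosability gap the commit describes.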
1 parent 5aac6b0 commit 4923303

2 files changed

Lines changed: 150 additions & 190 deletions


docs/roadmap/ROADMAP.md

Lines changed: 30 additions & 81 deletions
@@ -1154,12 +1154,12 @@ All test files migrated from `.js` to `.ts`. Vitest TypeScript integration verif
 | Parse | 601ms | 2123ms | **3.5×** | 57ms | 201ms | Rust ✅ — real speedup |
 | Build edges | 108ms | 167ms | 1.5× | 21ms | 15ms | Rust ✅ — modest; native *slower* on 1-file |
 | Resolve imports | 12ms | 13ms | ~same | 2ms | 2ms | Rust ✅ — no meaningful difference |
-| AST nodes | **393ms** | 397ms | **~same** | 0.2ms | 0.2ms | Extraction done ✅; **DB write not optimized** (6.9) |
-| CFG | **161ms** | 155ms | **Rust slower** | 0.1ms | 0.1ms | Extraction done ✅; **DB write not optimized** (6.10) |
-| Dataflow | **125ms** | 129ms | **~same** | 0.1ms | 0.2ms | Extraction done ✅; **DB write not optimized** (6.10) |
-| Insert nodes | 206ms | 201ms | ~same | 8ms | 8ms | JS batching ✅; **no native advantage** (6.11) |
+| AST nodes | **393ms** | 397ms | **~same** | 0.2ms | 0.2ms | Rust ✅ — native rusqlite bulk insert (PR #651) |
+| CFG | **161ms** | 155ms | **Rust slower** | 0.1ms | 0.1ms | Rust ✅ — native rusqlite bulk insert (PR #653) |
+| Dataflow | **125ms** | 129ms | **~same** | 0.1ms | 0.2ms | Rust ✅ — native rusqlite bulk insert (PR #653) |
+| Insert nodes | 206ms | 201ms | ~same | 8ms | 8ms | Rust ✅ — native rusqlite pipeline (PR #654) |
 | Complexity | 171ms | 216ms | 1.3× | 0.1ms | 0.1ms | Rust pre-computation ✅; modest speedup |
-| Roles | 52ms | 52ms | ~same | 54ms | 55ms | JS batching ✅; **no native advantage** (6.12) |
+| Roles | 52ms | 52ms | ~same | 54ms | 55ms | Rust ✅ — native rusqlite roles + edges (PR #658) |
 | Structure | 22ms | 21ms | ~same | 26ms | 24ms | JS ✅ — already fast |
 | **Total** | **2.7s** | **5.0s** | **1.85×** | **466ms** | **611ms** | Parse carries most of the speedup |

@@ -1211,114 +1211,63 @@ Structure building is unchanged — at 22ms it's already fast.

 **Key PRs:** #469, #533, #539, #542

-### 6.8 -- Incremental Rebuild Performance (partial)
+### 6.8 -- Incremental Rebuild Performance

-**Partially complete.** Roles classification is fully optimized (255ms → 9ms via incremental path with edge-neighbour expansion, PR #622). Structure batching and finalize skip are also done. Compound DB indexes restored query performance after TS migration (PR #632). Current native 1-file rebuild is ~466ms (v3.4.0, 473 files) — down from ~802ms but still above the sub-100ms target.
+**Complete.** Sub-100ms incremental rebuilds achieved: **466ms → 67–80ms** on 473 files (PR #644). Roles classification optimized (255ms → 9ms via incremental path, PR #622). Structure batching, finalize skip, and compound DB indexes all done (PR #632).

 **Done:**
 - **Incremental roles** (255ms → 9ms): Only reclassify nodes from changed files + edge neighbours using indexed correlated subqueries. Global medians for threshold consistency. Parity-tested against full rebuild. *Note:* The benchmark table shows ~54ms for 1-file roles because the standard benchmark runs the full roles phase; the 9ms incremental path (PR #622) is used only when the builder detects a 1-file incremental rebuild
 - **Structure batching:** Replace N+1 per-file queries with 3 batch queries regardless of file count
 - **Finalize skip:** Skip advisory queries (orphaned embeddings, unused exports) during incremental builds
 - **DB index regression:** Compound indexes on nodes/edges tables restored after TS migration (PR #632)

-**Remaining:**
-- **Incremental edge rebuild:** Only rebuild edges involving the changed file's symbols (currently edgesMs ~21ms on native, ~15ms on WASM — native is *slower* on 1-file)
-- **Parse overhead:** Native parse of 1 file takes ~57ms (vs 201ms WASM) — investigate tree-sitter incremental parsing to push below 10ms
-- **Structure/roles on 1-file:** Both still take ~25ms and ~54ms respectively on 1-file rebuilds — the full-build optimizations (6.5) don't apply to the incremental path
-- **Benchmark target:** Sub-100ms native 1-file rebuilds (current ~466ms on 473 files)
+**Result:** Native 1-file incremental rebuilds: **466ms → 67–80ms** (target was sub-100ms). Roles incremental path: **255ms → 9ms** via edge-neighbour expansion with indexed correlated subqueries.

-**Key PRs:** #622, #632
+**Key PRs:** #622, #632, #644

 **Affected files:** `src/domain/graph/builder/stages/build-structure.ts`, `src/domain/graph/builder/stages/build-edges.ts`, `src/domain/graph/builder/pipeline.ts`

-### 6.9 -- AST Node DB Write Optimization
+### 6.9 -- AST Node DB Write Optimization

-**Not started.** Native extraction (6.1) successfully produces AST nodes in Rust, but the `astMs` full-build phase is **393ms native vs 397ms WASM** — no speedup. The bottleneck is the JS loop that iterates over extracted AST nodes and inserts them into SQLite. The Rust extraction saves ~0ms because it merely shifts *when* the work happens (parse phase vs visitor phase), not *how much* work happens.
+**Complete.** Bulk AST node inserts via native Rust/rusqlite. The `bulk_insert_ast_nodes` napi-rs function receives the AST node array and writes directly to SQLite via `rusqlite` multi-row INSERTs, bypassing the JS iteration loop entirely.

-**Plan:**
-- **Batch AST node inserts in Rust via napi-rs:** Pass the raw AST node array directly from Rust to a native SQLite bulk-insert function, bypassing the JS iteration loop entirely. Use `rusqlite` with a single multi-row INSERT per chunk
-- **Merge AST inserts into the parse phase:** Instead of extracting AST nodes to a JS array and then writing them in a separate phase, write them directly to SQLite during the Rust parse walk — eliminates the intermediate array allocation and JS↔native boundary crossing
-- **Target:** astMs < 50ms on native full builds (current 393ms), representing a real 8× speedup over WASM
+**Key PRs:** #651

-**Affected files:** `crates/codegraph-core/src/lib.rs`, `crates/codegraph-core/src/ast_nodes.rs`, `src/domain/graph/builder/stages/build-ast-data.ts`
+### 6.10 -- CFG & Dataflow DB Write Optimization ✅

-### 6.10 -- CFG & Dataflow DB Write Optimization
+**Complete.** Bulk CFG block/edge and dataflow edge inserts via native Rust/rusqlite. Same approach as 6.9 — `rusqlite` multi-row INSERTs bypass the JS iteration loop for both CFG and dataflow writes.

-**Not started.** Same problem as 6.9 — Rust extraction works (6.2, 6.3), but the DB write phases are identical JS code on both engines. CFG: **161ms native vs 155ms WASM** (Rust is *slower*). Dataflow: **125ms native vs 129ms WASM** (~same).
+**Key PRs:** #653

-**Plan:**
-- **Batch CFG/dataflow edge inserts in Rust:** Same approach as 6.9 — pass extracted CFG blocks and dataflow edges directly to `rusqlite` bulk inserts from the Rust side, bypassing JS iteration
-- **Investigate CFG native regression:** Profile why native CFG is 4% *slower* than WASM on full builds — likely JS↔native serialization overhead for the `cfg.blocks` structure that exceeds the extraction savings
-- **Combine with parse phase:** Like 6.9, consider writing CFG edges and dataflow edges to SQLite during the Rust parse walk rather than accumulating them for a later JS phase
-- **Target:** cfgMs + dataflowMs < 50ms combined on native full builds (current 286ms)
+### 6.11 -- Native Insert Nodes Pipeline ✅

-**Affected files:** `crates/codegraph-core/src/cfg.rs`, `crates/codegraph-core/src/dataflow.rs`, `src/domain/graph/builder/stages/build-ast-data.ts`
+**Complete.** Native Rust/rusqlite pipeline for node insertion. The entire insert-nodes loop runs in Rust — receives `FileSymbols[]` via napi-rs and writes nodes, children, and edge stubs directly to SQLite via `rusqlite`, eliminating JS↔native boundary crossings.

-### 6.11 -- Native Insert Nodes Pipeline
+**Key PRs:** #654

-**Not started.** The insert-nodes phase (6.4) was optimized with JS-side batching, but native shows **no advantage** over WASM: 206ms native vs 201ms WASM. This is the single largest phase after parse on native builds.
+### 6.12 -- Native Roles & Edge Build Optimization ✅

-**Plan:**
-- **Rust-side SQLite writes via rusqlite:** Move the entire insert-nodes loop to Rust — receive the `FileSymbols[]` array in Rust and write nodes, children, and edge stubs directly to SQLite without crossing back to JS
-- **Parallel file processing:** Use Rayon to parallelize node insertion across files with per-file transactions (SQLite WAL mode supports concurrent readers)
-- **Eliminate intermediate JS objects:** Currently Rust → napi-rs → JS objects → better-sqlite3 → SQLite. The new path would be Rust → rusqlite → SQLite directly
-- **Target:** insertMs < 50ms on native full builds (current 206ms)
+**Complete.** Native Rust/rusqlite for both role classification and edge insertion. Role classification SQL moved to Rust — fan-in/fan-out aggregation + median-threshold classification in a single Rust function. Edge building uses `bulkInsertEdges` via rusqlite with chunked multi-row INSERTs. Includes `classifyRolesIncremental` for the 1-file rebuild path and `classify_dead_sub_role` for dead-entry detection.

-**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/domain/graph/builder/stages/insert-nodes.ts`
+**Key PRs:** #658

-### 6.12 -- Native Roles & Edge Build Optimization
+### 6.13 -- NativeDatabase Class (rusqlite Connection Lifecycle) ✅

-**Not started.** Roles: **52ms native ≈ 52ms WASM** on full builds, **54ms on 1-file rebuilds** (incremental optimization from 6.5/6.8 doesn't cover this path). Build edges: **108ms native vs 167ms WASM** (1.5× — modest, but native is *slower* on 1-file: 21ms vs 15ms).
+**Complete.** `NativeDatabase` napi-rs class in `crates/codegraph-core/src/native_db.rs` holding a persistent `rusqlite::Connection`. Factory methods (`openReadWrite`/`openReadonly`), lifecycle (`close`/`exec`/`pragma`), schema migrations (`initSchema` with all 16 migrations embedded), and build metadata KV (`getBuildMeta`/`setBuildMeta`). Wired into the build pipeline: when native engine is available, `NativeDatabase` handles schema init and metadata reads/writes. Foundation for 6.14+ which migrates all query and write operations to rusqlite on the native path.

-**Plan:**
-- **Roles — move SQL to Rust:** The role classification logic (median-threshold fan-in/fan-out comparisons) is simple but issues ~10 `UPDATE ... WHERE id IN (...)` statements. Moving this to Rust with `rusqlite` eliminates JS↔SQLite round-trips and allows the fan-in/fan-out aggregation + classification to happen in a single Rust function
-- **Build edges — fix 1-file regression:** Profile why native 1-file edge building (21ms) is 40% slower than WASM (15ms). Likely cause: napi-rs deserialization overhead for the caller/callee lookup data that exceeds the savings on small workloads
-- **Build edges — Rust-side batch:** For full builds, move the edge resolution loop to Rust to avoid per-edge JS↔native boundary crossings
-- **Target:** rolesMs < 15ms, edgesMs < 30ms on native full builds
+**Key PRs:** #666

-**Affected files:** `crates/codegraph-core/src/lib.rs`, `src/domain/graph/builder/stages/build-edges.ts`, `src/graph/classifiers/roles.ts`
+### 6.14 -- Native Read Queries (Repository Migration) ✅

-### 6.13 -- NativeDatabase Class (rusqlite Connection Lifecycle)
-
-**Not started.** Foundation for moving all DB operations to `rusqlite` on the native engine path. Currently `better-sqlite3` (JS) handles all DB operations for both engines, and `rusqlite` is only used for bulk AST node insertion (6.9/PR #651). The goal is: **native engine → rusqlite for all DB; WASM engine → better-sqlite3 for all DB** — eliminating the dual-SQLite-in-one-process problem and unlocking Rust-speed for every query.
-
-**Plan:**
-- **Create `NativeDatabase` napi-rs class** in `crates/codegraph-core/src/native_db.rs` holding a `rusqlite::Connection`
-- **Expose lifecycle methods:** `openReadWrite(dbPath)`, `openReadonly(dbPath)`, `close()`, `exec(sql)`, `pragma(sql)`
-- **Implement `initSchema()`** — embed migration DDL strings in Rust, run via `rusqlite`
-- **Implement `getBuildMeta(key)` / `setBuildMeta(entries)`** — metadata KV operations
-- **Add `NativeDatabase` to `NativeAddon` interface** in `src/types.ts`
-- **Wire `src/db/connection.ts`** to return `NativeDatabase` when native engine is active, `better-sqlite3` otherwise
+**Complete.** All Repository read methods migrated to Rust via `NativeDatabase`. `NativeRepository extends Repository` delegates all methods to `NativeDatabase` napi calls. `NodeQuery` fluent builder replicated in Rust for dynamic filtering. `openRepo()` returns `NativeRepository` when native engine is available.

-**Affected files:** `crates/codegraph-core/src/native_db.rs` (new), `crates/codegraph-core/src/lib.rs`, `src/types.ts`, `src/db/connection.ts`, `src/db/migrations.ts`
+**Key PRs:** #671

-### 6.14 -- Native Read Queries (Repository Migration)
+### 6.15 -- Native Write Operations (Build Pipeline) ✅

-**Not started.** Migrate all 41 `Repository` read methods to Rust, so every query runs via `rusqlite` on the native engine. The existing `Repository` abstract class and `SqliteRepository` provide the exact seam — each method is a fixed SQL query with typed parameters and results.
+**Complete.** All build-pipeline write operations migrated to `NativeDatabase` rusqlite. Consolidated scattered rusqlite usage from 6.9–6.12 into `NativeDatabase` methods. `batchInsertNodes`, `batchInsertEdges`, `purgeFilesData`, complexity/CFG/dataflow/co-change writes, `upsertFileHashes`, and `updateExportedFlags` all run via rusqlite on native. `PipelineContext` threads `NativeDatabase` through all build stages.

-**Plan:**
-- **Implement each Repository method as a Rust method on `NativeDatabase`:** Start with simple ones (`countNodes`, `countEdges`, `countFiles`, `findNodeById`), then fixed-SQL edge queries (16 methods), then parameterized queries with dynamic filtering
-- **Replicate `NodeQuery` fluent builder in Rust:** The dynamic SQL builder used by `findNodesWithFanIn`, `findNodesForTriage`, `listFunctionNodes` must produce identical SQL and results
-- **Create `NativeRepository extends Repository`** in `src/db/repository/native-repository.ts` — delegates all 41 methods to `NativeDatabase` napi calls
-- **Wire `openRepo()` to return `NativeRepository`** when native engine is available
-- **Parity test suite:** Run every Repository method on both `SqliteRepository` and `NativeRepository` against the same DB, assert identical output
-
-**Affected files:** `crates/codegraph-core/src/native_db.rs`, `src/db/repository/native-repository.ts` (new), `src/db/repository/index.ts`, `src/db/query-builder.ts`
-
-### 6.15 -- Native Write Operations (Build Pipeline)
-
-**Not started.** Migrate all build-pipeline write operations to `rusqlite`, so the entire build (parse → insert → finalize) uses a single Rust-side DB connection on native. This consolidates the scattered rusqlite usage from 6.9–6.12 into the `NativeDatabase` class and adds the remaining write paths.
-
-**Plan:**
-- **Migrate `batchInsertNodes` and `batchInsertEdges`** — high-value; currently the hottest build path after parse
-- **Migrate `purgeFilesData`** — cascade DELETE across 10 tables during incremental rebuilds
-- **Migrate complexity/CFG/dataflow/co-change writes** — consolidate the per-phase Rust inserts from 6.9/6.10 into `NativeDatabase` methods
-- **Migrate `upsertFileHashes` and `updateExportedFlags`** — finalize-phase operations
-- **Consolidate `bulk_insert_ast_nodes`** into `NativeDatabase` (currently opens its own separate connection)
-- **Update `PipelineContext`** to thread `NativeDatabase` through all build stages when native engine is active
-- **Transactional parity testing:** Verify that partial failures, rollbacks, and WAL behavior are identical between engines
-
-**Affected files:** `crates/codegraph-core/src/native_db.rs`, `crates/codegraph-core/src/ast_db.rs`, `src/domain/graph/builder/context.ts`, `src/domain/graph/builder/helpers.ts`, `src/domain/graph/builder/stages/*.ts`
+**Key PRs:** #669

 ### 6.16 -- Dynamic SQL & Edge Cases

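Section 6.12 of the roadmap diff above describes role classification as fan-in/fan-out aggregation followed by median-threshold classification in one function. Assuming per-node fan-in and fan-out counts have already been aggregated, the idea can be sketched as below. The role names and decision rules are hypothetical placeholders: the commit does not spell out codegraph's actual role taxonomy, and the real dead-entry check (`classify_dead_sub_role`) also inspects file paths, which this sketch ignores.

```typescript
// Hypothetical node shape: id plus precomputed fan-in / fan-out edge counts.
interface NodeStats {
  id: number;
  fanIn: number;
  fanOut: number;
}

// Placeholder role names for illustration; not codegraph's actual taxonomy.
type Role = "dead" | "hub" | "core" | "orchestrator" | "leaf";

// Median of a numeric array (mean of the two middle values for even lengths).
function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Classify every node against global medians, grouping ids by role so each
// role can be written back with a single `UPDATE ... WHERE id IN (...)`.
function classifyRoles(nodes: NodeStats[]): Map<Role, number[]> {
  const medIn = median(nodes.map((n) => n.fanIn));
  const medOut = median(nodes.map((n) => n.fanOut));
  const byRole = new Map<Role, number[]>();
  for (const n of nodes) {
    const highIn = n.fanIn > medIn;
    const highOut = n.fanOut > medOut;
    let role: Role;
    if (n.fanIn === 0) role = "dead"; // simplification: no callers at all
    else if (highIn && highOut) role = "hub";
    else if (highIn) role = "core";
    else if (highOut) role = "orchestrator";
    else role = "leaf";
    const ids = byRole.get(role) ?? [];
    ids.push(n.id);
    byRole.set(role, ids);
  }
  return byRole;
}
```

Collecting ids per role is what lets the write-back shrink to roughly one `UPDATE ... WHERE id IN (...)` per role instead of one statement per node, the batching pattern the old 6.12 plan mentions. Using global medians, rather than medians over only the changed files, keeps thresholds consistent between full and incremental runs.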