
perf(native): move analysis persistence into Rust orchestrator#907

Open
carlos-alm wants to merge 8 commits into main from fix/semver-prerelease

Conversation

@carlos-alm
Contributor

Summary

  • Rust orchestrator writes all analysis data to DB — AST nodes, complexity metrics, CFG blocks/edges, and dataflow edges are now persisted directly in the build_pipeline.rs pipeline stages, using the same single rusqlite connection. Eliminates the JS runPostNativeAnalysis step and its WASM re-parse overhead entirely.
  • Removes native-first pipeline — the JS-orchestrated native backend path (CODEGRAPH_FORCE_JS_PIPELINE, nativeFirstProxy in runPipelineStages) is removed. Only one native path exists now: the Rust orchestrator.
  • Fast-path fixes — allNativeDataComplete() skips the WASM re-parse when native data is complete; fixes for the AST bulk-insert bail on native files and the complexity bail on unsupported languages; parseFilesFull napi export for single-pass extraction.

New Rust pipeline stages (8b)

After structure/roles, before finalize:

  1. AST nodes — reuses ast_db::do_insert_ast_nodes with parent resolution
  2. Complexity — writes from Definition.complexity to function_complexity table
  3. CFG — writes blocks/edges from Definition.cfg to cfg_blocks/cfg_edges
  4. Dataflow — resolves function names to node IDs (same-file-first, global fallback), writes to dataflow table

Incremental builds scope analysis to genuinely changed files (excludes reverse-dep files), matching the existing JS behavior.
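The same-file-first, global-fallback resolution in stage 4 can be sketched as a plain lookup. This is an illustrative TypeScript model, not the actual Rust code in build_pipeline.rs; the map shapes and key format are assumptions:

```typescript
// Illustrative model of same-file-first dataflow name resolution.
// The real implementation runs in Rust against SQLite; these maps
// stand in for the (file, name) -> node_id index it builds.
function resolveDataflowNode(
  file: string,
  name: string,
  byFileAndName: Map<string, number>, // assumed key format: "file::name"
  byName: Map<string, number>,        // assumed: name -> first global match
): number | undefined {
  // Prefer a definition in the same file.
  const local = byFileAndName.get(`${file}::${name}`);
  if (local !== undefined) return local;
  // Otherwise fall back to the first global match by name.
  return byName.get(name);
}
```

A same-file definition always wins, so a locally shadowed helper resolves to the local node even when another file exports one under the same name.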

Test plan

  • CI builds Rust addon successfully (requires MSVC — not available locally)
  • Full build produces identical analysis data (complexity, AST, CFG, dataflow) as before
  • Incremental 1-file rebuild correctly scopes analysis to changed file only
  • No-op rebuild exits early with analysisComplete: true
  • WASM fallback still works when native addon is unavailable
  • Benchmark runs with 2-way comparison (WASM vs Native)

`semverCompare('3.9.3-dev.6', '3.9.1')` returned -1 (less than) because
`Number('3-dev')` is NaN, which the `|| 0` fallback turned into 0,
making the comparison `0 < 1`. This caused `shouldSkipNativeOrchestrator`
to flag all pre-release builds as "buggy", disabling the native
orchestrator fast path introduced in #897.

Strip `-<prerelease>` before splitting on `.` so the numeric comparison
sees `3.9.3` vs `3.9.1` correctly.
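A minimal sketch of the fix, assuming semverCompare splits on `.` and compares numerically as described above; this is not the project's exact implementation:

```typescript
// Sketch of the fix: strip the `-<prerelease>` suffix before the
// numeric split. Without it, Number('3-dev') is NaN, the `|| 0`
// fallback turns it into 0, and 3.9.3-dev.6 compares below 3.9.1.
function semverCompare(a: string, b: string): number {
  const parts = (v: string) =>
    v.split('-')[0].split('.').map((n) => Number(n) || 0);
  const pa = parts(a);
  const pb = parts(b);
  for (let i = 0; i < 3; i++) {
    const x = pa[i] ?? 0;
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}
```

Note this deliberately ignores prerelease precedence entirely (so `3.9.3-dev.6` compares equal to `3.9.3`), which is sufficient for the "is this build newer than the buggy release" check.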
Skip co-change, ownership, and boundary lookups when
findAffectedFunctions returns empty — all callers return early
in this case anyway. Also pass the already-loaded config to
checkBoundaryViolations to avoid a redundant loadConfig call.

Saves ~2-3ms of fixed overhead per diffImpact invocation when
the diff touches no function bodies (the common case for
comment/import/type-only changes and the benchmark probe).

Closes #904
The short-circuit path was hardcoding boundaryViolations: [] when no
functions were affected. Since boundary checks are file-scoped (not
function-scoped), import or type-alias changes can still produce real
violations. Preserve the check and align the return shape (summary: null)
with the two existing early-exit paths.
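The corrected short-circuit can be sketched as follows; the result shape, helper names, and parameters are hypothetical, following the commit message rather than the project's actual types:

```typescript
// Hypothetical shape of the diffImpact short-circuit described above.
// Boundary checks are file-scoped, so they must still run even when
// no function bodies changed; summary: null matches the other early exits.
interface ImpactResult {
  affected: string[];
  boundaryViolations: string[];
  summary: object | null;
}

function shortCircuitImpact(
  affectedFunctions: string[],
  changedFiles: string[],
  checkBoundaryViolations: (files: string[]) => string[],
): ImpactResult | undefined {
  if (affectedFunctions.length > 0) return undefined; // defer to full analysis
  return {
    affected: [],
    // Previously hardcoded to []; file-scoped checks can still find
    // violations from import or type-alias changes.
    boundaryViolations: checkBoundaryViolations(changedFiles),
    summary: null,
  };
}
```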
Add AST, complexity, CFG, and dataflow write stages to the Rust build
pipeline (build_pipeline.rs), eliminating the JS runPostNativeAnalysis
step and its WASM re-parse overhead. The orchestrator now writes all
analysis data directly to DB from the parsed FileSymbols, using the
same single rusqlite connection.

New pipeline stages (8b) after structure/roles:
- AST nodes: reuses ast_db::do_insert_ast_nodes with parent resolution
- Complexity: writes metrics from Definition.complexity to function_complexity
- CFG: writes blocks/edges from Definition.cfg to cfg_blocks/cfg_edges
- Dataflow: resolves function names to node IDs and writes to dataflow table

Also removes the native-first pipeline (JS-orchestrated with native
backend) since the Rust orchestrator now handles everything end-to-end.
Removes CODEGRAPH_FORCE_JS_PIPELINE env var, runPostNativeAnalysis,
and the third benchmark variant.

Includes prior fast-path fixes from this branch:
- allNativeDataComplete() fast path in ast-analysis engine
- Fix AST tryNativeBulkInsert bail on native-parsed files
- Fix complexity collectNativeBulkRows bail on unsupported languages
- parseFilesFull napi export for single-pass extraction
@greptile-apps
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR moves AST, complexity, CFG, and dataflow persistence into the Rust pipeline as Stage 8b, eliminating the JS runPostNativeAnalysis step and its WASM re-parse overhead. It also removes the JS-orchestrated native-first pipeline, adds a parseFilesFull NAPI export for single-pass full extraction, and wires a fast-path in runAnalyses to skip re-parse when native data is already complete.

  • pipeline.ts regression: Removing the if (ctx.nativeFirstProxy) early-return leaves runPipelineStages reaching the suspendNativeDb guard with ctx.db still set to the NativeDbProxy. This closes the proxy's backing NativeDatabase connection while ctx.db still points at it, causing every subsequent stage DB call to fail on a closed connection. This path is exercised on any build after a codegraph version upgrade, schema change, or engine switch (all of which set forceFullRebuild = true).

Confidence Score: 3/5

Not safe to merge as-is: the proxy-corruption regression will silently break builds on every codegraph version upgrade when native engine is available.

Prior P0/P1 comments (analysis_complete accuracy, bare return compile errors, N-query node map, complexity guard) are all addressed in the fixup commit. The remaining P1 is new: removing the native-first early return in runPipelineStages causes the NativeDbProxy to be suspended mid-use on any forceFullRebuild path, breaking all subsequent stage DB operations. This affects a common, recurrent scenario (post-upgrade first build with native engine). The P2 findings are minor and non-blocking.

src/domain/graph/builder/pipeline.ts — the suspendNativeDb guard at line 589 needs a nativeFirstProxy check or a proxy-to-BetterSQLite handoff before the fallback stages run.

Important Files Changed

Filename Overview
src/domain/graph/builder/pipeline.ts Removes native-first pipeline block, but the WASM fallback path still calls suspendNativeDb when ctx.db is already a NativeDbProxy — closing its backing connection and breaking all subsequent stage DB operations when native is available + forceFullRebuild.
crates/codegraph-core/src/build_pipeline.rs Adds Stage 8b: AST/complexity/CFG/dataflow persistence in Rust. Core logic is sound post-fixes (analysis_complete, temp-table batch, complexity guard); no new issues found in the write helpers.
src/ast-analysis/engine.ts Adds allNativeDataComplete fast path to skip WASM re-parse; logic is mostly correct with a minor false-negative on empty fileSymbols.
crates/codegraph-core/src/parallel.rs New parse_files_parallel_full extracts all analysis data in one pass; _root_dir unused (consistent with existing parse_files_parallel), but the guarantee of complexity/CFG inclusion depends on extract_symbols_with_opts internals.
crates/codegraph-core/src/config.rs Adds complexity: Option to BuildOpts, correctly gating write_complexity in the Rust pipeline.
crates/codegraph-core/src/lib.rs Exports new parse_files_full NAPI function; straightforward delegation to parse_files_parallel_full.
src/domain/parser.ts parseFilesAuto and parseFileAuto now always pass true for dataflow/ast to force full extraction; falls back to parseFiles when parseFilesFull is unavailable (older addon).
src/features/ast.ts Tightens bulk-insert bail condition: now only bails on WASM trees, not on presence of calls — fixes false bail-out for native-parsed files with call sites.
src/features/complexity.ts Adds langSupported guard to skip unsupported languages instead of bailing out of the entire native bulk path — correct fix for the language-support gap.
scripts/benchmark.ts Refactors duplicate engine-result object literals into a shared formatEngineResult helper — clean DRY improvement, no logic change.
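The "minor false-negative on empty fileSymbols" flagged for src/ast-analysis/engine.ts can be illustrated with a minimal completeness predicate; the field names here are assumptions, not the real FileSymbols shape:

```typescript
// Hypothetical sketch of a completeness check: every parsed file must
// already carry its analysis data for the WASM re-parse to be skippable.
interface FileSymbolsLite {
  complexity?: object;
  cfg?: object;
}

function allNativeDataComplete(files: FileSymbolsLite[]): boolean {
  // This length guard is the false negative: an empty file set has
  // nothing to re-parse, yet the fast path reports "incomplete" and
  // the caller falls back to the slow path anyway.
  if (files.length === 0) return false;
  return files.every((f) => f.complexity !== undefined && f.cfg !== undefined);
}
```

Dropping the guard (plain `Array.prototype.every` returns true on an empty array) would make the empty case take the fast path; the cost of the current behavior is only a wasted no-op re-parse.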

Sequence Diagram

sequenceDiagram
    participant BG as buildGraph()
    participant SP as setupPipeline
    participant TNO as tryNativeOrchestrator
    participant RPS as runPipelineStages
    participant Rust as Rust run_pipeline

    BG->>SP: init
    SP-->>BG: ctx.db=NativeDbProxy, ctx.nativeDb=open, nativeFirstProxy=true

    BG->>TNO: call
    alt orchestrator runs (normal path)
        TNO->>Rust: buildGraph incl. Stage 8b
        Rust-->>TNO: analysis_complete=true
        TNO-->>BG: BuildResult (no JS post-analysis)
    else orchestrator skipped (forceFullRebuild)
        TNO-->>BG: undefined
        BG->>RPS: call
        RPS->>RPS: suspendNativeDb closes ctx.nativeDb
        Note over RPS: ctx.db proxy backed by CLOSED connection
        RPS->>RPS: collectFiles DB error
    else WASM fallback with parseFilesFull
        BG->>RPS: call nativeFirstProxy=false
        RPS->>RPS: parseFiles parseFilesFull fills all data
        RPS->>RPS: runAnalyses allNativeDataComplete=true
        RPS-->>BG: done
    end

Reviews (3): Last reviewed commit: "fix(native): fix Rust compile errors in ..."

@@ -422,6 +481,7 @@ pub fn run_pipeline(
is_full_build: change_result.is_full_build,
Contributor


P1 analysis_complete reflects intent, not actual write success

do_analysis is true as long as any of include_ast | include_dataflow | include_cfg is set — it is evaluated before any DB writes happen. All three write functions (write_complexity, write_cfg, write_dataflow) swallow transaction and insert errors with let _ =, so they can return silently with nothing written. The pipeline still returns analysis_complete: true, and since runPostNativeAnalysis was removed from the JS side there is no fallback. On an incremental build the affected file won't be re-analysed on the next run (content hasn't changed), so missing analysis data persists until a forced full rebuild.

A minimal guard would track whether each stage actually committed data:

let mut analysis_succeeded = false;
if do_analysis {
    let ast_ok = if include_ast { ast_db::do_insert_ast_nodes(conn, &ast_batches).is_ok() } else { true };
    let cx_ok = write_complexity(conn, &file_symbols, &analysis_file_set, &node_id_map);
    let cfg_ok = if include_cfg { write_cfg(conn, &file_symbols, &analysis_file_set, &node_id_map) } else { true };
    let df_ok = if include_dataflow { write_dataflow(conn, &file_symbols, &analysis_file_set) } else { true };
    analysis_succeeded = ast_ok && cx_ok && cfg_ok && df_ok;
}
// ...
analysis_complete: analysis_succeeded,

(requires write helpers to return bool/Result)

Contributor Author


Fixed in 7c5830f. The three write functions (write_complexity, write_cfg, write_dataflow) now return bool reflecting whether the transaction committed successfully. analysis_complete is set to do_analysis && analysis_ok where analysis_ok tracks the conjunction of all enabled stage results. If any stage fails to commit, the flag will be false and the next incremental build won't skip re-analysis.

};

// Build node ID lookup: (file, name, line) -> node_id
let node_id_map = build_analysis_node_map(conn, &analysis_file_set);
Contributor


P2 build_analysis_node_map executes N per-file queries

The helper issues one SELECT … WHERE file = ?1 per entry in files. For a full build of a large codebase (e.g. 2 000 files) this is 2 000 round-trips through prepared-statement execution inside a single SQLite connection. A single WHERE file IN (…) or a temporary-table join would reduce this to one query. SQLite handles this reasonably well in practice, but it is worth batching for consistency with the rest of the pipeline (e.g. resolve_imports_batch).

Contributor Author


Fixed in 7c5830f. Replaced the N per-file SELECT ... WHERE file = ?1 loop with a single temp-table join: insert all file names into temp._analysis_files, then SELECT ... FROM nodes INNER JOIN temp._analysis_files. This reduces 2000 round-trips on a large full build to a single query.
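The batching idea behind this fix can be sketched by generating one parameterized `IN (…)` query instead of N per-file statements; the actual fix uses a temp-table join on the rusqlite side, and the column names below follow the `(file, name, line) -> node_id` lookup described in the diff context:

```typescript
// Sketch of query batching: one round-trip with SQLite-style ?1..?N
// placeholders instead of N executions of `... WHERE file = ?1`.
// (The merged fix uses temp._analysis_files + INNER JOIN instead,
// which also avoids SQLite's bound-parameter limit for huge file sets.)
function buildNodeMapQuery(files: string[]): { sql: string; params: string[] } {
  const placeholders = files.map((_, i) => `?${i + 1}`).join(', ');
  return {
    sql: `SELECT id, file, name, line FROM nodes WHERE file IN (${placeholders})`,
    params: files,
  };
}
```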

Comment on lines +406 to +407
let include_cfg = opts.cfg.unwrap_or(true);
let do_analysis = include_ast || include_dataflow || include_cfg;
Contributor


P2 opts.complexity is not checked — complexity is always written when do_analysis is true

do_analysis is include_ast || include_dataflow || include_cfg, so write_complexity runs whenever any of those three opts is enabled, regardless of opts.complexity. The JS pipeline previously respected opts.complexity !== false (via runAnalyses → buildComplexityMetrics). A caller that passes { ast: true, complexity: false } now gets complexity rows written anyway from the Rust side. If this is intentional (complexity is always cheap to persist), it should be documented; otherwise an include_complexity guard should be added to mirror the JS behaviour.

Contributor Author


Fixed in 7c5830f. Added complexity: Option<bool> to BuildOpts (Rust side) and gated write_complexity behind let include_complexity = opts.complexity.unwrap_or(true). The JS BuildGraphOpts already has complexity?: boolean and is serialized to JSON, so the Rust side now respects it. A caller passing { ast: true, complexity: false } will no longer get complexity rows written from the Rust pipeline.

Merge origin/main into fix/semver-prerelease. The conflict in
pipeline.ts was between PR #906 (NativeDbProxy overhead fix adding
runPostNativeAnalysis back) and this PR which removes that function
entirely (analysis now persisted in Rust). Kept the PR's version
since the Rust orchestrator handles analysis persistence directly.
…907)

- write_complexity/write_cfg/write_dataflow now return bool reflecting
  whether the transaction committed successfully. analysis_complete is
  only true when all enabled stages actually succeeded, preventing
  silent data loss on incremental builds with no fallback.
- Add complexity field to BuildOpts so write_complexity respects the
  opts.complexity flag, matching JS pipeline behavior.
- Batch build_analysis_node_map into a single temp-table join query
  instead of N per-file prepared-statement executions.
@carlos-alm
Contributor Author

@greptileai

@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Codegraph Impact Analysis

24 functions changed · 26 callers affected across 16 files

  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:108 (0 transitive callers)
  • build_analysis_node_map in crates/codegraph-core/src/build_pipeline.rs:1013 (1 transitive caller)
  • build_ast_batches in crates/codegraph-core/src/build_pipeline.rs:1063 (1 transitive caller)
  • write_complexity in crates/codegraph-core/src/build_pipeline.rs:1091 (1 transitive caller)
  • insert_def_complexity in crates/codegraph-core/src/build_pipeline.rs:1116 (2 transitive callers)
  • write_cfg in crates/codegraph-core/src/build_pipeline.rs:1170 (1 transitive caller)
  • write_def_cfg in crates/codegraph-core/src/build_pipeline.rs:1225 (2 transitive callers)
  • write_dataflow in crates/codegraph-core/src/build_pipeline.rs:1275 (1 transitive caller)
  • resolve_dataflow_node in crates/codegraph-core/src/build_pipeline.rs:1393 (2 transitive callers)
  • parse_files_full in crates/codegraph-core/src/lib.rs:74 (0 transitive callers)
  • parse_files_parallel in crates/codegraph-core/src/parallel.rs:17 (0 transitive callers)
  • parse_files_parallel_full in crates/codegraph-core/src/parallel.rs:48 (0 transitive callers)
  • formatEngineResult in scripts/benchmark.ts:40 (1 transitive caller)
  • allNativeDataComplete in src/ast-analysis/engine.ts:676 (3 transitive callers)
  • runAnalyses in src/ast-analysis/engine.ts:744 (3 transitive callers)
  • setupPipeline in src/domain/graph/builder/pipeline.ts:114 (6 transitive callers)
  • NativeOrchestratorResult.analysisComplete in src/domain/graph/builder/pipeline.ts:264 (0 transitive callers)
  • tryNativeOrchestrator in src/domain/graph/builder/pipeline.ts:477 (5 transitive callers)
  • runPipelineStages in src/domain/graph/builder/pipeline.ts:582 (6 transitive callers)
  • parseFileAuto in src/domain/parser.ts:775 (4 transitive callers)

Comment on lines +1186 to +1195
Err(_) => return,
};

let mut edge_stmt = match tx.prepare(
"INSERT INTO cfg_edges \
(function_node_id, source_block_id, target_block_id, kind) \
VALUES (?1, ?2, ?3, ?4)",
) {
Ok(s) => s,
Err(_) => return,
Contributor


P0 Bare return; in -> bool function is a compile error

Both prepare arms in write_cfg use Err(_) => return, — bare return; is return () in Rust. Since write_cfg declares -> bool, returning () is a type mismatch (error[E0308]). The same pattern occurs in write_dataflow at lines 1288, 1296, and 1305. All five arms should use return false to match the declared return type and correctly propagate failure to the analysis_ok accumulator.

Suggested change

    -        Err(_) => return,
    +        Err(_) => return false,
         };

         let mut edge_stmt = match tx.prepare(
             "INSERT INTO cfg_edges \
              (function_node_id, source_block_id, target_block_id, kind) \
              VALUES (?1, ?2, ?3, ?4)",
         ) {
             Ok(s) => s,
    -        Err(_) => return,
    +        Err(_) => return false,

Contributor Author


Already fixed in 4161f5f — all bare return statements in write_cfg and write_dataflow are now return false. The Rust compile check is now passing in CI.

Comment on lines +1288 to +1305
Err(_) => return,
};

let mut local_stmt = match tx.prepare(
"SELECT id FROM nodes WHERE name = ?1 AND file = ?2 \
AND kind IN ('function','method') LIMIT 1",
) {
Ok(s) => s,
Err(_) => return,
};

let mut global_stmt = match tx.prepare(
"SELECT id FROM nodes WHERE name = ?1 \
AND kind IN ('function','method') \
ORDER BY file, line LIMIT 1",
) {
Ok(s) => s,
Err(_) => return,
Contributor


P0 Bare return; in -> bool function — three occurrences in write_dataflow

write_dataflow has the same issue: Err(_) => return, on lines 1288, 1296, and 1305. All three are compile errors (() returned from a -> bool function). Each needs return false.

Suggested change

    -        Err(_) => return,
    +        Err(_) => return false,
         };

         let mut local_stmt = match tx.prepare(
             "SELECT id FROM nodes WHERE name = ?1 AND file = ?2 \
              AND kind IN ('function','method') LIMIT 1",
         ) {
             Ok(s) => s,
    -        Err(_) => return,
    +        Err(_) => return false,
         };

         let mut global_stmt = match tx.prepare(
             "SELECT id FROM nodes WHERE name = ?1 \
              AND kind IN ('function','method') \
              ORDER BY file, line LIMIT 1",
         ) {
             Ok(s) => s,
    -        Err(_) => return,
    +        Err(_) => return false,

Contributor Author


Already fixed in 4161f5f — all three bare return statements in write_dataflow are now return false, and prepared statements are dropped before tx.commit() to release borrows. Rust compile check is green in CI.

…907)

- Change bare `return` to `return false` in write_cfg and write_dataflow
  since they now return bool
- Drop prepared statements before tx.commit() to release borrows on the
  transaction, fixing E0505 move-out-of-borrowed errors
@carlos-alm
Contributor Author

Sweep status

Merge conflicts: Resolved (merge commit 4432eb3).

Review feedback addressed:

  • P1: analysis_complete now tracks actual write success (7c5830f)
  • P2: build_analysis_node_map batched into single temp-table join (7c5830f)
  • P2: opts.complexity respected via new BuildOpts.complexity field (7c5830f)
  • P0 x2: Bare return compile errors + borrow-after-move fixed (4161f5f)

CI status:

  • Rust compile, all 6 platform builds, lint, TS type check, security audit, CLA: all green
  • Tests failing on all 3 platforms (ubuntu, macos, windows) with 6 failures:
    • build-parity: AST nodes empty after native build (allNativeDataComplete fast-path issue)
    • build.test.ts (x2): "NativeDatabase is closed" on version/engine mismatch full rebuild
    • incremental-parity (x3): complexity/CFG/dataflow = 0 after incremental build

Remaining issue (needs human review): Greptile's re-review identified a P1 regression: removing the nativeFirstProxy early-return in runPipelineStages causes suspendNativeDb to close the NativeDbProxy's backing connection on any forceFullRebuild path (version upgrade, schema change, engine switch). This is the root cause of the "NativeDatabase is closed" test failures and the incremental-parity failures where analysis data is empty. The build-parity AST test failure is likely related to the same proxy lifecycle issue. These are pre-existing bugs in the PR's design, not introduced by the merge or review fixes.
