perf(native): move analysis persistence into Rust orchestrator#907
perf(native): move analysis persistence into Rust orchestrator#907carlos-alm wants to merge 8 commits intomainfrom
Conversation
`semverCompare('3.9.3-dev.6', '3.9.1')` returned -1 (less than) because
`Number('3-dev')` is NaN, which the `|| 0` fallback turned into 0,
making the comparison `0 < 1`. This caused `shouldSkipNativeOrchestrator`
to flag all pre-release builds as "buggy", disabling the native
orchestrator fast path introduced in #897.
Strip `-<prerelease>` before splitting on `.` so the numeric comparison
sees `3.9.3` vs `3.9.1` correctly.
Skip co-change, ownership, and boundary lookups when findAffectedFunctions returns empty — all callers return early on this case anyway. Also pass the already-loaded config to checkBoundaryViolations to avoid a redundant loadConfig call. Saves ~2-3ms of fixed overhead per diffImpact invocation when the diff touches no function bodies (the common case for comment/import/type-only changes and the benchmark probe). Closes #904
The short-circuit path was hardcoding boundaryViolations: [] when no functions were affected. Since boundary checks are file-scoped (not function-scoped), import or type-alias changes can still produce real violations. Preserve the check and align the return shape (summary: null) with the two existing early-exit paths.
Add AST, complexity, CFG, and dataflow write stages to the Rust build pipeline (build_pipeline.rs), eliminating the JS runPostNativeAnalysis step and its WASM re-parse overhead. The orchestrator now writes all analysis data directly to DB from the parsed FileSymbols, using the same single rusqlite connection. New pipeline stages (8b) after structure/roles: - AST nodes: reuses ast_db::do_insert_ast_nodes with parent resolution - Complexity: writes metrics from Definition.complexity to function_complexity - CFG: writes blocks/edges from Definition.cfg to cfg_blocks/cfg_edges - Dataflow: resolves function names to node IDs and writes to dataflow table Also removes the native-first pipeline (JS-orchestrated with native backend) since the Rust orchestrator now handles everything end-to-end. Removes CODEGRAPH_FORCE_JS_PIPELINE env var, runPostNativeAnalysis, and the third benchmark variant. Includes prior fast-path fixes from this branch: - allNativeDataComplete() fast path in ast-analysis engine - Fix AST tryNativeBulkInsert bail on native-parsed files - Fix complexity collectNativeBulkRows bail on unsupported languages - parseFilesFull napi export for single-pass extraction
Greptile SummaryThis PR moves AST, complexity, CFG, and dataflow persistence into the Rust pipeline as Stage 8b, eliminating the JS
Confidence Score: 3/5Not safe to merge as-is: the proxy-corruption regression will silently break builds on every codegraph version upgrade when native engine is available. Prior P0/P1 comments (analysis_complete accuracy, bare return compile errors, N-query node map, complexity guard) are all addressed in the fixup commit. The remaining P1 is new: removing the native-first early return in runPipelineStages causes the NativeDbProxy to be suspended mid-use on any forceFullRebuild path, breaking all subsequent stage DB operations. This affects a common, recurrent scenario (post-upgrade first build with native engine). The P2 findings are minor and non-blocking. src/domain/graph/builder/pipeline.ts — the suspendNativeDb guard at line 589 needs a nativeFirstProxy check or a proxy-to-BetterSQLite handoff before the fallback stages run. Important Files Changed
Sequence DiagramsequenceDiagram
participant BG as buildGraph()
participant SP as setupPipeline
participant TNO as tryNativeOrchestrator
participant RPS as runPipelineStages
participant Rust as Rust run_pipeline
BG->>SP: init
SP-->>BG: ctx.db=NativeDbProxy, ctx.nativeDb=open, nativeFirstProxy=true
BG->>TNO: call
alt orchestrator runs (normal path)
TNO->>Rust: buildGraph incl. Stage 8b
Rust-->>TNO: analysis_complete=true
TNO-->>BG: BuildResult (no JS post-analysis)
else orchestrator skipped (forceFullRebuild)
TNO-->>BG: undefined
BG->>RPS: call
RPS->>RPS: suspendNativeDb closes ctx.nativeDb
Note over RPS: ctx.db proxy backed by CLOSED connection
RPS->>RPS: collectFiles DB error
else WASM fallback with parseFilesFull
BG->>RPS: call nativeFirstProxy=false
RPS->>RPS: parseFiles parseFilesFull fills all data
RPS->>RPS: runAnalyses allNativeDataComplete=true
RPS-->>BG: done
end
Reviews (3): Last reviewed commit: "fix(native): fix Rust compile errors in ..." | Re-trigger Greptile |
| @@ -422,6 +481,7 @@ pub fn run_pipeline( | |||
| is_full_build: change_result.is_full_build, | |||
There was a problem hiding this comment.
analysis_complete reflects intent, not actual write success
do_analysis is true as long as any of include_ast | include_dataflow | include_cfg is set — it is evaluated before any DB writes happen. All three write functions (write_complexity, write_cfg, write_dataflow) swallow transaction and insert errors with let _ =, so they can return silently with nothing written. The pipeline still returns analysis_complete: true, and since runPostNativeAnalysis was removed from the JS side there is no fallback. On an incremental build the affected file won't be re-analysed on the next run (content hasn't changed), so missing analysis data persists until a forced full rebuild.
A minimal guard would track whether each stage actually committed data:
let mut analysis_succeeded = false;
if do_analysis {
let ast_ok = if include_ast { ast_db::do_insert_ast_nodes(conn, &ast_batches).is_ok() } else { true };
let cx_ok = write_complexity(conn, &file_symbols, &analysis_file_set, &node_id_map);
let cfg_ok = if include_cfg { write_cfg(conn, &file_symbols, &analysis_file_set, &node_id_map) } else { true };
let df_ok = if include_dataflow { write_dataflow(conn, &file_symbols, &analysis_file_set) } else { true };
analysis_succeeded = ast_ok && cx_ok && cfg_ok && df_ok;
}
// ...
analysis_complete: analysis_succeeded,(requires write helpers to return bool/Result)
There was a problem hiding this comment.
Fixed in 7c5830f. The three write functions (write_complexity, write_cfg, write_dataflow) now return bool reflecting whether the transaction committed successfully. analysis_complete is set to do_analysis && analysis_ok where analysis_ok tracks the conjunction of all enabled stage results. If any stage fails to commit, the flag will be false and the next incremental build won't skip re-analysis.
| }; | ||
|
|
||
| // Build node ID lookup: (file, name, line) -> node_id | ||
| let node_id_map = build_analysis_node_map(conn, &analysis_file_set); |
There was a problem hiding this comment.
build_analysis_node_map executes N per-file queries
The helper issues one SELECT … WHERE file = ?1 per entry in files. For a full build of a large codebase (e.g. 2 000 files) this is 2 000 round-trips through prepared-statement execution inside a single SQLite connection. A single WHERE file IN (…) or a temporary-table join would reduce this to one query. SQLite handles this reasonably well in practice, but it is worth batching for consistency with the rest of the pipeline (e.g. resolve_imports_batch).
There was a problem hiding this comment.
Fixed in 7c5830f. Replaced the N per-file SELECT ... WHERE file = ?1 loop with a single temp-table join: insert all file names into temp._analysis_files, then SELECT ... FROM nodes INNER JOIN temp._analysis_files. This reduces 2000 round-trips on a large full build to a single query.
| let include_cfg = opts.cfg.unwrap_or(true); | ||
| let do_analysis = include_ast || include_dataflow || include_cfg; |
There was a problem hiding this comment.
opts.complexity is not checked — complexity is always written when do_analysis is true
do_analysis is include_ast || include_dataflow || include_cfg, so write_complexity runs whenever any of those three opts is enabled, regardless of opts.complexity. The JS pipeline previously respected opts.complexity !== false (via runAnalyses → buildComplexityMetrics). A caller that passes { ast: true, complexity: false } now gets complexity rows written anyway from the Rust side. If this is intentional (complexity is always cheap to persist), it should be documented; otherwise an include_complexity guard should be added to mirror the JS behaviour.
There was a problem hiding this comment.
Fixed in 7c5830f. Added complexity: Option<bool> to BuildOpts (Rust side) and gated write_complexity behind let include_complexity = opts.complexity.unwrap_or(true). The JS BuildGraphOpts already has complexity?: boolean and is serialized to JSON, so the Rust side now respects it. A caller passing { ast: true, complexity: false } will no longer get complexity rows written from the Rust pipeline.
Merge origin/main into fix/semver-prerelease. The conflict in pipeline.ts was between PR #906 (NativeDbProxy overhead fix adding runPostNativeAnalysis back) and this PR which removes that function entirely (analysis now persisted in Rust). Kept the PR's version since the Rust orchestrator handles analysis persistence directly.
…907) - write_complexity/write_cfg/write_dataflow now return bool reflecting whether the transaction committed successfully. analysis_complete is only true when all enabled stages actually succeeded, preventing silent data loss on incremental builds with no fallback. - Add complexity field to BuildOpts so write_complexity respects the opts.complexity flag, matching JS pipeline behavior. - Batch build_analysis_node_map into a single temp-table join query instead of N per-file prepared-statement executions.
Codegraph Impact Analysis24 functions changed → 26 callers affected across 16 files
|
| Err(_) => return, | ||
| }; | ||
|
|
||
| let mut edge_stmt = match tx.prepare( | ||
| "INSERT INTO cfg_edges \ | ||
| (function_node_id, source_block_id, target_block_id, kind) \ | ||
| VALUES (?1, ?2, ?3, ?4)", | ||
| ) { | ||
| Ok(s) => s, | ||
| Err(_) => return, |
There was a problem hiding this comment.
Bare
return; in -> bool function is a compile error
Both prepare arms in write_cfg use Err(_) => return, — bare return; is return () in Rust. Since write_cfg declares -> bool, returning () is a type mismatch (error[E0308]). The same pattern occurs in write_dataflow at lines 1288, 1296, and 1305. All five arms should use return false to match the declared return type and correctly propagate failure to the analysis_ok accumulator.
| Err(_) => return, | |
| }; | |
| let mut edge_stmt = match tx.prepare( | |
| "INSERT INTO cfg_edges \ | |
| (function_node_id, source_block_id, target_block_id, kind) \ | |
| VALUES (?1, ?2, ?3, ?4)", | |
| ) { | |
| Ok(s) => s, | |
| Err(_) => return, | |
| Err(_) => return false, | |
| }; | |
| let mut edge_stmt = match tx.prepare( | |
| "INSERT INTO cfg_edges \ | |
| (function_node_id, source_block_id, target_block_id, kind) \ | |
| VALUES (?1, ?2, ?3, ?4)", | |
| ) { | |
| Ok(s) => s, | |
| Err(_) => return false, |
There was a problem hiding this comment.
Already fixed in 4161f5f — all bare return statements in write_cfg and write_dataflow are now return false. The Rust compile check is now passing in CI.
| Err(_) => return, | ||
| }; | ||
|
|
||
| let mut local_stmt = match tx.prepare( | ||
| "SELECT id FROM nodes WHERE name = ?1 AND file = ?2 \ | ||
| AND kind IN ('function','method') LIMIT 1", | ||
| ) { | ||
| Ok(s) => s, | ||
| Err(_) => return, | ||
| }; | ||
|
|
||
| let mut global_stmt = match tx.prepare( | ||
| "SELECT id FROM nodes WHERE name = ?1 \ | ||
| AND kind IN ('function','method') \ | ||
| ORDER BY file, line LIMIT 1", | ||
| ) { | ||
| Ok(s) => s, | ||
| Err(_) => return, |
There was a problem hiding this comment.
Bare
return; in -> bool function — three occurrences in write_dataflow
write_dataflow has the same issue: Err(_) => return, on lines 1288, 1296, and 1305. All three are compile errors (() returned from a -> bool function). Each needs return false.
| Err(_) => return, | |
| }; | |
| let mut local_stmt = match tx.prepare( | |
| "SELECT id FROM nodes WHERE name = ?1 AND file = ?2 \ | |
| AND kind IN ('function','method') LIMIT 1", | |
| ) { | |
| Ok(s) => s, | |
| Err(_) => return, | |
| }; | |
| let mut global_stmt = match tx.prepare( | |
| "SELECT id FROM nodes WHERE name = ?1 \ | |
| AND kind IN ('function','method') \ | |
| ORDER BY file, line LIMIT 1", | |
| ) { | |
| Ok(s) => s, | |
| Err(_) => return, | |
| Err(_) => return false, | |
| }; | |
| let mut local_stmt = match tx.prepare( | |
| "SELECT id FROM nodes WHERE name = ?1 AND file = ?2 \ | |
| AND kind IN ('function','method') LIMIT 1", | |
| ) { | |
| Ok(s) => s, | |
| Err(_) => return false, | |
| }; | |
| let mut global_stmt = match tx.prepare( | |
| "SELECT id FROM nodes WHERE name = ?1 \ | |
| AND kind IN ('function','method') \ | |
| ORDER BY file, line LIMIT 1", | |
| ) { | |
| Ok(s) => s, | |
| Err(_) => return false, |
There was a problem hiding this comment.
Already fixed in 4161f5f — all three bare return statements in write_dataflow are now return false, and prepared statements are dropped before tx.commit() to release borrows. Rust compile check is green in CI.
…907) - Change bare `return` to `return false` in write_cfg and write_dataflow since they now return bool - Drop prepared statements before tx.commit() to release borrows on the transaction, fixing E0505 move-out-of-borrowed errors
Sweep statusMerge conflicts: Resolved (merge commit 4432eb3). Review feedback addressed:
CI status:
Remaining issue (needs human review): Greptile's re-review identified a P1 regression: removing the |
Summary
build_pipeline.rspipeline stages, using the same single rusqlite connection. Eliminates the JSrunPostNativeAnalysisstep and its WASM re-parse overhead entirely.CODEGRAPH_FORCE_JS_PIPELINE,nativeFirstProxyinrunPipelineStages) is removed. Only one native path exists now: the Rust orchestrator.allNativeDataComplete()skips WASM re-parse when native data is complete; fixes for AST bulk insert bail on native files and complexity bail on unsupported languages;parseFilesFullnapi export for single-pass extraction.New Rust pipeline stages (8b)
After structure/roles, before finalize:
ast_db::do_insert_ast_nodeswith parent resolutionDefinition.complexitytofunction_complexitytableDefinition.cfgtocfg_blocks/cfg_edgesdataflowtableIncremental builds scope analysis to genuinely changed files (excludes reverse-dep files), matching the existing JS behavior.
Test plan
analysisComplete: true