docs: update incremental benchmarks (3.9.2)#901
github-actions[bot] wants to merge 3 commits into main
Greptile Summary

This is an automated benchmark recording PR that adds 3.9.2 incremental benchmark data and adds a '3.9.2:Full build' entry to KNOWN_REGRESSIONS.

Confidence Score: 5/5

Safe to merge: docs-only benchmark recording with correct label matching and self-cleaning stale detection. No P0 or P1 findings. The KNOWN_REGRESSIONS entry label exactly matches the checkRegression call label, the stale-detection test enforces future cleanup, and the regression itself is tracked in PR #906. No files require special attention.

| Filename | Overview |
|---|---|
| generated/benchmarks/INCREMENTAL-BENCHMARKS.md | Adds 3.9.2 benchmark row to the summary table and full phase-level JSON entry; numbers are internally consistent (native full build 9.4s ↑81%, WASM 7.2s ↑4%, native incremental metrics improved). |
| tests/benchmarks/regression-guard.test.ts | Adds '3.9.2:Full build' to KNOWN_REGRESSIONS with root-cause comment; label matches the checkRegression call at line 479, and the existing stale-detection test will enforce cleanup after 3.11.x ships. |
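
For readers unfamiliar with the stale-detection rule mentioned in the table, here is a minimal sketch of how such a check could work. The `minorGap` name and the `'3.9.2:Full build'` label format come from this PR; the function bodies, the `isStale` helper, and the handling of major-version changes are assumptions for illustration, not the actual code in regression-guard.test.ts.

```typescript
// Hypothetical sketch: distance in minor versions between two "major.minor.patch" strings.
function minorGap(from: string, to: string): number {
  const [fromMajor, fromMinor] = from.split(".").map(Number);
  const [toMajor, toMinor] = to.split(".").map(Number);
  // Assumption: any major-version change is treated as an unbounded gap.
  if (toMajor !== fromMajor) return Infinity;
  return toMinor - fromMinor;
}

// Hypothetical helper: a KNOWN_REGRESSIONS entry such as "3.9.2:Full build"
// is stale once the package is more than one minor version ahead of it.
function isStale(entry: string, pkgVersion: string): boolean {
  const entryVersion = entry.split(":")[0];
  return minorGap(entryVersion, pkgVersion) > 1;
}
```

Under these assumptions the entry stays valid through 3.10.x (gap of 1) and is flagged once the package reaches 3.11.x (gap of 2), which matches the behavior described in the review.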
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[regression-guard.test.ts reads INCREMENTAL_BENCHMARK_DATA] --> B[findLatestPair: 3.9.2 vs 3.9.1]
B --> C{native engine}
B --> D{wasm engine}
C --> E["checkRegression('Full build', 9403, 5206)\n+81% > 25% threshold"]
E --> F{KNOWN_REGRESSIONS.has\n'3.9.2:Full build'?}
F -- Yes → skip --> G[Test passes ✓]
C --> H["checkRegression('No-op rebuild', 8, 15)\n-47%, improvement → skip"]
C --> I["checkRegression('1-file rebuild', 555, 757)\n-27%, improvement → skip"]
D --> J["checkRegression('Full build', 7216, 6900)\n+4% < 25% threshold → pass"]
D --> K["checkRegression('No-op rebuild', 17, 14)\n+21%, but Δ=3ms < MIN_ABSOLUTE_DELTA=10 → skip"]
G --> L[stale-detection test checks entry version gap]
L --> M{"minorGap('3.9.2', pkgVersion) > 1?"}
M -- "3.10.x: gap=1 → no" --> N[Still valid]
M -- "3.11.x: gap=2 → yes" --> O[Test FAILS — forces cleanup]
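
The decision path in the flowchart can be sketched roughly as follows. The 25% threshold, `MIN_ABSOLUTE_DELTA` of 10 ms, and the `KNOWN_REGRESSIONS` label format are taken from the flowchart; the `classify` function, its `Verdict` type, and the order of the checks are assumptions made for illustration and may differ from the real `checkRegression`.

```typescript
// Values taken from the flowchart; names other than KNOWN_REGRESSIONS and
// MIN_ABSOLUTE_DELTA are hypothetical.
const REGRESSION_THRESHOLD = 0.25; // 25% relative slowdown
const MIN_ABSOLUTE_DELTA = 10; // ms; smaller deltas are treated as noise
const KNOWN_REGRESSIONS = new Set<string>(["3.9.2:Full build"]);

type Verdict = "pass" | "skip-improvement" | "skip-small-delta" | "skip-known" | "fail";

// Hypothetical classifier mirroring the branches shown in the flowchart.
function classify(version: string, label: string, current: number, previous: number): Verdict {
  const delta = current - previous;
  if (delta <= 0) return "skip-improvement";
  if (delta < MIN_ABSOLUTE_DELTA) return "skip-small-delta";
  if (delta / previous <= REGRESSION_THRESHOLD) return "pass";
  // Over threshold: only an explicitly acknowledged regression may pass.
  return KNOWN_REGRESSIONS.has(`${version}:${label}`) ? "skip-known" : "fail";
}
```

With the numbers above, the native full build (9403 vs 5206) is the only metric that exceeds the threshold, and it is skipped solely because of the `KNOWN_REGRESSIONS` entry. Removing that entry would make the same call fail.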
Reviews (4): Last reviewed commit: "fix(test): add 3.9.2 native full-build t..."
| 3.9.2 | native | 667 | 9.4s ↑81% | 8ms ↓47% | 555ms ↓27% | 6ms ↓17% | 11ms ↓14% |
| 3.9.2 | wasm | 667 | 7.2s ↑4% | 17ms ↑21% | 598ms ~ | 6ms ↓17% | 11ms ↓14% |
Native engine slower than WASM — first time historically
The 3.9.2 native full build (9.4s) is now slower than WASM (7.2s), reversing every prior release (e.g., 3.9.1: native 5.2s vs WASM 6.9s). The +81% build time increase far outpaces the +17% file count growth. The phase breakdown in the 1-file profile shows the bottleneck is outside parsing — setupMs (37.8ms vs 1.8ms WASM), structureMs (127.9ms vs 34.6ms), cfgMs (53.6ms vs 0.3ms), and dataflowMs (31.3ms vs 0.5ms) are all dramatically worse in the native path. The code change driving this regression should be identified and addressed before 3.9.2 ships.
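
To make the size of the gap concrete, the quoted figures can be turned into ratios. This is only arithmetic on the numbers cited in this comment and in the flowchart; the `phases` object and the `slowdown` helper are hypothetical names, not part of the benchmark tooling.

```typescript
// Phase timings in ms from the 1-file profile quoted above.
const phases = {
  setupMs: { native: 37.8, wasm: 1.8 },
  structureMs: { native: 127.9, wasm: 34.6 },
  cfgMs: { native: 53.6, wasm: 0.3 },
  dataflowMs: { native: 31.3, wasm: 0.5 },
};

// Hypothetical helper: how many times slower the native phase is than WASM.
function slowdown(native: number, wasm: number): number {
  return native / wasm;
}

for (const [name, t] of Object.entries(phases)) {
  console.log(`${name}: native is ${slowdown(t.native, t.wasm).toFixed(1)}x WASM`);
}

// Native full-build change from 3.9.1 (5206 ms) to 3.9.2 (9403 ms).
const fullBuildChange = (9403 - 5206) / 5206;
console.log(`full build: +${Math.round(fullBuildChange * 100)}%`); // +81%
```

The ratios range from roughly 4x (structure) to well over 100x (CFG), which supports the reading that the slowdown sits in per-phase overhead, not in parsing.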
Valid concern. This PR is a docs-only recording of benchmark results — the regression itself requires changes to the native engine in crates/codegraph-core/, which is outside this PR's scope.
Created #903 to track the native engine full-build regression with the phase breakdown you identified (setup, structure, CFG, dataflow phases all significantly slower).
NativeDbProxy overhead causes native full build to regress +81% (5206ms -> 9403ms). Fix tracked in PR #906. Add to KNOWN_REGRESSIONS to unblock this benchmark data PR.
Automated incremental benchmark update for 3.9.2 from workflow run #689.