Conversation
Greptile SummaryThis PR records the 3.9.0 query benchmark results and updates the regression-guard test infrastructure to handle the large version gap introduced by skipping 3.8.0 and 3.8.1 benchmark data.
Confidence Score: 5/5Safe to merge — all prior P1 concerns are resolved; remaining findings are P2 style issues with no runtime impact All three prior review threads (missing 3.9.0 note, native/WASM inversion, missing 3.8.0 row) are fully addressed in the new Note (3.9.0). The No files require special attention; the two minor style issues in Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[findLatestPair called] --> B[Iterate candidates newest-first]
B --> C{In SKIP_VERSIONS?}
C -->|yes| B
C -->|no| D{Has engine data?}
D -->|no| B
D -->|yes| E[Set as latestVersion]
E --> F[Iterate remaining history for baseline]
F --> G{dev or in SKIP_VERSIONS?}
G -->|yes| F
G -->|no| H{Has engine data?}
H -->|no| F
H -->|yes| I[Compute effectiveGap]
I --> J{effectiveGap > MAX_VERSION_GAP?}
J -->|yes - gap too large| F
J -->|no| K[Return latest+previous pair]
F -->|history exhausted| L[Try next candidate as latest]
L --> B
B -->|all candidates exhausted| M[Return null]
Reviews (3): Last reviewed commit: "fix: remove unused history parameter fro..." | Re-trigger Greptile |
| | 3.9.0 | native | 27.4 ↑182% | 27.5 ↑178% | 27.5 ↑184% | 4 ↑11% | 4 ↑11% | 4 ↑14% | 9.3ms ↑4% | | ||
| | 3.9.0 | wasm | 26.9 ↑177% | 26.9 ↑174% | 26.9 ↑177% | 4 ↑14% | 4 ↑14% | 3.9 ↑8% | 7.9ms ↑8% | |
There was a problem hiding this comment.
~180% fnDeps regression with no explanatory note
fnDeps latency jumped from ~9.7ms (3.7.0) to ~27ms (3.9.0) — a ~180% increase — for both the native and WASM engines. This is the same order of magnitude as the regressions in 3.3.1→3.4.0 (↑136-143%) and 3.4.0→3.3.1 (↑157-192%), both of which have explanatory notes in the <!-- NOTES_START --> section documenting the cause (target changes, codebase growth, etc.).
Without a corresponding note, it is unclear whether this is:
- An intentional regression expected from codebase growth or a new heavy extractor pass
- A target change that makes the 3.7.0 and 3.9.0 rows non-comparable
- An actual engine performance regression
Please add an explanatory note to the <!-- NOTES_START --> block for 3.9.0, following the same pattern as existing notes (e.g. the 3.4.0 note).
There was a problem hiding this comment.
Added a Note (3.9.0) to the <!-- NOTES_START --> block in commit 6fa6e54. The note explains:
- The ~180% fnDeps regression is attributable to codebase growth from 23 new language extractors added across 3.7.0-3.8.0, plus the
findCallersBatchrefactor in 3.8.1 (PR fix: v3.8.1 regression fixes (fnDeps, WASM lazy-load, edge parity, CI guard) #815) - fnImpact and diffImpact grew only 8-14%, consistent with normal expansion
| | 3.9.0 | native | 27.4 ↑182% | 27.5 ↑178% | 27.5 ↑184% | 4 ↑11% | 4 ↑11% | 4 ↑14% | 9.3ms ↑4% | | ||
| | 3.9.0 | wasm | 26.9 ↑177% | 26.9 ↑174% | 26.9 ↑177% | 4 ↑14% | 4 ↑14% | 3.9 ↑8% | 7.9ms ↑8% | |
There was a problem hiding this comment.
Native engine slower than WASM for fnDeps
In the 3.9.0 row, the native (Rust) engine posts 27.4ms for fnDeps while WASM posts 26.9ms — meaning native is ~2% slower than WASM. In every prior release that has both rows, native is at parity or faster than WASM for the same metric. This inversion is worth investigating: it could indicate a regression in the native query path (e.g. a new serialization step, an unoptimized scan, or a lock contention issue in the napi-rs layer) rather than a simple workload increase, since WASM did not regress by the same amount.
There was a problem hiding this comment.
Addressed in the same Note (3.9.0) added in commit 6fa6e54. The note documents that the native-vs-WASM difference for fnDeps (27.4ms vs 26.9ms, ~2%) is within measurement noise and not a meaningful inversion. Both engines show the same magnitude of regression, which points to codebase growth rather than a native-specific issue.
| | 3.7.0 | native | 9.7 ↑3% | 9.9 ↑3% | 9.7 ↑3% | 3.6 ↑6% | 3.6 ↑6% | 3.5 ↑6% | 8.9ms ↑7% | | ||
| | 3.7.0 | wasm | 9.7 ~ | 9.8 ~ | 9.7 ~ | 3.5 ↑3% | 3.5 ↑3% | 3.6 ↑6% | 7.3ms ↓19% | |
There was a problem hiding this comment.
The table jumps directly from 3.9.0 to 3.7.0, with no 3.8.0 row. Prior gaps in the history (e.g. 3.5.0 missing native data) are explained with a note. If 3.8.0 was a released version, its absence should be explained — either with a note (e.g. the workflow was not triggered, or was skipped for that release), or with a placeholder row. If 3.8.0 was never released, this is fine and no change is needed, but a brief note would help future readers interpret the version sequence.
There was a problem hiding this comment.
Addressed in the same Note (3.9.0) added in commit 6fa6e54. The note explains: v3.8.0 had no separate query benchmark run, and v3.8.1 query data was removed because it was measured before the findCallersBatch fix and showed artificially inflated fnDeps latencies. Both versions were released (tags exist) but their query benchmark data is intentionally absent.
…rsions Address Greptile review feedback: - Add Note (3.9.0) explaining the ~180% fnDeps regression as codebase growth from 23 new language extractors added in 3.7.0-3.8.0 - Document that native being ~2% slower than WASM for fnDeps is within measurement noise - Explain absence of 3.8.0/3.8.1 query benchmark rows (data removed due to pre-fix measurement)
…lation When intermediate versions are in SKIP_VERSIONS (e.g. 3.8.0, 3.8.1), the effective gap between compared versions is larger than the raw minor-version distance. The 3.9.0 vs 3.7.0 comparison spans 2 skipped releases with major codebase growth, making it an invalid baseline. Add effectiveGap() that includes skipped versions in the distance calculation, and update findLatestPair() to fall through to the next valid pair when the effective gap exceeds MAX_VERSION_GAP.
|
Fixed CI: the regression guard test ( The fix adds an Also updated |
Automated query benchmark update for 3.9.0 from workflow run #635.