Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,7 @@ jobs:

- name: Gate on resolution thresholds
if: steps.existing.outputs.skip != 'true'
timeout-minutes: 30
run: npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts --reporter=verbose
Comment on lines +109 to +112
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No timeout on the gate step

The resolution test suite builds graphs for every language fixture (~30 + languages, 60 s beforeAll budget each). Neither this step nor the build-benchmark job has a timeout-minutes cap, so a hanging WASM build can stall the entire job indefinitely. Consider adding a step-level timeout-minutes: 30 to bound the gate:

Suggested change
- name: Gate on resolution thresholds
if: steps.existing.outputs.skip != 'true'
run: npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts --reporter=verbose
- name: Gate on resolution thresholds
if: steps.existing.outputs.skip != 'true'
timeout-minutes: 30
run: npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts --reporter=verbose

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added timeout-minutes: 30 to the gate step. 30 minutes is generous enough for the full language fixture suite while bounding worst-case CI stalls.


- name: Merge resolution into build result
Expand Down
2 changes: 2 additions & 0 deletions scripts/update-benchmark-report.ts
Original file line number Diff line number Diff line change
Expand Up @@ -322,6 +322,8 @@ if (prev) {

// ── Resolution regression detection ─────────────────────────────────────
// Resolution metrics are "higher is better" — warn when they DROP.
// SYNC: These must match PRECISION_DROP_PP / RECALL_DROP_PP in
// tests/benchmarks/regression-guard.test.ts (the hard-fail gate side).
const PRECISION_DROP_THRESHOLD = 0.05; // warn if precision drops >5pp
const RECALL_DROP_THRESHOLD = 0.10; // warn if recall drops >10pp

Expand Down
11 changes: 7 additions & 4 deletions tests/benchmarks/regression-guard.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ const SKIP_VERSIONS = new Set(['3.8.0']);
* underlying issue is being fixed.
*
* Format: "version:metric-label" (must match the label passed to checkRegression).
* Resolution keys use: "version:resolution <lang> precision" or "version:resolution <lang> recall".
*
* - 3.9.0:1-file rebuild — native incremental path re-runs graph-wide phases
* (structureMs, AST, CFG, dataflow) on single-file rebuilds. Documented in
Expand Down Expand Up @@ -521,6 +522,9 @@ describe('Benchmark regression guard', () => {
* Precision >5pp drop and recall >10pp drop are flagged.
* Recall has a wider threshold because it's more volatile — adding new
* expected edges to fixtures can temporarily lower recall.
*
* SYNC: These must match PRECISION_DROP_THRESHOLD / RECALL_DROP_THRESHOLD
* in scripts/update-benchmark-report.ts (the ::warning annotation side).
*/
const PRECISION_DROP_PP = 0.05;
const RECALL_DROP_PP = 0.1;
Comment on lines +529 to +530
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Threshold constants duplicated across files

The 5 pp / 10 pp drop limits are defined independently here and again as PRECISION_DROP_THRESHOLD / RECALL_DROP_THRESHOLD in scripts/update-benchmark-report.ts (lines 325–326). If one side is tightened (e.g. recall narrowed from 10 pp to 5 pp), the ::warning signal and the hard-fail gate silently diverge — engineers would see no warning but a failing test, with no obvious reason for the mismatch. Consider extracting a shared constants file, or at minimum adding a cross-reference comment pointing to the mirror definition.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added bidirectional SYNC comments to both regression-guard.test.ts (pointing to the script) and update-benchmark-report.ts (pointing to the test). Extracting a shared constants file was considered but the script runs as a standalone node invocation in CI, so cross-reference comments are the pragmatic choice here.

Expand All @@ -539,10 +543,9 @@ describe('Benchmark regression guard', () => {
resolution?: Record<string, ResolutionLang>;
}

const fullHistory = extractJsonData<BuildEntryWithResolution>(
path.join(BENCHMARKS_DIR, 'BUILD-BENCHMARKS.md'),
'BENCHMARK_DATA',
);
// buildHistory already parsed BUILD-BENCHMARKS.md with the same marker;
// widen the type instead of re-reading the file.
const fullHistory = buildHistory as BuildEntryWithResolution[];

const resolutionPair = findLatestPair(fullHistory, (e) => e.resolution != null);

Expand Down
Loading