
Commit 5ee0070

feat(ci): gate release workflow on resolution precision/recall thresholds (#886)
* feat(ci): gate release workflow on resolution precision/recall thresholds (#875)

Add resolution quality gates to the benchmark pipeline so regressions are caught before publishing:
- benchmark.yml: run vitest resolution test after the benchmark script, failing the workflow if any language drops below its threshold
- update-benchmark-report.ts: warn on precision >5pp or recall >10pp drop per language between releases
- regression-guard.test.ts: hard-fail CI on precision/recall regressions across releases, with KNOWN_REGRESSIONS exemption support

* style: fix biome formatting in regression guard

* fix: add SYNC comments for duplicated thresholds and eliminate redundant file read (#886)

Add cross-reference SYNC comments between regression-guard.test.ts and update-benchmark-report.ts so the duplicated precision/recall thresholds stay in lockstep. Replace the second extractJsonData call with a type cast of buildHistory since both read the same file and marker.

* fix: add timeout-minutes to resolution gate step (#886)

Prevents a hanging WASM build from stalling the entire benchmark job indefinitely. 30-minute cap is generous enough for the full language fixture suite while still bounding worst-case CI time.

* fix: document resolution key format in KNOWN_REGRESSIONS comment (#886)
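The release-to-release checks (the warning in update-benchmark-report.ts and the hard fail in regression-guard.test.ts) both reason about how far a language's precision or recall fell between two releases, measured in percentage points (pp). Assuming the metrics are stored as fractions in [0, 1], a 5pp threshold is the constant 0.05. A minimal TypeScript sketch of that comparison; the `ResolutionMetrics` shape and the `resolutionDrops` helper are hypothetical names for illustration, not code from this commit:

```typescript
// Hypothetical per-language resolution metrics, stored as fractions in [0, 1].
interface ResolutionMetrics {
  precision: number;
  recall: number;
}

type Metric = "precision" | "recall";

interface FlaggedDrop {
  lang: string;
  metric: Metric;
  drop: number; // previous - current; 0.05 means 5 percentage points
}

// Thresholds from the commit message: precision >5pp, recall >10pp.
const DROP_THRESHOLDS: Record<Metric, number> = { precision: 0.05, recall: 0.1 };
const METRICS: Metric[] = ["precision", "recall"];

// Flags every language/metric whose drop strictly exceeds its threshold.
// Languages absent from the current release are skipped, not flagged.
function resolutionDrops(
  previous: Record<string, ResolutionMetrics>,
  current: Record<string, ResolutionMetrics>,
): FlaggedDrop[] {
  const flagged: FlaggedDrop[] = [];
  for (const lang of Object.keys(previous)) {
    const before = previous[lang];
    const after = current[lang];
    if (after === undefined) continue;
    for (const metric of METRICS) {
      const drop = before[metric] - after[metric];
      if (drop > DROP_THRESHOLDS[metric]) {
        flagged.push({ lang, metric, drop });
      }
    }
  }
  return flagged;
}

// Made-up numbers: python precision falls 7pp and go recall falls 15pp,
// so both are flagged; the other two changes stay within bounds.
const flagged = resolutionDrops(
  { python: { precision: 0.95, recall: 0.8 }, go: { precision: 0.9, recall: 0.7 } },
  { python: { precision: 0.88, recall: 0.78 }, go: { precision: 0.89, recall: 0.55 } },
);
console.log(flagged.map((d) => `${d.lang} ${d.metric}`));
```

Whether a language that disappears between releases should itself count as a regression is a policy question this sketch deliberately leaves open.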
1 parent ef7c834 commit 5ee0070


3 files changed, 10 additions and 4 deletions


.github/workflows/benchmark.yml

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ jobs:
 
 - name: Gate on resolution thresholds
   if: steps.existing.outputs.skip != 'true'
+  timeout-minutes: 30
   run: npx vitest run tests/benchmarks/resolution/resolution-benchmark.test.ts --reporter=verbose
 
 - name: Merge resolution into build result
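The new `timeout-minutes: 30` only bounds how long the gate step may run; the pass/fail decision itself lives in `resolution-benchmark.test.ts`, whose per-language floors this commit does not show. A minimal TypeScript sketch of such a floor check, where the `belowFloor` helper and every floor value are hypothetical placeholders rather than the repository's actual thresholds:

```typescript
// Hypothetical per-language metrics and floors, as fractions in [0, 1].
interface Metrics {
  precision: number;
  recall: number;
}

// Returns one human-readable failure per language/metric under its floor.
// A language with a floor but no measured result counts as a failure,
// so a fixture that silently stops producing data cannot pass the gate.
function belowFloor(
  results: Record<string, Metrics>,
  floors: Record<string, Metrics>,
): string[] {
  const failures: string[] = [];
  for (const lang of Object.keys(floors)) {
    const floor = floors[lang];
    const actual = results[lang];
    if (actual === undefined) {
      failures.push(`${lang}: no resolution results`);
      continue;
    }
    if (actual.precision < floor.precision) {
      failures.push(`${lang} precision ${actual.precision} < ${floor.precision}`);
    }
    if (actual.recall < floor.recall) {
      failures.push(`${lang} recall ${actual.recall} < ${floor.recall}`);
    }
  }
  return failures;
}

// Illustrative floors only; not the values the real test enforces.
const floors: Record<string, Metrics> = {
  python: { precision: 0.9, recall: 0.8 },
  rust: { precision: 0.85, recall: 0.7 },
};

// In CI, a non-empty list would become a failing test, and therefore a
// non-zero exit from the vitest step.
const failures = belowFloor({ python: { precision: 0.92, recall: 0.75 } }, floors);
console.log(failures);
```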

scripts/update-benchmark-report.ts

Lines changed: 2 additions & 0 deletions
@@ -322,6 +322,8 @@ if (prev) {
 
 // ── Resolution regression detection ─────────────────────────────────────
 // Resolution metrics are "higher is better" — warn when they DROP.
+// SYNC: These must match PRECISION_DROP_PP / RECALL_DROP_PP in
+// tests/benchmarks/regression-guard.test.ts (the hard-fail gate side).
 const PRECISION_DROP_THRESHOLD = 0.05; // warn if precision drops >5pp
 const RECALL_DROP_THRESHOLD = 0.10; // warn if recall drops >10pp
 
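This side of the pair reports drops as GitHub Actions annotations rather than failures. The `::warning::` workflow command is standard GitHub Actions syntax, but how `update-benchmark-report.ts` actually words its message is not shown in this diff; `resolutionWarning` and `escapeData` below are hypothetical helpers in a TypeScript sketch that reuses the two threshold constants from the hunk above:

```typescript
// Threshold constants as they appear in scripts/update-benchmark-report.ts.
const PRECISION_DROP_THRESHOLD = 0.05; // warn if precision drops >5pp
const RECALL_DROP_THRESHOLD = 0.10; // warn if recall drops >10pp

type Metric = "precision" | "recall";

// Workflow-command data must escape %, CR and LF so the runner parses it.
function escapeData(message: string): string {
  return message.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Hypothetical helper: returns a ::warning line when the drop exceeds the
// threshold for that metric, or null when the change is within bounds.
function resolutionWarning(
  lang: string,
  metric: Metric,
  previous: number,
  current: number,
): string | null {
  const threshold = metric === "precision" ? PRECISION_DROP_THRESHOLD : RECALL_DROP_THRESHOLD;
  const drop = previous - current;
  if (drop <= threshold) return null;
  const pp = (drop * 100).toFixed(1);
  const message = `${lang} ${metric} dropped ${pp}pp (${previous.toFixed(3)} to ${current.toFixed(3)})`;
  return `::warning::${escapeData(message)}`;
}

// Printing the line to stdout is what makes the runner create the annotation.
const line = resolutionWarning("python", "precision", 0.95, 0.88);
if (line !== null) console.log(line);
```

Because a warning never changes the exit code, the report job still publishes; the hard stop is left to the regression guard.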

tests/benchmarks/regression-guard.test.ts

Lines changed: 7 additions & 4 deletions
@@ -61,6 +61,7 @@ const SKIP_VERSIONS = new Set(['3.8.0']);
  * underlying issue is being fixed.
  *
  * Format: "version:metric-label" (must match the label passed to checkRegression).
+ * Resolution keys use: "version:resolution <lang> precision" or "version:resolution <lang> recall".
  *
  * - 3.9.0:1-file rebuild — native incremental path re-runs graph-wide phases
  * (structureMs, AST, CFG, dataflow) on single-file rebuilds. Documented in
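The comment pins down the exemption-key shape: the release version, a colon, then the exact label passed to `checkRegression`. A hypothetical TypeScript sketch of building and looking up such keys, assuming `KNOWN_REGRESSIONS` is a set of strings (its actual type is not visible in this diff); `resolutionLabel` and `isKnownRegression` are illustrative names, and the `1.2.3` entry is made up:

```typescript
type Metric = "precision" | "recall";

// Hypothetical: the label a resolution check would pass to checkRegression.
function resolutionLabel(lang: string, metric: Metric): string {
  return `resolution ${lang} ${metric}`;
}

// Exemption keys are "version:metric-label", per the comment in the hunk.
function isKnownRegression(
  known: ReadonlySet<string>,
  version: string,
  label: string,
): boolean {
  return known.has(`${version}:${label}`);
}

// Assumed shape; the first entry is the documented 3.9.0 exemption, the
// second a made-up resolution entry.
const KNOWN_REGRESSIONS: ReadonlySet<string> = new Set([
  "3.9.0:1-file rebuild",
  "1.2.3:resolution python recall",
]);

console.log(isKnownRegression(KNOWN_REGRESSIONS, "1.2.3", resolutionLabel("python", "recall")));
```

Deriving the key from one label helper is what keeps the exemption string and the checked label from drifting apart; a hand-typed key with a stray space would silently never match.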
@@ -521,6 +522,9 @@ describe('Benchmark regression guard', () => {
  * Precision >5pp drop and recall >10pp drop are flagged.
  * Recall has a wider threshold because it's more volatile — adding new
  * expected edges to fixtures can temporarily lower recall.
+ *
+ * SYNC: These must match PRECISION_DROP_THRESHOLD / RECALL_DROP_THRESHOLD
+ * in scripts/update-benchmark-report.ts (the ::warning annotation side).
  */
 const PRECISION_DROP_PP = 0.05;
 const RECALL_DROP_PP = 0.1;
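On the hard-fail side the same two constants decide whether a release-to-release drop breaks CI. The real test goes through `checkRegression` inside vitest; the following is a dependency-free TypeScript approximation that throws instead, with `assertNoResolutionRegression` and its exemption-set parameter as hypothetical stand-ins:

```typescript
const PRECISION_DROP_PP = 0.05;
const RECALL_DROP_PP = 0.1;

type Metric = "precision" | "recall";

// Hypothetical stand-in for the vitest assertion: throws when a metric
// drops by more than its allowance, unless the key is explicitly exempted.
function assertNoResolutionRegression(
  version: string,
  lang: string,
  metric: Metric,
  previous: number,
  current: number,
  knownRegressions: ReadonlySet<string>,
): void {
  const allowed = metric === "precision" ? PRECISION_DROP_PP : RECALL_DROP_PP;
  const drop = previous - current;
  if (drop <= allowed) return;
  const key = `${version}:resolution ${lang} ${metric}`;
  if (knownRegressions.has(key)) return; // documented, tolerated regression
  throw new Error(
    `${key} regressed by ${(drop * 100).toFixed(1)}pp (allowed ${(allowed * 100).toFixed(0)}pp)`,
  );
}

// A 6pp recall drop stays inside the 10pp allowance, so this passes.
assertNoResolutionRegression("1.2.3", "python", "recall", 0.8, 0.74, new Set<string>());
```

The same 6pp drop on precision would throw, which is the asymmetry the comment above justifies: adding expected edges to fixtures can lower recall temporarily without any real loss in quality.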
@@ -539,10 +543,9 @@ describe('Benchmark regression guard', () => {
 resolution?: Record<string, ResolutionLang>;
 }
 
-const fullHistory = extractJsonData<BuildEntryWithResolution>(
-  path.join(BENCHMARKS_DIR, 'BUILD-BENCHMARKS.md'),
-  'BENCHMARK_DATA',
-);
+// buildHistory already parsed BUILD-BENCHMARKS.md with the same marker;
+// widen the type instead of re-reading the file.
+const fullHistory = buildHistory as BuildEntryWithResolution[];
 
 const resolutionPair = findLatestPair(fullHistory, (e) => e.resolution != null);
 
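This hunk swaps a second parse of `BUILD-BENCHMARKS.md` for a type assertion. An `as` cast only changes what the compiler believes and performs no runtime check, which is why the next line still filters on `e.resolution != null`. A hypothetical TypeScript sketch of the pattern, where the `version` field, the `ResolutionLang` fields, the newest-first ordering and this `findLatestPair` implementation are all assumptions rather than the repository's code:

```typescript
// Assumed shapes; only `resolution?: Record<string, ResolutionLang>` is
// confirmed by the diff.
interface ResolutionLang {
  precision: number;
  recall: number;
}
interface BuildEntry {
  version: string;
}
interface BuildEntryWithResolution extends BuildEntry {
  resolution?: Record<string, ResolutionLang>;
}

// Hypothetical: assumes history is ordered newest first and returns the two
// most recent entries that satisfy the predicate, or null if fewer exist.
function findLatestPair<T>(history: readonly T[], hasData: (e: T) => boolean): [T, T] | null {
  const matching = history.filter(hasData);
  return matching.length >= 2 ? [matching[0], matching[1]] : null;
}

// Stands in for the single parse of the benchmark data; versions are made up.
const buildHistory: BuildEntry[] = JSON.parse(
  JSON.stringify([
    { version: "3.3.0", resolution: { python: { precision: 0.93, recall: 0.81 } } },
    { version: "3.2.0" },
    { version: "3.1.0", resolution: { python: { precision: 0.9, recall: 0.8 } } },
  ]),
);

// Widening the element type is a compile-time assertion only; entries
// without resolution data are still present and must be filtered out.
const fullHistory = buildHistory as BuildEntryWithResolution[];
const resolutionPair = findLatestPair(fullHistory, (e) => e.resolution != null);
console.log(resolutionPair?.map((e) => e.version));
```

The cast is safe here because both variables come from the same file and marker, so the only thing lost by not re-reading is a redundant parse.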
