feat(ci): gate release workflow on resolution precision/recall thresholds #886
Changes from all commits
147874b
0583f57
f209268
8b532b2
4c4aa03
54bb452

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -61,6 +61,7 @@ const SKIP_VERSIONS = new Set(['3.8.0']);` |
| 61 | 61 | `* underlying issue is being fixed.` |
| 62 | 62 | `*` |
| 63 | 63 | `* Format: "version:metric-label" (must match the label passed to checkRegression).` |
| | 64 | `+ * Resolution keys use: "version:resolution <lang> precision" or "version:resolution <lang> recall".` |
| 64 | 65 | `*` |
| 65 | 66 | `* - 3.9.0:1-file rebuild — native incremental path re-runs graph-wide phases` |
| 66 | 67 | `* (structureMs, AST, CFG, dataflow) on single-file rebuilds. Documented in` |
| | | `@@ -521,6 +522,9 @@ describe('Benchmark regression guard', () => {` |
| 521 | 522 | `* Precision >5pp drop and recall >10pp drop are flagged.` |
| 522 | 523 | `* Recall has a wider threshold because it's more volatile — adding new` |
| 523 | 524 | `* expected edges to fixtures can temporarily lower recall.` |
| | 525 | `+ *` |
| | 526 | `+ * SYNC: These must match PRECISION_DROP_THRESHOLD / RECALL_DROP_THRESHOLD` |
| | 527 | `+ * in scripts/update-benchmark-report.ts (the ::warning annotation side).` |
| 524 | 528 | `*/` |
| 525 | 529 | `const PRECISION_DROP_PP = 0.05;` |
| 526 | 530 | `const RECALL_DROP_PP = 0.1;` |

Comment on lines +529 to +530 (Contributor):
The 5 pp / 10 pp drop limits are defined independently here and again as
Contributor (Author):
Fixed: added bidirectional SYNC comments to both
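
The thresholds and skip-key format discussed in this thread can be sketched as follows. This is a minimal illustration, not the repository's implementation: only the two constants and the `"version:resolution <lang> precision"` key format come from the diff, while `ResolutionScores`, `resolutionKey`, `findResolutionRegressions`, and the `python` language label are hypothetical names introduced here. It assumes precision and recall are stored as fractions in [0, 1], so a 5 pp drop is 0.05.

```typescript
// Values copied from the diff: 5 pp and 10 pp expressed as fractions.
const PRECISION_DROP_PP = 0.05;
const RECALL_DROP_PP = 0.1;

// Hypothetical shape for one language's scores (fractions in [0, 1]).
interface ResolutionScores {
  precision: number;
  recall: number;
}

type Metric = 'precision' | 'recall';

// Builds a key in the documented "version:resolution <lang> <metric>" format.
function resolutionKey(version: string, lang: string, metric: Metric): string {
  return `${version}:resolution ${lang} ${metric}`;
}

// Returns the keys of metrics whose drop exceeds its threshold and that are
// not listed in the known-regressions set (hypothetical helper).
function findResolutionRegressions(
  version: string,
  lang: string,
  previous: ResolutionScores,
  latest: ResolutionScores,
  knownRegressions: ReadonlySet<string> = new Set<string>(),
): string[] {
  const limits: Record<Metric, number> = {
    precision: PRECISION_DROP_PP,
    recall: RECALL_DROP_PP,
  };
  const flagged: string[] = [];
  for (const metric of ['precision', 'recall'] as const) {
    const drop = previous[metric] - latest[metric];
    const key = resolutionKey(version, lang, metric);
    if (drop > limits[metric] && !knownRegressions.has(key)) {
      flagged.push(key);
    }
  }
  return flagged;
}

// Precision falls by 6 pp (flagged); recall falls by 8 pp (within the 10 pp limit).
const flaggedKeys = findResolutionRegressions(
  '3.9.0',
  'python',
  { precision: 0.95, recall: 0.8 },
  { precision: 0.89, recall: 0.72 },
);
console.log(flaggedKeys);
```

Because the recall limit is wider, the same absolute drop can fail the precision check while passing the recall check, which matches the rationale given in the code comment.
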

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -539,10 +543,9 @@ describe('Benchmark regression guard', () => {` |
| 539 | 543 | `resolution?: Record<string, ResolutionLang>;` |
| 540 | 544 | `}` |
| 541 | 545 | |
| 542 | | `- const fullHistory = extractJsonData<BuildEntryWithResolution>(` |
| 543 | | `- path.join(BENCHMARKS_DIR, 'BUILD-BENCHMARKS.md'),` |
| 544 | | `- 'BENCHMARK_DATA',` |
| 545 | | `- );` |
| | 546 | `+ // buildHistory already parsed BUILD-BENCHMARKS.md with the same marker;` |
| | 547 | `+ // widen the type instead of re-reading the file.` |
| | 548 | `+ const fullHistory = buildHistory as BuildEntryWithResolution[];` |
| 546 | 549 | |
| 547 | 550 | `const resolutionPair = findLatestPair(fullHistory, (e) => e.resolution != null);` |
| 548 | 551 | |

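
The type-widening change and the `findLatestPair` call in this hunk can be illustrated with a small self-contained sketch. The real `findLatestPair` is not shown in the diff, so the version below is a hypothetical stand-in: it assumes the history is ordered newest first and that the helper returns the two most recent entries satisfying the predicate, or `null` when fewer than two exist. The fields of `ResolutionLang`, the version strings, and the sample data are also assumptions.

```typescript
// Hypothetical entry shapes; only `resolution?: Record<string, ResolutionLang>`
// appears in the diff.
interface BuildEntry {
  version: string;
}

interface ResolutionLang {
  precision: number;
  recall: number;
}

interface BuildEntryWithResolution extends BuildEntry {
  resolution?: Record<string, ResolutionLang>;
}

// Stand-in for the project's helper: picks the two most recent entries that
// satisfy the predicate, assuming `history` is ordered newest first.
function findLatestPair<T>(
  history: readonly T[],
  hasMetric: (entry: T) => boolean,
): { latest: T; previous: T } | null {
  const [latest, previous] = history.filter(hasMetric);
  if (latest === undefined || previous === undefined) return null;
  return { latest, previous };
}

// Simulates data that was parsed once from the benchmark file.
const rawHistory = JSON.stringify([
  { version: '3.9.2', resolution: { python: { precision: 0.9, recall: 0.7 } } },
  { version: '3.9.1' },
  { version: '3.9.0', resolution: { python: { precision: 0.95, recall: 0.75 } } },
]);
const buildHistory = JSON.parse(rawHistory) as BuildEntry[];

// Widen the type of the already-parsed array instead of parsing again.
const fullHistory = buildHistory as BuildEntryWithResolution[];

const resolutionPair = findLatestPair(fullHistory, (e) => e.resolution != null);
console.log(resolutionPair?.latest.version, resolutionPair?.previous.version);
```

The cast changes only the static type. The parsed objects already carry the optional `resolution` field at runtime, so reusing the array avoids a second file read without altering the data.
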
The resolution test suite builds graphs for every language fixture (~30+ languages, 60 s `beforeAll` budget each). Neither this step nor the `build-benchmark` job has a `timeout-minutes` cap, so a hanging WASM build can stall the entire job indefinitely. Consider adding a step-level `timeout-minutes: 30` to bound the gate.

Fixed: added `timeout-minutes: 30` to the gate step. 30 minutes is generous enough for the full language fixture suite while bounding worst-case CI stalls.