Harden CI measurement regression gates #658
Hypermerge returned
Status
Why This Happened
head updated; returning to observation
Previous Intervention
State: Reason: failure on
Expected Progression
The observer will re-read GitHub state and enqueue any remaining work. Inspect with
Posted on behalf of @schickling
CI Measurements
partial - advisory gate - readiness
Unchanged / 0-impact measurements (7)
These rows had compatible baseline data, but their semantic impact rounded to 0.00x because the movement was below the configured budget, below the noise floor, or inside the robust noise band.
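The three conditions named above can be read as a small decision procedure. The following is a minimal sketch, not the helper's actual classifier: `budget`, `noiseFloor`, the `noise_band` label, and the order of the checks are assumptions made for illustration, while the band fields mirror `evidenceDeltaLower` / `evidenceDeltaUpper` from the report rows.

```typescript
// Hypothetical inputs: `budget` and `noiseFloor` are illustrative names,
// not fields confirmed by the report. The band fields mirror the JSON rows.
type Row = {
  delta: number; // current - baseline, in the metric's unit
  evidenceDeltaLower: number; // lower edge of the evidence band
  evidenceDeltaUpper: number; // upper edge of the evidence band
};

type Policy = {
  budget: number; // assumed: largest regression still treated as acceptable
  noiseFloor: number; // assumed: smallest movement considered measurable
};

// "noise_band" is a label invented for this sketch.
type ZeroImpactReason = "noise_floor" | "noise_band" | "within_budget" | null;

// Returns the reason a row would render as 0.00x under these assumed rules,
// or null when the movement would count as actionable.
function zeroImpactReason(row: Row, policy: Policy): ZeroImpactReason {
  if (Math.abs(row.delta) < policy.noiseFloor) return "noise_floor";
  // A band that contains 0 means the samples do not agree on a direction.
  if (row.evidenceDeltaLower <= 0 && row.evidenceDeltaUpper >= 0) return "noise_band";
  if (row.delta <= policy.budget) return "within_budget";
  return null;
}
```

With an assumed policy of `{ budget: 0.5, noiseFloor: 0.01 }`, a row with `delta: -0.001` (like "devenv tasks list") falls under the noise floor. The real thresholds are declared per probe and are not shown in the comment.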
Diagnostic / ungated measurements (22)
All measurements
Previous runs
Source-of-truth JSON
{
"schemaVersion": 1,
"title": "CI Measurements",
"status": "partial",
"gate": "advisory",
"readiness": "partial (8/23 enabled observations gateable)",
"commit": {
"shortSha": "b5220e8",
"sha": "b5220e8bc71249a0bf9a2b7d7defef782d5ba32f"
},
"run": {
"id": "26178458810",
"attempt": "2",
"url": "https://github.com/overengineeringstudio/effect-utils/actions/runs/26178458810"
},
"baseline": null,
"protocol": "devenv-perf-warm-median-v2",
"chart": {
"meaning": "semantic-impact",
"zeroImpactMeaning": "no actionable PR impact after budgets, noise floor, and robust evidence checks",
"svg": "https://raw.githubusercontent.com/overengineeringstudio/effect-utils/ci-measurement-assets/ci-measurements/pr-658/b5220e8bc71249a0bf9a2b7d7defef782d5ba32f/run-26178458810-attempt-2/ci-measurements.svg",
"lightPng": "https://raw.githubusercontent.com/overengineeringstudio/effect-utils/ci-measurement-assets/ci-measurements/pr-658/b5220e8bc71249a0bf9a2b7d7defef782d5ba32f/run-26178458810-attempt-2/ci-measurements.png",
"darkPng": "https://raw.githubusercontent.com/overengineeringstudio/effect-utils/ci-measurement-assets/ci-measurements/pr-658/b5220e8bc71249a0bf9a2b7d7defef782d5ba32f/run-26178458810-attempt-2/ci-measurements-dark.png"
},
"measurements": [
{
"id": "genie_check_direct",
"label": "Genie check direct",
"group": "devenv / genie",
"status": "pass",
"direction": "regressed",
"gateable": true,
"gateReason": "eligible",
"confidence": "within_budget",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 9.055,
"current": 9.399,
"delta": 0.3439999999999994,
"ratio": 1.0379900607399226,
"semanticImpactScore": 0.26615129762562095,
"semanticImpactKind": "below_warn_boundary",
"baselineSources": 5,
"currentSamples": 5,
"pairedSamples": 5,
"evidenceDeltaLower": 0.241,
"evidenceDeltaUpper": 0.443,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"probe": "genie_check_direct",
"probeLabel": "Genie check direct",
"status": 0,
"sampleCount": 11,
"warmupCount": 1,
"measuredSampleCount": 5,
"pairedSampleCount": 5,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "source.lines",
"label": "Genie CI workflow helpers lines",
"group": "source / effect-utils / genie / ci-workflow / source / ci",
"status": "pass",
"direction": "regressed",
"gateable": false,
"gateReason": "disabled",
"confidence": "diagnostic",
"comparisonMode": "budget",
"unit": "lines",
"baseline": 4432,
"current": 6591,
"delta": 2159,
"ratio": 1.4871389891696751,
"semanticImpactScore": null,
"semanticImpactKind": "diagnostic",
"baselineSources": 1,
"currentSamples": 7,
"pairedSamples": 0,
"evidenceDeltaLower": 1715.8,
"evidenceDeltaUpper": 2602.2,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"scope": "genie_ci_workflow"
}
},
{
"id": "source.lines",
"label": "Genie runtime lines",
"group": "source / effect-utils / packages / genie / source / genie",
"status": "pass",
"direction": "regressed",
"gateable": false,
"gateReason": "disabled",
"confidence": "diagnostic",
"comparisonMode": "budget",
"unit": "lines",
"baseline": 18624,
"current": 18722,
"delta": 98,
"ratio": 1.005262027491409,
"semanticImpactScore": null,
"semanticImpactKind": "diagnostic",
"baselineSources": 1,
"currentSamples": 61,
"pairedSamples": 0,
"evidenceDeltaLower": -1764.4,
"evidenceDeltaUpper": 1960.4,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"scope": "genie_runtime"
}
},
{
"id": "task_check_quick_forced",
"label": "Forced check:quick",
"group": "devenv / quality gates / check:quick",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "noise_floor",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 9.224,
"current": 8.901,
"delta": -0.3230000000000004,
"ratio": 0.9649826539462272,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 3,
"currentSamples": 3,
"pairedSamples": 3,
"evidenceDeltaLower": -2.116,
"evidenceDeltaUpper": -0.323,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"workload": "forced-task-cache",
"taskCacheMode": "refresh",
"probe": "task_check_quick_forced",
"probeLabel": "Forced check:quick",
"status": 0,
"sampleCount": 6,
"warmupCount": 0,
"measuredSampleCount": 3,
"pairedSampleCount": 3,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "task_check_quick_warm",
"label": "Warm cached check:quick",
"group": "devenv / quality gates / check:quick",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "within_budget",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 3.107,
"current": 3.259,
"delta": 0.1519999999999997,
"ratio": 1.0489217895075635,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 5,
"currentSamples": 5,
"pairedSamples": 5,
"evidenceDeltaLower": -0.129,
"evidenceDeltaUpper": 0.152,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"workload": "cached-no-op",
"taskCacheMode": "warm",
"probe": "task_check_quick_warm",
"probeLabel": "Warm cached check:quick",
"status": 0,
"sampleCount": 11,
"warmupCount": 1,
"measuredSampleCount": 5,
"pairedSampleCount": 5,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "shell_eval_warm",
"label": "Warm shell eval",
"group": "devenv / devenv shell",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "noise_floor",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 6.122,
"current": 6.059,
"delta": -0.06299999999999972,
"ratio": 0.9897092453446587,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 5,
"currentSamples": 5,
"pairedSamples": 5,
"evidenceDeltaLower": -0.08,
"evidenceDeltaUpper": -0.063,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"probe": "shell_eval_warm",
"probeLabel": "Warm shell eval",
"status": 0,
"sampleCount": 11,
"warmupCount": 1,
"measuredSampleCount": 5,
"pairedSampleCount": 5,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "task_pnpm_install",
"label": "pnpm install task",
"group": "devenv / workspace setup",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "noise_floor",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 0.687,
"current": 0.662,
"delta": -0.025000000000000022,
"ratio": 0.9636098981077147,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 5,
"currentSamples": 5,
"pairedSamples": 5,
"evidenceDeltaLower": -0.028,
"evidenceDeltaUpper": -0.005,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"probe": "task_pnpm_install",
"probeLabel": "pnpm install task",
"status": 0,
"sampleCount": 11,
"warmupCount": 1,
"measuredSampleCount": 5,
"pairedSampleCount": 5,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "task_genie_run",
"label": "Genie run task",
"group": "devenv / genie",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "noise_floor",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 1.483,
"current": 1.476,
"delta": -0.007000000000000117,
"ratio": 0.99527983816588,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 5,
"currentSamples": 5,
"pairedSamples": 5,
"evidenceDeltaLower": -0.142,
"evidenceDeltaUpper": 0.002,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"probe": "task_genie_run",
"probeLabel": "Genie run task",
"status": 0,
"sampleCount": 11,
"warmupCount": 1,
"measuredSampleCount": 5,
"pairedSampleCount": 5,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "processes_help",
"label": "devenv processes --help",
"group": "devenv / devenv cli",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "noise_floor",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 0.021,
"current": 0.02,
"delta": -0.0010000000000000009,
"ratio": 0.9523809523809523,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 9,
"currentSamples": 9,
"pairedSamples": 9,
"evidenceDeltaLower": -0.002,
"evidenceDeltaUpper": -0.001,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"probe": "processes_help",
"probeLabel": "devenv processes --help",
"status": 0,
"sampleCount": 19,
"warmupCount": 1,
"measuredSampleCount": 9,
"pairedSampleCount": 9,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "tasks_list",
"label": "devenv tasks list",
"group": "devenv / devenv cli",
"status": "pass",
"direction": "unchanged",
"gateable": true,
"gateReason": "eligible",
"confidence": "noise_floor",
"comparisonMode": "paired",
"unit": "seconds",
"baseline": 0.051,
"current": 0.05,
"delta": -0.000999999999999994,
"ratio": 0.9803921568627452,
"semanticImpactScore": 0,
"semanticImpactKind": "neutral",
"baselineSources": 9,
"currentSamples": 9,
"pairedSamples": 9,
"evidenceDeltaLower": -0.003,
"evidenceDeltaUpper": 0,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"probe": "tasks_list",
"probeLabel": "devenv tasks list",
"status": 0,
"sampleCount": 19,
"warmupCount": 1,
"measuredSampleCount": 9,
"pairedSampleCount": 9,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "source.files",
"label": "Genie CI workflow helpers files",
"group": "source / effect-utils / genie / ci-workflow / source / ci",
"status": "pass",
"direction": "unchanged",
"gateable": false,
"gateReason": "disabled",
"confidence": "diagnostic",
"comparisonMode": "budget",
"unit": "count",
"baseline": 7,
"current": 7,
"delta": 0,
"ratio": 1,
"semanticImpactScore": null,
"semanticImpactKind": "diagnostic",
"baselineSources": 1,
"currentSamples": 7,
"pairedSamples": 0,
"evidenceDeltaLower": -1,
"evidenceDeltaUpper": 1,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"scope": "genie_ci_workflow"
}
},
{
"id": "source.files",
"label": "Genie runtime files",
"group": "source / effect-utils / packages / genie / source / genie",
"status": "pass",
"direction": "unchanged",
"gateable": false,
"gateReason": "disabled",
"confidence": "diagnostic",
"comparisonMode": "budget",
"unit": "count",
"baseline": 61,
"current": 61,
"delta": 0,
"ratio": 1,
"semanticImpactScore": null,
"semanticImpactKind": "diagnostic",
"baselineSources": 1,
"currentSamples": 61,
"pairedSamples": 0,
"evidenceDeltaLower": -6.1000000000000005,
"evidenceDeltaUpper": 6.1000000000000005,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"scope": "genie_runtime"
}
},
{
"id": "source.files",
"label": "Nix workspace tools files",
"group": "source / effect-utils / nix / workspace-tools / source / nix",
"status": "pass",
"direction": "unchanged",
"gateable": false,
"gateReason": "disabled",
"confidence": "diagnostic",
"comparisonMode": "budget",
"unit": "count",
"baseline": 13,
"current": 13,
"delta": 0,
"ratio": 1,
"semanticImpactScore": null,
"semanticImpactKind": "diagnostic",
"baselineSources": 1,
"currentSamples": 13,
"pairedSamples": 0,
"evidenceDeltaLower": -1.3,
"evidenceDeltaUpper": 1.3,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"scope": "nix_workspace_tools"
}
},
{
"id": "source.lines",
"label": "Nix workspace tools lines",
"group": "source / effect-utils / nix / workspace-tools / source / nix",
"status": "pass",
"direction": "unchanged",
"gateable": false,
"gateReason": "disabled",
"confidence": "diagnostic",
"comparisonMode": "budget",
"unit": "lines",
"baseline": 3237,
"current": 3237,
"delta": 0,
"ratio": 1,
"semanticImpactScore": null,
"semanticImpactKind": "diagnostic",
"baselineSources": 1,
"currentSamples": 13,
"pairedSamples": 0,
"evidenceDeltaLower": -323.70000000000005,
"evidenceDeltaUpper": 323.70000000000005,
"pairedEvidenceQuantile": 0.25,
"dimensions": {
"scope": "nix_workspace_tools"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Nix sources closure size",
"group": "nix / closures / packages / genie / buckets / nix-sources / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "nix-sources"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Nix sources closure size",
"group": "nix / closures / packages / megarepo / buckets / nix-sources / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "nix-sources"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Nix sources closure size",
"group": "nix / closures / packages / oxlint-npm / buckets / nix-sources / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "nix-sources"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Node / pnpm closure size",
"group": "nix / closures / packages / genie / buckets / node / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "node"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Node / pnpm closure size",
"group": "nix / closures / packages / megarepo / buckets / node / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "node"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Node / pnpm closure size",
"group": "nix / closures / packages / oxlint-npm / buckets / node / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "node"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Rust closure size",
"group": "nix / closures / packages / genie / buckets / rust / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "rust"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Rust closure size",
"group": "nix / closures / packages / megarepo / buckets / rust / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "rust"
}
},
{
"id": "nix.closure.bucket.nar_size",
"label": "Rust closure size",
"group": "nix / closures / packages / oxlint-npm / buckets / rust / nix closure buckets",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 0,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "rust"
}
},
{
"id": "shell_eval_traced",
"label": "Shell eval with OTEL trace",
"group": "devenv / devenv shell",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "historical",
"unit": "seconds",
"baseline": null,
"current": 96.989,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"probe": "shell_eval_traced",
"probeLabel": "Shell eval with OTEL trace",
"status": 0,
"sampleCount": 2,
"warmupCount": 0,
"measuredSampleCount": 1,
"pairedSampleCount": 1,
"pairedOrderProtocol": "balanced-seeded-alternating-v1",
"pairedOrderSeed": "26178458810-2-114ad8a157adc2bcdfc762fa7eb32d40d70b54e1",
"measurementProtocol": "devenv-perf-warm-median-v2",
"aggregation": "median",
"phase": "warm",
"devenvRev": "2cf62a010000b70f15c78a72761fad7c9e6fb47a",
"otelServiceName": "devenv-perf-ci"
}
},
{
"id": "nix.closure.path_count",
"label": "Total closure path count",
"group": "nix / closures / packages / genie / total / path-count / nix closure",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "count",
"baseline": null,
"current": 80,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "total"
}
},
{
"id": "nix.closure.path_count",
"label": "Total closure path count",
"group": "nix / closures / packages / megarepo / total / path-count / nix closure",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "count",
"baseline": null,
"current": 5,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "total"
}
},
{
"id": "nix.closure.path_count",
"label": "Total closure path count",
"group": "nix / closures / packages / oxlint-npm / total / path-count / nix closure",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "count",
"baseline": null,
"current": 8,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "total"
}
},
{
"id": "nix.closure.nar_size",
"label": "Total closure size",
"group": "nix / closures / packages / genie / total / nar-size / nix closure",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 533018624,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "total"
}
},
{
"id": "nix.closure.nar_size",
"label": "Total closure size",
"group": "nix / closures / packages / megarepo / total / nar-size / nix closure",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 148820792,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "total"
}
},
{
"id": "nix.closure.nar_size",
"label": "Total closure size",
"group": "nix / closures / packages / oxlint-npm / total / nar-size / nix closure",
"status": "missing_baseline",
"direction": "unknown",
"gateable": false,
"gateReason": "missing_baseline",
"confidence": "missing_baseline",
"comparisonMode": "budget",
"unit": "bytes",
"baseline": null,
"current": 161363816,
"delta": null,
"ratio": null,
"semanticImpactScore": null,
"semanticImpactKind": null,
"baselineSources": 0,
"currentSamples": 1,
"pairedSamples": null,
"evidenceDeltaLower": null,
"evidenceDeltaUpper": null,
"pairedEvidenceQuantile": null,
"dimensions": {
"bucket": "total"
}
}
]
}
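A consumer of this JSON might want to tally gateable rows and sanity-check the derived fields. The sketch below assumes only a subset of the per-measurement fields shown above; `summarize` and `isConsistent` are hypothetical helper names. In the rows shown, `delta` matches `current - baseline` and `ratio` matches `current / baseline`, which is the only relationship the check relies on.

```typescript
// Subset of the per-measurement fields shown in the report above.
type Measurement = {
  id: string;
  gateable: boolean;
  gateReason: string;
  baseline: number | null;
  current: number;
  delta: number | null;
  ratio: number | null;
};

// Counts gateable rows and rows that have no baseline yet.
function summarize(measurements: Measurement[]) {
  const gateable = measurements.filter((m) => m.gateable).length;
  const missingBaseline = measurements.filter(
    (m) => m.gateReason === "missing_baseline",
  ).length;
  return { total: measurements.length, gateable, missingBaseline };
}

// Checks that delta and ratio agree with baseline/current within a tolerance.
function isConsistent(m: Measurement, eps = 1e-9): boolean {
  if (m.baseline === null) return m.delta === null && m.ratio === null;
  if (m.delta === null || m.ratio === null) return false;
  const deltaOk = Math.abs(m.current - m.baseline - m.delta) < eps;
  const ratioOk =
    m.baseline === 0 || Math.abs(m.current / m.baseline - m.ratio) < eps;
  return deltaOk && ratioOk;
}

// Two rows copied from the report: one gateable, one without a baseline.
const sample: Measurement[] = [
  {
    id: "genie_check_direct",
    gateable: true,
    gateReason: "eligible",
    baseline: 9.055,
    current: 9.399,
    delta: 0.3439999999999994,
    ratio: 1.0379900607399226,
  },
  {
    id: "shell_eval_traced",
    gateable: false,
    gateReason: "missing_baseline",
    baseline: null,
    current: 96.989,
    delta: null,
    ratio: null,
  },
];

console.log(summarize(sample), sample.every((m) => isConsistent(m)));
```

The tolerance avoids false alarms from floating-point residue such as `0.3439999999999994`.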
Hypermerge: Repair
Status
Outcome:
Summary
Hypermerge escalation finished for
Timeline
Problem Report
Important Evidence
| field | value |
|---|---|
| job_id | 707 |
| attempt_id | 1739 |
| started_at | 2026-05-20T09:59:57.169483148+00:00 |
| duration | 7m 26s |
| deadline | 600s |
| budget_usd | 5.00 |
| request_head | db7be84159ae927cb6073d73f9ba0791a131e8fd |
| request_context_bytes | 3805 |
| failed_job_count | 1 |
| failed_job_log_bytes | 2793 |
| outcome_applied | true |
| request_head_stale | true |
| agent_reported_duration | not reported |
| agent_turn_count | not reported |
| agent_context_tokens | not reported |
| agent_context_window_tokens | not reported |
| agent_session_notes | Used pty session co1-slate-0292. Worktree clean after push; remote branch resolves to fc8703b4709398a7720019e441e3a74f49b4b4a0. |
Agent Session
Agent breadcrumbs
machine: dev3
agent_session_id: 0292d490-bca7-5f95-ba14-a23c5c652ed5
agent_name: co1-oak
pty_name: co1-oak-0292
pty_attach: pty attach co1-oak-0292
pty_lookup: pty list --json --filter-tag agent.session_id=0292d490-bca7-5f95-ba14-a23c5c652ed5
pty_name_pattern: co1-<agent-word>-0292
session_tag: mq.escalation.overengineeringstudio.effect-utils.658.unknown
Posted on behalf of @schickling
| field | value |
|---|---|
| agent_name | 🌳 co1-oak |
| agent_session_id | 0292d490-bca7-5f95-ba14-a23c5c652ed5 |
| agent_tool | Codex CLI |
| agent_tool_version | unknown |
| agent_runtime | mq-daemon unknown |
| agent_model | unknown |
| worktree | |
| machine | dev3 |
| tooling_profile | dotfiles@4e6515b |
Align the genie workflow helper test with the generated split measurement workflow: the PR comment now uses the default CI Measurements title, the task probes are emitted as producer measurements, and the seeded baseline is the current main backfill run.

Merge-Queue-Schema: mq.commit.v1
Merge-Queue-Mode: agent-escalated
Merge-Queue-PR: #658
Merge-Queue-Attempt-ID: 84c9d5d6-e498-4163-ac30-eda2a7871bad
Merge-Queue-Agent-Session-ID: 0292d490-bca7-5f95-ba14-a23c5c652ed5
Hypermerge: Repair
Status
Outcome:
Summary
Hypermerge escalation finished for
Timeline
Problem Report
Important Evidence
| field | value |
|---|---|
| job_id | 708 |
| attempt_id | 1740 |
| started_at | 2026-05-20T10:10:13.694156428+00:00 |
| duration | 2m 0s |
| deadline | 600s |
| budget_usd | 5.00 |
| request_head | fc8703b4709398a7720019e441e3a74f49b4b4a0 |
| request_context_bytes | 3797 |
| failed_job_count | 1 |
| failed_job_log_bytes | 2707 |
| outcome_applied | true |
| request_head_stale | false |
| agent_reported_duration | not reported |
| agent_turn_count | not reported |
| agent_context_tokens | not reported |
| agent_context_window_tokens | not reported |
| agent_session_notes | No committed changes. Local proof passed; retry CI is the narrowest allowed action. |
Agent Session
Agent breadcrumbs
machine: dev3
agent_session_id: 02924328-3b35-5663-8b02-c57d8136eb21
agent_name: co1-reach
pty_name: co1-reach-0292
pty_attach: pty attach co1-reach-0292
pty_lookup: pty list --json --filter-tag agent.session_id=02924328-3b35-5663-8b02-c57d8136eb21
pty_name_pattern: co1-<agent-word>-0292
session_tag: mq.escalation.overengineeringstudio.effect-utils.658.unknown
Posted on behalf of @schickling
| field | value |
|---|---|
| agent_name | 🔭 co1-reach |
| agent_session_id | 02924328-3b35-5663-8b02-c57d8136eb21 |
| agent_tool | Codex CLI |
| agent_tool_version | unknown |
| agent_runtime | mq-daemon unknown |
| agent_model | unknown |
| worktree | |
| machine | dev3 |
| tooling_profile | dotfiles@4e6515b |
Hypermerge: Repair
Status
Outcome:
Summary
Hypermerge escalation finished for
Timeline
Problem Report
Important Evidence
| field | value |
|---|---|
| job_id | 709 |
| attempt_id | 1741 |
| started_at | 2026-05-20T10:28:53.345610962+00:00 |
| duration | 2m 34s |
| deadline | 600s |
| budget_usd | 5.00 |
| request_head | 5119eb0d0109293327e2c2675bffa27dd2eed5af |
| request_context_bytes | 3660 |
| failed_job_count | 1 |
| failed_job_log_bytes | 2673 |
| outcome_applied | false |
| request_head_stale | true |
| agent_reported_duration | not reported |
| agent_turn_count | not reported |
| agent_context_tokens | not reported |
| agent_context_window_tokens | not reported |
| agent_session_notes | Head matched initially, but advanced during inspection to c9b03ce1. No commit was made by this escalation session; worktree is clean. |
Agent Session
Agent breadcrumbs
machine: dev3
agent_session_id: 02926016-676f-537c-ac52-567c65d9eaf8
agent_name: co1-wren
pty_name: co1-wren-0292
pty_attach: pty attach co1-wren-0292
pty_lookup: pty list --json --filter-tag agent.session_id=02926016-676f-537c-ac52-567c65d9eaf8
pty_name_pattern: co1-<agent-word>-0292
session_tag: mq.escalation.overengineeringstudio.effect-utils.658.needs-evergreen
Posted on behalf of @schickling
| field | value |
|---|---|
| agent_name | 🐦 co1-wren |
| agent_session_id | 02926016-676f-537c-ac52-567c65d9eaf8 |
| agent_tool | Codex CLI |
| agent_tool_version | unknown |
| agent_runtime | mq-daemon unknown |
| agent_model | unknown |
| worktree | |
| machine | dev3 |
| tooling_profile | dotfiles@4e6515b |
Problem
CI measurement comments were too easy to misread: single-run noise and normal runner variance could look like real performance regressions, missing or weak baselines produced unclear status, and the chart emphasized raw percentages even when the gate classifier treated the movement as non-actionable.
Goal
Make the shared CI measurement gate reliable enough to reuse across megarepos: stable probe identities, explicit gate policy, comparable historical baselines, typed seed provenance, deterministic/non-deterministic metric handling, human-readable interpretation labels, and a chart that shows actionable impact rather than raw timing noise.
Decisions
- … `ci-measurements` as the single comparison path and remove the legacy `perf-comparison.json` / `DEVENV_PERF_REGRESSION_MODE` path.
- … `BASELINE_MAX_CANDIDATE_RUNS`.
- … `partial` rather than a misleading clean `pass`.
- … `readiness` metadata so comments and artifacts say whether all enabled observations are currently gateable/enforceable.
- … `0.00x` means the raw percentage movement is not actionable for this PR, while raw percent and nominal values stay in the table.
- … `resvg`, and require public chart assets for private-repo comments.
Verification
Local checks on f6178fda8c906ce57b1ef9c37c466b00f27ece57:
- `bash genie/ci-scripts/ci-measurement-comparison.test.sh`
- `bun packages/@overeng/genie/bin/genie.tsx --output ci-plain --writeable`
- `bun packages/@overeng/genie/bin/genie.tsx --output ci-plain --check`
- `bun test packages/@overeng/genie/src/runtime/github-workflow/ci-workflow-helpers.unit.test.ts`
CI / production proof:
- … `Commit: f6178fd`, `Status: pass`, `Gate: enforced`, `Readiness: enforceable`, and semantic-impact chart text.
Complexity
The comparison policy, artifact traversal, baseline accumulation, and comment renderer live in the shared workflow helper because the same logic is reused by devenv perf, Nix closure sizes, source-shape metrics, and downstream megarepos. Repos declare probe policy; the helper owns baseline compatibility, interpretation, and GitHub report rendering.
Concerns
This PR fixes misleading review output and makes robust-band timing gates conservative. It does not claim absolute wall-clock timing is causally perfect. For high-trust merge-blocking timing probes, the long-term standard remains paired interleaved base/head measurement on a compatible testbed; the current PR documents that trust model and keeps raw absolute timing as context unless the evidence is strong enough.
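To make the paired interleaved approach concrete, here is a minimal sketch under stated assumptions. It alternates which side runs first, takes per-pair deltas, and reports a quantile band around the median. The function names, the linear-interpolation quantile, and the reading of a `0.25` quantile as an interquartile band are all illustrative choices, not confirmed details of the shared helper.

```typescript
type Side = "base" | "head";

// Alternates which side runs first in each pair, so slow drift on the runner
// affects both sides about equally. In a real setup `baseFirst` would be
// derived from a seed; here it is passed in directly.
function alternatingOrder(pairs: number, baseFirst: boolean): Array<[Side, Side]> {
  const order: Array<[Side, Side]> = [];
  for (let i = 0; i < pairs; i++) {
    const baseLeads = (i % 2 === 0) === baseFirst;
    order.push(baseLeads ? ["base", "head"] : ["head", "base"]);
  }
  return order;
}

// Linear-interpolated quantile of a numeric sample, with q in [0, 1].
function quantile(values: number[], q: number): number {
  if (values.length === 0) throw new Error("quantile of empty sample");
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Per-pair deltas (head - base) summarized as a median and a quantile band.
function pairedBand(base: number[], head: number[], q = 0.25) {
  if (base.length !== head.length) throw new Error("unpaired samples");
  const deltas = head.map((h, i) => h - base[i]);
  return {
    median: quantile(deltas, 0.5),
    lower: quantile(deltas, q),
    upper: quantile(deltas, 1 - q),
  };
}

console.log(alternatingOrder(4, true));
console.log(pairedBand([10, 10, 10, 10, 10], [11, 12, 13, 14, 15]));
```

Pairing cancels run-to-run variance that both sides share, which is why a band computed from per-pair deltas can be tighter than one computed from two independent samples.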
Friction & Bottlenecks
Follow-Ups
- … `main` commit after this PR lands.
References
Posted on behalf of @schickling