feat: TTD Hardening Sprint S1 — Gates and Evidence Integrity #283

flyingrobots merged 26 commits into main
Conversation
- Established det-policy.yaml for path-aware CI gate triggering.
- Implemented .github/workflows/det-gates.yml with G1-G4 hardening gates.
- Added negative security tests to echo-scene-codec for decoder robustness.
- Created materialization_hotpath Criterion benchmark in warp-benches (G3).
- Added evidence generation and validation scripts to enforce artifact-backed VERIFIED claims.
- Published RELEASE_POLICY.md and ROLLBACK_TTD.md for governance.
- Updated ECHO_ROADMAP.md to track active hardening progress.
You have reached your Codex usage limits for code reviews.

Important: Review skipped. This PR was authored by the user configured for CodeRabbit reviews; by default, CodeRabbit skips reviewing PRs authored by this user. It's recommended to use a dedicated user account to post CodeRabbit review feedback.
Walkthrough

Adds a multi-stage determinism gating system: a new GitHub Actions workflow classifies changed files against det-policy, conditionally runs gated CI jobs (determinism, static analysis, security, perf, repro, build reproducibility), collects and uploads artifacts, and generates and validates an evidence pack; includes policy, claim maps, scripts, tests, benchmarks, and docs.
Sequence Diagram(s)

sequenceDiagram
participant PR as Pull Request
participant GH as GitHub Actions
participant Classifier as classify-changes.cjs
participant Policy as det-policy.yaml
participant GateJobs as Gated Jobs
participant Artifacts as Artifact Storage
participant Evidence as generate_evidence.cjs
participant Validator as validate_claims.cjs
PR->>GH: trigger det-gates workflow
GH->>Classifier: run classify-changes job
Classifier->>Policy: load policy & match changed files
Policy-->>Classifier: return classification (run_full/run_reduced/run_none)
Classifier-->>GH: set job outputs/conditions
GH->>GateJobs: run enabled gate jobs (determinism, static, security, perf, repro, build-repro)
GateJobs->>Artifacts: upload artifact groups
GH->>Evidence: gather artifacts and invoke generator
Evidence->>Validator: produce and validate evidence.json
Validator-->>GH: PASS / FAIL
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
@coderabbitai review please 🥕
Expect exhaustive analysis on:

No stone unturned, no landmine undetected. 🥕

🧠 Learnings used. ✅ Actions performed: review triggered.
Actionable comments posted: 25
🤖 Fix all issues with AI agents
In @.github/workflows/det-gates.yml:
- Around line 150-167: The build-repro job currently runs "cargo check" which
only type-checks and produces no artifacts (see job name build-repro and
wasm-check.log), so either rename the job to reflect compilation-only or
implement true reproducibility: change the steps to run a deterministic build
(e.g., cargo build --release --target wasm32-unknown-unknown) twice in isolated
steps, capture the built artifacts (the target/.../wasm files) into distinct
logs/artifact paths, and add a comparison step (e.g., sha256sum both outputs or
diff the files) that fails the job if hashes differ; also update the Upload
build artifacts step to upload the actual build outputs rather than
wasm-check.log.
- Around line 169-187: The validate-evidence job (validate-evidence) currently
runs ./scripts/generate_evidence.cjs and ./scripts/validate_claims.cjs but never
downloads upstream artifacts, so referenced artifact names like
det-linux-artifacts and sec-artifacts may be missing; update the job to use
actions/download-artifact to fetch the expected artifacts (e.g.,
det-linux-artifacts, sec-artifacts) before running validation, and add a
verification step that checks the downloaded files/directories exist (return
non-zero or fail the job) so scripts/validate_claims.cjs can assert the
artifacts truly exist rather than only validating JSON structure.
- Around line 38-46: The run block interpolates `${{ github.base_ref }}`
directly into shell commands (`git fetch origin ${{ github.base_ref }}
--depth=1` and `git diff --name-only origin/${{ github.base_ref }}...HEAD >
changed.txt`), which allows script injection via crafted branch names; fix by
exposing the value as an environment variable (e.g., `BASE_REF`) in the workflow
`env` and then reference it inside the run block as a shell variable (e.g.,
"$BASE_REF") so the shell treats it as data, and ensure you always quote the
variable when used in `git fetch`/`git diff` to avoid word-splitting or
globbing.
In `@crates/echo-scene-codec/src/cbor.rs`:
- Around line 1029-1044: The test reject_invalid_enum_tags currently checks
invalid tags for NodeShape and EdgeStyle but misses the identical invalid-tag
paths in decode_projection_kind and decode_label_anchor; update that test (or
add a new one) to also encode an out-of-range u8 (e.g., 2) and assert an Err
from decode_projection_kind(&mut Decoder::new(&buf)) and likewise for
decode_label_anchor(&mut Decoder::new(&buf)) so both decoder functions are
explicitly covered for invalid enum tags.
- Around line 986-1015: The test reject_exceeding_max_ops allocates and encodes
MAX_OPS+1 SceneOp entries unnecessarily; change it to only build the minimal
CBOR header and ops-array-length prefix so the decoder's early check triggers
without encoding the ops body. In the reject_exceeding_max_ops test replace the
loop and heavy Vec usage with writing the version, session, cursor, epoch, and
then Encoder::array((MAX_OPS + 1) as u64) (i.e. the ops array header) and skip
calling encode_scene_op or pushing SceneOp entries, then call decode_scene_delta
and assert the same error; keep references to MAX_OPS, Encoder::array,
decode_scene_delta, and the test name to locate the change.
In `@crates/warp-benches/benches/materialization_hotpath.rs`:
- Around line 19-30: The benchmarks currently clone the 64-byte Vec payload
inside the hot loop (e.g., in the closure for materialization_emit_log_1000
calling bus.emit with payload.clone()), which measures allocation cost rather
than pure MaterializationBus::emit throughput; fix by removing payload.clone()
from the inner loop and either pre-allocating a reusable payload (reuse the same
Vec or pre-create a Vec/Bytes per iteration outside the timing), or switch the
payload type to bytes::Bytes so clone is a cheap refcount bump, and apply the
same change to the other three benchmarks that use payload.clone(); if you
intentionally want to include allocation cost, instead add a comment documenting
that the benchmark measures allocation+emit.
In `@det-policy.yaml`:
- Around line 128-143: The warp-benches entry currently classed as
DET_NONCRITICAL has no gates; update the warp-benches block (the "warp-benches"
YAML mapping) to include required_gates: [G3] so changes to the benchmark crate
trigger a performance validation (e.g., add required_gates: ["G3"] under the
warp-benches key while keeping other fields like class and owner_role intact).
- Around line 7-15: The policy omits G3 (Performance) from all tiers so
classify_changes.cjs will never trigger the G3 gate; either add G3 to the
required_gates for the appropriate classifications (e.g., include G3 in
DET_CRITICAL required_gates) if G3 should be path-triggered, or mark G3 as
always-run by adding an explanatory comment to det-policy.yaml and update the CI
workflow to run the G3 job unconditionally (and update RELEASE_POLICY.md to
reflect which choice); adjust DET_CRITICAL/DET_IMPORTANT/DET_NONCRITICAL
required_gates or CI job configuration and corresponding docs (referencing
DET_CRITICAL, DET_IMPORTANT, DET_NONCRITICAL, G3, classify_changes.cjs, and
RELEASE_POLICY.md) accordingly.
- Around line 145-153: classify_changes.cjs currently ignores
require_full_classification and defaults unmatched files to DET_NONCRITICAL;
update classify_changes.cjs to read require_full_classification from
det-policy.yaml and fail/emit an error (non-zero exit) when a changed file/crate
has no explicit classification (i.e., no matching crates entry) instead of
defaulting to DET_NONCRITICAL; also have validate_det_policy.cjs load
deny_patterns from det-policy.yaml and either validate that
ban-nondeterminism.sh and ban-unordered-abi.sh consume those patterns or fail
the policy check if deny_patterns exist but are not enforced; finally refactor
ban-nondeterminism.sh and ban-unordered-abi.sh to accept deny_patterns (e.g.,
via env var or a generated file) instead of hardcoded crate lists, or remove
deny_patterns from det-policy.yaml if you choose not to support dynamic deny
pattern enforcement.
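The enforcement described in the bullet above can be sketched as follows. This is a minimal, hypothetical sketch: the `classify` function name, the simplified prefix-glob matching, and the policy object shape (a crates map with `class` and `paths`) are assumptions; only `require_full_classification` and the `DET_NONCRITICAL` default come from the review.

```javascript
// Hypothetical sketch: fail on unclassified files when the policy sets
// require_full_classification, instead of defaulting to DET_NONCRITICAL.
function classify(file, crates, requireFull) {
  for (const info of Object.values(crates)) {
    // Treat 'crates/foo/**' as the prefix 'crates/foo/' (simplified glob).
    if (info.paths.some((p) => file.startsWith(p.replace('**', '')))) {
      return info.class;
    }
  }
  if (requireFull) {
    // Non-zero exit in CI comes from the caller catching this error.
    throw new Error(`Unclassified file under require_full_classification: ${file}`);
  }
  return 'DET_NONCRITICAL';
}
```

The caller (the CLI entrypoint) would catch the error, print it, and set a non-zero exit code, so CI fails with a clear message rather than silently under-gating.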
In `@docs/determinism/CLAIM_MAP.yaml`:
- Around line 19-31: CLAIM_MAP.yaml currently defines SEC-001 and SEC-002 but is
missing governance claims for the remaining negative test controls referenced by
sec-claim-map.json; update CLAIM_MAP.yaml to add SEC-003, SEC-004, and SEC-005
(or a consolidated broader claim) that correspond to "truncated payload
rejection", "bad version handling", and "invalid enum tag rejection" so
validate_claims.cjs finds a claim for each mapped test control (use the exact
claim IDs SEC-003..SEC-005 and include statement and required_evidence entries
consistent with SEC-001/SEC-002 formats to match sec-claim-map.json mappings).
- Around line 5-10: DET-001 declares required_evidence: static_inspection but
there is no enforcement producing that artifact; add a CI/static step that runs
a deterministic deny-pattern check over the echo-wasm-abi source and emits a
static_inspection artifact. Concretely, implement a script or CI job referenced
by det-policy.yaml's deny_patterns for HashMap/HashSet that greps or runs
clippy/rs-lint over the echo-wasm-abi codebase, fail the job on matches, and
upload a machine-readable report (e.g., JSON) as the static_inspection artifact
so DET-001 can be VERIFIED; tie the job to the policy entry for DET-001 so the
pipeline knows to produce the required artifact.
In `@docs/RELEASE_POLICY.md`:
- Around line 39-41: Add explicit definitions in RELEASE_POLICY.md for the gate
states "INFERRED" and "UNVERIFIED" and a clear definition of what an "approved
closeout plan" is, and specify who can approve it (e.g., reference the
owner_role from det-policy.yaml or name the exact role/team). Either add a brief
glossary section in this file or link to an existing glossary/policy, and ensure
the wording clearly distinguishes INFERRED (evidence-based inference without
direct verification) from UNVERIFIED (no supporting evidence) and states the
approval authority and required contents of a valid closeout plan.
In `@package.json`:
- Line 17: Update the js-yaml dependency floor version to ensure the vulnerable
4.1.0 isn't used; in package.json change the dependency entry "js-yaml":
"^4.1.0" to "js-yaml": "^4.1.1" so installations will require at least 4.1.1
(which contains the CVE-2025-64718 fix).
In `@scripts/classify_changes.cjs`:
- Around line 15-22: Inside classifyChanges, replace the direct process.exit(1)
call with a thrown Error (or return a rejected Promise) so the function is
testable and calling code can decide exit behavior; specifically throw a new
Error that includes policyPath context instead of calling process.exit. Also add
an existence check for changedFilesPath (fs.existsSync or try/catch around
fs.readFileSync) before reading it and throw a descriptive Error (including
changedFilesPath) if it's missing or unreadable so CI gets a clear failure
message; update callers of classifyChanges to handle the thrown errors
appropriately.
- Around line 6-13: The matches function currently corrupts the '**' replacement
when later replacing '*' — update matches(file, pattern) to protect '**' first
by replacing it with a unique placeholder (e.g., '__GLOB_DBL__'), then escape
dots and replace single '*' with '[^/]*', and finally replace the placeholder
with '.*' to restore the recursive directory semantics; ensure you still wrap
the result with '^' and '$' and construct the RegExp from the final string.
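The placeholder approach from the last bullet can be sketched like this. It is a simplified version of the reviewed `matches(file, pattern)` helper; it only handles `*`, `**`, and literal dots, and a real glob library would cover more metacharacters.

```javascript
// Protect '**' with a placeholder BEFORE replacing single '*', so the
// recursive-glob semantics are not corrupted by the single-star rule.
function matches(file, pattern) {
  const regexStr = pattern
    .replace(/\*\*/g, '__GLOB_DBL__') // protect recursive glob first
    .replace(/\./g, '\\.')            // escape literal dots
    .replace(/\*/g, '[^/]*')          // single '*' must not cross '/'
    .replace(/__GLOB_DBL__/g, '.*');  // restore '**' as match-anything
  return new RegExp('^' + regexStr + '$').test(file);
}
```

With this ordering, `crates/**` matches `crates/foo/src/lib.rs`, while `*.md` matches `README.md` but not `docs/a.md`.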
In `@scripts/generate_evidence.cjs`:
- Around line 6-28: The claims array currently unconditionally sets DET-002 and
SEC-001 to 'VERIFIED'; update generate_evidence.cjs so each claim's status is
computed instead: accept gate outcomes (e.g. inputs or environment vars like
determinismOutcome, securityOutcome, performanceOutcome, reproducibilityOutcome)
or probe artifact existence for each artifact_name (e.g. check the artifact via
the CI API or filesystem) and set status to 'VERIFIED' only if the corresponding
gate succeeded or the artifact exists, otherwise set to 'FAILED' or 'SKIPPED';
also add entries for the missing G3 and G4 claims (e.g. IDs for Performance and
Build Reproducibility) using the same conditional status logic and reuse the
existing workflow, runId, commitSha and artifact_name symbols when building each
evidence object.
- Line 3: The import const path = require('path') is unused dead code; either
remove that require statement or actually use the path module when constructing
the output filename (e.g., use path.join to build the path to evidence.json).
Locate the require('path') binding named path in the top of
scripts/generate_evidence.cjs and either delete that line or replace direct
string file paths with path.join(...) when writing evidence.json so the path
module is meaningfully used.
- Around line 5-42: The artifactsDir parameter is accepted by generateEvidence
(and populated from process.argv[2]) but ignored when writing output; update
generateEvidence to use artifactsDir to build the output path (e.g., join
artifactsDir with 'evidence.json') instead of the hardcoded 'evidence.json', and
call fs.writeFileSync with that path; ensure you reference the generateEvidence
function, the artifactsDir parameter, and the fs.writeFileSync invocation when
making the change (optionally ensure the directory exists before writing), or
alternatively remove artifactsDir from the signature and callers if you decide
not to support a custom output directory.
In `@scripts/validate_claims.cjs`:
- Around line 15-25: The validator currently only checks presence of
requiredFields on claim.evidence (in the data.claims loop) but not their
semantic validity; update the validation inside the loop for VERIFIED claims to
(1) enforce commit_sha matches /^[0-9a-f]{40}$/ and reject the literal "local"
when CI/GITHUB_SHA is expected, (2) enforce run_id is numeric in CI contexts (or
reject "local"), (3) ensure workflow and artifact_name are non-empty and not the
default "local"/placeholder, and (4) push descriptive violations to violations
array when these checks fail; use the existing data.claims, claim.evidence and
requiredFields references to locate and extend the checks.
- Around line 15-25: Validate that data.claims is an array before iterating: add
an Array.isArray(data.claims) guard around the loop (or early-return/push a
violation if it's not an array) so that for (const claim of data.claims) only
runs on real arrays; when skipping/flagging malformed input, reference
requiredFields, violations, claim.id and claim.evidence semantics so downstream
checks remain the same.
- Around line 5-8: The current check in scripts/validate_claims.cjs that handles
a missing evidence file (the if (!fs.existsSync(evidenceFile)) block) should
fail hard instead of returning true; change the behavior to log an error
(process.stderr or console.error) and return false so CI fails when
evidence.json is absent. If you want configurability, add a --strict flag to the
validation entrypoint (parse args in the main function) and only allow a
non-fatal warning when strict mode is explicitly disabled; otherwise default to
the hard-fail behavior.
In `@scripts/validate_det_policy.cjs`:
- Around line 55-59: validateDetPolicy is only invoked when the script is run as
main and is not exported, preventing importing for tests or reuse; modify
scripts/validate_det_policy.cjs to export the validator (e.g., attach
validateDetPolicy to module.exports or export it as a named export) so other
modules and unit tests can require/import validateDetPolicy, while keeping the
existing if (require.main === module) CLI invocation intact.
- Around line 32-44: Validate that each crate entry has an explicit class and a
well-formed paths array before using them: inside the for loop iterating over
Object.entries(crates) check that crateInfo.class is defined and non-empty and
if missing emit a clear error like "Crate <name> missing class" and return false
(instead of the current "unknown class undefined" path), then validate
crateInfo.paths exists and Array.isArray(crateInfo.paths) (and optionally
non-empty) and if not emit a clear error like "Crate <name> missing or invalid
paths" and return false; keep the existing DET_CRITICAL owner_role check
(policy.require_owners_for_critical && cls === 'DET_CRITICAL') but only after
the class presence check so you reference a valid cls.
- Around line 24-29: The loop that checks classes for presence of required_gates
(iterating over Object.entries(classes)) does not validate that each gate name
is one of the allowed gates, so typos like G5 slip through; update the
validation inside that loop to iterate each value in classInfo.required_gates
and ensure it exists in a canonical set (e.g., ALLOWED_GATES = new
Set(["G1","G2","G3","G4"])) or by cross-referencing defined gates, and if any
unknown gate is found log an error (include className and invalid gate name) and
return false; keep the existing missing-key check but add this membership check
for required_gates to the same block.
In `@sec-claim-map.json`:
- Around line 1-30: The sec-claim-map.json file is placed at the repo root while
its companion governance file CLAIM_MAP.yaml lives in docs/determinism/, which
makes consumers hunt for paired artifacts and loses the ability to annotate when
using JSON; move sec-claim-map.json into the docs/determinism/ directory
alongside CLAIM_MAP.yaml (or rename/migrate it to sec-claim-map.yaml) and update
any tooling/paths that reference sec-claim-map.json; additionally add a
top-level cross-reference comment/field in the claim map (or a short README)
pointing to det-policy.yaml and CLAIM_MAP.yaml so consumers can find both
artifacts and preserve annotations if you convert to YAML.
- Hardened CI workflow against script injection.
- Implemented actual reproducible WASM build checks (G4).
- Added static inspection job for DET-001 (HashMap usage).
- Optimized materialization benchmarks by removing allocation cost from hot loops.
- Expanded CBOR decoder negative tests for missing enum tags and MAX_OPS check.
- Updated evidence generation to verify artifact existence and support conditional claims.
- Formalized gate states and closeout plan definitions in RELEASE_POLICY.md.
- Improved script validation logic and docstring coverage.
- Moved sec-claim-map.json to docs/determinism/ for better alignment.
@coderabbitai review please
Zero tolerance for handwaving. Every assertion will be challenged. 🔥🥕

🧠 Learnings used. ✅ Actions performed: review triggered.
Actionable comments posted: 23
🤖 Fix all issues with AI agents
In @.github/workflows/det-gates.yml:
- Around line 62-65: Replace inline expression interpolation of github.sha and
github.run_id inside the run: shell blocks with environment variables: add env
entries (e.g., DIGEST_SHA: ${{ github.sha }}, RUN_ID: ${{ github.run_id }}) for
the steps that write artifacts/digest-table.csv and then reference those env
vars in the shell script (use $DIGEST_SHA and $RUN_ID) so both occurrences that
compute the sha256 of dind-report.json use the env-injected values rather than
${{ github.sha }}/${{ github.run_id }} inline; make the same change for the
second occurrence as well.
- Around line 120-122: The current "Create report" step unconditionally writes
{"claim_id":"DET-001","status":"PASSED"}; change it to capture the exit code and
output of ban-nondeterminism.sh (or parse its stdout/stderr) and emit a JSON
report into static-inspection.json that includes the exit_code, raw_output (or
summary), and a computed status (e.g., PASSED if exit_code==0 and no warnings,
WARN if exit_code==0 with warnings/partial matches, FAILED if exit_code!=0).
Update the step that runs ban-nondeterminism.sh to tee or save its output to a
variable/file, inspect that content for warning markers and the tool's exit
code, then write the dynamic JSON (with claim_id "DET-001") reflecting those
values instead of the hardcoded PASSED.
- Around line 115-119: DETERMINISM_PATHS in the Run determinism check step
currently only targets "crates/echo-wasm-abi", leaving other DET_CRITICAL crates
uninspected; update the workflow so DETERMINISM_PATHS includes all crates listed
as DET_CRITICAL in det-policy.yaml (e.g., warp-core, warp-wasm, warp-geom,
warp-ffi, echo-scene-port, echo-scene-codec, echo-graph, echo-ttd, etc.), or
alternatively add separate determinism check steps/claims per crate so DET-001’s
scope aligns with the policy; ensure the DETERMINISM_PATHS environment variable
(or added steps) matches det-policy.yaml entries and that the
ban-nondeterminism.sh invocation will inspect each listed path.
- Around line 213-238: The validate-evidence job currently always runs but
unconditionally fails when full-tier artifacts are absent; update its artifact
presence checks to be conditional on the classification outputs (or gate the
entire job to only run for full runs). Specifically, in the validate-evidence
job adjust the shell step that verifies gathered-artifacts to only test for
det-linux-artifacts, perf-artifacts, static-inspection, and
build-repro-artifacts when needs.classify-changes.outputs.run_full == 'true' (or
conversely use if: needs.classify-changes.outputs.run_full == 'true' on the
whole validate-evidence job) so run_reduced cases (where only decoder-security
and build-repro run) do not hard-fail.
In `@crates/echo-scene-codec/src/cbor.rs`:
- Around line 1007-1017: The test reject_invalid_version currently writes
array(5) but only two CBOR items, relying on decode_scene_delta_inner's current
ordering to catch the unsupported version; instead, after Encoder::new(&mut buf)
and encoder.array(5).unwrap() and encoder.u8(99).unwrap(), write valid dummy
encodings for the remaining four fields expected by
decode_scene_delta/decode_scene_delta_inner (use the correct types expected for
session_id, timestamp, sequence, delta — e.g., empty/zero values or empty
bytes/strings as appropriate) so the buffer contains a full 5-field array and
the decoder will fail strictly on the version check regardless of field-read
ordering. Ensure this change is made in the reject_invalid_version test that
calls decode_scene_delta so the failure is unambiguously due to the version
value.
In `@crates/warp-benches/benches/materialization_hotpath.rs`:
- Around line 27-33: The benchmark currently swallows errors by using "let _ ="
on the bus.emit(...) calls, which can hide emit failures and corrupt benchmark
validity; update the emit invocations in the materialization_hotpath benchmarks
to assert success (e.g., call unwrap() on the Result returned by bus.emit) for
all emits that feed the benchmark (leave finalization calls that are being
measured as "let _ =" if desired), locating the calls that use
bus.emit(black_box(ch), black_box(EmitKey::new(h(i as u64), 1)),
black_box(p.clone())) and replacing the silent discard with an unwrap so any Err
from bus.emit surfaces immediately.
In `@det-policy.yaml`:
- Around line 140-147: The policy mistakenly classifies "tests/**" and "e2e/**"
under the docs entry (docs.paths) as DET_NONCRITICAL which lets changes to
determinism tests bypass gates; remove "tests/**" and "e2e/**" from the
docs.paths array and add a new top-level policy entry (or update the relevant
crate-level entries) that classifies those patterns as DET_IMPORTANT or matches
the DET_CRITICAL level of the code they test (reference the docs entry and its
paths array and the echo-dry-tests entry for structure when adding the new
entry).
- Around line 76-79: The current det-policy entry for the "ci" crate marks a
broad set of paths as DET_CRITICAL (ci.class = DET_CRITICAL) — including
"scripts/**", ".github/workflows/**", "det-policy.yaml", "Makefile", and
"xtask/**" — which forces G1+G2+G3+G4 on trivial script edits; narrow the blast
radius by either (a) changing ci.class from DET_CRITICAL to a less strict class
for non-determinism-sensitive paths, or (b) splitting the single "ci" entry into
two entries (e.g., "ci" for critical items like ".github/workflows/**" and
"ci-docs" or "ci-tools" for non-critical scripts such as "scripts/**" and
"xtask/**") and assigning a lower classification to the non-critical entry;
update the paths array in det-policy.yaml accordingly so only true
determinism-sensitive files remain DET_CRITICAL.
- Around line 148-152: The crate-level required_gates override for
"warp-benches" is ignored because classify_changes.cjs only reads
crateInfo.class; update classify_changes.cjs to also read and merge
crateInfo.required_gates when matching a crate (e.g., where it reads
crateInfo.class), so that required_gates from the det-policy entry for
warp-benches (required_gates: ["G3"]) are applied, or remove the override from
det-policy and set warp-benches to DET_CRITICAL if G3 should always apply;
specifically modify the logic that processes matchedCrates / crateInfo (in
classify_changes.cjs) to append/merge crateInfo.required_gates into the computed
requiredGates instead of ignoring it.
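The merge logic from the last bullet can be sketched as follows; the `gatesFor` name and the shapes of `classes` and `crateInfo` are assumptions based on the review text, and only `required_gates` and the G3 override come from the policy discussion.

```javascript
// Sketch: union the class-level gates with any crate-level
// required_gates override instead of ignoring crateInfo.required_gates.
function gatesFor(crateInfo, classes) {
  const classGates = (classes[crateInfo.class] || {}).required_gates || [];
  const overrides = crateInfo.required_gates || [];
  return [...new Set([...classGates, ...overrides])];
}
```

With this, a `DET_NONCRITICAL` warp-benches entry carrying `required_gates: ["G3"]` still triggers the perf gate.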
In `@docs/determinism/CLAIM_MAP.yaml`:
- Around line 1-59: Add a claim for the build-repro gate and wire it into
evidence generation: create a new claim ID (e.g. REPRO-001) in CLAIM_MAP.yaml
with a clear statement like "Dual WASM build reproducibility verified" including
required_evidence types (ci_artifact, behavior_test or static_inspection as
appropriate) and owner_role, then update generate_evidence.cjs to emit that
claim when det-gates.yml's build-repro job completes (ensure the job's artifact
name build-repro-artifacts is referenced and the hash-comparison result maps to
REPRO-001); alternatively, if reproducibility is intentionally excluded, add a
documented exclusion entry in CLAIM_MAP.yaml referencing
det-gates.yml/build-repro and explain why so the evidence pipeline is explicit.
In `@docs/RELEASE_POLICY.md`:
- Around line 31-35: The release policy currently lists G3 under
production_blockers but omits it from staging_blockers, which allows performance
regressions to pass staging; update the release_policy YAML so G3 is included in
staging_blockers (i.e., change staging_blockers: [G1, G2, G4] to
staging_blockers: [G1, G2, G3, G4]) or, if omission is intentional, add a short
rationale comment in RELEASE_POLICY.md documenting why G3 is staging-optional
and the conditions that justify that trade-off.
In `@docs/ROLLBACK_TTD.md`:
- Around line 29-36: Update the revert sequence instructions to handle merge
commits: after listing the commits (e201c9b, fd98b91, ce98d80, a02ea86, 3187e6a,
6e34a77, f138b8a) add a note that if any listed commit is a merge commit the
operator must run git revert with a mainline parent flag (e.g., "git revert -m 1
<sha>") and explain to use -m 1 to pick the first parent as the mainline; ensure
this guidance is adjacent to the revert list so operators see it before running
git revert.
- Around line 73-79: The Post-Rollback Evidence Packet section lists required
contents but omits where to file them; update the "Post-Rollback Evidence Packet
(required)" section to add explicit filing instructions: specify that the packet
must be attached to the incident ticket and linked in the rollback PR
description, uploaded as a CI artifact for traceability, and added to the
incident runbook/knowledge-base entry; mention required naming conventions
(e.g., incident-<ID>-post-rollback-evidence) and retention/location guidance so
reviewers can find it easily.
In `@ECHO_ROADMAP.md`:
- Around line 6-15: The G4 entry in the roadmap is inconsistent with
RELEASE_POLICY.md: update the G4 line in ECHO_ROADMAP.md (currently "Enforce
artifact-backed VERIFIED claims and path-aware gates") to match the
RELEASE_POLICY.md definition of G4 as "Build Reproducibility" (e.g., "G4 (REP):
Reproducible deterministic build constraints validated in CI"), and move the
artifact-backed VERIFIED claims and path-aware gates text into a cross-cutting
GOV or separate bullet (or note) so evidence enforcement remains clearly
identified as a cross-cutting concern rather than the G4 description.
In `@scripts/classify_changes.cjs`:
- Around line 56-63: The loop that picks the highest class silently treats
unknown crateInfo.class values as lower priority because classPriority[cls] can
be undefined; update the logic in the block using matches(file, pattern) so it
validates cls before comparing (e.g., check that
classPriority.hasOwnProperty(cls) or classPriority[cls] !== undefined), and if
the class is unknown throw or log a hard failure (exit non-zero) mentioning the
crate and invalid class; modify the code paths around classPriority,
crateInfo.class, and maxClass so unknown classifications do not fall through
silently but instead cause an explicit error.
- Around line 67-70: The code in classifyChanges uses process.exit(1) when
requireFull && !matched which prevents the outer try/catch from handling the
error and makes tests impossible; change that process.exit(1) to throw new
Error(...) with the same message so the classifyChanges function consistently
throws on error (matching the behavior at lines where missing files use throw)
and the require.main try/catch around the call can intercept and handle/report
the error; update any unit tests or callers that relied on process.exit if
present.
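The unknown-class hardening from the first bullet can be sketched like this; the `higherClass` helper name is hypothetical, while `classPriority` and the class names come from the review.

```javascript
// Sketch: fail hard on unknown classification values instead of letting
// an undefined priority silently lose the max-class comparison.
const classPriority = { DET_NONCRITICAL: 0, DET_IMPORTANT: 1, DET_CRITICAL: 2 };

function higherClass(current, cls, crateName) {
  if (!Object.prototype.hasOwnProperty.call(classPriority, cls)) {
    throw new Error(`Crate ${crateName} has unknown class '${cls}'`);
  }
  return classPriority[cls] > classPriority[current] ? cls : current;
}
```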
In `@scripts/generate_evidence.cjs`:
- Around line 16-23: The current checkArtifact function only verifies a
directory is non-empty; update it to validate presence of required artifact
files instead: in checkArtifact(name) (using gatheredArtifactsDir and fullPath)
replace the generic readdir/length check with explicit existence checks for a
list of expected filenames (e.g., "sec-tests.log", "static-inspection.json" —
add any other required names) using fs.existsSync or fs.statSync for each target
file, return true only if all required files are present, and keep the try/catch
to return false on any filesystem errors.
- Around line 36-60: The five SEC claims (SEC-001 .. SEC-005) all use the same
directory existence check via checkArtifact('sec-artifacts'), which falsely
marks all controls VERIFIED when any artifact exists; update
generate_evidence.cjs to perform per-claim verification instead: for each claim
ID (SEC-001, SEC-002, SEC-003, SEC-004, SEC-005) replace the generic
checkArtifact call with a specific check that either looks for a dedicated
artifact file (e.g., sec-001.json / sec-001.log) or parses a central
sec-claim-map.json/test results to confirm that that particular control passed,
set status to VERIFIED only if that specific artifact/result indicates success,
and update the evidence.evidence.artifact_name to reference the specific
artifact or include parsed result metadata so each claim proves its own outcome.
- Around line 83-86: The script currently defines generateEvidence and invokes
it when run as main but does not export it, making it inconsistent with sibling
scripts like validate_det_policy.cjs and hard to test; update the file to export
the main function by assigning module.exports = { generateEvidence } (or
module.exports.generateEvidence = generateEvidence) so tests and other modules
can require and call generateEvidence; keep the existing require.main === module
invocation intact so CLI behavior is unchanged.
In `@scripts/validate_claims.cjs`:
- Around line 40-48: The current validation allows evidence.run_id === 'local'
but only flags 'local' for evidence.workflow and evidence.artifact_name in CI;
add CI-time checks for run_id and commit_sha as well. Specifically, after the
existing run_id format check and alongside the workflow/artifact_name checks
that use process.env.GITHUB_ACTIONS and push to violations, add conditions that
if evidence.run_id === 'local' or evidence.commit_sha === 'local' (and
process.env.GITHUB_ACTIONS is truthy) then push a violation like `Claim
${claim.id} has placeholder evidence ('local') in CI environment.` so VERIFIED
claims with run_id or commit_sha='local' are caught. Ensure you reference the
same violations array and use the existing logging/message wording for
consistency.
- Line 30: The filter is using a falsy check which treats 0/""/false as missing;
update the predicate in the requiredFields.filter call to explicitly check for
null/undefined (e.g., requiredFields.filter(f => evidence[f] == null) or
evidence[f] === undefined) so only absent or null values are flagged; target the
expression that constructs missing (the requiredFields.filter callback that
references evidence[f]).
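A small illustration of why the falsy predicate misfires, assuming the field list discussed above: `0`, `""`, and `false` are present values, but `!evidence[f]` reports them as missing, while `== null` flags only `null`/`undefined`.

```javascript
// Falsy check vs. explicit null/undefined check for "missing" fields.
const requiredFields = ['run_id', 'workflow', 'artifact_name', 'commit_sha'];

function missingFields(evidence) {
  // `== null` is true only for null and undefined, not for 0 or "".
  return requiredFields.filter((f) => evidence[f] == null);
}
```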
In `@scripts/validate_det_policy.cjs`:
- Around line 32-42: The loop over classInfo.required_gates assumes it's an
array but doesn't validate types, so string values iterate per-character or
empty string silently; update the validation in the iteration over
Object.entries(classes) to first assert that classInfo.required_gates is an
Array (e.g., using Array.isArray(classInfo.required_gates)); if it's missing or
not an array, log a clear error like "Class <name> required_gates must be an
array" and return false; only after that use for (const gate of
classInfo.required_gates) and keep the existing ALLOWED_GATES membership checks
to reject invalid gates.
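A hedged sketch of that type guard (the `ALLOWED_GATES` values mirror the gate names discussed in this PR; the surrounding iteration is omitted): iterating a string such as `"G1"` would otherwise yield the characters `"G"` and `"1"`.

```javascript
// Guard required_gates before iterating: non-arrays fail loudly instead of
// iterating per-character or passing silently when empty.
const ALLOWED_GATES = new Set(['G1', 'G2', 'G3', 'G4']);

function validateRequiredGates(className, requiredGates) {
  if (!Array.isArray(requiredGates)) {
    console.error(`Class ${className} required_gates must be an array`);
    return false;
  }
  return requiredGates.every((g) => ALLOWED_GATES.has(g));
}
```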
- Around line 21-24: The error/success messages hardcode filenames instead of
using the actual input path; update the console messages to use the filePath
variable: change the version check error in the block that tests data.version
!== 1 (and currently prints 'Error: Invalid version in det-policy.yaml') to
include filePath, and change the final success message (currently
'det-policy.json is valid.') to include filePath as well so both messages
reflect the actual file being validated (refer to the filePath variable and the
data.version check).
scripts/validate_claims.cjs
Outdated
```js
if (!/^\d+$/.test(String(evidence.run_id)) && evidence.run_id !== 'local') {
  violations.push(`Claim ${claim.id} has invalid run_id: ${evidence.run_id}`);
}
if (evidence.workflow === 'local' || evidence.artifact_name === 'local') {
  // Warning or violation depending on CI context
  if (process.env.GITHUB_ACTIONS) {
    violations.push(`Claim ${claim.id} has placeholder evidence ('local') in CI environment.`);
  }
}
```
run_id === 'local' passes format validation but isn't flagged in CI context.
Line 40 explicitly allows 'local' as a valid run_id. Lines 43-47 check for placeholder 'local' values on workflow and artifact_name in CI, but not on run_id or commit_sha. A VERIFIED claim with run_id: "local" and commit_sha: "deadbeef..." in GitHub Actions will pass silently. That's an evidence integrity hole — the very thing this script exists to prevent.
Proposed fix — check all four fields for placeholder values in CI
```diff
- if (evidence.workflow === 'local' || evidence.artifact_name === 'local') {
-   // Warning or violation depending on CI context
-   if (process.env.GITHUB_ACTIONS) {
-     violations.push(`Claim ${claim.id} has placeholder evidence ('local') in CI environment.`);
-   }
- }
+ if (process.env.GITHUB_ACTIONS) {
+   const placeholders = requiredFields.filter(f => evidence[f] === 'local');
+   if (placeholders.length > 0) {
+     violations.push(`Claim ${claim.id} has placeholder evidence ('local') for [${placeholders.join(', ')}] in CI environment.`);
+   }
+ }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```js
if (!/^\d+$/.test(String(evidence.run_id)) && evidence.run_id !== 'local') {
  violations.push(`Claim ${claim.id} has invalid run_id: ${evidence.run_id}`);
}
if (process.env.GITHUB_ACTIONS) {
  const placeholders = requiredFields.filter(f => evidence[f] === 'local');
  if (placeholders.length > 0) {
    violations.push(`Claim ${claim.id} has placeholder evidence ('local') for [${placeholders.join(', ')}] in CI environment.`);
  }
}
```
🤖 Prompt for AI Agents
In `@scripts/validate_claims.cjs` around lines 40 - 48, The current validation
allows evidence.run_id === 'local' but only flags 'local' for evidence.workflow
and evidence.artifact_name in CI; add CI-time checks for run_id and commit_sha
as well. Specifically, after the existing run_id format check and alongside the
workflow/artifact_name checks that use process.env.GITHUB_ACTIONS and push to
violations, add conditions that if evidence.run_id === 'local' or
evidence.commit_sha === 'local' (and process.env.GITHUB_ACTIONS is truthy) then
push a violation like `Claim ${claim.id} has placeholder evidence ('local') in
CI environment.` so VERIFIED claims with run_id or commit_sha='local' are
caught. Ensure you reference the same violations array and use the existing
logging/message wording for consistency.
…ctness
- Complete script injection hardening (event_name, sha, run_id via env vars)
- Expand DET-001 static inspection to all 14 DET_CRITICAL crates
- Make static inspection report conditional on check outcome
- Make validate-evidence artifact checks conditional on classification tier
- Promote warp-benches to DET_IMPORTANT for reduced gate coverage
- Replace process.exit(1) with throw in classify_changes.cjs
- Replace let _ = with .unwrap() on benchmark emit calls
- Update DET-001 claim statement to reflect expanded scope
@coderabbitai review please 🥕
Rate Limit Exceeded
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In @.github/workflows/det-gates.yml:
- Around line 247-260: The checks for gathered-artifacts/sec-artifacts and
gathered-artifacts/build-repro-artifacts are unconditional and will fail in
DET_NONCRITICAL where neither full nor reduced jobs run; update the job to
accept run_reduced output and make those two checks conditional on either
RUN_FULL or RUN_REDUCED being true. Add env: RUN_REDUCED: ${{
needs.classify-changes.outputs.run_reduced }} and change the two directory
checks to run only inside an if [ "$RUN_FULL" = "true" ] || [ "$RUN_REDUCED" =
"true" ] ; then ... fi block so that sec-artifacts and build-repro-artifacts are
only required when full or reduced runs are expected.
- Around line 229-239: The validate-evidence job currently lists
determinism-macos in needs but never checks for det-macos-artifacts; either add
a macOS artifact presence check under the same RUN_FULL conditional that checks
det-linux-artifacts/det-win-artifacts (i.e., mirror the existing artifact
verification logic to include det-macos-artifacts) or remove determinism-macos
from the needs list if those artifacts are not required for validation; update
the validate-evidence job to reference det-macos-artifacts (or drop
determinism-macos) so the workflow correctly gates on macOS determinism outputs.
In `@CHANGELOG.md`:
- Line 29: The changelog entry header "## [0.1.3] — 2026-02-15" should not
include a final release date on an open PR; update that header to either "##
[0.1.3] — Unreleased" or remove the date portion entirely so it reads "##
[0.1.3]" until the release actually lands, ensuring the CHANGELOG.md reflects
the correct unreleased state.
- Around line 6-28: The "Unreleased" section currently lists PR `#283`
review-feedback fixes that belong with the [0.1.3] release entry; merge the
bullet points under "Unreleased" (the items referencing CI Security, Static
Inspection Scope, Static Inspection Report, Evidence Validation, Policy
Classification, Script Quality, Benchmark Correctness, Claim Map) into the
existing [0.1.3] section (or alternatively remove the [0.1.3] header and keep
everything under Unreleased until release), ensuring PR `#283` is cited once under
the correct release entry and removing the duplicate entry to follow Keep a
Changelog conventions.
In `@crates/warp-benches/benches/materialization_hotpath.rs`:
- Around line 46-58: The benchmark uses the deprecated iter_with_setup; replace
the call to b.iter_with_setup in the "materialization_finalize_log_1000"
benchmark with b.iter_batched, passing the same setup closure (the loop emitting
payloads with bus.emit using ch, EmitKey::new(h(i as u64), 1), payloads) and the
same measurement closure (calling black_box(bus.finalize())), and supply
BatchSize::PerIteration as the third argument; also add the criterion::BatchSize
import if not present so the code compiles.
In `@det-policy.yaml`:
- Around line 76-79: The current "ci" policy entry assigns DET_CRITICAL to broad
paths (including scripts/** and xtask/**) causing excessive gate runs; update
the det-policy.yaml by keeping the existing "ci" entry but restrict its paths to
only truly critical items (e.g., ".github/workflows/**", "det-policy.yaml",
"Makefile") and create a new entry (e.g., "ci-tooling") that uses DET_IMPORTANT
for tooling paths like "scripts/**" and "xtask/**" (so edits such as
generate_evidence.cjs only trigger DET_IMPORTANT checks); ensure you reference
and rename the affected policy entries ("ci", new "ci-tooling") and the severity
classes (DET_CRITICAL -> retain for workflows/policy, DET_IMPORTANT -> use for
scripts/tooling).
In `@docs/determinism/CLAIM_MAP.yaml`:
- Around line 40-45: Update the SEC-004 claim entry so the "statement" is
explicit about the enforced behavior: replace the vague "Bad version handling
enforced." with a precise guarantee such as "CBOR payloads with unrecognized
version fields are rejected with an error." Ensure the SEC-004 block (keys:
statement, required_evidence, owner_role) is updated accordingly and that any
tests/artifacts referenced under required_evidence (behavior_test, ci_artifact)
align with this explicit rejection behavior.
- Around line 12-17: Rewrite DET-002's statement to be a concrete, falsifiable
invariant: replace the vague label "Float parity Rust/JS deterministic corpus"
with a clear pass condition such as "Given the canonical deterministic test
corpus, Rust and JS implementations produce bit-identical outputs for all
float-related transformations (e.g., canonicalize_f32) and for final serialized
results"; update required_evidence to require a behavior_test that compares
byte-for-byte outputs and a ci_artifact containing the corpus and diff logs, and
reference DET-002 in the claim text to ensure the test verifies exact parity
between implementations.
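The byte-for-byte comparison that DET-002's pass condition calls for can be sketched minimally (input shapes are illustrative; the real test would feed serialized corpus outputs from both implementations):

```javascript
// Parity holds only if the two serialized outputs are bit-identical.
function bitIdentical(a, b) {
  return Buffer.compare(Buffer.from(a), Buffer.from(b)) === 0;
}
```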
In `@scripts/classify_changes.cjs`:
- Around line 74-77: The current outputs (maxClass, run_full, run_reduced,
run_none) leave run_none=false when maxClass === 'DET_NONCRITICAL', causing
validate-evidence to run while downstream jobs skip; update
scripts/classify_changes.cjs so run_none becomes true when no gates need to fire
(e.g., set run_none = (changedFiles.length === 0) || (maxClass ===
'DET_NONCRITICAL')), or alternatively add a new output (run_gates) and change
det-gates.yml to gate validate-evidence on run_gates instead of run_none; update
references to maxClass, run_full, run_reduced, run_none (or the new run_gates)
and ensure det-gates.yml validate-evidence uses the new semantics.
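A sketch of the suggested output semantics (function name and exact wiring are assumptions; the point is that `run_none` becomes true whenever no gates need to fire):

```javascript
// run_none is true when nothing changed or nothing rises above
// DET_NONCRITICAL, so validate-evidence can skip along with the gate jobs.
function computeOutputs(changedFiles, maxClass) {
  return {
    run_full: maxClass === 'DET_CRITICAL',
    run_reduced: maxClass === 'DET_IMPORTANT',
    run_none: changedFiles.length === 0 || maxClass === 'DET_NONCRITICAL',
  };
}
```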
```yaml
ci:
  class: DET_CRITICAL
  owner_role: "CI Engineer"
  paths: [".github/workflows/**", "scripts/**", "det-policy.yaml", "Makefile", "xtask/**"]
```
🧹 Nitpick | 🔵 Trivial
The ci entry's blast radius is enormous — every script edit triggers G1+G2+G3+G4.
scripts/** is under DET_CRITICAL. Adding a comment to generate_evidence.cjs fires the full determinism + security + performance + reproducibility gate suite. This was flagged as a nitpick previously, but given that this PR adds 6 new scripts under scripts/, this is now a real friction concern. Every future tweak to evidence tooling, classification logic, or documentation helpers triggers 7+ CI jobs across Linux and macOS.
Consider splitting ci into ci-critical (workflows, policy, Makefile) and ci-tooling (scripts, xtask) with DET_IMPORTANT.
🤖 Prompt for AI Agents
In `@det-policy.yaml` around lines 76 - 79, The current "ci" policy entry assigns
DET_CRITICAL to broad paths (including scripts/** and xtask/**) causing
excessive gate runs; update the det-policy.yaml by keeping the existing "ci"
entry but restrict its paths to only truly critical items (e.g.,
".github/workflows/**", "det-policy.yaml", "Makefile") and create a new entry
(e.g., "ci-tooling") that uses DET_IMPORTANT for tooling paths like "scripts/**"
and "xtask/**" (so edits such as generate_evidence.cjs only trigger
DET_IMPORTANT checks); ensure you reference and rename the affected policy
entries ("ci", new "ci-tooling") and the severity classes (DET_CRITICAL ->
retain for workflows/policy, DET_IMPORTANT -> use for scripts/tooling).
Acknowledged — Valid observation. The ci entry's DET_CRITICAL scope for scripts/** is intentionally broad: changes to evidence/classification scripts can undermine gate integrity. However, we've filed #284 (per-crate gate overrides) as a backlog item to enable finer-grained gate triggering. Splitting into ci-workflows (DET_CRITICAL) and ci-tooling (DET_IMPORTANT) is a reasonable future refinement.
@flyingrobots, understood and acknowledged. Your security rationale is sound—evidence generation and classification scripts are gate-integrity-critical, and a rogue classify_changes.cjs edit could silently neuter the entire policy enforcement stack.
The interim friction is measurable (6 new scripts added in this PR alone), but #284 (per-crate gate overrides) provides a clear path forward. One residual question: do you have a rough timeline or trigger condition for prioritizing #284? If script churn becomes acute before then, you could consider a surgical mitigation—e.g., mark only classify_changes.cjs, validate_det_policy.cjs, and validate_claims.cjs as DET_CRITICAL (the policy/gate enforcement scripts), while allowing generate_evidence.cjs and documentation/reporting helpers to drop to DET_IMPORTANT. This would cut the blast radius by ~50% while preserving gate integrity for the true enforcement boundary.
But if #284 is near-term and the current friction is tolerable, the current design is defensible. Noted for future refinement.
🧠 Learnings used
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 283
File: det-policy.yaml:136-151
Timestamp: 2026-02-15T16:30:15.709Z
Learning: In the det-policy.yaml system, the classify_changes.cjs script only reads class-level `required_gates` definitions; crate-level `required_gates` fields are not consumed by the script and should not be added to individual crate entries.
Learnt from: CR
Repo: flyingrobots/echo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-14T15:35:39.611Z
Learning: All determinism-critical crates (`warp-core`, `echo-wasm-abi`, `echo-scene-port`) must be compiled with `RUSTFLAGS="-Dwarnings"`; unused imports, dead code, or silenced lints are treated as build failures
Learnt from: CR
Repo: flyingrobots/echo PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-14T15:35:39.611Z
Learning: Applies to crates/{warp-core,warp-ffi,warp-wasm,echo-wasm-abi,echo-scene-port}/**/lib.rs : Public APIs across crates (`warp-core`, `warp-ffi`, `warp-wasm`, etc.) must carry rustdoc comments that explain intent, invariants, and usage
Learnt from: flyingrobots
Repo: flyingrobots/echo PR: 0
File: :0-0
Timestamp: 2026-01-24T23:22:15.388Z
Learning: The user (flyingrobots) prefers extremely detailed, pedantic code reviews with no mercy — "GOD-TIER NIGHTMARE CODE REVIEWER" style. They want critical scrutiny, exhaustive analysis, identification of landmines, design questions, and actionable demands on every review.
- Escape all regex metacharacters before glob-to-regex conversion
- Add zero-test-match guard to cargo test filter steps (G1, G2)
- Tighten all claim statements to be concrete and falsifiable
- Add if: always() to all artifact upload steps for evidence preservation
- Upload both builds in build-repro for failure diagnostics
@coderabbitai review please 🥕
Rate Limit Exceeded
@coderabbitai review please 🥕
Core Attack Surface:
Every assertion scrutinized. Every edge case probed. Every assumption challenged. 🔥🥕
🧠 Learnings used
✅ Actions performed
Review triggered.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @.github/workflows/det-gates.yml:
- Around line 93-94: The grep command in the macOS gate is matching "0 passed"
as a substring (false-positive for "10 passed" etc.); update the check that
follows the cargo test run (the command that writes det-macos.log) to match the
exact line "0 passed" instead of any occurrence — e.g., change the grep
invocation used after det-macos.log to anchor the pattern (match "^0 passed$" or
use a word-boundary/regex) so only a true "0 passed" result triggers the fatal
exit.
- Line 59: The grep guard currently uses the vulnerable substring match grep -q
"0 passed" against det-linux.log; update all occurrences of grep -q "0 passed"
to a safe match such as grep -q -E '\b0 passed\b' or grep -q "running 0 tests"
so it only triggers on an exact zero-test report (apply the change to every
instance of the grep -q "0 passed" command in the workflow, including where
det-linux.log is checked).
In `@scripts/classify_changes.cjs`:
- Line 59: The loop comparing classPriority[cls] to classPriority[maxClass]
should short-circuit when maxClass reaches 'DET_CRITICAL' to avoid wasted work;
inside the outer iteration where you currently do if (classPriority[cls] >
classPriority[maxClass]) { … }, add a check for if (maxClass === 'DET_CRITICAL')
then break/return immediately (or otherwise abort the remaining nested
file×crate×pattern iterations) so the function exits early once the highest
priority class is found.
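The early exit described above can be sketched as follows (the `classPriority` table and matcher callback are illustrative, not the script's exact internals):

```javascript
// Stop scanning files once the top class is reached; no later match can
// outrank DET_CRITICAL.
const classPriority = { DET_NONCRITICAL: 0, DET_IMPORTANT: 1, DET_CRITICAL: 2 };

function classify(files, classForFile) {
  let maxClass = 'DET_NONCRITICAL';
  for (const file of files) {
    const cls = classForFile(file);
    if (classPriority[cls] > classPriority[maxClass]) maxClass = cls;
    if (maxClass === 'DET_CRITICAL') break; // short-circuit: highest class found
  }
  return maxClass;
}
```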
Match " 0 passed" (with leading space) instead of "0 passed" to prevent false positives on "10 passed", "20 passed", etc. in all three cargo test filter guards (G1 linux, G1 macos, G2 codec).
@coderabbitai review please 🥕
Note: CodeRabbit operates as an incremental review system. If commits have already been reviewed, they won't be re-analyzed unless new changes are pushed. The review will cover any new or unreviewed changes in this PR. Given the scope (gates G1-G4, evidence integrity, CBOR security tests, benchmarks, governance docs), the review will examine:
Executing now. 🔥
🧠 Learnings used
✅ Actions performed
Review triggered.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/det-gates.yml:
- Around line 10-46: The workflow jobs lack explicit timeouts; add a
timeout-minutes field to every job definition (e.g. the classify-changes job) to
avoid 6-hour GitHub default hangs — insert timeout-minutes: 5 under the
classify-changes job, set timeout-minutes: 15–20 for Rust build/test jobs and
WASM builds, timeout-minutes: 30 for Criterion/benchmark jobs, and
timeout-minutes: 5 for validate-evidence (and any short utility jobs); ensure
the timeout-minutes key is placed at the top-level of each job block (same
indentation as name and runs-on) so GitHub Actions will enforce the limits.
…acklog
- Add permissions: contents: read to det-gates.yml (least privilege)
- Make ripgrep install idempotent; gate validate-evidence on classify success
- Invoke CJS scripts via node for cross-platform portability
- Accept 'local' commit_sha sentinel in validate_claims.cjs
- Export functions from evidence scripts for unit testing (#286)
- Sharpen PRF-001 claim statement; add 5 backlog items to TASKS-DAG
- Set 0.1.3 release date; update CHANGELOG with round-4 entries
Script hardening:
- Add Array.isArray guard for required_gates in validate_det_policy.cjs
- Use explicit null/undefined check instead of falsy in validate_claims.cjs
- Fix hardcoded filenames in validate_det_policy.cjs error messages
Evidence completeness:
- Add REPRO-001 claim (G4 build reproducibility) to CLAIM_MAP.yaml
- Wire REPRO-001 into generate_evidence.cjs via build-repro-artifacts
Test robustness:
- Encode all 5 CBOR fields in reject_invalid_version to prevent false passes from decoder field-read reordering
Docs and nits:
- Document G3 staging-optional rationale in RELEASE_POLICY.md
- Add merge-commit revert guidance (-m 1) to ROLLBACK_TTD.md
- Add evidence packet filing instructions to ROLLBACK_TTD.md
- Document tests/**/e2e/** DET_NONCRITICAL rationale in det-policy.yaml
- Add payload.clone() allocation+emit comment in benchmark
Round 5 — Review Feedback Addressed (18ded17)
Fixes applied
False positive / design decision responses
@coderabbitai review please 🥕
Tip: For best results, initiate chat on the files or code changes.
Your false-positive responses contain a critical logic error that undermines the entire gate coverage model, plus several over-classifications that dilute policy precision. Zero mercy mode engaged. 🔥
✅ VERIFIED FIXES
🔥 CRITICAL FLAW: G3 "UNCONDITIONAL RUN" CLAIM IS FALSE
You stated:
This is demonstrably incorrect. Forensic evidence from perf-regression:
name: G3 perf regression (criterion)
needs: classify-changes
if: needs.classify-changes.outputs.run_full == 'true'  # ← CONDITIONAL, NOT UNCONDITIONAL
G3 ONLY runs when
Impact Analysis
Scenario: A developer modifies only
Result: Changes to the benchmark crate itself skip the performance regression gate entirely. 🚨 SeverityCRITICAL DESIGN FLAW. This is precisely the scenario G3 exists to catch: a well-intentioned "optimization" to Remediation DemandsFix det-gates.yml Line 108: - if: needs.classify-changes.outputs.run_full == 'true'
+ if: needs.classify-changes.outputs.run_full == 'true' || needs.classify-changes.outputs.run_reduced == 'true'OR reclassify OR add a Do not merge this PR until G3 coverage for benchmark changes is assured. The current state is a ticking landmine. 🔥 OVER-CLASSIFICATION: scripts/** Blanket DET_CRITICALYou justified:
Partially correct for gate-affecting scripts; wildly over-broad for utility scripts. Evidence from Gate-critical scripts (legitimately DET_CRITICAL):
Non-gate-critical scripts (should be DET_NONCRITICAL or DET_IMPORTANT at most):
Problem: Changing Demand: Split the
This is a precision vs. cost tradeoff. Every false-positive DET_CRITICAL triggers ~15 minutes of CI overhead for zero security gain.
🔥 QUESTIONABLE: xtask and Makefile as DET_CRITICAL
xtask (
Observation: This is a developer convenience CLI, not a CI gatekeeper. The actual gate logic lives in Makefile:
Observation: Developer productivity targets. Zero runtime determinism impact. The git hooks themselves ( Demand: Downgrade
🔥 MISLEADING RATIONALE: tests/** Classification
Your comment in det-policy.yaml Line 151:
Forensic evidence from
Observation: The actual E2E tests are in Demand: Either:
The current rationale is factually incorrect.
🔥 EDGE CASE: validate_det_policy.cjs Paths Validation Incomplete
Line 39 of
if (!crateInfo.paths || !Array.isArray(crateInfo.paths) || crateInfo.paths.length === 0) {
console.error(`Error: Crate ${crateName} missing or invalid paths`);
return false;
}What it validates:
What it does NOT validate:
Attack vector: crates:
malicious:
class: DET_CRITICAL
paths: ["valid/path", null, "", "../../etc/passwd"]
owner_role: "Attacker"This would pass validation because the array is non-empty, but Demand: Add per-element validation: for (const pattern of crateInfo.paths) {
if (typeof pattern !== 'string' || pattern.trim() === '') {
console.error(`Error: Crate ${crateName} contains invalid path pattern: ${JSON.stringify(pattern)}`);
return false;
}
}
🔥 MINOR: classify_changes.cjs Placeholder Collision Risk
Line 14–19 of
const regexPattern = pattern
.replace(/\*\*/g, '___DBL_STAR___')
.replace(/\*/g, '___SGL_STAR___')
.replace(/[.+?^${}()|[\]\\]/g, '\\$&')
.replace(/___SGL_STAR___/g, '[^/]*')
.replace(/___DBL_STAR___/g, '.*');Theoretical issue: If a path pattern contains the literal string
Likelihood: Extremely low (file paths don't contain triple underscores). Severity: Low (pattern matching would behave unexpectedly). Best practice: Use Unicode private-use characters or UUIDs as placeholders to guarantee no collision: const DBL_STAR = '\uE000'; // Unicode private-use area
const SGL_STAR = '\uE001';
Not a blocker, but document this as a known limitation or harden it.
VERDICT
Fixes: ACCEPTED. ✅
Blocking issues for merge:
Non-blocking but strongly recommended: Do not merge until item
🧠 Learnings used
- Make perf-regression (G3) run for all non-run_none paths, not just run_full. Ensures PRF-001 fires for DET_IMPORTANT changes like warp-benches. Corrects a gate coverage gap identified in review. - Move perf-artifacts presence check to always-required in validate-evidence job (matches new G3 unconditional scope). - Carve tests/dind* and testdata/dind/** out of docs DET_NONCRITICAL into dedicated dind-tests-root entry at DET_IMPORTANT, preventing gate evasion for DIND determinism test modifications. - Update classification comment to clarify DIND carve-out.
Round 6 — Fixes Applied (4057812)
Accepted (2 legitimate issues fixed)
Issue #1: G3 gate coverage gap — You were right, and I was wrong in my round-5 rebuttal. G3 was gated on
Fix: Changed
Issue #3: tests/dind.bats misclassification — Confirmed.
Fix: Created dedicated
Deferred to backlog (not blockers)
Issue #2: Over-classification of CI/tooling paths — Valid refinement, but over-classification is the safe direction (triggers extra gates, never fewer). Splitting
Issue #4: validate_det_policy.cjs glob validation — Good hardening idea. Per-element glob validation belongs in a follow-up PR alongside the broader #286 test coverage expansion.
Issue #5: classify_changes.cjs placeholder collision — The class names come from
@coderabbitai review please 🥕
✅ VERIFIED FIXES
Issue
…ass semantics
Address reviewer feedback with documentation fixes:
- Clarify that required_gates defines merge blockers, not workflow execution (G3 runs for all non-NONCRITICAL changes via run_none != 'true')
- Document intentional fail-safe for scripts/** as DET_CRITICAL (#284)
- Explain max-class resolution for overlapping test classifications
Round 6 — Rebuttals & Fixes (f04c529)
Thanks for the thorough review. Three blocking issues addressed below — two are rebuttals with clarifying documentation added, one is acknowledged with rationale.
Issue 1 — G3 gate coverage: By design, already unconditional
G3's
Issue 2 — Over-classification of scripts: Intentional fail-safe
Issue 3 — tests/dind misclassification: Already handled by max-class semantics
Issues 4–5: Acknowledged as nits, deferred per your suggestion.
@coderabbitai review please 🥕
🚨 CONTRADICTION DETECTED: Issue
Address all 13 code-review issues (2H, 3M, 5L, 3N):
- H1: CI cross-check claim IDs between evidence.json and CLAIM_MAP.yaml
- H2: Add DET-003 macOS parity claim + evidence wiring
- M1: Replace 20+ docs paths with ** catch-all (max-class safe)
- M2: Add concurrency block to det-gates.yml
- M3: Dynamic DETERMINISM_PATHS from det-policy.yaml via yq/jq
- L1: Strengthen enum tag test assertions with error message checks
- L2: Remove redundant rustup target add in build-repro
- L3: Fix REPRO-001 evidence type (static_inspection → hash_comparison)
- L4: Push-event empty changelist guard (defaults to full run)
- L5: Verify sec-claim-map test IDs exist in source
- N2: Flatten verbose required_evidence YAML syntax
- N3: Remove trailing whitespace in classify_changes.cjs
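The H1 cross-check in that list can be sketched as a symmetric set difference (input shapes — plain arrays of claim IDs — are assumptions for illustration):

```javascript
// Claim IDs must match both ways: every evidence claim exists in the claim
// map, and every mapped claim has evidence.
function crossCheckClaimIds(evidenceIds, claimMapIds) {
  const ev = new Set(evidenceIds);
  const map = new Set(claimMapIds);
  return {
    missingInMap: evidenceIds.filter((id) => !map.has(id)),
    missingInEvidence: claimMapIds.filter((id) => !ev.has(id)),
  };
}
```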
…timization
Fixes CI blocker and addresses CodeRabbit review feedback:
- Restore `rustup target add wasm32-unknown-unknown` in build-repro Build 1 and Build 2 steps. The dtolnay/rust-toolchain@stable action installs the wasm target for `stable`, but rust-toolchain.toml pins 1.90.0 — a different toolchain — so the target must be added per-build.
- Add timeout-minutes to all 8 det-gates.yml jobs (5-30 min) to prevent the 6-hour GitHub Actions default from burning runner time on hung jobs.
- Add early-exit in classify_changes.cjs when maxClass reaches DET_CRITICAL, guarded by !requireFull to preserve require_full_classification checking.
- Fix CHANGELOG: remove incorrect "removed redundant rustup target add" entry.
Round 8 — CI Fix + Review Feedback
CI Blocker fixed: Restored
New in round 8:
Triage of outstanding CodeRabbit items
Already resolved in prior rounds (false positives in latest triage):
Deferred to backlog (not blocking merge):
G3 clarification (re: contradiction between two "round 6" messages):
@coderabbitai review please 🥕
✅ **VERIFIED: Round 8 CI Fix + Features**

**Critical CI Blocker — RESOLVED ✅**

- rustup target restoration: Confirmed on lines 238 and 249 of det-gates.yml. Both Build 1 and Build 2 explicitly add the wasm32-unknown-unknown target. The round 7 removal was indeed incorrect given that rust-toolchain.toml pins Rust 1.90.0.
- CHANGELOG documentation: Line 87 correctly documents the restoration.

**New Round 8 Features — VERIFIED ✅**

1. timeout-minutes on all 8 jobs: ✅ All 8 jobs are protected against infinite hangs.
2. Early-exit optimization in classify_changes.cjs: ✅ Confirmed on line 59.

   ```js
   if (maxClass === 'DET_CRITICAL' && !requireFull) break;
   ```

3. CHANGELOG correction: ✅

**✅ VERIFIED: "Already Resolved" Triage Claims**

1. Static inspection outcome check — CONFIRMED ✅. Lines 159-162:

   ```yaml
   DET_OUTCOME: ${{ steps.det_check.outcome }}
   run: |
     if [ "$DET_OUTCOME" = "success" ]; then
       echo '{"claim_id": "DET-001", "status": "PASSED"}' > static-inspection.json
   ```

   Observation: The string "PASSED" is still hardcoded, BUT it is gated by the outcome check. If det_check fails, the if-block doesn't execute and PASSED is never written. This is correct by design — "PASSED" is the status value when a check succeeds, not a blind assertion. The original concern (always reporting PASSED regardless of the actual result) is resolved.

2. Upload steps `if: always()` — CONFIRMED ✅. 8 instances found (lines 85, 121, 157, 167, 190, 214, 258, 278). All critical artifact uploads execute even on job failure, ensuring evidence preservation.

3. build-repro uploads both builds — CONFIRMED ✅. Lines 263-266 upload all 4 files: hash1.txt, hash2.txt, build1.wasm, build2.wasm.

4. validate-evidence conditional on RUN_FULL — CONFIRMED ✅. Lines 289-301 use:

   ```sh
   if [ "$RUN_FULL" = "true" ]; then
     [ -d gathered-artifacts/det-linux-artifacts ] || (echo "Missing det-linux-artifacts" && exit 1)
     # ... static-inspection, det-macos-artifacts checks
   fi
   ```

   Job-level condition (line 278) ensures validate-evidence runs for all non-DET_NONCRITICAL changes.

5. Scripts invoked with node — CONFIRMED ✅. All 5 script invocations use an explicit interpreter; no direct JS/shell file execution without an interpreter.

6. Claim statements tightened — ACCEPTED ✅. CLAIM_MAP.yaml statements are falsifiable (e.g., "Reject CBOR payloads with ops.len() > MAX_OPS", "Dual WASM builds produce byte-identical outputs"). Cross-verified against test/benchmark implementations in prior rounds.

**✅ VERIFIED: Deferred Items Rationale**

All deferred items are non-blocking and appropriately tracked in the backlog.

**✅ RESOLVED: G3 Contradiction Clarification**

Your explanation: ACCEPTED. This resolves the apparent contradiction in my previous review. The sequence was: the two "Round 6" messages in the comment chain referred to different commits in the same round. No actual contradiction — just sequential fixes plus docs. My confusion stemmed from treating them as conflicting rather than sequential. Apologies for the confusion.

**🎯 FINAL VERDICT**

MERGE APPROVED. ✅ 🚀

Sprint S1 Deliverables — COMPLETE. No blocking issues remaining. All Round 8 changes verified. All prior-round "already resolved" items confirmed accurate. Deferred items appropriately tracked in the backlog. Technical debt tracked in TASKS-DAG.

Ship it. The TTD Hardening Sprint S1 machinery is production-ready. 🔥🥕
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/det-gates.yml:
- Around line 156-165: generate_evidence.cjs currently treats the presence of
the static-inspection directory as proof of DET-001 success; update it to open
and parse static-inspection.json (the artifact produced by the job that writes
{"claim_id":"DET-001","status":"PASSED"/"FAILED"}) and derive DET-001’s status
from the parsed "status" field (handle missing file, invalid JSON, or missing
field as a FAILED/unverified condition and surface an error), then use that
concrete status when creating the evidence claim instead of only checking
directory existence; ensure the code references the static-inspection.json
filename and DET-001 claim id so reviewers can locate the logic to change.
In `@crates/echo-scene-codec/src/cbor.rs`:
- Around line 1058-1062: The test reject_invalid_enum_tags builds a malformed
CBOR array by calling Encoder::array(2) but only writing the tag
(encoder.u8(2)), so fix it by appending a valid dummy second element after the
invalid tag (e.g., encoder.u8(0) or encoder.null()) to produce a well-formed
2-element array; keep the call to decode_label_anchor(&mut
Decoder::new(&buf)).unwrap_err() and the existing assertion so the failure is
caused by the invalid LabelAnchor tag rather than a truncated CBOR array.
In `@docs/ROLLBACK_TTD.md`:
- Around line 12-38: The rollback instructions use 7‑char SHAs (e.g., e201c9b,
fd98b91, ce98d80, a02ea86, 3187e6a, 6e34a77, f138b8a) which risks collisions;
update the ordered actions under "Scenario A — Full TTD Rollback" to use full
40‑character commit SHAs for each revert, or add a prior step that
expands/validates each short SHA (for example by running git rev-parse
<short‑sha>) and records the full SHA before creating the rollback branch
`rollback/ttd-full-<date>` and reverting commits, ensuring all revert commands
reference the full hashes.
In `@scripts/validate_det_policy.cjs`:
- Around line 57-60: Validate each entry of crateInfo.paths to ensure it's a
non-empty string before returning true; iterate over crateInfo.paths (inside the
same block that checks Array.isArray and length) and if any element is not a
string or is an empty string, call console.error with crateName and the
offending index/value and return false so matches() only ever receives valid
strings. Ensure the error message clearly references crateName and the path
index/value to aid debugging.
---
Duplicate comments:
In @.github/workflows/det-gates.yml:
- Around line 238-239: Remove the redundant rustup target installation calls
that duplicate the dtolnay/rust-toolchain@stable step: delete the extra "rustup
target add wasm32-unknown-unknown" lines that appear immediately before the
cargo build commands (the steps that run "cargo build --release --target
wasm32-unknown-unknown -p ttd-browser"); rely on the toolchain action's targets:
wasm32-unknown-unknown configuration instead to avoid duplicate work and noise
in the workflow.
In `@CHANGELOG.md`:
- Line 8: The changelog entry header "## [0.1.3] — 2026-02-21" is pre-dated;
update it to be undated or mark it as unreleased by replacing that header with
"## [0.1.3] — Unreleased" (or remove the date portion entirely) so the change
remains undated until the release is actually cut; locate the header string "##
[0.1.3] — 2026-02-21" and edit it accordingly.
In `@det-policy.yaml`:
- Around line 152-166: The docs entry currently uses paths: ["**"] and class:
DET_NONCRITICAL which creates an unconditional catch‑all that nullifies
require_full_classification; either remove or narrow the docs paths (remove the
"**" catch‑all or replace with explicit globs for docs files) so unknown files
will fail classification, or alternatively set require_full_classification:
false and add a clear comment documenting the intentional fail‑open behavior;
update the det-policy.yaml entry for docs (and keep DET_NONCRITICAL semantics)
or flip require_full_classification accordingly so classify_changes.cjs no
longer contradicts the policy.
In `@docs/determinism/CLAIM_MAP.yaml`:
- Around line 68-71: PRF-001's statement is too vague for auditability—update
the CLAIM_MAP entry for PRF-001 to name the exact benchmark ID (e.g.,
materialization_hotpath) and the numeric noise threshold or a versioned config
reference instead of "Criterion noise threshold"; modify the keys under PRF-001
(the statement string and/or add a noise_threshold field) so the claim reads
like "MaterializationBus hot-path benchmark (materialization_hotpath) latency
remains within noise_threshold = <numeric_value> (or config://path@vX)" and
ensure required_evidence still includes the benchmark artifact.
In `@scripts/classify_changes.cjs`:
- Around line 59-62: The code currently compares classPriority[cls] without
validating crateInfo.class, so typos like "DET_CRITCAL" yield undefined and
silently lose coverage; modify the loop that uses crateInfo.class and
classPriority to first validate that classPriority[cls] is a defined value
(e.g., if (classPriority[cls] === undefined) throw or process.exit with an error
mentioning the offending crate and its class) before performing the comparison
to update maxClass, ensuring unknown classifications fail loudly with clear
diagnostic info.
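The fail-loud check described above could be sketched as follows. This is a hypothetical stand-alone version for illustration: `classPriority`, `maxClassOf`, and the crate-map shape are assumptions, not the actual classify_changes.cjs internals.

```javascript
// Hypothetical sketch: reject unknown determinism classes instead of letting
// `classPriority[cls]` evaluate to undefined and silently lose coverage.
const classPriority = { DET_NONCRITICAL: 0, DET_SENSITIVE: 1, DET_CRITICAL: 2 };

function maxClassOf(crates) {
  let maxClass = 'DET_NONCRITICAL';
  for (const [crateName, crateInfo] of Object.entries(crates)) {
    const cls = crateInfo.class;
    if (classPriority[cls] === undefined) {
      // Fail loudly with diagnostics naming the offending crate and class.
      throw new Error(`Unknown class "${cls}" for crate "${crateName}"`);
    }
    if (classPriority[cls] > classPriority[maxClass]) maxClass = cls;
  }
  return maxClass;
}
```

With this shape, a typo like `DET_CRITCAL` aborts classification with a pointed error instead of quietly classifying the change as non-critical.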
- Line 38: Wrap the JSON.parse call that builds the policy variable so malformed
JSON errors include the file context: read the string into a variable
(policyRaw), then try JSON.parse(policyRaw) inside a try/catch and, on catch,
throw or log a new Error that includes policyPath and the original error
message/stack (or rethrow with that context). Update the code around JSON.parse
and policyPath to ensure any SyntaxError includes which file failed and the
underlying parse error details.
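The wrapped parse could look like this minimal sketch; `parsePolicy` and its parameters are hypothetical names, not the script's actual structure.

```javascript
// Hypothetical sketch: parse failures surface the offending file path instead
// of a bare SyntaxError with no context.
function parsePolicy(policyRaw, policyPath) {
  try {
    return JSON.parse(policyRaw);
  } catch (err) {
    // Rethrow with file context so CI logs show which file failed to parse.
    throw new Error(`Failed to parse policy JSON at ${policyPath}: ${err.message}`);
  }
}
```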
In `@scripts/generate_evidence.cjs`:
- Around line 16-23: checkArtifact currently treats any non-empty directory as a
valid artifact; change it to verify presence of expected files instead. Update
the checkArtifact function to accept (or use a local list of) required filenames
(e.g., "sec-tests.log", "static-inspection.json") and return true only if each
required file exists and is non-empty inside the artifact directory; still catch
and return false on FS errors. Use path.join(gatheredArtifactsDir, name,
filename) and fs.existsSync / fs.statSync to validate existence and non-zero
size rather than relying on readdirSync().length > 0.
- Around line 41-65: The five SEC claims (SEC-001..SEC-005) all use a single
checkArtifact('sec-artifacts') call and must instead verify per-claim evidence;
update the logic that builds each claim (the objects with id: 'SEC-001' etc.,
evidence: {workflow, run_id: runId, commit_sha: commitSha, artifact_name:
'sec-artifacts'}) so status is determined by a per-claim check rather than the
global directory existence: either parse sec-tests.log or sec-claim-map.json for
the specific test name for each claim (optionally implemented as
checkClaimResult(claimId) or checkArtifactForClaim('sec-001.json')) and set
status to 'VERIFIED' only if that claim's test result is present and passed,
otherwise 'UNVERIFIED', and include the specific per-claim artifact name (e.g.,
sec-001.json) in the evidence object instead of the generic sec-artifacts.
In `@scripts/validate_claims.cjs`:
- Around line 36-43: The semantic validation currently allows
evidence.commit_sha and evidence.run_id to be "local" unconditionally; change
validate logic in scripts/validate_claims.cjs (the block that pushes to
violations for commit_sha/run_id) to treat "local" as invalid when running in CI
by checking process.env.CI (or equivalent CI env var). Specifically, inside the
same validation block, if process.env.CI is truthy, add violations.push(...)
when evidence.commit_sha === 'local' or evidence.run_id === 'local' (using the
same message format as existing pushes) so CI runs fail on placeholder values
while local runs keep current behavior.
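The CI-aware rejection could be sketched like this; `checkPlaceholders` and the violation messages are hypothetical, mirroring the claim/violations shapes described above.

```javascript
// Hypothetical sketch: "local" placeholder evidence is tolerated in local
// runs but recorded as a violation when the CI environment variable is set.
function checkPlaceholders(claim, violations, env = process.env) {
  if (!env.CI) return; // local run: keep current permissive behavior
  if (claim.evidence.commit_sha === 'local') {
    violations.push(`${claim.id}: commit_sha is a local placeholder (forbidden in CI)`);
  }
  if (claim.evidence.run_id === 'local') {
    violations.push(`${claim.id}: run_id is a local placeholder (forbidden in CI)`);
  }
}
```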
```yaml
- name: Create report
  if: always()
  env:
    DET_OUTCOME: ${{ steps.det_check.outcome }}
  run: |
    if [ "$DET_OUTCOME" = "success" ]; then
      echo '{"claim_id": "DET-001", "status": "PASSED"}' > static-inspection.json
    else
      echo '{"claim_id": "DET-001", "status": "FAILED"}' > static-inspection.json
    fi
```
Evidence integrity gap: generate_evidence.cjs ignores the content of static-inspection.json.
The static-inspection job correctly writes PASSED or FAILED based on steps.det_check.outcome (lines 161-165). However, generate_evidence.cjs (invoked at line 304) determines DET-001 status purely by checking if the static-inspection directory exists and is non-empty — it never reads static-inspection.json to verify the actual outcome.
Attack scenario:

1. `ban-nondeterminism.sh` exits non-zero → `static-inspection.json` contains `"status": "FAILED"`.
2. Upload step runs with `if: always()` → the directory is uploaded.
3. `generate_evidence.cjs` sees the directory exists → marks DET-001 as `VERIFIED`.
4. `validate_claims.cjs` only validates structure, not outcome → passes.
Net result: A failed DET-001 check produces a VERIFIED evidence claim. The entire evidence chain becomes untrustworthy.
🔥 Proposed fix — read and validate artifact content
Modify generate_evidence.cjs to read static-inspection.json and set DET-001 status based on its content:
```diff
+ const checkStaticInspection = () => {
+   try {
+     const report = JSON.parse(fs.readFileSync(
+       path.join(gatheredArtifactsDir, 'static-inspection', 'static-inspection.json'), 'utf8'));
+     return report.status === 'PASSED';
+   } catch {
+     return false;
+   }
+ };
+
  const claims = [
    {
      id: 'DET-001',
-     status: checkArtifact('static-inspection') ? 'VERIFIED' : 'UNVERIFIED',
+     status: checkStaticInspection() ? 'VERIFIED' : 'UNVERIFIED',
      evidence: { workflow, run_id: runId, commit_sha: commitSha, artifact_name: 'static-inspection' }
    },
```

Also applies to: 302-307
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/det-gates.yml around lines 156 - 165,
generate_evidence.cjs currently treats the presence of the static-inspection
directory as proof of DET-001 success; update it to open and parse
static-inspection.json (the artifact produced by the job that writes
{"claim_id":"DET-001","status":"PASSED"/"FAILED"}) and derive DET-001’s status
from the parsed "status" field (handle missing file, invalid JSON, or missing
field as a FAILED/unverified condition and surface an error), then use that
concrete status when creating the evidence claim instead of only checking
directory existence; ensure the code references the static-inspection.json
filename and DET-001 claim id so reviewers can locate the logic to change.
```rust
let mut encoder = Encoder::new(&mut buf);
encoder.array(2).unwrap();
encoder.u8(2).unwrap(); // Invalid tag
let err = decode_label_anchor(&mut Decoder::new(&buf)).unwrap_err();
assert!(
```
reject_invalid_enum_tags builds a malformed LabelAnchor CBOR array.
Line 1059 declares array(2) but only encodes the tag. That’s a truncated CBOR array, so the test can pass for the wrong reason if decode order changes. Encode a dummy second element to keep the buffer well‑formed and ensure the failure is truly “invalid LabelAnchor tag.”
🛠️ Suggested fix
```diff
  // LabelAnchor tag: allowed 0..=1
  buf.clear();
  let mut encoder = Encoder::new(&mut buf);
  encoder.array(2).unwrap();
  encoder.u8(2).unwrap(); // Invalid tag
+ encoder.bytes(&make_test_hash(1)).unwrap(); // dummy payload
  let err = decode_label_anchor(&mut Decoder::new(&buf)).unwrap_err();
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/echo-scene-codec/src/cbor.rs` around lines 1058 - 1062, The test
reject_invalid_enum_tags builds a malformed CBOR array by calling
Encoder::array(2) but only writing the tag (encoder.u8(2)), so fix it by
appending a valid dummy second element after the invalid tag (e.g.,
encoder.u8(0) or encoder.null()) to produce a well-formed 2-element array; keep
the call to decode_label_anchor(&mut Decoder::new(&buf)).unwrap_err() and the
existing assertion so the failure is caused by the invalid LabelAnchor tag
rather than a truncated CBOR array.
- Base: `efae3e8`
- Head: `e201c9b`

## Preconditions

- Release owner approval logged.
- Current branch state saved/tagged.
- Incident ticket created.

## Scenario A — Full TTD Rollback

### Objective (Scenario A)

Return repository to pre-TTD integration state.

### Ordered actions

1. Create rollback branch:
   - `rollback/ttd-full-<date>`
2. Revert commits in reverse order from head to base+1:
   - `e201c9b`
   - `fd98b91`
   - `ce98d80`
   - `a02ea86`
   - `3187e6a`
   - `6e34a77`
   - `f138b8a`
Use full‑length SHAs (or require expansion) in rollback steps.
Lines 12–38 list 7‑char SHAs; collisions are possible in a growing repo, which is unacceptable during rollback. Use full 40‑char SHAs or add a step to expand with git rev-parse before reverting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/ROLLBACK_TTD.md` around lines 12 - 38, The rollback instructions use
7‑char SHAs (e.g., e201c9b, fd98b91, ce98d80, a02ea86, 3187e6a, 6e34a77,
f138b8a) which risks collisions; update the ordered actions under "Scenario A —
Full TTD Rollback" to use full 40‑character commit SHAs for each revert, or add
a prior step that expands/validates each short SHA (for example by running git
rev-parse <short‑sha>) and records the full SHA before creating the rollback
branch `rollback/ttd-full-<date>` and reverting commits, ensuring all revert
commands reference the full hashes.
```js
if (!crateInfo.paths || !Array.isArray(crateInfo.paths) || crateInfo.paths.length === 0) {
  console.error(`Error: Crate ${crateName} missing or invalid paths`);
  return false;
}
```
🧹 Nitpick | 🔵 Trivial
Optional: Per-element path validation is missing — future-proofing only.
crateInfo.paths is validated as a non-empty array, but individual elements aren't checked as strings. A malformed entry like paths: [123, null, {}] would pass validation and cause downstream matches() to receive non-string input. Not a current problem if policy is authored carefully, but for defense-in-depth:
🛡️ Optional hardening
```diff
  if (!crateInfo.paths || !Array.isArray(crateInfo.paths) || crateInfo.paths.length === 0) {
    console.error(`Error: Crate ${crateName} missing or invalid paths`);
    return false;
  }
+ for (const p of crateInfo.paths) {
+   if (typeof p !== 'string') {
+     console.error(`Error: Crate ${crateName} has non-string path: ${JSON.stringify(p)}`);
+     return false;
+   }
+ }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/validate_det_policy.cjs` around lines 57 - 60, Validate each entry of
crateInfo.paths to ensure it's a non-empty string before returning true; iterate
over crateInfo.paths (inside the same block that checks Array.isArray and
length) and if any element is not a string or is an empty string, call
console.error with crateName and the offending index/value and return false so
matches() only ever receives valid strings. Ensure the error message clearly
references crateName and the path index/value to aid debugging.
This PR delivers the TTD Hardening Sprint S1 goals: