
55: Add durable run ledger and checkpoints for Rondo-style runs#90

Open
sandsower wants to merge 8 commits into main from issue-55-run-ledger-checkpoints

Conversation

@sandsower
Owner

@sandsower sandsower commented May 10, 2026

Summary

Adds the P0 durable run-ledger/checkpoint contract for Beislið repo-aware runs.

  • Defines run-ledger-v1 under external Beislið state:
    ${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/<flow>/<repo_hash>/<run_id>/
  • Documents stable run IDs, run.json, events.jsonl, checkpoints, ticket snapshots, plans, final reports, and artifact directories.
  • Wires kickoff and ship-it protocols for ledger initialization, step/phase checkpoints, side-effect checkpoints, final reports, interruptions, and resume prompts.
  • Adds predictable gate/review artifact paths.
  • Extends ship-it/kickoff smoke helpers and verifiers for ledger files, artifacts, resume hints, memory evidence, and redaction coverage.
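The ledger path convention above can be sketched as a small helper. This is illustrative only: the function name and the repo-hash derivation are assumptions, not the contract's actual implementation, which only requires the hash to be stable per repository.

```python
import hashlib
import os
from pathlib import Path

def ledger_dir(flow: str, repo_root: str, run_id: str) -> Path:
    """Compose a run-ledger-v1 directory: <state>/runs/<flow>/<repo_hash>/<run_id>/."""
    # BEISLID_STATE_DIR overrides the default external state location.
    state_dir = Path(os.environ.get("BEISLID_STATE_DIR") or Path.home() / ".local/state/beislid")
    # Hypothetical repo-hash derivation; the contract only asks for per-repo stability.
    repo_hash = hashlib.sha256(str(Path(repo_root).resolve()).encode()).hexdigest()[:12]
    return state_dir / "runs" / flow / repo_hash / run_id
```

A run then keeps `run.json`, `events.jsonl`, and `checkpoints/` under the directory this returns.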

Verification

  • python3 scripts/validate_skills.py
  • python3 -m py_compile tests/agent-smoke/scenarios/ship-it/evidence_helper.py tests/agent-smoke/scenarios/ship-it/verify.py tests/agent-smoke/scenarios/kickoff/verify.py
  • python3 tests/agent-smoke/scenarios/ship-it/evidence_helper.py --self-test
  • python3 tests/agent-smoke/scenarios/ship-it/verify.py --self-test
  • python3 tests/agent-smoke/scenarios/kickoff/verify.py --self-test
  • git diff --check
  • python3 tests/agent-smoke/run.py gate ship-it --hosts claude,codex --timeout 900 --changed-only

Review

  • Independent review findings fixed.
  • Final fresh-eyes pass: no blocking findings.
  • Cross-host smoke passed on Claude and Codex.

Notes

  • P0 chooses external state storage. Repo-local .beislid/runs is reserved for a future explicit opt-in.
  • Domain capture is not configured for this repo.

Summary by CodeRabbit

  • New Features

    • Persistent run-ledger with resume/inspect/start/abort prompts, checkpointed resume hints, verbose checkpoint/run stamps, and final-report writeback.
    • Gate/review artifact paths and transcript-safe checkpointing with best-effort persistence and redact-on-write behavior.
  • Documentation

    • Expanded docs and templates covering run-ledger layout, resume semantics, checkpointing, artifact conventions, and session-memory requirements.
  • Tests

    • Updated smoke tests and verifiers to check run-ledger presence, checkpoint/resume shapes, artifact lists, and redaction rules.

@coderabbitai
Contributor

coderabbitai Bot commented May 10, 2026

📝 Walkthrough

Walkthrough

This PR implements a durable run ledger system for repo-aware orchestrators in Beislid, adding persistent local evidence storage with structured checkpoints, artifact management, and resume capability across kickoff and ship-it workflows.


Changes

Run Ledger Feature

  • Specification & Configuration (.beislid/output-templates.md, .beislid/workflow-md-format.md, docs/configuration.md, docs/workflows.md): Defines run-ledger-v1 under ${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/<flow>/<repo-hash>/<run-id>/, required artifacts (run.json, events.jsonl, checkpoint JSONs, optional ticket.json/implementation-plan.md/final-report.md), artifact dirs, transcript-safe checkpoint constraints, best-effort persistence, resume semantics (scan incomplete runs, prompt the user, resume only to the next resume_hint boundary), and prohibits repo-local .beislid/runs inference.
  • Templates, kickoff & ship-it (.beislid/kickoff-templates.md, .beislid/ship-it-templates.md, .beislid/output-templates.md): Adds incomplete-run prompts (resume/inspect/new/abort), verbose checkpoint-saved stamps with resume_hint, a combined run-ledger + verbose transcript persistence spec, and extends structured session memory to include the ledger and artifact-path-bearing gate/review summaries.
  • Kickoff integration (skills/kickoff/SKILL.md, skills/kickoff/step-1-ticket.md, skills/kickoff/step-8-ticket-update.md): Requires ledger init/resume after the probe cache; persists ticket.json and per-action pending/result checkpoints with resume_hint; writes implementation-plan.md and final-report.md; emits ledger events/checkpoints after aux loads and step exits; expands required outputs and run-end completion semantics.
  • Kickoff smoke test (tests/agent-smoke/scenarios/kickoff/scenario.json, tests/agent-smoke/scenarios/kickoff/verify.py): The scenario prompt requires a kickoff run ledger with run.json, events.jsonl, ticket.json, implementation plan, final report, and checkpoints under $BEISLID_STATE_DIR/runs/kickoff/<repo_hash>/<run_id>/. Adds verify_run_ledger() to validate ledger identity, required artifacts, checkpoint presence with resume_hint, and step-boundary markers; the self-test constructs a ledger fixture and provides state_dir.
  • Ship-it integration (skills/ship-it/SKILL.md, skills/ship-it/phase-1-detect.md, skills/ship-it/phase-2-gates.md, skills/ship-it/phase-3-review.md, skills/ship-it/phase-4-ship.md, .beislid/ship-it-templates.md): Initializes/resumes the ledger after config/cache; appends ledger events at phase boundaries, gates, reviews, and side effects; creates per-phase checkpoints and artifact dirs (artifacts/gates/..., artifacts/reviews/...); requires final-report.md and marks run.json completed before cache writeback; enforces transcript safety for persisted content; extends the session-memory brief with enriched gate/review evidence and ledger paths in the PR body.
  • Ship-it helpers & tests (tests/agent-smoke/scenarios/ship-it/evidence_helper.py, tests/agent-smoke/scenarios/ship-it/verify.py, tests/agent-smoke/scenarios/ship-it/scenario.json): The evidence helper adds ledger utilities (ledger_paths(), append_event(), write_checkpoint(), maybe_write_artifact()), initializes ledger/checkpoints, appends events, writes interruption/completion checkpoints, broadens recursive redaction for Authorization/GitHub-token-like strings, and adds an interrupt CLI subcommand. The verifier extracts memory from the transcript or evidence.json, validates incomplete-resume and completed-ledger shapes, checks artifact path correspondence, and its self-tests produce both incomplete and completed ledger fixtures.
  • Phase 2 gate docs (skills/ship-it/phase-2-gates.md): Documents per-attempt gate artifact checkpoint dirs, raw vs. summary persistence rules, an autofix confirmation step, strict non-autofix failure prompting, and richer gate tracking/checkpoint expectations for the final report.
  • Supporting wording & minor docs (.beislid/doctor-templates.md, .beislid/probe-semantics.md, skills/*): Replaces “P0” phrasing with “current implementation/currently runnable”, narrows doctor gate warning triggers, adjusts probe/lifecycle-action wording, and makes other small textual clarifications.

Sequence Diagram(s)

sequenceDiagram
  participant Orchestrator
  participant StateDir
  participant User
  Orchestrator->>StateDir: scan ${BEISLID_STATE_DIR}/runs/<flow>/*
  StateDir-->>Orchestrator: list run-ledger-v1 candidates
  Orchestrator->>User: prompt (resume / inspect / new / abort)
  User-->>Orchestrator: chosen action
  Orchestrator->>StateDir: open selected run-ledger-v1 (run.json/events.jsonl/checkpoints)
  Orchestrator->>StateDir: append events/checkpoints during run
  Orchestrator->>StateDir: write final-report & mark run.json completed
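The scan step at the top of the diagram amounts to filtering run.json files by completion status. A minimal sketch, assuming run.json carries a status field as the completed-run marker (the real contract's field names may differ):

```python
import json
from pathlib import Path

def incomplete_runs(state_dir: Path, flow: str) -> list[Path]:
    """List run-ledger-v1 directories whose run.json is not marked completed."""
    candidates = []
    for run_json in state_dir.glob(f"runs/{flow}/*/*/run.json"):
        try:
            payload = json.loads(run_json.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable ledgers are skipped, not treated as resumable
        if payload.get("status") != "completed":
            candidates.append(run_json.parent)
    return candidates
```

The orchestrator would present each returned directory in the resume/inspect/new/abort prompt.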

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Possibly related issues


Possibly related PRs

  • sandsower/beislid#89: Both PRs edit similar templates/docs (run-ledger/checkpoint protocol vs gate result envelope), so they are related.
  • sandsower/beislid#88: Related wording and gate handling/documentation updates.

Poem

🐰 A ledger hops into the state directory,
Checkpoints tucked neat, resume hints for when we tarry,
Events append like footprints in the sand,
Artifacts rest in folders, tidy and planned—
Now runs can pause and hop right back to carry on!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: the title clearly and concisely describes the main change, adding a durable run ledger and checkpoints for Rondo-style runs, which aligns with the documented PR objectives.
  • Linked Issues check — ✅ Passed: check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: check skipped because no linked issues were found for this pull request.


…heckpoints

# Conflicts:
#	.beislid/output-templates.md
#	.beislid/ship-it-templates.md
#	skills/ship-it/phase-2-gates.md

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.beislid/kickoff-templates.md:
- Around line 21-23: The three unlabeled fenced code blocks in
kickoff-templates.md (the blocks containing "💭 Run ledger: `<run_id>`...", the
"⚠️ Found incomplete kickoff run `<run_id>`..." block, and the "✓ Run
checkpoint: `<checkpoint>`..." block) should be annotated with a fence language
to satisfy markdownlint MD040; update each triple-backtick fence to include a
language label such as text (e.g., ```text) so the blocks are labeled
consistently.

In @.beislid/output-templates.md:
- Around line 89-92: The artifact path components (<scope>, <gate-name>,
<review-kind>) are unconstrained and can enable path traversal or invalid paths;
update the contract and any code that writes under artifacts/gates/... and
artifacts/reviews/... to require and validate a safe-slug normalization (e.g.,
lowercase alphanumerics, hyphens/underscores only) and explicitly reject path
separators or empty segments and characters like "../" or "/" before any
filesystem write; document the rule in the contract text and ensure callers (the
code that composes "artifacts/gates/<scope>/<gate-name>/<attempt>/" and
"artifacts/reviews/<review-kind>.md") must normalize inputs and fail validation
rather than write unsafe paths.

In `@docs/configuration.md`:
- Line 262: The placeholder in docs/configuration.md currently uses dash-style
tokens "<repo-hash>/<run-id>" which conflicts with the ledger contract's
snake_case tokens "<repo_hash>/<run_id>"; update the documentation string
"${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/<flow>/<repo-hash>/<run-id>/"
to use the canonical "<repo_hash>/<run_id>" form so it matches the ledger
contract and other docs (search for occurrences of "<repo-hash>" and "<run-id>"
and replace them with "<repo_hash>" and "<run_id>" respectively).

In `@skills/kickoff/SKILL.md`:
- Line 37: Update the kickoff flow to initialize or resume a run-ledger-v1
directory immediately after cache setup and before Step 1 by scanning for
incomplete ledgers for the repo hash and offering
resume/inspect/start-new/abort; on resume reuse run_id, append a resume event to
events.jsonl, and continue only from the next safe step boundary named by
resume_hint while always requiring ticket update approval. When starting-new
write run.json and the first events.jsonl entry including run_id, branch, repo
hash, workflow hash, cache state, and ledger path, and print the ≤240-char
orientation prose from kickoff-templates.md (include run id/path in verbose mode
or if ledger init warned). Add a required durable checkpoint named run-start
immediately after kickoff (before Step 1) and ensure an additional checkpoint is
written immediately after any irreversible side effect (e.g., ticket
update/post) so resume state is fully defined.

In `@skills/ship-it/phase-4-ship.md`:
- Around line 41-42: Update the PR body guidance so it prohibits publishing
host-local filesystem paths: modify the "Body: terse record-facing summary..."
text to remove the allowance of "local run-ledger artifact paths" and instead
instruct authors to reference repository-facing paths, sanitized artifact IDs,
or attach artifacts to the PR; add a note to redact or replace any host-specific
info (e.g., home-directory usernames) with placeholders or repo-relative paths
and preserve existing advice about carried warnings (like AI-generated
translation notices).

In `@tests/agent-smoke/scenarios/kickoff/verify.py`:
- Around line 74-80: The current check aggregates all checkpoint files into one
big string (checkpoint_text) and searches for "resume_hint" globally; instead
iterate over each file in checkpoints (the variable checkpoints from the run_dir
/ "checkpoints" glob), read each file (using path.read_text(encoding="utf-8",
errors="replace")), and for every path ensure the substring "resume_hint" is
present; if any file is missing it, call fail(errors, f"...") referencing that
specific path so the failure points to the offending checkpoint rather than
masking missing fields across files.
- Around line 52-57: The code currently picks the last path from ledgers
(created in tests/agent-smoke/scenarios/kickoff/verify.py) using lexicographic
order; change it to select the most recently modified ledger by using filesystem
mtime (or an updated_at field) when choosing run_json so run_json = max(ledgers,
key=lambda p: p.stat().st_mtime) (keep the existing empty-check/fail behavior),
then continue using run_dir = run_json.parent; this ensures the latest ledger is
chosen reliably instead of relying on sorted(...)[-1].

In `@tests/agent-smoke/scenarios/ship-it/evidence_helper.py`:
- Around line 305-318: Read run_json into run_payload first and use that to make
finalize idempotent: before calling write_checkpoint/append_event and mutating
run_payload, check run_payload.get("status") == "completed" and whether
run_payload.get("last_checkpoint") or run_payload.get("checkpoints", []) already
contains the same final checkpoint path (compare against the would-be
final_checkpoint value or its string form); if so, skip re-writing the
final-report checkpoint, skip append_event("run-finalized") and avoid pushing
the duplicate checkpoint into run_payload, otherwise proceed to call
write_checkpoint, append_event and update/write_json; reference
final_checkpoint, append_event, write_checkpoint, run_payload, run_json and
write_json to locate the changes.

In `@tests/agent-smoke/scenarios/ship-it/verify.py`:
- Around line 65-79: The code currently returns {} as soon as
json.loads(match.group(1)) fails, which prevents the subsequent evidence.json
fallback from running; change the logic in the block that handles match (around
the json.loads(match.group(1)) call) so that on JSONDecodeError it does not
return immediately but instead continues to the later transcript marker check
and evidence.json reading (i.e., remove or replace the early return in the
except block and allow the function to proceed to the
evidence_path/evidence.json logic); keep the existing behavior of returning a
dict when parsed successfully and still validate that evidence.get("memory") is
a dict before returning it.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e2f3201e-4e75-4168-b10a-b3cb0a21358b

📥 Commits

Reviewing files that changed from the base of the PR and between 3b33974 and 2452946.

📒 Files selected for processing (19)
  • .beislid/kickoff-templates.md
  • .beislid/output-templates.md
  • .beislid/ship-it-templates.md
  • .beislid/workflow-md-format.md
  • docs/configuration.md
  • docs/workflows.md
  • skills/kickoff/SKILL.md
  • skills/kickoff/step-1-ticket.md
  • skills/kickoff/step-8-ticket-update.md
  • skills/ship-it/SKILL.md
  • skills/ship-it/phase-1-detect.md
  • skills/ship-it/phase-2-gates.md
  • skills/ship-it/phase-3-review.md
  • skills/ship-it/phase-4-ship.md
  • tests/agent-smoke/scenarios/kickoff/scenario.json
  • tests/agent-smoke/scenarios/kickoff/verify.py
  • tests/agent-smoke/scenarios/ship-it/evidence_helper.py
  • tests/agent-smoke/scenarios/ship-it/scenario.json
  • tests/agent-smoke/scenarios/ship-it/verify.py

Comment on lines +21 to +23
```
💭 Run ledger: `<run_id>` at `<ledger_path>`.
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add fence languages to fix markdownlint MD040 warnings.

Lines 21, 27, and 62 use unlabeled fenced blocks. Add a language (e.g., text) to keep docs lint-clean.

Proposed fix

-```
+```text
 💭 Run ledger: `<run_id>` at `<ledger_path>`.

-```
+```text
 ⚠️ Found incomplete kickoff run `<run_id>`, last checkpoint `<checkpoint>`, resume hint: `<resume_hint>`.
 What now? (a) resume this run, (b) inspect the ledger path, (c) start a new run, (d) abort.

-```
+```text
 ✓ Run checkpoint: `<checkpoint>` saved; resume hint: <next-safe-action>.


Also applies to: 27-30, 62-64

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 21-21: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>



Comment on lines +89 to +92
Gate attempts use predictable paths under `artifacts/gates/<scope>/<gate-name>/<attempt>/`. Store raw stdout/stderr as `raw.log` when safe, or `summary.txt` when raw output should stay out of the ledger. The gate checkpoint/envelope references that path and records gate name, scope, cwd, command label, status, duration, safe summary, retry/fix decisions, and accepted risk if any.

Review outputs use predictable paths under `artifacts/reviews/<review-kind>.md` (`review.md`, `fresh-eyes.md`, or `combined-review.md`). Store the human-readable finding summary, addressed/accepted-risk state, and reduced-coverage notes; do not store hidden reviewer reasoning.


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Constrain artifact path components to safe slugs before filesystem writes.

<scope> / <gate-name> / <review-kind> are currently unconstrained in the contract. If implementations map these directly, config-controlled values can cause path traversal or invalid paths. Define a required normalization/validation rule in the contract (e.g., slug + separator rejection).

Suggested contract hardening text
-Gate attempts use predictable paths under `artifacts/gates/<scope>/<gate-name>/<attempt>/`.
+Gate attempts use predictable paths under `artifacts/gates/<scope_slug>/<gate_slug>/<attempt>/`.
+`<scope_slug>` and `<gate_slug>` MUST be filesystem-safe slugs (e.g., `[a-z0-9._-]+`), with path separators, `..`, drive prefixes, and absolute paths rejected.
...
-Review outputs use predictable paths under `artifacts/reviews/<review-kind>.md`
+Review outputs use predictable paths under `artifacts/reviews/<review_kind_slug>.md`
+with the same slug/validation rule.
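A minimal sketch of the slug validation this comment proposes. The allowed character set follows the suggested `[a-z0-9._-]+` rule, but the exact set remains an assumption the contract would need to pin down; the helper name is hypothetical:

```python
import re

# First character must be alphanumeric so "." and ".." can never match.
_SLUG_RE = re.compile(r"^[a-z0-9][a-z0-9._-]*$")

def safe_slug(component: str) -> str:
    """Validate one artifact path component; reject rather than silently normalize."""
    if not _SLUG_RE.fullmatch(component) or ".." in component:
        raise ValueError(f"unsafe artifact path component: {component!r}")
    return component
```

Callers composing `artifacts/gates/<scope>/<gate-name>/<attempt>/` would pass each component through this before any filesystem write, so traversal attempts fail validation instead of reaching the ledger.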

Comment thread: docs/configuration.md
Comment thread: skills/kickoff/SKILL.md (outdated)
- Hash matches → load capability entries; cache state `fresh`. Per-cap freshness uses `probe_cache.ttl_hours`, default 24.

Print orientation prose from `kickoff-templates.md` (≤240 chars).
Initialize or resume a `run-ledger-v1` directory per `output-templates.md` immediately after cache setup and before Step 1. Use flow `kickoff`; first scan for incomplete ledgers for this repo hash and offer resume / inspect / start-new / abort. On resume, reuse `run_id`, append a resume event, and continue only from the next safe step boundary named by `resume_hint`; never skip ticket update approval. On start-new, write `run.json` and the first `events.jsonl` entry with `run_id`, branch, repo hash, workflow hash, cache state, and ledger path. Print orientation prose from `kickoff-templates.md` (≤240 chars), including the run id/path in verbose mode or when ledger initialization needed a warning.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a durable checkpoint before Step 1 and immediately after irreversible side effects.

Right now start-new only requires run.json plus the first event, and the global step rule only checkpoints on step exit. That leaves resume state undefined if kickoff is interrupted before Step 1 completes or right after a side effect like a ticket update/post. For a P0 resume contract, the top-level protocol should require a run-start checkpoint with resume_hint, and another checkpoint immediately after any irreversible side effect.

Also applies to: 69-69
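A sketch of the run-start checkpoint this comment asks for. The file name, field names, and resume_hint value are illustrative, not the contract's wording:

```python
import json
import time
from pathlib import Path

def write_run_start_checkpoint(run_dir: Path) -> Path:
    """Write a durable run-start checkpoint before Step 1 begins."""
    payload = {
        "checkpoint": "run-start",
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # If interrupted here, resume re-enters at the first step boundary.
        "resume_hint": "step-1-ticket",
    }
    path = run_dir / "checkpoints" / "run-start.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

The same pattern would repeat immediately after each irreversible side effect (ticket update/post), with resume_hint naming the next safe boundary.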


Comment on lines +41 to 42
- Body: terse record-facing summary of changes, why, verification, reviewer warnings, accepted risks, reduced review coverage if any, and local run-ledger artifact paths when useful for maintainers in the same environment.
- Include carried warnings such as AI-generated translation notices.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t publish local ledger filesystem paths in PR bodies.

Line 41 allows local run-ledger paths in PR descriptions. That can leak host-local identifiers (e.g., home-directory usernames) into remote PR metadata. Keep local paths in ledger/memory artifacts and keep PR body repository-facing.

Proposed fix
-- Body: terse record-facing summary of changes, why, verification, reviewer warnings, accepted risks, reduced review coverage if any, and local run-ledger artifact paths when useful for maintainers in the same environment.
+- Body: terse record-facing summary of changes, why, verification, reviewer warnings, accepted risks, and reduced review coverage if any.
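One way to keep PR bodies repository-facing, as this comment suggests, is to rewrite paths before publishing. A sketch under stated assumptions — the helper name and the fallback order (repo-relative first, then home-collapsed, then unchanged) are choices of this example, not the skill's defined behavior:

```python
from pathlib import Path

def repo_facing(path: str, repo_root: str) -> str:
    """Rewrite a host-local artifact path for publication in a PR body."""
    p = Path(path)
    # Prefer a repo-relative path; else collapse the home dir to "~"; else leave as-is.
    for root, prefix in ((Path(repo_root), ""), (Path.home(), "~/")):
        try:
            return prefix + str(p.relative_to(root))
        except ValueError:
            continue
    return path
```

This keeps home-directory usernames out of remote PR metadata while the full local paths stay in the ledger and session memory.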

Comment on lines +52 to +57
    ledgers = sorted(state_dir.glob("runs/kickoff/*/*/run.json"))
    if not ledgers:
        fail(errors, f"no kickoff run ledger found under {state_dir}/runs/kickoff")
        return ""
    run_json = ledgers[-1]
    run_dir = run_json.parent

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Select the latest ledger by updated_at/mtime, not lexicographic filename order.

sorted(...)[-1] is path-order based and can pick the wrong run when multiple ledgers exist. That makes smoke verification flaky and can validate stale artifacts.

Suggested fix
-    ledgers = sorted(state_dir.glob("runs/kickoff/*/*/run.json"))
+    ledgers = list(state_dir.glob("runs/kickoff/*/*/run.json"))
     if not ledgers:
         fail(errors, f"no kickoff run ledger found under {state_dir}/runs/kickoff")
         return ""
-    run_json = ledgers[-1]
+    def ledger_rank(path: Path) -> tuple[str, float]:
+        try:
+            payload = json.loads(path.read_text(encoding="utf-8"))
+            return (str(payload.get("updated_at", "")), path.stat().st_mtime)
+        except Exception:
+            return ("", path.stat().st_mtime)
+    run_json = max(ledgers, key=ledger_rank)

Comment on lines +74 to +80
    checkpoints = list((run_dir / "checkpoints").glob("*.json")) if (run_dir / "checkpoints").exists() else []
    if len(checkpoints) < 2:
        fail(errors, f"kickoff run ledger should have step checkpoints: {run_dir / 'checkpoints'}")
    else:
        checkpoint_text = "\n".join(path.read_text(encoding="utf-8", errors="replace") for path in checkpoints)
        if "resume_hint" not in checkpoint_text:
            fail(errors, "kickoff run ledger checkpoints should include resume_hint for resume semantics")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate resume_hint in each checkpoint, not via one merged text search.

Current logic only checks whether "resume_hint" appears anywhere across all checkpoint text, so one good file can mask missing fields in others.

Suggested fix
-    else:
-        checkpoint_text = "\n".join(path.read_text(encoding="utf-8", errors="replace") for path in checkpoints)
-        if "resume_hint" not in checkpoint_text:
-            fail(errors, "kickoff run ledger checkpoints should include resume_hint for resume semantics")
+    else:
+        for checkpoint in checkpoints:
+            try:
+                payload = json.loads(checkpoint.read_text(encoding="utf-8"))
+            except json.JSONDecodeError as exc:
+                fail(errors, f"checkpoint is not valid JSON: {checkpoint}: {exc}")
+                continue
+            if not payload.get("resume_hint"):
+                fail(errors, f"checkpoint missing resume_hint: {checkpoint}")

Comment on lines +305 to +318
    final_checkpoint = write_checkpoint(transcript, "final-report", {"status": "completed", "summary": safe_summary, "resume_hint": "run complete", "final_report": str(final_report)})
    append_event(transcript, "run-finalized", safe_summary, checkpoint=str(final_checkpoint), final_report=str(final_report))
    run_payload = read_json(run_json)
    run_payload["status"] = "completed"
    run_payload["updated_at"] = now()
    run_payload["final_report"] = str(final_report)
    run_payload["resume_hint"] = "run complete"
    run_payload.pop("blocking_reason", None)
    run_payload.setdefault("interruptions", [])
    if run_payload.get("interrupted_at"):
        run_payload["interruptions"].append({"interrupted_at": run_payload.pop("interrupted_at"), "checkpoint": str(transcript.parent / "checkpoints" / "interrupted.json")})
    run_payload.setdefault("checkpoints", []).append(str(final_checkpoint))
    run_payload["last_checkpoint"] = str(final_checkpoint)
    write_json(run_json, run_payload)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make finalize idempotent before appending ledger history.

command_finalize() rewrites the same final-report checkpoint, appends another run-finalized event, and pushes the same checkpoint path into run.json every time it is retried. The self-test already calls command_finalize(...) twice, so this currently mutates ledger history on a retry instead of treating finalize as a no-op once the run is completed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/agent-smoke/scenarios/ship-it/evidence_helper.py` around lines 305 -
318, Read run_json into run_payload first and use that to make finalize
idempotent: before calling write_checkpoint/append_event and mutating
run_payload, check run_payload.get("status") == "completed" and whether
run_payload.get("last_checkpoint") or run_payload.get("checkpoints", []) already
contains the same final checkpoint path (compare against the would-be
final_checkpoint value or its string form); if so, skip re-writing the
final-report checkpoint, skip append_event("run-finalized") and avoid pushing
the duplicate checkpoint into run_payload, otherwise proceed to call
write_checkpoint, append_event and update/write_json; reference
final_checkpoint, append_event, write_checkpoint, run_payload, run_json and
write_json to locate the changes.
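The idempotency guard the prompt asks for could be sketched like this (the `finalize_is_noop` helper name is an assumption; inside `command_finalize` the equivalent check would run before `write_checkpoint`/`append_event` so a retry skips re-writing ledger history):

```python
import json
from pathlib import Path


def finalize_is_noop(run_json: Path, final_checkpoint: Path) -> bool:
    # A retried finalize should be a no-op once run.json already records
    # completion with this same checkpoint path.
    if not run_json.exists():
        return False
    payload = json.loads(run_json.read_text(encoding="utf-8"))
    return (
        payload.get("status") == "completed"
        and str(final_checkpoint) in payload.get("checkpoints", [])
    )
```

Checking both `status` and the checkpoint list guards against a partially written run.json counting as completed.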

Comment on lines +65 to +79
if match:
try:
return json.loads(match.group(1))
except json.JSONDecodeError:
return {}
if transcript and re.search(r"kind:\s*ship-it-session-memory-v1", text, re.IGNORECASE):
evidence_path = transcript.parent / "evidence.json"
if evidence_path.exists():
try:
evidence = json.loads(evidence_path.read_text(encoding="utf-8"))
except json.JSONDecodeError:
return {}
memory = evidence.get("memory")
return memory if isinstance(memory, dict) else {}
return {}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the evidence.json fallback when transcript memory JSON is malformed.

If the transcript has the ship-it-session-memory-v1 marker but the fenced JSON is invalid, Lines 65-69 return {} immediately and never consult evidence.json. That breaks the new fallback path and will fail otherwise-valid smoke runs whose transcript marker is present but not parseable.

Suggested fix
     if match:
         try:
             return json.loads(match.group(1))
         except json.JSONDecodeError:
-            return {}
+            pass
     if transcript and re.search(r"kind:\s*ship-it-session-memory-v1", text, re.IGNORECASE):
         evidence_path = transcript.parent / "evidence.json"
         if evidence_path.exists():
             try:
                 evidence = json.loads(evidence_path.read_text(encoding="utf-8"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/agent-smoke/scenarios/ship-it/verify.py` around lines 65 - 79, The code
currently returns {} as soon as json.loads(match.group(1)) fails, which prevents
the subsequent evidence.json fallback from running; change the logic in the
block that handles match (around the json.loads(match.group(1)) call) so that on
JSONDecodeError it does not return immediately but instead continues to the
later transcript marker check and evidence.json reading (i.e., remove or replace
the early return in the except block and allow the function to proceed to the
evidence_path/evidence.json logic); keep the existing behavior of returning a
dict when parsed successfully and still validate that evidence.get("memory") is
a dict before returning it.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.beislid/ship-it-templates.md:
- Around line 282-284: Add explicit language identifiers to the three fenced
code blocks that currently use plain triple backticks: change their opening
fences to ```text. Specifically update the block containing
"${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/ship-it/<repo_hash>/<run_id>/",
the block with the two warning lines beginning "⚠️ Run ledger could not be
written to `<path>`..." and "⚠️ Verbose transcript could not be written to
`<path>`...", and the block with the two info lines beginning "💭 Run ledger
saved: `<path>`" and "💭 Verbose transcript saved: `<path>`" so markdownlint
MD040 is satisfied.

In @.beislid/workflow-md-format.md:
- Line 209: Change the capitalized "The" at the start of the sentence in the
lifecycle-actions paragraph to lowercase "the" so the sentence reads "...Under
these events, the current implementation supports `type: artifact` only...";
update the text in the same paragraph that contains "spec owns the
`spec_approved` event; `blueprint` owns the `blueprint_approved` event..." to
use the lowercase "the".

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 55dd70e4-1b68-4b40-942d-a83ee6c41cb3

📥 Commits

Reviewing files that changed from the base of the PR and between 2452946 and 66af348.

📒 Files selected for processing (13)
  • .beislid/doctor-templates.md
  • .beislid/output-templates.md
  • .beislid/probe-semantics.md
  • .beislid/ship-it-templates.md
  • .beislid/workflow-md-format.md
  • docs/configuration.md
  • skills/blueprint/SKILL.md
  • skills/doctor/SKILL.md
  • skills/heard-chef/phase-3-push.md
  • skills/kickoff/step-1-ticket.md
  • skills/setup/SKILL.md
  • skills/ship-it/phase-2-gates.md
  • skills/spec/SKILL.md
✅ Files skipped from review due to trivial changes (7)
  • skills/setup/SKILL.md
  • skills/heard-chef/phase-3-push.md
  • .beislid/probe-semantics.md
  • docs/configuration.md
  • skills/blueprint/SKILL.md
  • skills/spec/SKILL.md
  • skills/doctor/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • skills/kickoff/step-1-ticket.md
  • .beislid/output-templates.md
  • skills/ship-it/phase-2-gates.md

Comment on lines 282 to 284
```
${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/ship-it/<repo_hash>/<timestamp>/transcript.md
${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/ship-it/<repo_hash>/<run_id>/
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language tags to fenced code blocks (MD040).

These three fences are missing language identifiers and currently trip markdownlint.

Suggested patch

-```
+```text
 ${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/ship-it/<repo_hash>/<run_id>/

-```
+```text
 ⚠️ Run ledger could not be written to `<path>`: . Continuing without persisted ledger.
 ⚠️ Verbose transcript could not be written to `<path>`: . Continuing without persisted transcript.

-```
+```text
 💭 Run ledger saved: `<path>`
 💭 Verbose transcript saved: `<path>`


Also applies to: 304-307, 311-314

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 282-282: Fenced code blocks should have a language specified (MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.beislid/ship-it-templates.md around lines 282 - 284, add explicit language
identifiers to the three fenced code blocks that currently use plain triple
backticks: change their opening fences to ```text. Specifically update the block
containing
"${BEISLID_STATE_DIR:-~/.local/state/beislid}/runs/ship-it/<repo_hash>/<run_id>/",
the block with the two warning lines beginning "⚠️ Run ledger could not be
written to `<path>`..." and "⚠️ Verbose transcript could not be written to
`<path>`...", and the block with the two info lines beginning "💭 Run ledger
saved: `<path>`" and "💭 Verbose transcript saved: `<path>`" so markdownlint
MD040 is satisfied.


`spec` owns the `spec_approved` event; `blueprint` owns the `blueprint_approved` event. Kickoff only passes context in and records returned artifact status/path. Under these events, P0 supports `type: artifact` only; CLI, MCP, and other providers are reserved and skipped. Artifact actions write the approved spec/design Markdown to a repo file. `approval: prompt` asks before writing; `approval: auto` creates a missing target without another prompt; omitted approval defaults to `prompt`. Existing targets always prompt for overwrite / choose another path / skip. Skip and write failures do not block routing to downstream skills.
`spec` owns the `spec_approved` event; `blueprint` owns the `blueprint_approved` event. Kickoff only passes context in and records returned artifact status/path. Under these events, The current implementation supports `type: artifact` only; CLI, MCP, and other providers are reserved and skipped. Artifact actions write the approved spec/design Markdown to a repo file. `approval: prompt` asks before writing; `approval: auto` creates a missing target without another prompt; omitted approval defaults to `prompt`. Existing targets always prompt for overwrite / choose another path / skip. Skip and write failures do not block routing to downstream skills.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix capitalization typo in lifecycle-actions paragraph.

Line 209 has “Under these events, The current implementation...”; “the” should be lowercase for sentence flow.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.beislid/workflow-md-format.md at line 209, Change the capitalized "The" at
the start of the sentence in the lifecycle-actions paragraph to lowercase "the"
so the sentence reads "...Under these events, the current implementation
supports `type: artifact` only..."; update the text in the same paragraph that
contains "spec owns the `spec_approved` event; `blueprint` owns the
`blueprint_approved` event..." to use the lowercase "the".

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/ship-it/phase-2-gates.md`:
- Line 42: Replace the ambiguous phrase with a concrete storage spec: state that
the user's decision to proceed without a passing gate must be recorded as a new
field in the checkpoint JSON (add "proceed_without_passing" to the checkpoint
object), and describe its schema minimally (user id, timestamp, reason,
assessed_risk, and related_artifact_path) so implementers know to persist
decision alongside existing checkpoint data rather than as a separate file or
inside the gate envelope.
- Line 39: Clarify the artifact path template
artifacts/gates/<scope>/<gate-name>/<attempt>/ by specifying that for gates
defined at the top-level (under the gates key rather than within scopes) the
<scope> placeholder should be the literal "top-level" (so
artifacts/gates/top-level/<gate-name>/<attempt>/), and update the docs where the
artifacts path and the gates vs scopes distinction are described (references:
the artifacts/gates/<scope>/<gate-name>/<attempt>/ template, the gates key, and
scopes key).
- Line 39: Clarify selection between raw.log and summary.txt by stating that
raw.log should be used when the run's stdout/stderr is safe for inclusion in
transcripts (no sensitive PII, secrets, or unacceptable data), and summary.txt
should be used when the output contains sensitive or unsafe content—in that case
store a redacted/sanitized summary or structured excerpt suitable for sharing;
keep the same artifact path pattern
artifacts/gates/<scope>/<gate-name>/<attempt>/ and ensure the
envelope/checkpoint references which file was stored and why (i.e., include a
small metadata flag like "transcript_safe": true/false).
- Line 31: The document references three missing docs (workflow-md-format.md,
output-templates.md, ship-it-templates.md); either create those files with the
expected schemas/content or inline/remove the references: add
workflow-md-format.md containing the rich gate schema (attributes stage, kind,
command, execution, output, failure and rules for pre-pr sensor detection), add
output-templates.md with the Gate result envelope format and pytest/generic
parsing guidance, and add ship-it-templates.md with all prompt text, skip
messages, and templates used across Phase 2a–2d and walkthrough flows;
alternatively, replace each reference in phase-2-gates.md with the full required
schema/text so implementers have the definitions in-place.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b78ea7b4-b932-45be-8ecb-a99f8c641422

📥 Commits

Reviewing files that changed from the base of the PR and between 66af348 and b24c409.

📒 Files selected for processing (6)
  • skills/kickoff/SKILL.md
  • skills/kickoff/step-1-ticket.md
  • skills/ship-it/SKILL.md
  • skills/ship-it/phase-1-detect.md
  • skills/ship-it/phase-2-gates.md
  • skills/ship-it/phase-3-review.md
✅ Files skipped from review due to trivial changes (2)
  • skills/ship-it/phase-1-detect.md
  • skills/kickoff/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • skills/kickoff/step-1-ticket.md
  • skills/ship-it/phase-3-review.md
  • skills/ship-it/SKILL.md

- neither: print `no gates configured — skipping` and continue.

Normalize per `workflow-md-format.md`: flat `name`+`command` means pre-pr computational sensor. P0 ship-it executes legacy gates plus rich gates where `stage` is absent/`pre-pr`, `kind` is absent/`sensor`, `command` exists, and `execution` is absent/`computational`. Record other stages as `skipped-by-stage`; record pre-pr non-computational/non-sensor as `skipped-by-execution`. Treat rich `output`/`failure` as prompt context.
Normalize per `workflow-md-format.md`: flat `name`+`command` means pre-pr computational sensor. Ship-it currently executes legacy gates plus rich gates where `stage` is absent/`pre-pr`, `kind` is absent/`sensor`, `command` exists, and `execution` is absent/`computational`. Record other stages as `skipped-by-stage`; record pre-pr non-computational/non-sensor as `skipped-by-execution`. Treat rich `output`/`failure` as prompt context.
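Under the normalization rules above, gate selection reduces to a small classifier. A hedged sketch (the `gate_disposition` name is illustrative, and treating a command-less rich gate as `skipped-by-execution` is an assumption — the text only defines the executable case):

```python
def gate_disposition(gate: dict) -> str:
    # Defaults mirror the normalization rule: absent stage/kind/execution
    # mean pre-pr / sensor / computational.
    if gate.get("stage", "pre-pr") != "pre-pr":
        return "skipped-by-stage"
    if gate.get("kind", "sensor") != "sensor":
        return "skipped-by-execution"
    if gate.get("execution", "computational") != "computational":
        return "skipped-by-execution"
    if not gate.get("command"):
        # Assumption: a gate without a command cannot be executed.
        return "skipped-by-execution"
    return "execute"
```

A legacy flat `name`+`command` gate takes every default and classifies as executable.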

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Verify workflow-md-format.md exists and documents gate schema

# Check if the file exists
if fd -t f '^workflow-md-format\.md$' | grep -q .; then
  echo "✓ workflow-md-format.md found"
  # Check for key gate-related terms
  fd -t f '^workflow-md-format\.md$' --exec rg -n -C3 'stage|kind|execution|output|failure' {} \; | head -40
else
  echo "✗ workflow-md-format.md not found"
fi

Repository: sandsower/beislid

Length of output: 95


🏁 Script executed:

# Search for all referenced files
echo "=== Checking for referenced files ==="
fd -t f '^(workflow-md-format|output-templates|ship-it-templates)' --type f

echo ""
echo "=== Checking for files in skills/ship-it directory ==="
fd -t f '\.md$' skills/ship-it

echo ""
echo "=== Checking for files with 'format' or 'template' in name ==="
fd -t f '(format|template)' --type f --extension md | head -20

Repository: sandsower/beislid

Length of output: 372


🏁 Script executed:

# Read the phase-2-gates.md file to see all references and context
cat skills/ship-it/phase-2-gates.md

Repository: sandsower/beislid

Length of output: 6979


Create missing documentation files or remove references.

The file references three documentation files that do not exist in the repository:

  1. workflow-md-format.md — referenced at line 31 for rich gate schema definition (with stage, kind, command, execution, output, failure attributes)
  2. output-templates.md — referenced at line 42 for Gate result envelope structure and pytest/generic parsing guidelines
  3. ship-it-templates.md — referenced multiple times (lines 13, 35, 37, 50, 56, and later sections) for prompt text, skip messages, and template content for Phase 2a, 2b, 2c, 2d, and walkthrough flows

Implementers cannot complete Phase 2 execution without these references. Either create these documentation files with the required schema and content, or remove/inline the references and clarify the expectations directly in this file.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/ship-it/phase-2-gates.md` at line 31, The document references three
missing docs (workflow-md-format.md, output-templates.md, ship-it-templates.md);
either create those files with the expected schemas/content or inline/remove the
references: add workflow-md-format.md containing the rich gate schema
(attributes stage, kind, command, execution, output, failure and rules for
pre-pr sensor detection), add output-templates.md with the Gate result envelope
format and pytest/generic parsing guidance, and add ship-it-templates.md with
all prompt text, skip messages, and templates used across Phase 2a–2d and
walkthrough flows; alternatively, replace each reference in phase-2-gates.md
with the full required schema/text so implementers have the definitions
in-place.

1. If `fast_ship_eligible=true`, batch only gates with `parallel_safe: true`, no `autofix`, and `mutates` not true. Run concurrently when supported; otherwise run sequentially and record `parallel_unavailable`.
2. Run non-batched gates once in configured order. Normal mode treats every selected gate as non-batched.
3. For each run, capture duration and parse stdout/stderr into the shared Gate result envelope from `output-templates.md` (pytest parser when pytest-like, otherwise generic). Store raw logs by path when possible, else a safe summary.
3. For each run, capture duration and parse stdout/stderr into the shared Gate result envelope from `output-templates.md` (pytest when pytest-like, else generic). If the ledger is available, create `artifacts/gates/<scope>/<gate-name>/<attempt>/`; store safe raw output as `raw.log` or `summary.txt`, and reference it from the envelope/checkpoint.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the artifact path structure for top-level gates.

The path template artifacts/gates/<scope>/<gate-name>/<attempt>/ includes a <scope> placeholder, but top-level gates (from the gates key rather than scopes) don't belong to a scope. Please specify what value to use for <scope> in the top-level case (e.g., top-level, root, or a different path structure).

📝 Suggested clarification
-3. For each run, capture duration and parse stdout/stderr into the shared Gate result envelope from `output-templates.md` (pytest when pytest-like, else generic). If the ledger is available, create `artifacts/gates/<scope>/<gate-name>/<attempt>/`; store safe raw output as `raw.log` or `summary.txt`, and reference it from the envelope/checkpoint.
+3. For each run, capture duration and parse stdout/stderr into the shared Gate result envelope from `output-templates.md` (pytest when pytest-like, else generic). If the ledger is available, create `artifacts/gates/<scope>/<gate-name>/<attempt>/` (use `top-level` for `<scope>` when running top-level gates); store safe raw output as `raw.log` or `summary.txt`, and reference it from the envelope/checkpoint.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/ship-it/phase-2-gates.md` at line 39, Clarify the artifact path
template artifacts/gates/<scope>/<gate-name>/<attempt>/ by specifying that for
gates defined at the top-level (under the gates key rather than within scopes)
the <scope> placeholder should be the literal "top-level" (so
artifacts/gates/top-level/<gate-name>/<attempt>/), and update the docs where the
artifacts path and the gates vs scopes distinction are described (references:
the artifacts/gates/<scope>/<gate-name>/<attempt>/ template, the gates key, and
scopes key).

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify the criteria for choosing between raw.log and summary.txt.

The documentation states "store safe raw output as raw.log or summary.txt" but doesn't specify the selection criteria on this line. The AI summary mentions "transcript-safety" as the determining factor. Please add explicit criteria here to prevent implementation ambiguity.

📝 Suggested clarification
-3. For each run, capture duration and parse stdout/stderr into the shared Gate result envelope from `output-templates.md` (pytest when pytest-like, else generic). If the ledger is available, create `artifacts/gates/<scope>/<gate-name>/<attempt>/`; store safe raw output as `raw.log` or `summary.txt`, and reference it from the envelope/checkpoint.
+3. For each run, capture duration and parse stdout/stderr into the shared Gate result envelope from `output-templates.md` (pytest when pytest-like, else generic). If the ledger is available, create `artifacts/gates/<scope>/<gate-name>/<attempt>/`; store transcript-safe raw output as `raw.log`, otherwise store redacted output as `summary.txt`, and reference the artifact path from the envelope/checkpoint.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/ship-it/phase-2-gates.md` at line 39, Clarify selection between
raw.log and summary.txt by stating that raw.log should be used when the run's
stdout/stderr is safe for inclusion in transcripts (no sensitive PII, secrets,
or unacceptable data), and summary.txt should be used when the output contains
sensitive or unsafe content—in that case store a redacted/sanitized summary or
structured excerpt suitable for sharing; keep the same artifact path pattern
artifacts/gates/<scope>/<gate-name>/<attempt>/ and ensure the
envelope/checkpoint references which file was stored and why (i.e., include a
small metadata flag like "transcript_safe": true/false).
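A sketch of that selection logic under the prompt's assumptions (helper name, the `summary` parameter, and the returned metadata keys are all illustrative, not an implemented API):

```python
from pathlib import Path


def store_gate_output(attempt_dir: Path, raw_output: str,
                      transcript_safe: bool, summary: str = "") -> dict:
    # Transcript-safe output keeps the full log as raw.log; otherwise only
    # a redacted summary is persisted, and the metadata flag records why.
    attempt_dir.mkdir(parents=True, exist_ok=True)
    if transcript_safe:
        path = attempt_dir / "raw.log"
        path.write_text(raw_output, encoding="utf-8")
    else:
        path = attempt_dir / "summary.txt"
        path.write_text(summary, encoding="utf-8")
    return {"artifact": str(path), "transcript_safe": transcript_safe}
```

The returned dict is what the envelope/checkpoint would reference so readers can see which file was stored and why.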

4. Autofix only when the envelope is `status: fail` and `environment_failure: false`: show envelope summary, run only that autofix, show diff, and ask before committing.
5. For `status: error`, `environment_failure: true`, or no `autofix`, prompt from the envelope (`summary`, failures, flags, action, raw-log reference) plus configured `failure.*` / `output.parser` context. Do not guess. In parallel, wait for siblings and surface all failure envelopes together.
6. After a fix, re-run the applicable gate before advancing. If the user explicitly proceeds without a passing gate, record that decision/risk.
6. After a fix, re-run the applicable gate before advancing. If the user explicitly proceeds without a passing gate, record that decision/risk beside the artifact path or unavailable reason.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify where and how to record the user's proceed decision.

The phrase "record that decision/risk beside the artifact path" is ambiguous about the storage mechanism. Please specify whether this should be:

  • A separate file in the artifact directory (e.g., decision.json)
  • A field in the checkpoint JSON
  • An entry in the gate envelope
  • Another mechanism

Clear specification is important for this ledger/persistence feature to ensure consistent implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/ship-it/phase-2-gates.md` at line 42, Replace the ambiguous phrase
with a concrete storage spec: state that the user's decision to proceed without
a passing gate must be recorded as a new field in the checkpoint JSON (add
"proceed_without_passing" to the checkpoint object), and describe its schema
minimally (user id, timestamp, reason, assessed_risk, and related_artifact_path)
so implementers know to persist decision alongside existing checkpoint data
rather than as a separate file or inside the gate envelope.
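For illustration, a checkpoint payload carrying the suggested field might look like this (every key inside `proceed_without_passing` is an assumption taken from the prompt's minimal schema, not an implemented format):

```python
import json

# Hypothetical checkpoint payload with the suggested decision record
# persisted alongside the existing checkpoint data.
checkpoint = {
    "step": "gate:pytest",
    "resume_hint": "re-run gate pytest before advancing",
    "proceed_without_passing": {
        "user": "example-user",
        "timestamp": "2026-05-10T12:00:00Z",
        "reason": "flaky integration test, tracked separately",
        "assessed_risk": "low",
        "related_artifact_path": "artifacts/gates/top-level/pytest/2/summary.txt",
    },
}
print(json.dumps(checkpoint, indent=2))
```

Keeping the decision inside the checkpoint (rather than a separate file) means resume logic already reading checkpoints sees it for free.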
