spec v29 + v30: gap closure and operational resilience #11
Merged
Closes spec-v29 (18 steps) and spec-v30 (12 steps).
## v29 — Gap Closure
1. Reconciled spec 28 with the 250 LOC default; introduced `_DEFAULT_PR_LOC_CAP`.
2. Removed the legacy 4-phase Orchestrator (BUILD/MERGE/REVIEW/FIX), shrinking
`orchestrator.py` from 1499 → 501 LOC. `V2Orchestrator` renamed to
`Orchestrator` with a back-compat alias.
3. Added jitter (`secrets.SystemRandom`) to all exponential-backoff sites in
`llm_client` and `loop_controller`.
4. `LLMClient` now honors HTTP `Retry-After` (seconds or HTTP-date), capped
at 120 s, with jitter still applied.
5. Claude rate-limit branch parses provider reset windows
(`_parse_claude_reset_seconds`) and clamps to [10, 3600] s.
6. `verify_paths` runs ruff/bandit/pytest scoped to chunk-modified files; both
engines call it when chunk metadata is present.
7. `chunk_spec_with_llm` now windows oversized specs (5000-char windows,
500-char overlap, capped at 10) instead of silently truncating.
8. Engine ABC trimmed to `execute_chunk` / `verify_chunk` / `fix_chunk` —
`run_build_cycle` and `BuildResult` removed.
9. Surfaced spec 27 §3.2 Claude CLI flags (`--allowedTools`,
`--output-format stream-json`) at the chunk call-site via `_DEFAULT_ALLOWED_TOOLS`.
10. Per-chunk deadline gate (`>=`) before `engine.execute_chunk`.
11. HF reflection step now gates on verification with up to 2 fix-cycle
attempts (`_HF_MAX_FIX_ATTEMPTS`).
12. New dedicated `tests/test_audit_logger.py` (15 cases).
13. Added `chunk_spec_with_llm` test coverage (10 cases total).
14. `_probe_git_credentials` distinguishes ssh-add exit codes (no_agent,
empty, keys_loaded, unknown) and tailors prompt accordingly.
15. Sandbox emits a one-shot WARNING on platforms without `os.O_NOFOLLOW`.
16. Confirmed CI matrix matches `pyproject.toml` Python classifiers.
17. `prompts.py` docstring documents the live template set.
18. Single-source-of-truth `_FORBIDDEN_PATTERNS_DOC` tuple in scaffolder.
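
Steps 3 and 4 can be sketched together as follows. This is a minimal illustration, not the code in `llm_client`: the helper names `parse_retry_after` and `backoff_delay`, the 1 s base, the 60 s ceiling on the exponential term, and the 25% jitter fraction are all assumptions. Only the 120 s `Retry-After` cap and the use of `secrets.SystemRandom` come from the steps above.

```python
import secrets
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

_RETRY_AFTER_CAP_S = 120.0       # cap stated in step 4
_JITTER_FRACTION = 0.25          # assumption: the real fraction is not stated
_rng = secrets.SystemRandom()    # jitter source named in step 3


def parse_retry_after(value, now=None):
    """Return the delay in seconds encoded by a Retry-After value, or None."""
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "30".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt, retry_after=None, base=1.0, max_delay=60.0):
    """Exponential backoff with jitter; a server hint replaces the exponential term."""
    hinted = parse_retry_after(retry_after) if retry_after else None
    if hinted is not None:
        delay = min(hinted, _RETRY_AFTER_CAP_S)
    else:
        delay = min(base * (2 ** attempt), max_delay)
    # Jitter is applied in both branches, so a capped hint still gets spread out.
    return delay + _rng.uniform(0.0, _JITTER_FRACTION * delay)
```

Adding the jitter on top of the delay, rather than subtracting from it, means a client never retries earlier than the server asked.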
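
The windowing in step 7 might look roughly like the sketch below. `window_spec` is a hypothetical name and does not reflect the internals of `chunk_spec_with_llm`; the window size, overlap, and cap are the values quoted in step 7. The step does not say what happens to text beyond the tenth window, so this sketch simply stops there.

```python
def window_spec(text, window=5000, overlap=500, max_windows=10):
    """Split text into overlapping character windows, up to max_windows."""
    if len(text) <= window:
        return [text]
    step = window - overlap  # each window starts 4500 chars after the previous one
    windows = []
    start = 0
    while start < len(text) and len(windows) < max_windows:
        windows.append(text[start:start + window])
        if start + window >= len(text):
            break  # this window already reached the end of the spec
        start += step
    return windows
```

The overlap means a requirement that straddles a window boundary appears whole in at least one window, provided it is shorter than the overlap.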
## v30 — Operational Resilience & Idempotency
1. Per-repo advisory lockfile via `_run_lock` (`fcntl.LOCK_EX | LOCK_NB`);
second concurrent invocation exits 75 (EX_TEMPFAIL).
2. Persistent chunk-status ledger at `.codelicious/state/<spec>.json` — runs
resume by skipping already-merged chunks; new `--no-resume` and
`--reset-ledger` CLI flags.
3. CLI-layer endpoint validation (`_validate_endpoint_url_strict`) rejects
non-HTTPS and credentials-in-URL before banner.
4. SIGTERM integration test spawns a real Python child holding the run-lock,
asserts exit 143 within 8 s and lockfile cleanup.
5. Engine fallback list — Claude rate-limit fails over to HuggingFace for
the remainder of the run when both credentials are configured.
6. Token-budget-aware chunk sizing — `enforce_token_budget` recursively
halves over-budget chunks up to depth 3, preserving file coverage.
7. Atomic `latest.log` symlink update via `_atomic_symlink_update` (tmp +
`os.replace`); Windows fallback writes `<link>.txt`.
8. Coverage-floor enforcement (`resolve_min_coverage`,
`_enforce_coverage_floor`); CLI `--min-coverage`,
`[tool.codelicious].min_coverage`, default 90.
9. Enriched PR descriptions with Chunk Context, Verifier Summary, and
Audit Log sections via `chunk_metadata` arg to `_build_pr_body`.
10. Branch-name disambiguation (`_disambiguate_branch`) probes local + remote
and appends a hint or unix timestamp on collision.
11. Cross-process `fcntl.flock` on `.audit.lock` keeps audit lines from
interleaving across `codelicious` processes.
12. `_write_postmortem` aggregates ledger counts + log tail + resume hint
on abnormal exit, written to `.codelicious/postmortem-<ts>.md`.
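
For step 1, a non-blocking advisory lock could be sketched as below. The function name `run_lock` and the lockfile handling are assumptions for illustration (the PR's `_run_lock` may differ); the `LOCK_EX | LOCK_NB` flags and exit code 75 come from the step itself. `fcntl` is Unix-only.

```python
import fcntl
import os
from contextlib import contextmanager

EX_TEMPFAIL = 75  # exit code named in step 1


@contextmanager
def run_lock(path):
    """Hold an exclusive advisory lock on path for the duration of the block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        raise SystemExit(EX_TEMPFAIL)
    try:
        yield
    finally:
        try:
            os.unlink(path)  # lockfile cleanup on the way out
        finally:
            os.close(fd)     # closing the descriptor releases the lock
```

Because the kernel drops the lock when the holding process dies, a crashed run cannot leave a stale lock behind, only a stale file. Unlinking the file on exit leaves a narrow window in which another process may still hold a descriptor for the old inode, which a production version would need to account for.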
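
The tmp + `os.replace` pattern from step 7 can be illustrated as follows. This is a sketch of the general technique under assumed names (`atomic_symlink_update`, the `.tmp.<pid>` suffix), not the PR's `_atomic_symlink_update`, and it leaves out the Windows `<link>.txt` fallback.

```python
import os


def atomic_symlink_update(target, link):
    """Point link at target without a moment where link is missing."""
    tmp = f"{link}.tmp.{os.getpid()}"  # assumption: any unique sibling name works
    try:
        os.unlink(tmp)  # clear a leftover from an earlier failed attempt
    except FileNotFoundError:
        pass
    os.symlink(target, tmp)
    try:
        # Renaming over the old link is atomic on POSIX: readers see either
        # the old target or the new one, never a dangling path.
        os.replace(tmp, link)
    except OSError:
        os.unlink(tmp)
        raise
```

Removing the old link and then creating a new one would leave a brief gap in which `latest.log` does not exist; the rename avoids that gap.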
## Quality
- `pytest` 1928 passed (was 1901 before, 27 net new — accounting for the
~70-test removal that came with deleting `tests/test_orchestrator.py`).
- `ruff check src/ tests/` clean.
- `ruff format src/ tests/` clean.
- `bandit -r src/` 0 medium / 0 high (low findings only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clay-good added a commit that referenced this pull request on May 5, 2026:
post-merge bugfixes: 7 real issues found by deep review of PR #11
## Summary
- Legacy `Orchestrator` removed, jitter on all backoff sites, chunk-scoped verifier, prompt-window splitting, HF reflection gate, etc.
- Per-repo run lock (`fcntl.flock`), idempotent resume ledger, CLI-layer endpoint validation, atomic `latest.log` swap, token-budget-aware chunking, engine fallback on rate-limit, branch-name disambiguation, cross-process audit log, coverage gate, PR description metadata, postmortem on abort.
- Removed `tests/test_orchestrator.py`; chunk-based coverage already lives in `tests/test_v2_orchestrator.py` + `tests/test_full_workflow.py`.

## Scope notes
The two specs share files (`cli.py`, `orchestrator.py`, `verifier.py`, etc.), so this lands as a single PR rather than two stacked ones. See the commit body for the per-step breakdown.

## Quality
- `pytest -q --no-cov` → 1928 passed
- `ruff check src/ tests/` → clean
- `ruff format --check src/ tests/` → clean
- `bandit -r src/` → 0 medium, 0 high (low findings only)

## Test plan
- Run `codelicious .` against a sandbox repo with a multi-task spec, confirm PRs ≤ 250 LOC and split into part-2 / part-3 as expected (closes one of the two unchecked items in spec 28's acceptance criteria).
- `codelicious . --continuous` runs to completion without intervention (closes the other spec 28 acceptance item).
- After an aborted run, confirm `.codelicious/run.lock` is removed and a `postmortem-*.md` is written.
- With `ANTHROPIC_API_KEY` (or Claude CLI auth) and `HF_TOKEN` set, force a Claude rate-limit and confirm the build continues on HuggingFace.

🤖 Generated with Claude Code