docs(corpus): #65 PR 4 of 4. spec/34 LOCKED + arc closer + Eleven status flip#316
Merged
Conversation
…ndum spec/34 flips from RFC to LOCKED. RFC banner and 4-PR shipping-plan provenance block replaced with a single locked-status line. Per-PR temporal markers throughout the body consolidated to present-tense lock prose (capability declarations, "Per-runner kwargs (PR 3 -- implemented)" subheaders, "SQLite hybrid layout (PR 2)" header, "Call-site migration reference (PR 3 -- implemented in #65 PR 3 of 4)" section title, "Follow-up issue filed at PR 4" deferral markers). The "PR 4 documentation-update checklist" section (lines 847-864) deleted; self-referential scaffolding has no place in a locked spec. Implementer Contract finalized at 9 normative MUSTs, mirroring PersonaBackend spec/33's shape exactly with one extra MUST for the query() capability precedence rule that CorpusBackend has via the FTS5 / semantic / substring fallback ladder: (1) name and corpus charset validation at API boundary, (2) side-effect-free construction, (3) capability honesty including embedding_provider=None invariant, (4) query() capability precedence rule, (5) write_page() 4-case behavior table, (6) URL credential redaction across all operator-facing error paths, (7) cross-corpus isolation at storage layer, (8) snapshot id determinism + cross-page isolation, (9) backend_id property stability + close() idempotency. The merge in MUST 9 is honest: backend_id is name-identity and close() is lifecycle-identity, both backend-identity contracts. spec/24 Decision 7 receives the CorpusBackend ownership addendum. The existing "Why" paragraph previously said MemoryBackend owned wiki/, memory/, and journal/. With CorpusBackend locked, the addendum clarifies: MemoryBackend retains exclusive ownership of memory/ and journal/; CorpusBackend, when registered, owns wiki/ and raw/. The two backends compose at prompt assembly (agent.py:_load_indexes() reads from both). 18 distinct edits across 11 line ranges in spec/34. File 881 to 855 lines. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…to "eleven shipped" CLAUDE.md adds the canonical CorpusBackend lock-paragraph (the 11th, mirroring the 10 prior shipped-protocol bullets), flips the ASCII architecture diagram from "Corpus 🟡" to "Corpus ✅ (locked at #65 PR 4)", bumps the spec-doc count from "30 locked + 2 drafts" to "31 locked + 2 drafts", refreshes the live test count to 2,937 collected (2,889 passing + 48 skipped) at 2026-06-01, and flips the Status block from "Ten backend protocols shipped" to "Eleven backend protocols shipped". Status tail flips from "remaining two protocols (Corpus / MCPServerRegistry)" to "remaining protocol (MCPServerRegistry)". README.md adds CorpusBackend to the shipped list in the Current limits paragraph (replacing "filesystem-default-only today" with the locked CorpusBackend summary including FTS5 + page-count cliff WARN + CLI + env-var override), bumps the comparison-matrix locked-docs count from 30 to 31, adds spec/34 to the spec list, flips the backend-protocols table row for CorpusBackend from "Planned" to "✅ Shipped" with the locked summary cell, flips the v1 direction sentence from "those two land" to "MCPServerRegistry lands", bumps the repo-structure test count to 2937 collected (2889 passing), and flips the Status block from "Ten of twelve" to "Eleven of twelve". ROADMAP.md (repo root, public strategic narrative) flips line 11 from "Ten of twelve" to "Eleven of twelve" with CorpusBackend appended to the shipped list and "Two remain" to "One remains", removes the now-shipped #65 row from the remaining-protocols table, and flips the ship-when sentence from "both remaining backends" to "the remaining backend". 7 + 10 + 3 = 20 edits across 3 files. The vault ROADMAP at ~/ObsidianVault/ Atomic Agents/ROADMAP.md is refreshed out-of-band (not in the git repo). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e impl docstring scrub
Cross-spec cross-references propagate the CorpusBackend locked status to
adjacent spec docs:
- spec/01 (anatomy): adds a CorpusBackend cross-reference paragraph after the
wiki/raw section explaining the Protocol seam and the SQLiteCorpusBackend
GB-scale benefit.
- spec/02 (atomic memory): adds a CorpusBackend cross-reference paragraph
after the "Why two layers" section naming the wiki/ + raw/ vs memory/ +
journal/ ownership split.
- spec/04 (runtime assembly): adds an integration note after the canonical
load order describing how step [7] routes through corpus_backend.
render_index_summary("wiki") when CorpusBackend is registered.
- spec/26 (cascade bundle DRAFT): flips two future-tense references ("when
CorpusBackend ships") to present-tense ("now that CorpusBackend has shipped,
locked at #65 PR 4 of 4") and updates the composition table row to cite
the specific render_index_summary("wiki") method.
- spec/31 (LLMBackend): appends "(spec/34)" link to the Corpus entry in the
protocol-pattern list.
spec/27 (doctor catalogue) already has the corpus-backend entry from PR 3
inline status flip; no edit needed (verified).
Reference impl docstring scrub completes the per-PR-marker consolidation
sweep across shipped Python code:
- corpus/__init__.py: drops "scaffolding PR -- no behavior change today" and
"in PR 3" temporals; rewrites the PRE-PR-3 wiring contract block to a
present-tense locked-status block (the SQLiteCorpusBackend "DEFERRED"
bullet is now FALSE since SQLite shipped in PR 2; only semantic search
remains deferred to v1.1); drops "(wired in PR 3)" from
get_default_corpus_backend docstring.
- corpus/types.py: drops "PR 1 of 4" + "PR 1, File 2 of 3" parentheticals;
deletes the "Scaffolding PR (#65 PR 1 of 4)" paragraph entirely; drops
"in PR 1 / PR 2 respectively" temporal.
- corpus/backend.py: replaces the 4-bullet per-PR shipping plan with a
single locked-status line.
- test_corpus_sqlite_backend.py: drops "PR 2 of 4" from the module docstring.
Mirrors PersonaBackend PR 4 commit 93dad48's stale-marker scrub pattern.
All edits are docstring/comment only; no executable Python changed. 158
tests on the affected corpus modules continue to pass; full suite still
2889 passing + 48 skipped (zero regressions).
7 + 10 = 17 edits across 9 files.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + protocol counts CHANGELOG.md adds 3 bullets under [Unreleased] mirroring PersonaBackend PR 4's arc-summary shape: - ### Changed: framework status flip from "ten of twelve" to "eleven of twelve backend protocols shipped" with operator-user outcome lead (pin SQLite via one env var for indexed FTS5 query at GB scale; doctor surfaces page-count cliff WARN; CLI honors env var; legacy paths soft-degrade gracefully on UnicodeDecodeError + OSError; IRON RULE byte-identity preserved). Cites all 4 PRs (#297, #298, #304, this PR) and all 10 follow-up issues (#305-#314). - ### Changed: spec/24 Decision 7 addendum naming CorpusBackend as the source of truth for wiki/ and raw/ (cross-spec ownership propagation). - ### Documentation: spec/34 LOCKED + doc-release sweep landed. Names the 9-MUST Implementer Contract finalization, the per-PR marker scrub across spec body + reference impls + tests + strategic docs, and the cross-spec cross-references. docs/deployment/programmatic.md: protocol-pattern paragraph flipped from "Ten backend protocols have shipped" to "Eleven backend protocols have shipped"; CorpusBackend added to the enumerated list; spec/34 added to the spec doc list; "two remain" flipped to "one remains" with MCPServerRegistryBackend the only remaining protocol. docs/methodology.md: "today ten are shipped" flipped to "today eleven are shipped" with CorpusBackend appended; test count bumped from 2686+ to 2937+. CONTRIBUTING.md: stale "2401 tests today" (drifted across multiple arcs) refreshed to "2937 tests today". 3 + 1 + 1 + 1 = 6 edits across 4 files. Test suite stable at 2889 passing + 48 skipped (Python 3.11/3.12). This is the PR 4 of 4 arc closer. After merge, the CorpusBackend arc CLOSES. 11 of 12 backend protocols shipped for v1.0; only MCPServerRegistryBackend (#201) remains. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… locked record Adversarial Round 1 caught an inverse-phantom in the CLAUDE.md 11th CorpusBackend lock-paragraph: the canonical "locked at PR 4 with..." test file list omitted `tests/test_corpus_registry.py`, which is a real file with 4 tests shipped in PR 1 of the arc and cited in CHANGELOG's PR 1 bullet. spec/34's §"Test coverage" PR 1 section also did not enumerate it. The omission is small but it creates inconsistency between the canonical PR 4 record (CLAUDE.md lock-paragraph) and the actual locked test surface. This is the PersonaBackend PR 4 Round 1 phantom-file failure shape in the opposite direction: a real-but-uncited file rather than a cited-but-nonexistent file. Same risk surface; same fix discipline. Fixes: - CLAUDE.md line 15 lock-paragraph: insert `tests/test_corpus_registry.py` between `test_corpus_sqlite_backend.py` and `test_corpus_composition.py` in the locked-at-PR-4 test file list. - spec/34 §"Test coverage" PR 1 section: add a 4-bullet sub-list under the `tests/test_corpus_registry.py` heading naming the registry primitives the tests cover (register / unregister round-trip and collision-replace; get_corpus_backend raises on unknown id; list_corpus_backends ordering; get_default_corpus_backend env var). Round 1 finding count: 0 P0, 1 P1, 0 P2. This commit lands the Round 2 convergence; full pytest still 2889 passing + 48 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 18 doc-release subagent caught 5 stale per-PR temporal markers that the Stream A spec/34 LOCK sweep and the Stream F reference-impl docstring sweep both missed. PersonaBackend PR 4's doc-release subagent caught the exact same shape (2 finds folded into commits ad6723b + e1d05cf); for PR 4 of #65 the equivalent finds are landed here BEFORE PR creation rather than fix-forward post-creation. Three spec/34 body edits: - Line 54 module-layout code block: drop "# SQLite ships in PR 2:" comment header (stale future-tense; SQLite shipped in PR 2 of this arc). - Line 233 Protocol surface docstring: drop "(PR 3)" parenthetical from the render_index_summary migration-target comment. - Line 818 implementation notes: rewrite "PR 3's call-site migration scope is writes of render_index_summary only" to present-tense "The call-site migration scope is reads through render_index_summary only"; rewrite "The PR 3 IRON RULE regression suite" to "The IRON RULE regression suite". Two corpus reference-impl docstring edits: - corpus/types.py lines 190-195: drop "(Subagent 2 HIGH H4 ... is a design assumption until real raw sample data is added at PR 1 prep or contributed by operators). Accept as provisional for v1.0." replaced with "the raw-side field shape is locked at v1.0 against issue #65's stated schema. Operator-contributed raw sample data could surface refinements for v1.1." The word "provisional" in a locked spec's reference impl contradicts the spec/34 LOCKED status; the v1.1 refinement framing matches the corpus_backend bundle.py:_source_paths v1.1 migration pattern at #314. - corpus/backend.py lines 145-156: drop "the PR 3 call-site migration" and "(PR 3)" temporals from the render_index_summary Protocol method docstring. The migration is historical; the docstring describes the Protocol contract today. Sixth finding (doc-release Check 3, TENSIONS.md T9): classified as FOLLOW_UP, not FIX_NOW. T9 carries pre-landing predictive language ("expect ~26 spec docs," "Around spec doc #25 (~CorpusBackend land)"). Worth a follow-up issue to update the count and tense; not blocking PR 4. (Tension itself, "spec surface grows with code surface," is still active.) Full pytest re-run on the 8 corpus test modules after these edits: 196 tests pass, zero regressions. Full suite expected to remain at 2889 + 48 skipped (no executable Python changed; only docstrings + a comment line). This completes the per-PR-marker consolidation sweep across the full locked surface: spec body, reference impls, tests, strategic docs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 4 of 4 of the CorpusBackend arc (#65). The arc closer. Doc-heavy lock pass.
Eleven backend protocols shipped; only
MCPServerRegistryBackend(#201)remains for v1.0 close.
6 commits, all bisectable:
5ab1950spec/34 RFC to LOCKED + spec/24 Decision 7 addendum naming CorpusBackend as the source of truth forwiki/andraw/.afa2575refresh CLAUDE.md (canonical 11th lock-paragraph + ASCII architecture diagram flip fromCorpus 🟡toCorpus ✅ (locked at #65 PR 4)) + README.md (backend-protocols table row flips to ✅ Shipped) + repo-root ROADMAP.md (Ten to Eleven of twelve, [backend] CorpusBackend — wiki/raw knowledge storage abstracted from filesystem walks #65 row removed).121eb8across-spec CorpusBackend references (spec/01 anatomy, spec/02 atomic memory, spec/04 runtime assembly step [7] note, spec/26 future-tense to present-tense, spec/31 protocol list) + reference impl docstring scrub (corpus/init.py, types.py, backend.py + 1 test file) per PersonaBackend PR 4 commit 93dad48 precedent.e725dccCHANGELOG arc-closer (3 bullets under[Unreleased]mirroring PersonaBackend PR 4 shape) + non-spec doc test counts and protocol counts refreshed (programmatic.md, methodology.md, CONTRIBUTING.md).d149382Round 2 convergence: foldtests/test_corpus_registry.pyinto the canonical PR 4 record (Round 1 adversarial caught the inverse-phantom omission).9fd650bRound 3 convergence: doc-release sweep landed 5 stale-marker fixes (3 in spec/34 body, 2 in corpus reference impl docstrings) that the Stream A and Stream F sweeps both missed.The status flip is the PR 4 deliverable. PR 3 (
3d82c84/ #304) explicitly preserved the "Ten" wording so this PR's CHANGELOG arc-close bullet, ASCII diagram, status block, and lock-paragraph all flip together.What landed
spec/34 LOCKED: RFC banner + 4-PR shipping-plan provenance block (lines 2-9) replaced with single locked-status line; per-PR temporal markers throughout the body folded to present-tense ("Per-runner kwargs (PR 3 -- implemented)" subheaders, "SQLite hybrid layout (PR 2)" header, "Call-site migration reference (PR 3 -- implemented in [backend] CorpusBackend — wiki/raw knowledge storage abstracted from filesystem walks #65 PR 3 of 4)" section title, "Follow-up issue filed at PR 4" deferral markers now pointing at [backend] CorpusBackend bundle.py:_source_paths v1.1 migration to Protocol-aware staleness tracking #314); §"PR 4 documentation-update checklist" deleted (lines 847-864 self-referential scaffolding). 21 distinct edits across 14 line ranges (18 from initial LOCK + 3 from Round 3 doc-release sweep). spec/34 went from 881 to 854 lines.
Implementer Contract finalized at 9 normative MUSTs: mirrors PersonaBackend spec/33 shape exactly, plus one extra MUST for the
query()capability precedence rule that CorpusBackend has via the FTS5 / semantic / substring fallback ladder. The 9 MUSTs: (1) page name + corpus charset validation at API boundary, (2) side-effect-free construction, (3) capability honesty includingembedding_provider=Noneinvariant, (4)query()capability precedence rule, (5)write_page()4-case behavior table, (6) URL credential redaction across all operator-facing error paths, (7) cross-corpus isolation at storage layer, (8) snapshot id determinism + cross-page isolation, (9)backend_idproperty stability +close()idempotency. The merge in MUST 9 is honest:backend_idis name-identity,close()is lifecycle-identity, both backend-identity contracts. N was a load-bearing decision (the design doc enumerated 9 categories provisionally; Subagent 1 derived 8 withbackend_id+close()merged; final lock at 9 mirrors the actual shipped surface).spec/24 Decision 7 addendum: existing "Why" paragraph previously said MemoryBackend owned
wiki/,memory/, andjournal/. The addendum clarifies: MemoryBackend retains exclusive ownership ofmemory/andjournal/; CorpusBackend, when registered, ownswiki/andraw/. The two backends compose at prompt assembly.CLAUDE.md + README.md + repo-root ROADMAP.md + vault ROADMAP.md: "Ten to Eleven of twelve" flip across all status surfaces; canonical 11th CorpusBackend lock-paragraph in CLAUDE.md mirroring the 10 prior shipped-protocol bullets (cites all 8 corpus test files including
tests/test_corpus_registry.pyafter Round 2 fix); ASCII architecture diagram flipped (Corpus ✅ (locked at #65 PR 4)); spec-doc count flipped from "30 locked + 2 RFCs" to "31 locked + 2 RFCs" across the surfaces it appears; backend-protocols table row in README flipped to ✅ Shipped; vault ROADMAP'slast_reviewfrontmatter updated to 2026-06-01 with the [backend] CorpusBackend — wiki/raw knowledge storage abstracted from filesystem walks #65 arc closure annotation.Cross-spec cross-references: spec/01 (anatomy) gains a CorpusBackend paragraph after the wiki/raw section; spec/02 (atomic memory) gains a paragraph after "Why two layers"; spec/04 (runtime assembly) gains an integration note after the canonical load order step [7]; spec/26 (cascade bundle DRAFT) flips two future-tense references to present-tense; spec/31 (LLMBackend) appends
(spec/34)to the Corpus entry in the protocol-pattern list. spec/27 doctor catalogue already had thecorpus-backendentry from PR 3 (verified).Reference impl docstring scrub: 12 stale per-PR temporal markers scrubbed from
atomic_agents/corpus/__init__.py+corpus/types.py+corpus/backend.py+tests/test_corpus_sqlite_backend.pydocstrings (10 from initial sweep + 2 from Round 3 doc-release convergence). Mirrors PersonaBackend PR 4 commit93dad48. All edits are docstring/comment only; no executable Python changed.10 follow-up issues filed inline during prep (
#305through#314):corpus_backend._agent_rootdivergenceread_versionDRY refactor (near-clone of read_page)CorpusInvalidNamere-raise tuplebundle.py:_source_pathsmigration to ProtocolCHANGELOG: 3 new bullets under
[Unreleased](2 Changed + 1 Documentation) covering the status flip + arc-summary, the spec/24 Decision 7 addendum, and the spec/34 LOCKED + doc-release sweep.Test Coverage
No new code paths; doc-only PR (apart from docstring scrubs in 4 reference impl modules + 1 test file). Test suite count unchanged: 2937 collected (2889 passing + 48 skipped, 41 warnings) on Python 3.11. Zero regressions.
uv run pytest -qran after Stream F (docstring scrubs) completed and again on the 8 corpus test modules after Round 2 + Round 3 fix commits. Result: identical counts, identical warnings, zero perturbation.Pre-Landing Review
Pre-impl prep pass (
/plan-subagentmethodology) ran 5 parallel Sonnet subagents covering: (1) spec/34 N-MUST count audit from shipped surface; (2) spec/34 LOCK readiness + per-PR marker enumeration; (3) cross-spec parity + status-flip surfaces audit; (4) stale-marker scrub candidates in reference impls + tests; (5) CHANGELOG arc-closer drafting + 10 follow-up issue templates.Findings rolled up: 0 SEVERE + 1 HIGH internal-consistency in the doc-heavy lock scope (Subagent 4 found 10 stale temporal markers in shipped Python code; folded into the implementation as commit 3). 3 MEDIUM cross-file count drifts (test count drifted across CONTRIBUTING.md, methodology.md, programmatic.md by varying amounts; folded into commit 4). Plus 1 load-bearing decision: final N for the Implementer Contract was 9 (not 8 as Subagent 1 initially recommended). The 8-vs-9 decision was surfaced to the maintainer; recommendation was for 9 to mirror PersonaBackend spec/33's structural pattern exactly with
close()documented at the Protocol surface rather than elevated to a numbered MUST.Track record extended: 22+ SEVERE + 30+ HIGH across 14 prep passes in the post-#285-revert streak.
Single-round Opus adversarial review post-implementation (Sonnet adversarial doc-consistency review per CLAUDE.md taste rule 11; minimal review army for doc-only PR per the project methodology).
Round 1 (adversarial): 0 P0 + 1 P1 + 0 P2.
tests/test_corpus_registry.py, which is a real file with 4 tests shipped in PR 1 and cited in CHANGELOG's PR 1 bullet. Mirrors PersonaBackend PR 4 Round 1's phantom-file risk in the opposite direction. Fixed in commitd149382.Round 2 (doc-release sweep, Step 18): 0 P0 + 5 FIX_NOW + 1 FOLLOW_UP.
# SQLite ships in PR 2:header; line 233(PR 3)migration-target callout in Protocol surface; line 818 "PR 3's call-site migration scope" in implementation notes).9fd650bBEFORE PR creation rather than fix-forward post-creation (matches PersonaBackend PR 4'sad6723b+e1d05cfpattern but tighter).Round 3 not run separately; the doc-release commit IS the convergence step. Full pytest re-run after R3 fix on the 8 corpus test modules: 196 passed, zero regressions. Full suite expected stable at 2889 + 48 skipped (no executable Python changed; only docstrings + a comment line).
Plan Completion
All deliverables in the original brief landed (DONE):
corpus-pr4-arc-closerbranch off main/plan-subagentprep pass (5 parallel Sonnet subagents)/shipend-to-end with the full pipelinePlus 1 additional follow-up issue surfaced by doc-release (TENSIONS.md T9 stale predictive language): filed as #315, not blocking PR 4.
Documentation
docs/spec/34-corpus-backend.mdflipped from RFC to LOCKED.docs/spec/24-agent-profile-backend.mdDecision 7 received the CorpusBackend ownership addendum.docs/spec/01-anatomy.md,docs/spec/02-atomic-memory.md,docs/spec/04-runtime-assembly.md,docs/spec/26-cascade-bundle.md, anddocs/spec/31-llm-backend.mdgained CorpusBackend cross-references.docs/deployment/programmatic.md,docs/methodology.md, andCONTRIBUTING.mdtest counts and shipped-backend counts refreshed.spec/27 (doctor catalogue)
corpus-backendentry confirmed present at line 384 (PR 3 inline status flip already added it).Documentation debt (deferred, filed as follow-up)
Surfaced during the Step 18 doc-release sweep; not in PR 4 scope:
Test plan
uv run pytest -qpasses: 2889 passed + 48 skipped, zero regressionsuv run pytest -qre-run after R2 + R3 fix commits on 8 corpus modules: 196 passed, zero regressionsgh issue createwithbackend+bug/polish/v0.1-followuplabels matching project conventionmainhonored (PR-only merge path)After this PR lands, the CorpusBackend arc CLOSES. Eleven of twelve backend protocols shipped for v1.0; only MCPServerRegistryBackend (#201) remains.
Closes #65.
🤖 Generated with Claude Code