diff --git a/CHANGELOG.md b/CHANGELOG.md
index f62fe2c..23c4e29 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -85,6 +85,10 @@ CHANGELOG entry.
 - **spec/24 line 436 `TemplateProfileBackend` reservation removed** ([#62](https://github.com/dep0we/atomic-agents-stack/issues/62) PR 4, D5 cross-spec propagation). PR 1's spec/33 RFC documented the retirement (templates are PersonaBackend's domain because they are persona-centric: operators bring their own model and tools to a persona); PR 4 propagates the change to spec/24 itself by deleting the `TemplateProfileBackend` entry from §"Reserved future capabilities". `PersonaCapabilities.supports_templates` is the canonical home for the capability; a future persona-template marketplace (`pip install atomic-personas-starters` or a curated GitHub registry) registers a backend with `supports_templates=True` without forking the framework.
+- **Framework status flipped from "ten of twelve" to "eleven of twelve backend protocols shipped"** ([#65](https://github.com/dep0we/atomic-agents-stack/issues/65), CorpusBackend arc **PR 4 of 4**, closes the arc). Only `MCPServerRegistryBackend` ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201)) remains for v1.0 close. Operators can pin SQLite for indexed FTS5 query at GB scale via one env var (`ATOMIC_AGENTS_CORPUS_BACKEND=sqlite`), fall back to filesystem keyword scan with a doctor WARN that names the remedy at the 1000-page cliff, and trust that no existing `AtomicAgent(...)` construction site changed behavior (IRON RULE byte-identity pinned by 5 explicit regression assertions across 166 sites).
The 4-PR arc shipped: Protocol scaffolding + `FilesystemCorpusBackend` (PR 1 / [#297](https://github.com/dep0we/atomic-agents-stack/pull/297), ~67 tests), `SQLiteCorpusBackend` with FTS5 (PR 2 / [#298](https://github.com/dep0we/atomic-agents-stack/pull/298), ~46 tests through 3 rounds of Opus adversarial), wiring + per-runner kwargs + `delegate.py` explicit-only threading + doctor + IRON RULE regression suite (PR 3 / [#304](https://github.com/dep0we/atomic-agents-stack/pull/304), ~35 tests through 3 rounds of Opus adversarial converging at Round 3 LOW), spec/34 LOCK + status flip + doc-release sweep (this PR). Arc total: ~150 net new tests; test suite 2691 to 2937 collected (2889 passing + 48 skipped). CLAUDE.md adds the canonical 11th CorpusBackend lock-paragraph mirroring the 10 prior shipped-protocol bullets, flips the ASCII architecture diagram (`Corpus 🟡` to `Corpus ✅ (locked at #65 PR 4)`), and bumps the spec-doc count from "30 locked + 2 RFCs" to "31 locked + 2 RFCs". README.md backend-protocols table row flips to "✅ Shipped". Both ROADMAPs refreshed (repo-root + vault).
Ten follow-up issues filed inline during prep ([#305](https://github.com/dep0we/atomic-agents-stack/issues/305) OSError data-loss fallback, [#306](https://github.com/dep0we/atomic-agents-stack/issues/306) SQLite connection leak on schema-init failure, [#307](https://github.com/dep0we/atomic-agents-stack/issues/307) cascade-layout `corpus_backend._agent_root` divergence gap, [#308](https://github.com/dep0we/atomic-agents-stack/issues/308) CLI integration tests, [#309](https://github.com/dep0we/atomic-agents-stack/issues/309) title-derivation DRY refactor, [#310](https://github.com/dep0we/atomic-agents-stack/issues/310) `read_version` DRY refactor, [#311](https://github.com/dep0we/atomic-agents-stack/issues/311) idempotent no-op lock-hoist optimization, [#312](https://github.com/dep0we/atomic-agents-stack/issues/312) redundant `CorpusInvalidName` re-raise tuple, [#313](https://github.com/dep0we/atomic-agents-stack/issues/313) `UnicodeDecodeError` partial-content test gap, [#314](https://github.com/dep0we/atomic-agents-stack/issues/314) `bundle.py:_source_paths` v1.1 Protocol-aware staleness tracking).
+
+- **spec/24 Decision 7 addendum: `CorpusBackend` named as the source of truth for `wiki/` and `raw/`** ([#65](https://github.com/dep0we/atomic-agents-stack/issues/65) PR 4 cross-spec propagation). spec/24's existing Decision 7 "Why" paragraph previously said `MemoryBackend` owned `wiki/`, `memory/`, and `journal/`. With CorpusBackend locked, the addendum clarifies: `MemoryBackend` retains exclusive ownership of `memory/` and `journal/`; `CorpusBackend`, when registered, owns `wiki/` and `raw/`. The two backends compose at prompt assembly (`agent.py:_load_indexes()` reads from both). Closes the cross-spec ownership ambiguity that surfaced during PR 3 wiring review.
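As a rough illustration of the ownership split the Decision 7 addendum describes, the sketch below maps a vault-relative path to the backend that owns it. The helper name `owning_backend` and the `corpus_registered` flag are hypothetical and not part of the framework's API. What owns `wiki/` and `raw/` when no CorpusBackend is registered is not stated in this entry, so the sketch returns `None` in that case rather than guessing.

```python
from pathlib import PurePosixPath
from typing import Optional

# Directory ownership as worded in the Decision 7 addendum.
MEMORY_DIRS = {"memory", "journal"}  # MemoryBackend: exclusive ownership
CORPUS_DIRS = {"wiki", "raw"}        # CorpusBackend: only when registered


def owning_backend(relative_path: str, corpus_registered: bool) -> Optional[str]:
    """Return the name of the backend that owns a vault-relative path.

    Hypothetical helper for illustration only. Returns None whenever the
    addendum names no owner for the path (including the unregistered-corpus
    case, which this changelog entry does not specify).
    """
    parts = PurePosixPath(relative_path).parts
    if not parts:
        return None
    top_level = parts[0]
    if top_level in MEMORY_DIRS:
        return "MemoryBackend"
    if top_level in CORPUS_DIRS and corpus_registered:
        return "CorpusBackend"
    return None


if __name__ == "__main__":
    for path in ("memory/facts.md", "journal/today.md", "wiki/index.md", "raw/source.txt"):
        print(path, "->", owning_backend(path, corpus_registered=True))
```

Keeping the two directory sets disjoint mirrors the point of the addendum: no path can resolve to both backends, so prompt assembly can read from each without an ownership tie-break.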
+
 #### BREAKING
 - **`ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP` env-var default flipped from `false` to `true`** ([#89](https://github.com/dep0we/atomic-agents-stack/issues/89) -- PolicyBackend arc **PR 4 of 4**, closes the arc; spec/32 LOCKED). Operators authoring `policy.md` with tool / MCP-server / model surfaces now see those surfaces **enforce by default**: denied tools produce a synthesized `policy_blocked` `ToolCallResult` (no handler execution); denied MCP servers are filtered before `MCPClientPool` construction (no subprocess spawn); Policy's `get_effective_model()` return replaces the pre-Policy effective model for the LLM call. Cost caps were already enforcing (PR 3a) and are unaffected.
@@ -121,6 +125,8 @@ CHANGELOG entry.
 - **Consolidated doc-debt sweep: spec/27 doctor catalogue extended, per-PR markers scrubbed across 6 locked specs, Protocol exception catalog added to disaster-recovery, methodology arc-table refreshed, README spec list backfilled.** Closes accumulated documentation drift across the 10 shipped protocol arcs. **spec/27-doctor.md** gains 6 new scope-scoped backend-check entries (`lock-backend`, `log-backend`, `agent-profile-backend`, `tool-registry-backend`, `mandate-backend`, `policy-backend`) following the canonical `persona-backend` template added in PR 4 of [#62](https://github.com/dep0we/atomic-agents-stack/issues/62). Each entry documents the PASS/WARN/FAIL ladder, capability snapshot detail, URL credential redaction shape, and the production failure mode the check prevents. The `mandate-backend` entry carries an "Implementation status" disclaimer pointing at [#235](https://github.com/dep0we/atomic-agents-stack/issues/235) because `check_mandate_backend` was never implemented during the #124 arc despite being documented in CLAUDE.md as shipping.
**Per-PR temporal markers scrubbed** across `spec/22-log-backend.md` (26 to 4 markers), `spec/24-agent-profile-backend.md` (32 to 14), `spec/25-tool-registry-backend.md` (32 to 21), `spec/28-judge-layer.md` (23 to 1), `spec/29-mandates.md` (22 to 15), and `spec/32-policy-backend.md` (5 to 0). The decision rule applied: top-of-file metadata (LOCKED banner + canonical "Shipping plan across the #ARC arc" line + cross-links) preserved verbatim; Decision-anchored / Risk-anchored / plan-subagent-anchored historical references preserved as load-bearing provenance; forward-pointing "PR N ships X" / "lands in PR M" / "shipped in PR Y" prose rewritten to present tense or dropped; §"Out of scope" sections cleaned of bullets describing work that has now shipped (spec/32 dropped two bullets describing AtomicAgent wiring + consumption logic that landed across PR 2-3 of #89). **`docs/methodology.md`** "Recent arcs" table gains 3 new rows for #124 MandateBackend (13 SEVERE + 9 HIGH across 5 prep passes; PR 3b second-pass amendments caught Step 8 vs Step 9 precedence inversion + cache leak on BLOCK paths), #89 PolicyBackend (PR 4 BREAKING default flip; #273 dedup invariant; #274 model_from_per_call_override audit field), and #62 PersonaBackend (PR 4 Round 1 phantom test file + MUST #8 timezone-wording drift; Round 2 public repo-root ROADMAP drift; doc-release subagent miscount). Line 12-16 prose list updated to "Ten of twelve backend protocols shipped" with PersonaBackend appended.
**`docs/deployment/disaster-recovery.md`** gains a new "Protocol-backend exception catalog" section covering all 7 shipped Protocols using the verified exception inventory from `atomic_agents/exceptions.py` + `atomic_agents/mandate/types.py` + `atomic_agents/policy/types.py` (the verification step caught the pre-impl-prep observation that `MandateExpired`, `MandateConstraintViolation`, `PolicyDenied`, `PolicySnapshotNotFound`, and `LogBackendError` do not exist as classes; the catalog uses the actual `MandateError` / `MandateInvalid` / `MandateNotFound` / `MandateStateSchemaUnsupported` + audit-event family `mandate_cap_exceeded_block` for Mandate, the actual `PolicyError` / `PolicyInvalid` + audit-event family `policy_decision` with `decision_kind: deny` for Policy, and documents LogBackend's stdlib-exception failure shapes honestly). The catalog sits after the doctor-failure table (which gains 7 Protocol-backend rows matching spec/27's new entries) and before the cross-references block. Line 22's "Doctor runs nine checks (env, python, ...)" replaced with a cross-reference to spec/27 so the file stays current as the doctor catalogue grows; line 198's "All nine checks should pass" relaxed to "All checks should pass". **`README.md`** spec list at lines 156-182 backfilled with 9 previously-missing locked specs (21, 22, 24, 25, 28, 29, 30 marked DRAFT, 31, 32). spec/23 deliberately skipped (does not exist; the numbering jumps from 22 to 24 because spec/23 was historically renumbered). Line 207's "remaining three protocols ship" drift fixed to "remaining two protocols" (same lesson the post-merge sweep of PR 4 of #62 caught for ROADMAP). All 10 modified files compose to a single bisectable PR; full test suite (`uv run pytest`) remains at 2686 passing + 48 skipped, zero regressions, because the sweep does not touch any code path. 
**Track record extended: 22 SEVERE + 23 HIGH across 10 prep passes** (pre-impl prep on this PR caught 2 SEVERE + 2 HIGH that would have shipped wrong exception names and deleted load-bearing historical record). Codex skipped per standing project rule; Opus adversarial subagent ran the prep + review rounds.
+- **spec/34 LOCKED + doc-release sweep landed** ([#65](https://github.com/dep0we/atomic-agents-stack/issues/65) PR 4, closes the arc). `docs/spec/34-corpus-backend.md` flips from RFC to LOCKED. The RFC banner and 4-PR shipping-plan provenance block at the top are replaced with a single locked-status line. Per-PR temporal markers throughout the spec body (capability declarations, `Per-runner kwargs (PR 3 -- implemented)` subheaders, `SQLite hybrid layout (PR 2)` headers, `Call-site migration reference (PR 3 -- implemented in #65 PR 3 of 4)` section title, `Follow-up issue filed at PR 4` deferral markers) consolidated to present-tense lock prose. §"PR 4 documentation-update checklist" deleted (self-referential scaffolding has no place in a locked spec). §"Implementer contract for corpus backends" finalizes at 9 normative MUSTs (mirroring PersonaBackend spec/33's shape exactly, with one extra MUST for the `query()` capability precedence rule that CorpusBackend has via the FTS5 / semantic / substring fallback ladder): page name and corpus charset validation at API boundary, side-effect-free construction, capability honesty including `embedding_provider=None` invariant, `query()` capability precedence rule, `write_page()` 4-case behavior table (fresh / idempotent no-op / CAS / collision), URL credential redaction across all operator-facing error paths, cross-corpus isolation at storage layer, snapshot id determinism + cross-page isolation, `backend_id` property stability + `close()` idempotency. spec/24 Decision 7 receives the CorpusBackend ownership addendum.
spec/26 (cascade bundle, DRAFT) cross-references updated from future-tense ("when CorpusBackend ships") to present-tense ("now that CorpusBackend has shipped, locked at #65 PR 4"). spec/01 (anatomy), spec/02 (atomic memory), spec/04 (runtime assembly step [7]), and spec/31 (LLMBackend) gain CorpusBackend cross-references. Per-PR temporal markers scrubbed from `atomic_agents/corpus/__init__.py` + `atomic_agents/corpus/types.py` + `atomic_agents/corpus/backend.py` + `tests/test_corpus_sqlite_backend.py` docstrings (completes the per-PR-marker consolidation sweep across spec body, reference impls, tests, and strategic docs). `docs/deployment/programmatic.md`, `docs/methodology.md`, and `CONTRIBUTING.md` test counts and shipped-backend counts refreshed.
+
 ### Added
 - **PersonaBackend Protocol scaffolding** ([#62](https://github.com/dep0we/atomic-agents-stack/issues/62) -- PersonaBackend protocol arc **PR 1 of 4**). New `atomic_agents/persona/` module ships the `PersonaBackend` `@runtime_checkable` Protocol surface (9 methods: `load_persona`, `save_persona`, `list_personas`, `exists`, `clone`, `snapshot`, `restore`, `list_snapshots`, `capabilities` + `backend_id` class attribute), the `FilesystemPersonaBackend(personas_root)` reference impl, the registry primitives (`register_persona_backend`, `get_persona_backend`, `list_persona_backends`, `unregister_persona_backend` per D9 fold #3 fixture hygiene, `get_default_persona_backend(scope_root)` honoring `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars), the `make_filesystem_persona_backend_from_url("filesystem:///path")` URL factory with credential redaction across all `ValueError` sites, and the three frozen dataclasses (`Persona`, `PersonaSnapshot`, `PersonaCapabilities`). Storage layout `//{IDENTITY,SOUL,USER}.md` plus `metadata.json` sidecar (`schema_version: 1`, version, label, created_at).
`persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot / empty-string refusal. Side-effect-free construction (no filesystem walk in `__init__`; lazy read on first method call) so the 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no `persona.link.md` exists. Seven persona exception classes in `atomic_agents/exceptions.py` per D-PI-1 (NOT in `persona/types.py`): `PersonaError`, `PersonaNotFound`, `PersonaExists`, `PersonaCorrupted`, `PersonaSnapshotNotFound`, `PersonaOwnershipConflict`, `PersonaLinkInvalid`. Cross-module placement matches the `AgentProfileNotFound` / `ToolNotInRegistry` convention because `PersonaOwnershipConflict` is raised by profile backends (PR 2 wiring) and `PersonaLinkInvalid` by the `persona_link_md.py` parser (PR 2 per D-ER-3).
diff --git a/CLAUDE.md b/CLAUDE.md
index f5e3ff4..80fc7db 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -12,7 +12,7 @@ For broader context, read these in order on a fresh session:
 ## What this is
-Atomic Agents is a vault-native AI agent framework: agents live as plain markdown files, the runtime is stateless, and storage is moving toward swappable protocols layer by layer.
**Shipped backend protocols**: MemoryBackend (PR #57); LLMBackend (#87 -- Anthropic + OpenAI + Moonshot reference impls); JudgeBackend Protocol (#112 -- locked at PR 4 with conformance suite, PolicyJudge + LLMJudgeBackend reference impls, ESCALATE + REVISE state machines, `judges.md` operator config + cascade-aware project floor, operator-driven resolution flow); LockBackend Protocol (#60 -- locked at PR 4 with `FilesystemLockBackend` + `RedisLockBackend` reference impls, `scope()` Protocol method, daemon-thread heartbeat with `LockLost` lease-expiry detection, operator override via env vars + constructor kwarg, doctor `check_lock_backend` coherence check -- closes the multi-host cliff so atomic-agents runs on Cloud Run / Kubernetes / gizmo without forking); LogBackend Protocol (#61 -- locked at PR 4 with `FilesystemLogBackend` + `SQLiteLogBackend` reference impls, parametrized conformance suite across both backends, operator override via `ATOMIC_AGENTS_LOG_BACKEND` env var + constructor kwarg + per-runner kwargs on OutcomeRunner/DreamRunner, doctor `check_log_backend` coherence check with stats probe + URL credential redaction, `LogQuery.agent_name` filter for shared-backend cross-agent isolation -- closes the dashboard-perf cliff: operators on Cloud Run / Kubernetes can pin SQLite for indexed query/aggregate/retention); AgentProfileBackend Protocol (#63 -- locked at PR 4 with `FilesystemAgentProfileBackend` + `SQLiteAgentProfileBackend` reference impls, parametrized conformance suite across both backends, JSON-based snapshot trio on both backends, `supports_skills` capability dimension, operator override via `ATOMIC_AGENTS_PROFILE_BACKEND=sqlite` + optional `ATOMIC_AGENTS_PROFILE_BACKEND_URL` env vars OR `AtomicAgent(..., profile_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner/delegate.py, doctor `check_agent_profile_backend` coherence check with capability snapshot + agent-count probe + URL credential redaction,
Implementer contract for registry-backed backends documented in spec/24 -- closes the SaaS-shape cliff: SaaS / database-backed / git-backed agent registries are now ONE Protocol implementation away); ToolRegistryBackend Protocol (#64 -- locked at PR 4 with `FilesystemToolRegistryBackend` + `SQLiteToolRegistryBackend` reference impls, parametrized conformance suite across both backends, hybrid metadata-in-SQL + handler-bodies-on-disk storage shape on SQLite, `install` / `uninstall` capability flipped True on SQLite with TOCTOU-safe INSERT-first + atomic_write-on-success-only atomicity, multi-process WAL race resolved by `PRAGMA busy_timeout=5000` before WAL pragma, cross-scope isolation enforced at SQL layer (`WHERE agent_scope = ?` on every query), URL factory credential redaction across all 5 `ValueError` sites, operator override via `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND=sqlite` + optional `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND_URL` (`sqlite:///path?agent_scope=`) env vars OR `AtomicAgent(..., tool_registry_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (delegate.py deliberately NOT threaded -- per-agent scoping per spec/25 Decision 9), doctor `check_tool_registry_backend` coherence check with capability snapshot + tool-count probe + URL credential redaction, Implementer contract for registry-backed tool backends documented in spec/25 -- Protocol seam in place; future PyPI / git / company-internal-HTTP / SaaS-database adapters register via `register_tool_registry_backend(...)` without forking core); **PolicyBackend Protocol (#89 -- locked at PR 4 with `FilesystemPolicyBackend` reference impl reading `/policy.md` (markdown + embedded YAML), mtime+size composite-key parse cache (`cache_ttl_s=0` capability declaration -- operators observe edits within 0 seconds of mtime change), `agent_name` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal,
side-effect-free construction (lazy parse on first method call so the 115 existing `AtomicAgent(...)` construction sites stay byte-identical when no `policy.md` exists), parametrized conformance suite across registered backends, `PolicySnapshotForCall` frozen per call entry (per Premise 3 -- operator edits mid-call defer to the next call), cost-cap MIN composition in `_check_cost_guardrails` + `MandateCheck` steps 7-9 consume pre-composed effective caps (PR 3a -- cost caps enforce immediately), non-cap surfaces (tool allowlist, MCP server allowlist, model selection) consumed at the matching call sites with `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP` env-var-gated enforcement (PR 3b shipped in log-only mode; **PR 4 flipped the default to `true` -- non-cap surfaces enforce by default; operators wanting log-only set the env to `false` explicitly**), unified `policy_decision` event family with `decision_kind: deny | override` discriminator + `axis: cost_cap | tool_allowlist | mcp_allowlist | model_selection` field + `enforced: bool` so SaaS / Postgres adapters target a frozen schema (Premise 4 -- one event answers "was this Policy or Mandate?"
via `denying_layer`), `model_from_per_call_override` field captures the `agent.call(model=...)` kwarg when Policy supersedes it (#274 -- fleet-config-wins precedence is audit-visible to the caller), per-call dedup set bounds tool-allowlist denial emissions to one event per `(tool_name, call)` (#273 -- log-only audit shape stays clean when the LLM re-attempts a denied tool every iteration), per-dimension MIN cap math (`daily` and `monthly` independently; cumulative deferred to v1.1 per plan-subagent D1), per-agent overrides under nested `agents:` section with field-level MERGE for caps + UNION+deny-wins for allowlists + REPLACE for model selection, cross-host cap-overrun bound `(replica_count) × (per-call ceiling)` documented for shared-FS deployments (Postgres / SaaS adapters with linearizable state get exact-cap semantics through their own consistency layer), operator override via `ATOMIC_AGENTS_POLICY_BACKEND` env var OR `AtomicAgent(..., policy_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner + `delegate.py` threading per spec/32 D1 (Policy is fleet-scoped -- a delegate inheriting the coordinator's pinned Postgres backend doesn't silently fall back to filesystem-default), `doctor.check_policy_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction, Implementer contract for policy backends documented in spec/32 §"Implementer contract for policy backends" (7 normative MUSTs covering `agent_name` validation at API boundary, per-agent storage isolation, `cache_ttl_s`-bounded staleness, side-effect-free construction, capability honesty, URL credential redaction, `PolicyDecision` event schema compliance).
**Closes the cross-agent configuration cliff**: operators with a fleet of agents stop hand-syncing `model.md` / `tools.md` / `mcp.md` across N agents; the single project-root `policy.md` is the audit-trail source of truth, with SaaS / Postgres / org-admin-console adapters one Protocol implementation away.** **MandateBackend Protocol (#124 -- locked at PR 4 with `FilesystemMandateBackend` reference impl, parametrized conformance suite across registered backends, `MandateCheck` judge specialist with validation steps 1-9 (existence, source-hash binding, state, tool allowlist, target allowlist via per-agent named `TargetExtractorRegistry`, time window, token-cost projection with stale-baseline defense, external-cost projection via `CostEstimatorRegistry` fail-closed to `mandate_external_cost_unprojectable`, escalation thresholds with ESCALATE-preempts-BLOCK precedence), reservation pattern (`MandateReservationManager.create / commit / rollback / _expire` lifecycle with `threading.Timer`-driven TTL watchers + `threading.Lock`-serialized in-process state), crash recovery via `MandateBackend.recover_orphan_reservations` with LockBackend-serialized scan-inside-lock discipline (pessimistic over-report > silent under-bill for orphan reservations from prior crashed runs), post-action verification event family (`mandate_action_verified` / `_diverged` / `_verification_unavailable` emitted exactly once per `external_side_effect` / `irreversible` action after cost commit), suspicious-rebind throttle (60s default; closes the source-hash-before-state edit window for prompt-injection-style threats), `mandates.md` operator-authored markdown + embedded YAML parser + `judges.md ## Mandates` operator config with cascade-aware project floor, structural write protection (`mandates.md` excluded from default WritePolicy alongside `tools.md` / `judges.md` / `persona/*.md`), operator override via `ATOMIC_AGENTS_MANDATE_BACKEND` env var OR `AtomicAgent(..., mandate_backend=...)` constructor
kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (delegate.py deliberately NOT threaded -- per-agent scoping per spec/29 + spec/15 delegate isolation), doctor `check_mandate_backend` coherence check, Implementer contract for mandate backends documented in spec/29 -- closes the durable-authorization cliff: operators authoring `cumulative_external_usd: 6000` on a procurement mandate now have that cap defended against concurrent action races + crash-restart, with operator-facing audit signal when an action's executed target diverged from authorization at proposal time).** **PersonaBackend Protocol (#62 -- locked at PR 4 with `tests/test_persona_protocol_conformance.py` parametrized across registered backends + `tests/test_persona_filesystem_backend.py` + `tests/test_persona_composition.py` + `tests/test_profile_composition_snapshot.py` + `tests/test_profile_composition_restore.py`, `FilesystemPersonaBackend(personas_root)` reference impl at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json` sidecar (hidden namespace mirrors `.snapshots/`; `list_agents()` skips dot-prefixed entries so personas don't surface as agents), `persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal, group-atomic `save_persona` with `mkdir(exist_ok=False)` for race-free fresh-create + swap-and-delete for `overwrite=True` (20-iteration retry bound on macOS APFS `ENOTEMPTY`), snapshot trio capability flipped `supports_snapshot=False → True` in PR 3 with nested storage `//.snapshots//` (D-PP-10 -- geometric cross-persona isolation: `rm -rf //` removes the persona AND its full history cleanly) + `snap__<12hex>` snapshot IDs matching AgentProfile (D-PP-11 -- 48-bit entropy + cross-Protocol uniformity enables a shared `_validate_snapshot_id` path-security guard), `/persona.link.md` (YAML-in-code-block with `kind: shared` + `persona_id` per D-ER-4) is the ownership trigger driving
AgentProfileBackend composition via `external_persona_ref(agent_id) -> str | None` (D-PP-3 -- supersedes D-ER-1's earlier boolean for cleaner bootstrap-path resolution) so `load_profile` repopulates persona fields + re-derives `agent_mode` (D-PP-4), `save_profile` ignores persona fields when externally owned (D6, mirrors spec/24 Decision 6's `agent_mode` pattern), `snapshot()` + `restore()` drop persona fields with one-time `agent_profile_restore_dropped_persona_fields` warning per `(agent_id, snapshot_id)` via thread-safe per-process dedup (D-PP-13 migration-window event), `PersonaOwnershipConflict` raised on filesystem-backend when both `persona.link.md` and `persona/IDENTITY.md` coexist (D2a + D-PP-8 -- filesystem-only loud refusal; SQLite uses silent-drop with the equivalent `agent_profile_save_dropped_persona_fields` event for cross-backend uniformity), SQLite v1→v2 schema migration adds `agents.persona_id` column with forward-only race-loser handling, D-PP-1 sentinel sweep teaches `load_profile/list_agents/exists/list_skills/load_skill_body` about the shared-persona layout (D-PP-12 closed the sweep in PR 3), operator override via `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars OR `AtomicAgent(..., persona_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner + `delegate.py` explicit-only threading per D-ER-2 (mirrors Policy's `_policy_backend_was_explicit` precedent at `agent.py:401`; default-resolved backends do not leak the coordinator's `personas_root` to delegates because persona is per-agent semantic context), `atomic-agents persona list / show / snapshot / list-snapshots / restore / clone` CLI (zero LLM calls) catches `PersonaError` subclasses + `OSError` + `PermissionError` cleanly, doctor `check_persona_backend` coherence check with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction, Implementer contract for persona backends documented in spec/33
§"Implementer contract for persona backends" (8 normative MUSTs), D5 retires spec/24's `TemplateProfileBackend` reservation -- `PersonaCapabilities.supports_templates` is the canonical home for a future persona-template marketplace surface -- **closes the shared-persona cliff**: a team running 5 customer-support agents stops maintaining 5 separate `SOUL.md` files that drift; one canonical persona record (`shared:customer-support-v3`) serves all 5 agents with consistent identity, snapshot/restore lifecycle, and operator-editable markdown -- home users with one agent running the legacy `/persona/{IDENTITY,SOUL,USER}.md` layout see byte-identical pre-#62 behavior because the legacy layout works forever through AgentProfile's existing filesystem walk).** **Ten backend protocols shipped.** **Next per ROADMAP**: Corpus ([#65](https://github.com/dep0we/atomic-agents-stack/issues/65)) / MCPServerRegistry ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201)) protocols -- two remaining for v1.0 close. #201 was carved out of #64 via spec/25 Decision 3 (MCP servers are processes; ToolRegistry is functions -- they share Protocol-pattern shape but not invocation semantics). A person at home runs filesystem-everything with one agent. An organization runs the same agents over Postgres, behind an HTTP service, with a fleet of orchestrated roles. **Same agent definitions, same call() flow, same audit trail. Different backends.**
+Atomic Agents is a vault-native AI agent framework: agents live as plain markdown files, the runtime is stateless, and storage is moving toward swappable protocols layer by layer.
**Shipped backend protocols**: MemoryBackend (PR #57); LLMBackend (#87 -- Anthropic + OpenAI + Moonshot reference impls); JudgeBackend Protocol (#112 -- locked at PR 4 with conformance suite, PolicyJudge + LLMJudgeBackend reference impls, ESCALATE + REVISE state machines, `judges.md` operator config + cascade-aware project floor, operator-driven resolution flow); LockBackend Protocol (#60 -- locked at PR 4 with `FilesystemLockBackend` + `RedisLockBackend` reference impls, `scope()` Protocol method, daemon-thread heartbeat with `LockLost` lease-expiry detection, operator override via env vars + constructor kwarg, doctor `check_lock_backend` coherence check -- closes the multi-host cliff so atomic-agents runs on Cloud Run / Kubernetes / gizmo without forking); LogBackend Protocol (#61 -- locked at PR 4 with `FilesystemLogBackend` + `SQLiteLogBackend` reference impls, parametrized conformance suite across both backends, operator override via `ATOMIC_AGENTS_LOG_BACKEND` env var + constructor kwarg + per-runner kwargs on OutcomeRunner/DreamRunner, doctor `check_log_backend` coherence check with stats probe + URL credential redaction, `LogQuery.agent_name` filter for shared-backend cross-agent isolation -- closes the dashboard-perf cliff: operators on Cloud Run / Kubernetes can pin SQLite for indexed query/aggregate/retention); AgentProfileBackend Protocol (#63 -- locked at PR 4 with `FilesystemAgentProfileBackend` + `SQLiteAgentProfileBackend` reference impls, parametrized conformance suite across both backends, JSON-based snapshot trio on both backends, `supports_skills` capability dimension, operator override via `ATOMIC_AGENTS_PROFILE_BACKEND=sqlite` + optional `ATOMIC_AGENTS_PROFILE_BACKEND_URL` env vars OR `AtomicAgent(..., profile_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner/delegate.py, doctor `check_agent_profile_backend` coherence check with capability snapshot + agent-count probe + URL credential redaction,
Implementer contract for registry-backed backends documented in spec/24 -- closes the SaaS-shape cliff: SaaS / database-backed / git-backed agent registries are now ONE Protocol implementation away); ToolRegistryBackend Protocol (#64 -- locked at PR 4 with `FilesystemToolRegistryBackend` + `SQLiteToolRegistryBackend` reference impls, parametrized conformance suite across both backends, hybrid metadata-in-SQL + handler-bodies-on-disk storage shape on SQLite, `install` / `uninstall` capability flipped True on SQLite with TOCTOU-safe INSERT-first + atomic_write-on-success-only atomicity, multi-process WAL race resolved by `PRAGMA busy_timeout=5000` before WAL pragma, cross-scope isolation enforced at SQL layer (`WHERE agent_scope = ?` on every query), URL factory credential redaction across all 5 `ValueError` sites, operator override via `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND=sqlite` + optional `ATOMIC_AGENTS_TOOL_REGISTRY_BACKEND_URL` (`sqlite:///path?agent_scope=`) env vars OR `AtomicAgent(..., tool_registry_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (delegate.py deliberately NOT threaded -- per-agent scoping per spec/25 Decision 9), doctor `check_tool_registry_backend` coherence check with capability snapshot + tool-count probe + URL credential redaction, Implementer contract for registry-backed tool backends documented in spec/25 -- Protocol seam in place; future PyPI / git / company-internal-HTTP / SaaS-database adapters register via `register_tool_registry_backend(...)` without forking core); **PolicyBackend Protocol (#89 -- locked at PR 4 with `FilesystemPolicyBackend` reference impl reading `/policy.md` (markdown + embedded YAML), mtime+size composite-key parse cache (`cache_ttl_s=0` capability declaration -- operators observe edits within 0 seconds of mtime change), `agent_name` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal,
side-effect-free construction (lazy parse on first method call so the 115 existing `AtomicAgent(...)` construction sites stay byte-identical when no `policy.md` exists), parametrized conformance suite across registered backends, `PolicySnapshotForCall` frozen per call entry (per Premise 3: operator edits mid-call defer to the next call), cost-cap MIN composition in `_check_cost_guardrails` + `MandateCheck` steps 7-9 consume pre-composed effective caps (PR 3a: cost caps enforce immediately), non-cap surfaces (tool allowlist, MCP server allowlist, model selection) consumed at the matching call sites with `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP` env-var-gated enforcement (PR 3b shipped in log-only mode; **PR 4 flipped the default to `true`: non-cap surfaces enforce by default; operators wanting log-only set the env to `false` explicitly**), unified `policy_decision` event family with `decision_kind: deny | override` discriminator + `axis: cost_cap | tool_allowlist | mcp_allowlist | model_selection` field + `enforced: bool` so SaaS / Postgres adapters target a frozen schema (Premise 4: one event answers "was this Policy or Mandate?" 
via `denying_layer`), `model_from_per_call_override` field captures the `agent.call(model=...)` kwarg when Policy supersedes it (#274: fleet-config-wins precedence is audit-visible to the caller), per-call dedup set bounds tool-allowlist denial emissions to one event per `(tool_name, call)` (#273: log-only audit shape stays clean when the LLM re-attempts a denied tool every iteration), per-dimension MIN cap math (`daily` and `monthly` independently; cumulative deferred to v1.1 per plan-subagent D1), per-agent overrides under nested `agents:` section with field-level MERGE for caps + UNION+deny-wins for allowlists + REPLACE for model selection, cross-host cap-overrun bound `(replica_count) × (per-call ceiling)` documented for shared-FS deployments (Postgres / SaaS adapters with linearizable state get exact-cap semantics through their own consistency layer), operator override via `ATOMIC_AGENTS_POLICY_BACKEND` env var OR `AtomicAgent(..., policy_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner + `delegate.py` threading per spec/32 D1 (Policy is fleet-scoped: a delegate inheriting the coordinator's pinned Postgres backend doesn't silently fall back to filesystem-default), `doctor.check_policy_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction, Implementer contract for policy backends documented in spec/32 §"Implementer contract for policy backends" (7 normative MUSTs covering `agent_name` validation at API boundary, per-agent storage isolation, `cache_ttl_s`-bounded staleness, side-effect-free construction, capability honesty, URL credential redaction, `PolicyDecision` event schema compliance). 
**Closes the cross-agent configuration cliff**: operators with a fleet of agents stop hand-syncing `model.md` / `tools.md` / `mcp.md` across N agents; the single project-root `policy.md` is the audit-trail source of truth, with SaaS / Postgres / org-admin-console adapters one Protocol implementation away.** **MandateBackend Protocol (#124, locked at PR 4 with `FilesystemMandateBackend` reference impl, parametrized conformance suite across registered backends, `MandateCheck` judge specialist with validation steps 1-9 (existence, source-hash binding, state, tool allowlist, target allowlist via per-agent named `TargetExtractorRegistry`, time window, token-cost projection with stale-baseline defense, external-cost projection via `CostEstimatorRegistry` fail-closed to `mandate_external_cost_unprojectable`, escalation thresholds with ESCALATE-preempts-BLOCK precedence), reservation pattern (`MandateReservationManager.create / commit / rollback / _expire` lifecycle with `threading.Timer`-driven TTL watchers + `threading.Lock`-serialized in-process state), crash recovery via `MandateBackend.recover_orphan_reservations` with LockBackend-serialized scan-inside-lock discipline (pessimistic over-report > silent under-bill for orphan reservations from prior crashed runs), post-action verification event family (`mandate_action_verified` / `_diverged` / `_verification_unavailable` emitted exactly once per `external_side_effect` / `irreversible` action after cost commit), suspicious-rebind throttle (60s default; closes the source-hash-before-state edit window for prompt-injection-style threats), `mandates.md` operator-authored markdown + embedded YAML parser + `judges.md ## Mandates` operator config with cascade-aware project floor, structural write protection (`mandates.md` excluded from default WritePolicy alongside `tools.md` / `judges.md` / `persona/*.md`), operator override via `ATOMIC_AGENTS_MANDATE_BACKEND` env var OR `AtomicAgent(..., mandate_backend=...)` constructor 
kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner (delegate.py deliberately NOT threaded; per-agent scoping per spec/29 + spec/15 delegate isolation), doctor `check_mandate_backend` coherence check, Implementer contract for mandate backends documented in spec/29; closes the durable-authorization cliff: operators authoring `cumulative_external_usd: 6000` on a procurement mandate now have that cap defended against concurrent action races + crash-restart, with operator-facing audit signal when an action's executed target diverged from authorization at proposal time).** **PersonaBackend Protocol (#62, locked at PR 4 with `tests/test_persona_protocol_conformance.py` parametrized across registered backends + `tests/test_persona_filesystem_backend.py` + `tests/test_persona_composition.py` + `tests/test_profile_composition_snapshot.py` + `tests/test_profile_composition_restore.py`, `FilesystemPersonaBackend(personas_root)` reference impl at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json` sidecar (hidden namespace mirrors `.snapshots/`; `list_agents()` skips dot-prefixed entries so personas don't surface as agents), `persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal, group-atomic `save_persona` with `mkdir(exist_ok=False)` for race-free fresh-create + swap-and-delete for `overwrite=True` (20-iteration retry bound on macOS APFS `ENOTEMPTY`), snapshot trio capability flipped `supports_snapshot=False → True` in PR 3 with nested storage `//.snapshots//` (D-PP-10: geometric cross-persona isolation: `rm -rf //` removes the persona AND its full history cleanly) + `snap__<12hex>` snapshot IDs matching AgentProfile (D-PP-11: 48-bit entropy + cross-Protocol uniformity enables a shared `_validate_snapshot_id` path-security guard), `/persona.link.md` (YAML-in-code-block with `kind: shared` + `persona_id` per D-ER-4) is the ownership trigger driving 
AgentProfileBackend composition via `external_persona_ref(agent_id) -> str | None` (D-PP-3: supersedes D-ER-1's earlier boolean for cleaner bootstrap-path resolution) so `load_profile` repopulates persona fields + re-derives `agent_mode` (D-PP-4), `save_profile` ignores persona fields when externally owned (D6, mirrors spec/24 Decision 6's `agent_mode` pattern), `snapshot()` + `restore()` drop persona fields with one-time `agent_profile_restore_dropped_persona_fields` warning per `(agent_id, snapshot_id)` via thread-safe per-process dedup (D-PP-13 migration-window event), `PersonaOwnershipConflict` raised on filesystem-backend when both `persona.link.md` and `persona/IDENTITY.md` coexist (D2a + D-PP-8: filesystem-only loud refusal; SQLite uses silent-drop with the equivalent `agent_profile_save_dropped_persona_fields` event for cross-backend uniformity), SQLite v1→v2 schema migration adds `agents.persona_id` column with forward-only race-loser handling, D-PP-1 sentinel sweep teaches `load_profile/list_agents/exists/list_skills/load_skill_body` about the shared-persona layout (D-PP-12 closed the sweep in PR 3), operator override via `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars OR `AtomicAgent(..., persona_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner/EvalRunner/DreamRunner + `delegate.py` explicit-only threading per D-ER-2 (mirrors Policy's `_policy_backend_was_explicit` precedent at `agent.py:401`; default-resolved backends do not leak the coordinator's `personas_root` to delegates because persona is per-agent semantic context), `atomic-agents persona list / show / snapshot / list-snapshots / restore / clone` CLI (zero LLM calls) catches `PersonaError` subclasses + `OSError` + `PermissionError` cleanly, doctor `check_persona_backend` coherence check with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction, Implementer contract for persona backends documented in spec/33 
§"Implementer contract for persona backends" (8 normative MUSTs), D5 retires spec/24's `TemplateProfileBackend` reservation: `PersonaCapabilities.supports_templates` is the canonical home for a future persona-template marketplace surface; **closes the shared-persona cliff**: a team running 5 customer-support agents stops maintaining 5 separate `SOUL.md` files that drift; one canonical persona record (`shared:customer-support-v3`) serves all 5 agents with consistent identity, snapshot/restore lifecycle, and operator-editable markdown; home users with one agent running the legacy `/persona/{IDENTITY,SOUL,USER}.md` layout see byte-identical pre-#62 behavior because the legacy layout works forever through AgentProfile's existing filesystem walk).** **CorpusBackend Protocol** (#65, locked at PR 4 with `tests/test_corpus_protocol_conformance.py` parametrized across registered backends + `tests/test_corpus_filesystem_backend.py` + `tests/test_corpus_sqlite_backend.py` + `tests/test_corpus_registry.py` + `tests/test_corpus_composition.py` + `tests/test_corpus_wiring.py` + `tests/test_corpus_migration_regression.py` + `tests/test_corpus_doctor.py`, `FilesystemCorpusBackend(agent_root)` reference impl reading `/wiki/` (distilled knowledge per the Karpathy style) + `/raw/` (operator-ingested source documents) with per-page `_io.atomic_write` safety + `render_index_summary(corpus)` Protocol method that returns the routing INDEX the agent loads at step [7] of the canonical load order per spec/04, `SQLiteCorpusBackend` with FTS5 (stdlib `sqlite3`, no optional extra; hybrid storage shape with metadata in SQL + bodies on disk matching ToolRegistryBackend precedent; WAL journal mode + `PRAGMA busy_timeout=5000` before WAL pragma mirroring the multi-process race fix from #64; FTS5 virtual table for O(log N) indexed full-text query on page bodies + frontmatter titles; cross-agent isolation enforced at the SQL layer via `WHERE agent_scope = ? 
AND corpus = ?` double discriminator; `BEGIN IMMEDIATE` transaction discipline wrapping the read-validate-UPSERT-FTS sequence in `write_page`; INSERT-first + atomic_write-on-success-only atomicity for hybrid storage half-failure recovery; idempotent `INSERT OR IGNORE` cold-start schema init for multi-replica deployments), page name charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / leading-dot refusal, side-effect-free construction (empty or missing `wiki/` + `raw/` yields zero registrations so all 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no corpus is configured; IRON RULE byte-identity regression suite at `tests/test_corpus_migration_regression.py` pins the contract across 5 explicit assertions covering the wiki INDEX read path and bundle rendering), parametrized conformance suite across both backends pins the Protocol contract so future `PgvectorCorpusBackend` + Postgres adapters register via `register_corpus_backend(...)` without forking core (the semantic-search seam is deferred to the coordinated #258 Postgres-adapter family release so semantic-search coverage stays symmetric across MemoryBackend + CorpusBackend; ROADMAP §"Semantic memory retrieval" frames this as the Letta-gap closer), call-site migration: `agent.py:_load_indexes()` routes `wiki/INDEX.md` reads through `corpus_backend.render_index_summary("wiki")` when registered (per spec/04 step [7]; legacy direct-read path catches `OSError` + `UnicodeDecodeError` with logged warning marker for soft-degrade symmetry), `bundle.py:_render_memory_breakpoint` gains a `corpus_backend: CorpusBackend | None = None` parameter threaded three levels through `render_bundle`, with a shared `_render_wiki_index_section(label, path, content)` helper producing byte-identical output between Protocol path and legacy fallback (IRON RULE assertion 4), `bundle.py:_source_paths` migration deferred to v1.1 (filesystem-only function; pinned by the deferral 
test and tracked at #314), `CorpusBackend` becomes the source of truth for `wiki/` and `raw/` per spec/34 while `MemoryBackend` retains exclusive ownership of `memory/` and `journal/` (spec/24 Decision 7 addendum), operator override via `ATOMIC_AGENTS_CORPUS_BACKEND` + optional `ATOMIC_AGENTS_CORPUS_BACKEND_URL` env vars (when `=sqlite` without URL, defaults to `/.corpus.db` with `agent_scope=quote_plus(agent_root.name)` so single-host operators get a working SQLite default by flipping one env var) OR `AtomicAgent(..., corpus_backend=...)` constructor kwarg + per-runner kwargs on OutcomeRunner (threads at `outcome.py:255`) / EvalRunner (at `eval.py:363`) / DreamRunner (stores as `self._corpus_backend` for API parity; no internal `AtomicAgent` construction site in v1), `delegate.py` explicit-only threading via `_corpus_backend_was_explicit` flag mirroring PersonaBackend D-ER-2 at `agent.py:431` (default-resolved backends do not leak the coordinator's `agent_root` to delegates because corpus is per-agent semantic context, distinct from fleet-scoped Policy + AgentProfile which always thread), `doctor.check_corpus_backend` coherence check with PASS/WARN/FAIL ladder + capability snapshot + page-count performance cliff WARN when `stats().page_count` exceeds 1000 pages on `supports_full_text_search=False` (the WARN hint names `ATOMIC_AGENTS_CORPUS_BACKEND=sqlite` as the remedy, mirroring the LogBackend doctor precedent) + URL credential redaction across operator-facing error paths, `atomic-agents corpus` CLI (`list`/`show`/`query`/`version`/`restore` subcommands, zero LLM calls, env-var-aware), Implementer contract for corpus backends documented in spec/34 §"Implementer contract for corpus backends" (9 normative MUSTs covering page name charset validation at API boundary, side-effect-free construction, capability honesty including `embedding_provider=None` invariant, `query()` capability precedence rule, `write_page()` 4-case behavior table, URL credential redaction 
across operator-facing error paths, cross-corpus isolation at storage layer, snapshot id determinism + cross-page isolation, `backend_id` stability + `close()` idempotency). **Closes the GB-scale wiki cliff**: operators with a 10K-page wiki or hundreds of MB of raw documents stop waiting seconds per keyword grep over an unindexed filesystem; `SQLiteCorpusBackend` with FTS5 delivers O(log N) indexed full-text search at stdlib cost (no Postgres operator burden); future `PgvectorCorpusBackend` arrives via the coordinated #258 release for symmetric semantic retrieval across both substrates. Same agent definitions, same `agent.call()` flow, same audit trail, different corpus substrate. **Eleven backend protocols shipped.** **Next per ROADMAP**: MCPServerRegistry ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201)) protocol, one remaining for v1.0 close. #201 was carved out of #64 via spec/25 Decision 3 (MCP servers are processes; ToolRegistry is functions; they share Protocol-pattern shape but not invocation semantics). A person at home runs filesystem-everything with one agent. An organization runs the same agents over Postgres, behind an HTTP service, with a fleet of orchestrated roles. **Same agent definitions, same call() flow, same audit trail. Different backends.** The spec is the central artifact. The Python package is one conforming reference implementation. Anyone can build agents to the spec without using this code, and eventually alternate implementations will. 
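For illustration, the SQLite pieces named in the CorpusBackend entry above (`PRAGMA busy_timeout=5000` issued before the WAL pragma, `BEGIN IMMEDIATE` around writes, an FTS5 table, and the `agent_scope` + `corpus` double discriminator) can be sketched with the stdlib `sqlite3` module. This is a minimal sketch under stated assumptions, not the shipped `SQLiteCorpusBackend`: the table name `pages_fts`, its columns, and the helpers `open_corpus_db`, `write_page`, and `query_pages` are hypothetical; page bodies live in the table here rather than on disk; and the local SQLite build is assumed to include FTS5.

```python
import sqlite3


def open_corpus_db(path: str) -> sqlite3.Connection:
    """Open a connection using the pragma ordering described above (hypothetical helper)."""
    # isolation_level=None selects autocommit mode, so the explicit
    # BEGIN IMMEDIATE / COMMIT statements below are the only transaction control.
    conn = sqlite3.connect(path, isolation_level=None)
    # Busy timeout first, so the journal-mode switch waits on a contended
    # lock instead of failing immediately.
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: scope columns are stored but not full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts USING fts5("
        "agent_scope UNINDEXED, corpus UNINDEXED, name UNINDEXED, title, body)"
    )
    return conn


def write_page(conn, agent_scope, corpus, name, title, body) -> None:
    """Replace one page inside a single write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading or writing
    try:
        conn.execute(
            "DELETE FROM pages_fts WHERE agent_scope = ? AND corpus = ? AND name = ?",
            (agent_scope, corpus, name),
        )
        conn.execute(
            "INSERT INTO pages_fts (agent_scope, corpus, name, title, body) "
            "VALUES (?, ?, ?, ?, ?)",
            (agent_scope, corpus, name, title, body),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def query_pages(conn, agent_scope, corpus, match):
    """Full-text query that always carries both scope discriminators."""
    rows = conn.execute(
        "SELECT name FROM pages_fts "
        "WHERE pages_fts MATCH ? AND agent_scope = ? AND corpus = ? "
        "ORDER BY rank",
        (match, agent_scope, corpus),
    ).fetchall()
    return [row[0] for row in rows]
```

Because every query carries both `agent_scope` and `corpus`, a page written for one agent should never surface in another agent's results, even when both share one database file.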
@@ -49,7 +49,7 @@ When you can't tell whether a design move helps both, stop, name the tradeoff
 Mandate ✅ (locked at #124 PR 4) Policy ✅ (locked at #89 PR 4) Persona ✅ (locked at #62 PR 4)
- Corpus 🟡 MCPServerRegistry 🟡
+ Corpus ✅ (locked at #65 PR 4) MCPServerRegistry 🟡
 │ Storage substrate - swappable Filesystem (today) → Postgres / pgvector / Redis (later)
@@ -207,7 +207,7 @@ uv run pytest # full suite
 uv run pytest tests/test_.py -v # one module ```
-Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 2,686 tests collected, 2026-05-28). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol; the conformance suite is what keeps the protocol honest.
+Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 2,937 tests collected, 2026-06-01). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol; the conformance suite is what keeps the protocol honest.
 ### Releases + SemVer
@@ -292,7 +292,7 @@ If the project ever needs to optimize differently, `docs/methodology.md` is the
 | Doc | Purpose |
 |-----|---------|
 | `docs/architecture.md` | Mental model in diagrams. Read first. |
-| `docs/spec/01-...33-persona-backend.md` | Locked spec (32 docs today, 30 locked + 2 drafts at spec/26 (cascade bundle) and spec/30 (responsibility audit)). The product. |
+| `docs/spec/01-...33-persona-backend.md` | Locked spec (33 docs today, 31 locked + 2 drafts at spec/26 (cascade bundle) and spec/30 (responsibility audit)). The product. |
 | `docs/implementation/` | Build guides per runtime (cron, Claude skill, dashboard) |
 | `docs/deployment/versioning.md`, `upgrading.md` | SemVer + operator runbook |
 | `docs/deployment/release-runbook.md` | Maintainer `/ship` runbook: two-mode workflow + manual surface check |
@@ -341,7 +341,7 @@ These are not forbidden forever; they're explicitly deferred with rationale.
 ## Status
-**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 2,686 tests collected, 2026-05-28). Capability-gated skips fall into four buckets: ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Ten backend protocols shipped**:
+**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 2,937 tests collected, 2026-06-01). Capability-gated skips fall into four buckets: ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**:
 - **MemoryBackend** (PR #57): filesystem reference impl + conformance suite.
 - **LLMBackend** (#87): Anthropic + OpenAI + Moonshot reference impls, registered at framework import; conformance suite parametrizes across all three.
@@ -354,6 +354,6 @@ These are not forbidden forever; they're explicitly deferred with rationale.
 - **MandateBackend Protocol** (#124, **locked at PR 4** with `tests/test_mandate_protocol_conformance.py` parametrized across registered backends + `tests/test_mandate_check.py` + `tests/test_mandate_reservations.py` + `tests/test_mandate_filesystem_backend.py` + `tests/test_mandate_integration.py`). `FilesystemMandateBackend(scope_root)` reference impl: markdown + embedded YAML descriptors at `/mandates.md` (project scope) or `//mandates.md` (agent scope); state at `/.judge-state/mandates.json` via `_io.atomic_write`; refuses path-traversal in `mandate_id` at API boundary; source-hash recomputation on every `load_mandate`; derived-EXPIRED state computed at load time. Only reference impl in v1; future SaaS / mobile / Slack-bot adapters register via `register_mandate_backend(...)` per /office-hours 2026-05-17 Option 2 decision (build the seam upfront, don't retrofit later). `MandateCheck` judge specialist (~730 LOC) implements validation steps 1-9: existence + source-hash binding + state + tool allowlist + target allowlist via per-agent named `TargetExtractorRegistry` (7 built-in heuristic extractors pre-registered at agent construction; MCP tools prefix extracted target with `mcp::`) + time window + token-cost projection with stale-baseline defense (if most-recent matching event's `ts` is before current iteration's start, fall back to `expected_cost_per_call_usd` so stale-baseline drift doesn't compound across multi-iteration runs) + external-cost projection via `CostEstimatorRegistry` fail-closed to spec-stable `mandate_external_cost_unprojectable` BLOCK reason + escalation thresholds with ESCALATE-preempts-BLOCK precedence. 
Reservation pattern (`MandateReservationManager.create / commit / rollback / _expire` lifecycle with `threading.Timer`-driven TTL watchers + `threading.Lock`-serialized in-process state; `compute_outstanding(log_backend, scope, mandate_id)` four-clause definition (created AND NOT committed/rolled_back/expired/committed_on_recovery AND no cost event with matching `proposal_id` AND age < ttl_s) closes the cost-event-landed-without-_committed window; cost events for mandate-citing actions carry `mandate_id` + `proposal_id` so cumulative budget defense `_sum_prior_token_cost` matches against the right ledger). Crash recovery via `MandateBackend.recover_orphan_reservations(log_backend, scope, *, lock_backend=None)` with `LockBackend.acquire(scope='mandate-recovery:')` scan-inside-lock discipline (pessimistic over-report > silent under-bill: token orphans emit `mandate_reservation_committed_on_recovery`; external orphans emit BOTH `_committed_on_recovery` AND `mandate_reservation_external_unverified` so operators verify in Stripe / vendor via the `atomic-agents mandate reconcile --action {committed|rolled_back}` CLI). Post-action verification event family (`mandate_action_verified` / `mandate_action_diverged` / `mandate_action_verification_unavailable` emitted exactly once per `external_side_effect` / `irreversible` action after cost commit; operator-facing audit signal, NOT a refund mechanism in v1). Suspicious-rebind throttle (60s default; closes the source-hash-before-state edit window for prompt-injection-style threats; persisted on-disk in `MandateBackend.read_state` shape under `throttles` key; in-memory-only forbidden because crash-restart loop would defeat the prompt-injection defense). 
`mandates.md` parser + `judges.md ## Mandates` operator config with cascade-aware project floor (floor-wins where stricter for safety: longer throttle, "block" beats "escalate") + constraint enforceability discipline (mandates without enforceable constraints AND without `unconstrained: true` + non-empty justification are rejected at load time). Structural write protection: `mandates.md` excluded from default WritePolicy alongside `tools.md` / `judges.md` / `model.md` / `persona/IDENTITY.md` / `persona/SOUL.md` / `persona/USER.md`; even a malicious actor with a write-capable tool cannot grant itself authority; the WritePolicy is the authoritative protection, the `## Only operators grant mandates` discipline is the behavioral story. Operator override via `ATOMIC_AGENTS_MANDATE_BACKEND` env var OR `AtomicAgent(..., mandate_backend=...)` / `OutcomeRunner(..., mandate_backend=...)` / `EvalRunner(..., mandate_backend=...)` / `DreamRunner(..., mandate_backend=...)` constructor kwargs (programmatic path always wins; threads through to internal sub-agents; `delegate.py` deliberately NOT threaded: per-agent scoping per spec/29 + spec/15 delegate isolation). `doctor.check_mandate_backend` validates operator-config coherence. Implementer contract for mandate backends documented in spec/29 §"Implementer contract for mandate backends" (8 normative MUSTs covering path-traversal refusal at API boundary, per-scope isolation enforced at storage layer, state persistence via `read_state` / `write_state` Protocol methods (NOT filesystem-path contract), source-hash recomputation per load, lifecycle event emission via `LogBackend.append(record)`, reservation event discriminator shape, pessimistic crash recovery semantics, capability honesty). Operator CLI surface ships with the impl: `atomic-agents mandate list` / `show` / `usage` / `reconcile`. 
**Closes the durable-authorization cliff**: operators authoring `cumulative_external_usd: 6000` on a procurement mandate now have that cap defended against concurrent action races + crash-restart; post-hoc divergence audits surface when an action's executed target differed from authorization at proposal time; mandate revocation is operator-editable in `mandates.md` with immediate effect on the next agent run. Same agent definitions, same `agent.call()` flow, same audit trail: durable revocable scoped authority for actors that need to handle real money + real external side effects without re-authorization per turn. **The Mandate primitive is orthogonal to the v1.0 Protocol queue** (Corpus / MCPServerRegistry remain after PersonaBackend locked at #62 PR 4; Mandate primitive ships its OWN `MandateBackend` seam from day 1).
 - **PersonaBackend Protocol** (#62, **locked at PR 4** with `tests/test_persona_protocol_conformance.py` parametrized across registered backends + `tests/test_persona_filesystem_backend.py` + `tests/test_persona_composition.py` + `tests/test_profile_composition_snapshot.py` + `tests/test_profile_composition_restore.py`). `FilesystemPersonaBackend(personas_root)` reference impl: persona records at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json` sidecar (hidden namespace mirrors `.snapshots/` so `list_agents()` skips dot-prefixed entries and personas don't surface as agents). Only reference impl in v1; future Postgres / SaaS / git adapters register via `register_persona_backend(...)` per the established Protocol-pattern seam. `persona_id` charset `[a-zA-Z0-9_.+@-]+` enforced at API boundary with path-traversal / control-char / newline / leading-dot refusal. Side-effect-free construction (lazy walk on first method call so the 166 existing `AtomicAgent(...)` construction sites stay byte-identical when no `persona.link.md` exists). 
Group-atomic `save_persona`: `mkdir(exist_ok=False)` claims the persona dir exclusively before any file write for race-free fresh-create (`overwrite=False` losers raise `PersonaExists` WITHOUT touching disk); `overwrite=True` uses swap-and-delete via a sibling temp directory with a 20-iteration retry bound sized for 16-thread contention on macOS APFS `ENOTEMPTY` semantics; PR 1 Round 3 closed an orphan-backup leak via best-effort `shutil.rmtree(backup, ignore_errors=True)`. Snapshot trio (`snapshot` / `restore` / `list_snapshots`) flipped `supports_snapshot=False → True` in PR 3 with nested storage `//.snapshots//{IDENTITY,SOUL,USER}.md + metadata.json` (D-PP-10: geometric cross-persona isolation: a snapshot record always resides under its parent persona's directory, so `rm -rf //` removes the persona AND its full history cleanly without an explicit `persona_id` cross-check on the snapshot record). `snap__<12hex>` snapshot ID format with 48-bit `secrets.token_hex(6)` random tail matches AgentProfile spec/24 Implementer Contract #8 (D-PP-11: cross-Protocol uniformity enables a shared `_validate_snapshot_id` path-security guard; same-second collision probability at 4K snapshots/sec is ~6e-8). `_save_persona_group_atomic` merges backup `.snapshots/` entry-by-entry on `overwrite=True` so a concurrent `snapshot()` racing the persona-dir replace cannot destroy snapshot history (PR 3 Round 1 P1 adversarial: the original single-directory-rename approach lost the full snapshot history under contention). `list_snapshots` defense-in-depth symlink-escape guard via `entry.resolve().relative_to(snapshots_root.resolve())` (PR 3 Round 1 P2 adversarial: matches `restore()`'s confinement check). 
URL factory `make_filesystem_persona_backend_from_url("filesystem:///path")` handles `filesystem:///absolute/path` URLs and refuses non-filesystem schemes, netloc, fragments, duplicate / unknown query params, and relative paths; credentials redacted from all `ValueError` sites via `_redact_url`. **Composition with AgentProfileBackend (D1 + D3 + D6 + D-PP-13).** `/persona.link.md` is the ownership trigger (YAML in a code block with two scalar fields: `kind: shared` + `persona_id: customer-support-v3` per D-ER-4; the colon-prefixed single-scalar `shared:customer-support-v3` was rejected at /plan-eng-review because the colon violates D4's `persona_id` charset). `AgentProfileBackend.external_persona_ref(agent_id) -> str | None` (D-PP-3: supersedes D-ER-1's original boolean signature because the architecturally-right Optional[str] returns the persona_id the framework needs in one Protocol call) gives the bootstrap path the persona_id to look up without importing PersonaBackend. `AgentProfileBackend.load_profile()` repopulates persona fields via `persona_backend.load_persona(persona_id)` and re-derives `agent_mode` from the loaded persona text (D-PP-4: `agent_mode` is derived from `persona_identity` and would otherwise be stale because the persona fields are empty at `load_profile` return time when externally owned). `save_profile()` ignores `profile.persona_identity / soul / user` when externally owned (D6: mirrors spec/24 Decision 6's `agent_mode` ignore-on-save pattern; writes go through `persona_backend.save_persona()` only). `snapshot()` drops persona fields when externally owned (persona has its own snapshot history via PersonaBackend). 
`restore()` drops snapshot's persona fields when restoring a pre-PersonaBackend snapshot (carrying full persona text) into an agent that is NOW externally owned; the framework emits a one-time `agent_profile_restore_dropped_persona_fields` warning per `(agent_id, snapshot_id)` via thread-safe per-process dedup with `threading.Lock`-guarded check-and-add (D-PP-13 migration-window event; the lock-guarded check restores the "exactly once per `(agent_id, snapshot_id)` per process" promise after PR 3 Round 1 P2 adversarial caught the under-lock-or-CAS race). `/persona.link.md` AND `/persona/IDENTITY.md` both present raises `PersonaOwnershipConflict` at filesystem-backend `load_profile()` (D2a + D-PP-8: filesystem-only loud refusal because two files on disk is a visible operator mistake the framework must surface; SQLite uses silent-drop with the equivalent `agent_profile_save_dropped_persona_fields` event for cross-backend uniformity). SQLite v1→v2 schema migration adds the `agents.persona_id` column via forward-only upgrade routine with explicit race-loser handling (catches `sqlite3.OperationalError "duplicate column name"` then re-reads `schema_version`; the original D1a wording's `INSERT OR IGNORE` pattern was the wrong shape, which D-PP-2 corrected to UPDATE+ALTER per Python's `sqlite3` implicit-commit-before-DDL semantics). D-PP-1 sentinel sweep (`_is_agent_dir(agent_root)` predicate admits either `persona/IDENTITY.md` OR `persona.link.md`) updated at `load_profile`, `list_agents`, `exists`, AND extended to `list_skills` + `load_skill_body` in PR 3 (D-PP-12: externally-owned agents now succeed at skill operations end-to-end; the two missed call sites were a shipped bug from PR 2). **Operator surface.** `atomic-agents persona list / show / snapshot --label "..." 
/ list-snapshots / restore / clone` CLI exposes the full PersonaBackend lifecycle with zero LLM calls; catches `PersonaError` subclasses (including `PersonaNotFound`, `PersonaCorrupted`, `PersonaLinkInvalid`, `PersonaOwnershipConflict`, `PersonaSnapshotNotFound`) + `OSError` + `PermissionError` cleanly with `Error: ` on stderr + exit 1 (PR 3 Round 2 adversarial; previously bare `PersonaError` only). Default backend resolves to `FilesystemPersonaBackend(/.personas)`. Operator override via `ATOMIC_AGENTS_PERSONA_BACKEND` + optional `ATOMIC_AGENTS_PERSONA_BACKEND_URL` env vars OR `AtomicAgent(..., persona_backend=...)` / `OutcomeRunner(..., persona_backend=...)` / `EvalRunner(..., persona_backend=...)` / `DreamRunner(..., persona_backend=...)` constructor kwargs (programmatic path always wins; threads through to internal sub-agents). `delegate.py` threads `persona_backend` ONLY when the operator supplied it explicitly via the constructor kwarg (D-ER-2 -- mirrors Policy's `_policy_backend_was_explicit` precedent at `agent.py:401`; default-resolved backends do not leak the coordinator's `personas_root` to delegates because persona is per-agent semantic context; distinct from fleet-scoped Policy + AgentProfile which always thread, matching the Mandate precedent that per-agent isolation is the right shape for delegate-relationship semantics). `doctor.check_persona_backend` validates operator-config coherence with PASS/WARN/FAIL ladder + capability snapshot + URL credential redaction. 
Implementer contract for persona backends documented in spec/33 §"Implementer contract for persona backends" (8 normative MUSTs covering `persona_id` charset validation at API boundary, side-effect-free construction, capability honesty, URL credential redaction in factory `ValueError` sites, group-atomic save with the 20-iteration retry bound + last-writer-wins semantics, snapshot id determinism + cross-persona isolation, `backend_id` property stability, and `snap__<12hex>` snapshot ID format with `metadata.json` schema). D5 retires spec/24's `TemplateProfileBackend` reservation entirely -- `PersonaCapabilities.supports_templates` is the canonical home; a future persona-template marketplace (`pip install atomic-personas-starters` or a curated GitHub registry) is a v1.1+ distribution surface that the Protocol seam already accommodates without a forking change. **Closes the shared-persona cliff**: a team running 5 customer-support agents stops maintaining 5 separate `SOUL.md` files that drift; one canonical persona record (`shared:customer-support-v3`) serves all 5 regional agents with consistent identity, versioning, snapshot/restore lifecycle, and operator-editable markdown. Home users with one agent running the legacy `/persona/{IDENTITY,SOUL,USER}.md` layout see byte-identical pre-#62 behavior because the legacy layout works forever through AgentProfile's existing filesystem walk; PersonaBackend reads activate only when an operator explicitly creates a `persona.link.md` shared-reference. Same agent definitions, same `agent.call()` flow, same audit trail, different persona substrate. -MCP client support shipped (PRs #55 + #56). Active backlog covers the remaining two protocols (Corpus / MCPServerRegistry) for v1.0 close. Single-developer project; reference implementation that anyone can use, fork, or extend. +MCP client support shipped (PRs #55 + #56). Active backlog covers the remaining protocol (MCPServerRegistry) for v1.0 close. 
Single-developer project; reference implementation that anyone can use, fork, or extend. Going forward: **the elegance is the product.** Protect it. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index e5c2487..a62e36d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -44,7 +44,7 @@ Run the full suite before pushing: uv run pytest ``` -2401 tests today; CI runs Python 3.11 + 3.12. New backend protocols add ~25 conformance tests + ~10 implementation-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol. +2937 tests today; CI runs Python 3.11 + 3.12. New backend protocols add ~25 conformance tests + ~10 implementation-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol. ### Review diff --git a/README.md b/README.md index 397ed62..b45bef8 100644 --- a/README.md +++ b/README.md @@ -107,7 +107,7 @@ Honest about what isn't shipped or fully tested: - **Alpha, single maintainer.** Pre-1.0 means Minor releases may contain breaking changes; read release notes before upgrading. - **macOS / Linux primary; Windows under-tested.** `atomic_agents/_locks.py` uses POSIX `fcntl`. iOS can't run the runtime at all (Markdown vault files sync there fine -- see [`docs/deployment/obsidian.md`](docs/deployment/obsidian.md)). -- **`MemoryBackend` + `LLMBackend` + `JudgeBackend` + `LockBackend` + `LogBackend` + `AgentProfileBackend` + `ToolRegistryBackend` + `MandateBackend` + `PolicyBackend` are shipped from the protocol roadmap.** Three reference LLM backends (Anthropic, OpenAI direct via `OpenAICompatibleLLMBackend`, Moonshot via the same factory class) all register at framework import; third-party Gemini / Bedrock / Vertex / vLLM-local backends can register without forking core. 
`LockBackend` ships filesystem + Redis reference impls; `LogBackend` ships filesystem + SQLite; `AgentProfileBackend` ships filesystem + SQLite (with JSON-based snapshot trio + `supports_skills` capability + Implementer contract for future Postgres / git / SaaS-database adapters); `ToolRegistryBackend` ships filesystem + SQLite (with hybrid metadata-in-SQL + handler-bodies-on-disk storage shape + `install` / `uninstall` capability flipped True on SQLite + cross-scope isolation enforced at the SQL layer + Implementer contract for future PyPI / git / company-internal-HTTP / SaaS-database adapters); `PolicyBackend` ships filesystem reference impl reading `/policy.md` (markdown + embedded YAML), with cost-cap MIN composition, tool / MCP / model surfaces enforced by default after PR 4 (set `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP=false` to opt back into log-only mode), `policy_decision` audit event family with `decision_kind` / `axis` discriminators, and Implementer contract for future Postgres / SaaS / org-admin-console adapters. `PersonaBackend` ships filesystem reference impl at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json`, with `persona.link.md` ownership trigger, snapshot trio nested under each persona's directory (`supports_snapshot=True`), `atomic-agents persona` CLI lifecycle, `AgentProfileBackend` composition that drops persona fields when externally owned, and Implementer contract for future Postgres / SaaS / git PersonaBackend adapters. `Corpus` backend is still filesystem-default-only today; its protocol contract comes later. Org-scale deployments today can run filesystem + Redis + SQLite mixed (e.g., SQLite for logs + profiles + tools, Redis for locks); future Postgres adapters slot in via the same Protocol seams. 
+- **`MemoryBackend` + `LLMBackend` + `JudgeBackend` + `LockBackend` + `LogBackend` + `AgentProfileBackend` + `ToolRegistryBackend` + `MandateBackend` + `PolicyBackend` + `PersonaBackend` + `CorpusBackend` are shipped from the protocol roadmap.** Three reference LLM backends (Anthropic, OpenAI direct via `OpenAICompatibleLLMBackend`, Moonshot via the same factory class) all register at framework import; third-party Gemini / Bedrock / Vertex / vLLM-local backends can register without forking core. `LockBackend` ships filesystem + Redis reference impls; `LogBackend` ships filesystem + SQLite; `AgentProfileBackend` ships filesystem + SQLite (with JSON-based snapshot trio + `supports_skills` capability + Implementer contract for future Postgres / git / SaaS-database adapters); `ToolRegistryBackend` ships filesystem + SQLite (with hybrid metadata-in-SQL + handler-bodies-on-disk storage shape + `install` / `uninstall` capability flipped True on SQLite + cross-scope isolation enforced at the SQL layer + Implementer contract for future PyPI / git / company-internal-HTTP / SaaS-database adapters); `PolicyBackend` ships filesystem reference impl reading `/policy.md` (markdown + embedded YAML), with cost-cap MIN composition, tool / MCP / model surfaces enforced by default after PR 4 (set `ATOMIC_AGENTS_POLICY_ENFORCE_NONCAP=false` to opt back into log-only mode), `policy_decision` audit event family with `decision_kind` / `axis` discriminators, and Implementer contract for future Postgres / SaaS / org-admin-console adapters. `PersonaBackend` ships filesystem reference impl at `/.personas//{IDENTITY,SOUL,USER}.md` + `metadata.json`, with `persona.link.md` ownership trigger, snapshot trio nested under each persona's directory (`supports_snapshot=True`), `atomic-agents persona` CLI lifecycle, `AgentProfileBackend` composition that drops persona fields when externally owned, and Implementer contract for future Postgres / SaaS / git PersonaBackend adapters. 
`CorpusBackend` ships `FilesystemCorpusBackend` + `SQLiteCorpusBackend` with FTS5 reference impls; `/wiki/` + `/raw/` per-agent corpus; `render_index_summary(corpus)` Protocol method; page-count performance cliff WARN at 1000+ pages on `supports_full_text_search=False` filesystem (with the `ATOMIC_AGENTS_CORPUS_BACKEND=sqlite` remedy hint); `atomic-agents corpus` CLI; operator override via `ATOMIC_AGENTS_CORPUS_BACKEND` env var or `corpus_backend=` constructor kwarg; Implementer contract in spec/34. Org-scale deployments today can run filesystem + Redis + SQLite mixed (e.g., SQLite for logs + profiles + tools, Redis for locks); future Postgres adapters slot in via the same Protocol seams. - **Cost guardrail `alert` action is log-backed today.** The `alert_channel` field is parsed, but external dispatch (Telegram / email / webhook) is not wired up yet. Today's alerts go to the run log; the dashboard surfaces them visually. See [`#70`](https://github.com/dep0we/atomic-agents-stack/issues/70). - **Cross-host locking is shipped via the `LockBackend` Protocol** ([`#60`](https://github.com/dep0we/atomic-agents-stack/issues/60) -- locked at PR 4). Default filesystem backend preserves the pre-arc per-host POSIX `fcntl.flock` semantic for single-host deployments; operators on Cloud Run / Kubernetes / gizmo can opt into `RedisLockBackend` via `ATOMIC_AGENTS_LOCK_BACKEND=redis`. Cross-host correctness is now a Protocol-level concern, not an operator burden. - **`__all__` lags behind raised exceptions.** A few public-facing exceptions are raised inside the package but not in `atomic_agents.__all__` yet ([`#99`](https://github.com/dep0we/atomic-agents-stack/issues/99)); documented in `docs/deployment/programmatic.md`. 
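Reviewer sketch of the CorpusBackend override surface named in this hunk (env var or `corpus_backend=` kwarg). This is a minimal illustration under stated assumptions, not the framework's code: `resolve_corpus_backend`, `Resolved`, `backend_for_delegate`, and the string-valued registry are hypothetical stand-ins. Only the `ATOMIC_AGENTS_CORPUS_BACKEND` variable, the `filesystem` / `sqlite` names, the filesystem default, the kwarg bypassing the factory, and the explicit-only delegate threading come from the diff. Raising `ValueError` on an unknown name is an assumption.

```python
import os
from dataclasses import dataclass

# Hypothetical registry: the diff says the real one stores backend classes,
# so plain strings stand in for them here.
_REGISTRY = {
    "filesystem": "FilesystemCorpusBackend",
    "sqlite": "SQLiteCorpusBackend",
}


@dataclass(frozen=True)
class Resolved:
    backend: object
    was_explicit: bool  # True only when the caller passed a backend directly


def resolve_corpus_backend(explicit=None, env=None):
    """Pick a backend: explicit kwarg first, then env var, then filesystem."""
    env = os.environ if env is None else env
    if explicit is not None:
        # The constructor kwarg bypasses the env-var factory entirely.
        return Resolved(explicit, True)
    name = env.get("ATOMIC_AGENTS_CORPUS_BACKEND", "filesystem").strip().lower()
    if name not in _REGISTRY:
        # Assumption: unknown names fail loudly instead of silently falling back.
        raise ValueError(f"unknown corpus backend: {name!r}")
    return Resolved(_REGISTRY[name], False)


def backend_for_delegate(resolved):
    """Thread the backend to a delegate only if the operator chose it explicitly."""
    return resolved.backend if resolved.was_explicit else None
```

Tracking `was_explicit` separately from the resolved value is what lets a default-resolved backend stay with the coordinator while an operator-supplied one follows the delegate.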
@@ -127,7 +127,7 @@ This is the slot in the AI-agent-tooling landscape `atomic-agents-stack` occupie | **Audit trail** | JSONL per run with `parent_run_id` rollups; helper + delegate + tool + capture lines all link back | Dashboards in Letta UI / cloud | Mem0 dashboards | LangSmith (hosted) | Build it | | **Cost guardrails** | First-class -- daily / monthly caps, threshold warnings, fallback action, `critical=True` override, tree-cap across delegates | Per their pricing model | Per their pricing model | Not built into core OSS | Build it | | **Multi-agent coordination** | Role × project cascade defined in spec/06 | Multi-agent shared memory blocks | Agent-shared memory pools | LangGraph: graph-based orchestration (more flexible) | Build it | -| **Numbered, locked spec** | 30 locked docs in `docs/spec/` (+ 2 RFCs) | API + concept docs | API + concept docs | API reference + concept docs | None | +| **Numbered, locked spec** | 31 locked docs in `docs/spec/` (+ 2 RFCs) | API + concept docs | API + concept docs | API reference + concept docs | None | | **Reference runtime** | Python, macOS / Linux primary | Python (server) + multi-language clients | Python (OSS) + multi-language clients | Python + JavaScript | Whatever | **Where the alternatives win:** @@ -141,7 +141,7 @@ This is the slot in the AI-agent-tooling landscape `atomic-agents-stack` occupie - **Markdown-source-of-truth, human-editable.** Operators can edit persona / tools / memory from any text editor or Obsidian without a vendor app. - **No required server.** The framework is "files + Python." A complete agent runs on a laptop with zero infrastructure. -- **Spec-level file layout.** 30 numbered docs lock the contract (plus 2 RFCs in progress); conformance is testable; alternate implementations are possible. +- **Spec-level file layout.** 31 numbered docs lock the contract (plus 2 RFCs in progress); conformance is testable; alternate implementations are possible. 
- **Crash-safe writes by default.** `temp file + fsync + rename + parent-dir fsync` for every mutation; an interrupted run leaves recoverable artifacts, not corruption. - **Cost story is structural, not bolted on.** Daily / monthly caps + tree-cap for delegations + per-call cost reservation for helper batches + a `critical=True` override that's part of the API, not a per-vendor workaround. @@ -180,6 +180,7 @@ Start at [`docs/README.md`](docs/README.md) for the spec entry point. The locked - [31 -- LLM backend protocol](docs/spec/31-llm-backend.md) -- provider routing; Anthropic + OpenAI + Moonshot reference impls - [32 -- Policy backend protocol](docs/spec/32-policy-backend.md) -- fleet-wide `policy.md`; cost-cap MIN composition + allowlist enforcement - [33 -- PersonaBackend Protocol](docs/spec/33-persona-backend.md) -- persona ownership, snapshot/restore, `persona.link.md` format +- [34 -- CorpusBackend Protocol](docs/spec/34-corpus-backend.md) -- wiki/raw corpus protocol; filesystem + SQLite (FTS5) reference impls; GB-scale indexed full-text search Each spec doc is locked when the implementation matches and tests pass. Spec changes that imply implementation changes get filed as GitHub issues. **Spec docs separate shipped behavior from explicit future / deferred boundaries** -- sections that describe behavior not yet implemented are explicitly marked as such, not silently aspirational. @@ -201,10 +202,10 @@ The framework is moving toward swappable backends layer by layer. 
The shape: a P | `MandateBackend` | ✅ Shipped | Filesystem reference impl; `MandateCheck` specialist + reservation pattern + crash recovery; closes the durable-authorization cliff | [`spec/29`](docs/spec/29-mandates.md) | | `PolicyBackend` | ✅ Shipped | Filesystem reference impl (`policy.md` at project root); cost-cap MIN composition + tool / MCP / model surfaces enforced by default (PR 4 flag flip); unified `policy_decision` audit event family | [`spec/32`](docs/spec/32-policy-backend.md) | | `PersonaBackend` | ✅ Shipped | Filesystem reference impl at `/.personas//`; `persona.link.md` ownership trigger; snapshot trio nested under each persona's directory; `atomic-agents persona` CLI; AgentProfile composition with migration-window restore event | [`spec/33`](docs/spec/33-persona-backend.md) | -| `CorpusBackend` | Planned | Wiki/raw corpus at GB scale + semantic search | [`#65`](https://github.com/dep0we/atomic-agents-stack/issues/65) | +| `CorpusBackend` | ✅ Shipped | Filesystem + SQLite (FTS5) reference impls; per-agent `wiki/` + `raw/`; `render_index_summary(corpus)` Protocol method; closes the GB-scale wiki cliff via O(log N) indexed full-text query | [`spec/34`](docs/spec/34-corpus-backend.md) | | `MCPServerRegistryBackend` | Planned | Catalog + install/audit for MCP servers (MCP equivalent of ToolRegistry) | [`#201`](https://github.com/dep0we/atomic-agents-stack/issues/201) | -**v1 direction:** a home user runs filesystem-everything today. An organization runs the same agent definitions over Postgres / Redis / SQLite-Datadog / behind an HTTP service once the remaining two protocols ship. v1.0 closes when those two land + their conformance suites pin the contract. See [`docs/architecture.md`](docs/architecture.md) for the mental model, [`docs/TENSIONS.md`](docs/TENSIONS.md) for architectural tensions this scaling story has to survive, and [`ROADMAP.md`](ROADMAP.md) for the full backlog beyond v1.0. 
+**v1 direction:** a home user runs filesystem-everything today. An organization runs the same agent definitions over Postgres / Redis / SQLite-Datadog / behind an HTTP service once the remaining protocol ships. v1.0 closes when MCPServerRegistry lands + its conformance suite pins the contract. See [`docs/architecture.md`](docs/architecture.md) for the mental model, [`docs/TENSIONS.md`](docs/TENSIONS.md) for architectural tensions this scaling story has to survive, and [`ROADMAP.md`](ROADMAP.md) for the full backlog beyond v1.0. --- @@ -279,8 +280,8 @@ Same pattern for OpenAI (`atomic-agents-openai`) and Moonshot (`atomic-agents-mo ## Repository structure - `atomic_agents/` -- the Python package (runtime in `agent.py`; backend protocols in `memory/`, `_llm.py`, `_locks.py`, `_costs.py`, etc.; CLI in `cli.py`; preflight in `doctor.py`) -- `tests/` -- 2686 tests collected (2686 passing + 48 skipped), Python 3.11 + 3.12 matrix -- `docs/` -- [spec entry point](docs/README.md), [`architecture.md`](docs/architecture.md), [`spec/`](docs/spec/) (30 locked docs + 2 RFCs), [`deployment/`](docs/deployment/) (8 operator runbooks), [`samples/caldwell/`](docs/samples/caldwell/) (complete worked example), [`GOVERNANCE.md`](docs/GOVERNANCE.md), [`TENSIONS.md`](docs/TENSIONS.md), [`methodology.md`](docs/methodology.md) +- `tests/` -- 2937 tests collected (2889 passing + 48 skipped), Python 3.11 + 3.12 matrix +- `docs/` -- [spec entry point](docs/README.md), [`architecture.md`](docs/architecture.md), [`spec/`](docs/spec/) (31 locked docs + 2 RFCs), [`deployment/`](docs/deployment/) (8 operator runbooks), [`samples/caldwell/`](docs/samples/caldwell/) (complete worked example), [`GOVERNANCE.md`](docs/GOVERNANCE.md), [`TENSIONS.md`](docs/TENSIONS.md), [`methodology.md`](docs/methodology.md) - `extras/` -- operational templates (Claude Code skill wrappers, macOS LaunchAgent plists, cron examples) --- @@ -310,4 +311,4 @@ Before opening a PR, read [`CLAUDE.md`](CLAUDE.md) (the 
project's design ethos a ## Status -**v0.13.0, alpha.** Core runtime stable. 2686 tests collected (2686 passing + 48 skipped) on Python 3.11 / 3.12. Ten of twelve backend protocols shipped (see the backend protocols table above); `CorpusBackend` and `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 -- Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. +**v0.13.0, alpha.** Core runtime stable. 2937 tests collected (2889 passing + 48 skipped) on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 -- Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. diff --git a/ROADMAP.md b/ROADMAP.md index c31aeac..a1a2d00 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,14 +8,13 @@ For shipped work, see [CHANGELOG.md](CHANGELOG.md). For the framework's design s ## v1.0 -- remaining backend protocols -The framework's protocol-pattern scaling story closes at v1.0. **Ten of twelve** backend protocols have shipped: `MemoryBackend`, `LLMBackend`, `JudgeBackend`, `LockBackend`, `LogBackend`, `AgentProfileBackend`, `ToolRegistryBackend`, `MandateBackend`, `PolicyBackend`, `PersonaBackend`. Two remain. +The framework's protocol-pattern scaling story closes at v1.0. **Eleven of twelve** backend protocols have shipped: `MemoryBackend`, `LLMBackend`, `JudgeBackend`, `LockBackend`, `LogBackend`, `AgentProfileBackend`, `ToolRegistryBackend`, `MandateBackend`, `PolicyBackend`, `PersonaBackend`, `CorpusBackend`. One remains. 
| Issue | Backend | Status | What it unblocks | |-------|---------|--------|------------------| -| [#65](https://github.com/dep0we/atomic-agents-stack/issues/65) | `CorpusBackend` | Planned | Wiki/raw corpus at GB scale, semantic search, RAG retrieval | | [#201](https://github.com/dep0we/atomic-agents-stack/issues/201) | `MCPServerRegistryBackend` | Planned | Catalog + install/audit for MCP servers (the MCP equivalent of the ToolRegistry pattern) | -**v1.0 ships when both remaining backends land + their conformance suites pin the contract.** Same agent definitions, same `agent.call()` flow, same audit trail -- different backends registered. +**v1.0 ships when the remaining backend lands + its conformance suite pins the contract.** Same agent definitions, same `agent.call()` flow, same audit trail -- different backends registered. --- diff --git a/atomic_agents/corpus/__init__.py b/atomic_agents/corpus/__init__.py index 67074dd..2112e55 100644 --- a/atomic_agents/corpus/__init__.py +++ b/atomic_agents/corpus/__init__.py @@ -8,7 +8,7 @@ PolicyBackend (#89, shipped), and PersonaBackend (#62, shipped). See ``docs/spec/34-corpus-backend.md`` for the prose contract. -Public surface (scaffolding PR -- no behavior change today): +Public surface: from atomic_agents.corpus import ( # Protocol contract @@ -31,7 +31,7 @@ Log + Lock + Profile registries it stores backend *classes*, not instances -- corpus backends are constructed per agent scope and the registry's job is to let an operator pick "filesystem vs sqlite vs -pgvector" for a deployment. The caller (``AtomicAgent.__init__`` in PR 3) +pgvector" for a deployment. The caller (``AtomicAgent.__init__``) instantiates the chosen class with its scope-specific args. 
Thread-safety: registration is expected at import time (one-shot from @@ -159,32 +159,24 @@ def list_corpus_backends() -> list[str]: register_corpus_backend("sqlite", SQLiteCorpusBackend) -# ──────────────────────────────────────────────────────────────────── -# PR 3 wiring contract -- PRE-PR-3 state (describes what WILL be wired -# by PR 3; today, in PR 1 scaffolding, none of these call sites exist). -# -# Wired by #65 PR 3: -# 1. ``AtomicAgent.__init__`` accepts ``corpus_backend: -# CorpusBackend | None``; if unset, calls -# ``get_default_corpus_backend(self.agent_root)``. Public -# ``self.corpus_backend`` mirrors ``self.log_backend`` / -# ``self.profile_backend``. -# 2. ``agent.py:2937-2939`` wiki-index read routes through -# ``self.corpus_backend.render_index_summary("wiki")`` instead -# of the raw ``Path.read_text()`` call. -# 3. ``bundle.py:_render_memory_breakpoint`` routes through -# ``corpus_backend.render_index_summary("wiki")``. -# 4. ``DreamRunner``, ``OutcomeRunner``, ``EvalRunner`` accept -# ``corpus_backend=`` kwarg and thread it to the internal -# ``AtomicAgent`` instance. -# 5. ``doctor.check_corpus_backend`` validates operator config and -# reports backend stats (page count cliff WARN at ~1000 pages -# without FTS per plan-eng-review 2026-05-29 finding P1); -# URL-credential-redacted error messages. +# Wiring contract (all items below landed in #65 PR 3, locked at #65 PR 4): +# - AtomicAgent.__init__ accepts the corpus_backend kwarg + resolves the default +# via get_default_corpus_backend(self.agent_root) when not supplied. +# - OutcomeRunner, EvalRunner, DreamRunner all accept corpus_backend per-runner +# kwargs (OutcomeRunner threads at outcome.py:255, EvalRunner at eval.py:363, +# DreamRunner stores as self._corpus_backend for API parity). 
+# - delegate.py threads corpus_backend ONLY when supplied explicitly via the +# AtomicAgent constructor kwarg (_corpus_backend_was_explicit flag tracking). +# - ATOMIC_AGENTS_CORPUS_BACKEND env var + optional ATOMIC_AGENTS_CORPUS_BACKEND_URL +# resolve via get_default_corpus_backend. +# - doctor.check_corpus_backend lands with PASS/WARN/FAIL ladder + page-count cliff. +# - agent.py:_load_indexes() routes wiki/INDEX.md through render_index_summary("wiki"). +# - bundle.py:_render_memory_breakpoint accepts corpus_backend parameter. # # DEFERRED (intentional): -# - ``SQLiteCorpusBackend`` with FTS5 -- PR 2 scope. -# - Semantic search (pgvector, embedding provider) -- PR 2+ scope. +# - Semantic search (pgvector, embedding provider): ships in the coordinated #258 +# Postgres-adapter family release alongside PgvectorMemoryBackend so semantic- +# search coverage stays symmetric across both substrates. def get_default_corpus_backend(agent_root: Path) -> CorpusBackend: @@ -215,8 +207,7 @@ def get_default_corpus_backend(agent_root: Path) -> CorpusBackend: For programmatic operators who want to construct the backend themselves (custom database connection, custom path, etc.), the - ``AtomicAgent(..., corpus_backend=...)`` constructor kwarg (wired - in PR 3) bypasses this factory entirely. + ``AtomicAgent(..., corpus_backend=...)`` constructor kwarg bypasses this factory entirely. See spec/34 ยง"Operator override surface" for the full env-var reference + the env-var-vs-kwarg trade-off rationale. diff --git a/atomic_agents/corpus/backend.py b/atomic_agents/corpus/backend.py index 2644a28..db654bc 100644 --- a/atomic_agents/corpus/backend.py +++ b/atomic_agents/corpus/backend.py @@ -11,15 +11,8 @@ framework core stays small and alternate storage substrates (SQLite-FTS5, Postgres, pgvector) drop in without forking. 
-Shipping plan (issue #65): - -- **PR 1** (this file) -- Protocol scaffolding: ``CorpusBackend`` Protocol + - dataclasses (``types.py``) + exception hierarchy (``exceptions.py``) + - ``FilesystemCorpusBackend`` reference impl (``filesystem.py``) + backend - registry (``__init__.py``) + ``spec/34`` RFC + conformance suite. -- **PR 2** -- ``SQLiteCorpusBackend`` with FTS5 + hybrid storage shape. -- **PR 3** -- Wiring through ``AtomicAgent`` + call-site migration. -- **PR 4** -- spec/34 LOCK; RFC banner dropped; Implementer Contract finalized. +All four PRs of issue #65 have shipped. spec/34 is locked. The 9-MUST Implementer +Contract is final. See ``docs/spec/34-corpus-backend.md`` for the normative contract. ``VersionRef`` and ``WritePolicy`` are re-used verbatim from ``atomic_agents/memory/backend.py`` for cross-Protocol uniformity (Premise P7). @@ -154,11 +147,11 @@ def render_index_summary( ``FilesystemCorpusBackend``; raw corpora typically have no INDEX equivalent. The empty-string contract lets callers branch on truthiness the same way they branch on the legacy ``wiki_index.read_text()`` - direct-read pattern, so the PR 3 call-site migration at + direct-read pattern, so the call-site migration at ``agent.py:2937-2939`` and ``bundle.py:_render_memory_breakpoint`` is a single ``if self.corpus_backend is None:`` branch. - This is the primary migration target for ``agent.py:2937-2939`` (PR 3). + This is the primary migration target for ``agent.py:2937-2939``. """ ... diff --git a/atomic_agents/corpus/types.py b/atomic_agents/corpus/types.py index 9e22f91..6f92821 100644 --- a/atomic_agents/corpus/types.py +++ b/atomic_agents/corpus/types.py @@ -1,6 +1,6 @@ """Canonical dataclasses for the CorpusBackend Protocol (spec/34). 
-Four dataclasses define the corpus substrate contract (issue #65, PR 1 of 4): +Four dataclasses define the corpus substrate contract (issue #65, spec/34): - ``CorpusCapabilities`` -- frozen capability advertisement for a backend instance; conformance tests assert claim-vs-behavior parity. @@ -20,12 +20,9 @@ without creating cross-module import cycles. No Protocol definition lives here. The ``CorpusBackend`` Protocol is in -``atomic_agents/corpus/backend.py`` (PR 1, File 2 of 3). +``atomic_agents/corpus/backend.py``. + -Scaffolding PR (#65 PR 1 of 4): no call site routes through the Protocol -yet. ``AtomicAgent.__init__`` is unchanged. PR 3 wires the bootstrap path; -these types exist so PR 1's ``FilesystemCorpusBackend`` + conformance suite -have a stable contract to build against. """ from __future__ import annotations @@ -66,7 +63,7 @@ class CorpusCapabilities: ``supports_versioning``: True if the snapshot trio (``snapshot`` / ``restore_version`` / ``list_versions``) is fully implemented. ``FilesystemCorpusBackend`` and ``SQLiteCorpusBackend`` - both set this True in PR 1 / PR 2 respectively. + both set this True. ``supports_streaming_iteration``: True if the backend prefers chunked iteration for ``list_pages()`` over in-memory collection. The @@ -193,9 +190,9 @@ class CorpusPage: # ── Raw-side fields ─────────────────────────────────────────────────── # Per issue #65's stated schema. Wiki pages typically don't carry these; # raw docs typically do. NOTE: no sample raw-doc frontmatter exists in - # docs/samples/caldwell/raw/ (Subagent 2 HIGH H4 -- raw-side field shape - # is a design assumption until real raw sample data is added at PR 1 prep - # or contributed by operators). Accept as provisional for v1.0. + # docs/samples/caldwell/raw/; the raw-side field shape is locked at v1.0 + # against issue #65's stated schema. 
Operator-contributed raw sample data + # could surface refinements for v1.1. source_url: str | None = None # provenance URL for ingested content mime_type: str | None = None # original document MIME type (e.g. "application/pdf") diff --git a/docs/deployment/programmatic.md b/docs/deployment/programmatic.md index 7a7b45c..e4fefbc 100644 --- a/docs/deployment/programmatic.md +++ b/docs/deployment/programmatic.md @@ -724,15 +724,15 @@ public surface) is internal. Specifically: The protocol pattern from [`spec/20-memory-backend.md`](../spec/20-memory-backend.md) is the -template every backend follows. Ten backend protocols have shipped +template every backend follows. Eleven backend protocols have shipped (`MemoryBackend`, `LLMBackend`, `JudgeBackend`, `LockBackend`, `LogBackend`, `AgentProfileBackend`, `ToolRegistryBackend`, -`MandateBackend`, `PolicyBackend`, `PersonaBackend`); two remain for -v1.0 (`CorpusBackend`, `MCPServerRegistryBackend`). Each ships with a +`MandateBackend`, `PolicyBackend`, `PersonaBackend`, `CorpusBackend`); +one remains for v1.0 (`MCPServerRegistryBackend`). Each ships with a filesystem reference impl, a conformance suite, an operator override env var + constructor kwarg, a `doctor.check_` coherence check, and a numbered spec doc (`spec/20`, `21`, `22`, `24`, `25`, -`28`, `29`, `31`, `32`, `33`). Future SaaS / Postgres / git adapters +`28`, `29`, `31`, `32`, `33`, `34`). Future SaaS / Postgres / git adapters register via the matching `register__backend()` entry point without forking core. diff --git a/docs/methodology.md b/docs/methodology.md index 1a1d9fd..4b94fc4 100644 --- a/docs/methodology.md +++ b/docs/methodology.md @@ -9,10 +9,10 @@ The shape of the project so far (snapshot at the time of original capture, 2026-05-09): 4 published tags (v0.1.0 retroactive, v0.9.0 retroactive, v0.10.0, v0.13.0), ~70 merged PRs, ~1327 tests, no production rollback events. 
Three backend protocols shipped at that point (MemoryBackend, -LLMBackend, JudgeBackend); today ten are shipped (MemoryBackend, +LLMBackend, JudgeBackend); today eleven are shipped (MemoryBackend, LLMBackend, JudgeBackend, LockBackend, LogBackend, AgentProfileBackend, -ToolRegistryBackend, MandateBackend, PolicyBackend, PersonaBackend) with -parametrized conformance suites and 2686+ tests -- see the empirical +ToolRegistryBackend, MandateBackend, PolicyBackend, PersonaBackend, +CorpusBackend) with parametrized conformance suites and 2937+ tests -- see the empirical record table below for arc-by-arc evidence of how the methodology held across them. diff --git a/docs/spec/01-anatomy.md b/docs/spec/01-anatomy.md index 2d7942e..61424cf 100644 --- a/docs/spec/01-anatomy.md +++ b/docs/spec/01-anatomy.md @@ -370,6 +370,8 @@ sources: `raw/` holds original ingested documents -- unedited. The Wiki is *derivative*; raw is *primary*. Lets you re-derive the wiki, audit claims, or detect drift. +**CorpusBackend (spec/34).** When registered, `CorpusBackend` becomes the storage abstraction for `wiki/` and `raw/`. The filesystem default (`FilesystemCorpusBackend`) preserves the layout described above byte-for-byte. `SQLiteCorpusBackend` with FTS5 provides O(log N) indexed full-text query for operators with large corpora (10K+ pages or hundreds of MB of raw documents). See spec/34 for the Protocol contract and the 9-MUST Implementer Contract. + --- ## journal/ diff --git a/docs/spec/02-atomic-memory.md b/docs/spec/02-atomic-memory.md index a7c5ecf..6b76841 100644 --- a/docs/spec/02-atomic-memory.md +++ b/docs/spec/02-atomic-memory.md @@ -61,6 +61,8 @@ They differ in three ways that matter: If you collapse them into one pool, the agent can't tell the difference between "I learned this from the user" and "I read this in a book." That distinction matters at advice-time. 
+**CorpusBackend (spec/34).** The `wiki/` and `raw/` layers are abstracted by `CorpusBackend` when registered (locked at #65 PR 4 of 4). `MemoryBackend` (spec/20) retains exclusive ownership of `memory/` and `journal/`; `CorpusBackend` owns `wiki/` and `raw/`. They compose at prompt assembly: `agent.py:_load_indexes()` reads from both. See spec/34 for the Protocol contract. + --- ## INDEX-driven recall (the load-bearing trick) diff --git a/docs/spec/04-runtime-assembly.md b/docs/spec/04-runtime-assembly.md index 30c8ad9..945084e 100644 --- a/docs/spec/04-runtime-assembly.md +++ b/docs/spec/04-runtime-assembly.md @@ -25,6 +25,8 @@ Every Atomic Agent runtime -- cron job, Claude skill, openclaw gateway, anythin This order is **mandatory**. Every runtime must follow it. The order is what gives Atomic Agents their cross-runtime equivalence -- the same agent behaves the same way whether driven by cron or skill or openclaw because the system prompt is built identically. +**CorpusBackend integration (spec/34).** When `CorpusBackend` is registered (locked at #65 PR 4 of 4), step [7] routes through `corpus_backend.render_index_summary("wiki")` instead of a bare `Path.read_text()` on `wiki/INDEX.md`. The assembled context is identical; the Protocol seam makes the storage substrate swappable. Operators with `ATOMIC_AGENTS_CORPUS_BACKEND=sqlite` see the same INDEX rendering at O(log N) query cost instead of unindexed filesystem scan. + **Step [3.5] -- goal context**: For reactive agents (most agents), this step is skipped. For goal-driven agents, the active `goal.md` is loaded between persona and tools/memory -- placed there so the goal becomes part of the agent's "anchored context" that shapes everything below it. For hybrid agents, the runtime decides per invocation: skill triggers skip step [3.5] (reactive mode); cron triggers load it (goal-driven mode). See [12-goals-and-intent](12-goals-and-intent.md) for the full goal-driven mechanics.
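The step [7] routing described in the CorpusBackend integration note above can be sketched as follows. This is a minimal illustration, not the framework's actual code: `load_wiki_index` and `CorpusBackendLike` are hypothetical names, and returning an empty string when `wiki/INDEX.md` is absent on the legacy path is an assumption made so both branches share the same truthiness contract. Only `render_index_summary(corpus="wiki")` and the "fall back to the direct read when no backend is supplied" shape come from the spec text.

```python
from pathlib import Path
from typing import Optional, Protocol


class CorpusBackendLike(Protocol):
    """Hypothetical stand-in exposing only the method this sketch needs."""

    def render_index_summary(self, corpus: str) -> str: ...


def load_wiki_index(
    agent_root: Path,
    corpus_backend: Optional[CorpusBackendLike] = None,
) -> str:
    """Return the wiki INDEX text used during prompt assembly (illustrative)."""
    if corpus_backend is None:
        # Legacy path: direct read of wiki/INDEX.md.
        # Assumption: a missing file yields "" rather than raising.
        index_path = agent_root / "wiki" / "INDEX.md"
        if index_path.is_file():
            return index_path.read_text(encoding="utf-8")
        return ""
    # Protocol path: the backend decides how the INDEX is produced
    # (file read, or synthesized from page metadata).
    return corpus_backend.render_index_summary(corpus="wiki")
```

Keeping the branch to a single `is None` check means callers that never pass a backend take exactly the pre-existing read path, which is the property the byte-identity regression assertions are meant to pin.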
--- diff --git a/docs/spec/24-agent-profile-backend.md b/docs/spec/24-agent-profile-backend.md index d859a70..64e5d3a 100644 --- a/docs/spec/24-agent-profile-backend.md +++ b/docs/spec/24-agent-profile-backend.md @@ -120,7 +120,7 @@ For cascaded agents (`/projects//agents//`), `FilesystemA `_load_indexes()` (`agent.py:2031`) reads `wiki/INDEX.md` for system-prompt assembly, but this is memory-layer state, not identity-layer config. Memory is already abstracted via `MemoryBackend` (spec/20). -**Why:** scope discipline. `AgentProfile` is the identity/config layer; including wiki content would conflate config with memory state and create two backends with overlapping write responsibilities. Memory backend stays the source of truth for `wiki/`, `memory/`, `journal/`. Profile backend stays for `persona/`, config files, skills. +**Why:** scope discipline. `AgentProfile` is the identity/config layer; including wiki content would conflate config with memory state and create two backends with overlapping write responsibilities. Memory backend stays the source of truth for `memory/` and `journal/`. Profile backend stays for `persona/`, config files, skills. When `CorpusBackend` is registered (spec/34, locked at #65 PR 4 of 4), it becomes the source of truth for `wiki/` and `raw/`; `MemoryBackend` retains exclusive ownership of `memory/` and `journal/`. ## Canonical types diff --git a/docs/spec/26-cascade-bundle.md b/docs/spec/26-cascade-bundle.md index 0ad3c50..3dad39d 100644 --- a/docs/spec/26-cascade-bundle.md +++ b/docs/spec/26-cascade-bundle.md @@ -91,7 +91,7 @@ Bundles are derived state. Per CLAUDE.md rule #2, protocols exist for storage pr **Future composition.** When `PersonaBackend` (#62) ships, the persona section of the bundle should route through it instead of reading the filesystem directly. That's a small follow-up: replace the persona-file reads in `bundle._render_cascaded` with `persona_backend.load_persona(agent_id)` calls. 
The bundle's public interface (`render_bundle`, `BundleResult`, the cache layout, the CLI flags) stays stable. Bundle file format stays stable. -Similarly, when `CorpusBackend` ships (#65), wiki INDEX rendering can route through it. None of these compositions require a bundle protocol -- they're implementation-internal edits to `bundle.py`. +Similarly, now that `CorpusBackend` has shipped (spec/34, locked at #65 PR 4 of 4), wiki INDEX rendering routes through `corpus_backend.render_index_summary("wiki")` per the `bundle.py` composition note in spec/34. None of these compositions require a bundle protocol -- they're implementation-internal edits to `bundle.py`. The non-decision: we are NOT creating `BundleBackend` for alternate cache substrates (SQLite-backed bundle storage, S3-backed, etc.). The bundle cache is local-disk-per-operator-machine by design (Decision 1). @@ -242,7 +242,7 @@ The bundle contains your full cascade in canonical spec/04 + spec/06 order. | Future protocol | What it changes in the bundle | What stays | | --- | --- | --- | | `PersonaBackend` (#62) | Persona section reads route through `persona_backend.load_persona(agent_id)` instead of direct filesystem reads | Bundle format, CLI flags, cache layout, section ordering | -| `CorpusBackend` (#65) | Wiki INDEX rendering routes through the corpus backend's INDEX accessor | Bundle format | +| `CorpusBackend` (spec/34, locked at #65 PR 4) | Wiki INDEX rendering routes through the corpus backend's `render_index_summary("wiki")` accessor | Bundle format | | `PolicyBackend` (#89) | Project `policy/*` concatenation routes through the policy backend | Bundle format | | `MCPServerRegistryBackend` (#201) | Not relevant -- bundle doesn't include MCP server registration content | | diff --git a/docs/spec/31-llm-backend.md b/docs/spec/31-llm-backend.md index 0713f4a..87ec003 100644 --- a/docs/spec/31-llm-backend.md +++ b/docs/spec/31-llm-backend.md @@ -8,7 +8,7 @@ `atomic_agents/_llm.py` used procedural
dispatch (`if model.startswith(...)`) for LLM provider routing. Adding a fourth provider (Gemini, Bedrock, Ollama) meant editing core. The `LLMBackend` Protocol replaces that with the same protocol-pattern shape PR #57 established for `MemoryBackend`: a Protocol contract, canonical types that decouple the agent from provider shapes, a registry, and reference implementations. -The shape is the same as the rest of the protocol-pattern series alongside Lock, Log, Persona, AgentProfile, ToolRegistry, and Corpus protocols. The agent runtime -- `agent.call()`, the cost gates, the multi-turn tool loop -- talks to LLM providers only through canonical types. Backends translate at their own boundaries. Third-party packages implementing the Protocol drop in without forking core. +The shape is the same as the rest of the protocol-pattern series alongside Lock, Log, Persona, AgentProfile, ToolRegistry, and Corpus (spec/34) protocols. The agent runtime -- `agent.call()`, the cost gates, the multi-turn tool loop -- talks to LLM providers only through canonical types. Backends translate at their own boundaries. Third-party packages implementing the Protocol drop in without forking core. The framework ships three reference backends. Operators wanting Gemini, Vertex, Bedrock, vLLM-local, etc., either configure a fourth `OpenAICompatibleLLMBackend` instance or ship a 200-line third-party `atomic-agents-` package satisfying the Protocol. The framework's own surface stays small and auditable. diff --git a/docs/spec/34-corpus-backend.md b/docs/spec/34-corpus-backend.md index 851979f..ce625f9 100644 --- a/docs/spec/34-corpus-backend.md +++ b/docs/spec/34-corpus-backend.md @@ -1,12 +1,6 @@ # spec/34 -- CorpusBackend Protocol -> **Status:** RFC (Request for Comments) -- ships with PR 1 of issue #65. Will be LOCKED at PR 4 after operator-facing surfaces ship.
-> -> **Shipping plan across the arc:** -> - **PR 1** -- Protocol scaffolding + dataclasses + exception hierarchy + `FilesystemCorpusBackend` reference impl + versioning layout + backend registry + this RFC spec (draft Implementer Contract with provisional MUSTs) + ~25 parametrized conformance tests + ~10 fs-specific tests + `atomic-agents corpus` CLI subcommands (filesystem path only). -> - **PR 2** -- `SQLiteCorpusBackend` with FTS5 (SQLite full-text-search, stdlib) + hybrid storage shape (metadata in SQL, bodies on disk) + URL factory + WAL race fix + ~35 SQLite-specific tests. CLI SQLite path activates here. -> - **PR 3** -- Wiring through `AtomicAgent`, per-runner kwargs on `OutcomeRunner` / `EvalRunner` / `DreamRunner`, delegate threading, `ATOMIC_AGENTS_CORPUS_BACKEND` env var + constructor kwarg, `doctor.check_corpus_backend` with PASS/WARN/FAIL ladder + page-count performance cliff WARN, call-site migration (`agent.py:2937-2939` + `bundle.py:_render_memory_breakpoint`), IRON RULE regression suite for all 5 migration assertions. ~30 wiring tests. -> - **PR 4** -- spec/34 LOCK (RFC banner dropped, Implementer Contract finalized at N MUSTs, per-PR markers folded). Status flip to "Eleven of twelve backend protocols shipped." CLAUDE.md architecture diagram + 11th lock-paragraph. README backend-protocols table row flipped to Shipped. Both ROADMAPs refreshed (repo-root + vault). CHANGELOG arc-closer entry. +> **Status:** LOCKED. CorpusBackend is the eleventh backend protocol locked in the atomic-agents-stack series. --- @@ -14,7 +8,7 @@ `CorpusBackend` is the **eleventh** open Protocol in the protocol-pattern series (Memory, LLM, Judge, Lock, Log, AgentProfile, ToolRegistry, Mandate, Policy, Persona, **Corpus**).
It abstracts `/wiki/` (Atomic Wiki -- distilled knowledge in the Karpathy style) and `/raw/` (source documents -- PDFs, transcripts, operator-ingested content) behind a Protocol so the framework's core stays small and alternate storage substrates (SQLite-FTS5, Postgres, pgvector) drop in without forking. -`AtomicAgent` exposes `agent.corpus: CorpusBackend`. Call-site code stops touching wiki or raw paths directly once PR 3 lands. +`AtomicAgent` exposes `agent.corpus: CorpusBackend`. Call-site code stops touching wiki or raw paths directly. **The problem this closes.** Today, `agent.py:2937-2939` reads `wiki/INDEX.md` with a bare `Path.read_text()`. `bundle.py:295-303 + 497-510` does the same for rendering. There is no Protocol between the agent and the corpus. For the home user with one agent and a handful of wiki pages, the direct walk is fine. For the operator with a 10K-page wiki or hundreds of MB of raw documents, keyword grep over an unindexed filesystem takes seconds per query. The GB-scale unlock is `SQLiteCorpusBackend` with FTS5: `O(log N)` indexed full-text query, stdlib dependency, no Postgres operator burden. @@ -42,7 +36,7 @@ | D10 | `query()` precedence rule | Semantic MUST win over FTS when both flags are True. Explicitly documented (finding A2). Prevents ambiguous behavior on Postgres backends that have both pgvector AND tsquery. | | D11 | `write_page()` 4-case behavior table | Fresh write / content-identical idempotent no-op / explicit overwrite via CAS / collision raises. Mirrors MemoryBackend `write_note` idempotency + CAS discipline. Finding CQ1 from /plan-eng-review 2026-05-29. | | D12 | `read_page` returns `None`, `read_version` raises | `read_page(name, corpus) -> CorpusPage | None` is the common-path "does this page exist?" query.
`read_version(version_ref) -> CorpusPage` raises `CorpusVersionNotFound` because a missing version body indicates an unexpected infrastructure failure (SQL row exists but on-disk body file is gone under the hybrid storage shape), not a routine presence check. Matches MemoryBackend precedent: `read_note` returns None; `read_version` raises. | -| D13 | `bundle.py:_source_paths` migration deferred to v1.1 | The function returns filesystem paths for staleness tracking. SQLite backends synthesize the INDEX from page metadata and have no equivalent path to return. v1.0 keeps the direct path check. Follow-up issue filed at PR 4. | +| D13 | `bundle.py:_source_paths` migration deferred to v1.1 | The function returns filesystem paths for staleness tracking. SQLite backends synthesize the INDEX from page metadata and have no equivalent path to return. v1.0 keeps the direct path check. Follow-up issue filed at #314. | --- @@ -56,8 +50,6 @@ atomic_agents/corpus/ ├── types.py # CorpusCapabilities, CorpusRef, CorpusPage, CorpusStats │ # VersionRef + WritePolicy re-used from MemoryBackend └── filesystem.py # FilesystemCorpusBackend (default reference impl) - -# SQLite ships in PR 2: └── sqlite.py # SQLiteCorpusBackend (FTS5 reference impl) ``` @@ -236,7 +228,7 @@ class CorpusBackend(Protocol): # exists but has no INDEX.md analog (raw corpora typically have no # INDEX; the empty-string contract lets callers branch on truthiness # the same way they branch on the legacy direct-read pattern). - # This is the primary migration target for agent.py:2937-2939 (PR 3). + # This is the primary migration target for agent.py:2937-2939. # ─── Write operations ───────────────────────────────────────────── @@ -393,7 +385,7 @@ Versioning mirrors MemoryBackend's `.versions/` pattern (spec/20:228-233).
Cross `_<8hex>` snapshot file name format matches `memory/filesystem.py`'s `_version_filename` helper verbatim (prep-pass checklist item 6). `FilesystemCorpusBackend` MUST copy that helper, not reinvent it. -**SQLite hybrid layout (PR 2):** +**SQLite hybrid layout:** SQL stores metadata and FTS5 index. Page bodies and version snapshots live on disk under `///` following the same path structure as the filesystem reference. This keeps the SQL store small and version snapshot listing to a simple `glob` without a JOIN. The INSERT-first + `atomic_write`-on-success-only atomicity pattern from ToolRegistryBackend (spec/25, #64 PR 3 precedent) prevents orphan SQL rows on disk-write failure. @@ -488,13 +480,13 @@ When `ATOMIC_AGENTS_CORPUS_BACKEND=sqlite` is set without a URL, the default res The `AtomicAgent(..., corpus_backend=...)` constructor kwarg always wins over the env var (programmatic path beats environment). -**Per-runner kwargs (PR 3 -- implemented):** +**Per-runner kwargs:** -`OutcomeRunner`, `EvalRunner`, and `DreamRunner` accept `corpus_backend=...` constructor kwargs that thread through to internal sub-agents. Implemented in #65 PR 3 of 4: `OutcomeRunner` threads at `outcome.py:255`, `EvalRunner` at `eval.py:363`, `DreamRunner` stores as `self._corpus_backend` (no internal `AtomicAgent` construction site in v1). +`OutcomeRunner`, `EvalRunner`, and `DreamRunner` accept `corpus_backend=...` constructor kwargs that thread through to internal sub-agents. `OutcomeRunner` threads at `outcome.py:255`, `EvalRunner` at `eval.py:363`, `DreamRunner` stores as `self._corpus_backend` (no internal `AtomicAgent` construction site in v1). -**`delegate.py` threading (PR 3 -- implemented):** +**`delegate.py` threading:** -`delegate.py` threads `corpus_backend` ONLY when the operator supplied it explicitly via the `AtomicAgent(..., corpus_backend=...)` kwarg (`_corpus_backend_was_explicit` flag tracked at `agent.py` construction). 
Default-resolved backends do not leak the coordinator's `agent_root` to delegates. Mirrors PersonaBackend's `D-ER-2` pattern (spec/33 §"`delegate.py` threading"). Corpus is per-agent semantic context -- distinct from fleet-scoped Policy + AgentProfile, which always thread. Operators who want a shared corpus backend across a coordinator and its delegates pass `corpus_backend=` explicitly. Implemented in #65 PR 3 of 4. +`delegate.py` threads `corpus_backend` ONLY when the operator supplied it explicitly via the `AtomicAgent(..., corpus_backend=...)` kwarg (`_corpus_backend_was_explicit` flag tracked at `agent.py` construction). Default-resolved backends do not leak the coordinator's `agent_root` to delegates. Mirrors PersonaBackend's `D-ER-2` pattern (spec/33 §"`delegate.py` threading"). Corpus is per-agent semantic context -- distinct from fleet-scoped Policy + AgentProfile, which always thread. Operators who want a shared corpus backend across a coordinator and its delegates pass `corpus_backend=` explicitly. --- @@ -569,7 +561,7 @@ All page writes and all version snapshot writes go through `_io.atomic_write`. N --- -## `SQLiteCorpusBackend` storage layout (PR 2) +## `SQLiteCorpusBackend` storage layout ```python SQLiteCorpusBackend( @@ -682,27 +674,27 @@ Implementation checklist (sourced from prep-pass SEVERE S1): A backend that implements the `CorpusBackend` Protocol commits to the contract below. The reference `FilesystemCorpusBackend` is the canonical example; `SQLiteCorpusBackend` is the second reference impl; future Postgres / pgvector / SaaS adapters slot in via `register_corpus_backend(...)` without forking core. -**This contract is provisional at PR 1 RFC and will be finalized at PR 4 LOCK.** The MUST count follows the 7-8 MUST range of prior arcs (spec/22: 7, spec/24: 8, spec/25: 8, spec/29: 8, spec/32: 7, spec/33: 8). The nine categories enumerated here map to the finalized MUSTs at PR 4.
+The MUST count follows the 7-8 MUST range of prior arcs (spec/22: 7, spec/24: 8, spec/25: 8, spec/29: 8, spec/32: 7, spec/33: 8). CorpusBackend locks at 9 because the FTS5 / semantic / substring `query()` precedence rule is an additional cross-cutting contract that prior arcs without a capability-gated query path did not require. Implementers MUST: -1. **`name` and `corpus` charset validation at API boundary.** Every Protocol method that accepts `name` validates it against `[a-zA-Z0-9_.+@-]+` BEFORE any storage or dict access. Reject path-traversal tokens (`..`, `/`, `\`), control characters (`\x00`-`\x1f`, `\x7f`), newlines, leading dots, and empty strings -- raise `CorpusInvalidName`. Every method that accepts `corpus` validates `corpus in ("wiki", "raw")` -- raise `ValueError` otherwise. The validation is at the API boundary, not inside storage helpers; callers that bypass it violate the contract. Reference: `_validate_corpus_name` + `_validate_corpus_type` in `corpus/filesystem.py`. +1. **`name` and `corpus` charset validation at API boundary.** Every Protocol method that accepts `name` validates it against `[a-zA-Z0-9_.+@-]+` BEFORE any storage or dict access. Reject path-traversal tokens (`..`, `/`, `\`), control characters (`\x00`-`\x1f`, `\x7f`), newlines, leading dots, and empty strings; raise `CorpusInvalidName`. Every method that accepts `corpus` validates `corpus in ("wiki", "raw")`; raise `ValueError` otherwise. The validation is at the API boundary, not inside storage helpers; callers that bypass it violate the contract. Reference: `_validate_corpus_name` + `_validate_corpus_type` in `corpus/filesystem.py`. -2. **Side-effect-free construction.** Backend `__init__` MUST NOT stat the filesystem, query a database, call an external API, or read any environment variable. The first method call performs lazy initialization. Malformed operator config surfaces on the first method call, not at construction.
Preserves the framework's byte-identical-construction promise for the 166 existing `AtomicAgent(...)` test sites. Profile's "validate existence" pattern is the WRONG precedent here -- corpus directories may legitimately not exist on fresh agents. +2. **Side-effect-free construction.** Backend `__init__` MUST NOT stat the filesystem, query a database, call an external API, or read any environment variable. The first method call performs lazy initialization. Malformed operator config surfaces on the first method call, not at construction. Preserves the framework's byte-identical-construction promise for the existing `AtomicAgent(...)` test sites. Profile's "validate existence" pattern is the WRONG precedent here; corpus directories may legitimately not exist on fresh agents. -3. **Capability honesty.** `capabilities -> CorpusCapabilities` is a contract, not a hint. Backends declaring `supports_versioning=False` MUST raise `NotImplementedError` on `list_versions`, `read_version`, `restore_version`, and `snapshot`. Backends declaring `supports_full_text_search=True` MUST use indexed FTS (FTS5 / tsquery / equivalent) in `query()` when `supports_semantic_search` is False. Backends that advertise a flag True but do not implement the corresponding behavior produce silent failures rather than loud refusals; conformance tests gate on capability flags. +3. **Capability honesty.** `capabilities -> CorpusCapabilities` is a contract, not a hint. Backends declaring `supports_versioning=False` MUST raise `NotImplementedError` on `list_versions`, `read_version`, `restore_version`, and `snapshot`. Backends declaring `supports_full_text_search=True` MUST use indexed FTS (FTS5 / tsquery / equivalent) in `query()` when `supports_semantic_search` is False. `embedding_provider` MUST be `None` when `supports_semantic_search=False`.
Backends that advertise a flag True but do not implement the corresponding behavior produce silent failures rather than loud refusals; conformance tests gate on capability flags. -4. **`query()` capability precedence rule (finding A2).** When `supports_semantic_search=True`, the backend MUST use embedding-vector cosine match and MUST NOT exercise FTS infrastructure on that code path, even if `supports_full_text_search` is also True. When `supports_semantic_search=False` and `supports_full_text_search=True`, the backend MUST use indexed full-text search. When both are False, the backend MUST fall back to case-insensitive substring + frontmatter-tag match, ordered by match count. No caller-choice override in v1.0; v1.1+ may add a `mode` kwarg if operators surface a need. +4. **`query()` capability precedence rule.** When `supports_semantic_search=True`, the backend MUST use embedding-vector cosine match and MUST NOT exercise FTS infrastructure on that code path, even if `supports_full_text_search` is also True. When `supports_semantic_search=False` and `supports_full_text_search=True`, the backend MUST use indexed full-text search. When both are False, the backend MUST fall back to case-insensitive substring + frontmatter-tag match, ordered by match count. No caller-choice override in v1.0; v1.1+ may add a `mode` kwarg if operators surface a need. -5. **`write_page()` 4-case behavior table (finding CQ1).** (a) Fresh write: page does not exist -- write via `_io.atomic_write`, update INDEX-equivalent. (b) Content-identical idempotent no-op: page exists, body + frontmatter SHA-256 unchanged -- no-op, safe for re-delivery. (c) Explicit overwrite via CAS: page exists, content differs, `expected_content_sha256` matches current hash -- snapshot if `supports_versioning=True`, write new content, update INDEX-equivalent.
(d) Collision: page exists, content differs, `expected_content_sha256` is None -- raise `CorpusPageExists`; content differs, hash supplied but mismatched -- raise `CorpusPreconditionFailed`. CAS via `expected_content_sha256` is the ONLY safe overwrite path; silent overwrite without operator intent is refused by default. +5. **`write_page()` 4-case behavior table.** (a) Fresh write: page does not exist, write via `_io.atomic_write`, update INDEX-equivalent. (b) Content-identical idempotent no-op: page exists, body + frontmatter SHA-256 unchanged, no-op, safe for re-delivery. (c) Explicit overwrite via CAS: page exists, content differs, `expected_content_sha256` matches current hash, snapshot if `supports_versioning=True`, write new content, update INDEX-equivalent. (d) Collision: page exists, content differs, `expected_content_sha256` is None, raise `CorpusPageExists`; content differs, hash supplied but mismatched, raise `CorpusPreconditionFailed`. CAS via `expected_content_sha256` is the ONLY safe overwrite path; silent overwrite without operator intent is refused by default. -6. **URL credential redaction across all `ValueError` sites in factory functions.** URL factories and `get_default_corpus_backend` error paths MUST NOT echo raw URL credentials. The reference uses `_redact_url` from `persona/filesystem.py:177-197` that strips after `://` and truncates. Operators may accidentally paste `postgres://user:password@host/db` into env vars; the error message MUST NOT echo the password. Reference: 6 `ValueError` sites in `make_sqlite_corpus_backend_from_url` (non-sqlite scheme, netloc present, fragment present, duplicate query parameter, unknown query parameter, empty or root-only path) following the ToolRegistryBackend precedent. +6. **URL credential redaction across all operator-facing error paths.** URL factories, `get_default_corpus_backend`, and `doctor.check_corpus_backend` error paths MUST NOT echo raw URL credentials.
The reference impls use `_redact_url` (filesystem factory) and `_redact_for_error_message` (corpus/__init__.py) helpers that strip credentials after `://` and truncate. Operators may accidentally paste `postgres://user:password@host/db` into env vars; the error message MUST NOT echo the password. The SQLite URL factory covers 6 `ValueError` sites (non-sqlite scheme, netloc present, fragment present, duplicate query parameter, unknown query parameter, empty or root-only path), all redacted. 7. **Cross-corpus isolation at storage layer.** `wiki` and `raw` corpora are fully independent. SQLite backends MUST include `WHERE corpus = ?` (or equivalent) on every query; no `wiki` query may touch `raw` rows and vice versa. Filesystem backends enforce isolation geometrically via separate subdirectories. A conformance test verifies that writing a page to `corpus="wiki"` does not make it visible via `corpus="raw"` and vice versa. 8. **Snapshot id determinism and cross-page isolation.** Backend-issued `VersionRef` tokens MUST be monotonic or sortable, supporting `list_versions` returning chronological order. Cross-page isolation MUST be enforced at the storage layer: a `VersionRef` issued for page A MUST raise `CorpusVersionNotFound` when passed to `read_version` or `restore_version` in the context of page B. The `_<8hex>.md` filesystem version filename format (from `memory/filesystem.py:_version_filename`) provides this guarantee geometrically via the `/` subdirectory. -9. **`backend_id` property stable across calls.** The property MUST return the same string across calls and MUST match what `list_corpus_backends()` registered the class under. Backends with `backend_id="filesystem"` re-registered under `"sqlite"` violate conformance. +9. **`backend_id` property stable across calls; `close()` idempotent.** The `backend_id` property MUST return the same string across calls and MUST match what `list_corpus_backends()` registered the class under. 
Backends with `backend_id="filesystem"` re-registered under `"sqlite"` violate conformance. `close()` is a method-level contract documented at `corpus/backend.py:388-389` and MUST be idempotent: calling it twice MUST NOT raise. Database backends that hold connection pools MUST guard teardown on a `_closed` flag or equivalent. --- @@ -725,7 +717,7 @@ Decisions locked across the office-hours session (2026-05-29), /plan-eng-review | S4 | Regression suite for PR 3 | Zero test coverage exists today on the `wiki/INDEX.md` read path; all 9 existing wiki-touching integration tests create an empty `wiki/` dir and assert nothing about INDEX content. PR 3 IRON RULE regression suite is load-bearing for silent-corruption prevention. | | S5 | `atomic_write` non-negotiable | Every page write + version snapshot write via `_io.atomic_write`. Never `target.write_text()` directly. | | D-RC-1 | `read_page` vs `read_version` None/raise convention | `read_page` returns `None` (routine presence check). `read_version` raises `CorpusVersionNotFound` (unexpected infrastructure failure -- SQL row exists but body file gone). Matches MemoryBackend's `read_note` vs `read_version` convention. Documented in Protocol contract so conformance test authors don't disagree. | -| D-RC-2 | `bundle.py:_source_paths` deferred | Filesystem-only function; SQLite has no wiki/INDEX.md path to return. v1.0 keeps the direct path check. Follow-up issue filed at PR 4. | +| D-RC-2 | `bundle.py:_source_paths` deferred | Filesystem-only function; SQLite has no wiki/INDEX.md path to return. v1.0 keeps the direct path check. Follow-up issue filed at #314. | --- @@ -743,7 +735,7 @@ Items considered and explicitly deferred, with rationale. | `resolve_links(page) -> list[CorpusRef]` Protocol method | Wiki pages reference each other (`[[other-page]]`). Link resolution lives in the agent layer (consumer-side), not the Protocol. Keeps the Protocol surface tight.
| | Multi-modal corpus pages (PDFs binary, images, audio transcripts) | All pages are markdown text in v1.0. Binary substrate is a v1.1+ scope expansion via a new capability flag. | | Dream-distillation pipeline `write_page()` integration | No current framework-side wiki writes exist. `write_page()` ships in PR 1 for future-state use. The distillation pipeline migration is a separate arc. | -| `bundle.py:_source_paths` migration to Protocol | Filesystem-only function; SQLite has no equivalent path to track. Deferred to v1.1 with a follow-up issue filed at PR 4 (D13). | +| `bundle.py:_source_paths` migration to Protocol | Filesystem-only function; SQLite has no equivalent path to track. Deferred to v1.1 with a follow-up issue filed at #314 (D13). | | MCPServerRegistryBackend (#201) | Separate arc; v1.0 closes when this AND #65 ship. | --- @@ -773,6 +765,11 @@ Items considered and explicitly deferred, with rationale. - `_validate_corpus_type` refuses unknown corpus values - `atomic_write` usage (write fault injection verifies no partial state) - path-traversal refusal via `safe_resolve_under` +- 4 registry primitive tests in `tests/test_corpus_registry.py`: + - `register_corpus_backend` / `unregister_corpus_backend` round-trip and collision-replace semantics + - `get_corpus_backend` raises `CorpusBackendNotRegistered` on unknown id + - `list_corpus_backends` returns ids in lexicographic order + - `get_default_corpus_backend` honors the `ATOMIC_AGENTS_CORPUS_BACKEND` env var **PR 2 (~46 actual, 35 estimated):** @@ -803,7 +800,7 @@ Plus: per-runner kwargs, delegate threading (`_corpus_backend_was_explicit` flag --- -## Call-site migration reference (PR 3 -- implemented in #65 PR 3 of 4) +## Call-site migration reference The wiring contract described in this section is implemented. Both call sites migrated; the 5 IRON RULE regression assertions in `tests/test_corpus_migration_regression.py` pin the byte-identity guarantees. 
The `_source_paths` row remains deferred to v1.1 as documented. @@ -812,13 +809,13 @@ The wiring contract described in this section is implemented. Both call sites mi | `agent.py` | `AtomicAgent.__init__` (2937-2939) | `wiki_index.read_text()` direct `Path` read | `self.corpus.render_index_summary(corpus="wiki")` when `self.corpus_backend is not None`, else fall back to the direct read. Single `if self.corpus_backend is None:` branch. | | `agent.py` | `AtomicAgent` prompt assembly (3058-3059) | uses already-read `self._wiki_index_text` | unchanged (the read happens upstream at lines 2937-2939) | | `bundle.py` | `_render_memory_breakpoint(instance_root)` at line 494 | `wiki_dir / "INDEX.md"` direct path read at line 497 | function signature gains `corpus_backend: CorpusBackend | None = None`; when not None, calls `corpus_backend.render_index_summary(corpus="wiki")`; when None, falls back to the direct path. Callers thread the parameter through. Note: this is NOT a `getattr` pattern -- `bundle.py`'s call path has no `AtomicAgent` instance; the parameter is explicit. | -| `bundle.py` | `_source_paths(agent_root)` at line 266 | `wiki_dir / "INDEX.md"` appended to path list at line 295 | **Deferred to v1.1.** SQLite backends have no `wiki/INDEX.md` file path to track. v1.0 keeps the direct path check. Follow-up issue filed at PR 4. | +| `bundle.py` | `_source_paths(agent_root)` at line 266 | `wiki_dir / "INDEX.md"` appended to path list at line 295 | **Deferred to v1.1.** SQLite backends have no `wiki/INDEX.md` file path to track. v1.0 keeps the direct path check. Follow-up issue filed at #314. | **What does NOT need migration:** `dashboard/memory.py` does not touch wiki at all (verified). `agent.py:721` (`self._wiki_index_text: str = ""`): unchanged (it buffers the read result; the new read happens upstream at 2937-2939).
`migrate.py:233 + 595-606`: references wiki/raw paths as part of vault migration utilities — those stay outside CorpusBackend (migrate is a one-shot operator tool). -**Crucial fact:** there are NO writes to `/wiki/` or `/raw/` in the framework code today. Dream output writes to `dream_dir/report.md` (a separate output directory); operator additions are manual file edits. PR 3's call-site migration scope is writes of `render_index_summary` only — not write_page migrations, because there are no write sites to migrate. +**Crucial fact:** there are NO writes to `/wiki/` or `/raw/` in the framework code today. Dream output writes to `dream_dir/report.md` (a separate output directory); operator additions are manual file edits. The call-site migration scope is reads through `render_index_summary` only, not `write_page` migrations, because there are no write sites to migrate. -Both fallback shapes preserve byte-identical pre-#65 behavior. The PR 3 IRON RULE regression suite (5 explicit assertions above) is the enforcement mechanism. +Both fallback shapes preserve byte-identical pre-#65 behavior. The IRON RULE regression suite (5 explicit assertions above) is the enforcement mechanism. --- @@ -844,29 +841,10 @@ No failure mode is both silent AND lacking planned coverage. Every failure eithe --- -## PR 4 documentation-update checklist - -Sourced from /plan-subagent Subagent 4. Each PR 4 implementer ticks these off. - -- `docs/spec/34-corpus-backend.md` — Drop RFC banner; add LOCKED status; finalize N MUSTs. -- `docs/spec/24-agent-profile-backend.md:123` — Update Decision 7 to acknowledge CorpusBackend ownership of wiki/raw: "CorpusBackend, when registered, becomes the source of truth for `wiki/` and `raw/` per spec/34; MemoryBackend retains ownership of `memory/` and `journal/`." -- `docs/spec/26-cascade-bundle.md:94, 245` — Cross-reference spec/34 (DRAFT spec; updates freely). -- `docs/spec/01-anatomy.md:351-371` — Add CorpusBackend cross-reference paragraph. 
-- `docs/spec/04-runtime-assembly.md:42` — Note step [7] routes through `corpus_backend.render_index_summary("wiki")` when registered. -- `docs/spec/02-atomic-memory.md:38-49` — Add CorpusBackend cross-reference paragraph. -- `docs/spec/31-llm-backend.md:11` — Add `(spec/34)` link after "Corpus" in the Protocol list. -- `docs/spec/27-doctor.md` — Insert `### corpus-backend` entry between `tool-registry-backend` and `mandate-backend`. -- `CLAUDE.md` — 4 edits: architecture diagram flip (`Corpus 🟡` to `Corpus ✅`), 11th lock-paragraph, Design Principles §2 update, "Where things live" table count update. -- `README.md` — 6 edits: backend-protocols table row, spec list addition, 4 status-counter updates. -- `ROADMAP.md` (repo-root) — Status paragraph + table row. -- `ROADMAP.md` (vault) — `last_review` date + status paragraph + table entry. - ---- - ## References - `docs/spec/20-memory-backend.md` — MemoryBackend Protocol; the template this arc follows. `VersionRef`, `WritePolicy`, `.versions/` layout, and the `read_note` None vs `read_version` raise convention all originate here. -- `docs/spec/24-agent-profile-backend.md` — Decision 7 (to be updated at PR 4); Implementer Contract MUST count range; snapshot id entropy budget. +- `docs/spec/24-agent-profile-backend.md` — Decision 7 (updated at #65 PR 4 of 4); Implementer Contract MUST count range; snapshot id entropy budget. - `docs/spec/25-tool-registry-backend.md` — INSERT-first + atomic_write-on-success-only atomicity pattern for hybrid storage; `PRAGMA busy_timeout=5000` before WAL pragma precedent; URL factory + credential redaction across 5 `ValueError` sites. - `docs/spec/27-doctor.md` — PASS/WARN/FAIL ladder shape; page-count cliff WARN precedent from LogBackend. - `docs/spec/32-policy-backend.md` — `_policy_backend_was_explicit` precedent for explicit-only delegate threading; `policy_decision(axis="cost_cap")` event schema unchanged. 
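
Reviewer note on the call-site migration table in the spec diff above: the `bundle.py` row describes an explicit optional `corpus_backend` parameter that routes through `render_index_summary(corpus="wiki")` when supplied and falls back to the direct `wiki/INDEX.md` read when it is `None`. A minimal sketch of that shape follows. It is an illustration only, not the code in `bundle.py`: the function name `read_wiki_index`, the `StubBackend` class, the one-method Protocol slice, and the empty-string result for a missing index file are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional, Protocol


class CorpusBackend(Protocol):
    """Hypothetical one-method slice of the Protocol, enough for this sketch."""

    def render_index_summary(self, corpus: str) -> str: ...


def read_wiki_index(
    instance_root: Path,
    corpus_backend: Optional[CorpusBackend] = None,
) -> str:
    """Return wiki index text, preferring the backend when one is passed in."""
    if corpus_backend is not None:
        # Explicit parameter, no getattr lookup: the caller threads it through.
        return corpus_backend.render_index_summary(corpus="wiki")
    # Fallback: the direct path read used when no backend is supplied.
    index_path = instance_root / "wiki" / "INDEX.md"
    if not index_path.exists():
        return ""  # assumption: a missing index yields empty text
    return index_path.read_text(encoding="utf-8")


class StubBackend:
    """Hypothetical stand-in backend used only to exercise the branch."""

    def render_index_summary(self, corpus: str) -> str:
        return f"[{corpus}] summary from backend"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "wiki").mkdir()
        (root / "wiki" / "INDEX.md").write_text("# Index\n", encoding="utf-8")
        print(read_wiki_index(root))                 # direct-read branch
        print(read_wiki_index(root, StubBackend()))  # backend branch
```

Keeping the parameter explicit and defaulting it to `None` means existing callers that pass nothing take the same direct-read branch they took before, which is the property the regression assertions mentioned in the spec are said to pin.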
diff --git a/tests/test_corpus_sqlite_backend.py b/tests/test_corpus_sqlite_backend.py index 1d64482..6d42bf2 100644 --- a/tests/test_corpus_sqlite_backend.py +++ b/tests/test_corpus_sqlite_backend.py @@ -1,4 +1,4 @@ -"""SQLite-specific tests for SQLiteCorpusBackend (spec/34 PR 2 of 4). +"""SQLite-specific tests for SQLiteCorpusBackend (spec/34). These tests cover behaviors that are specific to the SQLite backend and cannot be exercised via the parametrized conformance suite. The conformance suite