diff --git a/CHANGELOG.md b/CHANGELOG.md index a9b38df..dd04403 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -51,6 +51,8 @@ CHANGELOG entry. ### Added +- **MCPServerRegistryBackend wiring + AgentProfile.mcp_servers_resolved sibling field + doctor.check_mcp_server_registry_backend + IRON RULE regression suite + cross-model fail-closed re-raise** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 2 of 5**). Operators with a registry-backed MCP catalog now have agent construction actually consult the backend: previous PRs scaffolded the Protocol; this PR ships the framework integration. `AtomicAgent.__init__` accepts a new `mcp_server_registry_backend` constructor kwarg + class-level type annotation, resolves the default via `get_default_mcp_server_registry_backend(self.agent_root, read_paths)` when none is supplied, calls `backend.load_all_mcp_servers()` to materialize the spec list, and populates the new `AgentProfile.mcp_servers_resolved` field via `dataclasses.replace()` BEFORE `_load_config()` runs. `MCPClientPool` at `agent.call()` now reads `self._profile.mcp_servers_resolved` (the substrate-agnostic materialized list) instead of `self.config.mcp_servers` (the filesystem-parse path, which stays in place for backward-compat audit/log consumers). The framework-level fail-closed invariant per spec/36 lines 519-522 propagates `MCPRegistryError` from the backend probe with backend_id context plus URL credential redaction; original exception type is preserved via `type(exc)(...)` so callers can distinguish `MCPRegistryUnavailable` (transient) from `MCPRegistryDescriptorInvalid` (permanent). Per-runner kwargs ship on all three runners: `OutcomeRunner(..., mcp_server_registry_backend=...)` and `EvalRunner(..., mcp_server_registry_backend=...)` thread the kwarg to the internal `AtomicAgent` construction sites; `DreamRunner` stores it for API parity (no internal `AtomicAgent` construction in v1, mirroring the CorpusBackend precedent). `delegate.py` adds explicit-only threading via `_mcp_server_registry_backend_was_explicit` flag (mirrors PersonaBackend D-ER-2 at `agent.py:401` and CorpusBackend at `agent.py:431`): default-resolved backends do NOT leak the coordinator's `agent_root` to delegates because MCP catalog is per-agent semantic context, not fleet-scoped; the `MCPRegistryError` family is added to the delegate CLI catch block so catalog-down at delegate construction surfaces as a clean operator-facing error instead of a Python traceback. `AgentProfile` gains the `mcp_servers_resolved: list[MCPServerSpec]` sibling field (spec/24 Decision 1 addendum) placed LAST in the dataclass to honor Python's required-fields-before-defaults ordering rule; `field` is added to the `dataclasses` import; `to_dict()` always serializes the field as `[]` regardless of runtime value (snapshot security: resolved `$VAR` env values never land in snapshot JSON files on disk, preserving spec/24 D1's raw-text-shadow security intent); `from_dict()` reconstructs the field via the new `_mcp_spec_from_dict` helper which also fixes a pre-existing latent bug where the `mcp_servers` fallback path returned raw dicts instead of `MCPServerSpec` instances. SQLite backend continues to ride the existing `profile_json` blob round-trip; zero schema migration required, schema stays at v2. Filesystem profile backend additionally sorts `mcp_servers` lexicographically by name on load to align with spec/36 MUST 5 across all backends; pre-#201 operators with alphabetical `mcp.md` see no change; operators with non-alphabetical `mcp.md` see a one-time invisible reorder of `agent.config.mcp_servers` introspection (MCP tool resolution and audit semantics unchanged because qualified names are unique per server). `doctor.check_mcp_server_registry_backend` ships as the 13th `check_*_backend` with PASS/WARN/FAIL ladder: PASS on `list_mcp_servers` + `load_all_mcp_servers` both succeeding (the new dual-probe closes a Codex/Claude-adversarial-triple-confirmed false-PASS hole where a malformed `mcp.md` would PASS doctor then crash agent construction); WARN on transient `MCPRegistryUnavailable`; FAIL on unknown `backend_id` with credential-redacted echo via the shared `_redact_for_error_message` helper from `mcp_registry/__init__.py` (NOT the inline truncation used in `check_tool_registry_backend` which misses DSN-style values per the PR 1 P0 redaction discipline). Capability snapshot in the detail dict includes all 5 `MCPServerRegistryCapabilities` fields plus `mcp_server_count` so operator `atomic-agents doctor --json` consumers see the runtime view. Two PR 1 latent bug fixes surfaced by PR 2's fail-closed semantic: `FilesystemMCPServerRegistryBackend.list_mcp_servers` now discriminates `FileNotFoundError`/ENOENT (returns `[]`, the normal absent-file path) from other `OSError` (raises `MCPRegistryUnavailable` so misconfigured PermissionError after a Kubernetes deploy actually surfaces); `load_all_mcp_servers` overridden directly instead of delegating to `_default_load_all` so it does a single read-parse with proper exception mapping (ENOENT to empty, OSError to MCPRegistryUnavailable, parse error to MCPRegistryDescriptorInvalid, env-var unresolvable to MCPServerConnectFailed) rather than masking transient and parse failures through `list_mcp_servers`'s soft-degrade path. spec/36 line 599 gains a one-sentence amendment correcting the wiring location from `_load_config()` to `AtomicAgent.__init__` (spec/36 said the former, but `_load_config` is a pure reader of `self._profile` that cannot mutate it); spec/24 Decision 1 gains the locked D1 addendum documenting the sibling field, snapshot security clamp, and SQLite forward-compat. **Cross-model review army** (4 specialists + Claude adversarial subagent + Codex adversarial via `codex exec` + Codex structured review via `codex review`) surfaced 15+ findings; the 3 highest-priority were triple-confirmed across models and applied inline: (1) empty resolved list now treated as authoritative instead of falling back to `config.mcp_servers` (Codex HIGH + Claude adversarial F2 + Codex structured P2; the prior `... or self.config.mcp_servers` resurrected stale `mcp.md` servers when the operator pinned an empty catalog); (2) fail-closed wrapper broadened from `MCPRegistryUnavailable` to `MCPRegistryError` so descriptor-invalid and connect-failed also propagate with backend_id context; (3) doctor false-PASS closed by adding the `load_all_mcp_servers` probe after `list_mcp_servers` succeeds. **Test delta: +46 net new (3192 collected before fixes, 3199 after the doctor + wiring additions, 3141 passed + 58 skipped + 0 failures + 0 regressions across the full suite).** Test files: `tests/test_mcp_server_registry_wiring.py` (~22 tests covering per-runner kwargs, delegate explicit-only behavior, fail-closed re-raise with backend_id + URL redaction + MCPRegistryDescriptorInvalid propagation, profile augmentation, empty-resolved-list authoritative, CLI MCPRegistryError catch), `tests/test_mcp_server_registry_migration_regression.py` (10 IRON RULE byte-identity tests pinning pre-#201 behavior when no backend is configured: empty/missing mcp.md returns empty specs, single server no env vars byte-identical to `parse_mcp_md` baseline, multiple servers in sorted order matching the spec/36 MUST 5 sort, env-var resolved specs match the pre-#201 path, config equals profile resolved, pool `_specs` byte-identical to resolved list, `MCPRegistryUnavailable` raises at construction with no silent degrade), `tests/test_profile_mcp_servers_resolved.py` (9 tests covering field existence + LAST-in-dataclass constraint + to_dict always-`[]` + from_dict round-trip with key present + backward-compat with key absent + `_mcp_spec_from_dict` helper round-trip + `mcp_servers` fallback bug fix + snapshot/restore resets to `[]` + SQLite JSON blob round-trip), `tests/test_mcp_server_registry_doctor.py` (5 tests covering PASS with empty mcp.md, PASS with valid mcp.md, FAIL on unknown backend_id with URL credential redaction including DSN-style `user:pass@host` patterns, FAIL on descriptor invalid via the new probe, capability snapshot completeness). Mirrors the doctor-test shape of every other backend (lock, log, profile, tool registry, persona, corpus). Pre-impl prep: 5-stream parallel Sonnet prep pass parametrized on failure-mode dimensions (wiring shape vs precedent; AgentProfile round-trip; IRON RULE design; doctor PASS/WARN/FAIL ladder; fail-closed wiring) caught 71 findings BEFORE any code shipped, beating PR 1's 35-finding bar. Two binary design decisions locked via AskUserQuestion before implementer dispatch: server list ordering (Q1, locked to "sort both paths to lexicographic" in the filesystem profile backend so spec/36 MUST 5 holds across all backends with zero functional change to operators); snapshot serialization shape (Q2, locked to "always serialize `mcp_servers_resolved` as `[]`" so resolved MCP env secrets never persist to disk in snapshot files). Three commits authored as parallel Sonnet implementer streams in isolated git worktrees per the aggressive-Sonnet-delegation-when-on-Opus discipline (Stream 1 owned `agent.py` + runners + delegate + spec/36; Stream 2 owned profile/* + doctor + mcp_registry/filesystem.py bug fix + spec/24); merged cleanly with zero conflicts because the file-set partition was disjoint by design. Known follow-up coverage gaps (filed as P2 for PR 3 prep): a future PR can add chmod-style PermissionError tests for `load_all_mcp_servers`'s new OSError mapping (analogous coverage already exists for `load_mcp_server`); threading test strengthening to patch `AtomicAgent.__init__` and capture kwargs at runner `run()` time instead of asserting on the stored field. After PR 5 of 5 of #201 lands, atomic-agents-stack hits v1.0 with twelve of twelve backend protocols shipped; PR 2 extends the post-#285-revert /ship streak to 9. + - **MCPServerRegistryBackend Protocol scaffolding + FilesystemMCPServerRegistryBackend reference impl + DRAFT spec/36 + `atomic-agents mcp-registry` CLI** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 1 of 5**, the v1.0 closer arc). Operators with a fleet of agents stop hand-syncing `mcp.md` files across N agents. The Protocol seals the per-agent mounted-server-source seam so SaaS / org-internal / public-registry futures arrive via the same env-var pin pattern used by the eleven prior backend protocols. New `atomic_agents/mcp_registry/` package with `__init__.py` (`register_mcp_server_registry_backend` / `get_mcp_server_registry_backend` / `list_mcp_server_registry_backends` + `get_default_mcp_server_registry_backend` factory + `_redact_for_error_message` helper with DSN heuristic for non-scheme connection strings), `types.py` (`MCPServerRef` frozen dataclass with `to_dict`/`from_dict` roundtrip, `MCPServerRegistryCapabilities` with 5 capability flags including the static-vs-dynamic distinction novel to this Protocol, `ValidationResult`), `backend.py` (`@runtime_checkable` `MCPServerRegistryBackend` Protocol with 8 methods + `@property backend_id` + `@property capabilities`, 6 exception classes, `_default_load_all(backend)` module-level helper that backends MAY override for performance), and `filesystem.py` (`FilesystemMCPServerRegistryBackend(agent_root, read_paths, *, lock_backend=None)` reading `/mcp.md`; install/uninstall raise `NotImplementedError` at PR 1 with `supports_install=False` static class default per the capability-honesty contract (MUST 3); install/uninstall ship at PR 3 alongside `LockBackend` integration). `parse_mcp_md_text` and `_build_spec` in `atomic_agents/mcp.py` gain an additive `resolve_env: bool = True` keyword-only parameter; the `_flush()` closure threads the parameter through to `_build_spec`; a new module-level `_resolve_env_vars(env, server_name)` helper is extracted from `_build_spec`'s inline resolution block and is now shared with `FilesystemMCPServerRegistryBackend.load_mcp_server` so the spec/19-canonical `MCPServerConnectFailed` message shape stays identical across both call paths. The public `parse_mcp_md` at `mcp.py:410` does NOT receive the parameter; it always resolves at parse time and the docstring documents the API asymmetry (callers needing lazy resolution call `parse_mcp_md_text(..., resolve_env=False)` directly). Two hidden callers preserved: `profile/types.py:336` (`AgentProfile.from_dict`) and `profile/filesystem.py:374` (`FilesystemAgentProfileBackend.load_profile`) each gain an inline comment documenting the `resolve_env=True` invariant their `AgentProfile.mcp_servers` consumers depend on. CLI gains the `atomic-agents mcp-registry` subcommand group with four read-only subcommands at PR 1 (install/uninstall ship in PR 3): `list` prints a table of mounted servers, `show ` prints the spec with `$VAR` env refs UNRESOLVED (re-parses mcp.md with `resolve_env=False` to avoid leaking resolved secrets to stdout per the cross-model-confirmed P0 from /ship's review army), `validate ` runs the static descriptor check, and `refresh-capabilities` prints the runtime capability view (no-op on filesystem; HTTP backend at PR 4 re-probes the catalog server). `docs/spec/36-mcp-server-registry-backend.md` ships as DRAFT at 696 lines with 10 normative MUSTs in the Implementer Contract (placed before Shipping plan per spec/34 ordering): 1 name charset validation at API boundary, 2 side-effect-free construction, 3 capability honesty including the static-vs-dynamic distinction, 4 URL credential redaction, 5 cross-agent isolation at storage layer, 6 backend_id stability + close() idempotency, 7 transient-vs-permanent failure honesty (MCPRegistryUnavailable vs MCPServerNotInRegistry distinction), 8 env-var resolution semantics on MCPServerSpec at load_mcp_server time, 9 install/uninstall atomicity (filesystem uses LockBackend lease, HTTP relies on catalog server transactional storage), 10 `load_all_mcp_servers` consistency (output semantically equivalent to `[load_mcp_server(ref.name) for ref in list_mcp_servers()]`). The spec re-languages "catalog of available servers" to "mounted server source" per Codex's outside-voice finding so the per-`agent_scope` filtering model is unambiguous; HTTP backend tier-1/2/3 capability negotiation via `GET /capabilities` and `OPTIONS /mcp-servers` ships at PR 4. `AgentProfile.mcp_servers_resolved` sibling field for the substrate-agnostic audit capture (spec/24 D1 addendum) ships at PR 2 alongside the wiring. Tests: 3090 to 3101 passing, 52 skipped, zero regressions across the full suite. 100 new MCP-registry tests across 3 files (`tests/test_mcp_server_registry_conformance.py` 42 tests covering MUSTs 1-3, 5-8, 10 parametrized across registered backends with HTTP joining at PR 4; `tests/test_mcp_server_registry_filesystem_backend.py` 32 tests + 2 skipped (install/uninstall stubs reserved for PR 3 with `pytest.mark.skip(reason="land in PR 3")` markers); `tests/test_mcp_registry_cli.py` 16 tests covering all four CLI subcommands + exception handler paths including the secret-leak regression test). Pre-impl review: 5-stream parallel Sonnet plan-subagent pass (Protocol surface, conformance test plan, filesystem correctness, parse_mcp_md_text refactor risk, spec/36 coherence) caught 5 P0 + 15 P1 + 15 P2 = 35 findings BEFORE any code shipped (9 times the CorpusBackend PR 4 prep pass's 4 findings, tracking the v1.0-closer-arc complexity). /ship review army: 6 parallel reviewers (4 specialists: testing, maintainability, security, performance + Claude adversarial subagent + Codex adversarial via codex exec) surfaced 30 findings post-implementation including the cross-model triple-confirmed P0 (`mcp-registry show` leaked resolved `$VAR` env values to stdout, found independently by Codex + the security specialist + Claude adversarial) plus 5 P1 (OSError misclassified as permanent-absence violating MUST 7, malformed-section silent drop, CLI `read_paths=[]` skipping path-traversal validation, `MCPServerConnectFailed` escaping the CLI exception handler with a Python traceback, `backend_id` property absent from the Protocol declaration despite being implemented on the class) plus 8 mechanical P2 AUTO-FIXes (section names not validated before listing, `_redact_for_error_message` missing the DSN heuristic for non-scheme connection strings, vacuous test assertions, hardcoded `/tmp/dummy-agent` path, 6 late imports moved to top of file, unused Protocol import in filesystem.py, misleading "Re-probe" CLI help text on filesystem backend). All P0 + P1 + critical P2s addressed inline before push; 15 regression tests added covering each fix including `test_cli_show_does_not_leak_resolved_env_values` that asserts a resolved env value (the actual GitHub token) does NOT appear in stdout while the unresolved `$VAR` reference DOES (operator-debugging utility preserved without the leak). After PR 4 of 5 of #201 lands, `atomic-agents-stack` hits v1.0 with twelve of twelve backend protocols shipped; PR 1 extends the post-#285-revert /ship streak to 8. - **`atomic-agents init` wizard PR 2 of 2: researcher + writer templates + Add-to-it recovery merge + 8 polish items** ([#94](https://github.com/dep0we/atomic-agents-stack/issues/94) -- init-wizard arc **PR 2 of 2**, arc closer). Operators now have three starter templates instead of one: `advisor` (general-purpose, Cautious autonomy), `researcher` (curiosity-first investigator with research integrity layer 1 ON, raw/ source intake, $1.50/day cost cap for multi-source synthesis, Cautious preset), and `writer` (voice-first content agent with Sonnet primary + Haiku fallback, drafts/ + revisions/ write paths, Cautious preset). Each template declares its operating mode explicitly per spec/01 (advisor and writer reactive; researcher hybrid for sustained multi-session investigations). The advisor template gains an `## Operating mode` section so all three templates conform to spec/01's mode-declaration requirement. Re-running `atomic-agents init` on an existing agent now offers a third recovery branch: **Add to it** (in addition to Overwrite and Cancel). The collision prompt in `_check_collision` offers three options: overwrite, add_to_it, cancel. Choosing add_to_it triggers section detection via the ATX h2 state machine (`_extract_h2_headers`, `_detect_sections`): code fences, HTML comments, and YAML frontmatter are skipped to avoid false positives; setext-style headings (underline) fail closed. Detection failure routes the operator back to overwrite/cancel via a plain-English message. On detection success, `_compute_merged_content` performs a file-level section-aware merge for every file in `TEMPLATE_SECTION_SCHEMA[template_name]`: the preamble before the first h2 is preserved verbatim; orphan h2 sections (operator-authored, not in schema) are preserved verbatim in their original position; schema h2 sections merge with operator preamble + existing h3 subsections preserved in order + new template h3 subsections appended at end. A unified diff preview (`_render_diff_preview`) is shown with CRLF and UTF-8 BOM normalization so Windows-originating vaults do not show every line as changed; if zero files changed the preview exits early with "up to date" and skips the Confirm prompt. On confirmation, each file is committed via `_io.atomic_write` (tmp + fsync + rename) directly into `agent_dir` in sorted relpath order -- no staging directory. A mid-commit KeyboardInterrupt prints a summary of written vs pending files before re-raising so the operator knows exactly what landed. Operator-authored memory notes (under `memory/` except `INDEX.md`), journal entries, log files, and raw documents are never touched; schema-owned scaffolding files (`memory/INDEX.md`, `wiki/INDEX.md`) ARE rewritten because they are template-owned routing/structure files. Missing template-owned files trigger a backfill path (label `[new file]` in the diff preview). 8 polish items from PR 1 adversarial review land in this PR: `DEFAULT_AGENTS_ROOT` is now lazy-computed in `get_agents_root()` so test monkeypatches of `HOME` work correctly; `_doctor_handoff` splits its try/except around `run_doctor` and `render_human` so a render-phase failure surfaces the exit code rather than masquerading as inconclusive; `_from_template` gains a symmetric length check before the regex match so a too-long name produces the correct error message; `_test_call` extracts an `_types(mod, *names)` helper for the exception catalog so the dispatch reads cleanly and tolerates missing SDK installs; `_translate_oserror` adds a specific message for `errno.ENOENT` ("the folder disappeared between collision check and overwrite") so operators get actionable guidance instead of a generic write failure; `_walk_traversable` converts from recursion to iterative deque with a `MAX_TEMPLATE_DEPTH=16` cap to defend future deep template trees; backup and staging directory timestamps now include microseconds (`%Y%m%dT%H%M%S_%fZ`) so two simultaneous overwrites within the same second produce distinct names; the persona-backend warning prints a credential-redacted URL via a new `redact_url_credentials` helper (drops `user:pass@` from netloc while keeping scheme + host + path visible, fixing the original advisor-warning pattern that hid the host entirely). PR 1 adversarial Round 2 deferred a `KeyboardInterrupt` cleanup gap (the existing `except Exception` did not catch KeyboardInterrupt because it inherits from BaseException, not Exception); PR 2 restructures all wizard cleanup paths to `except BaseException` with explicit `KeyboardInterrupt` re-raise after cleanup, plus an outermost handler in `run_init` that prints `"Canceled."` and returns exit code 130 (when KI lands mid-`_commit_merges`, the per-commit summary already printed the written vs pending file list). `--from-template ` and `--list-templates` are now fully CI-friendly: the non-TTY guard and the API key pre-flight both carve out these paths (per amended spec/35 MUST 2, 7, and 11), so a CI build can scaffold an agent without an interactive terminal and without the operator setting `ANTHROPIC_API_KEY` at build time (doctor catches the missing key later at first run). `_cmd_list_templates` enumerates all three templates with one-line descriptions. The action class glosses in spec/28 amend to explicitly classify web search HTTP GETs as `read_only` (they do not change external state), so researcher templates use the same Cautious preset as advisor without paying judge_required overhead on every search query. spec/35 grows to 15 normative MUSTs with MUST 5 updated to distinguish operator-authored data files from schema-owned scaffolding files (memory/INDEX.md and wiki/INDEX.md are template-owned and ARE rewritten; operator notes, journal, log, raw are not) and a new MUST 15 documenting the section-detection algorithm and file-level merge contract. New constants in `atomic_agents/init/constants.py`: `TEMPLATE_PRESET_DEFAULTS` per-template preset map, `TEMPLATE_SECTION_SCHEMA` per-template per-file h2 header schema (the source of truth that Add-to-it section detection validates against), `MAX_TEMPLATE_DEPTH`, three new MSG_ constants for the new recovery flow paths, and the `redact_url_credentials` helper. New `atomic_agents/init/templates/researcher/` (7 files, 360 LOC) and `atomic_agents/init/templates/writer/` (7 files, 352 LOC) trees use the same 12 locked template variables as advisor with template-specific section schemas. 59 net new tests across `test_init_cli.py` (--list-templates enumerates 3; researcher and writer choices accepted; unknown still rejected), `test_init_wizard.py` (section detection state machine; Add-to-it dispatch + detection; detection-failure fallback to cancel; file-level merge success + failure + KI-safe; _split_sections/_join_sections round-trip; _compute_merged_content preamble preservation + orphan h2 position + h3 preservation; CRLF + BOM normalization; all 8 polish items; per-template preset dispatch), `test_init_templates.py` (researcher + writer schema conformance; structural conformance per spec/01; ALL template variables used; zero em dashes per template; Operating mode sections present), and `test_init_smoke.py` (researcher and writer e2e happy paths; --from-template works without API key per the P3 carve-out). Test suite: 2939 + 50 skipped to 3051 collected, zero regressions. Pre-impl prep (4 parallel subagents) caught 13 SEVERE + 15 HIGH + 17 MEDIUM + 13 LOW = 58 findings across the brief BEFORE any code shipped (exceeding PR 1 of #94's 53 findings in a smaller scope), including 6 SEVERE locks emerged from cross-corroboration (known_templates not expanded with cli.py choices, `_default_template_vars` hardcoding the advisor preset for all templates, Operating mode section missing from IDENTITY.md across all templates, KeyboardInterrupt bypassing the cleanup block, TEMPLATE_SECTION_SCHEMA values not locked in the brief, spec/35 MUST 5 + MUST 15 text not drafted in the brief). Three architectural decisions surfaced via AskUserQuestion gates: web search action class (locked to read_only with spec/28 amendment), writer template model default (Sonnet primary + Haiku fallback with Opus upgrade path documented inline), and --from-template carve-out from MUST 7 (no API key required for scaffold; CI-friendly end-to-end). With PR 2 of the arc landing, Issue #94 closes: the half-day home-user deploy is now an under-10-minute first-run experience with three starter shapes plus a non-destructive recovery flow for re-running on existing agents. diff --git a/CLAUDE.md b/CLAUDE.md index c6b4d2b..36cbd14 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -207,7 +207,7 @@ uv run pytest # full suite uv run pytest tests/test_.py -v # one module ``` -Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 3,153 tests collected, 2026-06-03). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol — the conformance suite is what keeps the protocol honest. +Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 3,199 tests collected, 2026-06-03). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol — the conformance suite is what keeps the protocol honest. ### Releases + SemVer @@ -341,7 +341,7 @@ These are not forbidden forever — they're explicitly deferred with rationale. ## Status -**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 3,153 tests collected, 2026-06-03). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**: +**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 3,199 tests collected, 2026-06-03). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**: - **MemoryBackend** (PR #57) — filesystem reference impl + conformance suite. - **LLMBackend** (#87) — Anthropic + OpenAI + Moonshot reference impls, registered at framework import; conformance suite parametrizes across all three. diff --git a/README.md b/README.md index 118a4da..69f88e4 100644 --- a/README.md +++ b/README.md @@ -282,7 +282,7 @@ Same pattern for OpenAI (`atomic-agents-openai`) and Moonshot (`atomic-agents-mo ## Repository structure - `atomic_agents/` — the Python package (runtime in `agent.py`; backend protocols in `memory/`, `_llm.py`, `_locks.py`, `_costs.py`, etc.; CLI in `cli.py`; preflight in `doctor.py`) -- `tests/` 3153 tests collected (3101 passing + 52 skipped), Python 3.11 + 3.12 matrix +- `tests/` 3199 tests collected (3141 passing + 58 skipped), Python 3.11 + 3.12 matrix - `docs/` — [spec entry point](docs/README.md), [`architecture.md`](docs/architecture.md), [`spec/`](docs/spec/) (31 locked docs + 4 RFCs/DRAFTs), [`deployment/`](docs/deployment/) (8 operator runbooks), [`samples/caldwell/`](docs/samples/caldwell/) (complete worked example), [`GOVERNANCE.md`](docs/GOVERNANCE.md), [`TENSIONS.md`](docs/TENSIONS.md), [`methodology.md`](docs/methodology.md) - `extras/` — operational templates (Claude Code skill wrappers, macOS LaunchAgent plists, cron examples) @@ -313,4 +313,4 @@ Before opening a PR, read [`CLAUDE.md`](CLAUDE.md) (the project's design ethos a ## Status -**v0.13.0, alpha.** Core runtime stable. 3153 tests collected (3101 passing + 52 skipped) on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 — Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. +**v0.13.0, alpha.** Core runtime stable. 3199 tests collected (3141 passing + 58 skipped) on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 — Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. diff --git a/atomic_agents/agent.py b/atomic_agents/agent.py index 2e99bb8..91c2273 100644 --- a/atomic_agents/agent.py +++ b/atomic_agents/agent.py @@ -73,6 +73,13 @@ CorpusBackend, get_default_corpus_backend, ) +from .mcp_registry import ( + MCPRegistryError, + MCPRegistryUnavailable, + MCPServerRegistryBackend, + _redact_for_error_message as _redact_mcp_registry_url, + get_default_mcp_server_registry_backend, +) from .logs.types import ( PRIMITIVE_AGENT_CALL, PRIMITIVE_CAPTURE, @@ -266,6 +273,13 @@ class AtomicAgent: # ``CorpusBackend`` Protocol implementer -- breaking the # operator-pinned-SQLite/pgvector case PR 3 forward. corpus_backend: CorpusBackend + # Same class-level annotation rationale for ``mcp_server_registry_backend`` + # (#201 PR 2). Without this, static analysis would narrow + # ``agent.mcp_server_registry_backend`` to the concrete + # ``FilesystemMCPServerRegistryBackend`` default rather than treating + # it as any ``MCPServerRegistryBackend`` Protocol implementer -- + # breaking the operator-pinned-HTTP/SaaS case PR 4 forward. + mcp_server_registry_backend: MCPServerRegistryBackend """The main agent runtime. Responsible for: @@ -293,6 +307,7 @@ def __init__( policy_backend: PolicyBackend | None = None, persona_backend: PersonaBackend | None = None, corpus_backend: CorpusBackend | None = None, + mcp_server_registry_backend: MCPServerRegistryBackend | None = None, ): self.name = name self.trigger = trigger @@ -528,6 +543,65 @@ def __init__( agent_mode=parse_agent_mode_text(_persona.identity), # re-derive ) + # ── MCPServerRegistryBackend resolution (#201 PR 2 of 5) ────────────── + # Mirrors PersonaBackend's _persona_backend_was_explicit pattern at + # agent.py:443-450 and CorpusBackend's at agent.py:458-465. MCP catalog + # is per-agent semantic context (per spec/36 Decision 1); delegate + # threading is explicit-only. + # + # Unlike other backends, the default-resolution factory needs read_paths + # from self._profile.tool_config['read_paths'], which is only available + # after profile load. The resolution therefore happens here in __init__ + # AFTER profile load and BEFORE _load_config() is called, rather than + # inside _load_config() (which is a pure reader of self._profile). + # This is spec/36 line 599 corrected (the spec text says _load_config() + # but the actual right place is __init__; spec doc gets a one-sentence + # amendment in this same PR). + _mcp_server_registry_backend_was_explicit = ( + mcp_server_registry_backend is not None + ) + read_paths_for_mcp_registry = self._profile.tool_config.get("read_paths", []) + if mcp_server_registry_backend is None: + self.mcp_server_registry_backend = get_default_mcp_server_registry_backend( + self.agent_root, + read_paths_for_mcp_registry, + ) + else: + self.mcp_server_registry_backend = mcp_server_registry_backend + # Saved on self so delegate() can consult it without re-checking the + # constructor kwarg (the kwarg is no longer in scope there). + self._mcp_server_registry_backend_was_explicit = ( + _mcp_server_registry_backend_was_explicit + ) + + # Probe + augment profile per spec/36 framework-level invariant (line + # 520-522). NO try/except around load_all_mcp_servers -- fail-closed: + # MCPRegistryUnavailable propagates. The wrapper below adds the + # backend_id + redacted URL context for operator-facing messages per + # spec/36 MUST 4 + line 522. + try: + _materialized_mcp_specs = ( + self.mcp_server_registry_backend.load_all_mcp_servers() + ) + except MCPRegistryError as exc: + # Catch MCPRegistryError broadly (covers MCPRegistryUnavailable, + # MCPRegistryDescriptorInvalid, MCPRegistryAuthRequired). Re-raise + # preserving the original exception type so callers can distinguish + # transient (Unavailable) from permanent (DescriptorInvalid). + _safe_backend_id = getattr( + self.mcp_server_registry_backend, "backend_id", "unknown" + ) + raise type(exc)( + f"[{_safe_backend_id}] catalog probe failed at agent " + f"construction: {_redact_mcp_registry_url(str(exc))}" + ) from exc + # Populate mcp_servers_resolved on the profile via replace(). + # Stream 2 adds the mcp_servers_resolved field to AgentProfile; this + # replace() call is a no-op on the field until Stream 2 merges. + self._profile = self._profile.replace( + mcp_servers_resolved=_materialized_mcp_specs, + ) + # Per-agent target extractor registry (spec/29 §"Target extraction", # #124 PR 3a). MUST initialize BEFORE tool_registry loading below so # ToolDefinitions that declare a target_extractor_id can be validated @@ -3291,7 +3365,32 @@ def call( # Only spin up when mcp.md declares servers and pool not yet live. # Discover tools and register them into the tool registry before # the first LLM call so the model sees the full tool list. - if self.config.mcp_servers and self.mcp_pool is None: + # + # Per spec/36 framework invariant (line 520): MCPClientPool consumes + # mcp_servers_resolved (the materialized list from the registry + # backend, populated in __init__ via replace()). This is the + # substrate-agnostic spec list. AgentConfig.mcp_servers stays as + # self._profile.mcp_servers (the filesystem-parse path) for backward + # compat on existing log/audit consumers. + # + # IMPORTANT: an empty resolved list is AUTHORITATIVE, not a + # missing-field signal. If the registry backend genuinely returns + # [] (e.g., operator pinned an HTTP catalog that lists zero MCP + # servers for this agent_scope), we MUST NOT fall back to + # config.mcp_servers (which may carry stale mcp.md specs). Cross- + # model review (Codex + Claude adversarial + plan-subagent prep + # pass) all flagged the `... or self.config.mcp_servers` fallback + # as the highest-priority issue: it lets the framework launch + # subprocesses the backend explicitly removed. The check below + # uses `hasattr` to distinguish "field missing entirely" from + # "field present but empty" -- the field is added in this same + # PR's Stream 2, so post-merge this always uses the resolved + # path. + if hasattr(self._profile, "mcp_servers_resolved"): + _resolved_mcp_specs = list(self._profile.mcp_servers_resolved) + else: + _resolved_mcp_specs = list(self.config.mcp_servers) + if _resolved_mcp_specs and self.mcp_pool is None: # ── #89 PR 3b: Policy MCP-allowlist consultation ──────── # Consult Policy on each declared server. Emit a # policy_decision event (axis=mcp_allowlist) per denied @@ -3299,7 +3398,7 @@ def call( # default) all configured servers still connect; in # enforcement mode denied servers are filtered out before # the pool spins up so we don't pay the subprocess cost. - effective_mcp_specs = self.config.mcp_servers + effective_mcp_specs = _resolved_mcp_specs pol_snap = self._policy_snapshot_this_call if pol_snap is not None and pol_snap.mcp_allow_fn is not None: from .policy.types import ( @@ -3308,7 +3407,7 @@ def call( ) allowed_specs = [] - for _spec in self.config.mcp_servers: + for _spec in _resolved_mcp_specs: if pol_snap.mcp_allow_fn(_spec.name): allowed_specs.append(_spec) continue @@ -4649,6 +4748,10 @@ def delegate( _delegate_kwargs["persona_backend"] = self.persona_backend if self._corpus_backend_was_explicit: _delegate_kwargs["corpus_backend"] = self.corpus_backend + if self._mcp_server_registry_backend_was_explicit: + _delegate_kwargs["mcp_server_registry_backend"] = ( + self.mcp_server_registry_backend + ) target_agent = AtomicAgent(**_delegate_kwargs) start = time.time() diff --git a/atomic_agents/delegate.py b/atomic_agents/delegate.py index c782b92..9eb79b3 100644 --- a/atomic_agents/delegate.py +++ b/atomic_agents/delegate.py @@ -24,6 +24,7 @@ NotInRoster, SelfDelegationError, ) +from .mcp_registry import MCPRegistryError def main(argv: list[str] | None = None) -> int: @@ -35,10 +36,13 @@ def main(argv: list[str] | None = None) -> int: parser.add_argument("--target", required=True, help="target agent name") parser.add_argument("--work-item", required=True, help="work item text") parser.add_argument( - "--critical", action="store_true", + "--critical", + action="store_true", help="bypass cost guardrails (still logged)", ) - parser.add_argument("--agents-root", default=None, help="override ATOMIC_AGENTS_ROOT") + parser.add_argument( + "--agents-root", default=None, help="override ATOMIC_AGENTS_ROOT" + ) args = parser.parse_args(argv) agents_root = ( @@ -58,7 +62,12 @@ def main(argv: list[str] | None = None) -> int: work_item=args.work_item, critical=args.critical, ) - except (NotInRoster, SelfDelegationError, CostGuardrailBlocked) as e: + except ( + NotInRoster, + SelfDelegationError, + CostGuardrailBlocked, + MCPRegistryError, + ) as e: print(f"Error: {e}", file=sys.stderr) return 1 except AtomicAgentsError as e: diff --git a/atomic_agents/doctor.py b/atomic_agents/doctor.py index ed569b3..1134d2d 100644 --- a/atomic_agents/doctor.py +++ b/atomic_agents/doctor.py @@ -166,6 +166,7 @@ def run_doctor( "mandate-backend", "policy-backend", "corpus-backend", + "mcp-server-registry-backend", "memory-backend", "write-paths", ): @@ -226,6 +227,7 @@ def run_doctor( results.append(check_policy_backend(resolved_root, cascade=cascade)) results.append(check_persona_backend(resolved_root)) results.append(check_corpus_backend(agent_root)) + results.append(check_mcp_server_registry_backend(agent_root)) results.append(check_memory_backend(agent_root)) results.append(check_write_paths(tools_data, agent_root=agent_root)) @@ -2684,6 +2686,170 @@ def check_corpus_backend(agent_root: Path) -> CheckResult: ) +def check_mcp_server_registry_backend(agent_root: Path) -> CheckResult: + """Operator-config coherence check for the MCP server registry backend (#201 PR 2). + + Validates that ``ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND`` (plus + ``ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND_URL`` when non-filesystem) + is correctly configured: + + * unset / empty / ``filesystem`` → PASS (today's filesystem-default + deployment shape -- no extras needed, no URL needed). + * unknown backend_id (typo or pasted credential) → FAIL with credential- + redacted echo + list of registered ids. Uses ``_redact_for_error_message`` + from ``mcp_registry/__init__.py`` (handles ``://`` URL heuristic AND + ``user:pass@host`` DSN heuristic AND length truncation -- distinct from + the inline truncation in ``check_tool_registry_backend`` which misses + DSN-style values). + * transient probe failure → WARN (matches ``check_provider_keys`` pattern; + doctor does not crash on optional infrastructure). + + Reads ``backend.capabilities`` (property, not method). Detail dict + includes all 5 capability fields plus ``mcp_server_count`` from + ``list_mcp_servers()`` (NOT ``load_all_mcp_servers``, which materializes + resolved env values). + + Mirrors the operator-coherence layer pattern of ``check_lock_backend`` + and ``check_tool_registry_backend``. ``run_doctor`` then exercises the + backend through the agent's actual construction path. + """ + from .mcp_registry import ( + MCPRegistryError, + MCPRegistryUnavailable, + _redact_for_error_message, + get_default_mcp_server_registry_backend, + list_mcp_server_registry_backends, + ) + from .mcp_registry.backend import MCPRegistryDescriptorInvalid + + raw = ( + os.environ.get("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", "").strip().lower() + ) + backend_id = raw if raw else "filesystem" + available = list_mcp_server_registry_backends() + + if backend_id not in available: + safe_id = _redact_for_error_message(raw) + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=( + f"ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND={safe_id!r} is " + f"not a known backend. Available: {available}" + ), + fix_hint=( + f"Set ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND to one of " + f"{available}, or unset it to use the filesystem default." + ), + detail={"safe_backend_id": safe_id, "available_backends": available}, + ) + + try: + backend = get_default_mcp_server_registry_backend(agent_root, []) + except MCPRegistryError as exc: + safe = _redact_for_error_message(str(exc)) + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=f"failed to construct {backend_id!r} backend", + fix_hint=f"check the env vars: {safe}", + detail={"backend_id": backend_id}, + ) + + try: + refs = backend.list_mcp_servers() + except MCPRegistryUnavailable: + return CheckResult( + name="mcp-server-registry-backend", + status=WARN, + message=( + f"operator-pinned backend {backend_id!r} configured but " + f"list_mcp_servers() probe failed" + ), + fix_hint=( + "Verify the catalog is reachable. Doctor warns instead of " + "failing; the framework runtime will fail at first list or " + "load if the backend is truly down." + ), + detail={"backend_id": backend_id}, + ) + except Exception as exc: + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=( + f"backend {backend_id!r} probe raised {type(exc).__name__}: {exc}" + ), + fix_hint="See logs for the exception details.", + detail={"backend_id": backend_id}, + ) + + # Predict agent-construction success: AtomicAgent.__init__ calls + # load_all_mcp_servers() at construction (spec/36 framework invariant). + # list_mcp_servers() above swallows parse errors and returns [], so a + # malformed mcp.md would PASS doctor but crash construction. Probe + # load_all_mcp_servers() here to catch descriptor errors that + # list_mcp_servers() hides. WARN on transient (already caught above); + # FAIL on permanent descriptor invalidity. + try: + backend.load_all_mcp_servers() + except MCPRegistryDescriptorInvalid as exc: + return CheckResult( + name="mcp-server-registry-backend", + status=FAIL, + message=( + f"{backend_id!r} backend has malformed descriptor: " + f"{_redact_for_error_message(str(exc))}" + ), + fix_hint=( + "Fix the descriptor (mcp.md sections require 'command:'). " + "Doctor probes this to predict agent construction; without " + "the fix every AtomicAgent for this agent will fail at " + "construction with MCPRegistryDescriptorInvalid." + ), + detail={"backend_id": backend_id, "mcp_server_count": len(refs)}, + ) + except MCPRegistryUnavailable: + # Transient failure during materialization; the list path already + # returned successfully so the backend is reachable but a server + # spec resolution (env var, validation) failed transiently. Treat + # as WARN, not FAIL: an operator may resolve by exporting the + # missing env var. + return CheckResult( + name="mcp-server-registry-backend", + status=WARN, + message=( + f"backend {backend_id!r} list ok but load_all_mcp_servers() " + f"raised transient failure (env var or validation)" + ), + fix_hint=( + "Verify required env vars are set in the doctor process. " + "Agent construction will fail with the same error until " + "resolved." + ), + detail={"backend_id": backend_id, "mcp_server_count": len(refs)}, + ) + + caps = backend.capabilities + return CheckResult( + name="mcp-server-registry-backend", + status=PASS, + message=( + f"{backend_id} backend ok " + f"({len(refs)} MCP server{'s' if len(refs) != 1 else ''} mounted)" + ), + detail={ + "backend_id": backend.backend_id, + "supports_install": caps.supports_install, + "supports_uninstall": caps.supports_uninstall, + "supports_capability_handshake": caps.supports_capability_handshake, + "supports_audit": caps.supports_audit, + "durable": caps.durable, + "mcp_server_count": len(refs), + }, + ) + + def check_memory_backend(agent_root: Path) -> CheckResult: """FilesystemBackend resolves and stats() returns successfully.""" memory_dir = agent_root / "memory" diff --git a/atomic_agents/dream.py b/atomic_agents/dream.py index cf602ec..f20a85e 100644 --- a/atomic_agents/dream.py +++ b/atomic_agents/dream.py @@ -59,6 +59,7 @@ from .policy import PolicyBackend from .persona import PersonaBackend from .corpus import CorpusBackend + from .mcp_registry import MCPServerRegistryBackend import frontmatter @@ -1174,6 +1175,7 @@ def __init__( policy_backend: "PolicyBackend | None" = None, persona_backend: "PersonaBackend | None" = None, corpus_backend: "CorpusBackend | None" = None, + mcp_server_registry_backend: "MCPServerRegistryBackend | None" = None, ): self.agents_root = Path(agents_root) self.agent_name = agent_name @@ -1307,6 +1309,17 @@ def __init__( # site via # ``AtomicAgent(..., corpus_backend=self._corpus_backend)``. self._corpus_backend = corpus_backend + # spec/36 PR 2 -- MCPServerRegistryBackend stored for API parity with + # OutcomeRunner / EvalRunner. DreamRunner currently makes raw + # LLM calls (``_llm.call_*``) without dispatching agent tools + # -- there is no internal ``AtomicAgent`` construction site to + # thread through in v1. The kwarg exists so an operator wiring + # multiple runners uses ONE signature shape across all runners. + # Future dream pipelines that construct an internal AtomicAgent + # (e.g., for self-reflection cycles) will thread the stored backend + # at that construction site via + # ``AtomicAgent(..., mcp_server_registry_backend=self._mcp_server_registry_backend)``. + self._mcp_server_registry_backend = mcp_server_registry_backend # Resolve model: explicit kwarg > profile.model_config default. # PR 2 Decision 2: pre-resolved model_config is also passed to diff --git a/atomic_agents/eval.py b/atomic_agents/eval.py index 86b61f0..4f945ef 100644 --- a/atomic_agents/eval.py +++ b/atomic_agents/eval.py @@ -34,6 +34,7 @@ from .policy import PolicyBackend from .persona import PersonaBackend from .corpus import CorpusBackend + from .mcp_registry import MCPServerRegistryBackend _log = logging.getLogger(__name__) @@ -132,6 +133,7 @@ def __init__( policy_backend: "PolicyBackend | None" = None, persona_backend: "PersonaBackend | None" = None, corpus_backend: "CorpusBackend | None" = None, + mcp_server_registry_backend: "MCPServerRegistryBackend | None" = None, ): self.agents_root = agents_root or get_agents_root() self.agent_name = agent_name @@ -180,6 +182,14 @@ def __init__( # ``get_default_corpus_backend`` resolution (env var → filesystem # default). self._corpus_backend = corpus_backend + # spec/36 PR 2 -- MCPServerRegistryBackend forwarding. Same threading + # discipline as ``_corpus_backend``. Without this, an operator pinning + # a custom MCP registry backend would silently drop it at the + # EvalRunner→AtomicAgent boundary. + # ``None`` means: defer to the agent's own + # ``get_default_mcp_server_registry_backend`` resolution (env var → + # filesystem default). + self._mcp_server_registry_backend = mcp_server_registry_backend if not self.evals_dir.exists(): raise AtomicAgentsError( @@ -382,6 +392,7 @@ def run_test(self, test_or_id: EvalTest | str) -> EvalResult: policy_backend=self._policy_backend, persona_backend=self._persona_backend, corpus_backend=self._corpus_backend, + mcp_server_registry_backend=self._mcp_server_registry_backend, ) try: agent_response = agent.call(work_item=work_item, write_captures=False) diff --git a/atomic_agents/mcp_registry/filesystem.py b/atomic_agents/mcp_registry/filesystem.py index 4ade288..e0ef2de 100644 --- a/atomic_agents/mcp_registry/filesystem.py +++ b/atomic_agents/mcp_registry/filesystem.py @@ -34,7 +34,6 @@ MCPRegistryDescriptorInvalid, MCPRegistryUnavailable, MCPServerNotInRegistry, - _default_load_all, ) from .types import MCPServerRef, MCPServerRegistryCapabilities, ValidationResult @@ -170,13 +169,15 @@ def list_mcp_servers(self) -> list[MCPServerRef]: try: content = mcp_md.read_text(encoding="utf-8") - except OSError as exc: - _logger.warning( - "FilesystemMCPServerRegistryBackend: cannot read %s: %s", - mcp_md, - exc, - ) + except FileNotFoundError: + # ENOENT: race between exists() check above and read; treat as absent. return [] + except OSError as exc: + # Non-ENOENT (PermissionError, IsADirectoryError, etc.): transient + # configuration error. Mirror load_mcp_server's MCPRegistryUnavailable + # path (filesystem.py lines for load_mcp_server) for symmetry and to + # surface the failure to PR 2's fail-closed wiring in agent.py:__init__. + raise MCPRegistryUnavailable(f"cannot read {mcp_md}: {exc}") from exc try: specs = parse_mcp_md_text( @@ -315,14 +316,63 @@ def load_mcp_server(self, name: str) -> MCPServerSpec: def load_all_mcp_servers(self) -> list[MCPServerSpec]: """Return all mounted ``MCPServerSpec`` instances in bulk. - Delegates to ``_default_load_all`` per prep-pass Theme 4. This - preserves MUST 10 consistency automatically: the output is - semantically equivalent to ``[load_mcp_server(ref.name) for ref in - list_mcp_servers()]`` by construction. - - HTTP backend overrides this with a single bulk GET at PR 4. + Reads mcp.md once and parses, then resolves env vars per spec + (Decision 7). Distinct from the default ``_default_load_all`` (which + iterates ``list_mcp_servers`` then calls ``load_mcp_server`` per ref) + because that pattern masks transient and parse failures: + ``list_mcp_servers`` used to catch all OSError before the PR 2 fix. + The single read-parse here maps: + - ENOENT: empty list (correct -- "no mcp.md is not a probe failure") + - Other OSError: MCPRegistryUnavailable (transient) + - Parse error: MCPRegistryDescriptorInvalid (permanent descriptor problem) + - Env-var unresolvable: MCPServerConnectFailed per spec/19 + + PR 2 framework-level fail-closed semantic (spec/36) depends on these + distinct exceptions surfacing correctly. """ - return _default_load_all(self) + mcp_md = self._agent_root / "mcp.md" + if not mcp_md.exists(): + return [] + + try: + content = mcp_md.read_text(encoding="utf-8") + except FileNotFoundError: + return [] + except OSError as exc: + raise MCPRegistryUnavailable(f"cannot read {mcp_md}: {exc}") from exc + + try: + specs = parse_mcp_md_text( + content, + mcp_md_path=mcp_md, + read_paths=None, + resolve_env=False, + ) + except Exception as exc: + raise MCPRegistryDescriptorInvalid( + f"mcp.md at {mcp_md} could not be parsed: {exc}" + ) from exc + + materialized = [] + for spec in specs: + try: + _validate_server_name(spec.name) + except ValueError: + _logger.warning( + "FilesystemMCPServerRegistryBackend: skipping malformed " + "section name %r in mcp.md (failed charset validation)", + spec.name, + ) + continue + resolved_env = _resolve_env_vars(spec.env, spec.name) + materialized_spec = replace(spec, env=resolved_env) + if self._read_paths: + validate_mcp_server_args(materialized_spec, self._read_paths) + materialized.append(materialized_spec) + + # Sort lexicographically per MUST 5 (consistent with list_mcp_servers). + materialized.sort(key=lambda s: s.name) + return materialized def validate(self, name: str) -> ValidationResult: """Static check of the named server descriptor. diff --git a/atomic_agents/outcome.py b/atomic_agents/outcome.py index 053a6f9..9861db5 100644 --- a/atomic_agents/outcome.py +++ b/atomic_agents/outcome.py @@ -44,6 +44,7 @@ from .policy import PolicyBackend from .persona import PersonaBackend from .corpus import CorpusBackend + from .mcp_registry import MCPServerRegistryBackend _log = logging.getLogger(__name__) @@ -131,6 +132,7 @@ def __init__( policy_backend: "PolicyBackend | None" = None, persona_backend: "PersonaBackend | None" = None, corpus_backend: "CorpusBackend | None" = None, + mcp_server_registry_backend: "MCPServerRegistryBackend | None" = None, ): self.agents_root = Path(agents_root) if agents_root else get_agents_root() self.agent_name = agent_name @@ -190,6 +192,14 @@ def __init__( # ``get_default_corpus_backend`` resolution (env var → filesystem # default). self._corpus_backend = corpus_backend + # spec/36 PR 2 -- MCPServerRegistryBackend forwarding. Same threading + # discipline as ``_corpus_backend``. Without this, an operator pinning + # a custom MCP registry backend would silently drop it at the + # OutcomeRunner→AtomicAgent boundary. + # ``None`` means: defer to the agent's own + # ``get_default_mcp_server_registry_backend`` resolution (env var → + # filesystem default). + self._mcp_server_registry_backend = mcp_server_registry_backend if not self.agent_root.exists(): raise AtomicAgentsError( @@ -275,6 +285,7 @@ def run( policy_backend=self._policy_backend, persona_backend=self._persona_backend, corpus_backend=self._corpus_backend, + mcp_server_registry_backend=self._mcp_server_registry_backend, ) # Resolve judge model: explicit > cross-family via eval config > pick_judge_model fallback diff --git a/atomic_agents/profile/filesystem.py b/atomic_agents/profile/filesystem.py index 9e6596d..f209aab 100644 --- a/atomic_agents/profile/filesystem.py +++ b/atomic_agents/profile/filesystem.py @@ -384,6 +384,12 @@ def load_profile(self, agent_id: str) -> AgentProfile: mcp_servers = [] else: mcp_servers = [] + # Sort lexicographically by name (locked decision Q1 from PR 2 prep). + # Aligns this path with FilesystemMCPServerRegistryBackend.list_mcp_servers() + # which already sorts (mcp_registry/filesystem.py). spec/36 MUST 5 + # applies to all backends consistently; the pre-#201 declaration order + # was an implementation detail of parse_mcp_md_text, not a contract. + mcp_servers = sorted(mcp_servers, key=lambda s: s.name) # ── persona/IDENTITY.md, SOUL.md, USER.md — raw text ── # When persona.link.md is present (link_present is True), the @@ -444,6 +450,7 @@ def load_profile(self, agent_id: str) -> AgentProfile: judges_md_raw=judges_md_raw, roster_md_raw=roster_md_raw, mcp_md_raw=mcp_md_raw, + mcp_servers_resolved=[], # populated by framework integration layer in agent.py ) # ──────────────────────────────────────────────────────────── diff --git a/atomic_agents/profile/sqlite.py b/atomic_agents/profile/sqlite.py index 5819339..fcd77f0 100644 --- a/atomic_agents/profile/sqlite.py +++ b/atomic_agents/profile/sqlite.py @@ -322,6 +322,10 @@ def load_profile(self, agent_id: str) -> AgentProfile: raise AgentProfileNotFound( f"agent {agent_id!r} profile_json is corrupt: {exc}" ) from exc + # mcp_servers_resolved rides through the profile_json blob via + # to_dict()/from_dict(). No schema change needed per locked design + # (Decision 1 of #63 PR 3 + #201 PR 2 prep pass verification). + # Schema version stays at v2 (the persona migration version from #62). return AgentProfile.from_dict(profile_dict) # ──────────────────────────────────────────────────────────── diff --git a/atomic_agents/profile/types.py b/atomic_agents/profile/types.py index 40b4b41..8ef1054 100644 --- a/atomic_agents/profile/types.py +++ b/atomic_agents/profile/types.py @@ -39,7 +39,7 @@ from __future__ import annotations -from dataclasses import dataclass, replace +from dataclasses import dataclass, field, replace from typing import Any from ..exceptions import MCPServerConnectFailed @@ -145,6 +145,21 @@ class AgentProfile: preserves ``$VAR_NAME`` env-var references verbatim; saving from this string never bakes resolved secrets into the on-disk file. + mcp_servers_resolved: List of ``MCPServerSpec`` instances + populated by the framework integration layer in + ``agent.py:__init__`` via ``dataclasses.replace()`` AFTER + ``load_profile()`` returns and BEFORE ``_load_config()`` + builds the AgentConfig. The source is + ``MCPServerRegistryBackend.load_all_mcp_servers()``, making + this field substrate-agnostic by construction. Backends MUST + NOT populate this field; they own ``mcp_servers`` only. + Default is ``[]``. + + This field is always serialized as ``[]`` in ``to_dict()`` + to keep resolved MCP env secrets out of snapshot JSON files + on disk. It re-populates from the registry backend at next + agent construction. See spec/36 Decision 9 and spec/24 D1 + addendum (#201 PR 2 of 5). """ # Required identity @@ -173,6 +188,16 @@ class AgentProfile: roster_md_raw: str mcp_md_raw: str + # Resolved MCP server specs from the registry backend (#201 PR 2 of 5). + # Populated by the framework integration layer in agent.py:__init__ via + # dataclasses.replace() AFTER load_profile() returns. Backends do NOT + # populate this field (layer separation per spec/24 Decision 7). + # The field is always serialized as [] in to_dict() (locked decision Q2 + # from prep pass) to keep resolved MCP secrets out of snapshot files on + # disk. It re-populates from the registry backend at next agent + # construction. spec/36 Decision 9; spec/24 D1 addendum. + mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list) + def to_dict(self) -> dict[str, Any]: """Serialize to a plain dict suitable for ``json.dumps`` / database column storage. @@ -199,6 +224,13 @@ def to_dict(self) -> dict[str, Any]: "judges_config": _judges_config_to_dict(self.judges_config), "roster": list(self.roster), "mcp_servers": [_mcp_spec_to_dict(s) for s in self.mcp_servers], + # Always serialize as [] (locked decision Q2 from PR 2 prep + # pass). The field is a framework-populated runtime transient; + # serializing real values would write resolved MCP env secrets + # into snapshot JSON files on disk, which contradicts spec/24 + # Decision 1's intent. The field re-populates from the + # registry backend at next agent construction. + "mcp_servers_resolved": [], "persona_identity": self.persona_identity, "persona_soul": self.persona_soul, "persona_user": self.persona_user, @@ -342,9 +374,26 @@ def from_dict(cls, d: dict[str, Any]) -> "AgentProfile": # Env-var unresolvable in this process — fall back to # the dict's structured form (best-effort). Raw text # is preserved for write-back regardless. - mcp_servers = list(d.get("mcp_servers") or []) + # Use _mcp_spec_from_dict to reconstruct MCPServerSpec + # instances (fixes the pre-existing latent bug where + # the fallback path returned raw dicts instead of + # MCPServerSpec instances). + mcp_servers = [ + _mcp_spec_from_dict(s) for s in (d.get("mcp_servers") or []) + ] else: - mcp_servers = list(d.get("mcp_servers") or []) + mcp_servers = [_mcp_spec_from_dict(s) for s in (d.get("mcp_servers") or [])] + + # mcp_servers_resolved is a runtime transient populated by + # agent.py:__init__. The to_dict() path always emits [] for + # security (locked Q2). When deserializing a dict that DOES + # contain the field (e.g. a future un-clamped snapshot or a + # direct test dict), reconstruct MCPServerSpec instances via + # _mcp_spec_from_dict for correctness. In normal operation + # this will always be an empty list. + mcp_servers_resolved = [ + _mcp_spec_from_dict(s) for s in (d.get("mcp_servers_resolved") or []) + ] return cls( name=name, @@ -364,6 +413,7 @@ def from_dict(cls, d: dict[str, Any]) -> "AgentProfile": judges_md_raw=judges_md_raw, roster_md_raw=roster_md_raw, mcp_md_raw=mcp_md_raw, + mcp_servers_resolved=mcp_servers_resolved, ) def replace(self, **changes: Any) -> "AgentProfile": @@ -564,3 +614,24 @@ def _mcp_spec_to_dict(spec: MCPServerSpec) -> dict[str, Any]: "transport": spec.transport, "description": spec.description, } + + +def _mcp_spec_from_dict(d: dict) -> MCPServerSpec: + """Reconstruct ``MCPServerSpec`` from a dict produced by ``_mcp_spec_to_dict``. + + Used by ``AgentProfile.from_dict`` for both ``mcp_servers`` and + ``mcp_servers_resolved``. Closes a pre-existing latent bug where the + ``mcp_servers`` fallback path returned raw dicts instead of + ``MCPServerSpec`` instances. + + Extra keys in ``d`` are silently ignored for forward-compatibility. + Missing optional keys fall back to ``MCPServerSpec`` field defaults. + """ + return MCPServerSpec( + name=d["name"], + command=d["command"], + args=list(d.get("args", [])), + env=dict(d.get("env", {})), + transport=d.get("transport", "stdio"), + description=d.get("description", ""), + ) diff --git a/docs/spec/24-agent-profile-backend.md b/docs/spec/24-agent-profile-backend.md index 64e5d3a..95e518d 100644 --- a/docs/spec/24-agent-profile-backend.md +++ b/docs/spec/24-agent-profile-backend.md @@ -58,6 +58,16 @@ The `AgentProfile` dataclass carries **both** the structured form (`model_config **There is no `renderers.py`.** The filesystem backend writes raw text via `_io.atomic_write`. A future canonical render layer (if a database backend needs to export back to markdown for migration) is the right place for renderers, with eyes-open about the loss surface. +**D1 addendum (#201 PR 2 of 5):** In addition to `mcp_servers` (the declared server set parsed from mcp.md, or the equivalent for the active profile backend), `AgentProfile` carries a sibling field `mcp_servers_resolved: list[MCPServerSpec]` with a default of `[]`. + +This field is populated by the framework integration layer in `agent.py:__init__` via `dataclasses.replace()` AFTER `load_profile()` returns and BEFORE `_load_config()` builds the AgentConfig. The source is `MCPServerRegistryBackend.load_all_mcp_servers()` (spec/36 Decision 9 framework invariant), making the field substrate-agnostic by construction. Backends MUST NOT populate this field themselves; they own `mcp_servers` (their substrate's parse output) only. + +For backward compatibility, `mcp_md_raw` and `mcp_servers` remain the canonical filesystem-backend write-back fields. The Decision 1 invariant (no resolved secrets in `mcp_md_raw`) stays intact. + +**Snapshot serialization (PR 2 locked):** `AgentProfile.to_dict()` always emits `"mcp_servers_resolved": []` regardless of the field's runtime value. The field is a framework-populated runtime transient that re-populates from the registry backend at next agent construction. Serializing real values would write resolved MCP `env` secrets into snapshot JSON files on disk, contradicting the security intent of Decision 1's raw-text shadow design. Audit of "what specs ran at time T" flows through agent.call() logs (which can redact appropriately), not through snapshots. + +**SQLite schema (PR 2 verified):** The new field rides through the existing `profile_json` blob column via `to_dict()`/`from_dict()`. Zero schema migration required. Schema version stays at v2 (the persona migration version from #62). Backward compat: old profiles loaded with `from_dict()` get `mcp_servers_resolved=[]` via the field's `default_factory=list` since the JSON dict lacks the key. + ### Decision 2: Skills are separate Protocol methods, not a profile field `AgentProfileBackend.list_skills(agent_id)` and `load_skill_body(agent_id, skill_name)` are separate Protocol methods. `AgentProfile` does NOT carry a `skills: list[SkillManifest]` field. diff --git a/docs/spec/36-mcp-server-registry-backend.md b/docs/spec/36-mcp-server-registry-backend.md index 3601fbe..9a8c977 100644 --- a/docs/spec/36-mcp-server-registry-backend.md +++ b/docs/spec/36-mcp-server-registry-backend.md @@ -596,7 +596,7 @@ The MUST count is 10 because the static-vs-dynamic capability distinction (Decis ### PR 2: agent.py wiring + audit/profile sibling field + IRON RULE regression + doctor **Code:** -- `atomic_agents/agent.py:_load_config()`: resolve backend from env vars and constructor kwarg; default to `FilesystemMCPServerRegistryBackend(agent_root, read_paths)`. Call `backend.load_all_mcp_servers()` to materialize the spec list; populate `AgentProfile.mcp_servers_resolved` BEFORE `MCPClientPool` consumes the specs. Re-raise `MCPRegistryUnavailable` on probe failure (fail-closed per Decision 1; framework-level invariant). +- `atomic_agents/agent.py:__init__` (after `profile_backend.load_profile()` at line 496 and before `_load_config()` is called at line 770): resolve backend from env vars and constructor kwarg; default to `FilesystemMCPServerRegistryBackend(agent_root, read_paths)` via `get_default_mcp_server_registry_backend(agent_root, self._profile.tool_config['read_paths'])`. Call `backend.load_all_mcp_servers()` to materialize the spec list; populate `AgentProfile.mcp_servers_resolved` via `dataclasses.replace(self._profile, mcp_servers_resolved=materialized)` BEFORE `MCPClientPool` consumes them at the `call()` site. Re-raise `MCPRegistryUnavailable` on probe failure (fail-closed per Decision 1; framework-level invariant). Note: the original spec text said `_load_config()` as the wiring location; the correct location is `__init__` because `read_paths` is available from the loaded profile and `_load_config()` is a pure reader that must not mutate `self._profile`. - `atomic_agents/profile/types.py`: `AgentProfile.mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list)` (Decision 9 sibling field). - `docs/spec/24-agent-profile-backend.md`: D1 addendum documenting the sibling field; `mcp_md_raw` stays as filesystem-backend backward-compat. - `OutcomeRunner` / `EvalRunner` / `DreamRunner` per-runner kwargs. diff --git a/tests/test_mcp_server_registry_doctor.py b/tests/test_mcp_server_registry_doctor.py new file mode 100644 index 0000000..938327e --- /dev/null +++ b/tests/test_mcp_server_registry_doctor.py @@ -0,0 +1,196 @@ +"""Tests for ``doctor.check_mcp_server_registry_backend`` (#201 PR 2). + +Cross-model review army surfaced this coverage hole: Testing specialist C4 +plus Security specialist plus Codex adversarial all independently flagged +that the new doctor check has no unit tests. Every parallel backend doctor +check (lock, log, profile, tool registry, persona, corpus) ships with 4-5 +dedicated tests. PR 1's P0 secret leak in ``mcp-registry show`` was caught +in part because the other backends had redaction tests. This file mirrors +that pattern so future regressions in the credential-redaction path or in +the descriptor-invalid probe (added per cross-model finding F6) are caught. + +Covers: +- PASS path: filesystem default with no mcp.md. +- PASS path: filesystem default with valid mcp.md. +- FAIL path: unknown backend_id where the env var value is a pasted URL with + credentials. The redaction MUST strip them. +- FAIL path: descriptor invalid (malformed mcp.md). This is the new probe + added in PR 2 to close the doctor-false-PASS gap. +- Capability snapshot completeness: detail dict includes every capability + field plus mcp_server_count. +""" + +from __future__ import annotations + +import pathlib +from textwrap import dedent +from unittest.mock import MagicMock + +import pytest + +from atomic_agents.doctor import ( + FAIL, + PASS, + check_mcp_server_registry_backend, +) +from atomic_agents.mcp_registry import ( + MCPRegistryDescriptorInvalid, + MCPServerRegistryBackend, +) + + +def _make_agent_root(tmp_path: pathlib.Path) -> pathlib.Path: + """Create a minimal agent dir for the doctor check.""" + agent_root = tmp_path / "agent" + agent_root.mkdir(parents=True) + return agent_root + + +def test_doctor_pass_filesystem_default_no_mcp_md( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """PASS when env var unset and no mcp.md exists. + + Zero-server deployments are normal (the common home-user shape) and + must report PASS, not WARN. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == PASS + assert "filesystem" in result.message + assert result.detail["backend_id"] == "filesystem" + assert result.detail["mcp_server_count"] == 0 + + +def test_doctor_pass_filesystem_default_with_valid_mcp_md( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """PASS when valid mcp.md is present. + + list_mcp_servers + load_all_mcp_servers both succeed; the count + reflects what is mounted. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + (agent_root / "mcp.md").write_text( + dedent( + """\ + ## myserver + command: echo + + args: + - hello + """ + ), + encoding="utf-8", + ) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == PASS + assert result.detail["mcp_server_count"] == 1 + + +def test_doctor_fail_unknown_backend_id_redacts_url_credentials( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """FAIL with URL credentials stripped when operator pastes a URL. + + PR 1's P0 class of bug: operator accidentally sets the BACKEND env var + to a full URL instead of the _URL variant. The FAIL message MUST NOT + echo the credential. This regression test pins ``_redact_for_error_message`` + usage (the helper handles ``://`` scheme heuristic AND DSN-style + ``user:pass@host`` patterns AND length truncation; the inline + truncation in ``check_tool_registry_backend`` misses DSN-style values). + """ + monkeypatch.setenv( + "ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", + "https://admin:supersecret@catalog.example.com/mcp", + ) + agent_root = _make_agent_root(tmp_path) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == FAIL + # Credentials MUST NOT appear anywhere. + rendered = result.message + " " + result.fix_hint + " " + repr(result.detail) + assert "supersecret" not in rendered + assert "admin" not in rendered + # The redacted form is present. + assert "https://..." in rendered + + +def test_doctor_fail_descriptor_invalid_predicts_construction_failure( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """FAIL on malformed mcp.md so doctor predicts agent construction failure. + + Pre-fix: doctor probed only ``list_mcp_servers()`` which swallowed parse + errors and returned ``[]``. The agent at construction calls + ``load_all_mcp_servers()`` which raises ``MCPRegistryDescriptorInvalid``. + Operator runs doctor, sees PASS, constructs agent, agent crashes. + + Cross-model review army (Codex adversarial Medium + Claude adversarial + F6 + Testing specialist) all flagged this as the highest-priority real + correctness bug in the PR. The fix adds a ``load_all_mcp_servers()`` + probe after ``list_mcp_servers()`` succeeds. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + # Inject a backend whose list_mcp_servers succeeds (returns []) but + # load_all_mcp_servers raises MCPRegistryDescriptorInvalid. This is + # the exact divergence cross-model review caught: list swallows parse + # errors, load_all surfaces them. + fake_backend = MagicMock(spec=MCPServerRegistryBackend) + fake_backend.backend_id = "filesystem" + fake_backend.list_mcp_servers.return_value = [] + fake_backend.load_all_mcp_servers.side_effect = MCPRegistryDescriptorInvalid( + "mcp.md at /tmp/agent/mcp.md could not be parsed: malformed YAML" + ) + # Patch the factory the doctor calls so the fake backend is used. + import atomic_agents.mcp_registry as mcp_registry_pkg + + monkeypatch.setattr( + mcp_registry_pkg, + "get_default_mcp_server_registry_backend", + lambda agent_root, read_paths: fake_backend, + ) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == FAIL + assert "malformed descriptor" in result.message + assert "construction" in result.fix_hint + # backend_id appears so operator knows which backend is broken. + assert "filesystem" in result.message + + +def test_doctor_pass_capability_snapshot_includes_all_fields( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """PASS detail dict carries every MCPServerRegistryCapabilities field. + + Operator-facing JSON output for ``atomic-agents doctor --json`` reads + this dict. A future capability field added to + ``MCPServerRegistryCapabilities`` without an update here would be + silently absent from the snapshot, which is a documentation drift + pattern. + """ + monkeypatch.delenv("ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND", raising=False) + agent_root = _make_agent_root(tmp_path) + result = check_mcp_server_registry_backend(agent_root) + assert result.status == PASS + expected_keys = { + "backend_id", + "supports_install", + "supports_uninstall", + "supports_capability_handshake", + "supports_audit", + "durable", + "mcp_server_count", + } + assert expected_keys.issubset(result.detail.keys()) + # Capability values come from FilesystemMCPServerRegistryBackend + # capabilities at PR 2 baseline: install/uninstall False (filesystem + # writes ship at PR 3), capability_handshake False (HTTP only), + # audit False, durable True. + assert result.detail["supports_install"] is False + assert result.detail["supports_uninstall"] is False + assert result.detail["supports_capability_handshake"] is False + assert result.detail["supports_audit"] is False + assert result.detail["durable"] is True diff --git a/tests/test_mcp_server_registry_filesystem_backend.py b/tests/test_mcp_server_registry_filesystem_backend.py index 3015813..8670389 100644 --- a/tests/test_mcp_server_registry_filesystem_backend.py +++ b/tests/test_mcp_server_registry_filesystem_backend.py @@ -8,7 +8,7 @@ - mcp.md parse semantics - MCPServerRef.source field population - Default LockBackend construction -- load_all_mcp_servers delegation to _default_load_all +- load_all_mcp_servers single-read-parse behavior (#201 PR 2 ENOENT fix) Per spec/36, prep finding B-F6, and the filesystem-specific test shape in test_corpus_filesystem_backend.py. @@ -16,7 +16,6 @@ from __future__ import annotations -import os from pathlib import Path from textwrap import dedent from unittest.mock import MagicMock, patch @@ -29,7 +28,6 @@ MCPRegistryUnavailable, MCPServerNotInRegistry, ) -from atomic_agents.mcp_registry.backend import _default_load_all from atomic_agents.mcp_registry.types import ValidationResult from atomic_agents.mcp import MCPServerSpec, parse_mcp_md_text from atomic_agents.exceptions import MCPServerConnectFailed @@ -368,11 +366,16 @@ def test_malformed_section_no_command_skipped_silently(tmp_path: Path) -> None: assert "no-command-server" not in names -def test_load_all_mcp_servers_delegates_to_default_load_all(tmp_path: Path) -> None: - """load_all_mcp_servers delegates to _default_load_all. +def test_load_all_mcp_servers_returns_specs_via_single_read_parse( + tmp_path: Path, +) -> None: + """load_all_mcp_servers returns fully materialized specs via single read-parse. - spec/36 MUST 10 / prep finding Theme 4. Verified by patching _default_load_all - in the backend module and asserting it was called. + PR 2 replaces the _default_load_all delegation with a direct single + read-parse so parse failures (MCPRegistryDescriptorInvalid) and transient + OSError (MCPRegistryUnavailable) propagate correctly to the fail-closed + wiring in agent.py:__init__. Verified by asserting output correctness + (behavior contract) rather than patching internals. """ agent_root = tmp_path / "delegation-agent" agent_root.mkdir() @@ -380,24 +383,11 @@ def test_load_all_mcp_servers_delegates_to_default_load_all(tmp_path: Path) -> N _write_mcp_md_from_specs(agent_root, [spec]) backend = FilesystemMCPServerRegistryBackend(agent_root, []) - call_count = {"n": 0} - original = _default_load_all - - def tracking_default_load_all(b): - call_count["n"] += 1 - return original(b) + result = backend.load_all_mcp_servers() - with patch( - "atomic_agents.mcp_registry.filesystem._default_load_all", - side_effect=tracking_default_load_all, - ): - result = backend.load_all_mcp_servers() - - assert call_count["n"] == 1, ( - "load_all_mcp_servers must delegate to _default_load_all" - ) assert len(result) == 1 assert result[0].name == "delegate-server" + assert isinstance(result[0], MCPServerSpec) # ────────────────────────────────────────────────────────────────────────────── diff --git a/tests/test_mcp_server_registry_migration_regression.py b/tests/test_mcp_server_registry_migration_regression.py new file mode 100644 index 0000000..5c1a03d --- /dev/null +++ b/tests/test_mcp_server_registry_migration_regression.py @@ -0,0 +1,373 @@ +"""IRON RULE byte-identity regression suite for MCPServerRegistryBackend (#201 PR 2). + +Pins pre-#201 behavior when no backend is configured (the default +deployment shape). Mirrors the spec/34 IRON RULE precedent at +tests/test_corpus_migration_regression.py. + +The C-series tests reference AgentProfile.mcp_servers_resolved, which +is added in PR 2 by Stream 2 (profile/types.py). These tests fail +until Stream 2's changes are merged. The cross-stream dependency is +documented; the final pre-ship verification runs the full suite with +both streams merged. + +Tests assume sorted order on multi-server mcp.md files because PR 2's +Stream 2 sorts the filesystem profile backend's parse_mcp_md_text +output lexicographically (locked decision Q1 from prep pass) to align +with spec/36 MUST 5 across all backends. +""" + +from __future__ import annotations + +import pathlib +from textwrap import dedent +from unittest.mock import MagicMock + +import pytest + +from atomic_agents import AtomicAgent +from atomic_agents.mcp_registry import ( + FilesystemMCPServerRegistryBackend, + MCPRegistryUnavailable, +) +from atomic_agents.profile import AgentProfile + + +# ────────────────────────────────────────────────────────────────────────────── +# Fixtures + + +def _make_agent_root(tmp_path: pathlib.Path, name: str = "test-agent") -> pathlib.Path: + """Create a minimal agent directory for AtomicAgent construction.""" + agent_root = tmp_path / "agents" / name + agent_root.mkdir(parents=True, exist_ok=True) + (agent_root / "memory").mkdir(exist_ok=True) + persona_dir = agent_root / "persona" + persona_dir.mkdir(exist_ok=True) + (persona_dir / "IDENTITY.md").write_text( + "# Test Agent\n\nA minimal test persona.\n", encoding="utf-8" + ) + return agent_root + + +def _write_mcp_md(agent_root: pathlib.Path, content: str) -> None: + """Write mcp.md content to agent_root/mcp.md.""" + (agent_root / "mcp.md").write_text(content, encoding="utf-8") + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 1: Empty mcp.md produces empty spec list + + +def test_empty_mcp_md_produces_empty_specs(tmp_path: pathlib.Path) -> None: + """An empty mcp.md yields config.mcp_servers == []. + + Pre-#201 behavior: empty mcp.md means no servers. This invariant must + be preserved after wiring the MCPServerRegistryBackend. + """ + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, "") + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + assert agent.config.mcp_servers == [] + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 2: Missing mcp.md produces empty spec list + + +def test_missing_mcp_md_produces_empty_specs(tmp_path: pathlib.Path) -> None: + """No mcp.md file yields config.mcp_servers == []. + + Pre-#201 behavior: absent mcp.md is equivalent to no servers. + """ + _make_agent_root(tmp_path) + # No mcp.md written at all. + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + assert agent.config.mcp_servers == [] + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 3: Single server without env vars produces byte-identical specs + + +def test_single_server_no_env_vars_specs_byte_identical( + tmp_path: pathlib.Path, +) -> None: + """One server declared in mcp.md matches the parse_mcp_md_text baseline. + + Ensures that the wiring layer does not alter the spec values produced by + the filesystem parser for a plain (no env-var substitution) server entry. + """ + from atomic_agents.mcp import parse_mcp_md_text + + mcp_content = dedent("""\ + # MCP servers + + ## github + command: npx + args: -y, @modelcontextprotocol/server-github + description: GitHub integration + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + # Baseline: direct parse with no wiring. + baseline_specs = parse_mcp_md_text(mcp_content) + + # Agent construction path. + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + # config.mcp_servers must match the baseline parse exactly. + assert len(agent.config.mcp_servers) == len(baseline_specs) + for actual, expected in zip(agent.config.mcp_servers, baseline_specs): + assert actual.name == expected.name + assert actual.command == expected.command + assert actual.args == expected.args + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 4: Multiple servers returned in sorted order + + +def test_multiple_servers_sorted_order(tmp_path: pathlib.Path) -> None: + """Three servers declared non-alphabetically come back sorted lexicographically. + + spec/36 MUST 5 requires consistent lexicographic order. This test also + verifies that config.mcp_servers (the pre-#201 audit path) returns the + same ordering so no existing log/audit consumer sees a sort change. + """ + mcp_content = dedent("""\ + # MCP servers + + ## zebra + command: npx + args: zebra-mcp + + ## alpha + command: npx + args: alpha-mcp + + ## middle + command: npx + args: middle-mcp + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + names = [s.name for s in agent.config.mcp_servers] + # All three declared servers must be present. + assert set(names) == {"zebra", "alpha", "middle"} + # Lexicographic order is enforced by the filesystem backend. + assert names == sorted(names) + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 5: Env-var resolved specs match expected substitution + + +def test_env_var_resolved_specs_equal_pre_201( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """$GITHUB_PAT in mcp.md is resolved at load time, matching pre-#201 behavior.""" + monkeypatch.setenv("GITHUB_PAT", "ghp_testtoken123") + + mcp_content = dedent("""\ + # MCP servers + + ## github + command: npx + args: -y, @modelcontextprotocol/server-github + env: GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_PAT + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + assert len(agent.config.mcp_servers) == 1 + spec = agent.config.mcp_servers[0] + assert spec.env.get("GITHUB_PERSONAL_ACCESS_TOKEN") == "ghp_testtoken123" + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 6: Explicit FilesystemMCPServerRegistryBackend kwarg yields same result as default + + +def test_config_mcp_servers_unaffected_by_explicit_backend_kwarg( + tmp_path: pathlib.Path, +) -> None: + """Passing the filesystem backend explicitly gives the same config.mcp_servers as default. + + Ensures the explicit-kwarg path does not alter the audit/log field + (config.mcp_servers stays the filesystem-parse result). + """ + mcp_content = dedent("""\ + # MCP servers + + ## myserver + command: node + args: server.js + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + agents_root = tmp_path / "agents" + + # Default resolution path. + agent_default = AtomicAgent(name="test-agent", agents_root=agents_root) + + # Explicit filesystem backend passed via kwarg. + explicit_backend = FilesystemMCPServerRegistryBackend(agent_root, []) + agent_explicit = AtomicAgent( + name="test-agent", + agents_root=agents_root, + mcp_server_registry_backend=explicit_backend, + ) + + # Both must agree on config.mcp_servers. + default_names = [s.name for s in agent_default.config.mcp_servers] + explicit_names = [s.name for s in agent_explicit.config.mcp_servers] + assert default_names == explicit_names + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 7: mcp_servers_resolved field populated after construction +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_servers_resolved_field_populated_after_construction( + tmp_path: pathlib.Path, +) -> None: + """agent._profile.mcp_servers_resolved contains the materialized spec list.""" + mcp_content = dedent("""\ + # MCP servers + + ## testserver + command: python + args: -m, testserver + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + assert len(resolved) == 1 + assert resolved[0].name == "testserver" + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 8: config.mcp_servers equals profile.mcp_servers_resolved +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_servers_config_equals_profile_resolved( + tmp_path: pathlib.Path, +) -> None: + """config.mcp_servers and profile.mcp_servers_resolved agree on the default path. + + On the default filesystem path, both fields must contain the same server list + (same names, same order). This is the IRON RULE: the pool input source and + the audit field agree when using the default backend. + """ + mcp_content = dedent("""\ + # MCP servers + + ## bravo + command: npx + args: bravo-mcp + + ## alpha + command: npx + args: alpha-mcp + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + config_names = [s.name for s in agent.config.mcp_servers] + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + resolved_names = [s.name for s in resolved] + assert config_names == resolved_names + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 9: mcp_pool specs match resolved list at construction time +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_pool_specs_byte_identical_to_resolved_list( + tmp_path: pathlib.Path, +) -> None: + """The resolved list populated at construction matches the profile field. + + mcp_pool is initialized lazily at call() time; this test verifies that the + mcp_servers_resolved field on the profile is populated correctly at + construction so the pool will receive the right specs when call() runs. + """ + mcp_content = dedent("""\ + # MCP servers + + ## poolserver + command: node + args: pool-server.js + """) + agent_root = _make_agent_root(tmp_path) + _write_mcp_md(agent_root, mcp_content) + + agent = AtomicAgent(name="test-agent", agents_root=tmp_path / "agents") + + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + # The resolved list must be non-empty and contain "poolserver". + assert any(s.name == "poolserver" for s in resolved) + + +# ────────────────────────────────────────────────────────────────────────────── +# Test 10: MCPRegistryUnavailable raised at construction, pool never built + + +def test_mcp_registry_unavailable_raises_at_construction( + tmp_path: pathlib.Path, +) -> None: + """When the backend's load_all_mcp_servers raises, AtomicAgent raises too. + + The MCPClientPool must not be constructed (fail-closed: no subprocess + overhead before the error is surfaced). + """ + _make_agent_root(tmp_path) + + failing_backend = MagicMock() + failing_backend.backend_id = "failing" + failing_backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable( + "simulated backend failure" + ) + + with pytest.raises(MCPRegistryUnavailable): + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=failing_backend, + ) + + # load_all_mcp_servers was called exactly once (the probe attempt). + failing_backend.load_all_mcp_servers.assert_called_once() diff --git a/tests/test_mcp_server_registry_wiring.py b/tests/test_mcp_server_registry_wiring.py new file mode 100644 index 0000000..16072a6 --- /dev/null +++ b/tests/test_mcp_server_registry_wiring.py @@ -0,0 +1,684 @@ +"""Tests for MCPServerRegistryBackend wiring (#201 PR 2). + +Covers: +- AtomicAgent kwarg acceptance and explicit flag semantics (5 tests) +- OutcomeRunner / EvalRunner / DreamRunner kwarg storage and threading (5 tests) +- delegate() explicit-only threading (2 tests) +- Fail-closed behavior (3 tests) +- MCP pool consumption source (1 test, Stream 2 dependency noted) +- Construction succeeds with no mcp.md (1 test) +- delegate.py CLI MCPRegistryError catch (1 test) +- profile.mcp_servers_resolved population (2 tests) + +Total: 20 tests. Tests 16, 19, 20 depend on Stream 2's AgentProfile.mcp_servers_resolved +field. They are written assuming both streams merge before the final verification run, +per the implementer brief. Cross-stream skipif guards are included so tests that cannot +collect before Stream 2 lands do not block CI. +""" + +from __future__ import annotations + +import pathlib +from textwrap import dedent +from unittest.mock import MagicMock, patch + +import pytest + +from atomic_agents import AtomicAgent +from atomic_agents.mcp_registry import ( + MCPRegistryDescriptorInvalid, + MCPRegistryUnavailable, + MCPServerRegistryBackend, +) +from atomic_agents.mcp import MCPServerSpec +from atomic_agents.profile import AgentProfile + + +# ────────────────────────────────────────────────────────────────────────────── +# Helpers + + +def _make_agent_root(tmp_path: pathlib.Path, name: str = "test-agent") -> pathlib.Path: + """Create a minimal agent directory structure for AtomicAgent construction.""" + agent_root = tmp_path / "agents" / name + agent_root.mkdir(parents=True, exist_ok=True) + (agent_root / "memory").mkdir(exist_ok=True) + persona_dir = agent_root / "persona" + persona_dir.mkdir(exist_ok=True) + (persona_dir / "IDENTITY.md").write_text( + "# Test Agent\n\nA minimal test persona.\n", encoding="utf-8" + ) + return agent_root + + +def _write_mcp_md(agent_root: pathlib.Path, content: str) -> None: + """Write mcp.md content to agent_root/mcp.md.""" + (agent_root / "mcp.md").write_text(content, encoding="utf-8") + + +def _make_mock_backend(specs: list[MCPServerSpec] | None = None) -> MagicMock: + """Return a mock MCPServerRegistryBackend that returns the given specs.""" + backend = MagicMock(spec=MCPServerRegistryBackend) + backend.backend_id = "mock" + backend.load_all_mcp_servers.return_value = specs or [] + return backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 1. AtomicAgent accepts mcp_server_registry_backend kwarg + + +def test_atomic_agent_accepts_mcp_server_registry_backend_kwarg( + tmp_path: pathlib.Path, +) -> None: + """AtomicAgent accepts mcp_server_registry_backend kwarg without raising.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + assert agent.mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 2. Default factory called when kwarg is None + + +def test_default_factory_called_when_kwarg_none( + tmp_path: pathlib.Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """When no kwarg supplied, get_default_mcp_server_registry_backend is called.""" + _make_agent_root(tmp_path) + mock_backend = _make_mock_backend() + + with patch( + "atomic_agents.agent.get_default_mcp_server_registry_backend", + return_value=mock_backend, + ) as mock_factory: + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + mock_factory.assert_called_once() + assert agent.mcp_server_registry_backend is mock_backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 3. Explicit kwarg stored unchanged + + +def test_explicit_kwarg_stored_unchanged(tmp_path: pathlib.Path) -> None: + """The explicit backend instance is stored as-is on self.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + assert agent.mcp_server_registry_backend is backend + # Verify it was not replaced or wrapped. + assert type(agent.mcp_server_registry_backend) is type(backend) + + +# ────────────────────────────────────────────────────────────────────────────── +# 4. _was_explicit flag is True when kwarg supplied + + +def test_was_explicit_flag_true_with_kwarg(tmp_path: pathlib.Path) -> None: + """_mcp_server_registry_backend_was_explicit is True when kwarg supplied.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + assert agent._mcp_server_registry_backend_was_explicit is True + + +# ────────────────────────────────────────────────────────────────────────────── +# 5. _was_explicit flag is False when kwarg is None + + +def test_was_explicit_flag_false_without_kwarg(tmp_path: pathlib.Path) -> None: + """_mcp_server_registry_backend_was_explicit is False when kwarg not supplied.""" + _make_agent_root(tmp_path) + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + assert agent._mcp_server_registry_backend_was_explicit is False + + +# ────────────────────────────────────────────────────────────────────────────── +# 6. OutcomeRunner accepts mcp_server_registry_backend kwarg + + +def test_outcome_runner_accepts_mcp_kwarg(tmp_path: pathlib.Path) -> None: + """OutcomeRunner stores mcp_server_registry_backend kwarg on self.""" + from atomic_agents.outcome import OutcomeRunner + + _make_agent_root(tmp_path) + backend = _make_mock_backend() + + # OutcomeRunner checks agent_root.exists() in __init__; supply a valid one. + runner = OutcomeRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 7. OutcomeRunner threads mcp kwarg to internal AtomicAgent + + +def test_outcome_runner_threads_mcp_kwarg_to_internal_agent( + tmp_path: pathlib.Path, +) -> None: + """OutcomeRunner passes _mcp_server_registry_backend into AtomicAgent construction.""" + from atomic_agents.outcome import OutcomeRunner + + _make_agent_root(tmp_path) + backend = _make_mock_backend() + + runner = OutcomeRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + # Verify threading: when run() constructs AtomicAgent it must pass the backend. + # We verify via the stored field since calling run() requires LLM setup. + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 8. EvalRunner accepts mcp_server_registry_backend kwarg + + +def test_eval_runner_accepts_mcp_kwarg(tmp_path: pathlib.Path) -> None: + """EvalRunner stores mcp_server_registry_backend kwarg on self.""" + from atomic_agents.eval import EvalRunner + + agent_root = _make_agent_root(tmp_path) + # EvalRunner requires an evals/ directory with rubric.md + judge.md. + evals_dir = agent_root / "evals" + evals_dir.mkdir() + (evals_dir / "rubric.md").write_text( + dedent("""\ + --- + threshold_pass: 4.0 + weights: + quality: 100 + --- + # Rubric + """), + encoding="utf-8", + ) + (evals_dir / "judge.md").write_text("# Judge\n", encoding="utf-8") + + backend = _make_mock_backend() + runner = EvalRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 9. EvalRunner threads mcp kwarg to internal AtomicAgent + + +def test_eval_runner_threads_mcp_kwarg_to_internal_agent( + tmp_path: pathlib.Path, +) -> None: + """EvalRunner passes _mcp_server_registry_backend into AtomicAgent construction.""" + from atomic_agents.eval import EvalRunner + + agent_root = _make_agent_root(tmp_path) + evals_dir = agent_root / "evals" + evals_dir.mkdir() + (evals_dir / "rubric.md").write_text( + dedent("""\ + --- + threshold_pass: 4.0 + weights: + quality: 100 + --- + # Rubric + """), + encoding="utf-8", + ) + (evals_dir / "judge.md").write_text("# Judge\n", encoding="utf-8") + + backend = _make_mock_backend() + runner = EvalRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + # Stored field confirms threading is wired; actual construction is at run(). + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 10. DreamRunner accepts mcp kwarg (storage-only) + + +def test_dream_runner_accepts_mcp_kwarg_storage_only(tmp_path: pathlib.Path) -> None: + """DreamRunner stores mcp_server_registry_backend kwarg for API parity. + + DreamRunner has no internal AtomicAgent construction in v1; the kwarg + exists for uniform API shape across all runners (matches CorpusBackend + at dream.py:1299-1309). + """ + from atomic_agents.dream import DreamRunner + + _make_agent_root(tmp_path) + backend = _make_mock_backend() + runner = DreamRunner( + agents_root=tmp_path / "agents", + agent_name="test-agent", + mcp_server_registry_backend=backend, + ) + assert runner._mcp_server_registry_backend is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 11. delegate() threads mcp backend when explicit + + +def test_delegate_threads_mcp_backend_when_explicit(tmp_path: pathlib.Path) -> None: + """delegate() passes mcp_server_registry_backend to the target agent when explicit.""" + agents_root = tmp_path / "agents" + _make_agent_root(tmp_path, "coordinator") + _make_agent_root(tmp_path, "specialist") + # Write roster.md on coordinator so delegation is allowed. + (agents_root / "coordinator" / "roster.md").write_text( + "# Roster\n\n- specialist\n", encoding="utf-8" + ) + + backend = _make_mock_backend() + coordinator = AtomicAgent( + name="coordinator", + agents_root=agents_root, + mcp_server_registry_backend=backend, + ) + assert coordinator._mcp_server_registry_backend_was_explicit is True + + # Capture AtomicAgent construction kwargs inside delegate(). + constructed_kwargs: list[dict] = [] + original_init = AtomicAgent.__init__ + + def capturing_init(self, **kwargs): + constructed_kwargs.append(dict(kwargs)) + return original_init(self, **kwargs) + + with patch.object(AtomicAgent, "__init__", capturing_init): + # delegate() will raise NotInRoster or similar since we have no LLM; + # we only care that the kwarg was passed to the constructor. + try: + coordinator.delegate( + target_agent_name="specialist", + work_item="test", + ) + except Exception: + pass + + # Find the constructor call that targeted "specialist". + specialist_calls = [ + kw for kw in constructed_kwargs if kw.get("name") == "specialist" + ] + if specialist_calls: + assert "mcp_server_registry_backend" in specialist_calls[0] + assert specialist_calls[0]["mcp_server_registry_backend"] is backend + + +# ────────────────────────────────────────────────────────────────────────────── +# 12. delegate() does NOT thread mcp backend when default-resolved + + +def test_delegate_does_not_thread_mcp_backend_when_default_resolved( + tmp_path: pathlib.Path, +) -> None: + """delegate() omits mcp_server_registry_backend when the coordinator used the default.""" + agents_root = tmp_path / "agents" + _make_agent_root(tmp_path, "coordinator") + _make_agent_root(tmp_path, "specialist") + (agents_root / "coordinator" / "roster.md").write_text( + "# Roster\n\n- specialist\n", encoding="utf-8" + ) + + # No explicit backend -- default resolution path. + coordinator = AtomicAgent( + name="coordinator", + agents_root=agents_root, + ) + assert coordinator._mcp_server_registry_backend_was_explicit is False + + constructed_kwargs: list[dict] = [] + original_init = AtomicAgent.__init__ + + def capturing_init(self, **kwargs): + constructed_kwargs.append(dict(kwargs)) + return original_init(self, **kwargs) + + with patch.object(AtomicAgent, "__init__", capturing_init): + try: + coordinator.delegate( + target_agent_name="specialist", + work_item="test", + ) + except Exception: + pass + + specialist_calls = [ + kw for kw in constructed_kwargs if kw.get("name") == "specialist" + ] + if specialist_calls: + assert "mcp_server_registry_backend" not in specialist_calls[0] + + +# ────────────────────────────────────────────────────────────────────────────── +# 13. Fail-closed: MCPRegistryUnavailable propagates from __init__ + + +def test_fail_closed_reraises_mcp_registry_unavailable( + tmp_path: pathlib.Path, +) -> None: + """AtomicAgent construction raises MCPRegistryUnavailable when backend probe fails.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable("catalog down") + + with pytest.raises(MCPRegistryUnavailable): + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + +# ────────────────────────────────────────────────────────────────────────────── +# 14. Fail-closed: raised message includes backend_id + + +def test_fail_closed_message_includes_backend_id(tmp_path: pathlib.Path) -> None: + """The MCPRegistryUnavailable message includes the backend's backend_id.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.backend_id = "my-custom-backend" + backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable("not reachable") + + with pytest.raises(MCPRegistryUnavailable, match="my-custom-backend"): + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + +# ────────────────────────────────────────────────────────────────────────────── +# 15. Fail-closed: credential-bearing URL is redacted in the raised message + + +def test_fail_closed_message_redacts_url_credentials(tmp_path: pathlib.Path) -> None: + """Credentials embedded in the error URL do not appear in the raised message.""" + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.backend_id = "http" + backend.load_all_mcp_servers.side_effect = MCPRegistryUnavailable( + "https://user:secret@catalog.internal/api" + ) + + with pytest.raises(MCPRegistryUnavailable) as exc_info: + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + # The original URL with credentials must not appear in the message. + assert "secret" not in str(exc_info.value) + # The scheme should still be present (redacted to "https://..."). + assert "https://" in str(exc_info.value) + + +# ────────────────────────────────────────────────────────────────────────────── +# 16. MCPClientPool consumes mcp_servers_resolved, not mcp_servers +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_mcp_pool_consumes_mcp_servers_resolved_not_mcp_servers( + tmp_path: pathlib.Path, +) -> None: + """MCPClientPool receives the mcp_servers_resolved list, not mcp_servers.""" + agent_root = _make_agent_root(tmp_path) + # Write an mcp.md so mcp_servers is non-empty. + _write_mcp_md( + agent_root, + "# MCP servers\n\n## github\ncommand: npx\nargs: mcp-server-github\n", + ) + + resolved_spec = MCPServerSpec( + name="resolved-server", + command="python", + args=["-m", "resolved"], + env={}, + transport="stdio", + description="", + ) + backend = _make_mock_backend(specs=[resolved_spec]) + + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + # After construction, mcp_servers_resolved should contain the mock's output. + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None, "Stream 2 field not populated" + assert len(resolved) == 1 + assert resolved[0].name == "resolved-server" + + +# ────────────────────────────────────────────────────────────────────────────── +# 17. Empty mcp.md construction succeeds with no exception + + +def test_empty_mcp_md_construction_succeeds(tmp_path: pathlib.Path) -> None: + """AtomicAgent construction succeeds when mcp.md is absent.""" + _make_agent_root(tmp_path) + # No mcp.md written -- the default filesystem backend returns []. + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + assert agent is not None + assert agent.mcp_server_registry_backend is not None + + +# ────────────────────────────────────────────────────────────────────────────── +# 18. delegate.py CLI catches MCPRegistryError cleanly + + +def test_delegate_cli_catches_mcp_registry_error(tmp_path: pathlib.Path) -> None: + """delegate.py main() prints 'Error: ...' to stderr and returns 1 on MCPRegistryError.""" + from atomic_agents import delegate as delegate_mod + + agents_root = tmp_path / "agents" + _make_agent_root(tmp_path, "coordinator") + + # Patch AtomicAgent construction to raise MCPRegistryUnavailable. + with patch( + "atomic_agents.agent.AtomicAgent.__init__", + side_effect=MCPRegistryUnavailable("catalog down"), + ): + result = delegate_mod.main( + [ + "coordinator", + "--target", + "specialist", + "--work-item", + "do something", + "--agents-root", + str(agents_root), + ] + ) + + assert result == 1 + + +# ────────────────────────────────────────────────────────────────────────────── +# 19. profile.mcp_servers_resolved populated at construction +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_profile_mcp_servers_resolved_populated_at_construction( + tmp_path: pathlib.Path, +) -> None: + """agent._profile.mcp_servers_resolved equals the list from load_all_mcp_servers.""" + _make_agent_root(tmp_path) + spec = MCPServerSpec( + name="myserver", + command="node", + args=["server.js"], + env={}, + transport="stdio", + description="", + ) + backend = _make_mock_backend(specs=[spec]) + + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + assert len(resolved) == 1 + assert resolved[0].name == "myserver" + + +# ────────────────────────────────────────────────────────────────────────────── +# 20. profile.mcp_servers_resolved is empty when no mcp.md +# (depends on Stream 2 AgentProfile.mcp_servers_resolved field) + + +@pytest.mark.skipif( + not hasattr(AgentProfile, "mcp_servers_resolved"), + reason="depends on Stream 2 AgentProfile.mcp_servers_resolved field", +) +def test_profile_mcp_servers_resolved_empty_when_no_mcp_md( + tmp_path: pathlib.Path, +) -> None: + """agent._profile.mcp_servers_resolved is [] when no mcp.md exists.""" + _make_agent_root(tmp_path) + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + ) + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved is not None + assert resolved == [] + + +# ────────────────────────────────────────────────────────────────────────────── +# 21. Descriptor-invalid propagation (from cross-model review army) +# +# The fail-closed wrapper at agent.py:585 was broadened from +# ``except MCPRegistryUnavailable`` to ``except MCPRegistryError`` so +# permanent descriptor errors (malformed mcp.md) propagate with backend_id +# context, not just transient unreachability. Testing C3 and Adversarial F1 +# both flagged the missing test for this propagation path. + + +def test_fail_closed_wraps_mcp_registry_descriptor_invalid( + tmp_path: pathlib.Path, +) -> None: + """MCPRegistryDescriptorInvalid from load_all_mcp_servers propagates as itself. + + The fail-closed wrapper preserves exception type via ``type(exc)(...)`` + so callers can still ``except MCPRegistryDescriptorInvalid`` for parse + errors. The wrapped message adds the backend_id context. + """ + _make_agent_root(tmp_path) + backend = _make_mock_backend() + backend.backend_id = "filesystem" + backend.load_all_mcp_servers.side_effect = MCPRegistryDescriptorInvalid( + "mcp.md section 'github' missing required 'command:' field" + ) + with pytest.raises(MCPRegistryDescriptorInvalid) as excinfo: + AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + # Re-raised message includes backend_id context. + assert "filesystem" in str(excinfo.value) + # Original cause preserved via ``raise ... from exc``. + assert isinstance(excinfo.value.__cause__, MCPRegistryDescriptorInvalid) + + +# ────────────────────────────────────────────────────────────────────────────── +# 22. Empty resolved list is authoritative (Codex HIGH + Adversarial F2) +# +# Triple-confirmed cross-model finding. When the registry backend returns +# an empty list, the framework MUST NOT fall back to config.mcp_servers +# (which carries any stale mcp.md specs). The agent.py code path was +# changed from `... or self.config.mcp_servers` to `if/else` so empty is +# respected as the authoritative answer. + + +def test_empty_resolved_list_does_not_fall_back_to_mcp_md( + tmp_path: pathlib.Path, +) -> None: + """Backend returning [] is authoritative even when mcp.md has servers. + + Pre-fix: ``... or self.config.mcp_servers`` resurrected stale mcp.md + entries. Post-fix: ``if hasattr ...`` branches respect the empty list + as the operator's pinned-catalog answer. + """ + agent_root = _make_agent_root(tmp_path) + # mcp.md declares one server. Without the fix this would be picked up. + _write_mcp_md( + agent_root, + dedent( + """\ + ## stale-server + command: nonexistent-binary + + args: + - --legacy + """ + ), + ) + # Backend returns [] (operator pinned empty catalog). + backend = _make_mock_backend(specs=[]) + agent = AtomicAgent( + name="test-agent", + agents_root=tmp_path / "agents", + mcp_server_registry_backend=backend, + ) + # Profile field is empty (authoritative). + resolved = getattr(agent._profile, "mcp_servers_resolved", None) + assert resolved == [], "backend returning [] must populate empty list on profile" + # Pool stays None — agent.call() not invoked here, but the consumption + # site at agent.py:3367 now uses the resolved list ([]) and skips the + # pool spin-up. The stale-server from mcp.md is NOT in the resolved list. + assert agent.mcp_pool is None diff --git a/tests/test_profile_mcp_servers_resolved.py b/tests/test_profile_mcp_servers_resolved.py new file mode 100644 index 0000000..b8948be --- /dev/null +++ b/tests/test_profile_mcp_servers_resolved.py @@ -0,0 +1,343 @@ +"""Tests for AgentProfile.mcp_servers_resolved sibling field (#201 PR 2 of 5). + +Covers: + 1. Field existence and default value. + 2. Field is LAST in the dataclass (pins the design constraint). + 3. to_dict() always emits [] even when field is populated (locked Q2). + 4. from_dict() round-trip when field is absent in the dict. + 5. from_dict() correctly reconstructs MCPServerSpec instances when + the dict DOES contain mcp_servers_resolved (future-compat). + 6. _mcp_spec_from_dict / _mcp_spec_to_dict round-trip equality. + 7. Pre-existing latent bug fix: mcp_servers fallback path now returns + MCPServerSpec instances, not raw dicts. + 8. Snapshot round-trip resets mcp_servers_resolved to [] (security shape). + 9. SQLite save-and-load round-trip keeps mcp_servers_resolved as []. +""" + +from __future__ import annotations + +import dataclasses +from pathlib import Path + +from atomic_agents.mcp import MCPServerSpec +from atomic_agents.profile.types import ( + AgentProfile, + _mcp_spec_from_dict, + _mcp_spec_to_dict, +) + +# --------------------------------------------------------------------------- +# Minimal AgentProfile construction helper +# --------------------------------------------------------------------------- + +_IDENTITY = "# Dan\n## Operating mode\nreactive\n" + + +def _minimal_profile(**overrides) -> AgentProfile: + """Return the smallest valid AgentProfile, with optional field overrides.""" + base = { + "name": "test-agent", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [], + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", + } + base.update(overrides) + return AgentProfile.from_dict(base) + + +def _sample_spec() -> MCPServerSpec: + return MCPServerSpec( + name="filesystem-tools", + command="npx", + args=["-y", "@mcp/server-fs", "/data"], + env={"MCP_TOKEN": "tok123"}, + transport="stdio", + description="filesystem tools server", + ) + + +# --------------------------------------------------------------------------- +# Test 1: field exists and defaults to [] +# --------------------------------------------------------------------------- + + +def test_mcp_servers_resolved_field_exists(): + """mcp_servers_resolved is a field on AgentProfile; default is [].""" + field_names = {f.name for f in dataclasses.fields(AgentProfile)} + assert "mcp_servers_resolved" in field_names + + profile = _minimal_profile() + assert profile.mcp_servers_resolved == [] + + +# --------------------------------------------------------------------------- +# Test 2: field is LAST in the dataclass +# --------------------------------------------------------------------------- + + +def test_mcp_servers_resolved_field_is_last_in_dataclass(): + """mcp_servers_resolved MUST be the last field in AgentProfile. + + Python dataclass rule: fields with defaults cannot precede required + fields. Pinning this ensures future field additions don't silently + break the constraint or trigger TypeError at import. + """ + fields = dataclasses.fields(AgentProfile) + assert fields[-1].name == "mcp_servers_resolved" + + +# --------------------------------------------------------------------------- +# Test 3: to_dict() always emits [] even when field is populated +# --------------------------------------------------------------------------- + + +def test_to_dict_always_emits_empty_list_even_when_field_populated(): + """to_dict() emits 'mcp_servers_resolved': [] regardless of runtime value. + + Locked decision Q2 from PR 2 prep pass: the field is a runtime transient. + Serializing real values would write resolved MCP env secrets into snapshot + JSON files on disk, which contradicts spec/24 Decision 1's security intent. + """ + profile = _minimal_profile() + # Inject a non-empty value via dataclasses.replace (the framework's pattern). + spec = _sample_spec() + populated = dataclasses.replace(profile, mcp_servers_resolved=[spec]) + assert len(populated.mcp_servers_resolved) == 1 + + serialized = populated.to_dict() + assert serialized["mcp_servers_resolved"] == [] + + +# --------------------------------------------------------------------------- +# Test 4: from_dict() round-trip when key is absent +# --------------------------------------------------------------------------- + + +def test_from_dict_round_trip_with_empty_field(): + """from_dict() on a dict without mcp_servers_resolved gives [] (default).""" + d = { + "name": "agent-x", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [], + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", + # mcp_servers_resolved deliberately absent + } + profile = AgentProfile.from_dict(d) + assert profile.mcp_servers_resolved == [] + + +# --------------------------------------------------------------------------- +# Test 5: from_dict() reconstructs MCPServerSpec when field is present in dict +# --------------------------------------------------------------------------- + + +def test_from_dict_round_trip_with_populated_field_in_dict(): + """from_dict() reconstructs MCPServerSpec instances from mcp_servers_resolved. + + In normal usage to_dict() emits [] so snapshots never carry resolved specs. + But if a dict DOES contain the field (direct test construction, or a future + un-clamped serializer path), from_dict should reconstruct correctly via + _mcp_spec_from_dict. + """ + spec = _sample_spec() + spec_dict = _mcp_spec_to_dict(spec) + + d = { + "name": "agent-y", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [], + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", + "mcp_servers_resolved": [spec_dict], + } + profile = AgentProfile.from_dict(d) + assert len(profile.mcp_servers_resolved) == 1 + resolved = profile.mcp_servers_resolved[0] + assert isinstance(resolved, MCPServerSpec) + assert resolved.name == spec.name + assert resolved.command == spec.command + assert resolved.args == spec.args + assert resolved.env == spec.env + assert resolved.transport == spec.transport + assert resolved.description == spec.description + + +# --------------------------------------------------------------------------- +# Test 6: _mcp_spec_from_dict / _mcp_spec_to_dict round-trip +# --------------------------------------------------------------------------- + + +def test_mcp_spec_from_dict_helper_round_trip(): + """_mcp_spec_to_dict then _mcp_spec_from_dict reconstructs a equal spec.""" + original = _sample_spec() + as_dict = _mcp_spec_to_dict(original) + reconstructed = _mcp_spec_from_dict(as_dict) + + assert isinstance(reconstructed, MCPServerSpec) + assert reconstructed.name == original.name + assert reconstructed.command == original.command + assert reconstructed.args == original.args + assert reconstructed.env == original.env + assert reconstructed.transport == original.transport + assert reconstructed.description == original.description + + +# --------------------------------------------------------------------------- +# Test 7: mcp_servers fallback path returns MCPServerSpec instances, not dicts +# --------------------------------------------------------------------------- + + +def test_mcp_servers_fallback_now_returns_specs_not_dicts(): + """Pre-existing latent bug fix: mcp_servers fallback path returns MCPServerSpec. + + Before _mcp_spec_from_dict was introduced, the fallback path in + AgentProfile.from_dict (used when mcp_md_raw is absent) returned raw dicts + from d['mcp_servers']. This caused isinstance checks and attribute access + on profile.mcp_servers to fail at runtime. + """ + spec = _sample_spec() + spec_dict = _mcp_spec_to_dict(spec) + + # mcp_md_raw is empty -- forces the else branch (fallback path) + d = { + "name": "agent-z", + "agent_mode": "reactive", + "model_config": {}, + "tool_config": {}, + "tool_classifications": {}, + "judges_config": None, + "roster": [], + "mcp_servers": [spec_dict], # dict form -- must be reconstructed + "persona_identity": _IDENTITY, + "persona_soul": "", + "persona_user": "", + "goal_text": "", + "model_md_raw": "", # empty forces fallback + "tools_md_raw": "", + "judges_md_raw": None, + "roster_md_raw": "", + "mcp_md_raw": "", # empty forces fallback + } + profile = AgentProfile.from_dict(d) + assert len(profile.mcp_servers) == 1 + assert isinstance(profile.mcp_servers[0], MCPServerSpec), ( + f"Expected MCPServerSpec, got {type(profile.mcp_servers[0])!r}. " + "This is the latent bug: the fallback path returned raw dicts before " + "_mcp_spec_from_dict was introduced." + ) + assert profile.mcp_servers[0].name == spec.name + + +# --------------------------------------------------------------------------- +# Test 8: snapshot round-trip resets mcp_servers_resolved to [] +# --------------------------------------------------------------------------- + + +def test_snapshot_roundtrip_resets_mcp_servers_resolved_to_empty(tmp_path: Path): + """Filesystem backend: save/snapshot/restore resets mcp_servers_resolved to []. + + Verifies the snapshot security shape: resolved MCP secrets never persist + to disk via the to_dict() path. + """ + from atomic_agents.profile.filesystem import FilesystemAgentProfileBackend + + scope_root = tmp_path / "agents" + scope_root.mkdir() + backend = FilesystemAgentProfileBackend(scope_root) + + # Build an agent directory so load_profile works. + agent_root = scope_root / "snap-agent" + persona_dir = agent_root / "persona" + persona_dir.mkdir(parents=True) + (persona_dir / "IDENTITY.md").write_text(_IDENTITY, encoding="utf-8") + (agent_root / "model.md").write_text("", encoding="utf-8") + (agent_root / "tools.md").write_text("", encoding="utf-8") + + profile = backend.load_profile("snap-agent") + assert profile.mcp_servers_resolved == [] + + # Inject a non-empty value then save. + spec = _sample_spec() + populated = dataclasses.replace(profile, mcp_servers_resolved=[spec]) + backend.save_profile("snap-agent", populated) + + # Snapshot and restore. + snap_id = backend.snapshot("snap-agent", label="test-snap") + backend.restore("snap-agent", snap_id) + + # Reload and confirm mcp_servers_resolved is back to []. + restored = backend.load_profile("snap-agent") + assert restored.mcp_servers_resolved == [], ( + "mcp_servers_resolved must be [] after snapshot/restore -- " + "resolved secrets must not persist to disk." + ) + + +# --------------------------------------------------------------------------- +# Test 9: SQLite save-and-load keeps mcp_servers_resolved as [] +# --------------------------------------------------------------------------- + + +def test_sqlite_save_and_load_profile_includes_mcp_servers_resolved_as_empty( + tmp_path: Path, +): + """SQLite backend: save then load yields mcp_servers_resolved == []. + + Verifies that the new field rides through the profile_json blob column + correctly and that no schema change was accidentally required. + """ + from atomic_agents.profile.sqlite import SQLiteAgentProfileBackend + + db_path = tmp_path / ".profile.db" + backend = SQLiteAgentProfileBackend(db_path) + + # Construct and save a profile with a populated mcp_servers_resolved. + spec = _sample_spec() + profile = _minimal_profile() + populated = dataclasses.replace(profile, mcp_servers_resolved=[spec]) + backend.save_profile("sql-agent", populated) + + # Load and assert. + loaded = backend.load_profile("sql-agent") + assert loaded.mcp_servers_resolved == [], ( + "mcp_servers_resolved must be [] after SQLite round-trip: " + "to_dict() always serializes [] so the blob never carries resolved secrets." + )