diff --git a/CHANGELOG.md b/CHANGELOG.md index dd04403..7b0c7d3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -51,6 +51,8 @@ CHANGELOG entry. ### Added +- **MCPServerRegistryBackend filesystem install/uninstall + LockBackend lease + render_mcp_md serializers + CLI install/uninstall subcommands** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 3 of 5**). Operators on every backend now have the framework's CLI as the canonical install surface for MCP servers. `atomic-agents mcp-registry install --command --args --env --description ` atomically appends a new H2 section to `/mcp.md` under a `LockBackend` lease; `atomic-agents mcp-registry uninstall ` removes it idempotently (absent-name returns 0 without error, matching the SQLiteToolRegistryBackend uninstall precedent). The README's "same agent definitions, same call flow, different backends" promise now extends to operator write commands at v1.0, not just runtime read paths. **The mcp.md serializer that the install path depends on shipped in the same PR**: `atomic_agents/mcp.py` gains `render_mcp_md_section(spec) -> str` and `render_mcp_md_full(specs) -> str` with a round-trip property pinned by 4 new tests (parse_mcp_md_text(render_mcp_md_full(specs), resolve_env=False) == specs). The serializer writes `$VAR` env references verbatim per spec/36 Decision 7 (resolved env values never persist to disk); refuses any field (command, args items, env keys, env values, description) containing a newline so the parser cannot be tricked into interpreting attacker-controlled content as a phantom H2 section. `FilesystemMCPServerRegistryBackend.install(spec) -> MCPServerRef` implements the 7-step critical section per spec/36 MUST 9: `_validate_server_name(spec.name)` at the API boundary (MUST 1), `with lock_backend.acquire("mcp_registry", timeout=self._install_lock_timeout) as handle:` (context-manager idiom releases on every exit path including exceptions), `LockBusy` caught at the boundary and re-raised as `MCPRegistryUnavailable` (preserves the framework-level fail-closed semantic at `agent.py:__init__`), `cleanup_stale_tempfiles_for_file(mcp_md)` scoped to the target file's siblings only (NOT recursive over the whole agent_root tree), read mcp.md with FileNotFoundError → empty content cold-start, parse with `resolve_env=False` keeping $VAR refs raw, dual-probe collision detection across both the parsed-name set AND a raw `re.findall(r"^## (\S+)", content, re.MULTILINE)` scan (catches malformed sections that the parser silently skipped), `check_lock_lost(handle)` immediately before atomic_write (no-op on filesystem; raises LockLost on Redis-backed leases that expired mid-critical-section, re-raised as MCPRegistryUnavailable; non-LockLost exceptions from the helper also caught), full-file render via `render_mcp_md_full`, `_io.atomic_write` (temp + fsync + rename + parent dir fsync). Returns `MCPServerRef` projected from input spec (name, single-line description, transport, version=None, source=`f"mcp.md#section:{name}"`) with NO env / command / args fields — the CLI handler can safely echo `ref.name` without secret-leak risk (closes the symmetric class of the PR 1 P0 secret leak in `mcp-registry show` that was caught by cross-model triple-confirmation). `uninstall(name)` mirrors the lock discipline: validate-name-first, dual-probe absent check, no atomic_write on no-op path (preserves mcp.md mtime), returns None on both present-removed and absent-no-op paths, no pre-lock fast-path (a concurrent install could add the name between an unlocked check and the subsequent read; spec/36 MUST 9 amendment documents this discipline). Constructor signature gains `install_lock_timeout: float = 30.0` kwarg (per spec/21 `apply_staging_lock_timeout` precedent; CI fail-fast tests use `install_lock_timeout=0.0`) and the default `lock_backend=None` now lazily resolves via `get_default_lock_backend(self._agent_root)` so multi-host operators pinning `ATOMIC_AGENTS_LOCK_BACKEND=redis` on Cloud Run / Kubernetes automatically get `RedisLockBackend` for registry writes without per-construction operator config. Capability flags flip at this PR per spec/36 Decision 5 evolution table: `supports_install=True`, `supports_uninstall=True` (was False at PR 1/2 because the methods raised NotImplementedError; MUST 3 capability honesty now means conformance suite calls install/uninstall directly and asserts they return correctly typed values, not that they raise). `MCPRegistryError` rebased to inherit from `AtomicAgentsError` (was `Exception`) so framework-wide `except AtomicAgentsError` catch-alls see registry failures consistently with the 11 other backend protocols' hierarchies. `atomic_agents/cli.py` adds `install` + `uninstall` subparsers under `mcp-registry` with `--command` (required), `--args` (comma-separated, empty entries dropped), `--env` (comma-separated KEY=$VAR pairs, split on first `=` so values may contain `=`, empty key raises ValueError), `--description` (single-line; refused if any line matches `^##\s` per defense-in-depth against H2 injection), `--transport` (choices=["stdio"] for v1). CLI WARNs on stderr when `--env KEY=value` doesn't start with `$` (operator likely typed a literal secret; install still succeeds per decision 3 = WARN, not ERROR — legitimate non-secret use cases like `--env MODE=production` aren't blocked but the operator sees feedback). CLI REFUSES newlines in `--command`, any `--args` item, any `--env` key, or any `--env` value with operator-readable errors naming the offending flag (defense-in-depth against the API-path H2 injection class that Claude Adversarial + Codex independently flagged as P1). Lazy import block adds `MCPServerAlreadyInstalled` + `MCPRegistryError` (was missing — would have caused `NameError` on the first install collision); exception handler chain adds explicit `MCPServerAlreadyInstalled` catch before the `MCPRegistryError` base-class backstop (catch-order matters: more-specific subclass before base). Top-level `mcp-registry` description and module docstring updated to remove the "deferred to PR 3" language now that install/uninstall ship. spec/36 PR 3 amendments (committed at `3a3a23e` before implementer dispatch): constructor signature gains `install_lock_timeout` kwarg with usage rationale; `lock_backend` parameter docstring routes default through `get_default_lock_backend(agent_root)` and explicitly names the custom-lock-backend deadlock failure mode (passing `agent.lock_backend` competes with `agent.call()` for `/.lock` and raises LockBusy whenever a call is in flight); new "Install / uninstall semantics" subsection documents the 7-step critical section with the context-manager idiom (NOT bare `handle.release()` because LockHandle is a frozen dataclass and release is a backend method), dual-probe collision detection, absent-name idempotency with no fast-path bypass; new "LockBackend integration" subsection documents factory routing + install_lock_timeout knob + LockBusy translation + check_lock_lost discipline + custom-lock-backend operator surface + multi-host pinning + non-reentrant default; MUST 9 contract updated to require the context-manager idiom + explicit LockBusy → MCPRegistryUnavailable mapping + check_lock_lost before atomic_write + no-pre-lock-fast-path rule + mtime-preservation note for absent-name uninstall; capabilities label flipped from "PR 1/2" to "PR 3+" reflecting the flag flip. **Cross-model review army at /ship time** (7 parallel Sonnet subagents covering plan completion + pre-landing checklist + testing/maintainability/security/performance specialists + Claude adversarial + Codex adversarial via `codex exec`) surfaced 44 findings; 5 were triple-confirmed across at least 3 independent reviewers and applied inline: (1) **H2 injection refusal in render_mcp_md_section** for newlines in spec.command, spec.args items, spec.env keys + values (Claude Adversarial #1 CRITICAL FIXABLE + Codex P1 + Pre-Landing — an API caller could construct an MCPServerSpec that wrote multi-section content the parser interpreted as MULTIPLE H2 sections, bypassing collision detection + name validation with no audit record); (2) **cleanup_stale_tempfiles moved out of __init__** to install/uninstall write paths with a tightly-scoped `cleanup_stale_tempfiles_for_file(mcp_md)` glob (NOT rglob) helper in `_io.py` (Pre-Landing CRITICAL + Codex P1 + Performance + Claude Adversarial — the constructor was recursively deleting `.*.tmp` anywhere under agent_root which violated MUST 2 side-effect-free construction and could delete unrelated user/application tempfiles, including from read-only commands like `list`); (3) **BackendNotRegistered escape from locks module fixed**: `_resolve_lock_backend` wraps `get_default_lock_backend` in try/except and re-raises as `MCPRegistryUnavailable` so operator typos in `ATOMIC_AGENTS_LOCK_BACKEND` produce clean errors instead of raw Python tracebacks (Codex P2 + Claude Adversarial #3); (4) **check_lock_lost broaden except clause**: non-LockLost exceptions (ImportError from broken redis dep, AttributeError from malformed handle.backend_state, etc.) now translate to MCPRegistryUnavailable instead of escaping raw (Codex P3 + Claude Adversarial #4); (5) **lock timeout test added** `test_install_lock_timeout_zero_under_contention` exercises the spec/36 MUST 9 LockBusy → MCPRegistryUnavailable contract by holding the lock in the test setup and asserting install with `install_lock_timeout=0.0` raises the wrapper exception (Testing specialist CRITICAL + Pre-Landing + Maintainability — module docstring had promised this test category but no implementation existed). Auto-fix cluster also applied inline: late imports (`render_mcp_md_full` + `check_lock_lost`) moved to top-level for visibility and micro-perf; stale docstrings in `filesystem.py` + `backend.py` rewritten to present-tense PR 3+ baseline (removed PR-1 historical claims that misled future readers); test assertion gaps closed (`test_uninstall_idempotent_double_call` now asserts `result2 is None`, `test_cli_install_warns_on_literal_env_value` now asserts `exit_code == 0`, `test_install_empty_command_raises` tightened from `pytest.raises((ValueError, Exception))` to `pytest.raises(ValueError)`); CLI H2 description guard aligned to the renderer's `re.match(r'^##\s', line)` regex (catches `##\t` tab-separated case that the prior `line.startswith("## ")` missed). Conformance suite tightened per spec/36 MUST 3 + MUST 9 + MUST 10: `test_capability_honesty_install` True-branch now asserts `isinstance(ref, MCPServerRef)` (replaces the prior `except Exception: pass` that accepted any exception as conformant); `test_capability_honesty_uninstall` True-branch now asserts `result is None` (idempotent contract); new `test_must9_install_atomicity_concurrent_same_name` spawns 3 threads installing the same spec and asserts exactly one wins (others raise MCPServerAlreadyInstalled or MCPRegistryUnavailable); new `test_must9_uninstall_absent_name_is_noop` asserts uninstall("definitely-not-in-registry") returns None without raising; new `test_must10_post_install_consistency` asserts `set(load_all_mcp_servers()) == set(load_mcp_server(ref.name) for ref in list_mcp_servers())` after install (MUST 10 equivalence holds across the read paths post-mutation); the placeholder `@pytest.mark.skip("PR 3")` stubs in `test_mcp_server_registry_filesystem_backend.py` removed (real tests now in the new file). Doctor capability snapshot test updated to assert `supports_install=True, supports_uninstall=True` (was the PR 2 baseline). **Pre-impl prep**: 5-stream parallel Sonnet prep pass parametrized on failure-mode dimensions (LockBackend integration + acquire/release; install atomicity; uninstall idempotency; CLI surface + secret-leak discipline; capability flag flip + conformance) caught 58 findings BEFORE any code shipped, mirroring the PR 1 (35 findings) and PR 2 (71 findings) prep cadence. The single load-bearing P0 caught at prep stage: **the mcp.md serializer didn't exist anywhere in the codebase** — spec/36 said "append new H2 section" without naming the missing primitive. Streams B and C independently flagged it; PR 3 grew the test budget from +15 to +19 to ship the serializer alongside its consumers in a single review pass. Two implementer streams ran in parallel under git worktree isolation (Stream 1 owned mcp.py serializer + filesystem.py install/uninstall + backend.py base class fix; Stream 2 owned cli.py + tests/test_mcp_server_registry_filesystem_install.py NEW + conformance suite updates) per the aggressive-Sonnet-delegation-when-on-Opus discipline; merged cleanly with zero conflicts because file-set partition was disjoint by design. **Test delta: +33 net new (3199 collected before PR 3, 3232 after; 3176 passed + 56 skipped + 0 failures + 0 regressions across the full suite)**. Test files: `tests/test_mcp.py` (+4 render round-trip tests pinning the serializer's parse/render symmetry including $VAR refs preserved unresolved and descriptions stripped to single-line); `tests/test_mcp_server_registry_filesystem_install.py` NEW (28 tests covering install happy path + cold-start mcp.md creation + collision raising MCPServerAlreadyInstalled + path-traversal name raising ValueError + empty-command rejection + $VAR env round-trip + install/load round-trip + uninstall present/absent/double-call/install-uninstall-install cycle + concurrent same-name exactly-one-wins + concurrent different-names all-win + lock-timeout-zero-under-contention + CLI no-env-echo + CLI WARN on literal env + CLI refuses H2 in description + _parse_env_flag/_parse_args_flag unit tests); `tests/test_mcp_server_registry_conformance.py` (+3 new conformance tests + 2 tightened existing tests + 1 docstring correction); `tests/test_mcp_server_registry_filesystem_backend.py` (placeholder skips removed); `tests/test_mcp_server_registry_doctor.py` (capability assertions updated for the flag flip). After PR 5 of 5 of #201 lands, atomic-agents-stack hits v1.0 with twelve of twelve backend protocols shipped. PR 3 extends the post-#285-revert `/ship` streak to 10. + - **MCPServerRegistryBackend wiring + AgentProfile.mcp_servers_resolved sibling field + doctor.check_mcp_server_registry_backend + IRON RULE regression suite + cross-model fail-closed re-raise** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 2 of 5**). Operators with a registry-backed MCP catalog now have agent construction actually consult the backend: previous PRs scaffolded the Protocol; this PR ships the framework integration. `AtomicAgent.__init__` accepts a new `mcp_server_registry_backend` constructor kwarg + class-level type annotation, resolves the default via `get_default_mcp_server_registry_backend(self.agent_root, read_paths)` when none is supplied, calls `backend.load_all_mcp_servers()` to materialize the spec list, and populates the new `AgentProfile.mcp_servers_resolved` field via `dataclasses.replace()` BEFORE `_load_config()` runs. `MCPClientPool` at `agent.call()` now reads `self._profile.mcp_servers_resolved` (the substrate-agnostic materialized list) instead of `self.config.mcp_servers` (the filesystem-parse path, which stays in place for backward-compat audit/log consumers). The framework-level fail-closed invariant per spec/36 lines 519-522 propagates `MCPRegistryError` from the backend probe with backend_id context plus URL credential redaction; original exception type is preserved via `type(exc)(...)` so callers can distinguish `MCPRegistryUnavailable` (transient) from `MCPRegistryDescriptorInvalid` (permanent). Per-runner kwargs ship on all three runners: `OutcomeRunner(..., mcp_server_registry_backend=...)` and `EvalRunner(..., mcp_server_registry_backend=...)` thread the kwarg to the internal `AtomicAgent` construction sites; `DreamRunner` stores it for API parity (no internal `AtomicAgent` construction in v1, mirroring the CorpusBackend precedent). `delegate.py` adds explicit-only threading via `_mcp_server_registry_backend_was_explicit` flag (mirrors PersonaBackend D-ER-2 at `agent.py:401` and CorpusBackend at `agent.py:431`): default-resolved backends do NOT leak the coordinator's `agent_root` to delegates because MCP catalog is per-agent semantic context, not fleet-scoped; the `MCPRegistryError` family is added to the delegate CLI catch block so catalog-down at delegate construction surfaces as a clean operator-facing error instead of a Python traceback. `AgentProfile` gains the `mcp_servers_resolved: list[MCPServerSpec]` sibling field (spec/24 Decision 1 addendum) placed LAST in the dataclass to honor Python's required-fields-before-defaults ordering rule; `field` is added to the `dataclasses` import; `to_dict()` always serializes the field as `[]` regardless of runtime value (snapshot security: resolved `$VAR` env values never land in snapshot JSON files on disk, preserving spec/24 D1's raw-text-shadow security intent); `from_dict()` reconstructs the field via the new `_mcp_spec_from_dict` helper which also fixes a pre-existing latent bug where the `mcp_servers` fallback path returned raw dicts instead of `MCPServerSpec` instances. SQLite backend continues to ride the existing `profile_json` blob round-trip; zero schema migration required, schema stays at v2. Filesystem profile backend additionally sorts `mcp_servers` lexicographically by name on load to align with spec/36 MUST 5 across all backends; pre-#201 operators with alphabetical `mcp.md` see no change; operators with non-alphabetical `mcp.md` see a one-time invisible reorder of `agent.config.mcp_servers` introspection (MCP tool resolution and audit semantics unchanged because qualified names are unique per server). `doctor.check_mcp_server_registry_backend` ships as the 13th `check_*_backend` with PASS/WARN/FAIL ladder: PASS on `list_mcp_servers` + `load_all_mcp_servers` both succeeding (the new dual-probe closes a Codex/Claude-adversarial-triple-confirmed false-PASS hole where a malformed `mcp.md` would PASS doctor then crash agent construction); WARN on transient `MCPRegistryUnavailable`; FAIL on unknown `backend_id` with credential-redacted echo via the shared `_redact_for_error_message` helper from `mcp_registry/__init__.py` (NOT the inline truncation used in `check_tool_registry_backend` which misses DSN-style values per the PR 1 P0 redaction discipline). Capability snapshot in the detail dict includes all 5 `MCPServerRegistryCapabilities` fields plus `mcp_server_count` so operator `atomic-agents doctor --json` consumers see the runtime view. Two PR 1 latent bug fixes surfaced by PR 2's fail-closed semantic: `FilesystemMCPServerRegistryBackend.list_mcp_servers` now discriminates `FileNotFoundError`/ENOENT (returns `[]`, the normal absent-file path) from other `OSError` (raises `MCPRegistryUnavailable` so misconfigured PermissionError after a Kubernetes deploy actually surfaces); `load_all_mcp_servers` overridden directly instead of delegating to `_default_load_all` so it does a single read-parse with proper exception mapping (ENOENT to empty, OSError to MCPRegistryUnavailable, parse error to MCPRegistryDescriptorInvalid, env-var unresolvable to MCPServerConnectFailed) rather than masking transient and parse failures through `list_mcp_servers`'s soft-degrade path. spec/36 line 599 gains a one-sentence amendment correcting the wiring location from `_load_config()` to `AtomicAgent.__init__` (spec/36 said the former, but `_load_config` is a pure reader of `self._profile` that cannot mutate it); spec/24 Decision 1 gains the locked D1 addendum documenting the sibling field, snapshot security clamp, and SQLite forward-compat. **Cross-model review army** (4 specialists + Claude adversarial subagent + Codex adversarial via `codex exec` + Codex structured review via `codex review`) surfaced 15+ findings; the 3 highest-priority were triple-confirmed across models and applied inline: (1) empty resolved list now treated as authoritative instead of falling back to `config.mcp_servers` (Codex HIGH + Claude adversarial F2 + Codex structured P2; the prior `... or self.config.mcp_servers` resurrected stale `mcp.md` servers when the operator pinned an empty catalog); (2) fail-closed wrapper broadened from `MCPRegistryUnavailable` to `MCPRegistryError` so descriptor-invalid and connect-failed also propagate with backend_id context; (3) doctor false-PASS closed by adding the `load_all_mcp_servers` probe after `list_mcp_servers` succeeds. **Test delta: +46 net new (3192 collected before fixes, 3199 after the doctor + wiring additions, 3141 passed + 58 skipped + 0 failures + 0 regressions across the full suite).** Test files: `tests/test_mcp_server_registry_wiring.py` (~22 tests covering per-runner kwargs, delegate explicit-only behavior, fail-closed re-raise with backend_id + URL redaction + MCPRegistryDescriptorInvalid propagation, profile augmentation, empty-resolved-list authoritative, CLI MCPRegistryError catch), `tests/test_mcp_server_registry_migration_regression.py` (10 IRON RULE byte-identity tests pinning pre-#201 behavior when no backend is configured: empty/missing mcp.md returns empty specs, single server no env vars byte-identical to `parse_mcp_md` baseline, multiple servers in sorted order matching the spec/36 MUST 5 sort, env-var resolved specs match the pre-#201 path, config equals profile resolved, pool `_specs` byte-identical to resolved list, `MCPRegistryUnavailable` raises at construction with no silent degrade), `tests/test_profile_mcp_servers_resolved.py` (9 tests covering field existence + LAST-in-dataclass constraint + to_dict always-`[]` + from_dict round-trip with key present + backward-compat with key absent + `_mcp_spec_from_dict` helper round-trip + `mcp_servers` fallback bug fix + snapshot/restore resets to `[]` + SQLite JSON blob round-trip), `tests/test_mcp_server_registry_doctor.py` (5 tests covering PASS with empty mcp.md, PASS with valid mcp.md, FAIL on unknown backend_id with URL credential redaction including DSN-style `user:pass@host` patterns, FAIL on descriptor invalid via the new probe, capability snapshot completeness). Mirrors the doctor-test shape of every other backend (lock, log, profile, tool registry, persona, corpus). Pre-impl prep: 5-stream parallel Sonnet prep pass parametrized on failure-mode dimensions (wiring shape vs precedent; AgentProfile round-trip; IRON RULE design; doctor PASS/WARN/FAIL ladder; fail-closed wiring) caught 71 findings BEFORE any code shipped, beating PR 1's 35-finding bar. Two binary design decisions locked via AskUserQuestion before implementer dispatch: server list ordering (Q1, locked to "sort both paths to lexicographic" in the filesystem profile backend so spec/36 MUST 5 holds across all backends with zero functional change to operators); snapshot serialization shape (Q2, locked to "always serialize `mcp_servers_resolved` as `[]`" so resolved MCP env secrets never persist to disk in snapshot files). Three commits authored as parallel Sonnet implementer streams in isolated git worktrees per the aggressive-Sonnet-delegation-when-on-Opus discipline (Stream 1 owned `agent.py` + runners + delegate + spec/36; Stream 2 owned profile/* + doctor + mcp_registry/filesystem.py bug fix + spec/24); merged cleanly with zero conflicts because the file-set partition was disjoint by design. Known follow-up coverage gaps (filed as P2 for PR 3 prep): a future PR can add chmod-style PermissionError tests for `load_all_mcp_servers`'s new OSError mapping (analogous coverage already exists for `load_mcp_server`); threading test strengthening to patch `AtomicAgent.__init__` and capture kwargs at runner `run()` time instead of asserting on the stored field. After PR 5 of 5 of #201 lands, atomic-agents-stack hits v1.0 with twelve of twelve backend protocols shipped; PR 2 extends the post-#285-revert /ship streak to 9. - **MCPServerRegistryBackend Protocol scaffolding + FilesystemMCPServerRegistryBackend reference impl + DRAFT spec/36 + `atomic-agents mcp-registry` CLI** ([#201](https://github.com/dep0we/atomic-agents-stack/issues/201) -- MCPServerRegistryBackend arc **PR 1 of 5**, the v1.0 closer arc). Operators with a fleet of agents stop hand-syncing `mcp.md` files across N agents. The Protocol seals the per-agent mounted-server-source seam so SaaS / org-internal / public-registry futures arrive via the same env-var pin pattern used by the eleven prior backend protocols. New `atomic_agents/mcp_registry/` package with `__init__.py` (`register_mcp_server_registry_backend` / `get_mcp_server_registry_backend` / `list_mcp_server_registry_backends` + `get_default_mcp_server_registry_backend` factory + `_redact_for_error_message` helper with DSN heuristic for non-scheme connection strings), `types.py` (`MCPServerRef` frozen dataclass with `to_dict`/`from_dict` roundtrip, `MCPServerRegistryCapabilities` with 5 capability flags including the static-vs-dynamic distinction novel to this Protocol, `ValidationResult`), `backend.py` (`@runtime_checkable` `MCPServerRegistryBackend` Protocol with 8 methods + `@property backend_id` + `@property capabilities`, 6 exception classes, `_default_load_all(backend)` module-level helper that backends MAY override for performance), and `filesystem.py` (`FilesystemMCPServerRegistryBackend(agent_root, read_paths, *, lock_backend=None)` reading `/mcp.md`; install/uninstall raise `NotImplementedError` at PR 1 with `supports_install=False` static class default per the capability-honesty contract (MUST 3); install/uninstall ship at PR 3 alongside `LockBackend` integration). `parse_mcp_md_text` and `_build_spec` in `atomic_agents/mcp.py` gain an additive `resolve_env: bool = True` keyword-only parameter; the `_flush()` closure threads the parameter through to `_build_spec`; a new module-level `_resolve_env_vars(env, server_name)` helper is extracted from `_build_spec`'s inline resolution block and is now shared with `FilesystemMCPServerRegistryBackend.load_mcp_server` so the spec/19-canonical `MCPServerConnectFailed` message shape stays identical across both call paths. The public `parse_mcp_md` at `mcp.py:410` does NOT receive the parameter; it always resolves at parse time and the docstring documents the API asymmetry (callers needing lazy resolution call `parse_mcp_md_text(..., resolve_env=False)` directly). Two hidden callers preserved: `profile/types.py:336` (`AgentProfile.from_dict`) and `profile/filesystem.py:374` (`FilesystemAgentProfileBackend.load_profile`) each gain an inline comment documenting the `resolve_env=True` invariant their `AgentProfile.mcp_servers` consumers depend on. CLI gains the `atomic-agents mcp-registry` subcommand group with four read-only subcommands at PR 1 (install/uninstall ship in PR 3): `list` prints a table of mounted servers, `show ` prints the spec with `$VAR` env refs UNRESOLVED (re-parses mcp.md with `resolve_env=False` to avoid leaking resolved secrets to stdout per the cross-model-confirmed P0 from /ship's review army), `validate ` runs the static descriptor check, and `refresh-capabilities` prints the runtime capability view (no-op on filesystem; HTTP backend at PR 4 re-probes the catalog server). `docs/spec/36-mcp-server-registry-backend.md` ships as DRAFT at 696 lines with 10 normative MUSTs in the Implementer Contract (placed before Shipping plan per spec/34 ordering): 1 name charset validation at API boundary, 2 side-effect-free construction, 3 capability honesty including the static-vs-dynamic distinction, 4 URL credential redaction, 5 cross-agent isolation at storage layer, 6 backend_id stability + close() idempotency, 7 transient-vs-permanent failure honesty (MCPRegistryUnavailable vs MCPServerNotInRegistry distinction), 8 env-var resolution semantics on MCPServerSpec at load_mcp_server time, 9 install/uninstall atomicity (filesystem uses LockBackend lease, HTTP relies on catalog server transactional storage), 10 `load_all_mcp_servers` consistency (output semantically equivalent to `[load_mcp_server(ref.name) for ref in list_mcp_servers()]`). The spec re-languages "catalog of available servers" to "mounted server source" per Codex's outside-voice finding so the per-`agent_scope` filtering model is unambiguous; HTTP backend tier-1/2/3 capability negotiation via `GET /capabilities` and `OPTIONS /mcp-servers` ships at PR 4. `AgentProfile.mcp_servers_resolved` sibling field for the substrate-agnostic audit capture (spec/24 D1 addendum) ships at PR 2 alongside the wiring. Tests: 3090 to 3101 passing, 52 skipped, zero regressions across the full suite. 100 new MCP-registry tests across 3 files (`tests/test_mcp_server_registry_conformance.py` 42 tests covering MUSTs 1-3, 5-8, 10 parametrized across registered backends with HTTP joining at PR 4; `tests/test_mcp_server_registry_filesystem_backend.py` 32 tests + 2 skipped (install/uninstall stubs reserved for PR 3 with `pytest.mark.skip(reason="land in PR 3")` markers); `tests/test_mcp_registry_cli.py` 16 tests covering all four CLI subcommands + exception handler paths including the secret-leak regression test). Pre-impl review: 5-stream parallel Sonnet plan-subagent pass (Protocol surface, conformance test plan, filesystem correctness, parse_mcp_md_text refactor risk, spec/36 coherence) caught 5 P0 + 15 P1 + 15 P2 = 35 findings BEFORE any code shipped (9 times the CorpusBackend PR 4 prep pass's 4 findings, tracking the v1.0-closer-arc complexity). /ship review army: 6 parallel reviewers (4 specialists: testing, maintainability, security, performance + Claude adversarial subagent + Codex adversarial via codex exec) surfaced 30 findings post-implementation including the cross-model triple-confirmed P0 (`mcp-registry show` leaked resolved `$VAR` env values to stdout, found independently by Codex + the security specialist + Claude adversarial) plus 5 P1 (OSError misclassified as permanent-absence violating MUST 7, malformed-section silent drop, CLI `read_paths=[]` skipping path-traversal validation, `MCPServerConnectFailed` escaping the CLI exception handler with a Python traceback, `backend_id` property absent from the Protocol declaration despite being implemented on the class) plus 8 mechanical P2 AUTO-FIXes (section names not validated before listing, `_redact_for_error_message` missing the DSN heuristic for non-scheme connection strings, vacuous test assertions, hardcoded `/tmp/dummy-agent` path, 6 late imports moved to top of file, unused Protocol import in filesystem.py, misleading "Re-probe" CLI help text on filesystem backend). All P0 + P1 + critical P2s addressed inline before push; 15 regression tests added covering each fix including `test_cli_show_does_not_leak_resolved_env_values` that asserts a resolved env value (the actual GitHub token) does NOT appear in stdout while the unresolved `$VAR` reference DOES (operator-debugging utility preserved without the leak). After PR 4 of 5 of #201 lands, `atomic-agents-stack` hits v1.0 with twelve of twelve backend protocols shipped; PR 1 extends the post-#285-revert /ship streak to 8. diff --git a/CLAUDE.md b/CLAUDE.md index 36cbd14..ee26d4a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -207,7 +207,7 @@ uv run pytest # full suite uv run pytest tests/test_.py -v # one module ``` -Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 3,199 tests collected, 2026-06-03). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol — the conformance suite is what keeps the protocol honest. +Run `uv run pytest --collect-only -q | tail -1` for the live test count (last refresh: 3,232 tests collected, 2026-06-04). New backend protocols add ~25 conformance + ~10 impl-specific tests. New features ship with tests. Migration-shaped PRs need parameterized fixture tests across the backend protocol — the conformance suite is what keeps the protocol honest. ### Releases + SemVer @@ -292,7 +292,7 @@ If the project ever needs to optimize differently, `docs/methodology.md` is the | Doc | Purpose | |-----|---------| | `docs/architecture.md` | Mental model in diagrams. Read first. | -| `docs/spec/01-...33-persona-backend.md` | Locked spec (33 docs today, 31 locked + 2 drafts at spec/26 (cascade bundle) and spec/30 (responsibility audit)). The product. | +| `docs/spec/01-...36-mcp-server-registry-backend.md` | Locked spec (35 docs today, 31 locked + 4 drafts at spec/26 (cascade bundle), spec/30 (responsibility audit), spec/35 (init wizard), and spec/36 (MCPServerRegistryBackend)). The product. | | `docs/implementation/` | Build guides per runtime (cron, Claude skill, dashboard) | | `docs/deployment/versioning.md`, `upgrading.md` | SemVer + operator runbook | | `docs/deployment/release-runbook.md` | Maintainer `/ship` runbook: two-mode workflow + manual surface check | @@ -341,7 +341,7 @@ These are not forbidden forever — they're explicitly deferred with rationale. ## Status -**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 3,199 tests collected, 2026-06-03). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**: +**v0.13.0, alpha, PUBLIC.** Core runtime stable. Test suite: run `uv run pytest --collect-only -q | tail -1` for the live count (last refresh: 3,232 tests collected, 2026-06-04). Capability-gated skips fall into four buckets — ToolRegistry conformance (filesystem-shape + `supports_uninstall=False` variants), AgentProfile (skill-content + filesystem-shape on SQLite), cross-process Redis (require real Redis instead of fakeredis), and judge-conformance dispatch (LLM-only + PolicyJudge concurrent-evaluate). Full CI runs against `uv sync --extra dev --extra openai --extra validation --extra redis`. **Eleven backend protocols shipped**: - **MemoryBackend** (PR #57) — filesystem reference impl + conformance suite. - **LLMBackend** (#87) — Anthropic + OpenAI + Moonshot reference impls, registered at framework import; conformance suite parametrizes across all three. diff --git a/README.md b/README.md index 69f88e4..19bc8bb 100644 --- a/README.md +++ b/README.md @@ -182,7 +182,7 @@ Start at [`docs/README.md`](docs/README.md) for the spec entry point. The locked - [33 — PersonaBackend Protocol](docs/spec/33-persona-backend.md) — persona ownership, snapshot/restore, `persona.link.md` format - [34 — CorpusBackend Protocol](docs/spec/34-corpus-backend.md) — wiki/raw corpus protocol; filesystem + SQLite (FTS5) reference impls; GB-scale indexed full-text search - [35 — init wizard](docs/spec/35-init-wizard.md) — `atomic-agents init` on-ramp; template scaffolding + Add-to-it merge; CI-friendly `--from-template` (RFC) -- [36 — MCPServerRegistryBackend Protocol](docs/spec/36-mcp-server-registry-backend.md) — MCP server catalog + install/audit; `FilesystemMCPServerRegistryBackend` reference impl; `atomic-agents mcp-registry` CLI (DRAFT, PR 1 of 5) +- [36 — MCPServerRegistryBackend Protocol](docs/spec/36-mcp-server-registry-backend.md) — MCP server catalog + install/audit; `FilesystemMCPServerRegistryBackend` reference impl; `atomic-agents mcp-registry` CLI (DRAFT, PR 3 of 5) Each spec doc is locked when the implementation matches and tests pass. Spec changes that imply implementation changes get filed as GitHub issues. **Spec docs separate shipped behavior from explicit future / deferred boundaries** — sections that describe behavior not yet implemented are explicitly marked as such, not silently aspirational. @@ -282,7 +282,7 @@ Same pattern for OpenAI (`atomic-agents-openai`) and Moonshot (`atomic-agents-mo ## Repository structure - `atomic_agents/` — the Python package (runtime in `agent.py`; backend protocols in `memory/`, `_llm.py`, `_locks.py`, `_costs.py`, etc.; CLI in `cli.py`; preflight in `doctor.py`) -- `tests/` 3199 tests collected (3141 passing + 58 skipped), Python 3.11 + 3.12 matrix +- `tests/` 3,232 tests collected, Python 3.11 + 3.12 matrix - `docs/` — [spec entry point](docs/README.md), [`architecture.md`](docs/architecture.md), [`spec/`](docs/spec/) (31 locked docs + 4 RFCs/DRAFTs), [`deployment/`](docs/deployment/) (8 operator runbooks), [`samples/caldwell/`](docs/samples/caldwell/) (complete worked example), [`GOVERNANCE.md`](docs/GOVERNANCE.md), [`TENSIONS.md`](docs/TENSIONS.md), [`methodology.md`](docs/methodology.md) - `extras/` — operational templates (Claude Code skill wrappers, macOS LaunchAgent plists, cron examples) @@ -313,4 +313,4 @@ Before opening a PR, read [`CLAUDE.md`](CLAUDE.md) (the project's design ethos a ## Status -**v0.13.0, alpha.** Core runtime stable. 3199 tests collected (3141 passing + 58 skipped) on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` planned. The surface stabilizes at v1.0. Pre-1.0 — Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. +**v0.13.0, alpha.** Core runtime stable. 3,232 tests collected on Python 3.11 / 3.12. Eleven of twelve backend protocols shipped (see the backend protocols table above); `MCPServerRegistryBackend` in progress (PR 3 of 5). The surface stabilizes at v1.0. Pre-1.0 — Minor releases may contain breaking changes (see [`docs/deployment/versioning.md`](docs/deployment/versioning.md)). Single-maintainer project; reference implementation anyone can use, fork, or extend. diff --git a/atomic_agents/_io.py b/atomic_agents/_io.py index 697040b..e93a054 100644 --- a/atomic_agents/_io.py +++ b/atomic_agents/_io.py @@ -124,6 +124,11 @@ def cleanup_stale_tempfiles(directory: Path) -> list[Path]: """Find and delete leftover .tmp files from crashed writes. Run as part of lint or startup. Returns the list of files cleaned. + + Note: this uses rglob which scans the entire directory tree recursively. + Prefer cleanup_stale_tempfiles_for_file when the target file is known; + that function is scoped to siblings of a specific file rather than the + full tree. """ cleaned = [] if not directory.exists(): @@ -135,3 +140,28 @@ def cleanup_stale_tempfiles(directory: Path) -> list[Path]: except FileNotFoundError: pass return cleaned + + +def cleanup_stale_tempfiles_for_file(target: Path) -> list[Path]: + """Delete leftover .tmp files from crashed atomic_write calls for target. + + Scoped to the parent directory of target, matching only tempfiles that + atomic_write would have created for this specific file. The pattern is + `...tmp` (the same pattern atomic_write uses when creating + the temporary file next to the destination). + + Returns the list of files removed. Ignores FileNotFoundError races + (another process may have already cleaned the same file). + + Use inside a lock so cleanup is serialized with the install/uninstall + operation that follows. + """ + cleaned = [] + pattern = f".{target.name}.*.tmp" + for path in target.parent.glob(pattern): + try: + path.unlink() + cleaned.append(path) + except FileNotFoundError: + pass + return cleaned diff --git a/atomic_agents/cli.py b/atomic_agents/cli.py index 9195a0a..7516e01 100644 --- a/atomic_agents/cli.py +++ b/atomic_agents/cli.py @@ -20,6 +20,8 @@ atomic-agents corpus query TEXT --corpus wiki [--top-k N] [--agent-root PATH] atomic-agents corpus version NAME --corpus wiki [--agent-root PATH] atomic-agents corpus restore NAME VERSION_ID --corpus wiki [--agent-root PATH] + atomic-agents mcp-registry install --command [options] + atomic-agents mcp-registry uninstall atomic-agents init [--from-template advisor] [--agents-root PATH] atomic-agents init --list-templates @@ -40,6 +42,7 @@ from __future__ import annotations import argparse import os +import re import sys from pathlib import Path @@ -378,12 +381,13 @@ def main(argv: list[str] | None = None) -> int: # ── mcp-registry subcommand group ──────────────────────────────────── mcp_registry_cmd = sub.add_parser( "mcp-registry", - help="Inspect mounted MCP servers (list, show, validate, refresh-capabilities)", + help="Inspect and manage MCP servers (list, show, validate, install, uninstall, refresh-capabilities)", description=( - "Read-only inspection of the MCP server registry for an agent. " + "Inspect and manage the MCP server registry for an agent. " "Uses the filesystem backend by default (reads /mcp.md). " "Override with ATOMIC_AGENTS_MCP_SERVER_REGISTRY_BACKEND env var. " - "Install and uninstall subcommands are deferred to PR 3." + "Read-only subcommands: list, show, validate, refresh-capabilities. " + "Write subcommands (PR 3+): install, uninstall." ), ) mcp_registry_sub = mcp_registry_cmd.add_subparsers( @@ -439,6 +443,66 @@ def main(argv: list[str] | None = None) -> int: help="override ATOMIC_AGENTS_AGENT_ROOT (default: $ATOMIC_AGENTS_AGENT_ROOT or cwd)", ) + # mcp-registry install --command [options] + mcp_registry_install = mcp_registry_sub.add_parser( + "install", + help="Install a new MCP server into the registry (write path; PR 3+)", + ) + mcp_registry_install.add_argument( + "name", help="MCP server name (charset: [a-zA-Z0-9_.+@-]+)" + ) + mcp_registry_install.add_argument( + "--command", + required=True, + help="executable for the MCP server (e.g., npx, docker, uv)", + ) + mcp_registry_install.add_argument( + "--args", + default="", + help=( + "comma-separated args for the executable. Example: " + "--args -y,@modelcontextprotocol/server-github. Default: none." + ), + ) + mcp_registry_install.add_argument( + "--env", + default="", + help=( + "comma-separated KEY=$VAR pairs. Use env-var references " + "(KEY=$VAR_NAME), NOT literal values, to avoid storing secrets in " + "mcp.md plaintext. Example: GITHUB_PAT=$GITHUB_PAT,LOG_LEVEL=$LOG_LEVEL. " + "Default: none." + ), + ) + mcp_registry_install.add_argument( + "--description", + default="", + help="operator-readable description (single line). Default: empty.", + ) + mcp_registry_install.add_argument( + "--transport", + default="stdio", + choices=["stdio"], + help="MCP server transport protocol (only 'stdio' is supported in v1.0; default: stdio)", + ) + mcp_registry_install.add_argument( + "--agent-root", + default=None, + help="override ATOMIC_AGENTS_AGENT_ROOT (default: $ATOMIC_AGENTS_AGENT_ROOT or cwd)", + ) + + # mcp-registry uninstall [--agent-root PATH] + mcp_registry_uninstall = mcp_registry_sub.add_parser( + "uninstall", + help="Remove an MCP server from the registry (write path; PR 3+). Idempotent.", + ) + mcp_registry_uninstall.add_argument("name", help="MCP server name to remove") + mcp_registry_uninstall.add_argument( + "--agent-root", + default=None, + help="override ATOMIC_AGENTS_AGENT_ROOT (default: $ATOMIC_AGENTS_AGENT_ROOT or cwd)", + ) + # ── init subcommand ─────────────────────────────────────────────────── init_cmd = sub.add_parser( "init", @@ -1060,7 +1124,7 @@ def _corpus_show(backend, name: str, corpus: str) -> int: if page.confidence is not None: print(f"confidence: {page.confidence}") if page.pinned: - print(f"pinned: true") + print("pinned: true") ts = ref.last_modified.strftime("%Y-%m-%dT%H:%M:%SZ") print(f"last_modified: {ts}") print(f"byte_size: {ref.byte_size}") @@ -1160,13 +1224,13 @@ def _cmd_mcp_registry(args) -> int: Exit codes: 0 on success, 1 on any error. Errors go to stderr; normal output to stdout. Zero LLM calls -- pure local I/O. - - Install and uninstall subcommands are deferred to PR 3. """ from .mcp_registry import ( MCPRegistryAuthRequired, MCPRegistryDescriptorInvalid, + MCPRegistryError, MCPRegistryUnavailable, + MCPServerAlreadyInstalled, MCPServerNotInRegistry, get_default_mcp_server_registry_backend, ) @@ -1175,9 +1239,8 @@ def _cmd_mcp_registry(args) -> int: agent_root = _resolve_mcp_registry_agent_root(args) # Empty read_paths means the CLI skips path-traversal validation. This is - # consistent with the read-only inspection use case (mcp-registry show/validate/ - # list); install/uninstall in PR 3 will require non-empty read_paths from agent - # runtime context. + # consistent with the inspection use case (mcp-registry show/validate/list); + # install/uninstall write directly to mcp.md without needing read_paths. try: backend = get_default_mcp_server_registry_backend(agent_root, []) except Exception as e: @@ -1195,9 +1258,24 @@ def _cmd_mcp_registry(args) -> int: return _mcp_registry_validate(backend, args.name) elif sub_cmd == "refresh-capabilities": return _mcp_registry_refresh_capabilities(backend) + elif sub_cmd == "install": + return _mcp_registry_install( + backend, + args.name, + args.command, + args.args, + args.env, + args.description, + args.transport, + ) + elif sub_cmd == "uninstall": + return _mcp_registry_uninstall(backend, args.name) except MCPServerConnectFailed as e: print(f"Error: env var not resolved: {e}", file=sys.stderr) return 1 + except MCPServerAlreadyInstalled as e: + print(f"Error: MCP server already installed: {e}", file=sys.stderr) + return 1 except MCPRegistryDescriptorInvalid as e: print(f"Error: MCP server descriptor is invalid: {e}", file=sys.stderr) return 1 @@ -1216,6 +1294,9 @@ def _cmd_mcp_registry(args) -> int: except MCPServerNotInRegistry as e: print(f"Error: MCP server not found: {e}", file=sys.stderr) return 1 + except MCPRegistryError as e: + print(f"Error: MCP registry error: {e}", file=sys.stderr) + return 1 except ValueError as e: print(f"Error: invalid server name: {e}", file=sys.stderr) return 1 @@ -1315,5 +1396,146 @@ def _mcp_registry_refresh_capabilities(backend) -> int: return 0 +def _parse_env_flag(raw: str) -> dict[str, str]: + """Parse comma-separated K=V pairs from --env flag. + + Rules per spec/36 + Stream D finding D-F2: + - Empty input -> {}. + - Split on ',' for entries; split on FIRST '=' for each entry (values + may contain '='). + - Empty key raises ValueError with a clear message. + - Empty value is valid (results in empty string). + - An entry with no '=' raises ValueError. + + Note: this parser does NOT support comma-containing values. Operators + needing such values must edit mcp.md directly or split into multiple + registrations. + """ + if not raw: + return {} + result: dict[str, str] = {} + for pair in raw.split(","): + if "=" not in pair: + raise ValueError( + f"--env entry {pair!r} is missing '='. " + f"Expected format: KEY=$VAR_NAME or KEY=value" + ) + key, val = pair.split("=", 1) # split on FIRST '=' only + if not key: + raise ValueError( + f"--env entry {pair!r} has an empty key. Expected format: KEY=$VAR_NAME" + ) + result[key] = val + return result + + +def _parse_args_flag(raw: str) -> list[str]: + """Parse comma-separated args from --args flag. + + Rules per Stream D finding D-F3: + - Empty input -> []. + - Each entry is stripped of leading/trailing whitespace. + - Empty entries after stripping are silently dropped. + """ + if not raw: + return [] + return [a.strip() for a in raw.split(",") if a.strip()] + + +def _mcp_registry_install( + backend, + name: str, + command: str, + args_raw: str, + env_raw: str, + description: str, + transport: str, +) -> int: + """Install a new MCP server. Per Stream D findings D-F1/D-F4/D-F5/D-F9.""" + from .mcp import MCPServerSpec + + # Parse args / env at CLI boundary (raises ValueError on bad shape; caught above). + args_list = _parse_args_flag(args_raw) + env_dict = _parse_env_flag(env_raw) + + # Defense-in-depth: refuse newlines in --command, --args, --env at the CLI + # boundary so operators see a clean, named-flag error before the renderer's + # ValueError fires. Newlines in these fields would create phantom H2 sections + # in mcp.md that bypass collision detection and name validation. + if "\n" in command: + print( + "Error: --command must not contain newline characters.", + file=sys.stderr, + ) + return 1 + for i, arg in enumerate(args_list): + if "\n" in arg: + print( + f"Error: --args item at index {i} must not contain newline characters.", + file=sys.stderr, + ) + return 1 + for env_key, env_val in env_dict.items(): + if "\n" in env_key or "\n" in env_val: + print( + f"Error: --env {env_key!r} key or value must not contain newline " + f"characters.", + file=sys.stderr, + ) + return 1 + + # Stream D finding D-F9 + Stream B finding B-10 + Security finding: + # refuse description with H2-looking content (catches both '## ' and '##\t' + # to match the renderer's re.match(r'^##\s', line) guard). + if description and any( + re.match(r"^##\s", line) for line in description.splitlines() + ): + print( + "Error: --description must not contain lines starting with '## ' " + "(H2 headers delimit server sections in mcp.md).", + file=sys.stderr, + ) + return 1 + + # Stream D finding D-F1 (decision 3 = WARN + proceed, v1.0 contract): + # warn on env values that do not look like $VAR references. Literal values + # land on disk plaintext. + for env_key, env_val in env_dict.items(): + if not env_val.startswith("$"): + print( + f"Warning: --env {env_key}= does not look like an env-var " + f"reference (expected $VAR_NAME). Literal values are stored " + f"unredacted in mcp.md. Use --env {env_key}=$SOME_VAR_NAME to " + f"store the reference instead.", + file=sys.stderr, + ) + + spec = MCPServerSpec( + name=name, + command=command, + args=args_list, + env=env_dict, + transport=transport, + description=description, + ) + + # Install. backend.install may raise (caught by outer handler). + ref = backend.install(spec) + + # Stream D finding D-F4: success output prints ONLY the Ref.name. + # NEVER echo env / command / args / spec repr / load_mcp_server result. + print(f"Installed MCP server {ref.name!r}.") + return 0 + + +def _mcp_registry_uninstall(backend, name: str) -> int: + """Uninstall an MCP server. Idempotent. Per Stream D finding D-F10.""" + backend.uninstall(name) + # Idempotent: print neutral message that is valid for both present-removed + # and absent-no-op paths. + print(f"Uninstalled MCP server {name!r} (or was not present).") + return 0 + + if __name__ == "__main__": sys.exit(main()) diff --git a/atomic_agents/mcp.py b/atomic_agents/mcp.py index 62b8ea8..d2033bf 100644 --- a/atomic_agents/mcp.py +++ b/atomic_agents/mcp.py @@ -651,6 +651,130 @@ def _build_spec( ) +# ────────────────────────────────────────────────────────────────── +# mcp.md serializers + + +def render_mcp_md_section(spec: MCPServerSpec) -> str: + """Render a single MCPServerSpec as an mcp.md H2 section string. + + Round-trip property: parsing the output with + ``parse_mcp_md_text(render_mcp_md_section(spec), resolve_env=False)`` + returns a spec equal to the input field-for-field (after env resolution + on the re-parsed result). + + env values are written verbatim. $VAR references are NEVER resolved here + per spec/36 Decision 7: resolved values must not persist to disk. + + Raises: + ValueError: if spec.command is empty or None. + ValueError: if spec.command, any spec.args item, any spec.env key, + or any spec.env value contains a newline character (newline + injection would create phantom H2 sections the parser reads as + separate server declarations, bypassing collision detection). + ValueError: if spec.description contains a line starting with '## ' + or '##\t' (H2 injection defense per Stream D finding D-F9 + + Stream B B-10 + security finding on tab-separated H2). + """ + if not spec.command: + raise ValueError( + f"MCP server {spec.name!r}: cannot render section without a command. " + f"Set spec.command to a non-empty executable name." + ) + + # Newline injection guard: a newline in command/args/env would let an API + # caller write a line that the parser reads as a new H2 section header, + # creating a phantom server entry that bypasses collision detection and + # name validation. Reject BEFORE any output is written. + if "\n" in spec.command: + raise ValueError( + f"MCP server {spec.name!r}: spec.command must not contain newline " + f"characters (newline injection creates phantom mcp.md sections)." + ) + for i, arg in enumerate(spec.args): + if "\n" in arg: + raise ValueError( + f"MCP server {spec.name!r}: spec.args[{i}] must not contain " + f"newline characters (newline injection creates phantom mcp.md " + f"sections)." + ) + for env_key, env_val in spec.env.items(): + if "\n" in env_key: + raise ValueError( + f"MCP server {spec.name!r}: env key {env_key!r} must not contain " + f"newline characters (newline injection creates phantom mcp.md " + f"sections)." + ) + if "\n" in env_val: + raise ValueError( + f"MCP server {spec.name!r}: env value for {env_key!r} must not " + f"contain newline characters (newline injection creates phantom " + f"mcp.md sections)." + ) + + # Guard against H2 injection via description. A description containing a + # line that starts with '## ' or '##\t' would make the parser treat it as + # a new section header, breaking the round-trip property. + if spec.description: + for line in spec.description.splitlines(): + if re.match(r"^##\s", line): + raise ValueError( + f"MCP server {spec.name!r}: description contains a line that " + f"starts with '## ' (H2 injection). Strip or replace the " + f"offending line before rendering." + ) + + lines: list[str] = [f"## {spec.name}"] + lines.append(f"command: {spec.command}") + + # args: only written when non-empty; comma+space separator. + if spec.args: + lines.append("args: " + ", ".join(spec.args)) + + # env: one line per key-value pair, sorted by key for determinism. + # Values are written as-is (no $VAR resolution) per Decision 7. + if spec.env: + for key in sorted(spec.env): + lines.append(f"env: {key}={spec.env[key]}") + + # transport: only written when non-default (parser defaults to "stdio"). + if spec.transport and spec.transport != "stdio": + lines.append(f"transport: {spec.transport}") + + # description: first line only (newlines stripped to single line). + if spec.description: + first_line = spec.description.splitlines()[0].strip() + if first_line: + lines.append(f"description: {first_line}") + + # Section ends with a trailing newline. + return "\n".join(lines) + "\n" + + +def render_mcp_md_full(specs: list[MCPServerSpec]) -> str: + """Render a full mcp.md file from a list of MCPServerSpec objects. + + Always prefixes the ``# MCP servers`` H1 header. An empty list yields + just the H1 header (preserves the file's identity for downstream + watchers that detect content changes by diffing). + + Round-trip property: parsing the output with + ``parse_mcp_md_text(render_mcp_md_full(specs), resolve_env=False)`` + returns a list structurally equal to the input list (field-for-field + after env resolution). + + Raises: + ValueError: if any spec.command is empty/None (propagated from + render_mcp_md_section). + """ + header = "# MCP servers\n" + if not specs: + return header + + sections = "\n".join(render_mcp_md_section(spec) for spec in specs) + return header + "\n" + sections + + # ────────────────────────────────────────────────────────────────── # Path-traversal check for MCP server args @@ -672,7 +796,6 @@ def validate_mcp_server_args( This is best-effort — we can't know what every MCP server treats as a path. The obvious path-shaped cases get caught here. """ - from ._io import safe_resolve_under from .exceptions import PathTraversalError if not agent_read_paths: diff --git a/atomic_agents/mcp_registry/backend.py b/atomic_agents/mcp_registry/backend.py index 82537d3..7387a97 100644 --- a/atomic_agents/mcp_registry/backend.py +++ b/atomic_agents/mcp_registry/backend.py @@ -21,8 +21,11 @@ for the normative contract. The module-level ``_default_load_all`` helper is the canonical default -implementation for ``load_all_mcp_servers()``. Filesystem backend delegates to -it directly (PR 1). HTTP backend overrides with a single bulk GET at PR 4. +implementation for ``load_all_mcp_servers()``. The filesystem backend overrides +``load_all_mcp_servers()`` with a custom single-read-parse for better exception +mapping (distinguishing ENOENT from transient OSError, and surfacing parse errors +as MCPRegistryDescriptorInvalid). ``_default_load_all`` is the fallback for +backends that do not override ``load_all_mcp_servers()``. Backends overriding MUST preserve the MUST 10 consistency guarantee: the output must be semantically equivalent to ``[load_mcp_server(ref.name) for ref in list_mcp_servers()]``. @@ -32,6 +35,7 @@ from typing import TYPE_CHECKING, Protocol, runtime_checkable +from ..exceptions import AtomicAgentsError from .types import MCPServerRegistryCapabilities, MCPServerRef, ValidationResult if TYPE_CHECKING: @@ -42,12 +46,14 @@ # Exception classes -class MCPRegistryError(Exception): +class MCPRegistryError(AtomicAgentsError): """Base class for MCPServerRegistry subsystem errors (spec/36). All MCPServerRegistry reference implementations raise subclasses of this exception. Operators may ``except MCPRegistryError`` to catch the entire - MCP registry error family. + MCP registry error family. Inherits from ``AtomicAgentsError`` so + framework-wide catch-alls (``except AtomicAgentsError``) see registry + failures automatically (spec/36 Decision 4 / prep finding E12). """ diff --git a/atomic_agents/mcp_registry/filesystem.py b/atomic_agents/mcp_registry/filesystem.py index e0ef2de..b521a6f 100644 --- a/atomic_agents/mcp_registry/filesystem.py +++ b/atomic_agents/mcp_registry/filesystem.py @@ -5,12 +5,10 @@ has its own ``/mcp.md`` file; the backend is scoped per-agent via ``agent_root``. -Read paths only at PR 1. Install and uninstall ship in PR 3 alongside the -``LockBackend`` lease integration (per spec/36 D5 + D6 PR cadence). - -Constructor signature stability: ``lock_backend`` is accepted but unused at PR 1 -so PR 3 callers can pass the kwarg without breaking PR 1 callers. This matches -the spec/36 §"FilesystemMCPServerRegistryBackend" constructor note. +Implements the full MCPServerRegistryBackend Protocol including atomic +install/uninstall via LockBackend (spec/36 MUST 9). The lock is acquired +before any read-modify-write on mcp.md, and stale tempfiles from crashed +prior installs are cleaned up inside the lock before reading. """ from __future__ import annotations @@ -24,15 +22,20 @@ from pathlib import Path from typing import TYPE_CHECKING +from .._io import atomic_write, cleanup_stale_tempfiles_for_file +from ..exceptions import LockBusy, LockLost +from ..locks import check_lock_lost, get_default_lock_backend from ..mcp import ( MCPServerSpec, _resolve_env_vars, parse_mcp_md_text, + render_mcp_md_full, validate_mcp_server_args, ) from .backend import ( MCPRegistryDescriptorInvalid, MCPRegistryUnavailable, + MCPServerAlreadyInstalled, MCPServerNotInRegistry, ) from .types import MCPServerRef, MCPServerRegistryCapabilities, ValidationResult @@ -83,12 +86,16 @@ def _validate_server_name(name: str) -> None: class FilesystemMCPServerRegistryBackend: """``mcp.md``-backed implementation of ``MCPServerRegistryBackend`` (spec/36). - Reads ``/mcp.md`` for server declarations. Covers the read - path only at PR 1 (list, load, load_all, validate, capabilities, - refresh_capabilities, close). Install and uninstall land in PR 3 with the - ``LockBackend`` lease integration. + Reads and writes ``/mcp.md`` for server declarations. Supports + the full MCPServerRegistryBackend Protocol: list, load, load_all, validate, + install, uninstall, capabilities, refresh_capabilities, close. + + Install and uninstall use LockBackend lease acquisition around every + read-modify-write on mcp.md (spec/36 MUST 9). Stale tempfiles from + prior crashed installs accumulate until the next install or uninstall; + cleanup is performed inside the lock before reading mcp.md. - Constructor: + Constructor parameters: ``agent_root``: the agent's directory. ``mcp.md`` lives at ``/mcp.md``. MAY not exist at construction -- MUST 2 @@ -99,12 +106,15 @@ class FilesystemMCPServerRegistryBackend: path-traversal check (Decision 8 of spec/36). Captured at construction; NOT applied at ``list_mcp_servers()`` time (list is metadata-only). - ``lock_backend``: reserved for PR 3 (install/uninstall atomicity via - ``LockBackend.acquire("mcp_registry", timeout=30)``). Accepted but - unused at PR 1 so PR 3 callers can pass the kwarg without a constructor - change. Operators passing a custom ``lock_backend`` MUST scope it to a - registry-specific resource (e.g., ``.mcp_registry.lock``), NOT the - agent's main ``.lock`` file. + ``lock_backend``: the LockBackend to use for install/uninstall atomicity. + When None (the default), the backend is resolved lazily at first use + via ``get_default_lock_backend(agent_root)``, which honors the + ``ATOMIC_AGENTS_LOCK_BACKEND`` env var. Operators passing a custom + ``lock_backend`` MUST scope it to a registry-specific resource (e.g., + ``.mcp_registry.lock``), NOT the agent's main ``.lock`` file. + + ``install_lock_timeout``: seconds to wait for the mcp_registry lock before + raising MCPRegistryUnavailable. Defaults to 30.0. """ def __init__( @@ -112,13 +122,15 @@ def __init__( agent_root: Path, read_paths: list, *, - lock_backend: LockBackend | None = None, + lock_backend: "LockBackend | None" = None, + install_lock_timeout: float = 30.0, ) -> None: # MUST 2: side-effect-free construction. No file opens, no subprocess # spawns, no network calls here. self._agent_root = agent_root self._read_paths = read_paths - self._lock_backend = lock_backend # unused at PR 1; reserved for PR 3 + self._lock_backend = lock_backend # may be None; resolved lazily at first use + self._install_lock_timeout = install_lock_timeout # ─── Capability advertisement ───────────────────────────────────────── @@ -126,14 +138,14 @@ def __init__( def capabilities(self) -> MCPServerRegistryCapabilities: """Static capability advertisement. Constant for the lifetime of this instance. - ``supports_install`` and ``supports_uninstall`` are False at PR 1 - (methods ship in PR 3). Capability honesty (MUST 3): the conformance - suite asserts that False-reported capabilities raise ``NotImplementedError``. - This flips to True at PR 3 when the methods land. + ``supports_install`` and ``supports_uninstall`` flip to True at PR 3 + when the install/uninstall methods land (per spec/36 Decision 5 + + MUST 3 capability honesty). The conformance suite asserts that + True-reported capabilities do NOT raise ``NotImplementedError``. """ return MCPServerRegistryCapabilities( - supports_install=False, - supports_uninstall=False, + supports_install=True, + supports_uninstall=True, supports_capability_handshake=False, supports_audit=False, durable=True, @@ -466,31 +478,273 @@ def validate(self, name: str) -> ValidationResult: ok = len(errors) == 0 return ValidationResult(ok=ok, errors=errors, warnings=warnings) + # ─── Lock backend resolution ────────────────────────────────────────── + + def _resolve_lock_backend(self) -> "LockBackend": + """Lazily resolve the default lock backend on first use. + + Routes through ``get_default_lock_backend(agent_root)`` so operators + who have set ``ATOMIC_AGENTS_LOCK_BACKEND=redis`` automatically get + ``RedisLockBackend`` without any extra configuration here. Single-host + operators get ``FilesystemLockBackend(agent_root)`` which stores the + lock at ``/.mcp_registry.lock`` (distinct from the agent's + main ``.lock`` so the two locks never deadlock per spec/36 constructor + docstring). + + Callers that pass ``lock_backend=`` at construction bypass this + entirely and always get their custom backend. + + Raises: + MCPRegistryUnavailable: if the lock backend cannot be resolved + (e.g., ATOMIC_AGENTS_LOCK_BACKEND is set to an unknown value, + a required optional dependency is missing, or the URL is + malformed). Wraps any exception from get_default_lock_backend + so the CLI's existing MCPRegistryError catch-all handles it + cleanly instead of surfacing a raw Python traceback. + """ + if self._lock_backend is None: + try: + self._lock_backend = get_default_lock_backend(self._agent_root) + except Exception as exc: + raise MCPRegistryUnavailable( + f"could not resolve lock backend: {exc}" + ) from exc + return self._lock_backend + # ─── Capability-gated lifecycle (PR 3) ─────────────────────────────── def install(self, spec: MCPServerSpec) -> MCPServerRef: - """Not implemented at PR 1. Lands in PR 3 with LockBackend integration. - - ``capabilities.supports_install=False`` at this PR -- conformance suite - asserts this path raises ``NotImplementedError`` when ``supports_install`` - reports False (MUST 3). + """Install a new MCP server into mcp.md atomically. + + Uses a LockBackend lease around the read-modify-write critical + section per spec/36 MUST 9 and the Install / uninstall semantics + section added at PR 3. + + Raises: + ValueError: invalid spec.name (charset or path-traversal). + ValueError: spec.command is empty or None. + MCPServerAlreadyInstalled: name already exists in mcp.md. + MCPRegistryUnavailable: lock contention timeout, lock lease + expired mid-install, or filesystem I/O error. + MCPRegistryDescriptorInvalid: existing mcp.md cannot be parsed. + + Returns an MCPServerRef projecting name/description/transport from + the input spec. version is always None; source is set to + ``mcp.md#section:``. The Ref carries no env/command/args so + the caller can safely echo it without leaking secrets. """ - raise NotImplementedError( - "FilesystemMCPServerRegistryBackend.install lands in PR 3 alongside " - "LockBackend integration and atomic mcp.md write semantics." + # MUST 1: charset validation at the API boundary, BEFORE any I/O. + _validate_server_name(spec.name) + + # spec.command is required for rendering. + if not spec.command: + raise ValueError( + f"MCP server {spec.name!r}: install requires a non-empty command." + ) + + lock_backend = self._resolve_lock_backend() + + # MUST 9: acquire the registry lock. LockBusy on timeout maps to + # MCPRegistryUnavailable so callers see a consistent transient-failure + # exception rather than a LockBusy leaking through the abstraction. + try: + handle = lock_backend.acquire( + "mcp_registry", timeout=self._install_lock_timeout + ) + except LockBusy as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock contention: another install or uninstall " + f"may be in progress ({exc})" + ) from exc + + with handle: + mcp_md = self._agent_root / "mcp.md" + + # Cleanup stale tempfiles from prior crashed installs inside the + # lock so cleanup is serialized with this install operation. Scoped + # to siblings of mcp.md (not recursive) to avoid touching unrelated + # files elsewhere under agent_root (MUST 2 spirit). + try: + cleanup_stale_tempfiles_for_file(mcp_md) + except OSError: + _logger.warning( + "cleanup_stale_tempfiles_for_file failed in install; continuing", + exc_info=True, + ) + + # Read current mcp.md content. FileNotFoundError means no servers + # installed yet; treat as empty for the cold-start case. + try: + content = mcp_md.read_text(encoding="utf-8") + except FileNotFoundError: + content = "" + except OSError as exc: + raise MCPRegistryUnavailable(f"cannot read {mcp_md}: {exc}") from exc + + # Parse with resolve_env=False to keep $VAR refs raw in memory. + try: + specs = parse_mcp_md_text( + content, + mcp_md_path=mcp_md, + read_paths=None, + resolve_env=False, + ) + except Exception as exc: + raise MCPRegistryDescriptorInvalid( + f"mcp.md at {mcp_md} could not be parsed: {exc}" + ) from exc + + # Dual-probe collision detection (Stream B finding B-4). + # Check both the raw H2 header text AND the parsed spec list so a + # malformed section (no command:) still blocks a name collision. + h2_names = set(re.findall(r"^## (\S+)", content, re.MULTILINE)) + parsed_names = {s.name for s in specs} + if spec.name in h2_names or spec.name in parsed_names: + raise MCPServerAlreadyInstalled( + f"MCP server {spec.name!r} is already in mcp.md. " + f"Uninstall it first or choose a different name." + ) + + # Lease check before write: no-op for filesystem (no heartbeat); + # raises LockLost for Redis when the lease has expired mid-install. + try: + check_lock_lost(handle) + except LockLost as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock lease expired mid-install: {exc}" + ) from exc + except Exception as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock state check failed mid-install: {exc}" + ) from exc + + # Render the updated file (existing specs + new spec). + updated = list(specs) + [spec] + rendered = render_mcp_md_full(updated) + + try: + atomic_write(mcp_md, rendered) + except OSError as exc: + raise MCPRegistryUnavailable(f"cannot write {mcp_md}: {exc}") from exc + + # with-block exits here; handle.__exit__ releases the lock. + + # Project MCPServerRef from spec. No env/command/args in the Ref + # so the caller can safely echo it without leaking secrets. + description_first_line = ( + spec.description.splitlines()[0].strip() if spec.description else "" + ) + return MCPServerRef( + name=spec.name, + description=description_first_line, + transport=spec.transport, + version=None, + source=f"mcp.md#section:{spec.name}", ) def uninstall(self, name: str) -> None: - """Not implemented at PR 1. Lands in PR 3 with LockBackend integration. + """Remove an MCP server from mcp.md atomically. Idempotent. + + Missing name is a no-op -- no exception raised, matches the + SQLiteToolRegistryBackend.uninstall precedent (spec/25). + + No pre-lock fast-path: the lock is always acquired before reading + mcp.md because a concurrent install could add the name between an + unlocked check and the subsequent read-modify-write. Per spec/36 + MUST 9 (no fast-path shortcut). - ``capabilities.supports_uninstall=False`` at this PR -- conformance suite - asserts this path raises ``NotImplementedError`` when ``supports_uninstall`` - reports False (MUST 3). + Raises: + ValueError: invalid name (charset or path-traversal). + MCPRegistryUnavailable: lock contention timeout, lock lease + expired mid-uninstall, or filesystem I/O error. + MCPRegistryDescriptorInvalid: existing mcp.md cannot be parsed. + + Returns None on both the present-and-removed path AND the + absent-no-op path. """ - raise NotImplementedError( - "FilesystemMCPServerRegistryBackend.uninstall lands in PR 3 alongside " - "LockBackend integration and atomic mcp.md write semantics." - ) + # MUST 1: charset validation at the API boundary BEFORE any I/O. + _validate_server_name(name) + + lock_backend = self._resolve_lock_backend() + + try: + handle = lock_backend.acquire( + "mcp_registry", timeout=self._install_lock_timeout + ) + except LockBusy as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock contention: another install or uninstall " + f"may be in progress ({exc})" + ) from exc + + with handle: + mcp_md = self._agent_root / "mcp.md" + + # Cleanup stale tempfiles from prior crashed installs inside the + # lock so cleanup is serialized with this uninstall operation. + try: + cleanup_stale_tempfiles_for_file(mcp_md) + except OSError: + _logger.warning( + "cleanup_stale_tempfiles_for_file failed in uninstall; continuing", + exc_info=True, + ) + + try: + content = mcp_md.read_text(encoding="utf-8") + except FileNotFoundError: + # Absent mcp.md means no servers; nothing to uninstall. + _logger.debug( + "uninstall: mcp.md does not exist at %s; no-op for %r", + mcp_md, + name, + ) + return + except OSError as exc: + raise MCPRegistryUnavailable(f"cannot read {mcp_md}: {exc}") from exc + + try: + specs = parse_mcp_md_text( + content, + mcp_md_path=mcp_md, + read_paths=None, + resolve_env=False, + ) + except Exception as exc: + raise MCPRegistryDescriptorInvalid( + f"mcp.md at {mcp_md} could not be parsed: {exc}" + ) from exc + + # Dual-probe absent check (mirrors install's collision check). + h2_names = set(re.findall(r"^## (\S+)", content, re.MULTILINE)) + parsed_names = {s.name for s in specs} + if name not in h2_names and name not in parsed_names: + # Idempotent no-op: name is not present, nothing to remove. + _logger.debug("uninstall: server %r not in registry; no-op", name) + return + + # Lease check before write. + try: + check_lock_lost(handle) + except LockLost as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock lease expired mid-uninstall: {exc}" + ) from exc + except Exception as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock state check failed mid-uninstall: {exc}" + ) from exc + + # Render the file with the named spec removed. + updated = [s for s in specs if s.name != name] + rendered = render_mcp_md_full(updated) + + try: + atomic_write(mcp_md, rendered) + except OSError as exc: + raise MCPRegistryUnavailable(f"cannot write {mcp_md}: {exc}") from exc + + return None # ─── Lifecycle ──────────────────────────────────────────────────────── diff --git a/docs/spec/36-mcp-server-registry-backend.md b/docs/spec/36-mcp-server-registry-backend.md index 9a8c977..6aad17e 100644 --- a/docs/spec/36-mcp-server-registry-backend.md +++ b/docs/spec/36-mcp-server-registry-backend.md @@ -352,33 +352,94 @@ FilesystemMCPServerRegistryBackend( read_paths: list[Path], *, lock_backend: LockBackend | None = None, + install_lock_timeout: float = 30.0, ) ``` - `agent_root` is the agent's directory. mcp.md lives at `/mcp.md`. - `agent_root` MAY not exist at construction (matches `FilesystemToolRegistryBackend` precedent). `list_mcp_servers()` returns `[]` for missing / empty mcp.md. - `read_paths` is the list of paths the agent declares it may read (from `tools.md`). Used by `load_mcp_server(name)` to apply path-traversal validation (per Decision 8). -- `lock_backend` (per MUST 9): if absent, defaults to `FilesystemLockBackend(agent_root / ".mcp_registry.lock")`. The default lock file `.mcp_registry.lock` is **distinct** from the agent's main `.lock` (which the runtime acquires inside `agent.call()` for cost-cap and run-id serialization); the two locks never overlap and cannot deadlock even on non-reentrant `LockBackend` implementations. Operators passing a custom `lock_backend` MUST scope it to `.mcp_registry.lock` (or an equivalent registry-specific resource), NOT the agent's main lock; the docstring on the constructor parameter calls this out explicitly. +- `lock_backend` (per MUST 9): if absent, defaults to `get_default_lock_backend(agent_root)` from `atomic_agents.locks` (respects the `ATOMIC_AGENTS_LOCK_BACKEND` env var so multi-host operators on Cloud Run / Kubernetes pinning `=redis` get a `RedisLockBackend` automatically; single-host operators get `FilesystemLockBackend(agent_root)`). The lock is acquired with `name="mcp_registry"` which the filesystem backend maps to `/.mcp_registry.lock`. This lock file is **distinct** from the agent's main `.lock` (which the runtime acquires inside `agent.call()` for cost-cap and run-id serialization); the two locks never overlap and cannot deadlock even on non-reentrant `LockBackend` implementations. Operators passing a custom `lock_backend` MUST scope it to a registry-specific resource, NOT the agent's main lock; the docstring on the constructor parameter calls this out explicitly and names the failure mode (passing the same backend used for `agent.call()` raises `LockBusy` whenever `agent.call()` is in flight, because both operations would compete for the same lock resource). +- `install_lock_timeout` (per MUST 9): seconds to wait for the registry lock during `install` and `uninstall` before raising `MCPRegistryUnavailable`. Default 30 seconds matches typical operator-CLI patience. CI pipelines that want fail-fast set `install_lock_timeout=0.0`; NFS-mounted deployments with slow filesystems may raise it. The kwarg is per-instance and immutable post-construction (mirrors the `apply_staging_lock_timeout` precedent on `FilesystemBackend` per spec/21). - Reuses `parse_mcp_md_text()` from `atomic_agents/mcp.py:432` with the new `resolve_env=False` parameter (per Decision 7's implementation note); the backend resolves `$VAR` references itself at `load_mcp_server(name)` time. - `name` validated against path-traversal at API boundary: refuses `/`, `\\`, `..`, leading `.`, control chars, newlines. -**`install(spec)` atomicity (PR 3):** +#### Install / uninstall semantics (PR 3) -1. `lock_backend.acquire("mcp_registry", timeout=30)` (per spec/21's `acquire(name="", timeout=0.0)` signature at `locks/backend.py:82`). -2. Read current mcp.md. -3. Parse with `parse_mcp_md_text(content, resolve_env=False)`. -4. Check name collision; if present, release lock and raise `MCPServerAlreadyInstalled`. -5. Append new `## ` section to the markdown. -6. `_io.atomic_write` the merged content back (temp + fsync + rename + parent dir fsync). -7. Release lock via the returned `LockHandle.release()` (the spec/21 idiom). +The PR 3 write paths implement MUST 9 (atomicity + idempotency) through a strict read-modify-write critical section guarded by the `LockBackend` lease. Both `install(spec)` and `uninstall(name)` follow the same shape with one branch difference. -Crash between steps 1 and 6 leaves the original mcp.md intact; crash between 6 and 7 leaves the new content with a stale lock (lease expiry per spec/21's `LockLost` discipline handles recovery). +**Common preamble (both methods).** Validate `name` (or `spec.name`) charset at the API boundary per MUST 1 via `_validate_server_name(...)` BEFORE acquiring the lock or touching the filesystem. Refuses path-traversal tokens, control chars, newlines, leading dot, empty string. Raises `ValueError` cheaply for invalid input without contending for the lock. -**`uninstall(name)` atomicity (PR 3):** same shape; idempotent. Missing-name case completes steps 1-3, observes the name is absent, releases the lock, returns without raising. +**Critical section (steps 2-7, wrapped in the lock).** Use the spec/21 context-manager idiom so every exit path (success, collision, parse error, atomic_write failure, lease loss) releases the lock cleanly: + +```python +try: + handle = self._lock_backend.acquire("mcp_registry", timeout=self._install_lock_timeout) +except LockBusy as exc: + raise MCPRegistryUnavailable( + f"mcp_registry lock contention: {exc}" + ) from exc + +with handle: + # Step 2: read current mcp.md (FileNotFoundError -> treat as empty) + # Step 3: parse with parse_mcp_md_text(content, resolve_env=False) + # Step 4: dual-probe collision check (install) OR absent-name early-return (uninstall) + # Step 5: check_lock_lost(handle) -- raises LockLost if a lease-backed lock expired mid-critical-section + # Step 6: render mutated spec list via render_mcp_md_full(updated_specs) + # Step 7: _io.atomic_write(mcp_md, rendered_content) +``` + +The outer `try/except LockBusy` translates the lock-timeout to `MCPRegistryUnavailable` so the framework's fail-closed wrapper at `agent.py:__init__` catches it as `MCPRegistryError`. The `with handle:` block guarantees `backend.release(handle)` runs on every path including exceptions. + +**Step 4 (install) -- dual-probe collision detection.** A single check against the parsed-spec list misses malformed sections (`## name` header present but `command:` absent, which `_build_spec` silently skips with a warning). The install MUST check BOTH the parsed-name set AND a raw H2 regex scan against the file content: + +```python +h2_names = set(re.findall(r"^## (\S+)", content, re.MULTILINE)) +parsed_names = {s.name for s in specs} +if spec.name in h2_names or spec.name in parsed_names: + raise MCPServerAlreadyInstalled( + f"MCP server {spec.name!r} is already in mcp.md." + ) +``` + +The exception message MUST contain ONLY `spec.name`, never the full spec repr (env values could leak through `repr(spec)` if the spec was constructed with literal env values; the operator-visible error message is operator input, not framework-resolved values). + +**Step 4 (uninstall) -- absent-name idempotency.** If `name` is not present in either the parsed-name set or the H2 regex scan, the uninstall is a no-op: log at DEBUG, skip the `atomic_write` (avoids unnecessary mtime bump), exit the `with` block (releases lock), return `None`. No exception is raised. There is no pre-lock fast-path; the lock MUST be acquired before reading mcp.md, because a concurrent `install` could add the name between an unlocked check and the subsequent read. + +**Step 5 -- mid-critical-section lease check.** Before the `atomic_write`, call `check_lock_lost(handle)` from `atomic_agents.locks`. This is a no-op on filesystem (no heartbeat / TTL: POSIX `fcntl.flock` releases automatically on process death), but for lease-backed backends (`RedisLockBackend`) it raises `LockLost` if the lease expired mid-critical-section (Redis network blip). The implementation MUST catch `LockLost` and re-raise as `MCPRegistryUnavailable` so the framework treats it as a transient failure. + +**Step 6 -- full-file render.** The "append new section" phrasing in MUST 9 is conceptual; the implementation does a full read-modify-write. The renderer `render_mcp_md_full(specs)` from `atomic_agents/mcp.py` produces a complete mcp.md content string from the mutated spec list (install: existing specs + new spec; uninstall: existing specs minus removed). The renderer's round-trip property: `parse_mcp_md_text(render_mcp_md_full(specs), resolve_env=False) == specs`. The renderer writes `$VAR` references verbatim (never resolved values; resolved env never persists to disk per Decision 7). + +**MCPServerRef projection on install return.** `install(spec) -> MCPServerRef` projects `name`, `description` (single-line, newlines stripped), `transport` from the input spec; `version=None`; `source=f"mcp.md#section:{name}"`. The Ref carries no `command`, no `args`, no `env`, so the CLI handler can safely echo the Ref without secret-leak risk. + +**uninstall return.** `uninstall(name) -> None`. Both the present-and-removed path and the absent-no-op path return `None`. Matches `SQLiteToolRegistryBackend.uninstall` precedent. + +**Crash safety analysis.** All crash points produce recoverable state: +- Crash before step 7 atomic_write (lock held, file unchanged): lock released by OS on process death (filesystem) or by lease expiry (Redis); next install retries cleanly. +- Crash during atomic_write (rename incomplete): temp file `.mcp.md..tmp` left in `agent_root`; mcp.md still has original content (atomic rename guarantee). `cleanup_stale_tempfiles(agent_root)` from `_io.py` handles the orphan; PR 3 calls this from `FilesystemMCPServerRegistryBackend.__init__` as a side-effect-free recovery sweep. +- Crash after atomic_write (rename committed, lock held): mcp.md has new content; stale lock released as above. Correct on-disk state. + +#### LockBackend integration (PR 3) + +**Default factory routes through `get_default_lock_backend`.** When the constructor receives `lock_backend=None`, the backend lazily resolves the default via `atomic_agents.locks.get_default_lock_backend(agent_root)`. This respects `ATOMIC_AGENTS_LOCK_BACKEND` (env var that the agent's main lock already respects), so operators on Cloud Run / Kubernetes pinning `=redis` automatically get `RedisLockBackend` for registry writes too. Single-host operators get `FilesystemLockBackend(agent_root)`. The framework convention is consistent across the agent's main lock and the registry lock. + +**Context-manager idiom is the canonical release pattern.** `LockHandle` (per spec/21) implements `__enter__` and `__exit__`; `__exit__` invokes `backend.release(handle)` on every path including exceptions. The 7-step critical section MUST be wrapped in `with handle:` (NOT bare `try/finally` with `handle.release()` because `LockHandle` is a frozen dataclass without a `release()` method; release is a backend method, not a handle method). + +**`install_lock_timeout` is the operator knob.** Constructor kwarg with 30s default. Tests use `install_lock_timeout=0.0` for fail-fast assertions. CI pipelines that want immediate failure on contention set it to a low value. NFS-mounted deployments with slow filesystems raise it. + +**`LockBusy` translates to `MCPRegistryUnavailable` inside install/uninstall.** The `acquire()` call may raise `LockBusy` from spec/21 after the timeout elapses. The implementation catches it at the boundary and re-raises as `MCPRegistryUnavailable` so the CLI's exception handler + the framework's fail-closed wrapper at `agent.py:__init__` (both catch `MCPRegistryError`) handle it cleanly. Raw `LockBusy` escaping to either layer would bypass operator-readable error messages. + +**`check_lock_lost(handle)` discipline.** Called before the `atomic_write` step. No-op for filesystem (`supports_lease=False`); raises `LockLost` for lease-backed backends if the lease expired mid-critical-section. Caught and re-raised as `MCPRegistryUnavailable`. This closes the Redis-network-blip corruption window where two installs could both believe they hold the lock. + +**Custom `lock_backend` operator surface.** Operators passing a custom `lock_backend` MUST scope it to a registry-specific resource (`.mcp_registry.lock` namespace) NOT the agent's main lock. Passing the agent's main lock backend causes install/uninstall to raise `LockBusy` (translated to `MCPRegistryUnavailable`) whenever `agent.call()` is in flight, because both operations would compete for `/.lock`. The constructor docstring names this failure mode explicitly. + +**Multi-host pinning via env var.** Operators on Cloud Run / Kubernetes set `ATOMIC_AGENTS_LOCK_BACKEND=redis` + `ATOMIC_AGENTS_LOCK_BACKEND_URL=redis://...`. The default factory routes correctly without per-construction operator config. Redis keys are scoped per `agent_root` for cross-agent isolation (spec/21 §"Operator surface"). + +**Non-reentrant by default.** `FilesystemLockBackend` and `RedisLockBackend` both report `supports_reentrancy=False`. A caller that pre-acquires `acquire("mcp_registry", ...)` externally and then calls `backend.install(spec)` gets `LockBusy` on the internal acquire attempt. Operators wanting batch install should call `install(spec)` sequentially (each call is individually atomic) rather than wrapping in an external acquire. **`validate(name)`:** runs descriptor parses; `command` exists on PATH (warn if absent; do not fail validation since the agent's PATH at run time may differ); `transport` value recognized; `$VAR` refs resolve against current `os.environ` (warn if not). -**Capabilities (PR 1/2):** `supports_install=False, supports_uninstall=False, supports_capability_handshake=False, supports_audit=False, durable=True`. Flips to `supports_install=True, supports_uninstall=True` at PR 3 when the methods land. Static; no runtime probe required. +**Capabilities (PR 3+):** `supports_install=True, supports_uninstall=True, supports_capability_handshake=False, supports_audit=False, durable=True`. At PR 3 the install / uninstall methods land and the capability flags flip True (was False at PR 1/2 per Decision 5 evolution table). Static; no runtime probe required. **`list_mcp_servers()`** MUST call `parse_mcp_md_text(content, resolve_env=False, read_paths=None)` with `read_paths=None` explicitly, NOT `self._read_paths`. Passing `self._read_paths` at list time triggers `validate_mcp_server_args` at parse time and violates Decision 8's "validation at `load_mcp_server` boundary" invariant (prep notes Theme 3). @@ -562,7 +623,7 @@ The MUST count is 10 because the static-vs-dynamic capability distinction (Decis **MUST 8: Env-var resolution semantics on MCPServerSpec.** `$VAR` references in `MCPServerSpec.env` values MUST be resolved against the client process environment at `load_mcp_server(name)` time. Unresolvable references MUST raise `MCPServerConnectFailed` (the existing spec/19 exception; not a new exception class). This applies to ALL backends regardless of storage substrate: HTTP catalog servers MAY store unresolved `$VAR` strings, and the framework's HTTP client MUST resolve at materialization. -**MUST 9: Install / uninstall atomicity.** Concurrent `install(spec)` calls for the same server name MUST produce exactly one winner; the others MUST raise `MCPServerAlreadyInstalled`. Filesystem implementations MUST acquire a `LockBackend` (spec/21) lease around the read-modify-write critical section in `install` and `uninstall` (read mcp.md, parse, check name collision, write back); the `_io.atomic_write` discipline ensures crash-safety on the individual write, while the lock serializes concurrent callers. The lock acquisition uses the spec/21 signature: `lock_handle = lock_backend.acquire("mcp_registry", timeout=30)`; release via `lock_handle.release()`. HTTP implementations rely on the catalog server's transactional storage and translate HTTP 409 to `MCPServerAlreadyInstalled`. `uninstall(name)` MUST be idempotent (no exception when the name is absent). +**MUST 9: Install / uninstall atomicity.** Concurrent `install(spec)` calls for the same server name MUST produce exactly one winner; the others MUST raise `MCPServerAlreadyInstalled`. Filesystem implementations MUST acquire a `LockBackend` (spec/21) lease around the read-modify-write critical section in `install` and `uninstall` (read mcp.md, parse, check name collision via dual-probe across raw H2 regex and parsed-name set, write back via the renderer); the `_io.atomic_write` discipline ensures crash-safety on the individual write, while the lock serializes concurrent callers. The lock acquisition uses the spec/21 context-manager idiom: `with lock_backend.acquire("mcp_registry", timeout=self._install_lock_timeout) as handle:` wrapping the critical section. Implementations MUST catch `LockBusy` from `acquire` and re-raise as `MCPRegistryUnavailable` so the framework's fail-closed wrapper at `agent.py:__init__` (catches `MCPRegistryError`) sees a coherent exception type. Implementations MUST call `check_lock_lost(handle)` from `atomic_agents.locks` immediately before the `atomic_write` step; for lease-backed backends (`RedisLockBackend`) this raises `LockLost` if the lease expired mid-critical-section, which MUST be re-raised as `MCPRegistryUnavailable`. HTTP implementations rely on the catalog server's transactional storage and translate HTTP 409 to `MCPServerAlreadyInstalled`. `uninstall(name)` MUST be idempotent (no exception when the name is absent). There is no pre-lock fast-path for absent names on `uninstall`; the lock MUST be acquired before reading mcp.md, because a concurrent `install` could add the name between an unlocked check and the subsequent read. Absent-name uninstall MAY skip the `atomic_write` step (mtime preservation), but MUST still hold the lock for the read-and-check critical section. **MUST 10: `load_all_mcp_servers()` consistency.** The output of `load_all_mcp_servers()` MUST be semantically equivalent to `[load_mcp_server(ref.name) for ref in list_mcp_servers()]` for any given backend state. Backends MAY optimize the bulk implementation (HTTP backend uses single bulk GET via `?expand=spec`; SQLite backend can use a single SELECT) but MUST preserve the equivalence guarantee. The conformance suite asserts: for every registered backend, `set(load_all_mcp_servers())` equals `set([load_mcp_server(ref.name) for ref in list_mcp_servers()])` across populated and empty catalog states. diff --git a/tests/test_mcp.py b/tests/test_mcp.py index ce09276..7c040b0 100644 --- a/tests/test_mcp.py +++ b/tests/test_mcp.py @@ -47,6 +47,7 @@ # ────────────────────────────────────────────────────────────────── # Helpers + def _make_spec( name: str = "test-server", command: str = "npx", @@ -82,9 +83,7 @@ def _build_minimal_agent_dir(tmp_path: Path, name: str = "test-agent") -> Path: (agent_dir / "tools.md").write_text( f"## Read paths\n- ~/docs/\n\n## Write paths\n- {agent_dir}/\n" ) - (agent_dir / "model.md").write_text( - "## Default model\nclaude-haiku-4-5-20251001\n" - ) + (agent_dir / "model.md").write_text("## Default model\nclaude-haiku-4-5-20251001\n") (agent_dir / "memory").mkdir() (agent_dir / "log").mkdir() return agent_dir @@ -93,6 +92,7 @@ def _build_minimal_agent_dir(tmp_path: Path, name: str = "test-agent") -> Path: # ────────────────────────────────────────────────────────────────── # parse_mcp_md_text — basic parsing + def test_parse_mcp_md_empty_returns_empty_list(): assert parse_mcp_md_text("") == [] assert parse_mcp_md_text(" \n ") == [] @@ -208,6 +208,7 @@ def test_parse_mcp_md_non_path_args_not_validated(tmp_path): # ────────────────────────────────────────────────────────────────── # MCPClientPool — connect, discover, dispatch + def test_mcp_client_pool_connect_failure_doesnt_block_others(tmp_path): """One server failing connect still lets others connect successfully.""" specs = [ @@ -298,6 +299,7 @@ def test_mcp_disconnect_all_idempotent(tmp_path): # ────────────────────────────────────────────────────────────────── # AtomicAgent integration + def test_agent_loads_mcp_servers_from_config(tmp_path): """Agent with mcp.md has config.mcp_servers populated.""" agent_dir = _build_minimal_agent_dir(tmp_path) @@ -363,7 +365,9 @@ def test_agent_call_lazy_inits_pool_on_first_call_when_servers_declared(tmp_path mock_pool.disconnect_all = MagicMock() with patch("atomic_agents._llm.call_llm", return_value=raw): - with patch("atomic_agents.agent.MCPClientPool", return_value=mock_pool) as pool_cls: + with patch( + "atomic_agents.agent.MCPClientPool", return_value=mock_pool + ) as pool_cls: agent.call("Do something.") pool_cls.assert_called_once() @@ -430,11 +434,13 @@ def test_mcp_tool_invocation_routes_through_registry_and_loop(tmp_path): # Second LLM call: model uses the result and returns final text tool_use_raw = _make_raw_response( text="Let me check the time.", - tool_uses=[{ - "name": "time__get_current_time", - "id": "tu_001", - "input": {}, - }], + tool_uses=[ + { + "name": "time__get_current_time", + "id": "tu_001", + "input": {}, + } + ], ) final_raw = _make_raw_response("The current time is 2026-05-07T12:00:00Z.") @@ -455,6 +461,7 @@ def test_mcp_tool_invocation_routes_through_registry_and_loop(tmp_path): # ────────────────────────────────────────────────────────────────── # Helpers for agent tests + def _stub_anthropic(): """Ensure `anthropic` doesn't need to be installed for agent init.""" if "anthropic" not in sys.modules: @@ -484,6 +491,7 @@ def _make_raw_response( # M1 — env merges with parent environment + def test_mcp_env_merges_with_parent_environment(monkeypatch): """Merged env passed to StdioServerParameters includes parent PATH + spec.env vars. @@ -504,7 +512,9 @@ def test_mcp_env_merges_with_parent_environment(monkeypatch): captured_envs: list[dict] = [] # Build a fake mcp SDK that captures the StdioServerParameters env kwarg. - fake_params_cls = MagicMock(side_effect=lambda **kw: (captured_envs.append(kw.get("env", {})), kw)[1]) + fake_params_cls = MagicMock( + side_effect=lambda **kw: (captured_envs.append(kw.get("env", {})), kw)[1] + ) async def fake_list_tools_flow(): result = MagicMock() @@ -512,16 +522,30 @@ async def fake_list_tools_flow(): return result class _FakeSession: - def __init__(self, *args, **kwargs): pass - async def __aenter__(self): return self - async def __aexit__(self, *_): pass - async def initialize(self): pass - async def list_tools(self): return await fake_list_tools_flow() + def __init__(self, *args, **kwargs): + pass + + async def __aenter__(self): + return self + + async def __aexit__(self, *_): + pass + + async def initialize(self): + pass + + async def list_tools(self): + return await fake_list_tools_flow() class _FakeStdioClient: - def __init__(self, params): pass - async def __aenter__(self): return (MagicMock(), MagicMock()) - async def __aexit__(self, *_): pass + def __init__(self, params): + pass + + async def __aenter__(self): + return (MagicMock(), MagicMock()) + + async def __aexit__(self, *_): + pass # Build a fake mcp module hierarchy fake_mcp = types.ModuleType("mcp") @@ -539,13 +563,17 @@ async def __aexit__(self, *_): pass import asyncio - with patch.dict(sys.modules, { - "mcp": fake_mcp, - "mcp.client": fake_mcp_client, - "mcp.client.stdio": fake_mcp_stdio, - "mcp.client.session": fake_mcp_session_mod, - }): + with patch.dict( + sys.modules, + { + "mcp": fake_mcp, + "mcp.client": fake_mcp_client, + "mcp.client.stdio": fake_mcp_stdio, + "mcp.client.session": fake_mcp_session_mod, + }, + ): from atomic_agents.mcp import _async_connect_and_list as _acal + asyncio.run(_acal(spec)) # Exactly one StdioServerParameters call should have been made @@ -635,8 +663,7 @@ def test_agent_load_config_blocks_path_traversal_via_mcp_md(tmp_path): # Write a mcp.md with a path-traversal arg outside tools.md read_paths. # tools.md declares read_paths: ~/docs/ — ../../etc/passwd escapes that. (agent_dir / "mcp.md").write_text( - "# MCP servers\n\n## evil\ncommand: npx\n" - "args: -y, @mcp/srv, ../../etc/passwd\n" + "# MCP servers\n\n## evil\ncommand: npx\nargs: -y, @mcp/srv, ../../etc/passwd\n" ) _stub_anthropic() @@ -795,9 +822,132 @@ def test_mcp_pool_not_initialized_when_cost_cap_skips_call(tmp_path): # Patch _check_cost_guardrails to always return allow=False (skip) from atomic_agents.types import CostCheckResult - with patch.object(agent, "_check_cost_guardrails", return_value=CostCheckResult(allow=False, reason="daily cap hit")): + + with patch.object( + agent, + "_check_cost_guardrails", + return_value=CostCheckResult(allow=False, reason="daily cap hit"), + ): with patch("atomic_agents.agent.MCPClientPool", mock_pool_cls): agent.call("Do something.") # MCPClientPool should never have been instantiated mock_pool_cls.assert_not_called() + + +# ────────────────────────────────────────────────────────────────── +# render_mcp_md_section / render_mcp_md_full round-trip tests (PR 3) + + +def test_render_mcp_md_section_round_trip_basic(): + """Render a spec with all 6 fields then parse it back; fields must match.""" + from atomic_agents.mcp import render_mcp_md_section + + spec = MCPServerSpec( + name="github", + command="npx", + args=["-y", "@modelcontextprotocol/server-github"], + env={"GITHUB_PAT": "$GITHUB_PAT"}, + transport="stdio", + description="GitHub repo and issue access", + ) + rendered = render_mcp_md_section(spec) + parsed = parse_mcp_md_text(rendered, resolve_env=False) + + assert len(parsed) == 1 + result = parsed[0] + assert result.name == spec.name + assert result.command == spec.command + assert result.args == spec.args + assert result.env == spec.env + assert result.transport == spec.transport + assert result.description == spec.description + + +def test_render_mcp_md_section_env_not_resolved(): + """$VAR references in env must survive the render-parse round-trip verbatim.""" + from atomic_agents.mcp import render_mcp_md_section + + spec = MCPServerSpec( + name="myserver", + command="node", + args=["server.js"], + env={"GITHUB_PAT": "$GITHUB_PAT"}, + transport="stdio", + description="", + ) + rendered = render_mcp_md_section(spec) + # Render with resolve_env=False so $VAR stays raw. + parsed = parse_mcp_md_text(rendered, resolve_env=False) + + assert len(parsed) == 1 + # The $VAR reference must NOT have been resolved to an os.environ value. + assert parsed[0].env.get("GITHUB_PAT") == "$GITHUB_PAT" + + +def test_render_mcp_md_section_strips_description_newlines(): + """Description containing newlines: only the first line appears after round-trip.""" + from atomic_agents.mcp import render_mcp_md_section + + spec = MCPServerSpec( + name="multiline-desc", + command="python", + args=["-m", "myserver"], + env={}, + transport="stdio", + description="line1\nline2\nline3", + ) + rendered = render_mcp_md_section(spec) + parsed = parse_mcp_md_text(rendered, resolve_env=False) + + assert len(parsed) == 1 + # Only the first line survives the round-trip. + assert parsed[0].description == "line1" + + +def test_render_mcp_md_full_round_trip(): + """Render a 3-spec list as a full mcp.md file then parse all 3 back.""" + from atomic_agents.mcp import render_mcp_md_full + + specs = [ + MCPServerSpec( + name="alpha", + command="npx", + args=["-y", "@mcp/alpha"], + env={}, + transport="stdio", + description="Alpha server", + ), + MCPServerSpec( + name="beta", + command="python", + args=["-m", "beta_server"], + env={"BETA_KEY": "$BETA_KEY"}, + transport="stdio", + description="", + ), + MCPServerSpec( + name="gamma", + command="node", + args=["gamma.js", "--port", "9000"], + env={}, + transport="stdio", + description="Gamma description", + ), + ] + rendered = render_mcp_md_full(specs) + + # Must start with the H1 header. + assert rendered.startswith("# MCP servers\n") + + parsed = parse_mcp_md_text(rendered, resolve_env=False) + assert len(parsed) == 3 + + # All names must round-trip (order-independent check via set). + assert {s.name for s in parsed} == {"alpha", "beta", "gamma"} + + # Spot-check fields on each parsed spec. + by_name = {s.name: s for s in parsed} + assert by_name["alpha"].args == ["-y", "@mcp/alpha"] + assert by_name["beta"].env == {"BETA_KEY": "$BETA_KEY"} + assert by_name["gamma"].description == "Gamma description" diff --git a/tests/test_mcp_server_registry_conformance.py b/tests/test_mcp_server_registry_conformance.py index 94199ed..d01d60a 100644 --- a/tests/test_mcp_server_registry_conformance.py +++ b/tests/test_mcp_server_registry_conformance.py @@ -22,7 +22,6 @@ from __future__ import annotations -import os from pathlib import Path from textwrap import dedent from unittest.mock import patch @@ -37,7 +36,7 @@ MCPServerRegistryCapabilities, ) from atomic_agents.mcp_registry.backend import _default_load_all -from atomic_agents.mcp_registry.types import MCPServerRef, ValidationResult +from atomic_agents.mcp_registry.types import MCPServerRef from atomic_agents.mcp import MCPServerSpec, parse_mcp_md_text from atomic_agents.exceptions import MCPServerConnectFailed @@ -269,20 +268,21 @@ def test_capability_honesty_install(backend_factory) -> None: spec/36 MUST 3 -- capability honesty. Branches on reported value; never hardcodes the expected bool (B-F4 from prep notes). + + True-branch tightened per Stream E finding E5 (P0): when supports_install=True, + install() MUST return an MCPServerRef on a fresh backend with a valid spec. + The old broad 'except Exception: pass' masked real failures. """ caps = backend_factory.capabilities dummy_spec = _make_mcp_spec("test-install-server") if caps.supports_install: - # Method MUST NOT raise NotImplementedError when capability is True. - try: - backend_factory.install(dummy_spec) - except NotImplementedError: - pytest.fail( - "capabilities.supports_install=True but install() raised NotImplementedError" - ) - except Exception: - # Any other exception (MCPServerAlreadyInstalled, etc.) is acceptable. - pass + # MUST 3: install must NOT raise NotImplementedError; on a fresh + # backend with valid spec it must return MCPServerRef. + ref = backend_factory.install(dummy_spec) + assert isinstance(ref, MCPServerRef), ( + f"install() with supports_install=True must return MCPServerRef; " + f"got {type(ref)!r}" + ) else: # Method MUST raise NotImplementedError when capability is False. with pytest.raises(NotImplementedError): @@ -293,17 +293,19 @@ def test_capability_honesty_uninstall(backend_factory) -> None: """capabilities.supports_uninstall claim matches uninstall() behavior. spec/36 MUST 3 -- capability honesty. + + True-branch tightened per Stream E finding E4 (P1) + C10: when + supports_uninstall=True, uninstalling an absent name MUST be a no-op + (MUST 9 idempotency) and must return None. """ caps = backend_factory.capabilities if caps.supports_uninstall: - try: - backend_factory.uninstall("nonexistent-server") - except NotImplementedError: - pytest.fail( - "capabilities.supports_uninstall=True but uninstall() raised NotImplementedError" - ) - except Exception: - pass + # MUST 9: absent name is a no-op, no exception of any kind. + result = backend_factory.uninstall("definitely-not-in-registry") + assert result is None, ( + "uninstall() on absent name with supports_uninstall=True must return None " + "(idempotent no-op per MUST 9)" + ) else: with pytest.raises(NotImplementedError): backend_factory.uninstall("nonexistent-server") @@ -712,11 +714,17 @@ def test_load_all_ordering_parity(populated_backend) -> None: def test_load_all_consistency_with_default_load_all(populated_backend) -> None: - """load_all_mcp_servers() output is byte-identical to _default_load_all(backend). + """load_all_mcp_servers() output is semantically equivalent to _default_load_all. + + spec/36 MUST 10 -- load_all consistency. + The filesystem backend uses a custom single-read-parse (NOT _default_load_all); + this test asserts the output is semantically equivalent to the helper, + satisfying MUST 10. - spec/36 MUST 10 -- default-impl delegation parity. - The filesystem backend delegates to _default_load_all; this test pins that - the output is equivalent to calling the helper directly. + Docstring corrected per Stream E finding E2 (P2): the previous docstring + claimed filesystem "delegates to _default_load_all" which is false. + filesystem.py lines 316-375 implement a custom single-read-parse loop that + avoids N+1 load calls and surfaces ENOENT / OSError / parse errors distinctly. """ backend, _ = populated_backend via_method = backend.load_all_mcp_servers() @@ -755,6 +763,116 @@ def test_load_all_consistency_under_list_ordering(tmp_path: Path) -> None: assert [s.name for s in all_specs] == sorted(s.name for s in all_specs) +# ────────────────────────────────────────────────────────────────────────────── +# MUST 9 -- Install / uninstall atomicity (new tests, PR 3) + + +def test_must9_install_atomicity_concurrent_same_name(backend_factory) -> None: + """Concurrent install of the same server name: exactly one winner. + + spec/36 MUST 9 -- concurrent install atomicity. N=3 threads all call + install(same_spec); exactly 1 must succeed; the others must raise + MCPServerAlreadyInstalled or MCPRegistryUnavailable (lock contention). + Guarded on capability flag so HTTP backend at PR 4 (supports_install=False) + skips automatically. + + Stream E finding E3 (P1). + """ + import concurrent.futures + + from atomic_agents.mcp_registry import ( + MCPServerAlreadyInstalled, + MCPRegistryUnavailable, + ) + + caps = backend_factory.capabilities + if not caps.supports_install: + pytest.skip("backend does not support install; skipping MUST 9 atomicity test") + + spec = _make_mcp_spec("concurrent-conformance-server") + successes: list[bool] = [] + failures: list[bool] = [] + + def _try() -> None: + try: + backend_factory.install(spec) + successes.append(True) + except (MCPServerAlreadyInstalled, MCPRegistryUnavailable): + failures.append(True) + + n = 3 + with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool: + futs = [pool.submit(_try) for _ in range(n)] + for fut in futs: + fut.result() + + assert len(successes) == 1, ( + f"MUST 9: exactly 1 concurrent install winner; got {len(successes)}" + ) + assert len(successes) + len(failures) == n + + +def test_must9_uninstall_absent_name_is_noop(backend_factory) -> None: + """uninstall() on an absent name is a no-op (returns None, no exception). + + spec/36 MUST 9 -- uninstall idempotency. Guarded on capability flag so + HTTP backend at PR 4 (supports_uninstall=False) skips automatically. + + Stream E finding E4 (P1) + C11. + """ + caps = backend_factory.capabilities + if not caps.supports_uninstall: + pytest.skip( + "backend does not support uninstall; skipping MUST 9 idempotency test" + ) + + result = backend_factory.uninstall("absolutely-not-in-registry-conformance") + assert result is None, ( + "MUST 9: uninstall on absent name must return None (idempotent no-op)" + ) + + +# ────────────────────────────────────────────────────────────────────────────── +# MUST 10 -- Post-install consistency (new test, PR 3) + + +def test_must10_post_install_consistency(backend_factory) -> None: + """After install(), load_all_mcp_servers() and list_mcp_servers() are consistent. + + spec/36 MUST 10 -- load_all consistency must hold after write operations. + Verifies that every name from list_mcp_servers() is loadable via + load_mcp_server() and that set(load_all_mcp_servers()) equals the + per-name load iteration. + Guarded on capability flag so HTTP backend at PR 4 skips automatically. + + Stream E finding E6 (P1). + """ + caps = backend_factory.capabilities + if not caps.supports_install: + pytest.skip( + "backend does not support install; skipping MUST 10 post-install test" + ) + + spec = _make_mcp_spec("must10-consistency-server", command="echo") + backend_factory.install(spec) + + refs = backend_factory.list_mcp_servers() + all_specs = backend_factory.load_all_mcp_servers() + + ref_names = {r.name for r in refs} + spec_names = {s.name for s in all_specs} + + assert ref_names == spec_names, ( + f"MUST 10: list_mcp_servers names {ref_names!r} must equal " + f"load_all_mcp_servers names {spec_names!r} after install" + ) + + # Every name from list must be individually loadable. + for ref in refs: + loaded = backend_factory.load_mcp_server(ref.name) + assert loaded.name == ref.name + + # ────────────────────────────────────────────────────────────────────────────── # Module-level unit tests: _redact_for_error_message (MUST 4 -- PR 1 scope) diff --git a/tests/test_mcp_server_registry_doctor.py b/tests/test_mcp_server_registry_doctor.py index 938327e..5689b1e 100644 --- a/tests/test_mcp_server_registry_doctor.py +++ b/tests/test_mcp_server_registry_doctor.py @@ -186,11 +186,12 @@ def test_doctor_pass_capability_snapshot_includes_all_fields( } assert expected_keys.issubset(result.detail.keys()) # Capability values come from FilesystemMCPServerRegistryBackend - # capabilities at PR 2 baseline: install/uninstall False (filesystem - # writes ship at PR 3), capability_handshake False (HTTP only), - # audit False, durable True. - assert result.detail["supports_install"] is False - assert result.detail["supports_uninstall"] is False + # capabilities at PR 3+ baseline: install/uninstall True (filesystem + # writes shipped at PR 3 per spec/36 Decision 5 capability evolution + # table), capability_handshake False (HTTP only), audit False, + # durable True. + assert result.detail["supports_install"] is True + assert result.detail["supports_uninstall"] is True assert result.detail["supports_capability_handshake"] is False assert result.detail["supports_audit"] is False assert result.detail["durable"] is True diff --git a/tests/test_mcp_server_registry_filesystem_backend.py b/tests/test_mcp_server_registry_filesystem_backend.py index 8670389..eaac2bd 100644 --- a/tests/test_mcp_server_registry_filesystem_backend.py +++ b/tests/test_mcp_server_registry_filesystem_backend.py @@ -689,31 +689,9 @@ def test_malformed_mcp_md_raises_descriptor_invalid(tmp_path: Path) -> None: backend.load_mcp_server("malformed-server") -# ────────────────────────────────────────────────────────────────────────────── -# Install/uninstall stubs (deferred to PR 3) - - -@pytest.mark.skip( - reason="install/uninstall land in PR 3 alongside LockBackend integration" -) -def test_atomic_install_collision_via_concurrent_threads(tmp_path: Path) -> None: - """Atomic install collision via concurrent threads. - - Deferred to PR 3 when FilesystemMCPServerRegistryBackend.install ships - with LockBackend integration and atomic mcp.md write semantics. - """ - pass - - -@pytest.mark.skip( - reason="install/uninstall land in PR 3 alongside LockBackend integration" -) -def test_lock_file_acquired_during_install(tmp_path: Path) -> None: - """Lock file (.mcp_registry.lock) is acquired during install. - - Deferred to PR 3 when LockBackend integration lands. - """ - pass +# Install/uninstall tests now live in test_mcp_server_registry_filesystem_install.py +# (added at PR 3). The placeholder skip-marked stubs that were here have been +# removed per Stream E finding E10 -- real tests replace them. # ────────────────────────────────────────────────────────────────────────────── diff --git a/tests/test_mcp_server_registry_filesystem_install.py b/tests/test_mcp_server_registry_filesystem_install.py new file mode 100644 index 0000000..c1f03af --- /dev/null +++ b/tests/test_mcp_server_registry_filesystem_install.py @@ -0,0 +1,622 @@ +"""Filesystem install / uninstall tests for FilesystemMCPServerRegistryBackend. + +Covers PR 3 of 5 of #201 (MCPServerRegistryBackend). Companion to the existing +test_mcp_server_registry_filesystem_backend.py (read paths from PR 1) + +test_mcp_server_registry_conformance.py (cross-backend MUSTs). + +Test categories: +1. Happy path: install + verify list_mcp_servers + verify file content. +2. Collision: install same name twice -> MCPServerAlreadyInstalled. +3. Cold-start: install on agent_root with no mcp.md (creates it). +4. Path-traversal: install with bad name -> ValueError at API boundary. +5. Idempotency: uninstall absent name -> no exception; install/uninstall/install cycle. +6. Concurrent: N threads same name -> exactly one wins; different names -> all win. +7. Lock timeout: install_lock_timeout=0.0 + held lock -> MCPRegistryUnavailable. +8. CLI: install/uninstall via _mcp_registry_install/_uninstall handlers; + verify success-output discipline + WARN on literal env + H2 injection refusal. +""" + +from __future__ import annotations + +import concurrent.futures +from pathlib import Path + +import pytest + +from atomic_agents.locks.filesystem import FilesystemLockBackend +from atomic_agents.mcp_registry import ( + FilesystemMCPServerRegistryBackend, + MCPRegistryUnavailable, + MCPServerAlreadyInstalled, +) +from atomic_agents.mcp_registry.types import MCPServerRef +from atomic_agents.mcp import MCPServerSpec + + +# ────────────────────────────────────────────────────────────────────────────── +# Helpers + + +def _make_spec( + name: str = "test-server", + command: str = "echo", + args: list[str] | None = None, + env: dict[str, str] | None = None, + transport: str = "stdio", + description: str = "", +) -> MCPServerSpec: + """Build a minimal MCPServerSpec for test use.""" + return MCPServerSpec( + name=name, + command=command, + args=args or [], + env=env or {}, + transport=transport, + description=description, + ) + + +def _fresh_backend(agent_root: Path) -> FilesystemMCPServerRegistryBackend: + """Return a fresh FilesystemMCPServerRegistryBackend for agent_root.""" + return FilesystemMCPServerRegistryBackend(agent_root, []) + + +# ────────────────────────────────────────────────────────────────────────────── +# 1. Happy path + + +def test_install_happy_path(tmp_path: Path) -> None: + """install() returns an MCPServerRef with projected fields; mcp.md is created; + list_mcp_servers includes the new server. + + spec/36 MUST 9 -- install atomicity + post-install consistency. + """ + agent_root = tmp_path / "agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("my-server", command="npx", description="A test server") + + ref = backend.install(spec) + + assert isinstance(ref, MCPServerRef) + assert ref.name == "my-server" + assert ref.transport == "stdio" + # mcp.md must exist after install + assert (agent_root / "mcp.md").exists() + # list_mcp_servers must return the new entry + refs = backend.list_mcp_servers() + names = [r.name for r in refs] + assert "my-server" in names + + +def test_install_cold_start_creates_mcp_md(tmp_path: Path) -> None: + """install() on an agent_root with NO mcp.md creates mcp.md from scratch. + + spec/36 MUST 9 -- filesystem backend handles cold-start write. + """ + agent_root = tmp_path / "cold-agent" + agent_root.mkdir() + assert not (agent_root / "mcp.md").exists() + + backend = _fresh_backend(agent_root) + backend.install(_make_spec("first-server")) + + assert (agent_root / "mcp.md").exists() + refs = backend.list_mcp_servers() + assert any(r.name == "first-server" for r in refs) + + +def test_install_returns_ref_with_name_and_transport(tmp_path: Path) -> None: + """install() projects name and transport from the spec into the returned Ref. + + spec/36 MCPServerRef projection contract. + """ + agent_root = tmp_path / "proj-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("projected-server", transport="stdio") + + ref = backend.install(spec) + + assert ref.name == "projected-server" + assert ref.transport == "stdio" + + +def test_install_ref_does_not_contain_env(tmp_path: Path) -> None: + """install() returns MCPServerRef which does NOT include env fields. + + The Ref carries only metadata (name, description, transport, version, source). + Secrets must not leak into the Ref. + """ + agent_root = tmp_path / "ref-env-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("secret-server", env={"API_KEY": "$MY_SECRET"}) + + ref = backend.install(spec) + + # MCPServerRef has no 'env' attribute by design; confirm this. + assert not hasattr(ref, "env"), ( + "MCPServerRef must NOT carry env (secret-leak prevention)." + ) + + +# ────────────────────────────────────────────────────────────────────────────── +# 2. Collision + + +def test_install_collision_raises_already_installed(tmp_path: Path) -> None: + """install() the same name twice raises MCPServerAlreadyInstalled on second call. + + spec/36 MUST 9 -- atomic collision detection. + """ + agent_root = tmp_path / "collision-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("dupe-server") + + backend.install(spec) + with pytest.raises(MCPServerAlreadyInstalled): + backend.install(spec) + + +def test_install_collision_leaves_mcp_md_unchanged(tmp_path: Path) -> None: + """After a collision, mcp.md still has exactly one section for the server. + + spec/36 MUST 9 -- atomicity means no partial writes on collision. + """ + agent_root = tmp_path / "collision-check-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("unique-server") + + backend.install(spec) + try: + backend.install(spec) + except MCPServerAlreadyInstalled: + pass + + # Only one entry should exist. + refs = backend.list_mcp_servers() + assert len([r for r in refs if r.name == "unique-server"]) == 1 + + +# ────────────────────────────────────────────────────────────────────────────── +# 3. Path-traversal names + + +def test_install_path_traversal_name_raises(tmp_path: Path) -> None: + """install() with a path-traversal name raises ValueError BEFORE any disk access. + + spec/36 MUST 1 -- charset validation at API boundary. + """ + agent_root = tmp_path / "traversal-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("../etc/passwd") + + with pytest.raises(ValueError): + backend.install(spec) + + # mcp.md must NOT have been created. + assert not (agent_root / "mcp.md").exists() + + +def test_install_empty_command_raises(tmp_path: Path) -> None: + """install() with an empty command raises ValueError. + + spec/36 -- command is required; empty string is invalid. + """ + agent_root = tmp_path / "empty-cmd-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("ok-name", command="") + + with pytest.raises(ValueError): + backend.install(spec) + + +def test_uninstall_path_traversal_name_raises(tmp_path: Path) -> None: + """uninstall() with a path-traversal name raises ValueError. + + spec/36 MUST 1 -- charset validation applies to uninstall too. + """ + agent_root = tmp_path / "uninstall-traversal-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + + with pytest.raises(ValueError): + backend.uninstall("../etc/passwd") + + +# ────────────────────────────────────────────────────────────────────────────── +# 4. Dollar-var env preservation + + +def test_install_with_dollar_var_env_preserves_unresolved( + tmp_path: Path, monkeypatch +) -> None: + """install() with env={"K": "$VAR"} stores the literal $VAR string in mcp.md. + + spec/36 MUST 8 / Decision 7 -- env refs are NOT resolved at install time. + The raw $VAR reference is stored; resolution happens at load_mcp_server time. + """ + monkeypatch.setenv("MY_VAR", "resolved-secret") + agent_root = tmp_path / "dollar-var-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("var-server", env={"TOKEN": "$MY_VAR"}) + + backend.install(spec) + + mcp_md_content = (agent_root / "mcp.md").read_text(encoding="utf-8") + # The raw reference must be stored, not the resolved value. + assert "$MY_VAR" in mcp_md_content + assert "resolved-secret" not in mcp_md_content + + +# ────────────────────────────────────────────────────────────────────────────── +# 5. Install then load round-trip + + +def test_install_then_load_round_trip(tmp_path: Path, monkeypatch) -> None: + """install() then load_mcp_server() returns a spec with matching fields. + + spec/36 MUST 10 + MUST 8 -- post-install consistency and env resolution. + """ + monkeypatch.setenv("ROUND_TRIP_VAR", "expected-value") + agent_root = tmp_path / "round-trip-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec( + "rt-server", + command="npx", + args=["-y", "@some/package"], + env={"KEY": "$ROUND_TRIP_VAR"}, + description="Round trip test", + ) + + backend.install(spec) + loaded = backend.load_mcp_server("rt-server") + + assert loaded.name == "rt-server" + assert loaded.command == "npx" + assert loaded.args == ["-y", "@some/package"] + assert loaded.env["KEY"] == "expected-value" + assert loaded.description == "Round trip test" + + +# ────────────────────────────────────────────────────────────────────────────── +# 6. Uninstall behavior + + +def test_uninstall_present_name_removes(tmp_path: Path) -> None: + """uninstall() of an installed server removes it; list_mcp_servers returns []. + + spec/36 MUST 9 -- uninstall atomicity. + """ + agent_root = tmp_path / "remove-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + backend.install(_make_spec("to-remove")) + + backend.uninstall("to-remove") + + refs = backend.list_mcp_servers() + assert not any(r.name == "to-remove" for r in refs) + + +def test_uninstall_absent_name_is_noop(tmp_path: Path) -> None: + """uninstall() of a name that was never installed returns without raising. + + spec/36 MUST 9 -- uninstall is idempotent (no exception on absent name). + """ + agent_root = tmp_path / "noop-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + + # Must not raise anything. + result = backend.uninstall("never-installed") + assert result is None + + +def test_uninstall_idempotent_double_call(tmp_path: Path) -> None: + """install() then uninstall() twice; second uninstall must not raise and returns None. + + spec/36 MUST 9 -- uninstall idempotency for double-call; absent-name path + returns None per the spec contract. + """ + agent_root = tmp_path / "idempotent-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + backend.install(_make_spec("idempotent-server")) + + backend.uninstall("idempotent-server") + result2 = backend.uninstall("idempotent-server") # must not raise + assert result2 is None + + +def test_install_uninstall_install_cycle(tmp_path: Path) -> None: + """install -> uninstall -> install cycle succeeds on the third call. + + spec/36 MUST 9 -- uninstall removed the section so re-install is clean. + """ + agent_root = tmp_path / "cycle-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + spec = _make_spec("cycle-server") + + backend.install(spec) + backend.uninstall("cycle-server") + ref = backend.install(spec) # must succeed + + assert ref.name == "cycle-server" + refs = backend.list_mcp_servers() + assert any(r.name == "cycle-server" for r in refs) + + +# ────────────────────────────────────────────────────────────────────────────── +# 7. Concurrent install + + +def test_install_concurrent_same_name_exactly_one_wins(tmp_path: Path) -> None: + """5 threads installing the same server name: exactly 1 wins; rest raise. + + Winners raise nothing; losers raise MCPServerAlreadyInstalled or + MCPRegistryUnavailable (lock contention). mcp.md must contain exactly + one matching H2 section. + + spec/36 MUST 9 -- concurrent install atomicity. + """ + agent_root = tmp_path / "concurrent-same-agent" + agent_root.mkdir() + spec = _make_spec("concurrent-server") + + successes = [] + failures = [] + + def _try_install() -> None: + backend = _fresh_backend(agent_root) + try: + backend.install(spec) + successes.append(True) + except (MCPServerAlreadyInstalled, MCPRegistryUnavailable): + failures.append(True) + + n_threads = 5 + with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool: + futs = [pool.submit(_try_install) for _ in range(n_threads)] + for fut in futs: + fut.result() # re-raise unexpected exceptions + + assert len(successes) == 1, ( + f"Expected exactly 1 winner; got {len(successes)} winners and " + f"{len(failures)} failures." + ) + assert len(successes) + len(failures) == n_threads + + mcp_md_content = (agent_root / "mcp.md").read_text(encoding="utf-8") + import re + + h2_matches = re.findall(r"^## concurrent-server\s*$", mcp_md_content, re.MULTILINE) + assert len(h2_matches) == 1, ( + f"Expected exactly 1 H2 section for 'concurrent-server'; " + f"found {len(h2_matches)} in mcp.md." + ) + + +def test_install_concurrent_different_names_all_win(tmp_path: Path) -> None: + """N threads installing different server names: all succeed. + + spec/36 MUST 9 -- different names do not collide. + """ + agent_root = tmp_path / "concurrent-diff-agent" + agent_root.mkdir() + + n_threads = 4 + specs = [_make_spec(f"server-{i}") for i in range(n_threads)] + errors = [] + + def _try_install(spec: MCPServerSpec) -> None: + backend = _fresh_backend(agent_root) + try: + backend.install(spec) + except Exception as e: + errors.append(e) + + with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool: + futs = [pool.submit(_try_install, s) for s in specs] + for fut in futs: + fut.result() + + assert not errors, f"Unexpected errors installing distinct names: {errors}" + backend = _fresh_backend(agent_root) + refs = backend.list_mcp_servers() + installed_names = {r.name for r in refs} + expected_names = {f"server-{i}" for i in range(n_threads)} + assert expected_names == installed_names + + +# ────────────────────────────────────────────────────────────────────────────── +# 7b. Lock timeout test (spec/36 MUST 9 -- LockBusy -> MCPRegistryUnavailable) + + +def test_install_lock_timeout_zero_under_contention(tmp_path: Path) -> None: + """install() with install_lock_timeout=0.0 and a held lock raises MCPRegistryUnavailable. + + spec/36 MUST 9 -- the LockBusy -> MCPRegistryUnavailable translation is + the contract. This test pins it so a future regression is visible. + + Strategy: share a single FilesystemLockBackend between the test harness and + the backend. The test acquires "mcp_registry" first. The backend's install() + then tries to re-acquire via the same backend instance; FilesystemLockBackend + is non-reentrant so it raises LockBusy immediately (regardless of timeout), + which install() translates to MCPRegistryUnavailable. + """ + agent_root = tmp_path / "timeout-agent" + agent_root.mkdir() + + # One shared lock backend instance. + lock_backend = FilesystemLockBackend(agent_root) + + # Acquire the registry lock from the test harness (timeout=5 is generous). + handle = lock_backend.acquire("mcp_registry", timeout=5) + + try: + backend = FilesystemMCPServerRegistryBackend( + agent_root, + read_paths=[], + lock_backend=lock_backend, + install_lock_timeout=0.0, + ) + + with pytest.raises(MCPRegistryUnavailable): + backend.install(_make_spec("timeout-server")) + finally: + handle.__exit__(None, None, None) + + +# ────────────────────────────────────────────────────────────────────────────── +# 8. CLI handler tests + + +def test_cli_install_success_output_does_not_echo_env( + tmp_path: Path, monkeypatch, capsys +) -> None: + """_mcp_registry_install: success output contains server name but NOT env values. + + Stream D finding D-F4 -- never echo env / command / args in success output. + """ + from atomic_agents.cli import _mcp_registry_install + + agent_root = tmp_path / "cli-no-echo-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + + exit_code = _mcp_registry_install( + backend, + "safe-server", + "npx", + "", + "TOKEN=$MY_SECRET_VAR", + "", + "stdio", + ) + + assert exit_code == 0 + captured = capsys.readouterr() + assert "safe-server" in captured.out + assert "$MY_SECRET_VAR" not in captured.out + assert "TOKEN" not in captured.out + + +def test_cli_install_warns_on_literal_env_value(tmp_path: Path, capsys) -> None: + """_mcp_registry_install warns on stderr and exits 0 when --env value lacks $ prefix. + + Stream D finding D-F1 (decision 3 = WARN + proceed): literal values land + on disk plaintext; the CLI warns the operator but continues (exit 0). + This is the v1.0 contract per decision 3. + """ + from atomic_agents.cli import _mcp_registry_install + + agent_root = tmp_path / "cli-warn-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + + exit_code = _mcp_registry_install( + backend, + "warn-server", + "echo", + "", + "KEY=literal_not_a_var", + "", + "stdio", + ) + + # WARN + proceed is the v1.0 contract per decision 3. + assert exit_code == 0 + captured = capsys.readouterr() + assert "Warning" in captured.err or "warning" in captured.err.lower() + assert "literal" in captured.err or "KEY" in captured.err + + +def test_cli_install_refuses_h2_in_description(tmp_path: Path, capsys) -> None: + """_mcp_registry_install exits 1 when --description contains '## '. + + Stream D finding D-F9 + Stream B finding B-10 -- H2 headers delimit + server sections in mcp.md; injecting one via --description would corrupt + the file. + """ + from atomic_agents.cli import _mcp_registry_install + + agent_root = tmp_path / "cli-h2-agent" + agent_root.mkdir() + backend = _fresh_backend(agent_root) + + exit_code = _mcp_registry_install( + backend, + "legit-server", + "echo", + "", + "", + "## evil-section", + "stdio", + ) + + assert exit_code == 1 + captured = capsys.readouterr() + assert "##" in captured.err or "H2" in captured.err or "description" in captured.err + # mcp.md must NOT have been written. + assert not (agent_root / "mcp.md").exists() + + +def test_cli_parse_env_flag_empty() -> None: + """_parse_env_flag('') returns {}.""" + from atomic_agents.cli import _parse_env_flag + + assert _parse_env_flag("") == {} + + +def test_cli_parse_env_flag_single_pair() -> None: + """_parse_env_flag('K=$V') returns {'K': '$V'}.""" + from atomic_agents.cli import _parse_env_flag + + assert _parse_env_flag("K=$V") == {"K": "$V"} + + +def test_cli_parse_env_flag_multiple_pairs() -> None: + """_parse_env_flag('A=$X,B=$Y') returns {'A': '$X', 'B': '$Y'}.""" + from atomic_agents.cli import _parse_env_flag + + result = _parse_env_flag("A=$X,B=$Y") + assert result == {"A": "$X", "B": "$Y"} + + +def test_cli_parse_env_flag_value_contains_equals() -> None: + """_parse_env_flag splits on FIRST '=' only; value may contain '='.""" + from atomic_agents.cli import _parse_env_flag + + result = _parse_env_flag("K=a=b=c") + assert result == {"K": "a=b=c"} + + +def test_cli_parse_env_flag_missing_equals_raises() -> None: + """_parse_env_flag raises ValueError when an entry has no '='.""" + from atomic_agents.cli import _parse_env_flag + + with pytest.raises(ValueError, match="missing '='"): + _parse_env_flag("NOKEYVALUE") + + +def test_cli_parse_args_flag_empty() -> None: + """_parse_args_flag('') returns [].""" + from atomic_agents.cli import _parse_args_flag + + assert _parse_args_flag("") == [] + + +def test_cli_parse_args_flag_splits_on_comma() -> None: + """_parse_args_flag('-y,@some/pkg') returns ['-y', '@some/pkg'].""" + from atomic_agents.cli import _parse_args_flag + + assert _parse_args_flag("-y,@some/pkg") == ["-y", "@some/pkg"]