Skip to content

feat(agent): #201 PR 2 of 5. MCPServerRegistryBackend wiring + AgentProfile sibling field + doctor + IRON RULE regression#331

Merged
dep0we merged 8 commits into
mainfrom
feat-mcp-server-registry-backend-pr2
Jun 3, 2026
Merged

feat(agent): #201 PR 2 of 5. MCPServerRegistryBackend wiring + AgentProfile sibling field + doctor + IRON RULE regression#331
dep0we merged 8 commits into
mainfrom
feat-mcp-server-registry-backend-pr2

Conversation

@dep0we
Copy link
Copy Markdown
Owner

@dep0we dep0we commented Jun 3, 2026

Summary

PR 2 of 5 of the MCPServerRegistryBackend arc (#201, the v1.0 closer). Wires PR 1's Protocol scaffolding into AtomicAgent, ships the AgentProfile.mcp_servers_resolved sibling field for substrate-agnostic audit (spec/24 D1 addendum), adds doctor.check_mcp_server_registry_backend as the 13th check_*_backend, and pins pre-#201 byte-identical behavior via the IRON RULE regression suite. Three correctness fixes landed inline from cross-model review army (Codex + Claude adversarial + 4 specialists).

Framework integration:

  • AtomicAgent.__init__ accepts mcp_server_registry_backend constructor kwarg + class-level type annotation; defaults via get_default_mcp_server_registry_backend(self.agent_root, read_paths); calls backend.load_all_mcp_servers() to materialize the spec list; populates AgentProfile.mcp_servers_resolved via dataclasses.replace() BEFORE _load_config() runs.
  • MCPClientPool consumption site at agent.call() now reads self._profile.mcp_servers_resolved (substrate-agnostic). self.config.mcp_servers semantics preserved for backward-compat on log/audit consumers.
  • Fail-closed semantic per spec/36 lines 519-522: propagates MCPRegistryError from backend probe with backend_id context + URL credential redaction; original exception type preserved via type(exc)(...).
  • Per-runner kwargs on OutcomeRunner, EvalRunner (thread to internal AtomicAgent), DreamRunner (storage-only for API parity).
  • delegate.py explicit-only threading via _mcp_server_registry_backend_was_explicit flag (mirrors PersonaBackend D-ER-2 + CorpusBackend precedents).
  • MCPRegistryError added to delegate.py CLI catch block for clean operator UX on catalog-down at delegate construction.

Profile changes:

  • AgentProfile.mcp_servers_resolved: list[MCPServerSpec] field added LAST in dataclass (Python required-fields-before-defaults rule). field import added.
  • to_dict() always serializes the field as [] (snapshot security: resolved $VAR env values never persist to disk).
  • from_dict() uses new _mcp_spec_from_dict helper which also fixes a pre-existing latent bug where the mcp_servers fallback returned raw dicts.
  • SQLite backend continues riding the existing profile_json JSON blob; zero schema migration, schema stays at v2.
  • Filesystem backend sorts mcp_servers lexicographically on load (Q1 decision, aligns with spec/36 MUST 5).

PR 1 bug fixes (surfaced by PR 2's fail-closed semantic):

  • FilesystemMCPServerRegistryBackend.list_mcp_servers now discriminates FileNotFoundError/ENOENT (returns []) from other OSError (raises MCPRegistryUnavailable); previously swallowed PermissionError from misconfigured deploys.
  • load_all_mcp_servers overridden directly with single read-parse mapping ENOENT/OSError/parse/env-var to the proper exception types.

Cross-model review army findings (3 triple-confirmed, applied inline):

  1. Empty resolved list authoritative: ... or self.config.mcp_servers previously resurrected stale mcp.md servers when registry returned [] (Codex HIGH + Claude adversarial F2 + Codex structured P2). Fixed via explicit if hasattr ... branch.
  2. Fail-closed wrapper broadened from MCPRegistryUnavailable to MCPRegistryError so MCPRegistryDescriptorInvalid propagates with backend_id context.
  3. Doctor false-PASS closed: added load_all_mcp_servers() probe after list_mcp_servers() succeeds (Codex MEDIUM + Claude adversarial F6 + Testing specialist).

spec doc edits:

  • spec/24 D1 addendum: documents the sibling field, snapshot security clamp, SQLite forward-compat.
  • spec/36 line 599 amendment: corrects wiring location from _load_config() to AtomicAgent.__init__.

Test Coverage

Test count: +46 net new tests, 3199 collected, 3141 passed + 58 skipped, 0 failures, 0 regressions across the full suite.

Component                              | Branch                                          | Status
---------------------------------------|------------------------------------------------|------------------
agent.py __init__ kwarg resolution     | kwarg=None, kwarg=explicit, _was_explicit       | [★★★ TESTED]
agent.py fail-closed wrapper           | MCPRegistryUnavailable + MCPRegistryDescriptorInvalid + backend_id + URL redaction | [★★★ TESTED]
agent.py pool consumption              | mcp_servers_resolved authoritative, empty case  | [★★★ TESTED]
agent.py delegate explicit-only        | _was_explicit=True threads, False does not       | [★★★ TESTED]
OutcomeRunner kwarg                    | stored + threaded to internal AtomicAgent       | [★★ TESTED]
EvalRunner kwarg                       | stored + threaded                                | [★★ TESTED]
DreamRunner kwarg                      | storage-only (no internal AtomicAgent in v1)    | [★★★ TESTED]
delegate.py CLI MCPRegistryError catch | clean stderr + exit 1                            | [★★★ TESTED]
filesystem.py list_mcp_servers ENOENT  | FileNotFoundError -> [], other OSError -> raise | [★★★ TESTED]
filesystem.py load_all_mcp_servers     | single read-parse, exception mapping             | [★★★ TESTED]
profile/types.py field placement       | LAST in dataclass, default []                    | [★★★ TESTED]
profile/types.py to_dict()              | always emits [] (security clamp)                | [★★★ TESTED]
profile/types.py from_dict() + helper  | round-trip via _mcp_spec_from_dict              | [★★★ TESTED]
profile/filesystem.py sort + field     | lexicographic + mcp_servers_resolved=[]         | [★★★ TESTED]
profile/sqlite.py JSON blob round-trip | mcp_servers_resolved stays [] after save+load   | [★★★ TESTED]
doctor.check_mcp_server_registry_backend | PASS/WARN/FAIL ladder + capability snapshot + credential redaction + descriptor-invalid probe | [★★★ TESTED]

Coverage gate: PASS (~93% AI-assessed after the doctor + descriptor-invalid tests landed). Documented follow-up coverage notes (filed for PR 3 prep, non-blocking): chmod-style PermissionError tests for load_all_mcp_servers's new OSError mapping; threading test strengthening to patch AtomicAgent.__init__ and capture kwargs at runner run() time.

Tests: before 3153 -> after 3199 (+46 net new), broken down: tests/test_mcp_server_registry_wiring.py (~22 wiring tests), tests/test_mcp_server_registry_migration_regression.py (10 IRON RULE byte-identity tests), tests/test_profile_mcp_servers_resolved.py (9 sibling field round-trip tests), tests/test_mcp_server_registry_doctor.py (5 doctor PASS/WARN/FAIL tests including DSN-style credential redaction and the descriptor-invalid FAIL path).

Pre-Landing Review

Review army dispatched 4 specialists (testing, maintainability, security, performance) + Claude adversarial subagent + Codex adversarial (codex exec) + Codex structured review (codex review). 15+ findings surfaced. The 3 highest-priority were triple-confirmed across models and applied inline (see Summary above). Polish findings applied: removed _dc_replace import inside loop body in mcp_registry/filesystem.py:368, removed deferred import os inside doctor check function. Pre-existing mcp_servers deserialization bug (latent since #63) fixed as a free side effect of adding _mcp_spec_from_dict.

Codex structured review: PASS (no [P1] markers, only the empty-list fallback P2 which was applied inline).

Plan Completion

25/25 plan items DONE per the Step 8 plan-completion audit subagent. Zero deferred. Zero unverifiable. Plan source: design doc at ~/.gstack/projects/dep0we-atomic-agents-stack/dep0we-main-design-20260601-165020.md + spec/36 PR 2 scope section + 5-stream prep pass synthesis.

Documentation

README.md: test count updated from 3153 collected / 3101 passed / 52 skipped to 3199 collected / 3141 passed / 58 skipped (2 sites: repository structure section + status footer).

CLAUDE.md: test count updated from 3,153 to 3,199 collected (2 sites: Tests convention section + Status section).

All other docs current: CHANGELOG entry already in commit cd6c175; spec/24 D1 addendum and spec/36 line 599 amendment shipped in implementer commits; ROADMAP.md "Eleven of twelve" count and MCPServerRegistry "Planned" status remain accurate until PR 5 of 5.

Test plan

  • Full test suite (3199 collected, 3141 passed, 58 skipped, 0 failures)
  • Targeted PR 2 tests (wiring + IRON RULE + profile + doctor: 297 selected, 280 passed + 17 capability-gate skips)
  • No regressions across the 11 prior backend protocols (profile, persona, corpus, policy, mandate, lock, log, judge, llm, memory, tool registry suites all green)
  • Pre-[backend] MCPServerRegistryBackend — unify MCP server discovery across agents #201 byte-identical behavior verified via IRON RULE suite (empty mcp.md, missing mcp.md, single server no env, multiple servers sorted, env-var resolved, fail-closed MCPRegistryUnavailable raises at construction)

🤖 Generated with Claude Code

Dan Powers and others added 8 commits June 3, 2026 12:54
…ing field + doctor check + bug fixes

Stream 2 of PR 2 (parallel with Stream 1 which owns agent.py wiring and spec/36).

Core changes:

- profile/types.py: add `mcp_servers_resolved: list[MCPServerSpec] = field(default_factory=list)`
  as LAST field in AgentProfile (Python dataclass default placement rule). Adds `field` to
  dataclass import. to_dict() always emits `mcp_servers_resolved: []` (locked Q2 -- keeps
  resolved MCP env secrets out of snapshot JSON on disk). from_dict() reconstructs via
  _mcp_spec_from_dict for both mcp_servers and mcp_servers_resolved. Adds _mcp_spec_from_dict
  helper as inverse of existing _mcp_spec_to_dict.

- Latent bug fix (free side-effect): the pre-existing mcp_servers fallback path in
  AgentProfile.from_dict() returned raw dicts instead of MCPServerSpec instances. Now fixed
  by routing through _mcp_spec_from_dict. This silently broke attribute access on
  profile.mcp_servers in database-round-trip scenarios.

- profile/filesystem.py: sort mcp_servers lexicographically by name after parse (locked Q1).
  Aligns pre-#201 path with FilesystemMCPServerRegistryBackend.list_mcp_servers() which
  already sorts. Adds mcp_servers_resolved=[] to AgentProfile direct construction.

- profile/sqlite.py: adds comment confirming mcp_servers_resolved rides through profile_json
  blob via to_dict/from_dict with no schema change. Schema stays at v2.

- mcp_registry/filesystem.py PR 1 bug fix: list_mcp_servers() previously caught all OSError
  and returned [] -- this masked transient failures (PermissionError, IsADirectoryError) from
  the fail-closed path PR 2 wires. Fixed to mirror load_mcp_server's ENOENT discrimination:
  FileNotFoundError (race after exists() check) -> return []; other OSError -> raise
  MCPRegistryUnavailable. Also overrides load_all_mcp_servers() with a single read-parse
  (replacing _default_load_all delegation) so MCPRegistryDescriptorInvalid and
  MCPRegistryUnavailable propagate correctly from bulk load.

- doctor.py: adds check_mcp_server_registry_backend(agent_root) with PASS/WARN/FAIL ladder.
  Uses _redact_for_error_message from mcp_registry/__init__.py (handles URL + DSN heuristics,
  distinct from the inline truncation in check_tool_registry_backend). Reads
  backend.capabilities as a property. Detail dict includes all 5 capability fields +
  mcp_server_count from list_mcp_servers(). Added to SKIP enumeration and dispatch after
  check_corpus_backend.

- docs/spec/24-agent-profile-backend.md: D1 addendum (#201 PR 2) documenting the new field,
  its populate-via-agent.py-only rule, snapshot serialization security rationale, and SQLite
  schema verification.

- tests/test_profile_mcp_servers_resolved.py (NEW, 9 tests): covers field existence + last
  placement, to_dict security clamping, from_dict round-trips, _mcp_spec_from_dict helper,
  the mcp_servers fallback bug fix, snapshot round-trip, and SQLite round-trip.

- tests/test_mcp_server_registry_filesystem_backend.py: updated
  test_load_all_mcp_servers_delegates_to_default_load_all -> now tests direct single-read-
  parse behavior (the delegation test was correct for PR 1 behavior; it is wrong for PR 2's
  locked design). Removed unused _default_load_all import and os import.

CHANGELOG note: mcp_servers sort order change in pre-#201 filesystem profiles is a one-time
reorder in introspection output; no functional impact (declaration order was never a contract).
The PR 1 ENOENT discrimination fix should be called out in the CHANGELOG entry.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… delegate explicit-only + IRON RULE regression suite

Wires MCPServerRegistryBackend into AtomicAgent at the framework integration layer.

Backend resolution and load_all_mcp_servers() probe live in AtomicAgent.__init__
after profile load (line 496) and before _load_config() is called (line 770).
Spec/36 line 599 incorrectly said _load_config(); the spec text gets a one-sentence
amendment in this same PR. _load_config() at agent.py:2840 is a pure reader of
self._profile and cannot do the resolution.

Framework-level fail-closed semantic per spec/36 lines 519-522: load_all_mcp_servers()
call has no try/except for soft-degrade. MCPRegistryUnavailable propagates from
__init__. A wrapper re-raise adds the backend_id and redacted URL to the operator-
facing message per MUST 4.

AgentProfile.mcp_servers_resolved is populated via dataclasses.replace(self._profile,
mcp_servers_resolved=materialized) before _load_config() builds the AgentConfig.
The MCP pool consumption site at agent.py:3294-3334 changes its source from
self.config.mcp_servers to self._profile.mcp_servers_resolved. AgentConfig.mcp_servers
semantics stay unchanged for backward compat on existing log/audit consumers.

Per-runner kwargs added to OutcomeRunner, EvalRunner, and DreamRunner. Outcome
and Eval thread to their internal AtomicAgent construction. DreamRunner stores
on self for API parity (no internal AtomicAgent construction site in v1).

Delegate threading is explicit-only via the _mcp_server_registry_backend_was_explicit
flag, mirroring PersonaBackend D-ER-2 and CorpusBackend at agent.py:443-465. MCP
catalog is per-agent semantic context (per spec/36 Decision 1); default-resolved
backends do not leak the coordinator's agent_root to delegates.

delegate.py CLI catch updated to include MCPRegistryError so catalog-down at
delegate construction produces clean Error: stderr output, not a Python traceback.

Tests:
- tests/test_mcp_server_registry_wiring.py (~20 tests) covers per-runner kwarg
  threading, delegate explicit-only behavior, fail-closed re-raise on
  MCPRegistryUnavailable, fail-closed message includes backend_id + redacted URL,
  profile augmentation populates correctly.
- tests/test_mcp_server_registry_migration_regression.py (~10 IRON RULE tests)
  pins pre-#201 byte-identical behavior when no backend is configured. Empty
  mcp.md, missing mcp.md, single server no env, multiple servers in sorted order,
  env-var resolved specs, config equals profile resolved, mcp_pool._specs byte
  identical, MCPRegistryUnavailable raises at construction.

Tests that reference AgentProfile.mcp_servers_resolved depend on Stream 2's
profile changes and pass once both streams merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stream 1 used criteria.<dim>.weight (1.0) but EvalRunner._load_rubric reads
weights.<dim> (must sum to 100 per spec/08). The test rubric now uses the
correct frontmatter.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three findings triple-confirmed across Codex structured review, Codex
adversarial, and Claude adversarial subagent (plus the 5 specialist
subagents). All three are real correctness issues, not stylistic.

1. Empty resolved list must be authoritative, not a missing-field signal
   (agent.py:3367-3370). The `... or self.config.mcp_servers` fallback
   resurrected stale mcp.md servers when the registry backend genuinely
   returned []. An operator pinning an HTTP catalog that lists zero MCP
   servers for the agent_scope would still get mcp.md subprocesses
   spawned. Fix: replace `or` with explicit `if hasattr ...` so empty
   resolved lists are respected as the authoritative answer.

2. Fail-closed wrapper too narrow (agent.py:585). The `except
   MCPRegistryUnavailable` only caught transient failures, leaving
   MCPRegistryDescriptorInvalid (corrupt mcp.md) and MCPServerConnectFailed
   (env-var unresolvable) to escape uncontextualized. Fix: broaden to
   `except MCPRegistryError` and preserve original exception type via
   `type(exc)(...)` so callers can still distinguish transient from
   permanent. Adds defensive `getattr` default on backend_id for robust
   error-message construction.

3. Doctor false-PASS on malformed mcp.md (doctor.py:2786). The probe
   called only list_mcp_servers() which by design swallows parse errors
   and returns []. Agent construction at runtime calls
   load_all_mcp_servers() which raises MCPRegistryDescriptorInvalid.
   Operator runs doctor, sees PASS, constructs agent, agent crashes. Fix:
   add a load_all_mcp_servers() probe in the doctor check after
   list_mcp_servers() succeeds. Maps MCPRegistryDescriptorInvalid to
   FAIL with operator-readable message; MCPRegistryUnavailable to WARN.

Polish from the maintainability and performance specialists:
- Remove `from dataclasses import replace as _dc_replace` inside the
  per-spec loop body in mcp_registry/filesystem.py:368 (already imported
  at module level).
- Remove `import os` deferred inside doctor.py check function (already
  imported at module level).
- Include exception text in the generic-FAIL message so operators have
  actionable detail.

Tests added (+7 net new, 3141 collected, all pass):
- tests/test_mcp_server_registry_wiring.py: descriptor-invalid
  propagation test pinning the broadened catch + empty-resolved
  authoritative test pinning no-fallback-to-mcp.md.
- tests/test_mcp_server_registry_doctor.py (NEW): 5 doctor tests
  covering PASS with empty mcp.md, PASS with valid mcp.md, FAIL on
  unknown backend_id with URL credential redaction, FAIL on descriptor
  invalid, capability snapshot completeness. Mirrors the doctor-test
  shape of every other backend (lock, log, profile, tool registry,
  persona, corpus).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MCPServerRegistryBackend wiring + AgentProfile.mcp_servers_resolved
sibling field + doctor check + IRON RULE regression suite + cross-model
review army fixes (empty-list authoritative, fail-closed broader catch,
doctor false-PASS closure).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… for #201 PR 2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dep0we dep0we merged commit a12db59 into main Jun 3, 2026
5 checks passed
@dep0we dep0we deleted the feat-mcp-server-registry-backend-pr2 branch June 3, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant