feat(mcp): the agent face — odu mcp server (tools + resources)#3
Conversation
`odu mcp` serves odu's surface as an MCP stdio server so a coding agent
drives CI with structured calls instead of scraping terminal output. In-band
like status/logs/monitor: it dials .ci/odu.sock and predetermines no host
(pool lease / hosts.json stays the coordinator's job).
Tools (stateless, dial-per-call — mirroring src/cli/introspect.ts):
run start a background `odu run` and return once the socket is up
get_nodes snapshot the pipeline (Cell)
tail_log one node's output — live stream, or the durable per-SHA file
rerun_node reset a node + dependents on the live DAG (the only mutation)
wait_for_settle block until settled, or fail-fast the instant a node goes red
Resources (the live-push ceiling): odu://nodes + odu://log/{node} as
subscribable resources — resources/subscribe + notifications/resources/updated
maps the surface's snapshot-then-delta one-to-one. wait_for_settle is the
blocking-pull floor for hosts that don't wake the model on a notification.
Built on the SDK's low-level Server for full subscribe/unsubscribe control and
JSON-Schema tool inputs validated by odu's own zod (no SDK schema coupling).
The single piece of state — one live socket attachment driving the resource
pushes — is isolated in resources.ts; detach disposes the link (no abort/close
race) and a generation token stops a torn-down stream from rescheduling.
apm.yml declares the MCP server (the nix-chrome-devtools-mcp pattern) so
consumers that depend on juspay/odu get it wired into .mcp.json; the odu-mcp
skill ships the `serve` launcher (nix run, ODU_FLAKE-overridable).
17 new tests over the real unix-socket transport (tools + resource pusher).
Addressed all 7 of CODEX's findings on the odu MCP tool surface (src/mcp). All agreed-with — none disputed; one was already partly mitigated (timeout path) but the false-green stream-end path was a genuine bug. Typecheck (tsc --noEmit) clean and the full suite (61 tests, 7 files) green, including 4 new regression tests I added (F1 non-green-when-not-settled, F3 path-traversal rejection, F5 caller-cancel, F6 malformed-uri). Note for CODEX: pnpm isn't on PATH here either, but the repo's local binaries work — `node_modules/.bin/tsc` and `node_modules/.bin/vitest` ran fine, so I was able to verify what you couldn't. No JS formatter exists in this repo (just fmt only touches Nix), so no formatting pass was applicable to the TS files I changed.
codex (round 1) findings:
- [F1 · blocking] `waitForSettle` can return `passed: true` when the nodes stream ends before a terminal frame, as long as the last observed snapshot has no red nodes. That gives a false green for coordinator crashes, interrupts, or socket-close races while nodes are still pending/running. (src/mcp/tools.ts:260)
- [F2 · major] `startRun` waits for the socket poll to finish before observing child exit. A dirty-tree failure, ENOENT for `tsx`, bad justfile, or stale non-socket path will not be reported until the full 60s poll completes; if the child remains alive but never serves a socket, line 417 then waits forever. (src/mcp/tools.ts:415)
- [F3 · major] The durable-log fallback passes untrusted MCP input directly into `logPathFor`; `logPathFor` interpolates the parsed platform/namepath into a relative path. `join(repoRoot, ...)` will normalize `..`, so a crafted node id can escape `.ci/<sha7>` and read other `*.log` files accessible to the process. (src/mcp/tools.ts:122)
- [F4 · major] The file fallback reads the entire durable log synchronously into memory and returns it as one MCP text payload. CI logs can be very large, so this can block the server, consume unbounded memory, and produce unusable MCP responses; the live path is already capped by the in-memory log tail. (src/mcp/tools.ts:124)
- [F5 · minor] Long-running tool calls ignore the MCP request cancellation signal. If a client cancels `wait_for_settle`, the handler continues holding a socket and timer until settle or the 10 minute timeout. (src/mcp/server.ts:196)
- [F6 · minor] `resources/subscribe` accepts any URI and hands it to `ResourcePusher`. Unknown resources can keep the pusher retrying/attached forever, and malformed log URIs can throw after being added to the subscribed set. (src/mcp/server.ts:277)
- [F7 · nit] The test helper creates temporary directories but `close` only closes the socket listener; it does not remove the temp directory/socket path. Repeated test runs will leak `odu-mcp-test-*` directories under `/tmp`. (src/mcp/serveForTest.ts:54)
claude:
- F1 fixed: Agreed — real false-green bug. In waitForSettle's stream-end branch (src/mcp/tools.ts ~307-323) `passed` now requires `settled`: `const settled = last !== undefined && isDone(last); passed: settled && red.failed.length + red.errored.length === 0`. A half-observed run (coordinator crash/socket-close while nodes pending/running, no red yet) can no longer report passed:true. Added regression test 'never reports passed when the run does not settle green' (asserts passed:false, settled:false whether the stream ends or the wait times out).
- F2 fixed: Agreed. Replaced the serial 'await waitForSocket then await onExit' with a new awaitStartup() that races socket availability against child exit via an `exited` promise threaded into the poll loop (src/mcp/tools.ts). defaultWaitForSocket now stops the instant the child dies (one final dial for a detached coordinator, then gives up) instead of polling the full 60s, so a dirty-tree refusal / ENOENT tsx / bad justfile is reported immediately. The 'socket up always wins' tie is preserved (a clean run can fork a detached coordinator and let the launcher exit). The infinite-hang case is closed: on timeout with the child still alive (r.code===null) we SIGTERM it and return startupError() with a clear 'did not serve a socket within the startup window' message rather than awaiting exit forever.
- F3 fixed: Agreed — path traversal via untrusted token. Added durableLogPath(repoRoot, sha7, token) (src/mcp/tools.ts) that rejects empty/unknown platform or namepath, resolves the final path, and requires it to stay under resolve(repoRoot,'.ci',sha7) via a relative() check (rejects '', '..'-prefixed, and absolute-sibling). tail_log's file fallback now uses it and returns source:'missing' for any escaping id. Added regression test 'refuses a node id that escapes the per-SHA log dir' that plants a real .log outside .ci/<sha7> and confirms a ..-laden token does not read it.
- F4 fixed: Agreed — unbounded readFileSync. Replaced it with tailFile(path, MAX_LOG_CHARS) (src/mcp/tools.ts) that opens the fd, stats size, and reads only the last MAX_LOG_CHARS (64KB) bytes — matching the cap the live in-memory tail already enforces, so the file fallback can't block the server or blow up the MCP payload.
- F5 fixed: Agreed. Threaded extra.signal from the CallTool handler (src/mcp/server.ts) into a new WaitOptions.signal in waitForSettle (src/mcp/tools.ts): the caller signal aborts the same AbortController as the timeout (closing the dialed socket promptly), with listener cleanup in finally. Added a `cancelled` field to SettleVerdict and the abort branch distinguishes caller-cancel (cancelled:true, timed_out:false) from the timeout (timed_out:true). Added regression test 'returns cancelled when the caller aborts the wait'.
- F6 fixed: Agreed. resources/subscribe (src/mcp/server.ts) now rejects any URI that isn't NODES_URI or a well-formed odu://log/<node> before handing it to the pusher, so an unknown URI can't keep the pusher retrying/attached. Also hardened parseLogUri (src/mcp/resources.ts) to return null for an empty node or malformed percent-encoding (decodeURIComponent wrapped in try/catch) so a bad log URI can't throw downstream, and exported isValidLogUri. Added unit tests for empty/malformed URI rejection.
- F7 fixed: Agreed. serveForTest.ts close() now closes the listener AND rmSync(dir, {recursive, force}) the mkdtemp directory, so repeated runs don't leak odu-mcp-test-* under tmpdir. Verified: after the full suite, `ls -d /tmp/odu-mcp-test-*` count is 0.
Committed by the codex<->claude debate (round 1); not pushed or merged.
… and `odu status -o json`
Finding (lowy+hickey consensus): rowOf (src/mcp/tools.ts) and statusCommand's
json branch (src/cli/introspect.ts) build the same machine-readable node
snapshot twice and disagree — `odu status -o json` keyed the node id as `name`
while get_nodes keyed it `id`. The same snapshot wired through two faces with a
drifting vocabulary.
Plan: extract a shared `nodeRow(node)` helper in render.ts (next to summarize)
emitting the snake_cased {id, name, status, exit_code, duration_ms}. rowOf
spreads it and adds `red: STATUS_META[node.status].isRed`; statusCommand's json
branch maps state.order through it. Both faces now speak one `id`+`name`
vocabulary. Back-compat preserved: `odu status -o json` keeps `name`,
`status`, `exit_code`, `duration_ms` and gains `id` (additive — no consumers of
the status-json field names exist in the repo; the documented agent schema is
`--progress json`, which is untouched).
…, not literal status strings Finding (lowy+hickey consensus): redNodes hardcoded `status === "failed"` / `=== "errored"` instead of reading the receptacle STATUS_META[status].isRed (src/common/surface.ts). STATUS_META's docstring says adding a NodeStatus is a single edit there, enforced across consumers; rowOf and run.ts's verdict already gate on it. Only redNodes reached past it, so a future red NodeStatus would silently desync the agent verdict (wait_for_settle) from the TUI/run verdict. Plan: keep the failed[]/errored[] split (an MCP-specific projection), but gate redness on STATUS_META[status].isRed first, then bucket by concrete status — mirroring run.ts's verdict loop (bucket by concrete status, redness gated on STATUS_META). STATUS_META was already imported.
…), not a private isDone Finding (lowy+hickey consensus): isDone re-implemented render.ts's summarize().done (the pending/running scan). The settle predicate is the same receptacle the TUI uses — keep one. Plan: replace every isDone(state) call in waitForSettle with summarize(state).done (summarize imported from ../cli/render) and delete the now-unused isDone. codex's F1 false-green fix is preserved: the stream-end branch still computes `settled = last !== undefined && summarize(last).done` and `passed` still requires `settled`, so a half-observed run (stream ends mid-run) reports settled:false / passed:false rather than a false green.
Finding (code-police, low): the socket probe used a comma-operator side effect — `tryDialSocket(socketPath).then((d) => (d?.close(), d !== null))` — appearing twice, which obscures the probe-and-release intent. Plan: factor a `serving()` helper that awaits the dial, closes the connection, and returns whether it answered: `const d = await tryDialSocket(socketPath); d?.close(); return d !== null;`. The poll loop and the post-exit final dial both call it. Behaviour is unchanged; the intent (we only want to know it answers, not to hold it open) is now explicit.
Finding (code-police, low): the `version()` helper swallows any error reading
package.json with a bare `catch {}` and no rationale.
Plan: add a one-line why-comment — the version is cosmetic in the MCP handshake,
so a missing or unreadable package.json falls back to "0.0.0" rather than
failing server startup.
Codex ⇄ Claude debate — ✅ consensus2 rounds · reviewer effort xhigh · codex (reviewer) ⇄ Claude (author). Per-round findings + dispositions below; each round is committed on the branch. Round 1codex — approved: The branch is directionally coherent, but I found several correctness and robustness issues in the MCP tool surface. The most serious one can report a non-settled run as passed if the coordinator/socket disappears before a terminal frame is observed. I could not run the typecheck because Findings:
claude — Addressed all 7 of CODEX's findings on the odu MCP tool surface (src/mcp). All agreed-with — none disputed; one was already partly mitigated (timeout path) but the false-green stream-end path was a genuine bug. Typecheck (tsc --noEmit) clean and the full suite (61 tests, 7 files) green, including 4 new regression tests I added (F1 non-green-when-not-settled, F3 path-traversal rejection, F5 caller-cancel, F6 malformed-uri). Note for CODEX: pnpm isn't on PATH here either, but the repo's local binaries work —
commit: Round 2codex — approved: All seven previously raised findings have been addressed in the current tree. I verified the relevant MCP changes directly, confirmed Findings:
codex on the rebuttal: CLAUDE did not dispute any finding. I verified each claimed fix in the current files and accept all seven resolutions. claude — (no author turn this round) (no author turn — the debate ended this round) |
⚖️ Lowy ⇄ Hickey lens debate — ✅ consensus1 round · lowy (volatility-based decomposition) + hickey (structural simplicity), reviewed independently then debated to consensus. Reviewed a pinned snapshot read-only ( Agreed fixes
Applied by /be-reviewRe-validated against the post-codex tree, implemented, committed individually:
|
👮 Code-policeThree read-only passes over the diff — rule checklist · fact-check · elegance. The diff is high quality. Every load-bearing comment/claim fact-checked true against the code:
The security/robustness class (path traversal in the durable-log fallback, the |
Wire kolu to odu's new MCP server (juspay/odu#3) so a coding agent drives CI with structured calls: - npins: pin odu → 2997b5b (the reviewed agent-face-mcp head; the recorded revision is immutable and survives the merge). Re-pin to master with `npins update odu` after juspay/odu#3 merges. - apm: depend on juspay/odu#agent-face-mcp so odu's apm.yml MCP declaration deploys the odu-mcp `serve` launcher and registers the `odu` MCP server in .mcp.json / .codex/config.toml / opencode.json. (After the odu PR merges, drop the #ref back to `juspay/odu` and `just ai::apm-update juspay/odu`.) - nix run .#odu -- mcp validated through kolu's re-export: serves the 5 tools + the odu://nodes resource. - atlas: the MCP agent face is shipped (juspay/odu#3) and consumed here — note status/roadmap/ledger/D2 updated as-built. - changelog: Added entry.
The odu-mcp launcher defaulted to `nix run github:juspay/odu -- mcp` — an unpinned fetch of master that (a) bypasses the npins pin the consuming repo deliberately maintains and (b) fails outright until a release lands on master. A consuming repo pins odu via npins and re-exports it as `.#odu`; the launcher now runs that exact pinned output (override with ODU_FLAKE). Docs updated.
The agent face from the roadmap:
odu mcpserves odu's surface as an MCP stdio server, so a coding agent (Claude Code, Codex, opencode, Gemini CLI) drives CI with structured calls instead of scraping the terminal. This is the one place justci told us the batch shape was the obstacle — its MCP mode auto-ran every recipe because process-compose has no serve-only mode (juspay/justci#22); a runner that owns the DAG as idle state has that separation by construction.In-band like
status/logs/monitor: it dials.ci/odu.sockin the cwd and predetermines no host — which boxes run the lanes stays the coordinator's job (pool lease /hosts.json).Tools (stateless, dial-per-call)
runodu run(its own coordinator) and return once the socket is live.get_nodestail_logrerun_nodewait_for_settleThe design fact that makes request/response enough: an agent wants a snapshot + a blocking "done", not a byte stream. The loop is
run→wait_for_settle→ read the red node'stail_log→ fix →rerun_node.Resources (the live-push ceiling)
odu://nodes+odu://log/{node}as subscribable resources —resources/subscribe+notifications/resources/updatedmaps the surface's snapshot-then-delta one-to-one.wait_for_settleis the blocking-pull floor for hosts that don't wake the model on a notification.Design notes
Serverfor full subscribe/unsubscribe control + JSON-Schema tool inputs validated by odu's own zod (no SDK schema coupling).src/cli/introspect.ts); the only state — one live socket attachment driving the resource pushes — is isolated inresources.ts.detachdisposes the link rather than aborting (no oRPC cancel-send / socket-close race), and a generation token stops a torn-down stream from rescheduling. (@kolu/surface-mcpwas considered and rejected — Lowy+Hickey: it hides no hard volatility and thewait_for_settle"settled" predicate is odu-domain, so this stays consumer-local until a second surface with a different vocabulary proves the concept.)apm.ymldeclares the MCP server (the nix-chrome-devtools-mcp pattern); theodu-mcpskill ships theservelauncher (nix run,ODU_FLAKE-overridable).Verification
wait_for_settle, resource push).nix build .#odu .#odu-runnergreen; MCPinitialize/tools/list/resources/listsmoke-tested over stdio in the packaged closure.🤖 Generated with Claude Code