feat(sdk): agent runtime behind backend/harness ports by mmabrouk · Pull Request #4771 · Agenta-AI/agenta

mmabrouk · 2026-06-19T16:28:56Z

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is main.

feat(sdk): agent runtime behind backend/harness ports #4771: SDK agent runtime: ports, adapters, tools, messages protocol <- you are here
- feat(agent): agent workflow service and tool-resolution API #4772: Agent service + tool-resolution API
feat(agent): runner wire contract and tool execution #4773: Runner wire contract + tool execution
- feat(agent): runner engines, server, and tracing #4774: Runner engines, server, tracing
feat(frontend): agent config playground controls #4775: Playground agent config UI
chore(hosting): wire the agent runner sidecar into compose #4776: Hosting compose wiring
docs(agent): agent-workflows design and ground truth #4777: Docs: design + ground truth

Context

The agent runtime turns a stored agent definition into a live coding-agent turn: it picks an engine, shapes the config that engine wants, runs one prompt, and streams the reply back to a browser. This PR puts that whole runtime in the SDK under sdks/python/agenta/sdk/agents/, structured as ports and adapters. It targets main and is independent. It is a functional slice that shows the final code, so the service PR that composes these adapters stacks on top of it.

What this changes

The runtime now reads as three layers behind interfaces. Backend is the engine: it declares which harnesses it can drive and owns sandbox plus session lifecycle. Environment sits above a backend and owns the sandbox-per-session policy. Harness sits above an environment and maps a neutral config into one engine's shape.

Before, the per-harness knowledge lived in the TypeScript runner, and a caller spoke directly to a transport. Now a caller builds a SessionConfig, hands it to a Harness, and the harness produces the engine-shaped config that a Backend plumbs to the runner without business logic. PiHarness, ClaudeHarness, and AgentaHarness each do different work because the harnesses differ: Pi takes built-in tool names plus native specs and never gates tool use; Claude has no built-ins, delivers tools over MCP, and gates tool use behind a permission policy; Agenta is Pi plus forced skills and a preamble.

Two backend adapters drive real engines. InProcessPiBackend runs Pi in-process through the runner and supports pi and agenta. RivetBackend drives a harness over ACP and supports pi and claude on local or Daytona. LocalBackend is a stub that raises NotImplementedError.

The browser edge is the Vercel /messages adapter. It folds inbound UIMessage input into neutral messages, emits Vercel UI Message Stream parts, stamps x-ag-messages-format and x-ag-messages-version headers, resolves the session id, and routes /load-session through a SessionStore port whose only adapter today is NoopSessionStore. The normalizer threads session_id as a request-envelope field, so it survives the round trip as a correlation value.

Key architectural decision to review

The first decision is the ownership split. The SDK owns the runtime ports and the adapters, and the service only composes them (sdks/python/agenta/sdk/agents/interfaces.py). The tradeoff: a standalone SDK user can drive Pi with no Agenta service, but the service must inject its server-side concerns (gateway tool resolution, the secret vault) through the injected adapter seams rather than reaching into the runtime. Check that Backend.supported_harnesses stays the single source of truth and that Harness.__init__ rejects an unsupported pairing before any run starts.

The second decision is that session_id is a correlation primitive, not state (adapters/vercel/routing.py, middlewares/running/normalizer.py). The cold runtime still receives the full message history on every turn. resolve_session_id mints, echoes, or rejects the id against a bounded charset, and the id is stamped onto the stream and the envelope, but nothing reads it back as conversation state yet. SessionStore is a port-only seam: NoopSessionStore returns empty history and discards writes, so /load-session answers with nothing until a real adapter lands. Confirm this is a deliberate seam and not a dropped write path.

How to review this PR

Read interfaces.py first and fix the three-layer vocabulary in your head: Backend, Environment, Harness, plus the Sandbox, Session, and SessionStore ports. Then read dtos.py for the shapes that cross those ports, especially SessionConfig (the run bundle), AgentConfig.harness_options (the per-harness escape hatch), and the PiAgentConfig / ClaudeAgentConfig / AgentaAgentConfig split where wire_tools differs per engine. Then read adapters/harnesses.py, adapters/in_process.py, and adapters/rivet.py to see the mapping and the two real backends. Read adapters/vercel/routing.py last for the browser edge.

You can skip the mcp/ subpackage and the parsing helpers at the bottom of dtos.py on a first pass; they are mechanical. The regression most likely to break is the golden wire contract: a tool-free run's /run payload must stay byte-identical, so watch any change to wire_tools, wire_mcp, or request_to_wire against golden/run_request.pi.json.

Tests / notes

The suite covers the DTO shapes, the harness adapters and their backend-support validation, the /messages and /load-session routing, the tool resolver, and a transport round trip. The wire-contract test pins the runner payload against golden JSON. The NoopSessionStore path is verified to return empty and discard, which documents the not-yet-persisted behavior rather than hiding it.

…es protocol

vercel · 2026-06-19T16:29:02Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Jun 19, 2026 8:29pm

coderabbitai · 2026-06-19T16:29:19Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added agents runtime with support for multiple harness types (Pi, Claude, Agenta)
- Introduced backend adapters (Rivet, Local, InProcess) for agent execution
- Added Model Context Protocol (MCP) server configuration and resolution
- Implemented tool configuration, parsing, and resolution framework
- Added Vercel UI message streaming integration with session management
- Introduced session persistence and message history loading

Walkthrough

Introduces a Python agents runtime subsystem with DTOs, MCP/tools resolution, runtime ports, streaming, runner transport, backend and harness adapters, Vercel message/SSE routing, agent-specific SDK wiring, and tests plus golden fixtures for the new request/result paths.

Changes

Agents Subsystem

Layer / File(s)	Summary
Core DTOs and workflow schema `sdks/python/agenta/sdk/agents/dtos.py`, `sdks/python/agenta/sdk/models/workflows.py`, `sdks/python/agenta/sdk/utils/types.py`	Defines agent/runtime DTOs, harness selection, session config, workflow request/response fields, and the agent config catalog schema.
MCP models, parsing, resolver, and wire `sdks/python/agenta/sdk/agents/mcp/*`, `sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py`	Defines MCP server models, parsing helpers, secret provider protocol, resolver, wire serialization, and MCP error types.
Tools models, parsing, resolver, and compat `sdks/python/agenta/sdk/agents/tools/`, `sdks/python/oss/tests/pytest/unit/agents/tools/`	Defines canonical and resolved tool models, tool callbacks, resolver interfaces, legacy compatibility coercion, wire helpers, tool errors, and resolver tests.
Runtime ports and AgentRun streaming `sdks/python/agenta/sdk/agents/interfaces.py`, `sdks/python/agenta/sdk/agents/streaming.py`, `sdks/python/agenta/sdk/agents/errors.py`, `sdks/python/agenta/tests/agents/test_streaming.py`, `sdks/python/oss/tests/pytest/unit/agents/conftest.py`, `sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py`	Defines backend, sandbox, session, environment, and harness ports plus AgentRun streaming and runtime errors, with lifecycle and streaming tests.
Runner transport and /run wire contract `sdks/python/agenta/sdk/agents/utils/ts_runner.py`, `sdks/python/agenta/sdk/agents/utils/wire.py`, `sdks/python/agenta/sdk/agents/utils/__init__.py`, `sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py`, `sdks/python/oss/tests/pytest/unit/agents/golden/*`	Implements HTTP and subprocess delivery, request/result wire conversion, and regression tests with golden payloads.
Backend adapters and harness adapters `sdks/python/agenta/sdk/agents/adapters/*`, `sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py`, `sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py`, `sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py`	Implements Rivet, InProcess, and Local backends plus Pi, Claude, and Agenta harness adapters, with forced Agenta defaults and adapter tests.
Vercel UI messages, SSE, stream, and routes `sdks/python/agenta/sdk/agents/adapters/vercel/*`, `sdks/python/agenta/sdk/agents/ui_messages.py`, `sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py`	Converts Vercel UI messages to neutral messages and back, frames SSE output, maps AgentRun into Vercel stream parts, and registers FastAPI message routes.
SDK routing, normalizer, and agent builtin registration `sdks/python/agenta/sdk/decorators/routing.py`, `sdks/python/agenta/sdk/middlewares/running/normalizer.py`, `sdks/python/agenta/sdk/engines/running/*`, `sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py`, `sdks/python/oss/tests/pytest/utils/test_routing.py`, `sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py`	Wires agent-only routes, Vercel streaming, session_id normalization, and agent builtin workflow registration into routing and engine registries.
Public package exports `sdks/python/agenta/__init__.py`, `sdks/python/agenta/sdk/agents/__init__.py`, `sdks/python/agenta/sdk/agents/adapters/__init__.py`, `sdks/python/agenta/sdk/agents/utils/__init__.py`	Re-exports the agents runtime API from package entrypoints and public package `__all__` lists.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Vercel AI SDK
  participant MessagesEndpoint as POST /messages
  participant Harness as PiHarness/ClaudeHarness/AgentaHarness
  participant Backend as InProcessPiBackend/RivetBackend
  participant TSRunner as TypeScript Runner
  participant VercelStream as agent_run_to_vercel_parts

  Client->>MessagesEndpoint: {messages, session_id, stream:true}
  MessagesEndpoint->>MessagesEndpoint: resolve/mint session_id
  MessagesEndpoint->>MessagesEndpoint: vercel_ui_messages_to_messages
  MessagesEndpoint->>Harness: stream(session_config, messages)
  Harness->>Harness: _to_harness_config(session_config)
  Harness->>Backend: create_session(sandbox, harness_config)
  Backend->>TSRunner: deliver_subprocess_stream(payload)
  TSRunner-->>Backend: NDJSON records
  Backend-->>Harness: AgentRun
  Harness-->>MessagesEndpoint: WorkflowStreamingResponse(AgentRun)
  MessagesEndpoint->>VercelStream: agent_run_to_vercel_parts(AgentRun)
  VercelStream-->>Client: SSE data: {start} ... {text-delta} ... {finish}\ndata: [DONE]

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Agenta-AI/agenta#4443: Shares routing/middleware work around request context propagation into tracing and is adjacent to the new agent request envelope handling.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 32.18% which is insufficient. The required threshold is 60.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(sdk): agent runtime behind backend/harness ports' clearly and concisely summarizes the main change: introducing an agent runtime architecture structured around backend and harness ports.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining the three-layer architecture (Backend, Environment, Harness), key design decisions, and implementation details across multiple modules.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/agent-sdk-runtime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

mmabrouk · 2026-06-19T16:34:13Z

Reviewer guide: interesting code

A few pointers to the load-bearing decisions, so review time goes to the parts that matter.

sdk/agents/interfaces.py:140 and interfaces.py:248 — the backend/harness validation matrix: each backend declares supported_harnesses, and the Harness constructor rejects an environment whose backend cannot drive it, so a bad pairing fails at construction rather than mid-run.
sdk/agents/adapters/harnesses.py:83 — ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor; this is the clearest spot where the adapters do genuinely divergent work.
sdk/agents/interfaces.py:89 — SessionStore is a port-only seam with a NoopSessionStore default; the cold runtime still gets full history every turn, so nothing persists yet and the platform store attaches here later.
sdk/agents/adapters/vercel/routing.py:26 — session_id is validated against a bounded charset and minted when absent, then carried as an envelope field (not a header) and stamped onto the first Vercel start part's messageMetadata.
sdk/agents/tools/resolver.py — ToolResolver turns canonical ToolConfig into runner-ready ToolSpec through an injected secret provider and gateway resolver; the gateway resolver is None here and lands server-side in feat(agent): agent workflow service and tool-resolution API #4772, so only the offline executors resolve in the SDK.
sdk/agents/dtos.py:546 — SessionConfig exposes the resolved tools under two names (builtin_names/builtin_tools, tool_specs/custom_tools) via alias choices; the same coercion lives in ResolvedToolSet, and the back-compat names must keep working.
sdk/agents/adapters/local.py — LocalBackend raises on every method by design; it is the next backend's skeleton, present so the adapter layout and port shape are visible.

mmabrouk · 2026-06-19T16:34:24Z

+    harness_type: ClassVar[HarnessType]
+
+    def __init__(self, environment: Environment) -> None:
+        if not environment.backend.supports(self.harness_type):


This is the validation matrix gate: a Harness can only wrap an Environment whose backend lists its harness_type in supported_harnesses. ClaudeHarness over InProcessPiBackend, or AgentaHarness over RivetBackend, raises here at construction.

mmabrouk · 2026-06-19T16:34:25Z

+        # Claude has no Pi built-in tools; drop them rather than ship a name Claude cannot
+        # honor. Tools go over MCP, and Claude gates tool use, so the permission policy is
+        # carried through.
+        if config.builtin_names:


Claude has no Pi built-in tools, so they are dropped with a warning rather than shipped as a name Claude cannot honor. This is the cleanest example of an adapter sending only what its harness understands.

mmabrouk · 2026-06-19T16:34:26Z

+from .messages import message_to_vercel_ui_message, vercel_ui_messages_to_messages
+
+# An opaque, project-scoped session id (RFC §4.1): bounded length, restricted charset.
+_SESSION_ID_RE = re.compile(r"^[A-Za-z0-9._:-]{1,128}$")


session_id is a project-scoped opaque token validated against a bounded charset/length and minted when absent, carried as an envelope field rather than a header. Worth confirming the charset is wide enough for the platform's id format.

mmabrouk · 2026-06-19T16:34:27Z

                )
                consumed.add(name)

+            elif name == "session_id":


This maps the request envelope's session_id into a handler parameter of the same name, which is how the /messages session threads into the agent handler without living in request.data.inputs.

coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (2)

sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py (1)

86-96: ⚡ Quick win

Add regression coverage for agent shape with missing tools

Please add a test where {"agent": {"instructions": "I"}} + defaults verifies tools still inherit from defaults. This would have caught the current fallback bug.

sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py (1)

36-46: ⚡ Quick win

Add a regression test for string needs_approval values.

Given legacy payloads may carry "false"/"true" as strings, add a case asserting "false" does not become True after coercion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3746d41f-c884-49c4-8834-df9bd68dfb03

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and b9e62f9.

📒 Files selected for processing (68)

sdks/python/agenta/__init__.py
sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/__init__.py
sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
sdks/python/agenta/sdk/agents/adapters/harnesses.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/local.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/agenta/sdk/agents/interfaces.py
sdks/python/agenta/sdk/agents/mcp/__init__.py
sdks/python/agenta/sdk/agents/mcp/errors.py
sdks/python/agenta/sdk/agents/mcp/interfaces.py
sdks/python/agenta/sdk/agents/mcp/models.py
sdks/python/agenta/sdk/agents/mcp/parsing.py
sdks/python/agenta/sdk/agents/mcp/resolver.py
sdks/python/agenta/sdk/agents/mcp/wire.py
sdks/python/agenta/sdk/agents/streaming.py
sdks/python/agenta/sdk/agents/tools/__init__.py
sdks/python/agenta/sdk/agents/tools/compat.py
sdks/python/agenta/sdk/agents/tools/errors.py
sdks/python/agenta/sdk/agents/tools/interfaces.py
sdks/python/agenta/sdk/agents/tools/models.py
sdks/python/agenta/sdk/agents/tools/parsing.py
sdks/python/agenta/sdk/agents/tools/resolver.py
sdks/python/agenta/sdk/agents/tools/wire.py
sdks/python/agenta/sdk/agents/ui_messages.py
sdks/python/agenta/sdk/agents/utils/__init__.py
sdks/python/agenta/sdk/agents/utils/ts_runner.py
sdks/python/agenta/sdk/agents/utils/wire.py
sdks/python/agenta/sdk/decorators/routing.py
sdks/python/agenta/sdk/engines/running/interfaces.py
sdks/python/agenta/sdk/engines/running/utils.py
sdks/python/agenta/sdk/middlewares/running/normalizer.py
sdks/python/agenta/sdk/models/workflows.py
sdks/python/agenta/sdk/utils/types.py
sdks/python/agenta/tests/agents/test_streaming.py
sdks/python/oss/tests/pytest/integration/agents/__init__.py
sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
sdks/python/oss/tests/pytest/unit/agents/__init__.py
sdks/python/oss/tests/pytest/unit/agents/conftest.py
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
sdks/python/oss/tests/pytest/utils/test_routing.py

coderabbitai · 2026-06-19T16:45:30Z

+    AgentaHarness,
+    ClaudeHarness,
+    InProcessPiBackend,
+    LocalBackend,


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exporting LocalBackend as stable public API until it is implemented.

LocalBackend is currently a guaranteed runtime failure path (create_sandbox/create_session raise NotImplementedError in sdks/python/agenta/sdk/agents/adapters/local.py). Re-exporting it here makes that incomplete adapter look production-ready.

Prefer removing it from public exports for now, or clearly gating it as experimental/internal.

Also applies to: 178-178

coderabbitai · 2026-06-19T16:45:30Z

+    def __init__(
+        self,
+        backend: "InProcessPiBackend",
+        config: HarnessAgentConfig,
+        *,
+        secrets: Optional[Mapping[str, str]],
+        trace: Optional[TraceContext],
+        session_id: Optional[str],
+    ) -> None:
+        self._backend = backend
+        self._config = config
+        self._secrets = dict(secrets or {})
+        self._trace = trace
+        self._session_id = session_id
+
+    @property
+    def id(self) -> Optional[str]:
+        return self._session_id
+
+    def _wire_payload(self, messages: Sequence[Message]) -> Dict[str, Any]:
+        """The ``/run`` request JSON for this turn (shared by ``prompt`` and ``stream``)."""
+        return request_to_wire(
+            engine=InProcessPiBackend._ENGINE,
+            harness=HarnessType.PI,
+            sandbox="local",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the requested harness type in the wire payload.

create_session accepts harness, but the session drops it and _wire_payload always sends HarnessType.PI (Line 76). For Agenta runs, this serializes the wrong harness across the backend boundary.

Suggested fix

class InProcessPiSession(Session): @@ def __init__( self, backend: "InProcessPiBackend", config: HarnessAgentConfig, *, + harness: HarnessType, secrets: Optional[Mapping[str, str]], trace: Optional[TraceContext], session_id: Optional[str], ) -> None: self._backend = backend self._config = config + self._harness = harness self._secrets = dict(secrets or {}) self._trace = trace self._session_id = session_id @@ return request_to_wire( engine=InProcessPiBackend._ENGINE, - harness=HarnessType.PI, + harness=self._harness, sandbox="local", config=self._config, messages=messages, secrets=self._secrets, trace=self._trace, session_id=self._session_id, ) @@ async def create_session( @@ ) -> InProcessPiSession: return InProcessPiSession( self, config, + harness=harness, secrets=secrets, trace=trace, session_id=session_id, )

Also applies to: 137-153

coderabbitai · 2026-06-19T16:45:30Z

+    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+
+    async def create_sandbox(self) -> Sandbox:
+        raise NotImplementedError(
+            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "Phase 4: Claude via claude-agent-sdk)."
+        )
+
+    async def create_session(
+        self,
+        sandbox: Sandbox,
+        config: HarnessAgentConfig,
+        *,
+        harness: HarnessType,
+        secrets: Optional[Mapping[str, str]] = None,
+        trace: Optional[TraceContext] = None,
+        session_id: Optional[str] = None,
+    ) -> Session:
+        raise NotImplementedError(
+            "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
+            "Phase 4: Claude via claude-agent-sdk)."
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid advertising harness support before implementation exists.

LocalBackend declares PI/CLAUDE in supported_harnesses (Line 27), but both creation methods always raise NotImplementedError (Lines 30-48). This defers failure to runtime instead of failing fast on compatibility checks.

Suggested fail-fast adjustment

- supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE}) + supported_harnesses = frozenset()

coderabbitai · 2026-06-19T16:45:30Z

+    async def load_session_endpoint(req: Request, request: LoadSessionRequest):
+        messages = await store.load(request.session_id)
+        response = LoadSessionResponse(
+            session_id=request.session_id,
+            messages=[
+                message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
+                for idx, message in enumerate(messages, start=1)
+            ],
+        )
+        return set_vercel_message_protocol_headers(
+            JSONResponse(content=response.model_dump(mode="json"))
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate session_id in /load-session before hitting SessionStore.

Line 159 forwards raw request.session_id to store.load(...) without the same charset/length gate used by /messages (Lines 84-93). This creates an inconsistent trust boundary and can expose storage adapters to unsafe identifiers.

Suggested patch

async def load_session_endpoint(req: Request, request: LoadSessionRequest): - messages = await store.load(request.session_id) + session_id = resolve_session_id(request.session_id) + if session_id is None: + return set_vercel_message_protocol_headers( + JSONResponse( + status_code=400, + content={ + "detail": "session_id violates the allowed charset/length" + }, + ) + ) + messages = await store.load(session_id) response = LoadSessionResponse( - session_id=request.session_id, + session_id=session_id, messages=[ message_to_vercel_ui_message(message, message_id=f"msg-{idx}") for idx, message in enumerate(messages, start=1) ], )

coderabbitai · 2026-06-19T16:45:30Z

+
+# Permission policy for harness tool use in a headless run. ``auto`` approves (tools are
+# backend-resolved and trusted, no human to prompt); ``deny`` rejects.
+PermissionPolicy = str  # "auto" | "deny"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate permission_policy instead of accepting arbitrary strings

PermissionPolicy is documented as "auto" | "deny" but currently typed as str, so invalid values flow through until downstream failure. Enforce this at DTO boundaries.

Proposed fix

-from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union +from typing import Any, Callable, ClassVar, Dict, List, Literal, Optional, Tuple, Union @@ -PermissionPolicy = str # "auto" | "deny" +PermissionPolicy = Literal["auto", "deny"]

Also applies to: 363-379, 502-503, 559-559

coderabbitai · 2026-06-19T16:45:30Z

+    if "needs_approval" in source:
+        result["needs_approval"] = bool(source["needs_approval"])
+    if isinstance(source.get("render"), dict):


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

needs_approval coercion is semantically wrong for string inputs.

Line 54 uses bool(source["needs_approval"]), so values like "false" become True. That flips approval gating behavior for legacy payloads.

Proposed fix

def _copy_tool_metadata( source: dict[str, Any], target: dict[str, Any] ) -> dict[str, Any]: result = dict(target) if "needs_approval" in source: - result["needs_approval"] = bool(source["needs_approval"]) + result["needs_approval"] = source["needs_approval"] if isinstance(source.get("render"), dict): result["render"] = dict(source["render"]) return result

coderabbitai · 2026-06-19T16:45:31Z

+        if on_error == "raise":
+            raise error
+        diagnostics.append(ToolConfigDiagnostic(index=index, message=str(error)))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate on_error at runtime to prevent silent fallback behavior.

If callers pass an invalid value (e.g., typo), current logic silently behaves like "collect". Fail fast to avoid hidden parse-policy changes.

Proposed fix

def coerce_tool_configs( values: Optional[Sequence[Any]], *, on_error: Literal["raise", "collect"] = "raise", ) -> ToolConfigParseResult: """Convert legacy values, either raising or returning structured diagnostics.""" + if on_error not in {"raise", "collect"}: + raise ValueError("on_error must be 'raise' or 'collect'") + tool_configs: list[ToolConfig] = [] diagnostics: list[ToolConfigDiagnostic] = []

coderabbitai · 2026-06-19T16:45:31Z

+    if response.status_code >= 500:
+        raise RuntimeError(
+            f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
+        )
+    return response.json()


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle all non-2xx HTTP statuses as transport failures.

Only 5xx is handled today; 4xx responses fall through and may surface as opaque JSON parse errors instead of clear runner failures.

Proposed fix

- if response.status_code >= 500: + if response.status_code >= 400: raise RuntimeError( f"Agent runner HTTP {response.status_code}: {response.text[:1000]}" ) @@ - if response.status_code >= 500: + if response.status_code >= 400: body = await response.aread() raise RuntimeError( f"Agent runner HTTP {response.status_code}: {body[:1000]!r}" )

Also applies to: 108-113

coderabbitai · 2026-06-19T16:45:31Z

+            async for line in response.aiter_lines():
+                line = line.strip()
+                if line:
+                    yield json.loads(line)
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce a terminal stream result (or raise a transport error).

Both streaming transports can end cleanly when the runner disconnects/exits early, which leaves downstream AgentRun without a terminal result and can hide backend failures.

Proposed fix

async def deliver_http_stream( @@ - async with httpx.AsyncClient(timeout=timeout) as client: + saw_result = False + async with httpx.AsyncClient(timeout=timeout) as client: async with client.stream( "POST", url, json=payload, headers=headers ) as response: @@ async for line in response.aiter_lines(): line = line.strip() if line: - yield json.loads(line) + record = json.loads(line) + if record.get("kind") == "result": + saw_result = True + yield record + if not saw_result: + raise RuntimeError( + "Agent runner stream ended without a terminal result record" + ) @@ async def deliver_subprocess_stream( @@ - try: + saw_result = False + try: while True: @@ line = raw.decode("utf-8", "replace").strip() if line: - yield json.loads(line) + record = json.loads(line) + if record.get("kind") == "result": + saw_result = True + yield record await proc.wait() + err = (await proc.stderr.read()).decode("utf-8", "replace") + if proc.returncode not in (0, None): + raise RuntimeError( + f"Agent runner stream failed. exit={proc.returncode} stderr={err[-2000:]}" + ) + if not saw_result: + raise RuntimeError( + f"Agent runner stream ended without terminal result. stderr={err[-2000:]}" + ) finally: if proc.returncode is None: proc.kill() await proc.wait()

Also applies to: 147-160

coderabbitai · 2026-06-19T16:45:31Z

+    text = res.text
+    assert '"sessionId": "sess_abc"' in text  # stamped onto the start part
+    assert '"type": "text-delta"' in text
+    assert "data: [DONE]" in text
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the SSE session-id check structure-aware instead of whitespace-dependent.

Line 196 matches a literal JSON substring ('"sessionId": "sess_abc"'), which can fail on harmless serializer formatting changes.

Suggested test hardening

text = res.text - assert '"sessionId": "sess_abc"' in text # stamped onto the start part + payloads = [ + json.loads(line.removeprefix("data: ")) + for line in text.splitlines() + if line.startswith("data: ") and line != "data: [DONE]" + ] + start = next(p for p in payloads if p.get("type") == "start") + assert start["messageMetadata"]["sessionId"] == "sess_abc" assert '"type": "text-delta"' in text assert "data: [DONE]" in text

mmabrouk · 2026-06-19T17:51:27Z

Reviewer guide: interesting code

A few spots worth landing on first:

sdks/python/agenta/sdk/agents/interfaces.py:140 — Backend.supported_harnesses is the single source of truth for what an engine can drive; Harness.__init__ validates against it before any run.
sdks/python/agenta/sdk/agents/interfaces.py:111 — NoopSessionStore returns empty history and discards writes, which is the port-only seam behind /load-session until a real store lands.
sdks/python/agenta/sdk/agents/dtos.py:524 — AgentaAgentConfig extends PiAgentConfig and only adds forced skills, which is the cleanest read on "Agenta is Pi with an opinion".
sdks/python/agenta/sdk/agents/adapters/in_process.py:118 — InProcessPiBackend is the reference backend; note it is deliberately not a subclass of RivetBackend even though they share wire helpers.
sdks/python/agenta/sdk/agents/adapters/harnesses.py:85 — ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor.
sdks/python/agenta/sdk/agents/adapters/vercel/routing.py:43 — resolve_session_id mints, echoes, or rejects the session id against a bounded charset; this is where session_id enters the run as a correlation value.

mmabrouk · 2026-06-19T17:51:44Z

+    """
+
+    #: The single source of truth for what this engine can run.
+    supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset()


This class var is the one place an engine declares its supported harnesses. The split below keeps backends as pure plumbing: they never branch on a harness name, they only check membership here.

mmabrouk · 2026-06-19T17:51:45Z

+
+    def __init__(self, environment: Environment) -> None:
+        if not environment.backend.supports(self.harness_type):
+            raise UnsupportedHarnessError(self.harness_type, environment.backend)


Validation happens at harness construction, before any sandbox or session exists, so an unsupported backend/harness pairing fails fast rather than mid-run.

mmabrouk · 2026-06-19T17:51:46Z

+        # carried through.
+        if config.builtin_names:
+            log.warning(
+                "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",


Worth confirming a warning is the right level here. A config that names Pi built-ins but runs on Claude silently loses those tools; a stored agent could behave differently across harnesses without an obvious signal.

mmabrouk · 2026-06-19T17:51:47Z

+    return response
+
+
+def resolve_session_id(session_id: Optional[str]) -> Optional[str]:


This is the only gate on the session id. Returning None on an invalid id drives the 400 in the endpoint; a minted id uses sess_ + uuid4 hex, which stays inside the allowed charset.

mmabrouk

Codex subagent review for #4771

Findings:

Blocking: sdks/python/agenta/sdk/agents/adapters/rivet.py:36 and sdks/python/agenta/sdk/agents/adapters/in_process.py:36 make the default runner-backed path point at pnpm exec tsx src/cli.ts, but this PR does not add services/agent/src/cli.ts or the runner package. The public SDK example also uses RivetBackend() with no url, command, or cwd (sdks/python/agenta/sdk/agents/__init__.py:19), while the integration test only proves transport behavior by injecting a fake Python runner (sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py:81). Merged alone, and for #4772 stacked on it, the advertised default SDK runtime fails before any harness starts unless later runner assets from #4773/#4778 are present and the process cwd happens to be right. Please either require an explicit url/command until the runner lands, stack/retarget this runtime on the runner PR, or include the runnable runner assets plus an end-to-end test that exercises the default path.
sdks/python/agenta/sdk/agents/dtos.py:680 drops default tools whenever a dedicated agent dict is present but omits tools. The from_params docstring says unset fields fall back to defaults, and the MCP/harness-option paths do that, but this branch returns None; the constructor then passes tools=_as_list(None) and silently clears defaults.tools. A partial override such as { "agent": { "model": "..." } } will run tool-free. Please fall back to defaults.tools when the key is absent and add a partial-agent test.

Stack note: #4771 does contain the Python utils/wire.py serializer and golden fixtures. #4773 still advertises independence from main, but its protocol docs point at those SDK files and one advertised test imports src/engines/pi.ts, which only lands in the later runner-engine PR. Please align the stack-nav/review map so reviewers know which PR supplies the wire fixtures and runner assets.

I did not run tests locally; this review used the GitHub patch/head files.

coderabbitai

Actionable comments posted: 1

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5388d34f-2c5e-4260-b4e6-13176aece5f9

📥 Commits

Reviewing files that changed from the base of the PR and between b9e62f9 and 741fc73.

📒 Files selected for processing (9)

sdks/python/agenta/sdk/agents/__init__.py
sdks/python/agenta/sdk/agents/adapters/_runner_config.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/dtos.py
sdks/python/agenta/sdk/agents/errors.py
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py

🚧 Files skipped from review as they are similar to previous changes (6)

sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
sdks/python/agenta/sdk/agents/init.py
sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
sdks/python/agenta/sdk/agents/adapters/in_process.py
sdks/python/agenta/sdk/agents/adapters/rivet.py
sdks/python/agenta/sdk/agents/dtos.py

coderabbitai · 2026-06-19T20:33:41Z

+    if url:
+        return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
+    if command is not None:
+        return list(command)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty command at config time.

At Line 22 and Line 24, command=[] is accepted and propagated as _command, which creates an unusable subprocess transport and fails later at runtime. Validate non-empty command in resolve_runner_command so misconfiguration fails fast with AgentRunnerConfigurationError.

Suggested fix

def resolve_runner_command( @@ ) -> List[str]: + def _validated_command(raw: Sequence[str]) -> List[str]: + cmd = list(raw) + if not cmd: + raise AgentRunnerConfigurationError( + f"{backend_name} received an empty command. " + "Pass a non-empty command, pass url for an HTTP runner, " + f"or set cwd to a runner wrapper containing {RUNNER_CLI_PATH.as_posix()}." + ) + return cmd + if url: - return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND) + return _validated_command(command) if command is not None else list(DEFAULT_RUNNER_COMMAND) if command is not None: - return list(command) + return _validated_command(command)

feat(sdk): agent runtime ports, adapters, tool resolution, and messag…

b9e62f9

…es protocol

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 19, 2026

dosubot Bot added Backend feature python Pull requests that update Python code SDK labels Jun 19, 2026

vercel Bot deployed to Preview June 19, 2026 16:29 View deployment

mmabrouk mentioned this pull request Jun 19, 2026

feat(agent): agent workflow service and tool-resolution API #4772

Open

mmabrouk commented Jun 19, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

mmabrouk commented Jun 19, 2026

View reviewed changes

This was referenced Jun 19, 2026

docs(agent): agent-workflows design and ground truth #4777

Open

feat(agent): runner engines, server, and tracing #4774

Open

mmabrouk commented Jun 19, 2026

View reviewed changes

mmabrouk mentioned this pull request Jun 19, 2026

feat(frontend): agent chat streaming slice + RAG example demo #4780

Open

fix(sdk): validate agent runner configuration

741fc73

vercel Bot deployed to Preview June 19, 2026 20:29 View deployment

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

		return response


		def resolve_session_id(session_id: Optional[str]) -> Optional[str]:

Conversation

mmabrouk commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Agent-workflows: functional PR set

Context

What this changes

Key architectural decision to review

How to review this PR

Tests / notes

Uh oh!

vercel Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

Uh oh!

mmabrouk commented Jun 19, 2026

Reviewer guide: interesting code

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

mmabrouk commented Jun 19, 2026

Reviewer guide: interesting code

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mmabrouk left a comment

Choose a reason for hiding this comment

mmabrouk commented Jun 19, 2026 •

edited

Loading

vercel Bot commented Jun 19, 2026 •

edited

Loading

coderabbitai Bot commented Jun 19, 2026 •

edited

Loading