Skip to content

feat(sdk): agent runtime behind backend/harness ports#4771

Open
mmabrouk wants to merge 2 commits into
mainfrom
feat/agent-sdk-runtime
Open

feat(sdk): agent runtime behind backend/harness ports#4771
mmabrouk wants to merge 2 commits into
mainfrom
feat/agent-sdk-runtime

Conversation

@mmabrouk

@mmabrouk mmabrouk commented Jun 19, 2026

Copy link
Copy Markdown
Member

Agent-workflows: functional PR set

Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is main.

Context

The agent runtime turns a stored agent definition into a live coding-agent turn: it picks an engine, shapes the config that engine wants, runs one prompt, and streams the reply back to a browser. This PR puts that whole runtime in the SDK under sdks/python/agenta/sdk/agents/, structured as ports and adapters. It targets main and is independent. It is a functional slice that shows the final code, so the service PR that composes these adapters stacks on top of it.

What this changes

The runtime now reads as three layers behind interfaces. Backend is the engine: it declares which harnesses it can drive and owns sandbox plus session lifecycle. Environment sits above a backend and owns the sandbox-per-session policy. Harness sits above an environment and maps a neutral config into one engine's shape.

Before, the per-harness knowledge lived in the TypeScript runner, and a caller spoke directly to a transport. Now a caller builds a SessionConfig, hands it to a Harness, and the harness produces the engine-shaped config that a Backend plumbs to the runner without business logic. PiHarness, ClaudeHarness, and AgentaHarness each do different work because the harnesses differ: Pi takes built-in tool names plus native specs and never gates tool use; Claude has no built-ins, delivers tools over MCP, and gates tool use behind a permission policy; Agenta is Pi plus forced skills and a preamble.

Two backend adapters drive real engines. InProcessPiBackend runs Pi in-process through the runner and supports pi and agenta. RivetBackend drives a harness over ACP and supports pi and claude on local or Daytona. LocalBackend is a stub that raises NotImplementedError.

The browser edge is the Vercel /messages adapter. It folds inbound UIMessage input into neutral messages, emits Vercel UI Message Stream parts, stamps x-ag-messages-format and x-ag-messages-version headers, resolves the session id, and routes /load-session through a SessionStore port whose only adapter today is NoopSessionStore. The normalizer threads session_id as a request-envelope field, so it survives the round trip as a correlation value.

Key architectural decision to review

The first decision is the ownership split. The SDK owns the runtime ports and the adapters, and the service only composes them (sdks/python/agenta/sdk/agents/interfaces.py). The tradeoff: a standalone SDK user can drive Pi with no Agenta service, but the service must inject its server-side concerns (gateway tool resolution, the secret vault) through the injected adapter seams rather than reaching into the runtime. Check that Backend.supported_harnesses stays the single source of truth and that Harness.__init__ rejects an unsupported pairing before any run starts.

The second decision is that session_id is a correlation primitive, not state (adapters/vercel/routing.py, middlewares/running/normalizer.py). The cold runtime still receives the full message history on every turn. resolve_session_id mints, echoes, or rejects the id against a bounded charset, and the id is stamped onto the stream and the envelope, but nothing reads it back as conversation state yet. SessionStore is a port-only seam: NoopSessionStore returns empty history and discards writes, so /load-session answers with nothing until a real adapter lands. Confirm this is a deliberate seam and not a dropped write path.

How to review this PR

Read interfaces.py first and fix the three-layer vocabulary in your head: Backend, Environment, Harness, plus the Sandbox, Session, and SessionStore ports. Then read dtos.py for the shapes that cross those ports, especially SessionConfig (the run bundle), AgentConfig.harness_options (the per-harness escape hatch), and the PiAgentConfig / ClaudeAgentConfig / AgentaAgentConfig split where wire_tools differs per engine. Then read adapters/harnesses.py, adapters/in_process.py, and adapters/rivet.py to see the mapping and the two real backends. Read adapters/vercel/routing.py last for the browser edge.

You can skip the mcp/ subpackage and the parsing helpers at the bottom of dtos.py on a first pass; they are mechanical. The regression most likely to break is the golden wire contract: a tool-free run's /run payload must stay byte-identical, so watch any change to wire_tools, wire_mcp, or request_to_wire against golden/run_request.pi.json.

Tests / notes

The suite covers the DTO shapes, the harness adapters and their backend-support validation, the /messages and /load-session routing, the tool resolver, and a transport round trip. The wire-contract test pins the runner payload against golden JSON. The NoopSessionStore path is verified to return empty and discard, which documents the not-yet-persisted behavior rather than hiding it.

@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 19, 2026
@vercel

vercel Bot commented Jun 19, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment Jun 19, 2026 8:29pm

Request Review

@dosubot dosubot Bot added Backend feature python Pull requests that update Python code SDK labels Jun 19, 2026
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Added agents runtime with support for multiple harness types (Pi, Claude, Agenta)
    • Introduced backend adapters (Rivet, Local, InProcess) for agent execution
    • Added Model Context Protocol (MCP) server configuration and resolution
    • Implemented tool configuration, parsing, and resolution framework
    • Added Vercel UI message streaming integration with session management
    • Introduced session persistence and message history loading

Walkthrough

Introduces a Python agents runtime subsystem with DTOs, MCP/tools resolution, runtime ports, streaming, runner transport, backend and harness adapters, Vercel message/SSE routing, agent-specific SDK wiring, and tests plus golden fixtures for the new request/result paths.

Changes

Agents Subsystem

Layer / File(s) Summary
Core DTOs and workflow schema
sdks/python/agenta/sdk/agents/dtos.py, sdks/python/agenta/sdk/models/workflows.py, sdks/python/agenta/sdk/utils/types.py
Defines agent/runtime DTOs, harness selection, session config, workflow request/response fields, and the agent config catalog schema.
MCP models, parsing, resolver, and wire
sdks/python/agenta/sdk/agents/mcp/*, sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
Defines MCP server models, parsing helpers, secret provider protocol, resolver, wire serialization, and MCP error types.
Tools models, parsing, resolver, and compat
sdks/python/agenta/sdk/agents/tools/*, sdks/python/oss/tests/pytest/unit/agents/tools/*
Defines canonical and resolved tool models, tool callbacks, resolver interfaces, legacy compatibility coercion, wire helpers, tool errors, and resolver tests.
Runtime ports and AgentRun streaming
sdks/python/agenta/sdk/agents/interfaces.py, sdks/python/agenta/sdk/agents/streaming.py, sdks/python/agenta/sdk/agents/errors.py, sdks/python/agenta/tests/agents/test_streaming.py, sdks/python/oss/tests/pytest/unit/agents/conftest.py, sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
Defines backend, sandbox, session, environment, and harness ports plus AgentRun streaming and runtime errors, with lifecycle and streaming tests.
Runner transport and /run wire contract
sdks/python/agenta/sdk/agents/utils/ts_runner.py, sdks/python/agenta/sdk/agents/utils/wire.py, sdks/python/agenta/sdk/agents/utils/__init__.py, sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py, sdks/python/oss/tests/pytest/unit/agents/golden/*
Implements HTTP and subprocess delivery, request/result wire conversion, and regression tests with golden payloads.
Backend adapters and harness adapters
sdks/python/agenta/sdk/agents/adapters/*, sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py, sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py, sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
Implements Rivet, InProcess, and Local backends plus Pi, Claude, and Agenta harness adapters, with forced Agenta defaults and adapter tests.
Vercel UI messages, SSE, stream, and routes
sdks/python/agenta/sdk/agents/adapters/vercel/*, sdks/python/agenta/sdk/agents/ui_messages.py, sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
Converts Vercel UI messages to neutral messages and back, frames SSE output, maps AgentRun into Vercel stream parts, and registers FastAPI message routes.
SDK routing, normalizer, and agent builtin registration
sdks/python/agenta/sdk/decorators/routing.py, sdks/python/agenta/sdk/middlewares/running/normalizer.py, sdks/python/agenta/sdk/engines/running/*, sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py, sdks/python/oss/tests/pytest/utils/test_routing.py, sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
Wires agent-only routes, Vercel streaming, session_id normalization, and agent builtin workflow registration into routing and engine registries.
Public package exports
sdks/python/agenta/__init__.py, sdks/python/agenta/sdk/agents/__init__.py, sdks/python/agenta/sdk/agents/adapters/__init__.py, sdks/python/agenta/sdk/agents/utils/__init__.py
Re-exports the agents runtime API from package entrypoints and public package __all__ lists.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Vercel AI SDK
  participant MessagesEndpoint as POST /messages
  participant Harness as PiHarness/ClaudeHarness/AgentaHarness
  participant Backend as InProcessPiBackend/RivetBackend
  participant TSRunner as TypeScript Runner
  participant VercelStream as agent_run_to_vercel_parts

  Client->>MessagesEndpoint: {messages, session_id, stream:true}
  MessagesEndpoint->>MessagesEndpoint: resolve/mint session_id
  MessagesEndpoint->>MessagesEndpoint: vercel_ui_messages_to_messages
  MessagesEndpoint->>Harness: stream(session_config, messages)
  Harness->>Harness: _to_harness_config(session_config)
  Harness->>Backend: create_session(sandbox, harness_config)
  Backend->>TSRunner: deliver_subprocess_stream(payload)
  TSRunner-->>Backend: NDJSON records
  Backend-->>Harness: AgentRun
  Harness-->>MessagesEndpoint: WorkflowStreamingResponse(AgentRun)
  MessagesEndpoint->>VercelStream: agent_run_to_vercel_parts(AgentRun)
  VercelStream-->>Client: SSE data: {start} ... {text-delta} ... {finish}\ndata: [DONE]
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • Agenta-AI/agenta#4443: Shares routing/middleware work around request context propagation into tracing and is adjacent to the new agent request envelope handling.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.18% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(sdk): agent runtime behind backend/harness ports' clearly and concisely summarizes the main change: introducing an agent runtime architecture structured around backend and harness ports.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the three-layer architecture (Backend, Environment, Harness), key design decisions, and implementation details across multiple modules.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-sdk-runtime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

A few pointers to the load-bearing decisions, so review time goes to the parts that matter.

  • sdk/agents/interfaces.py:140 and interfaces.py:248 — the backend/harness validation matrix: each backend declares supported_harnesses, and the Harness constructor rejects an environment whose backend cannot drive it, so a bad pairing fails at construction rather than mid-run.
  • sdk/agents/adapters/harnesses.py:83ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor; this is the clearest spot where the adapters do genuinely divergent work.
  • sdk/agents/interfaces.py:89SessionStore is a port-only seam with a NoopSessionStore default; the cold runtime still gets full history every turn, so nothing persists yet and the platform store attaches here later.
  • sdk/agents/adapters/vercel/routing.py:26session_id is validated against a bounded charset and minted when absent, then carried as an envelope field (not a header) and stamped onto the first Vercel start part's messageMetadata.
  • sdk/agents/tools/resolver.pyToolResolver turns canonical ToolConfig into runner-ready ToolSpec through an injected secret provider and gateway resolver; the gateway resolver is None here and lands server-side in feat(agent): agent workflow service and tool-resolution API #4772, so only the offline executors resolve in the SDK.
  • sdk/agents/dtos.py:546SessionConfig exposes the resolved tools under two names (builtin_names/builtin_tools, tool_specs/custom_tools) via alias choices; the same coercion lives in ResolvedToolSet, and the back-compat names must keep working.
  • sdk/agents/adapters/local.pyLocalBackend raises on every method by design; it is the next backend's skeleton, present so the adapter layout and port shape are visible.

harness_type: ClassVar[HarnessType]

def __init__(self, environment: Environment) -> None:
if not environment.backend.supports(self.harness_type):

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the validation matrix gate: a Harness can only wrap an Environment whose backend lists its harness_type in supported_harnesses. ClaudeHarness over InProcessPiBackend, or AgentaHarness over RivetBackend, raises here at construction.

# Claude has no Pi built-in tools; drop them rather than ship a name Claude cannot
# honor. Tools go over MCP, and Claude gates tool use, so the permission policy is
# carried through.
if config.builtin_names:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude has no Pi built-in tools, so they are dropped with a warning rather than shipped as a name Claude cannot honor. This is the cleanest example of an adapter sending only what its harness understands.

from .messages import message_to_vercel_ui_message, vercel_ui_messages_to_messages

# An opaque, project-scoped session id (RFC §4.1): bounded length, restricted charset.
_SESSION_ID_RE = re.compile(r"^[A-Za-z0-9._:-]{1,128}$")

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

session_id is a project-scoped opaque token validated against a bounded charset/length and minted when absent, carried as an envelope field rather than a header. Worth confirming the charset is wide enough for the platform's id format.

)
consumed.add(name)

elif name == "session_id":

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps the request envelope's session_id into a handler parameter of the same name, which is how the /messages session threads into the agent handler without living in request.data.inputs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 14

🧹 Nitpick comments (2)
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py (1)

86-96: ⚡ Quick win

Add regression coverage for agent shape with missing tools

Please add a test where {"agent": {"instructions": "I"}} + defaults verifies tools still inherit from defaults. This would have caught the current fallback bug.

sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py (1)

36-46: ⚡ Quick win

Add a regression test for string needs_approval values.

Given legacy payloads may carry "false"/"true" as strings, add a case asserting "false" does not become True after coercion.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3746d41f-c884-49c4-8834-df9bd68dfb03

📥 Commits

Reviewing files that changed from the base of the PR and between a97e608 and b9e62f9.

📒 Files selected for processing (68)
  • sdks/python/agenta/__init__.py
  • sdks/python/agenta/sdk/agents/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/agenta_builtins.py
  • sdks/python/agenta/sdk/agents/adapters/harnesses.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/local.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/messages.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/routing.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/sse.py
  • sdks/python/agenta/sdk/agents/adapters/vercel/stream.py
  • sdks/python/agenta/sdk/agents/dtos.py
  • sdks/python/agenta/sdk/agents/errors.py
  • sdks/python/agenta/sdk/agents/interfaces.py
  • sdks/python/agenta/sdk/agents/mcp/__init__.py
  • sdks/python/agenta/sdk/agents/mcp/errors.py
  • sdks/python/agenta/sdk/agents/mcp/interfaces.py
  • sdks/python/agenta/sdk/agents/mcp/models.py
  • sdks/python/agenta/sdk/agents/mcp/parsing.py
  • sdks/python/agenta/sdk/agents/mcp/resolver.py
  • sdks/python/agenta/sdk/agents/mcp/wire.py
  • sdks/python/agenta/sdk/agents/streaming.py
  • sdks/python/agenta/sdk/agents/tools/__init__.py
  • sdks/python/agenta/sdk/agents/tools/compat.py
  • sdks/python/agenta/sdk/agents/tools/errors.py
  • sdks/python/agenta/sdk/agents/tools/interfaces.py
  • sdks/python/agenta/sdk/agents/tools/models.py
  • sdks/python/agenta/sdk/agents/tools/parsing.py
  • sdks/python/agenta/sdk/agents/tools/resolver.py
  • sdks/python/agenta/sdk/agents/tools/wire.py
  • sdks/python/agenta/sdk/agents/ui_messages.py
  • sdks/python/agenta/sdk/agents/utils/__init__.py
  • sdks/python/agenta/sdk/agents/utils/ts_runner.py
  • sdks/python/agenta/sdk/agents/utils/wire.py
  • sdks/python/agenta/sdk/decorators/routing.py
  • sdks/python/agenta/sdk/engines/running/interfaces.py
  • sdks/python/agenta/sdk/engines/running/utils.py
  • sdks/python/agenta/sdk/middlewares/running/normalizer.py
  • sdks/python/agenta/sdk/models/workflows.py
  • sdks/python/agenta/sdk/utils/types.py
  • sdks/python/agenta/tests/agents/test_streaming.py
  • sdks/python/oss/tests/pytest/integration/agents/__init__.py
  • sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py
  • sdks/python/oss/tests/pytest/unit/agents/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/conftest.py
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.json
  • sdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.json
  • sdks/python/oss/tests/pytest/unit/agents/mcp/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.py
  • sdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/oss/tests/pytest/unit/agents/test_ui_messages.py
  • sdks/python/oss/tests/pytest/unit/agents/test_wire_contract.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/__init__.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/test_models.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py
  • sdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.py
  • sdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.py
  • sdks/python/oss/tests/pytest/utils/test_messages_endpoint.py
  • sdks/python/oss/tests/pytest/utils/test_routing.py

AgentaHarness,
ClaudeHarness,
InProcessPiBackend,
LocalBackend,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exporting LocalBackend as stable public API until it is implemented.

LocalBackend is currently a guaranteed runtime failure path (create_sandbox/create_session raise NotImplementedError in sdks/python/agenta/sdk/agents/adapters/local.py). Re-exporting it here makes that incomplete adapter look production-ready.

Prefer removing it from public exports for now, or clearly gating it as experimental/internal.

Also applies to: 178-178

Comment on lines +53 to +77
def __init__(
self,
backend: "InProcessPiBackend",
config: HarnessAgentConfig,
*,
secrets: Optional[Mapping[str, str]],
trace: Optional[TraceContext],
session_id: Optional[str],
) -> None:
self._backend = backend
self._config = config
self._secrets = dict(secrets or {})
self._trace = trace
self._session_id = session_id

@property
def id(self) -> Optional[str]:
return self._session_id

def _wire_payload(self, messages: Sequence[Message]) -> Dict[str, Any]:
"""The ``/run`` request JSON for this turn (shared by ``prompt`` and ``stream``)."""
return request_to_wire(
engine=InProcessPiBackend._ENGINE,
harness=HarnessType.PI,
sandbox="local",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the requested harness type in the wire payload.

create_session accepts harness, but the session drops it and _wire_payload always sends HarnessType.PI (Line 76). For Agenta runs, this serializes the wrong harness across the backend boundary.

Suggested fix
 class InProcessPiSession(Session):
@@
     def __init__(
         self,
         backend: "InProcessPiBackend",
         config: HarnessAgentConfig,
         *,
+        harness: HarnessType,
         secrets: Optional[Mapping[str, str]],
         trace: Optional[TraceContext],
         session_id: Optional[str],
     ) -> None:
         self._backend = backend
         self._config = config
+        self._harness = harness
         self._secrets = dict(secrets or {})
         self._trace = trace
         self._session_id = session_id
@@
         return request_to_wire(
             engine=InProcessPiBackend._ENGINE,
-            harness=HarnessType.PI,
+            harness=self._harness,
             sandbox="local",
             config=self._config,
             messages=messages,
             secrets=self._secrets,
             trace=self._trace,
             session_id=self._session_id,
         )
@@
     async def create_session(
@@
     ) -> InProcessPiSession:
         return InProcessPiSession(
             self,
             config,
+            harness=harness,
             secrets=secrets,
             trace=trace,
             session_id=session_id,
         )

Also applies to: 137-153

Comment on lines +27 to +48
supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})

async def create_sandbox(self) -> Sandbox:
raise NotImplementedError(
"LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
"Phase 4: Claude via claude-agent-sdk)."
)

async def create_session(
self,
sandbox: Sandbox,
config: HarnessAgentConfig,
*,
harness: HarnessType,
secrets: Optional[Mapping[str, str]] = None,
trace: Optional[TraceContext] = None,
session_id: Optional[str] = None,
) -> Session:
raise NotImplementedError(
"LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, "
"Phase 4: Claude via claude-agent-sdk)."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid advertising harness support before implementation exists.

LocalBackend declares PI/CLAUDE in supported_harnesses (Line 27), but both creation methods always raise NotImplementedError (Lines 30-48). This defers failure to runtime instead of failing fast on compatibility checks.

Suggested fail-fast adjustment
-    supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+    supported_harnesses = frozenset()

Comment on lines +158 to +169
async def load_session_endpoint(req: Request, request: LoadSessionRequest):
messages = await store.load(request.session_id)
response = LoadSessionResponse(
session_id=request.session_id,
messages=[
message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
for idx, message in enumerate(messages, start=1)
],
)
return set_vercel_message_protocol_headers(
JSONResponse(content=response.model_dump(mode="json"))
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate session_id in /load-session before hitting SessionStore.

Line 159 forwards raw request.session_id to store.load(...) without the same charset/length gate used by /messages (Lines 84-93). This creates an inconsistent trust boundary and can expose storage adapters to unsafe identifiers.

Suggested patch
 async def load_session_endpoint(req: Request, request: LoadSessionRequest):
-        messages = await store.load(request.session_id)
+        session_id = resolve_session_id(request.session_id)
+        if session_id is None:
+            return set_vercel_message_protocol_headers(
+                JSONResponse(
+                    status_code=400,
+                    content={
+                        "detail": "session_id violates the allowed charset/length"
+                    },
+                )
+            )
+        messages = await store.load(session_id)
         response = LoadSessionResponse(
-            session_id=request.session_id,
+            session_id=session_id,
             messages=[
                 message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
                 for idx, message in enumerate(messages, start=1)
             ],
         )


# Permission policy for harness tool use in a headless run. ``auto`` approves (tools are
# backend-resolved and trusted, no human to prompt); ``deny`` rejects.
PermissionPolicy = str # "auto" | "deny"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate permission_policy instead of accepting arbitrary strings

PermissionPolicy is documented as "auto" | "deny" but currently typed as str, so invalid values flow through until downstream failure. Enforce this at DTO boundaries.

Proposed fix
-from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union
+from typing import Any, Callable, ClassVar, Dict, List, Literal, Optional, Tuple, Union
@@
-PermissionPolicy = str  # "auto" | "deny"
+PermissionPolicy = Literal["auto", "deny"]

Also applies to: 363-379, 502-503, 559-559

Comment on lines +53 to +55
if "needs_approval" in source:
result["needs_approval"] = bool(source["needs_approval"])
if isinstance(source.get("render"), dict):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

needs_approval coercion is semantically wrong for string inputs.

Line 54 uses bool(source["needs_approval"]), so values like "false" become True. That flips approval gating behavior for legacy payloads.

Proposed fix
 def _copy_tool_metadata(
     source: dict[str, Any], target: dict[str, Any]
 ) -> dict[str, Any]:
     result = dict(target)
     if "needs_approval" in source:
-        result["needs_approval"] = bool(source["needs_approval"])
+        result["needs_approval"] = source["needs_approval"]
     if isinstance(source.get("render"), dict):
         result["render"] = dict(source["render"])
     return result

Comment on lines +125 to +127
if on_error == "raise":
raise error
diagnostics.append(ToolConfigDiagnostic(index=index, message=str(error)))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate on_error at runtime to prevent silent fallback behavior.

If callers pass an invalid value (e.g., typo), current logic silently behaves like "collect". Fail fast to avoid hidden parse-policy changes.

Proposed fix
 def coerce_tool_configs(
     values: Optional[Sequence[Any]],
     *,
     on_error: Literal["raise", "collect"] = "raise",
 ) -> ToolConfigParseResult:
     """Convert legacy values, either raising or returning structured diagnostics."""
+    if on_error not in {"raise", "collect"}:
+        raise ValueError("on_error must be 'raise' or 'collect'")
+
     tool_configs: list[ToolConfig] = []
     diagnostics: list[ToolConfigDiagnostic] = []

Comment on lines +29 to +33
if response.status_code >= 500:
raise RuntimeError(
f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
)
return response.json()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle all non-2xx HTTP statuses as transport failures.

Only 5xx is handled today; 4xx responses fall through and may surface as opaque JSON parse errors instead of clear runner failures.

Proposed fix
-    if response.status_code >= 500:
+    if response.status_code >= 400:
         raise RuntimeError(
             f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
         )
@@
-            if response.status_code >= 500:
+            if response.status_code >= 400:
                 body = await response.aread()
                 raise RuntimeError(
                     f"Agent runner HTTP {response.status_code}: {body[:1000]!r}"
                 )

Also applies to: 108-113

Comment on lines +113 to +117
async for line in response.aiter_lines():
line = line.strip()
if line:
yield json.loads(line)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce a terminal stream result (or raise a transport error).

Both streaming transports can end cleanly when the runner disconnects/exits early, which leaves downstream AgentRun without a terminal result and can hide backend failures.

Proposed fix
 async def deliver_http_stream(
@@
-    async with httpx.AsyncClient(timeout=timeout) as client:
+    saw_result = False
+    async with httpx.AsyncClient(timeout=timeout) as client:
         async with client.stream(
             "POST", url, json=payload, headers=headers
         ) as response:
@@
             async for line in response.aiter_lines():
                 line = line.strip()
                 if line:
-                    yield json.loads(line)
+                    record = json.loads(line)
+                    if record.get("kind") == "result":
+                        saw_result = True
+                    yield record
+            if not saw_result:
+                raise RuntimeError(
+                    "Agent runner stream ended without a terminal result record"
+                )
@@
 async def deliver_subprocess_stream(
@@
-    try:
+    saw_result = False
+    try:
         while True:
@@
             line = raw.decode("utf-8", "replace").strip()
             if line:
-                yield json.loads(line)
+                record = json.loads(line)
+                if record.get("kind") == "result":
+                    saw_result = True
+                yield record
         await proc.wait()
+        err = (await proc.stderr.read()).decode("utf-8", "replace")
+        if proc.returncode not in (0, None):
+            raise RuntimeError(
+                f"Agent runner stream failed. exit={proc.returncode} stderr={err[-2000:]}"
+            )
+        if not saw_result:
+            raise RuntimeError(
+                f"Agent runner stream ended without terminal result. stderr={err[-2000:]}"
+            )
     finally:
         if proc.returncode is None:
             proc.kill()
             await proc.wait()

Also applies to: 147-160

Comment on lines +195 to +199
text = res.text
assert '"sessionId": "sess_abc"' in text # stamped onto the start part
assert '"type": "text-delta"' in text
assert "data: [DONE]" in text

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the SSE session-id check structure-aware instead of whitespace-dependent.

Line 196 matches a literal JSON substring ('"sessionId": "sess_abc"'), which can fail on harmless serializer formatting changes.

Suggested test hardening
     text = res.text
-    assert '"sessionId": "sess_abc"' in text  # stamped onto the start part
+    payloads = [
+        json.loads(line.removeprefix("data: "))
+        for line in text.splitlines()
+        if line.startswith("data: ") and line != "data: [DONE]"
+    ]
+    start = next(p for p in payloads if p.get("type") == "start")
+    assert start["messageMetadata"]["sessionId"] == "sess_abc"
     assert '"type": "text-delta"' in text
     assert "data: [DONE]" in text

@mmabrouk

Copy link
Copy Markdown
Member Author

Reviewer guide: interesting code

A few spots worth landing on first:

  • sdks/python/agenta/sdk/agents/interfaces.py:140Backend.supported_harnesses is the single source of truth for what an engine can drive; Harness.__init__ validates against it before any run.
  • sdks/python/agenta/sdk/agents/interfaces.py:111NoopSessionStore returns empty history and discards writes, which is the port-only seam behind /load-session until a real store lands.
  • sdks/python/agenta/sdk/agents/dtos.py:524AgentaAgentConfig extends PiAgentConfig and only adds forced skills, which is the cleanest read on "Agenta is Pi with an opinion".
  • sdks/python/agenta/sdk/agents/adapters/in_process.py:118InProcessPiBackend is the reference backend; note it is deliberately not a subclass of RivetBackend even though they share wire helpers.
  • sdks/python/agenta/sdk/agents/adapters/harnesses.py:85ClaudeHarness drops Pi built-in tool names with a warning, because built-ins are a Pi concept Claude cannot honor.
  • sdks/python/agenta/sdk/agents/adapters/vercel/routing.py:43resolve_session_id mints, echoes, or rejects the session id against a bounded charset; this is where session_id enters the run as a correlation value.

"""

#: The single source of truth for what this engine can run.
supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class var is the one place an engine declares its supported harnesses. The split below keeps backends as pure plumbing: they never branch on a harness name, they only check membership here.


def __init__(self, environment: Environment) -> None:
if not environment.backend.supports(self.harness_type):
raise UnsupportedHarnessError(self.harness_type, environment.backend)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation happens at harness construction, before any sandbox or session exists, so an unsupported backend/harness pairing fails fast rather than mid-run.

# carried through.
if config.builtin_names:
log.warning(
"ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept",

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth confirming a warning is the right level here. A config that names Pi built-ins but runs on Claude silently loses those tools; a stored agent could behave differently across harnesses without an obvious signal.

return response


def resolve_session_id(session_id: Optional[str]) -> Optional[str]:

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only gate on the session id. Returning None on an invalid id drives the 400 in the endpoint; a minted id uses sess_ + uuid4 hex, which stays inside the allowed charset.

@mmabrouk mmabrouk left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codex subagent review for #4771

Findings:

  • Blocking: sdks/python/agenta/sdk/agents/adapters/rivet.py:36 and sdks/python/agenta/sdk/agents/adapters/in_process.py:36 make the default runner-backed path point at pnpm exec tsx src/cli.ts, but this PR does not add services/agent/src/cli.ts or the runner package. The public SDK example also uses RivetBackend() with no url, command, or cwd (sdks/python/agenta/sdk/agents/__init__.py:19), while the integration test only proves transport behavior by injecting a fake Python runner (sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py:81). Merged alone, and for #4772 stacked on it, the advertised default SDK runtime fails before any harness starts unless later runner assets from #4773/#4778 are present and the process cwd happens to be right. Please either require an explicit url/command until the runner lands, stack/retarget this runtime on the runner PR, or include the runnable runner assets plus an end-to-end test that exercises the default path.

  • sdks/python/agenta/sdk/agents/dtos.py:680 drops default tools whenever a dedicated agent dict is present but omits tools. The from_params docstring says unset fields fall back to defaults, and the MCP/harness-option paths do that, but this branch returns None; the constructor then passes tools=_as_list(None) and silently clears defaults.tools. A partial override such as { "agent": { "model": "..." } } will run tool-free. Please fall back to defaults.tools when the key is absent and add a partial-agent test.

Stack note: #4771 does contain the Python utils/wire.py serializer and golden fixtures. #4773 still advertises independence from main, but its protocol docs point at those SDK files and one advertised test imports src/engines/pi.ts, which only lands in the later runner-engine PR. Please align the stack-nav/review map so reviewers know which PR supplies the wire fixtures and runner assets.

I did not run tests locally; this review used the GitHub patch/head files.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5388d34f-2c5e-4260-b4e6-13176aece5f9

📥 Commits

Reviewing files that changed from the base of the PR and between b9e62f9 and 741fc73.

📒 Files selected for processing (9)
  • sdks/python/agenta/sdk/agents/__init__.py
  • sdks/python/agenta/sdk/agents/adapters/_runner_config.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/dtos.py
  • sdks/python/agenta/sdk/agents/errors.py
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
  • sdks/python/agenta/sdk/agents/init.py
  • sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
  • sdks/python/agenta/sdk/agents/adapters/in_process.py
  • sdks/python/agenta/sdk/agents/adapters/rivet.py
  • sdks/python/agenta/sdk/agents/dtos.py

Comment on lines +21 to +24
if url:
return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
if command is not None:
return list(command)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject empty command at config time.

At Line 22 and Line 24, command=[] is accepted and propagated as _command, which creates an unusable subprocess transport and fails later at runtime. Validate non-empty command in resolve_runner_command so misconfiguration fails fast with AgentRunnerConfigurationError.

Suggested fix
 def resolve_runner_command(
@@
 ) -> List[str]:
+    def _validated_command(raw: Sequence[str]) -> List[str]:
+        cmd = list(raw)
+        if not cmd:
+            raise AgentRunnerConfigurationError(
+                f"{backend_name} received an empty command. "
+                "Pass a non-empty command, pass url for an HTTP runner, "
+                f"or set cwd to a runner wrapper containing {RUNNER_CLI_PATH.as_posix()}."
+            )
+        return cmd
+
     if url:
-        return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
+        return _validated_command(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
     if command is not None:
-        return list(command)
+        return _validated_command(command)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend feature python Pull requests that update Python code SDK size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant