feat(sdk): agent runtime behind backend/harness ports#4771
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughSummary by CodeRabbit
WalkthroughIntroduces a Python agents runtime subsystem with DTOs, MCP/tools resolution, runtime ports, streaming, runner transport, backend and harness adapters, Vercel message/SSE routing, agent-specific SDK wiring, and tests plus golden fixtures for the new request/result paths. ChangesAgents Subsystem
Sequence Diagram(s)sequenceDiagram
participant Client as Vercel AI SDK
participant MessagesEndpoint as POST /messages
participant Harness as PiHarness/ClaudeHarness/AgentaHarness
participant Backend as InProcessPiBackend/RivetBackend
participant TSRunner as TypeScript Runner
participant VercelStream as agent_run_to_vercel_parts
Client->>MessagesEndpoint: {messages, session_id, stream:true}
MessagesEndpoint->>MessagesEndpoint: resolve/mint session_id
MessagesEndpoint->>MessagesEndpoint: vercel_ui_messages_to_messages
MessagesEndpoint->>Harness: stream(session_config, messages)
Harness->>Harness: _to_harness_config(session_config)
Harness->>Backend: create_session(sandbox, harness_config)
Backend->>TSRunner: deliver_subprocess_stream(payload)
TSRunner-->>Backend: NDJSON records
Backend-->>Harness: AgentRun
Harness-->>MessagesEndpoint: WorkflowStreamingResponse(AgentRun)
MessagesEndpoint->>VercelStream: agent_run_to_vercel_parts(AgentRun)
VercelStream-->>Client: SSE data: {start} ... {text-delta} ... {finish}\ndata: [DONE]
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Reviewer guide: interesting codeA few pointers to the load-bearing decisions, so review time goes to the parts that matter.
|
| harness_type: ClassVar[HarnessType] | ||
|
|
||
| def __init__(self, environment: Environment) -> None: | ||
| if not environment.backend.supports(self.harness_type): |
There was a problem hiding this comment.
This is the validation matrix gate: a Harness can only wrap an Environment whose backend lists its harness_type in supported_harnesses. ClaudeHarness over InProcessPiBackend, or AgentaHarness over RivetBackend, raises here at construction.
| # Claude has no Pi built-in tools; drop them rather than ship a name Claude cannot | ||
| # honor. Tools go over MCP, and Claude gates tool use, so the permission policy is | ||
| # carried through. | ||
| if config.builtin_names: |
There was a problem hiding this comment.
Claude has no Pi built-in tools, so they are dropped with a warning rather than shipped as a name Claude cannot honor. This is the cleanest example of an adapter sending only what its harness understands.
| from .messages import message_to_vercel_ui_message, vercel_ui_messages_to_messages | ||
|
|
||
| # An opaque, project-scoped session id (RFC §4.1): bounded length, restricted charset. | ||
| _SESSION_ID_RE = re.compile(r"^[A-Za-z0-9._:-]{1,128}$") |
There was a problem hiding this comment.
session_id is a project-scoped opaque token validated against a bounded charset/length and minted when absent, carried as an envelope field rather than a header. Worth confirming the charset is wide enough for the platform's id format.
| ) | ||
| consumed.add(name) | ||
|
|
||
| elif name == "session_id": |
There was a problem hiding this comment.
This maps the request envelope's session_id into a handler parameter of the same name, which is how the /messages session threads into the agent handler without living in request.data.inputs.
There was a problem hiding this comment.
Actionable comments posted: 14
🧹 Nitpick comments (2)
sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py (1)
86-96: ⚡ Quick winAdd regression coverage for
agentshape with missingtoolsPlease add a test where
{"agent": {"instructions": "I"}}+ defaults verifiestoolsstill inherit from defaults. This would have caught the current fallback bug.sdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.py (1)
36-46: ⚡ Quick winAdd a regression test for string
needs_approvalvalues.Given legacy payloads may carry
"false"/"true"as strings, add a case asserting"false"does not becomeTrueafter coercion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 3746d41f-c884-49c4-8834-df9bd68dfb03
📒 Files selected for processing (68)
sdks/python/agenta/__init__.pysdks/python/agenta/sdk/agents/__init__.pysdks/python/agenta/sdk/agents/adapters/__init__.pysdks/python/agenta/sdk/agents/adapters/agenta_builtins.pysdks/python/agenta/sdk/agents/adapters/harnesses.pysdks/python/agenta/sdk/agents/adapters/in_process.pysdks/python/agenta/sdk/agents/adapters/local.pysdks/python/agenta/sdk/agents/adapters/rivet.pysdks/python/agenta/sdk/agents/adapters/vercel/__init__.pysdks/python/agenta/sdk/agents/adapters/vercel/messages.pysdks/python/agenta/sdk/agents/adapters/vercel/routing.pysdks/python/agenta/sdk/agents/adapters/vercel/sse.pysdks/python/agenta/sdk/agents/adapters/vercel/stream.pysdks/python/agenta/sdk/agents/dtos.pysdks/python/agenta/sdk/agents/errors.pysdks/python/agenta/sdk/agents/interfaces.pysdks/python/agenta/sdk/agents/mcp/__init__.pysdks/python/agenta/sdk/agents/mcp/errors.pysdks/python/agenta/sdk/agents/mcp/interfaces.pysdks/python/agenta/sdk/agents/mcp/models.pysdks/python/agenta/sdk/agents/mcp/parsing.pysdks/python/agenta/sdk/agents/mcp/resolver.pysdks/python/agenta/sdk/agents/mcp/wire.pysdks/python/agenta/sdk/agents/streaming.pysdks/python/agenta/sdk/agents/tools/__init__.pysdks/python/agenta/sdk/agents/tools/compat.pysdks/python/agenta/sdk/agents/tools/errors.pysdks/python/agenta/sdk/agents/tools/interfaces.pysdks/python/agenta/sdk/agents/tools/models.pysdks/python/agenta/sdk/agents/tools/parsing.pysdks/python/agenta/sdk/agents/tools/resolver.pysdks/python/agenta/sdk/agents/tools/wire.pysdks/python/agenta/sdk/agents/ui_messages.pysdks/python/agenta/sdk/agents/utils/__init__.pysdks/python/agenta/sdk/agents/utils/ts_runner.pysdks/python/agenta/sdk/agents/utils/wire.pysdks/python/agenta/sdk/decorators/routing.pysdks/python/agenta/sdk/engines/running/interfaces.pysdks/python/agenta/sdk/engines/running/utils.pysdks/python/agenta/sdk/middlewares/running/normalizer.pysdks/python/agenta/sdk/models/workflows.pysdks/python/agenta/sdk/utils/types.pysdks/python/agenta/tests/agents/test_streaming.pysdks/python/oss/tests/pytest/integration/agents/__init__.pysdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.pysdks/python/oss/tests/pytest/unit/agents/__init__.pysdks/python/oss/tests/pytest/unit/agents/conftest.pysdks/python/oss/tests/pytest/unit/agents/golden/run_request.claude.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_request.pi.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_result.error.jsonsdks/python/oss/tests/pytest/unit/agents/golden/run_result.ok.jsonsdks/python/oss/tests/pytest/unit/agents/mcp/__init__.pysdks/python/oss/tests/pytest/unit/agents/mcp/test_resolver.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_capabilities_events.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_content_blocks.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_harness_configs.pysdks/python/oss/tests/pytest/unit/agents/test_environment_lifecycle.pysdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.pysdks/python/oss/tests/pytest/unit/agents/test_ui_messages.pysdks/python/oss/tests/pytest/unit/agents/test_wire_contract.pysdks/python/oss/tests/pytest/unit/agents/tools/__init__.pysdks/python/oss/tests/pytest/unit/agents/tools/test_models.pysdks/python/oss/tests/pytest/unit/agents/tools/test_parsing.pysdks/python/oss/tests/pytest/unit/agents/tools/test_resolver.pysdks/python/oss/tests/pytest/unit/test_normalizer_passthrough.pysdks/python/oss/tests/pytest/utils/test_messages_endpoint.pysdks/python/oss/tests/pytest/utils/test_routing.py
| AgentaHarness, | ||
| ClaudeHarness, | ||
| InProcessPiBackend, | ||
| LocalBackend, |
There was a problem hiding this comment.
Avoid exporting LocalBackend as stable public API until it is implemented.
LocalBackend is currently a guaranteed runtime failure path (create_sandbox/create_session raise NotImplementedError in sdks/python/agenta/sdk/agents/adapters/local.py). Re-exporting it here makes that incomplete adapter look production-ready.
Prefer removing it from public exports for now, or clearly gating it as experimental/internal.
Also applies to: 178-178
| def __init__( | ||
| self, | ||
| backend: "InProcessPiBackend", | ||
| config: HarnessAgentConfig, | ||
| *, | ||
| secrets: Optional[Mapping[str, str]], | ||
| trace: Optional[TraceContext], | ||
| session_id: Optional[str], | ||
| ) -> None: | ||
| self._backend = backend | ||
| self._config = config | ||
| self._secrets = dict(secrets or {}) | ||
| self._trace = trace | ||
| self._session_id = session_id | ||
|
|
||
| @property | ||
| def id(self) -> Optional[str]: | ||
| return self._session_id | ||
|
|
||
| def _wire_payload(self, messages: Sequence[Message]) -> Dict[str, Any]: | ||
| """The ``/run`` request JSON for this turn (shared by ``prompt`` and ``stream``).""" | ||
| return request_to_wire( | ||
| engine=InProcessPiBackend._ENGINE, | ||
| harness=HarnessType.PI, | ||
| sandbox="local", |
There was a problem hiding this comment.
Preserve the requested harness type in the wire payload.
create_session accepts harness, but the session drops it and _wire_payload always sends HarnessType.PI (Line 76). For Agenta runs, this serializes the wrong harness across the backend boundary.
Suggested fix
class InProcessPiSession(Session):
@@
def __init__(
self,
backend: "InProcessPiBackend",
config: HarnessAgentConfig,
*,
+ harness: HarnessType,
secrets: Optional[Mapping[str, str]],
trace: Optional[TraceContext],
session_id: Optional[str],
) -> None:
self._backend = backend
self._config = config
+ self._harness = harness
self._secrets = dict(secrets or {})
self._trace = trace
self._session_id = session_id
@@
return request_to_wire(
engine=InProcessPiBackend._ENGINE,
- harness=HarnessType.PI,
+ harness=self._harness,
sandbox="local",
config=self._config,
messages=messages,
secrets=self._secrets,
trace=self._trace,
session_id=self._session_id,
)
@@
async def create_session(
@@
) -> InProcessPiSession:
return InProcessPiSession(
self,
config,
+ harness=harness,
secrets=secrets,
trace=trace,
session_id=session_id,
)Also applies to: 137-153
| supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE}) | ||
|
|
||
| async def create_sandbox(self) -> Sandbox: | ||
| raise NotImplementedError( | ||
| "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, " | ||
| "Phase 4: Claude via claude-agent-sdk)." | ||
| ) | ||
|
|
||
| async def create_session( | ||
| self, | ||
| sandbox: Sandbox, | ||
| config: HarnessAgentConfig, | ||
| *, | ||
| harness: HarnessType, | ||
| secrets: Optional[Mapping[str, str]] = None, | ||
| trace: Optional[TraceContext] = None, | ||
| session_id: Optional[str] = None, | ||
| ) -> Session: | ||
| raise NotImplementedError( | ||
| "LocalBackend is not implemented yet (Phase 3: Pi via bundled JS, " | ||
| "Phase 4: Claude via claude-agent-sdk)." | ||
| ) |
There was a problem hiding this comment.
Avoid advertising harness support before implementation exists.
LocalBackend declares PI/CLAUDE in supported_harnesses (Line 27), but both creation methods always raise NotImplementedError (Lines 30-48). This defers failure to runtime instead of failing fast on compatibility checks.
Suggested fail-fast adjustment
- supported_harnesses = frozenset({HarnessType.PI, HarnessType.CLAUDE})
+ supported_harnesses = frozenset()| async def load_session_endpoint(req: Request, request: LoadSessionRequest): | ||
| messages = await store.load(request.session_id) | ||
| response = LoadSessionResponse( | ||
| session_id=request.session_id, | ||
| messages=[ | ||
| message_to_vercel_ui_message(message, message_id=f"msg-{idx}") | ||
| for idx, message in enumerate(messages, start=1) | ||
| ], | ||
| ) | ||
| return set_vercel_message_protocol_headers( | ||
| JSONResponse(content=response.model_dump(mode="json")) | ||
| ) |
There was a problem hiding this comment.
Validate session_id in /load-session before hitting SessionStore.
Line 159 forwards raw request.session_id to store.load(...) without the same charset/length gate used by /messages (Lines 84-93). This creates an inconsistent trust boundary and can expose storage adapters to unsafe identifiers.
Suggested patch
async def load_session_endpoint(req: Request, request: LoadSessionRequest):
- messages = await store.load(request.session_id)
+ session_id = resolve_session_id(request.session_id)
+ if session_id is None:
+ return set_vercel_message_protocol_headers(
+ JSONResponse(
+ status_code=400,
+ content={
+ "detail": "session_id violates the allowed charset/length"
+ },
+ )
+ )
+ messages = await store.load(session_id)
response = LoadSessionResponse(
- session_id=request.session_id,
+ session_id=session_id,
messages=[
message_to_vercel_ui_message(message, message_id=f"msg-{idx}")
for idx, message in enumerate(messages, start=1)
],
)|
|
||
| # Permission policy for harness tool use in a headless run. ``auto`` approves (tools are | ||
| # backend-resolved and trusted, no human to prompt); ``deny`` rejects. | ||
| PermissionPolicy = str # "auto" | "deny" |
There was a problem hiding this comment.
Validate permission_policy instead of accepting arbitrary strings
PermissionPolicy is documented as "auto" | "deny" but currently typed as str, so invalid values flow through until downstream failure. Enforce this at DTO boundaries.
Proposed fix
-from typing import Any, Callable, ClassVar, Dict, List, Optional, Tuple, Union
+from typing import Any, Callable, ClassVar, Dict, List, Literal, Optional, Tuple, Union
@@
-PermissionPolicy = str # "auto" | "deny"
+PermissionPolicy = Literal["auto", "deny"]Also applies to: 363-379, 502-503, 559-559
| if "needs_approval" in source: | ||
| result["needs_approval"] = bool(source["needs_approval"]) | ||
| if isinstance(source.get("render"), dict): |
There was a problem hiding this comment.
needs_approval coercion is semantically wrong for string inputs.
Line 54 uses bool(source["needs_approval"]), so values like "false" become True. That flips approval gating behavior for legacy payloads.
Proposed fix
def _copy_tool_metadata(
source: dict[str, Any], target: dict[str, Any]
) -> dict[str, Any]:
result = dict(target)
if "needs_approval" in source:
- result["needs_approval"] = bool(source["needs_approval"])
+ result["needs_approval"] = source["needs_approval"]
if isinstance(source.get("render"), dict):
result["render"] = dict(source["render"])
return result| if on_error == "raise": | ||
| raise error | ||
| diagnostics.append(ToolConfigDiagnostic(index=index, message=str(error))) |
There was a problem hiding this comment.
Validate on_error at runtime to prevent silent fallback behavior.
If callers pass an invalid value (e.g., typo), current logic silently behaves like "collect". Fail fast to avoid hidden parse-policy changes.
Proposed fix
def coerce_tool_configs(
values: Optional[Sequence[Any]],
*,
on_error: Literal["raise", "collect"] = "raise",
) -> ToolConfigParseResult:
"""Convert legacy values, either raising or returning structured diagnostics."""
+ if on_error not in {"raise", "collect"}:
+ raise ValueError("on_error must be 'raise' or 'collect'")
+
tool_configs: list[ToolConfig] = []
diagnostics: list[ToolConfigDiagnostic] = []| if response.status_code >= 500: | ||
| raise RuntimeError( | ||
| f"Agent runner HTTP {response.status_code}: {response.text[:1000]}" | ||
| ) | ||
| return response.json() |
There was a problem hiding this comment.
Handle all non-2xx HTTP statuses as transport failures.
Only 5xx is handled today; 4xx responses fall through and may surface as opaque JSON parse errors instead of clear runner failures.
Proposed fix
- if response.status_code >= 500:
+ if response.status_code >= 400:
raise RuntimeError(
f"Agent runner HTTP {response.status_code}: {response.text[:1000]}"
)
@@
- if response.status_code >= 500:
+ if response.status_code >= 400:
body = await response.aread()
raise RuntimeError(
f"Agent runner HTTP {response.status_code}: {body[:1000]!r}"
)Also applies to: 108-113
| async for line in response.aiter_lines(): | ||
| line = line.strip() | ||
| if line: | ||
| yield json.loads(line) | ||
|
|
There was a problem hiding this comment.
Enforce a terminal stream result (or raise a transport error).
Both streaming transports can end cleanly when the runner disconnects/exits early, which leaves downstream AgentRun without a terminal result and can hide backend failures.
Proposed fix
async def deliver_http_stream(
@@
- async with httpx.AsyncClient(timeout=timeout) as client:
+ saw_result = False
+ async with httpx.AsyncClient(timeout=timeout) as client:
async with client.stream(
"POST", url, json=payload, headers=headers
) as response:
@@
async for line in response.aiter_lines():
line = line.strip()
if line:
- yield json.loads(line)
+ record = json.loads(line)
+ if record.get("kind") == "result":
+ saw_result = True
+ yield record
+ if not saw_result:
+ raise RuntimeError(
+ "Agent runner stream ended without a terminal result record"
+ )
@@
async def deliver_subprocess_stream(
@@
- try:
+ saw_result = False
+ try:
while True:
@@
line = raw.decode("utf-8", "replace").strip()
if line:
- yield json.loads(line)
+ record = json.loads(line)
+ if record.get("kind") == "result":
+ saw_result = True
+ yield record
await proc.wait()
+ err = (await proc.stderr.read()).decode("utf-8", "replace")
+ if proc.returncode not in (0, None):
+ raise RuntimeError(
+ f"Agent runner stream failed. exit={proc.returncode} stderr={err[-2000:]}"
+ )
+ if not saw_result:
+ raise RuntimeError(
+ f"Agent runner stream ended without terminal result. stderr={err[-2000:]}"
+ )
finally:
if proc.returncode is None:
proc.kill()
await proc.wait()Also applies to: 147-160
| text = res.text | ||
| assert '"sessionId": "sess_abc"' in text # stamped onto the start part | ||
| assert '"type": "text-delta"' in text | ||
| assert "data: [DONE]" in text | ||
|
|
There was a problem hiding this comment.
Make the SSE session-id check structure-aware instead of whitespace-dependent.
Line 196 matches a literal JSON substring ('"sessionId": "sess_abc"'), which can fail on harmless serializer formatting changes.
Suggested test hardening
text = res.text
- assert '"sessionId": "sess_abc"' in text # stamped onto the start part
+ payloads = [
+ json.loads(line.removeprefix("data: "))
+ for line in text.splitlines()
+ if line.startswith("data: ") and line != "data: [DONE]"
+ ]
+ start = next(p for p in payloads if p.get("type") == "start")
+ assert start["messageMetadata"]["sessionId"] == "sess_abc"
assert '"type": "text-delta"' in text
assert "data: [DONE]" in text
Reviewer guide: interesting codeA few spots worth landing on first:
|
| """ | ||
|
|
||
| #: The single source of truth for what this engine can run. | ||
| supported_harnesses: ClassVar[FrozenSet[HarnessType]] = frozenset() |
There was a problem hiding this comment.
This class var is the one place an engine declares its supported harnesses. The split below keeps backends as pure plumbing: they never branch on a harness name, they only check membership here.
|
|
||
| def __init__(self, environment: Environment) -> None: | ||
| if not environment.backend.supports(self.harness_type): | ||
| raise UnsupportedHarnessError(self.harness_type, environment.backend) |
There was a problem hiding this comment.
Validation happens at harness construction, before any sandbox or session exists, so an unsupported backend/harness pairing fails fast rather than mid-run.
| # carried through. | ||
| if config.builtin_names: | ||
| log.warning( | ||
| "ClaudeHarness ignores %d built-in tool(s); built-ins are a Pi concept", |
There was a problem hiding this comment.
Worth confirming a warning is the right level here. A config that names Pi built-ins but runs on Claude silently loses those tools; a stored agent could behave differently across harnesses without an obvious signal.
| return response | ||
|
|
||
|
|
||
| def resolve_session_id(session_id: Optional[str]) -> Optional[str]: |
There was a problem hiding this comment.
This is the only gate on the session id. Returning None on an invalid id drives the 400 in the endpoint; a minted id uses sess_ + uuid4 hex, which stays inside the allowed charset.
mmabrouk
left a comment
There was a problem hiding this comment.
Codex subagent review for #4771
Findings:
-
Blocking:
sdks/python/agenta/sdk/agents/adapters/rivet.py:36andsdks/python/agenta/sdk/agents/adapters/in_process.py:36make the default runner-backed path point atpnpm exec tsx src/cli.ts, but this PR does not addservices/agent/src/cli.tsor the runner package. The public SDK example also usesRivetBackend()with nourl,command, orcwd(sdks/python/agenta/sdk/agents/__init__.py:19), while the integration test only proves transport behavior by injecting a fake Python runner (sdks/python/oss/tests/pytest/integration/agents/test_transport_roundtrip.py:81). Merged alone, and for #4772 stacked on it, the advertised default SDK runtime fails before any harness starts unless later runner assets from #4773/#4778 are present and the process cwd happens to be right. Please either require an expliciturl/commanduntil the runner lands, stack/retarget this runtime on the runner PR, or include the runnable runner assets plus an end-to-end test that exercises the default path. -
sdks/python/agenta/sdk/agents/dtos.py:680drops default tools whenever a dedicatedagentdict is present but omitstools. Thefrom_paramsdocstring says unset fields fall back todefaults, and the MCP/harness-option paths do that, but this branch returnsNone; the constructor then passestools=_as_list(None)and silently clearsdefaults.tools. A partial override such as{ "agent": { "model": "..." } }will run tool-free. Please fall back todefaults.toolswhen the key is absent and add a partial-agent test.
Stack note: #4771 does contain the Python utils/wire.py serializer and golden fixtures. #4773 still advertises independence from main, but its protocol docs point at those SDK files and one advertised test imports src/engines/pi.ts, which only lands in the later runner-engine PR. Please align the stack-nav/review map so reviewers know which PR supplies the wire fixtures and runner assets.
I did not run tests locally; this review used the GitHub patch/head files.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 5388d34f-2c5e-4260-b4e6-13176aece5f9
📒 Files selected for processing (9)
sdks/python/agenta/sdk/agents/__init__.pysdks/python/agenta/sdk/agents/adapters/_runner_config.pysdks/python/agenta/sdk/agents/adapters/in_process.pysdks/python/agenta/sdk/agents/adapters/rivet.pysdks/python/agenta/sdk/agents/dtos.pysdks/python/agenta/sdk/agents/errors.pysdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.pysdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.pysdks/python/oss/tests/pytest/unit/agents/test_runner_adapter_config.py
🚧 Files skipped from review as they are similar to previous changes (6)
- sdks/python/oss/tests/pytest/unit/agents/test_dtos_agent_config.py
- sdks/python/agenta/sdk/agents/init.py
- sdks/python/oss/tests/pytest/unit/agents/test_harness_adapters.py
- sdks/python/agenta/sdk/agents/adapters/in_process.py
- sdks/python/agenta/sdk/agents/adapters/rivet.py
- sdks/python/agenta/sdk/agents/dtos.py
| if url: | ||
| return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND) | ||
| if command is not None: | ||
| return list(command) |
There was a problem hiding this comment.
Reject empty command at config time.
At Line 22 and Line 24, command=[] is accepted and propagated as _command, which creates an unusable subprocess transport and fails later at runtime. Validate non-empty command in resolve_runner_command so misconfiguration fails fast with AgentRunnerConfigurationError.
Suggested fix
def resolve_runner_command(
@@
) -> List[str]:
+ def _validated_command(raw: Sequence[str]) -> List[str]:
+ cmd = list(raw)
+ if not cmd:
+ raise AgentRunnerConfigurationError(
+ f"{backend_name} received an empty command. "
+ "Pass a non-empty command, pass url for an HTTP runner, "
+ f"or set cwd to a runner wrapper containing {RUNNER_CLI_PATH.as_posix()}."
+ )
+ return cmd
+
if url:
- return list(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
+ return _validated_command(command) if command is not None else list(DEFAULT_RUNNER_COMMAND)
if command is not None:
- return list(command)
+ return _validated_command(command)
Agent-workflows: functional PR set
Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off
main; two pairs are stacked. This PR's base ismain.Context
The agent runtime turns a stored agent definition into a live coding-agent turn: it picks an engine, shapes the config that engine wants, runs one prompt, and streams the reply back to a browser. This PR puts that whole runtime in the SDK under
sdks/python/agenta/sdk/agents/, structured as ports and adapters. It targetsmainand is independent. It is a functional slice that shows the final code, so the service PR that composes these adapters stacks on top of it.What this changes
The runtime now reads as three layers behind interfaces.
Backendis the engine: it declares which harnesses it can drive and owns sandbox plus session lifecycle.Environmentsits above a backend and owns the sandbox-per-session policy.Harnesssits above an environment and maps a neutral config into one engine's shape.Before, the per-harness knowledge lived in the TypeScript runner, and a caller spoke directly to a transport. Now a caller builds a
SessionConfig, hands it to aHarness, and the harness produces the engine-shaped config that aBackendplumbs to the runner without business logic.PiHarness,ClaudeHarness, andAgentaHarnesseach do different work because the harnesses differ: Pi takes built-in tool names plus native specs and never gates tool use; Claude has no built-ins, delivers tools over MCP, and gates tool use behind a permission policy; Agenta is Pi plus forced skills and a preamble.Two backend adapters drive real engines.
InProcessPiBackendruns Pi in-process through the runner and supportspiandagenta.RivetBackenddrives a harness over ACP and supportspiandclaudeon local or Daytona.LocalBackendis a stub that raisesNotImplementedError.The browser edge is the Vercel
/messagesadapter. It folds inboundUIMessageinput into neutral messages, emits Vercel UI Message Stream parts, stampsx-ag-messages-formatandx-ag-messages-versionheaders, resolves the session id, and routes/load-sessionthrough aSessionStoreport whose only adapter today isNoopSessionStore. The normalizer threadssession_idas a request-envelope field, so it survives the round trip as a correlation value.Key architectural decision to review
The first decision is the ownership split. The SDK owns the runtime ports and the adapters, and the service only composes them (
sdks/python/agenta/sdk/agents/interfaces.py). The tradeoff: a standalone SDK user can drive Pi with no Agenta service, but the service must inject its server-side concerns (gateway tool resolution, the secret vault) through the injected adapter seams rather than reaching into the runtime. Check thatBackend.supported_harnessesstays the single source of truth and thatHarness.__init__rejects an unsupported pairing before any run starts.The second decision is that
session_idis a correlation primitive, not state (adapters/vercel/routing.py,middlewares/running/normalizer.py). The cold runtime still receives the full message history on every turn.resolve_session_idmints, echoes, or rejects the id against a bounded charset, and the id is stamped onto the stream and the envelope, but nothing reads it back as conversation state yet.SessionStoreis a port-only seam:NoopSessionStorereturns empty history and discards writes, so/load-sessionanswers with nothing until a real adapter lands. Confirm this is a deliberate seam and not a dropped write path.How to review this PR
Read
interfaces.pyfirst and fix the three-layer vocabulary in your head:Backend,Environment,Harness, plus theSandbox,Session, andSessionStoreports. Then readdtos.pyfor the shapes that cross those ports, especiallySessionConfig(the run bundle),AgentConfig.harness_options(the per-harness escape hatch), and thePiAgentConfig/ClaudeAgentConfig/AgentaAgentConfigsplit wherewire_toolsdiffers per engine. Then readadapters/harnesses.py,adapters/in_process.py, andadapters/rivet.pyto see the mapping and the two real backends. Readadapters/vercel/routing.pylast for the browser edge.You can skip the
mcp/subpackage and the parsing helpers at the bottom ofdtos.pyon a first pass; they are mechanical. The regression most likely to break is the golden wire contract: a tool-free run's/runpayload must stay byte-identical, so watch any change towire_tools,wire_mcp, orrequest_to_wireagainstgolden/run_request.pi.json.Tests / notes
The suite covers the DTO shapes, the harness adapters and their backend-support validation, the
/messagesand/load-sessionrouting, the tool resolver, and a transport round trip. The wire-contract test pins the runner payload against golden JSON. TheNoopSessionStorepath is verified to return empty and discard, which documents the not-yet-persisted behavior rather than hiding it.