feat(agent): agent workflow service and tool-resolution API #4772
mmabrouk wants to merge 1 commit into
Reviewer guide: interesting code
Start here, in this order:
    runtime == "rivet"
    or selection.harness not in ("pi", "agenta")
    or selection.sandbox != "local"
)
This OR decides rivet vs in-process. Worth a careful read: any harness other than pi/agenta, any non-local sandbox, or the runtime override sends the run to rivet. The Agenta-plus-non-local case has no backend and raises downstream rather than silently substituting one. Confirm a local Pi run never trips any of these clauses.
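The routing predicate discussed above can be sketched as a pure function. This is a minimal illustration, not the PR's code: the `Selection` dataclass, the `use_rivet` function name, and the `"remote"` sandbox value are assumptions made for the example; only the three OR clauses come from the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Selection:
    # Hypothetical stand-in for the PR's selection object.
    harness: str = "pi"
    sandbox: str = "local"


def use_rivet(selection: Selection, runtime: Optional[str] = None) -> bool:
    """Return True when the run should be routed to rivet instead of in-process."""
    return (
        runtime == "rivet"
        or selection.harness not in ("pi", "agenta")
        or selection.sandbox != "local"
    )


# Routing table implied by the three clauses.
ROUTES = {
    "local pi": use_rivet(Selection("pi", "local")),
    "local agenta": use_rivet(Selection("agenta", "local")),
    "claude harness": use_rivet(Selection("claude", "local")),
    "agenta on non-local sandbox": use_rivet(Selection("agenta", "remote")),
    "runtime override": use_rivet(Selection("pi", "local"), runtime="rivet"),
}
```

Keeping the predicate free of side effects is what lets a unit test pin the whole table; whether the routed combination is actually supported is a separate question answered elsewhere.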
class VaultToolSecretProvider:
    async def get_many(self, names: Sequence[str]) -> Mapping[str, str]:
        return await resolve_named_secrets(names)
This provider is deliberately best-effort: a vault outage logs and returns {} rather than raising. It pairs with the SDK resolver running under MissingSecretPolicy.ERROR, so a code tool whose declared secret is now absent raises MissingToolSecretError. Net effect: an outage downgrades to 'secret missing' and then hard-fails for any tool that needed it, but a project with no secret-bearing tools still runs. Review this layering as one unit.
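The layering described above can be sketched as two small pieces: a provider that swallows outages and a strict caller that insists every declared secret is present. This is an illustrative sketch under assumptions: `failing_vault_fetch`, `BestEffortSecretProvider`, and `resolve_declared_secrets` are hypothetical names, and `MissingToolSecretError` is redefined locally as a stand-in for the SDK error.

```python
import asyncio
import logging
from typing import Mapping, Sequence

log = logging.getLogger("agent")


class MissingToolSecretError(Exception):
    """Local stand-in for the SDK error named in the comment above."""


async def failing_vault_fetch(names: Sequence[str]) -> Mapping[str, str]:
    # Hypothetical vault call that simulates an outage.
    raise ConnectionError("vault unreachable")


class BestEffortSecretProvider:
    """Best-effort layer: log and return {} instead of raising."""

    async def get_many(self, names: Sequence[str]) -> Mapping[str, str]:
        try:
            return await failing_vault_fetch(names)
        except Exception:  # broad on purpose: any failure downgrades to "missing"
            log.warning("secret resolve failed for %d name(s)", len(names))
            return {}


async def resolve_declared_secrets(
    provider: BestEffortSecretProvider, declared: Sequence[str]
) -> Mapping[str, str]:
    """Strict layer: every secret a tool declares must be present."""
    if not declared:
        return {}  # projects without secret-bearing tools are unaffected
    found = await provider.get_many(declared)
    missing = [name for name in declared if name not in found]
    if missing:
        raise MissingToolSecretError(f"{len(missing)} declared secret(s) missing")
    return {name: found[name] for name in declared}
```

With the simulated outage, a run that declares no secrets still resolves to an empty mapping, while a run that declares one raises `MissingToolSecretError`.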
        )
        log.warning("agent: %s", error)
        raise error
    reference = _normalize_reference(str(call_ref))
Specs are joined to references by call_ref here, not by list position. With the count check at line 123 and the duplicate-reference guards, a reordered or partial platform response is rejected instead of silently binding a tool to the wrong schema. This is the spot to verify the reference normalization (__ <-> .) matches the slug the service emits.
provider_key = ToolProviderKind.COMPOSIO.value

for segment in (ref.integration, ref.action, ref.connection):
    if not _SLUG_SEGMENT_RE.match(segment):
The slug-segment regex forbids __ inside a segment because /tools/call round-trips __ <-> . when parsing function names, so a __ in a segment would corrupt the split. Combined with the up-front resolve_connection_by_slug call, a missing/inactive/invalid connection fails the invoke before the agent loop starts. Confirm the regex is the only validation gate on these three segments.
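A small sketch of why a `__` inside a segment would corrupt the round-trip. The regex here is an assumed pattern written for the example (the real `_SLUG_SEGMENT_RE` is not shown in this thread), and `to_function_name`, `to_call_ref`, and `build_call_ref` are hypothetical helper names.

```python
import re

# Assumed pattern: alphanumeric runs joined by a single "_" or "-".
# It rejects "__" anywhere in a segment, which is the property that matters.
SLUG_SEGMENT_RE = re.compile(r"^[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*$")


def to_function_name(call_ref: str) -> str:
    """Dotted call_ref -> function-name form (hypothetical helper)."""
    return call_ref.replace(".", "__")


def to_call_ref(function_name: str) -> str:
    """Function-name form -> dotted call_ref (hypothetical helper)."""
    return function_name.replace("__", ".")


def build_call_ref(provider: str, integration: str, action: str, connection: str) -> str:
    """Validate each segment, then join with dots as in the PR's call_ref."""
    for segment in (integration, action, connection):
        if not SLUG_SEGMENT_RE.match(segment):
            raise ValueError(f"Invalid slug segment: {segment!r}")
    return f"tools.{provider}.{integration}.{action}.{connection}"


good = build_call_ref("composio", "gmail", "send_email", "work")
round_tripped = to_call_ref(to_function_name(good))

# An unvalidated segment containing "__" gains an extra dot on the way back.
bad = "tools.composio.gmail.send__email.work"
corrupted = to_call_ref(to_function_name(bad))
```

A single underscore survives the round-trip untouched, so only the doubled form needs to be rejected.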
Reviewer guide: interesting code
A map to the load-bearing lines. This is a functional slice on top of
url = os.getenv("AGENTA_AGENT_PI_URL")
cwd = str(wrapper_dir())
use_rivet = (
    runtime == "rivet"
This boolean is the whole routing policy: agenta on a non-local sandbox routes to rivet here even though the Agenta harness can't run there yet. The rejection is deferred to make_harness below, so this function stays a pure mapping that the unit test can lock down. Worth a comment cross-link so a future reader doesn't assume routing here implies support.
raw_specs = payload.get("custom") if isinstance(payload, dict) else None
if not isinstance(raw_specs, list):
    raw_specs = []
if len(raw_specs) != len(tools):
Strict cardinality check: one spec per ref. Combined with the duplicate-ref guards above and the per-ref specs_by_reference.get below, the resolver refuses to run an agent with a tool catalog that doesn't exactly match what was requested. This is the fail-fast half of the secret-resolver asymmetry described in the PR body.
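The checks named above (count match, duplicate guards, join by reference rather than position) can be sketched together as one function. This is a sketch under assumptions: `bind_specs` and `GatewayResponseError` are hypothetical names, and the `call_ref` field on each spec is assumed; only the `custom` payload key and the `__` to `.` normalization come from the thread.

```python
from typing import Any, Dict, List


class GatewayResponseError(Exception):
    """Hypothetical error for a gateway response that does not match the request."""


def normalize_reference(reference: str) -> str:
    # Map the function-name form (__) onto the dotted form (.).
    return reference.replace("__", ".")


def bind_specs(requested: List[str], payload: Any) -> Dict[str, Dict[str, Any]]:
    """Bind each requested reference to exactly one returned spec."""
    raw_specs = payload.get("custom") if isinstance(payload, dict) else None
    if not isinstance(raw_specs, list):
        raw_specs = []
    wanted = [normalize_reference(ref) for ref in requested]
    if len(set(wanted)) != len(wanted):
        raise GatewayResponseError("duplicate reference in request")
    if len(raw_specs) != len(wanted):
        raise GatewayResponseError("spec count does not match reference count")

    specs_by_reference: Dict[str, Dict[str, Any]] = {}
    for spec in raw_specs:
        call_ref = spec.get("call_ref") if isinstance(spec, dict) else None
        if not call_ref:
            raise GatewayResponseError("spec is missing call_ref")
        reference = normalize_reference(str(call_ref))
        if reference in specs_by_reference:
            raise GatewayResponseError("duplicate reference in response")
        specs_by_reference[reference] = spec

    bound: Dict[str, Dict[str, Any]] = {}
    for reference in wanted:
        spec = specs_by_reference.get(reference)
        if spec is None:
            raise GatewayResponseError(f"no spec returned for {reference}")
        bound[reference] = spec
    return bound
```

Because the join key is the reference, a reordered response still binds each tool to its own schema, while a partial or duplicated response is rejected.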
    action.schemas.inputs if action.schemas and action.schemas.inputs else None
)
name = ref.name or f"{ref.integration}__{ref.action}"
call_ref = (
The call_ref slug is the trust boundary: the platform returns this, never the provider key, and /tools/call parses it back. Note it uses raw __ joins for name two lines up but . for the call_ref; _SLUG_SEGMENT_RE forbids __ inside a segment precisely so the __<->. round-trip in /tools/call can't corrupt the split.
def record_usage(usage: Optional[Dict[str, Any]]) -> None:
    """Stamp the agent's token/cost totals onto the active ``/invoke`` workflow span.
record_usage writes gen_ai.usage.* directly on the active span instead of relying on the cumulative roll-up, because the harness exports its span tree in a separate OTLP batch. Without this the /invoke trace would show zero tokens/cost for the agent run. The early return on a missing total keeps it best-effort.
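One way to make this stamping easy to test is to compute the attributes in a pure function and apply them to the span in a second step. The sketch below assumes that split; `usage_attributes` and `RecordingSpan` are hypothetical names, and the span is a stand-in that models only `set_attribute`, not the real OpenTelemetry class. It treats only an absent total as missing, which is the variant proposed in the bot review later in this thread, not the truthiness check in the PR's current code.

```python
from typing import Any, Dict, Optional


def usage_attributes(usage: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Map a usage dict onto gen_ai.usage.* attributes; {} when there is no total."""
    if not usage or usage.get("total") is None:
        return {}
    input_tokens = int(usage.get("input") or 0)
    output_tokens = int(usage.get("output") or 0)
    attributes: Dict[str, Any] = {
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "gen_ai.usage.prompt_tokens": input_tokens,
        "gen_ai.usage.completion_tokens": output_tokens,
        "gen_ai.usage.total_tokens": int(usage["total"]),
    }
    cost = usage.get("cost")
    if cost is not None:
        attributes["gen_ai.usage.cost"] = float(cost)
    return attributes


class RecordingSpan:
    """Hypothetical span stand-in that records attributes in a dict."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def record_usage(span: RecordingSpan, usage: Optional[Dict[str, Any]]) -> None:
    """Best-effort: write nothing when usage is absent or has no total."""
    for key, value in usage_attributes(usage).items():
        span.set_attribute(key, value)
```

In the real handler the span would come from the tracing library's current-span lookup; passing it in explicitly here only keeps the example free of external dependencies.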
Actionable comments posted: 6
🧹 Nitpick comments (2)
api/oss/tests/pytest/unit/tools/test_agent_resolution.py (1)
21-48: ⚡ Quick win
Add regression coverage for provider and payload-shape edge cases
Please add tests for: (1) a gateway tool with provider != "composio" being rejected, and (2) non-array tools payloads (e.g. {}) failing validation. These two cases directly guard the new resolution boundary behavior.
Also applies to: 50-79
services/oss/src/agent/config.py (1)
65-72: ⚡ Quick win
tools is typed too narrowly in load_config().
agent.json tools can be structured objects, but the local variable is annotated as List[str]. Aligning it with AgentConfig.tools avoids incorrect assumptions in future edits.
Suggested fix
- tools: List[str] = []
+ tools: List[Any] = []
📒 Files selected for processing (35)
api/oss/src/apis/fastapi/tools/models.py
api/oss/src/apis/fastapi/tools/router.py
api/oss/src/core/tools/dtos.py
api/oss/src/core/tools/exceptions.py
api/oss/src/core/tools/service.py
api/oss/tests/pytest/unit/tools/__init__.py
api/oss/tests/pytest/unit/tools/test_agent_resolution.py
services/entrypoints/main.py
services/oss/src/agent/__init__.py
services/oss/src/agent/app.py
services/oss/src/agent/client.py
services/oss/src/agent/config.py
services/oss/src/agent/schemas.py
services/oss/src/agent/secrets.py
services/oss/src/agent/tools/__init__.py
services/oss/src/agent/tools/gateway.py
services/oss/src/agent/tools/resolver.py
services/oss/src/agent/tools/secrets.py
services/oss/src/agent/tracing.py
services/oss/tests/pytest/integration/__init__.py
services/oss/tests/pytest/integration/agent/__init__.py
services/oss/tests/pytest/integration/agent/conftest.py
services/oss/tests/pytest/integration/agent/test_resolve_secrets_http.py
services/oss/tests/pytest/integration/agent/tools/__init__.py
services/oss/tests/pytest/integration/agent/tools/test_gateway_http.py
services/oss/tests/pytest/integration/agent/tools/test_secrets_http.py
services/oss/tests/pytest/unit/__init__.py
services/oss/tests/pytest/unit/agent/__init__.py
services/oss/tests/pytest/unit/agent/conftest.py
services/oss/tests/pytest/unit/agent/test_invoke_handler.py
services/oss/tests/pytest/unit/agent/test_secrets_mapping.py
services/oss/tests/pytest/unit/agent/test_select_backend.py
services/oss/tests/pytest/unit/agent/tools/__init__.py
services/oss/tests/pytest/unit/agent/tools/test_gateway_mapping.py
services/oss/tests/pytest/unit/agent/tools/test_resolution.py
def _coerce_tools(cls, value: Any) -> List[AgentToolReference]:
    try:
        configs = coerce_tool_configs(value or []).tool_configs
    except ToolConfigurationError as exc:
Avoid falsy-coercing invalid payloads to an empty tools list
Using value or [] makes invalid falsy payloads (e.g. {}, 0, False) pass as [], so malformed requests can be silently accepted.
Suggested fix
  def _coerce_tools(cls, value: Any) -> List[AgentToolReference]:
+     if value is None:
+         raw_values = []
+     elif isinstance(value, list):
+         raw_values = value
+     else:
+         raise ValueError("tools must be an array")
      try:
-         configs = coerce_tool_configs(value or []).tool_configs
+         configs = coerce_tool_configs(raw_values).tool_configs
      except ToolConfigurationError as exc:
          raise ValueError(str(exc)) from exc
async def _resolve_composio_tool(
    self,
    *,
    project_id: UUID,
    ref: AgentComposioTool,
) -> ResolvedAgentTool:
    provider_key = ToolProviderKind.COMPOSIO.value

    for segment in (ref.integration, ref.action, ref.connection):
        if not _SLUG_SEGMENT_RE.match(segment):
            raise ToolSlugInvalidError(
                slug=f"{provider_key}.{ref.integration}.{ref.action}.{ref.connection}",
                detail=f"Invalid slug segment: {segment!r}",
            )

    # Fail fast if the connection is missing/inactive/invalid for this project.
    await self.resolve_connection_by_slug(
        project_id=project_id,
        provider_key=provider_key,
        integration_key=ref.integration,
        connection_slug=ref.connection,
    )

    action = await self.get_action(
        provider_key=provider_key,
        integration_key=ref.integration,
        action_key=ref.action,
    )
    if not action:
        raise ActionNotFoundError(
            provider_key=provider_key,
            integration_key=ref.integration,
            action_key=ref.action,
        )

    input_schema = (
        action.schemas.inputs if action.schemas and action.schemas.inputs else None
    )
    name = ref.name or f"{ref.integration}__{ref.action}"
    call_ref = (
        f"tools.{provider_key}.{ref.integration}.{ref.action}.{ref.connection}"
    )
Validate gateway provider before composing a composio call-ref
_resolve_composio_tool always forces provider_key="composio" but never checks ref.provider. A gateway reference with a different provider is currently accepted and silently remapped into the composio namespace, which can resolve/execute against the wrong integration space.
Suggested fix
  async def _resolve_composio_tool(
      self,
      *,
      project_id: UUID,
      ref: AgentComposioTool,
  ) -> ResolvedAgentTool:
      provider_key = ToolProviderKind.COMPOSIO.value
+     if ref.provider != provider_key:
+         raise ToolSlugInvalidError(
+             slug=f"{ref.provider}.{ref.integration}.{ref.action}.{ref.connection}",
+             detail=f"Unsupported gateway provider for /tools/resolve: {ref.provider!r}",
+         )
from agenta.sdk.engines.tracing.propagation import inject
# Budget for a backend round-trip (the tool catalog/connection check, the vault fetch).
TOOLS_TIMEOUT = float(os.getenv("AGENTA_AGENT_TOOLS_TIMEOUT", "30"))
Guard TOOLS_TIMEOUT parsing to avoid import-time crashes.
A non-numeric AGENTA_AGENT_TOOLS_TIMEOUT will raise ValueError at import time and fail the agent app startup path.
Suggested fix
- TOOLS_TIMEOUT = float(os.getenv("AGENTA_AGENT_TOOLS_TIMEOUT", "30"))
+ try:
+     TOOLS_TIMEOUT = float(os.getenv("AGENTA_AGENT_TOOLS_TIMEOUT", "30"))
+ except ValueError:
+     TOOLS_TIMEOUT = 30.0
            log.warning(
                "agent: named-secret resolve HTTP %s for %s",
                response.status_code,
                names,
            )
            return {}
        data = response.json() or {}
    except Exception:  # pylint: disable=broad-except
        log.warning("agent: named-secret resolve failed for %s", names, exc_info=True)
        return {}

    resolved = data.get("secrets") if isinstance(data, dict) else None
    resolved = resolved if isinstance(resolved, dict) else {}
    missing = [name for name in names if name not in resolved]
    if missing:
        log.warning("agent: unresolved named secret(s): %s", missing)
Avoid logging raw secret names in warning paths.
Line 43, Line 50, and Line 57 log secret identifiers directly. Even without values, names can expose internal integrations and tenant metadata in shared logs.
Proposed patch
- if response.status_code >= 400:
-     log.warning(
-         "agent: named-secret resolve HTTP %s for %s",
-         response.status_code,
-         names,
-     )
+ if response.status_code >= 400:
+     log.warning(
+         "agent: named-secret resolve HTTP %s for %d requested secret(s)",
+         response.status_code,
+         len(names),
+     )
      return {}
@@
- except Exception:  # pylint: disable=broad-except
-     log.warning("agent: named-secret resolve failed for %s", names, exc_info=True)
+ except Exception:  # pylint: disable=broad-except
+     log.warning(
+         "agent: named-secret resolve failed for %d requested secret(s)",
+         len(names),
+         exc_info=True,
+     )
      return {}
@@
- if missing:
-     log.warning("agent: unresolved named secret(s): %s", missing)
+ if missing:
+     log.warning("agent: unresolved named secret count=%d", len(missing))
resolved = data.get("secrets") if isinstance(data, dict) else None
resolved = resolved if isinstance(resolved, dict) else {}
missing = [name for name in names if name not in resolved]
if missing:
    log.warning("agent: unresolved named secret(s): %s", missing)
return {
    str(key): str(value) for key, value in resolved.items() if value is not None
}
Restrict returned secrets to the requested name set.
Line 59 currently returns every key in resolved. If upstream returns extras, this path propagates unrequested secrets into runtime memory.
Proposed patch
- return {
-     str(key): str(value) for key, value in resolved.items() if value is not None
- }
+ requested = {str(name) for name in names}
+ return {
+     str(key): str(value)
+     for key, value in resolved.items()
+     if value is not None and str(key) in requested
+ }
if not usage or not usage.get("total"):
    return
try:
    span = otel_trace.get_current_span()
    input_tokens = int(usage.get("input") or 0)
    output_tokens = int(usage.get("output") or 0)
    span.set_attribute("gen_ai.usage.input_tokens", input_tokens)
    span.set_attribute("gen_ai.usage.output_tokens", output_tokens)
    span.set_attribute("gen_ai.usage.prompt_tokens", input_tokens)
    span.set_attribute("gen_ai.usage.completion_tokens", output_tokens)
    span.set_attribute("gen_ai.usage.total_tokens", int(usage.get("total") or 0))
    cost = usage.get("cost")
    if cost:
        span.set_attribute("gen_ai.usage.cost", float(cost))
Zero-valued usage metrics are currently dropped.
Using truthy checks skips valid 0 totals/costs, so traces can’t distinguish “zero usage” from “missing usage”.
Suggested fix
- if not usage or not usage.get("total"):
+ if not usage or usage.get("total") is None:
      return
@@
- cost = usage.get("cost")
- if cost:
-     span.set_attribute("gen_ai.usage.cost", float(cost))
+ cost = usage.get("cost")
+ if cost is not None:
+     span.set_attribute("gen_ai.usage.cost", float(cost))
mmabrouk left a comment
Codex subagent review for #4772
Blocking finding:
services/oss/src/agent/app.py:63 / services/oss/src/agent/config.py:40: the default local path selects InProcessPiBackend(url=None, cwd=wrapper_dir()), and wrapper_dir() defaults to services/agent. At this PR's head (feat/agent-service), services/agent/src/cli.ts, services/agent/package.json, and services/agent/src/server.ts are not present; those files land in the runner PRs (#4773/#4778). After merging only #4771 + #4772, the newly mounted /agent/v0/invoke path will try to spawn pnpm exec tsx src/cli.ts from a missing package whenever AGENTA_AGENT_PI_URL is unset, so the service is not independently runnable against its declared base. The HTTP sidecar path also depends on the runner/hosting stack being present. Please either stack/retarget this PR on the runner PR that provides the runtime, include the runtime dependency here, or gate /agent/v0/ backend selection until a runner package or sidecar URL is available, returning a clear 503/validation error with coverage.
Related cross-PR issue:
services/oss/src/agent/app.py:149 / services/oss/src/agent/schemas.py:36: related to the #4775 frontend path, this PR mounts a directly registered workflow and the comment explicitly says there is no first-class builtin URI yet (agenta:builtin:agent:v0). /inspect can describe the mounted handler, but I do not see a catalog template/registration that would make an unconditional frontend "create Agent" option work. Please either add that backend surface here, gate the frontend create option until it lands, or update the PR/stack docs so #4775 does not depend on a template this PR does not provide.
I also checked the tool resolver/call-path refactor, vault/gateway adapters, and streaming setup/cleanup path from the patch and did not find another blocker. I did not run tests; this review is based on the GitHub PR metadata, patches, and branch files. The PR body's stack map still points to #4774/#4777 while the newer runner/docs PRs are #4778/#4779, which is contributing to the dependency confusion above.
Agent-workflows: functional PR set
Sliced by functional area, final code only (no intermediate churn). Most PRs are independent off main; two pairs are stacked. This PR's base is #4771 (review that first).
Context
The agent workflow service is the Agenta app that runs an agent turn. It registers the agent as a normal Agenta workflow (ag.create_app + ag.workflow + ag.route), so it exposes /invoke and /inspect like the chat and completion services, plus the agent-only /messages and /load-session routes. The handler reads the agent config from parameters, resolves tools and provider secrets server-side, threads the trace context, picks a backend, and runs one turn.
This PR also adds the platform-side tool-resolution API the service calls back into. POST /tools/resolve turns an agent's tool references into model-ready specs without ever handing provider keys to the running agent.
This is a functional slice that shows the final code, not the path we took to get there. It stacks on the SDK runtime PR feat/agent-sdk-runtime, which owns the engine-agnostic ports (backend, environment, harness) and their adapters. This PR is the thin Agenta integration that feeds those ports resolved tools, vault secrets, and a trace context.
What this changes
The service composes the SDK's offline tool resolver with two server-only adapters. A gateway resolver calls POST /tools/resolve for Composio tools. A vault secret provider calls POST /secrets/resolve for named secrets. Both fail fast: a missing API base or a non-200 from the gateway raises rather than running an agent with half its tools.
select_backend makes the engine-routing decision explicit. Before, the harness and sandbox choices could be silently dropped. Now pi and agenta stay on the in-process Pi backend on a local sandbox. A claude harness, any non-local sandbox, or AGENTA_AGENT_RUNTIME=rivet routes to the rivet backend. The transport to the TypeScript runner follows AGENTA_AGENT_PI_URL: set means HTTP to the sidecar, unset means spawn the runner CLI.
On the platform side, call_tool and the new agent resolver previously duplicated connection lookup and validation. That logic now lives in one resolve_connection_by_slug method, so the call path and the resolve path raise the same precise errors: missing, inactive, invalid, or no provider handshake.
The agent advertises an agent_config catalog type through /inspect. The playground renders one composite control for the whole config and pre-fills it from the schema default.
Key architectural decision to review
Backend gating is split across two files, and that split is deliberate. select_backend in services/oss/src/agent/app.py only routes. It never raises. It sends agenta on a non-local sandbox to the rivet backend even though the Agenta harness cannot run there yet. The actual rejection happens one line later, when make_harness builds the harness and the harness validates its backend and raises UnsupportedHarnessError. Scrutinize whether routing and validation belong in separate places. The tradeoff: select_backend stays a pure mapping that is trivial to unit test, and the harness owns the truth about which backends it supports. The risk: a reader skimming select_backend sees agenta routed to rivet and may assume it runs there.
The second decision is the trust boundary in tool resolution. The platform never returns provider keys. resolve_agent_tools in api/oss/src/core/tools/service.py returns only a call_ref slug (tools.{provider}.{integration}.{action}.{connection}), and the running agent calls back through POST /tools/call to execute. The _SLUG_SEGMENT_RE validation matters here. __ is forbidden in a segment because /tools/call round-trips __ to . when parsing function names, so a __ inside a segment would corrupt the split. Check that the regex and the call-ref construction agree.
How to review this PR
Read services/oss/src/agent/app.py first. The _agent handler is the whole control flow: parse config, resolve tools and secrets, build a session config, pick a backend, run one turn batch or streaming. Confirm select_backend matches the routing table in its test, and confirm the streaming path owns setup and cleanup correctly.
Then read the tool adapters. tools/gateway.py is the largest file and the one to read closely. It validates the gateway response shape hard: one spec per ref, no duplicates, every ref returned. tools/resolver.py shows the composition. tools/secrets.py and secrets.py are short HTTP clients.
Then schemas.py and tracing.py. In tracing, check that record_usage stamps gen_ai.usage.* on the active span, because the harness exports its own OTLP batch and the per-batch roll-up cannot bridge totals onto the workflow span.
Then api/oss/src/core/tools/service.py: the new resolver and the extracted resolve_connection_by_slug.
Skip the test files on a first pass. They read as a spec for the behavior above.
Likely regression: the call_tool refactor. It now delegates connection lookup to resolve_connection_by_slug. Confirm the error-to-HTTP-status mapping in the router did not change for the call path.
Tests / notes
Unit tests cover select_backend routing, the invoke handler, gateway and secret mapping, and tool resolution. Integration tests exercise the gateway and secret HTTP adapters against a mocked backend. Platform tests cover the resolver, the legacy-shape coercion, and the stable call-ref.
Both secret resolvers are best-effort by design. An empty vault returns no env vars, and the harness falls back to its own login or OAuth. The gateway resolver is the opposite. It fails fast, because an agent missing a configured tool should not run silently. Watch that this asymmetry is intended when reviewing failure handling.