
RFC: Agent protocol support for apx apps #142

Open

stuart-gano wants to merge 43 commits into databricks-solutions:main from stuart-gano:rfc/agent-protocol

Conversation


@stuart-gano stuart-gano commented Mar 24, 2026

Summary

This RFC adds the agent protocol addon to APX — everything needed to build, run, and deploy Databricks-native AI agents with one command.

What's in this PR

Agent runtime (core/agent.py)

Agent type hierarchy

Type              Description
LlmAgent / Agent  Tool-calling loop over FMAPI. Alias: Agent = LlmAgent.
SequentialAgent   Runs sub-agents one after another, piping output as input.
ParallelAgent     Runs sub-agents concurrently, merges outputs.
LoopAgent         Repeats a sub-agent until it calls finish_loop() or max_iterations is hit.
RouterAgent       One upfront FMAPI call selects a sub-agent; synthetic transfer tools never enter the dispatch pipeline.
HandoffAgent      Agents pass control to each other via real ASGI transfer_to_* routes.

LlmAgent features

  • before_tool / after_tool hooks — sync or async callables, called around every tool dispatch
  • input_guardrails / output_guardrails — lists of sync/async callables; return None (pass) or str (short-circuit with that message)
  • context_window_tokens — budget cap; when exceeded, the middle of the history is summarized with one extra LLM call
  • custom_outputs — set_custom_output(request, key, value) helper; surfaced in InvocationResponse.custom_outputs and as event: custom_outputs in SSE
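
The guardrail contract above (return None to pass, str to short-circuit with that message, sync or async callables both accepted) can be sketched as a small standalone helper. The function and guardrail names here are illustrative, not the APX API:

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Optional, Union

# A guardrail returns None (pass) or a str (short-circuit with that message).
GuardrailFn = Callable[[str], Union[Optional[str], Awaitable[Optional[str]]]]

async def apply_guardrails(guards: list[GuardrailFn], text: str) -> Optional[str]:
    """Run guardrails in order; return the first short-circuit message, if any."""
    for guard in guards:
        result = guard(text)
        if inspect.isawaitable(result):  # sync and async callables both accepted
            result = await result
        if result is not None:
            return result
    return None

def block_pii(text: str) -> Optional[str]:
    return "Input rejected: contains an SSN." if "SSN" in text else None

async def block_empty(text: str) -> Optional[str]:
    return "Input rejected: empty message." if not text.strip() else None

print(asyncio.run(apply_guardrails([block_pii, block_empty], "My SSN is 123")))
print(asyncio.run(apply_guardrails([block_pii, block_empty], "What is my bill?")))
```

The same first-non-None scan applies to both input_guardrails (before _run_llm_loop) and output_guardrails (after it).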

Composition patterns match Google ADK and OpenAI Agents SDK feature-for-feature, plus Databricks-native additions:

  • On-behalf-of auth wired end-to-end — every tool receives the caller's identity token
  • Zero-config MCP SSE server at /mcp/sse
  • A2A discovery via /.well-known/agent.json (live url + mcpEndpoint)
  • WorkspaceClient / UserWorkspaceClient injectable as typed FastAPI deps in tool functions
  • apx deploy — one command to production
  • app_predict_fn MLflow eval bridge
  • Built-in dev UI at /_apx/agent and /_apx/tools

Dev UI namespace: /_apx/*

/_apx/ is the APX platform tooling namespace (underscore prefix signals "platform layer, not app layer").

/_apx/agent — interactive chat UI

  • Send messages, stream responses
  • Tool call trace panel showing args, results, and timing
  • Inspect registered skills
  • Copy MCP SSE URL for Claude Desktop / Cursor
  • Nav link to /_apx/tools

/_apx/tools — tool inspector and live invocation form

  • Left sidebar: all tools grouped Local / Remote with type badges
  • Schema tab: inputSchema as the LLM sees it (dep-injected params stripped, FMAPI JSON, syntax highlighted) — different from /docs which shows all FastAPI params
  • Invoke tab: form auto-generated from inputSchema, POSTs to /api/tools/<name> or sub-agent /invocations, shows result with timing
  • Nav link back to /_apx/agent

Protocol endpoints

GET  /.well-known/agent.json   A2A discovery (name, skills, mcpEndpoint, url)
POST /invocations               FMAPI tool-calling loop; stream=true for SSE
GET  /health                    Liveness
POST /api/tools/<fn_name>       One route per registered tool
GET  /mcp/sse                   MCP SSE transport
POST /mcp/messages/             MCP SSE return channel
GET  /_apx/agent                Dev chat UI
GET  /_apx/tools                Tool inspector + invocation
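
A client consuming the stream=true response from /invocations only needs standard SSE event:/data: framing. A minimal parser sketch (the event names output_text.delta and custom_outputs follow the description above; the exact data payload shape is an assumption):

```python
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Split an SSE payload into (event, data) pairs; blank lines delimit events."""
    events = []
    event_name, data_lines = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            events.append((event_name, json.loads("\n".join(data_lines))))
            event_name, data_lines = "message", []
    return events

raw = (
    "event: output_text.delta\ndata: {\"delta\": \"Hel\"}\n\n"
    "event: output_text.delta\ndata: {\"delta\": \"lo\"}\n\n"
    "event: custom_outputs\ndata: {\"score\": 0.9}\n\n"
)
chunks = [d["delta"] for e, d in parse_sse(raw) if e == "output_text.delta"]
print("".join(chunks))  # → Hello
```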

Deployment (crates/cli/src/deploy.rs)

  • apx deploy command — builds wheel, packages app, deploys to Databricks Apps, polls until running
  • No DABs YAML, no separate CI step

README

Added "Why APX for agents?" section calling out the seven Databricks-native advantages vs Google ADK and OpenAI Agents SDK.

End-to-end validation: Energy billing agent

Validated the full agent flow against a common customer pattern — an energy billing Q&A agent with Lakebase-backed tools:

  • 5 tools: get_customer_profile, query_ami_readings, get_billing_summary, get_rate_schedule, compare_months
  • All tools use Dependencies.UserClient (OBO auth) → Lakebase Provisioned via generate_database_credential
  • LLM model: databricks-claude-sonnet-4-6 via FMAPI
  • Streaming SSE via /_apx/agent chat UI — confirmed working end-to-end

Bugs found and fixed during validation:

  1. OBO token not forwarded in tool dispatch — _dispatch_tool_call only passed Authorization to internal ASGI calls, dropping X-Forwarded-Access-Token. Tools using Dependencies.UserClient failed with ValueError: OBO token is not provided. Fixed by forwarding OBO headers.
  2. JS syntax error in /_apx/agent — Python \n in an f-string produced a literal newline inside a JS string, breaking the chat UI entirely. Fixed by escaping to \\n.
  3. /_apx/* namespace conflict (prior commit) — /_apx routes collided with app routes; fixed by separating the router merge.
  4. /invocations not proxied (prior commit) — dev proxy only injected OBO tokens for /api/*, not root-level agent routes; fixed by adding api_utils_router.

Test plan

  • Agent(tools=[...]) registers routes, /_apx/agent loads
  • /_apx/agent chat UI: send message → streaming SSE response renders correctly
  • /_apx/agent chat UI: tool calls display in trace panel with args/result/timing
  • /_apx/tools shows correct FMAPI schemas (dep-stripped), Invoke tab POSTs and returns result
  • OBO token forwarded through internal ASGI tool dispatch — Dependencies.UserClient works in tools called via /invocations
  • Dev proxy injects X-Forwarded-Access-Token for both /api/* and root-level routes (/invocations, /.well-known/agent.json)
  • End-to-end: user question → LLM → tool call → Lakebase query → LLM → streamed answer
  • apx init --addon agent scaffolds correctly (existing integration test)
  • LoopAgent loops until finish_loop() is called
  • RouterAgent routes to correct sub-agent without registering transfer tools in MCP
  • HandoffAgent transfers between agents via transfer_to_*
  • Guardrails short-circuit correctly; hooks fire before/after tool dispatch
  • custom_outputs appear in InvocationResponse and SSE stream
  • context_window_tokens triggers summarization at budget
  • apx deploy deploys and polls to RUNNING

This pull request was AI-assisted.

Proposes generating agent protocol endpoints (invocations, A2A
discovery, MCP tools, eval bridge) from existing apx routes via
pyproject.toml configuration. Routes are tools — no new abstractions.

Implemented as a LifespanDependency addon following the same pattern
as SQL and Lakebase addons.

Co-authored-by: Isaac
Adds addons/agent/ following the same pattern as sql and lakebase:
- addon.toml with Dependencies.Agent type alias
- LifespanDependency that reads [tool.apx.agent] from pyproject.toml
- Builds tool registry from app's OpenAPI spec (routes are tools)
- Generates /.well-known/agent.json (A2A discovery card)
- Generates /invocations (agent protocol dispatch to routes)
- Generates /health (liveness probe)

Zero application code changes — configure via pyproject.toml, existing
routes with operation_id automatically become agent tools.

Co-authored-by: Isaac
49 checks covering _inspect_tool_fn, _make_input_model, Agent.build_local_tools,
_build_fmapi_tool_schemas, build_router signature patching, structured output,
protocol models, and A2A card generation.

Runs directly with python3 — no APX wheel build required.
- Remove __signature__ from _ToolFn Protocol; regular Python functions
  satisfy __name__ + __doc__ but not __signature__ as a direct attr
- Change _patch_handler_signature handler param to Any (does dynamic
  attr assignment, not Protocol reads)
- Change _make_route_handler return type to Any (returns async coroutines)
- Fix type: ignore comment from mypy syntax [call-overload] to bare
  type: ignore for ty compatibility on create_model call
- Add rust-embed include-exclude feature + exclude pyc/__pycache__ from
  embedded templates to prevent template-not-found errors at scaffold time
- Add httpx>=0.27.0 as Python dependency in agent addon.toml
- Add get_root_routers() to LifespanDependency base class + factory
  for protocol routes that must live at / not /api/
- Agent get_root_routers(): /.well-known/agent.json, /invocations, /health
- Agent get_routers(): /api/tools/* (api-prefixed tool routes)
- Move addon pyproject.toml.jinja2 to addon root (was in src/base/,
  which mapped to src/{app_slug}/ instead of project root)
- Result: [tool.apx.agent] config correctly written at scaffold time,
  enabling AgentContext lifespan initialization
Builds the project and deploys to Databricks Apps via the Databricks
CLI, then polls until the app reaches RUNNING state.

- apx deploy [APP_PATH] [--skip-build] [--profile P] [--build-path P]
- Reads DATABRICKS_CONFIG_PROFILE from .env if --profile not given
- Polls databricks apps get every 3s (up to 3 min) for RUNNING state
- Reports ERROR/CRASHED states with hint to check logs
- Extracts run_build() from build.rs so deploy can reuse it
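
The poll-until-RUNNING behaviour (every 3 s, up to 3 min, fail fast on ERROR/CRASHED) reduces to a simple loop. Sketched here in Python with a stubbed status getter, since the real implementation lives in deploy.rs:

```python
import time
from typing import Callable

def poll_until_running(get_state: Callable[[], str],
                       interval_s: float = 3.0,
                       timeout_s: float = 180.0,
                       sleep=time.sleep) -> str:
    """Poll the app state until RUNNING, or fail fast on ERROR/CRASHED."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state == "RUNNING":
            return state
        if state in ("ERROR", "CRASHED"):
            raise RuntimeError(f"App entered {state}; check the app logs.")
        sleep(interval_s)
    raise TimeoutError("App did not reach RUNNING within the timeout.")

# Stub: the app becomes RUNNING on the third poll.
states = iter(["DEPLOYING", "STARTING", "RUNNING"])
print(poll_until_running(lambda: next(states), sleep=lambda _: None))  # → RUNNING
```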
Adds a self-contained HTML/JS chat interface at /_agent for
interactively testing agents during local development, inspired
by Google ADK's `adk web` experience:

- Fetches agent name/description/skills from /.well-known/agent.json
  context at render time (no round-trip needed)
- Streams responses via SSE: sends InvocationRequest with stream=true
  and reads output_text.delta events token by token
- Maintains full conversation history client-side and sends it each
  request (stateless agent, stateful client)
- Shows registered skills in a collapsible panel
- Auto-resizing textarea, Enter to send, Shift+Enter for newlines
- Dark theme matching APX style

Also fixes build.rs to skip UI build when the project has no frontend
(pure-API agent projects), guarded by meta.has_ui().
…g capability

- Remove unused JSONResponse import
- Fix skills_json construction: !r produces Python repr (single-quoted
  strings), which is invalid JSON. Switch to json.dumps() so the browser
  can actually parse the skills array without error
- Set A2ACapabilities.streaming = True — /invocations supports stream: true
  via SSE, so the discovery card should advertise it correctly
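
The !r-vs-json.dumps bug is easy to reproduce: Python repr single-quotes strings, which JSON parsers reject, while json.dumps emits valid JSON:

```python
import json

skills = ["billing-qa", "rate-lookup"]

bad = f"const skills = {skills!r};"            # repr: single quotes — invalid JSON
good = f"const skills = {json.dumps(skills)};"  # valid JSON for the browser

print(bad)
print(good)

# A browser's JSON.parse (like Python's json.loads) accepts only the second form.
try:
    json.loads(str(skills))
except json.JSONDecodeError:
    print("repr form is not valid JSON")
```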
… test

1. Auto-discover agent_router.py — _AgentDependency.get_routers() now
   auto-imports {backend_pkg}.agent_router via importlib if _agent_instance
   is None. This removes the need for the addon to overwrite app.py with a
   side-effect import, so existing app.py customisations are preserved.
   The addon's app.py template is deleted.

2. /_agent setup banner — when AgentContext is None (missing pyproject
   config or no Agent() call), the dev UI now shows a clear amber banner
   with setup instructions instead of silently sending to a 404.

3. Rename Dependencies alias Agent → AgentContext — avoids a confusing
   collision between the Agent builder class (used in agent_router.py)
   and the route-parameter dependency type (used in FastAPI handlers).
   Updated doc: `ctx: Dependencies.AgentContext`.

4. Integration test for agent addon — test_init_with_agent_addon verifies
   that `apx init --addons agent` scaffolds agent_router.py, core/agent.py,
   [tool.apx.agent] in pyproject.toml, httpx dep, and no ui/ directory.
Exposes all registered Agent tools as an MCP server over SSE transport,
mounted at /mcp/sse (GET) + /mcp/messages/ (POST).

- _build_mcp_components(ctx, app): builds mcp.server.Server + SseServerTransport
  from the AgentContext tool registry. Tool calls dispatch via ASGI to the
  existing /api/tools/<name> routes so FastAPI dep injection (auth, workspace
  client) applies identically to MCP and REST callers.
- Lifespan wires the MCP server onto app.state; gracefully skips with a warning
  if the mcp package isn't installed.
- /_agent dev UI gains an MCP info bar: shows the SSE URL computed from
  window.location.origin with a one-click copy button.
- addon.toml: adds mcp>=1.0.0 to Python dependencies.

Claude Desktop config:
  {"mcpServers": {"my-agent": {"transport": "sse", "url": "http://localhost:8000/mcp/sse"}}}
…t.json

- Add mcpEndpoint field to AgentCard — populated at request time with
  "{base_url}/mcp/sse" when the MCP server is active, null otherwise.
- Populate card.url from request.base_url — was always "" before, which
  breaks A2A clients that use the card to self-locate the agent.
- Both fields are filled via model_copy() in the route handler so the
  stored ctx.card template stays clean (no request dependency at lifespan).
_dispatch_tool_call posted to /tools/<fn> but the actual routes live at
{api_prefix}/tools/<fn> (e.g. /api/tools/<fn>) because build_router()
returns a router that gets included under api_router which carries the
prefix. The LLM tool-calling loop would 404 on every tool call.

Fix: import api_prefix from ..._metadata (same pattern as _factory.py)
and use f"{api_prefix}/tools/{fn_name}" — matching what the MCP
dispatch already did correctly.
…gent hierarchy

Adds ADK-style agent composition types alongside the existing LlmAgent:

  SequentialAgent([planner, writer])   — chains agents, each sees prior output
  ParallelAgent([legal, finance])      — runs all concurrently, merges results

Key design changes:

- BaseAgent abstract base: run(), stream(), get_tool_routers(), collect_tools(),
  fetch_remote_tools() — any custom orchestration pattern can subclass this.

- LlmAgent replaces Agent (Agent = LlmAgent alias kept for backwards compat).
  __init__ no longer sets _agent_instance — sub-agents in a composite no longer
  accidentally override the root registration.

- _auto_import_agent_router now looks for a module-level `agent` variable of
  type BaseAgent in agent_router.py rather than relying on __init__ side-effects.
  Explicit assignment makes intent clear and supports all agent types:
      agent = SequentialAgent([LlmAgent(tools=[a]), LlmAgent(tools=[b])])

- AgentContext carries the root agent instance (ctx.agent). _handle_invocation
  delegates to ctx.agent.run() / ctx.agent.stream() — no agent-type-specific
  code in the protocol layer.

- _run_llm_loop now takes list[Message] instead of InvocationRequest, making
  it callable from LlmAgent.run() without constructing a fake request body.

- Lifespan uses collect_tools() + fetch_remote_tools() instead of the
  LlmAgent-specific build_local_tools() / fetch_sub_agent_tools().

Usage in agent_router.py:
    planner = LlmAgent(tools=[search, outline])
    writer  = LlmAgent(tools=[draft])
    agent   = SequentialAgent([planner, writer])   # ← registered as root
_run_llm_loop now accepts an optional `tools` parameter. LlmAgent.run()
and LlmAgent.stream() pass self.collect_tools() so each LlmAgent in a
SequentialAgent or ParallelAgent hierarchy only exposes its own tools
to FMAPI, preventing cross-agent tool leakage.
- Async tool support: _make_route_handler now checks iscoroutinefunction
  and awaits the tool fn when it's a coroutine; sync tools unchanged

- MCP tool dispatch path: _build_mcp_components imported api_prefix from
  _metadata so /mcp tool calls hit the correct {api_prefix}/tools/{name}
  route instead of hardcoded /api/tools/{name}

- Instructions / system prompt: AgentConfig gains an optional `instructions`
  field (maps to pyproject.toml [tool.apx.agent]); LlmAgent.__init__ accepts
  an `instructions` kwarg that overrides the config value per-agent.
  _run_llm_loop prepends a system message when instructions are non-empty.
  Useful for per-agent persona in SequentialAgent/ParallelAgent compositions.
- InvocationRequest.input now accepts list[Message] | str; a plain string
  is coerced via .messages() so MLflow eval harness and curl one-liners
  work without wrapping in a list

- app_predict_fn gains an optional token param that adds Authorization:
  Bearer <token> to every request — required for OBO-protected Databricks
  Apps during mlflow.genai.evaluate()

- MLflow tracing: _handle_invocation opens a root CHAIN span per request;
  each FMAPI call opens a child LLM span; each tool dispatch opens a child
  TOOL span. All attributes (model, messages, result) are set on the spans.
  Tracing is no-op when mlflow is not importable so the addon remains
  usable in plain FastAPI dev without a tracking server.
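
The str-to-message coercion described above is a one-branch normalisation; sketched with a plain dataclass standing in for the Pydantic Message model (names illustrative):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Message:
    role: str
    content: str

def coerce_input(value: Union[str, list[Message]]) -> list[Message]:
    """Accept a bare string (curl one-liners, eval harness) or a message list."""
    if isinstance(value, str):
        return [Message(role="user", content=value)]
    return value

print(coerce_input("hello"))
print(coerce_input([Message("user", "hi"), Message("assistant", "hello")]))
```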
- MLflow span leak on exception: all LLM and TOOL spans now use
  try/finally to guarantee span.end() on error paths; root CHAIN span
  in _handle_invocation also wrapped in try/finally

- AgentConfig gains temperature, max_tokens (both optional, None = model
  default), and max_iterations (default 10). All three are documented as
  comments in the scaffolded pyproject.toml.

- LlmAgent.__init__ accepts the same three params to override per-agent
  within a SequentialAgent/ParallelAgent composition.

- _run_llm_loop resolves precedence: constructor arg > AgentConfig >
  model default. Builds fmapi_extra dict only with non-None values so
  missing fields are not sent to FMAPI at all.
…tom_inputs

- Root span set_attribute-after-end: moved set_attribute inside the
  try/finally block so it runs before end() in all paths including errors

- result undefined if _dispatch_tool_call raises: initialise result = ""
  before the try block so messages.append never hits a NameError

- SequentialAgent/ParallelAgent now accept an optional instructions param.
  When set, a system message is prepended to the conversation before any
  sub-agent runs — framing the whole pipeline without overriding each
  LlmAgent's own system prompt.

- custom_inputs wired up: _handle_invocation stashes custom_inputs on
  request.state; _run_llm_loop reads custom_inputs["instructions"] as the
  highest-priority system prompt override (> constructor > AgentConfig).
  custom_inputs also recorded as a span attribute on the root CHAIN span.
  InvocationRequest.instructions_override() helper added for callers.
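
The highest-priority-wins resolution for the system prompt (custom_inputs > constructor > AgentConfig) is a first-non-None scan; an illustrative sketch, not the APX source:

```python
from typing import Optional

def resolve_instructions(custom_inputs: dict,
                         ctor_instructions: Optional[str],
                         config_instructions: Optional[str]) -> Optional[str]:
    """custom_inputs['instructions'] wins, then the constructor, then config."""
    for candidate in (custom_inputs.get("instructions"),
                      ctor_instructions,
                      config_instructions):
        if candidate:
            return candidate
    return None

print(resolve_instructions({"instructions": "be terse"}, "persona", "cfg"))  # → be terse
print(resolve_instructions({}, "persona", "cfg"))                            # → persona
```

The same pattern resolves temperature / max_tokens / max_iterations (constructor arg > AgentConfig > model default), with None-valued fields omitted from the FMAPI payload entirely.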
- _load_agent_config: prefer __file__-relative pyproject.toml search over
  cwd-relative; cwd may be unrelated in deployed Databricks Apps. Falls
  back to cwd walk for interactive/test use.

- FMAPI tools payload: omit `tools` key entirely when tool list is empty
  instead of sending `"tools": []`; some FMAPI backends reject the empty array.

- SSE error events: wrap ctx.agent.stream() in try/except inside the
  generator; on exception yield `event: error\ndata: {...}` and log,
  then close the span in finally. Previously the stream silently stopped
  and the client UI hung indefinitely.

- MCP auth forwarding: mcp_sse handler captures the incoming Authorization
  header onto app.state.mcp_auth_header; _call_tool reads it and forwards
  it when making ASGI requests to tool routes, giving MCP clients the same
  OBO token context as REST callers.

- app_predict_fn docstring: fix import path from `apx.agent` (wrong) to
  `{{app_slug}}.backend.core.agent` (correct rendered module path).

- LlmAgent.stream(): add comment clarifying that streaming is simulated
  (full response then chunked) because FMAPI lacks per-token streaming.
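
Wrapping the stream generator so clients receive an explicit error event instead of a silent stop looks roughly like this (a sketch, not the APX source; span handling omitted):

```python
import json

def error_safe_sse(inner):
    """Yield SSE frames from `inner`; on exception, emit an `error` event and stop."""
    def gen():
        try:
            for chunk in inner:
                yield f"event: output_text.delta\ndata: {json.dumps({'delta': chunk})}\n\n"
        except Exception as exc:
            # Previously the stream just ended here and the client UI hung.
            yield f"event: error\ndata: {json.dumps({'message': str(exc)})}\n\n"
    return gen()

def flaky():
    yield "partial "
    raise RuntimeError("FMAPI call failed")

frames = list(error_safe_sse(flaky()))
print(frames[-1])  # the last frame is an explicit error event
```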
…xt window

- LoopAgent: runs an LlmAgent in a loop until the finish_loop() tool is called or
  max_iterations is reached; finish_loop registered as a real ASGI route so it
  shares the same dispatch path as all other tools

- Tool hooks: before_tool(name, args) and after_tool(name, args, result) on
  LlmAgent; sync and async callables both accepted; fire around every tool
  dispatch in _run_llm_loop

- Guardrails: input_guardrails and output_guardrails on LlmAgent; each is a
  list of callables returning None (pass) or str (short-circuit with that text);
  applied in LlmAgent.run() and stream() before/after _run_llm_loop

- custom_outputs: set_custom_output(request, key, value) helper lets tool
  functions surface structured data alongside text; _handle_invocation
  initialises request.state.custom_outputs and includes it in
  InvocationResponse.custom_outputs; SSE path emits a custom_outputs event

- Context window management: context_window_tokens on LlmAgent; _maybe_trim_context
  estimates token usage (4 chars/token), keeps system messages + last 2 messages
  intact, and summarises the middle with a single LLM call when budget exceeded

- Type aliases: BeforeToolHook, AfterToolHook, InputGuardrailFn, OutputGuardrailFn
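
The budget check described above (≈4 chars/token, keep system messages and the last two turns, summarise the middle with one LLM call) can be sketched with the summariser stubbed out; names are illustrative:

```python
def estimate_tokens(messages: list[dict]) -> int:
    # Rough heuristic from the description: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def maybe_trim_context(messages, budget_tokens, summarize):
    """If over budget, replace the middle with a single summary message."""
    if estimate_tokens(messages) <= budget_tokens:
        return messages
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    middle, tail = rest[:-2], rest[-2:]   # keep the last 2 messages intact
    if not middle:
        return messages
    summary = {"role": "system", "content": summarize(middle)}  # one extra LLM call
    return system + [summary] + tail

msgs = [{"role": "system", "content": "be brief"}] + [
    {"role": "user", "content": "x" * 400} for _ in range(5)
]
trimmed = maybe_trim_context(msgs, budget_tokens=100,
                             summarize=lambda ms: f"[summary of {len(ms)} messages]")
print(len(trimmed))  # system + summary + last 2 messages
```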
- RouterAgent: routes to one of several named sub-agents via a single upfront
  FMAPI call with synthetic transfer_to_<name> tools; no routes registered —
  the Python layer intercepts tool_calls directly; falls back to first agent
  when LLM does not call a transfer tool

- HandoffAgent: agents dict + start key; each active agent receives
  transfer_to_<name> tools for every other agent injected into its own tool
  list; transfer routes registered as real ASGI endpoints (same signal
  pattern as LoopAgent.finish_loop) so FastAPI dep injection is preserved;
  handoff_to is set on request.state and checked after each _run_llm_loop
  call; supports up to max_handoffs transfers before stopping

- _TransferBody Pydantic model shared by HandoffAgent transfer handlers

Both types honour LlmAgent hooks (before_tool/after_tool) and
context_window_tokens of the currently active sub-agent.
Adds a second page to the APX dev tooling namespace: /_apx/tools.

Shows every registered tool exactly as the LLM sees it — dep-injected
parameters stripped, FMAPI inputSchema rendered with syntax highlighting.
An Invoke tab auto-generates a form from the schema and POSTs to the tool's
/api/tools/<name> endpoint (or sub-agent /invocations), displaying the
result with timing.

Both /_apx/agent and /_apx/tools now have a nav bar linking between them.
Also wires the route in get_root_routers() and updates the module docstring.
GET /_apx/probe?url=https://api.example.com makes a server-side GET and
returns HTTP status, latency, content-type, server header, redirect count,
and structured error details (ConnectError, Timeout, SSLError).

Because the request runs from the server process, results reflect the
deployed app's actual network path — useful for diagnosing egress
restrictions on Databricks Apps before writing tool code.

Also adds Probe to the nav bar on /_apx/agent and /_apx/tools.
- Add _build_apx_openapi_spec() — generates an OpenAPI 3.1 spec containing
  only tool endpoints with dep-stripped schemas (what the LLM sees, not what
  FastAPI sees with WorkspaceClient etc.)
- Serve the filtered spec at /_apx/openapi.json
- Replace 350-line hand-rolled tools UI with Scalar CDN embed (kepler theme)
  pointing at /_apx/openapi.json — sidebar, schema display, and try-it panel
  are now best-in-class without maintaining bespoke JS/CSS
- Fix APX nav bar overlaying Scalar, using position:fixed + a ResizeObserver
The Rust dev server nested its own control router at /_apx (for
/health, /logs, /stop), which consumed all /_apx/* traffic and
returned 404 for the Python-side APX dev UI routes.

Add /_apx/agent, /_apx/tools, /_apx/probe, /_apx/openapi.json,
and /invocations as direct routes in api_utils_router. Axum gives
specific .route() registrations priority over .nest() prefix
matches, so these paths now reach the Python backend correctly.
Add /health, /.well-known/agent.json, /mcp/sse, /mcp/messages/
to api_utils_router alongside /invocations and the /_apx/* dev
UI routes added in the previous commit.

All routes registered by the APX agent protocol at app root are
now forwarded to the Python backend through the dev proxy.
Each assistant turn now shows a collapsible 'tool calls' row below
the response. For each tool: function name, ok/error badge, latency,
input args, and result (truncated at 800 chars).

Backend: _run_llm_loop writes a tool_trace entry (name, args, result,
ms) to request.state after each dispatch. _sse_generator emits a
'tool.trace' SSE event after the text stream completes.

Frontend: buildTraceEl() renders the trace as a <details> block.
Error results are highlighted red. Multiple errors show a count in
the summary line.
UV_NATIVE_TLS=1 in the project .env was not being seen by the uv
subprocess during preflight because .env is not loaded into the
process environment at that point.

Add resolve_native_tls(app_dir) which checks the shell env first,
then reads the project .env as fallback. Pass the result as the
native_tls flag to Uv::sync() and Uv::tool_run(), which append
--native-tls when true.

This fixes apx dev start on corporate networks with SSL inspection
where uv-dynamic-versioning fetches were returning 503.
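
The env-first, .env-fallback lookup lives in the Rust CLI; the same logic sketched in Python (a hypothetical equivalent, not the shipped code):

```python
import os
import tempfile
from pathlib import Path

def resolve_native_tls(app_dir: str) -> bool:
    """Shell environment wins; fall back to the project's .env file."""
    value = os.environ.get("UV_NATIVE_TLS")
    if value is None:
        env_file = Path(app_dir) / ".env"
        if env_file.exists():
            for line in env_file.read_text().splitlines():
                key, _, val = line.partition("=")
                if key.strip() == "UV_NATIVE_TLS":
                    value = val.strip()
                    break
    return value == "1"

with tempfile.TemporaryDirectory() as d:
    (Path(d) / ".env").write_text("UV_NATIVE_TLS=1\n")
    os.environ.pop("UV_NATIVE_TLS", None)
    print(resolve_native_tls(d))  # → True (read from .env)
```

When the resulting flag is true, --native-tls is appended to the uv invocations, letting uv trust the OS certificate store on networks with SSL inspection.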
_dispatch_tool_call only forwarded Authorization when dispatching tool
calls via ASGI to /api/tools/<fn>. This meant X-Forwarded-Access-Token
(injected by the dev proxy) was lost, causing OBO auth to fail on any
tool that needs Dependencies.UserClient (e.g. Lakebase queries).

Also fixes a JS syntax error in the /_apx/agent chat UI where a Python
\n in an f-string produced a literal newline inside a JS string literal.

Co-authored-by: Isaac
Replace verbose prose with an ASCII architecture diagram and a
feature table. Keeps the code example showing tool definition.

Co-authored-by: Isaac