Proposal: ship a Phantom.Test framework for asserting tools, prompts, and resources #15

@dbernheisel

Motivation

Today, users testing their MCP routers either reach for Phantom.Plug + Plug.Test (which exercises the full HTTP/SSE transport and is heavy for unit-level assertions) or roll their own dispatcher per project. The library already has a strong internal pattern for this in test/support/dispatcher.ex (request_tool/3, assert_response/2, assert_notify/1, etc.) — it just isn't exposed.

A first-class test framework would make it as easy to test an MCP router as it is to test a Phoenix LiveView, without forcing users to know whether a given handler returns {:reply, _, _} synchronously or {:noreply, _} and calls Session.respond/2 from a Task.

Design goals

  1. No use macro. Plain modules to import, so users compose with whatever ExUnit.Case template they already have (MyApp.DataCase, MyApp.ConnCase, etc.). No inherited setup, no opinions about async: true.
  2. Final-response-only assertions. Modeled on Phoenix.LiveViewTest.render_async/2 and Phoenix.ChannelTest.assert_reply/4: dispatch helpers block until the system has settled and return the final result. Tests assert on the result, not on which path produced it.
  3. Phoenix.PubSub stays optional. The library declares phoenix_pubsub as optional: true (mix.exs:45). The test framework must work without it; PubSub-dependent features (Phantom.Tracker, log/progress/list-changed notifications, resources/subscribe) are opt-in via Phantom.Test.start(pubsub: MyApp.PubSub). Without :pubsub, start/1 skips Phantom.Tracker startup and the subscribe-style assertions raise a clear "requires phoenix_pubsub" error.
  4. End-to-end escape hatch. Keep a thin Phantom.Test.Conn module (essentially a generalized TestDispatcher) for users who want to verify CORS, origin validation, or SSE framing through Phantom.Plug.

Proposed surface

Phantom.Test

  • start/1 calls Phantom.Cache.register/1 for the given router and sets up a notification listener that routes Session.respond/2, Session.notify_progress/3, Session.elicit/2, and ClientLogger.log/3 back to the calling test process. Optionally starts Phantom.Tracker if :pubsub is provided.
  • build_session/2 — constructs a Phantom.Session without going through connect/2; accepts assigns:, allowed_tools:, allowed_resource_templates:, allowed_prompts:.
  • build_request/2 — constructs a Phantom.Request for callers that need it directly.
  • Blocking dispatchers (each returns the final response, accepts timeout:):
    • call_tool/3..4
    • read_resource/3..4
    • get_prompt/3..4
    • complete_prompt/4..5
    • complete_resource/4..5
    • list_resources/2..3
  • expect_elicit/1, expect_elicit_url/1 — register a responder for handlers that block on Session.elicit/2 so the call can settle.
  • flush_notifications/0 — drain any pending progress/log messages from the test mailbox.
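
As a rough illustration of what flush_notifications/0 could look like, here is a minimal sketch. The message tag `:phantom_test_notification` and the module name `FlushSketch` are hypothetical; the real listener may deliver notifications in a different shape.

```elixir
defmodule FlushSketch do
  # Hypothetical tag for notifications routed to the test process.
  @tag :phantom_test_notification

  @doc "Drains tagged notifications already in the caller's mailbox, oldest first."
  def flush_notifications, do: drain([])

  defp drain(acc) do
    receive do
      {@tag, payload} -> drain([payload | acc])
    after
      # A zero timeout never blocks: it only takes what has already arrived.
      0 -> Enum.reverse(acc)
    end
  end
end

send(self(), {:phantom_test_notification, %{progress: 1}})
send(self(), :unrelated)
send(self(), {:phantom_test_notification, %{progress: 2}})

# Unrelated messages stay in the mailbox; only tagged ones are drained.
FlushSketch.flush_notifications()
# => [%{progress: 1}, %{progress: 2}]
```

Because the receive is selective, messages the test cares about for other reasons (for example ExUnit's own) are left untouched.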

Phantom.Test.Assertions

Final-response matchers (operate on the value returned by a dispatcher):

  • assert_tool_text/2, assert_tool_error/2, assert_tool_image/2, assert_tool_audio/2
  • assert_tool_resource_link/2, assert_tool_embedded_resource/2
  • assert_resource_text/2, assert_resource_blob/2
  • assert_prompt_message/2
  • assert_jsonrpc_error/2
  • assert_elicitation_required/2
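
To make the matcher contract concrete, a minimal sketch of assert_tool_text/2 follows. It assumes the dispatcher returns either `{:ok, %{content: [...]}}` or the bare map, with text items shaped like `%{type: "text", text: "..."}`; the real Phantom result structs may differ, and `MatcherSketch` is a hypothetical module name. It also assumes a string expectation means exact equality and raises ArgumentError where a real implementation would more likely use ExUnit.Assertions.flunk/1.

```elixir
defmodule MatcherSketch do
  @doc "Passes if any text content item equals the string or matches the regex."
  def assert_tool_text(result, expected) do
    # Keep only text items; other content types are skipped by the pattern.
    texts = for %{type: "text", text: text} <- content(result), do: text

    unless Enum.any?(texts, &text_match?(&1, expected)) do
      raise ArgumentError,
            "expected a text item matching #{inspect(expected)}, got: #{inspect(texts)}"
    end

    # Returning the input keeps matchers chainable in a pipeline.
    result
  end

  # Assumed shapes (hypothetical): a wrapped {:ok, map} or the bare map.
  defp content({:ok, %{content: content}}), do: content
  defp content(%{content: content}), do: content

  defp text_match?(text, %Regex{} = regex), do: Regex.match?(regex, text)
  defp text_match?(text, expected) when is_binary(expected), do: text == expected
end

{:ok, %{content: [%{type: "text", text: "hi Joe"}]}}
|> MatcherSketch.assert_tool_text(~r/Joe/)
|> MatcherSketch.assert_tool_text("hi Joe")
```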

Side-channel matchers (drain mailbox, don't gate the dispatch):

  • assert_progress_seen/1, refute_progress_seen/1
  • assert_client_log_seen/1, refute_client_log_seen/1

Phantom.Test.Conn (optional, transport-level)

Generalized form of the existing test/support/dispatcher.ex: request_tool/3, request_resource_read/2, assert_response/2, assert_sse_connected/0, etc. For users who care about Plug behavior (CORS, origin validation, SSE framing).

Examples

defmodule MyApp.MCP.RouterTest do
  use ExUnit.Case, async: true
  import Phantom.Test
  import Phantom.Test.Assertions

  setup do
    # PubSub is optional. Omit it for tests that don't need Tracker.
    Phantom.Test.start(router: MyApp.MCP.Router)
    {:ok, session: build_session(MyApp.MCP.Router, assigns: %{user: %User{id: 1}})}
  end

  test "sync tool", %{session: session} do
    result = call_tool(session, :echo_tool, %{message: "hi"})
    assert_tool_text(result, "hi")
  end

  test "async tool", %{session: session} do
    result = call_tool(session, :async_echo_tool, %{message: "hi"}, timeout: 500)
    assert_tool_text(result, "hi")
  end

  test "tool error", %{session: session} do
    result = call_tool(session, :with_error_tool, %{})
    assert_tool_error(result, "an error")
  end

  test "input-schema validation", %{session: session} do
    result = call_tool(session, :create_question, %{study_id: "nope"})
    assert_jsonrpc_error(result, code: -32602)
  end

  test "tool that requires elicitation", %{session: session} do
    result = call_tool(session, :elicitation_required_tool, %{})
    assert_elicitation_required(result, message: ~r/authenticate/)
  end

  test "tool that blocks on Session.elicit/2", %{session: session} do
    expect_elicit(fn _req ->
      {:ok, %{"action" => "accept", "content" => %{"name" => "Joe"}}}
    end)

    result = call_tool(session, :elicit_tool, %{})
    assert_tool_text(result, ~r/Joe/)
  end

  test "resource read", %{session: session} do
    result = read_resource(session, :study, id: 42)
    assert_resource_text(result, ~r/^# Study/)
  end

  test "prompt completion", %{session: session} do
    result = complete_prompt(session, :suggest_questions, "study_id", "1")
    assert {:ok, %{values: ids}} = result
    assert "1" in ids
  end
end

When PubSub is wired up, side-channel assertions become available without changing the dispatch shape:

defmodule MyApp.MCP.LongRunningTest do
  use ExUnit.Case, async: false
  import Phantom.Test
  import Phantom.Test.Assertions

  setup do
    start_supervised!({Phoenix.PubSub, name: MyApp.PubSub})
    Phantom.Test.start(router: MyApp.MCP.Router, pubsub: MyApp.PubSub)
    {:ok, session: build_session(MyApp.MCP.Router)}
  end

  test "progress notifications fire mid-call", %{session: session} do
    result = call_tool(session, :really_long_async_tool, %{message: "hi"}, timeout: 1_000)
    assert_tool_text(result, "hi")
    assert_progress_seen(steps: 4)
  end

  test "client log emitted", %{session: session} do
    call_tool(session, :client_log_tool, %{message: "hello"})
    assert_client_log_seen(level: :info, data: %{message: "hello"})
  end
end

Tradeoffs

  • Blocking inside call_tool/3 means a missing/slow response surfaces as a Phantom.Test.TimeoutError raised at the call site (better error locality than a separate assert_response/1 two lines later), at the cost of a default per-call timeout (~100ms) that tests must override explicitly for handlers that take longer.
  • Final-response model means side-channel messages (progress, log) need explicit drain helpers because they're no longer the natural return value. This matches how Phoenix.LiveViewTest treats Phoenix.PubSub broadcasts triggered during a render.
  • No use macro means a tiny bit more boilerplate per test module (two imports and a setup) in exchange for not boxing users into a case template hierarchy.
  • PubSub-optional means Phantom.Test ships in two tiers: a core that works without phoenix_pubsub (covers tools/call, resources/read, prompts/get, completion/complete, validation errors, elicitation), and PubSub-gated helpers (assert_progress_seen/1, assert_client_log_seen/1, anything that exercises Phantom.Tracker) that raise a clear error if start/1 was not given pubsub:.
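
One way the PubSub gate could work, sketched under assumptions: the options given to start/1 are kept in the test process dictionary, and every PubSub-gated helper calls a guard first. `GateSketch`, `require_pubsub!/1`, and the storage key are hypothetical names, not part of the library.

```elixir
defmodule GateSketch do
  # Hypothetical storage key for the options passed to start/1.
  @key :phantom_test_opts

  def start(opts) do
    Process.put(@key, Map.new(opts))
    :ok
  end

  @doc "Returns the configured PubSub name, or raises a clear error if there is none."
  def require_pubsub!(helper) do
    case Process.get(@key, %{}) do
      %{pubsub: pubsub} when not is_nil(pubsub) ->
        pubsub

      _ ->
        raise ArgumentError,
              "#{helper} requires phoenix_pubsub; " <>
                "pass pubsub: MyApp.PubSub to Phantom.Test.start/1"
    end
  end
end

GateSketch.start(router: MyApp.MCP.Router, pubsub: MyApp.PubSub)
GateSketch.require_pubsub!("assert_progress_seen/1")
# => MyApp.PubSub
```

Keeping the state per test process means async: true tests in the core tier would not interfere with each other.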

Implementation notes

  • Most of Phantom.Test.Conn already exists in test/support/dispatcher.ex — generalizing it (router-agnostic, public docs, configurable PubSub) is mechanical.
  • The blocking dispatchers can be implemented as: invoke the handler in a linked task that sends the resolved payload back to the calling process under a known message tag, then assert_receive that tag in the calling process. This handles {:reply, _, _}, {:noreply, _} + Session.respond/2, and {:elicitation_required, _} uniformly.
  • expect_elicit/1 registers a responder pid that intercepts the message Session.elicit/2 would normally send to the connection process.
  • Content matchers should take the dispatch return value as their first argument and accept it as-is (wrapped or unwrapped), so both result = call_tool(...); assert_tool_text(result, "x") and call_tool(...) |> assert_tool_text("x") read naturally.
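
The blocking-dispatcher note above can be sketched roughly as follows. Everything here is a stand-in: `DispatchSketch.call/2` is hypothetical, the handler is a plain function that receives a `respond` callback in place of Session.respond/2, and a bare receive with a timeout is used where the note suggests assert_receive, so the failure can be a dedicated timeout error.

```elixir
defmodule DispatchSketch do
  defmodule TimeoutError do
    defexception [:message]
  end

  @doc "Runs the handler and blocks until a final payload arrives or the timeout hits."
  def call(handler, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, 100)
    caller = self()
    # A unique ref per call keeps concurrent dispatches from mixing up replies.
    ref = make_ref()
    respond = fn payload -> send(caller, {ref, payload}) end

    # Linked, so a crash in the handler also fails the calling test.
    {:ok, _pid} =
      Task.start_link(fn ->
        case handler.(respond) do
          {:reply, payload, _session} -> respond.(payload)
          {:elicitation_required, _} = result -> respond.(result)
          # The handler is expected to call respond/1 later on its own.
          {:noreply, _session} -> :ok
        end
      end)

    receive do
      {^ref, payload} -> payload
    after
      timeout -> raise TimeoutError, message: "no response within #{timeout}ms"
    end
  end
end

# Synchronous and asynchronous handlers look the same to the caller.
DispatchSketch.call(fn _respond -> {:reply, "hi", nil} end)

DispatchSketch.call(fn respond ->
  spawn(fn -> respond.("hi") end)
  {:noreply, nil}
end)
```

Both calls return "hi", which is the point of the final-response model: the test never needs to know which path produced the result.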

Happy to prototype Phantom.Test.start/1 + call_tool/3 + expect_elicit/1 + assert_tool_text/2 + assert_tool_error/2 against the existing Test.MCP.Router as a proof of concept if there's interest.
