Motivation
Today, users testing their MCP routers either reach for Phantom.Plug + Plug.Test (which exercises the full HTTP/SSE transport and is heavy for unit-level assertions) or roll their own dispatcher per project. The library already has a strong internal pattern for this in test/support/dispatcher.ex (request_tool/3, assert_response/2, assert_notify/1, etc.) — it just isn't exposed.
A first-class test framework would make it as easy to test an MCP router as it is to test a Phoenix LiveView, without forcing users to know whether a given handler returns {:reply, _, _} synchronously or {:noreply, _} and calls Session.respond/2 from a Task.
Design goals
- No use macro. Plain modules to import, so users compose with whatever ExUnit.Case template they already have (MyApp.DataCase, MyApp.ConnCase, etc.). No inherited setup, no opinions about async: true.
- Final-response-only assertions. Modeled on Phoenix.LiveViewTest.render_async/2 and Phoenix.ChannelTest.assert_reply/4: dispatch helpers block until the system has settled and return the final result. Tests assert on the result, not on which path produced it.
- Phoenix.PubSub stays optional. The library declares phoenix_pubsub as optional: true (mix.exs:45). The test framework must work without it; PubSub-dependent features (Phantom.Tracker, log/progress/list-changed notifications, resources/subscribe) are opt-in via Phantom.Test.start(pubsub: MyApp.PubSub). Without :pubsub, start/1 skips Phantom.Tracker startup and the subscribe-style assertions raise a clear "requires phoenix_pubsub" error.
- End-to-end escape hatch. Keep a thin Phantom.Test.Conn module (essentially a generalized TestDispatcher) for users who want to verify CORS, origin validation, or SSE framing through Phantom.Plug.
Proposed surface
Phantom.Test
- start/1: calls Phantom.Cache.register/1 for the given router and sets up a notification listener that routes Session.respond/2, Session.notify_progress/3, Session.elicit/2, and ClientLogger.log/3 back to the calling test process. Optionally starts Phantom.Tracker if :pubsub is provided.
- build_session/2: constructs a Phantom.Session without going through connect/2; accepts assigns:, allowed_tools:, allowed_resource_templates:, allowed_prompts:.
- build_request/2: constructs a Phantom.Request for callers that need it directly.
- Blocking dispatchers (each returns the final response and accepts timeout:):
  - call_tool/3..4
  - read_resource/3..4
  - get_prompt/3..4
  - complete_prompt/4..5
  - complete_resource/4..5
  - list_resources/2..3
- expect_elicit/1, expect_elicit_url/1: register a responder for handlers that block on Session.elicit/2 so the call can settle.
- flush_notifications/0: drains any pending progress/log messages from the test mailbox.
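To make the expect_elicit/1 idea concrete, here is a rough sketch of a responder process, using only plain processes and messages. Everything in it is an assumption made for illustration: the ElicitResponderSketch module, the {:elicit, from, ref, request} and {:elicit_response, ref, reply} message shapes, and the elicit/3 stand-in for the blocking side of Session.elicit/2. Phantom's real message format may look quite different.

```elixir
defmodule ElicitResponderSketch do
  @moduledoc "Illustrative only; not part of Phantom."

  # Spawns a responder that answers exactly one elicitation with whatever
  # `fun` returns for the incoming request. (Hypothetical message protocol.)
  def expect_elicit(fun) when is_function(fun, 1) do
    spawn_link(fn ->
      receive do
        {:elicit, from, ref, request} ->
          send(from, {:elicit_response, ref, fun.(request)})
      end
    end)
  end

  # Stand-in for the blocking side of an elicitation inside a handler:
  # send the request to the responder and wait for the matching reply.
  def elicit(responder, request, timeout \\ 100) do
    ref = make_ref()
    send(responder, {:elicit, self(), ref, request})

    receive do
      {:elicit_response, ^ref, reply} -> reply
    after
      timeout -> {:error, :timeout}
    end
  end
end
```

A handler blocked in elicit/3 settles as soon as the responder replies, which is what lets a blocking dispatcher return a final result instead of timing out.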
Phantom.Test.Assertions
Final-response matchers (operate on the value returned by a dispatcher):
- assert_tool_text/2, assert_tool_error/2, assert_tool_image/2, assert_tool_audio/2
- assert_tool_resource_link/2, assert_tool_embedded_resource/2
- assert_resource_text/2, assert_resource_blob/2
- assert_prompt_message/2
- assert_jsonrpc_error/2
- assert_elicitation_required/2
Side-channel matchers (drain mailbox, don't gate the dispatch):
- assert_progress_seen/1, refute_progress_seen/1
- assert_client_log_seen/1, refute_client_log_seen/1
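As a sketch of what one final-response matcher could look like, the snippet below implements a text matcher that accepts a string or a regex and returns its input so it can sit at the end of a pipe. The ToolAssertionSketch module name and the {:ok, %{content: [%{type: "text", text: ...}]}} result shape are assumptions made for the example; the value a real dispatcher returns may be a struct with different fields.

```elixir
defmodule ToolAssertionSketch do
  @moduledoc "Illustrative only; the result shape is an assumption."

  # Passes when any text content item equals `expected` (string) or matches
  # it (regex). Returns the result unchanged so the call can be piped.
  def assert_tool_text({:ok, %{content: content}} = result, expected) do
    texts = for %{type: "text", text: text} <- content, do: text

    unless Enum.any?(texts, &text_matches?(&1, expected)) do
      raise ExUnit.AssertionError,
        message: "expected tool text matching #{inspect(expected)}, got: #{inspect(texts)}"
    end

    result
  end

  # Anything that is not a successful tool result fails the assertion.
  def assert_tool_text(other, _expected) do
    raise ExUnit.AssertionError,
      message: "expected a successful tool result, got: #{inspect(other)}"
  end

  defp text_matches?(text, %Regex{} = pattern), do: Regex.match?(pattern, text)
  defp text_matches?(text, expected) when is_binary(expected), do: text == expected
end
```

Because the matcher returns the result it was given, several matchers can be chained on one dispatch result.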
Phantom.Test.Conn (optional, transport-level)
Generalized form of the existing test/support/dispatcher.ex: request_tool/3, request_resource_read/2, assert_response/2, assert_sse_connected/0, etc. For users who care about Plug behavior (CORS, origin validation, SSE framing).
Examples
    defmodule MyApp.MCP.RouterTest do
      use ExUnit.Case, async: true

      import Phantom.Test
      import Phantom.Test.Assertions

      setup do
        # PubSub is optional. Omit it for tests that don't need Tracker.
        Phantom.Test.start(router: MyApp.MCP.Router)
        {:ok, session: build_session(MyApp.MCP.Router, assigns: %{user: %User{id: 1}})}
      end

      test "sync tool", %{session: session} do
        result = call_tool(session, :echo_tool, %{message: "hi"})
        assert_tool_text(result, "hi")
      end

      test "async tool", %{session: session} do
        result = call_tool(session, :async_echo_tool, %{message: "hi"}, timeout: 500)
        assert_tool_text(result, "hi")
      end

      test "tool error", %{session: session} do
        result = call_tool(session, :with_error_tool, %{})
        assert_tool_error(result, "an error")
      end

      test "input-schema validation", %{session: session} do
        result = call_tool(session, :create_question, %{study_id: "nope"})
        assert_jsonrpc_error(result, code: -32602)
      end

      test "tool that requires elicitation", %{session: session} do
        result = call_tool(session, :elicitation_required_tool, %{})
        assert_elicitation_required(result, message: ~r/authenticate/)
      end

      test "tool that blocks on Session.elicit/2", %{session: session} do
        expect_elicit(fn _req ->
          {:ok, %{"action" => "accept", "content" => %{"name" => "Joe"}}}
        end)

        result = call_tool(session, :elicit_tool, %{})
        assert_tool_text(result, ~r/Joe/)
      end

      test "resource read", %{session: session} do
        result = read_resource(session, :study, id: 42)
        assert_resource_text(result, ~r/^# Study/)
      end

      test "prompt completion", %{session: session} do
        result = complete_prompt(session, :suggest_questions, "study_id", "1")
        assert {:ok, %{values: ids}} = result
        assert "1" in ids
      end
    end
When PubSub is wired up, side-channel assertions become available without changing the dispatch shape:
    defmodule MyApp.MCP.LongRunningTest do
      use ExUnit.Case, async: false

      import Phantom.Test
      import Phantom.Test.Assertions

      setup do
        start_supervised!({Phoenix.PubSub, name: MyApp.PubSub})
        Phantom.Test.start(router: MyApp.MCP.Router, pubsub: MyApp.PubSub)
        {:ok, session: build_session(MyApp.MCP.Router)}
      end

      test "progress notifications fire mid-call", %{session: session} do
        result = call_tool(session, :really_long_async_tool, %{message: "hi"}, timeout: 1_000)
        assert_tool_text(result, "hi")
        assert_progress_seen(steps: 4)
      end

      test "client log emitted", %{session: session} do
        call_tool(session, :client_log_tool, %{message: "hello"})
        assert_client_log_seen(level: :info, data: %{message: "hello"})
      end
    end
Tradeoffs
- Blocking inside call_tool/3 means a missing/slow response surfaces as a Phantom.Test.TimeoutError raised at the call site (better error locality than a separate assert_response/1 two lines later), at the cost of a default per-call timeout (~100ms) that handlers exceeding it must override explicitly.
- Final-response model means side-channel messages (progress, log) need explicit drain helpers because they're no longer the natural return value. This matches how Phoenix.LiveViewTest treats Phoenix.PubSub broadcasts triggered during a render.
- No use macro means a tiny bit more boilerplate per test module (two imports and a setup) in exchange for not boxing users into a case template hierarchy.
- PubSub-optional means Phantom.Test ships in two tiers: a core that works without phoenix_pubsub (covers tools/call, resources/read, prompts/get, completion/complete, validation errors, elicitation), and PubSub-gated helpers (assert_progress_seen/1, assert_client_log_seen/1, anything that exercises Phantom.Tracker) that raise a clear error if invoked without pubsub: having been passed to start/1.
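The two-tier split and the explicit drain helpers could look roughly like the sketch below. It is built on assumptions: PubSubGateSketch is a hypothetical module, start/1 here only records the :pubsub option in the test process dictionary, progress notifications are assumed to arrive as {:progress, step} messages, and steps: n is taken to mean "exactly n progress messages were seen". None of that is confirmed Phantom behavior.

```elixir
defmodule PubSubGateSketch do
  @moduledoc "Illustrative only; not part of Phantom."

  @key :pubsub_gate_sketch_pubsub

  # Records whether a PubSub server was configured for this test process.
  def start(opts) do
    Process.put(@key, Keyword.get(opts, :pubsub))
    :ok
  end

  # PubSub-gated side-channel matcher: refuses to run without :pubsub,
  # otherwise drains the mailbox and compares the number of progress messages.
  def assert_progress_seen(steps: expected) do
    require_pubsub!("assert_progress_seen/1")
    seen = drain_progress(0)

    if seen != expected do
      raise ExUnit.AssertionError,
        message: "expected #{expected} progress notifications, saw #{seen}"
    end

    :ok
  end

  defp require_pubsub!(helper) do
    if Process.get(@key) == nil do
      raise ArgumentError,
            "#{helper} requires phoenix_pubsub; pass pubsub: MyApp.PubSub to start/1"
    end
  end

  # Non-blocking drain: counts progress messages already in the mailbox.
  defp drain_progress(count) do
    receive do
      {:progress, _step} -> drain_progress(count + 1)
    after
      0 -> count
    end
  end
end
```

Raising ArgumentError for the missing option, rather than an assertion failure, keeps "the test is misconfigured" visibly distinct from "the handler misbehaved".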
Implementation notes
- Most of Phantom.Test.Conn already exists in test/support/dispatcher.ex; generalizing it (router-agnostic, public docs, configurable PubSub) is mechanical.
- The blocking dispatchers can be implemented as: invoke the handler in a linked task that posts the resolved payload to a known message tag, then assert_receive in the calling process. This handles {:reply, _, _}, {:noreply, _} + Session.respond/2, and {:elicitation_required, _} uniformly.
- expect_elicit/1 registers a responder pid that intercepts the message Session.elicit/2 would normally send to the connection process.
- Content matchers should accept the unwrapped struct and also compose with the dispatch return value, so both result = call_tool(...); assert_tool_text(result, "x") and call_tool(...) |> assert_tool_text("x") read naturally.
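A minimal sketch of the linked-task idea from the notes above, under stated assumptions: BlockingDispatchSketch, dispatch/2, the respond callback, and the :final_response tag are hypothetical names; the handler is reduced to a one-argument function; and a plain receive stands in for assert_receive so the snippet runs outside ExUnit.

```elixir
defmodule BlockingDispatchSketch do
  @moduledoc "Illustrative only; not part of Phantom."

  defmodule TimeoutError do
    defexception [:message]
  end

  @default_timeout 100

  # Runs `handler` in a linked task and blocks until a final payload arrives.
  # `handler` receives a `respond` callback, standing in for Session.respond/2.
  def dispatch(handler, opts \\ []) when is_function(handler, 1) do
    timeout = Keyword.get(opts, :timeout, @default_timeout)
    caller = self()
    ref = make_ref()
    respond = fn payload -> send(caller, {:final_response, ref, payload}) end

    {:ok, _pid} =
      Task.start_link(fn ->
        case handler.(respond) do
          # Synchronous path: the return value carries the final payload.
          {:reply, payload, _session} -> respond.(payload)
          # Elicitation is a terminal outcome from the test's point of view.
          {:elicitation_required, _details} = outcome -> respond.(outcome)
          # Asynchronous path: the handler will call `respond` later.
          {:noreply, _session} -> :ok
        end
      end)

    receive do
      {:final_response, ^ref, payload} -> payload
    after
      timeout ->
        raise TimeoutError, message: "no final response within #{timeout}ms"
    end
  end
end
```

Tagging the message with a fresh reference means a late reply from a call that already timed out cannot be mistaken for the answer to a later dispatch in the same test.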
Happy to prototype Phantom.Test.start/1 + call_tool/3 + expect_elicit/1 + assert_tool_text/2 + assert_tool_error/2 against the existing Test.MCP.Router as a proof of concept if there's interest.