Proposal: ship a `Phantom.Test` framework for asserting tools, prompts, and resources #15

## Motivation

Today, users testing their MCP routers either reach for `Phantom.Plug` + `Plug.Test` (which exercises the full HTTP/SSE transport and is heavy for unit-level assertions) or roll their own dispatcher per project. The library already has a strong internal pattern for this in `test/support/dispatcher.ex` (`request_tool/3`, `assert_response/2`, `assert_notify/1`, etc.) — it just isn't exposed.

A first-class test framework would make testing an MCP router as easy as testing a Phoenix LiveView, without forcing users to know whether a given handler replies synchronously with `{:reply, _, _}` or returns `{:noreply, _}` and later calls `Session.respond/2` from a `Task`.

## Design goals

1. **No `use` macro.** Plain modules to `import`, so users compose with whatever `ExUnit.Case` template they already have (`MyApp.DataCase`, `MyApp.ConnCase`, etc.). No inherited `setup`, no opinions about `async: true`.
2. **Final-response-only assertions.** Modeled on `Phoenix.LiveViewTest.render_async/2` and `Phoenix.ChannelTest.assert_reply/4`: dispatch helpers block until the system has settled and return the final result. Tests assert on the result, not on which path produced it.
3. **`Phoenix.PubSub` stays optional.** The library declares `phoenix_pubsub` as `optional: true` (mix.exs:45). The test framework must work without it; PubSub-dependent features (`Phantom.Tracker`, log/progress/list-changed notifications, `resources/subscribe`) are opt-in via `Phantom.Test.start(pubsub: MyApp.PubSub)`. Without `:pubsub`, `start/1` skips `Phantom.Tracker` startup and the subscribe-style assertions raise a clear "requires phoenix_pubsub" error.
4. **End-to-end escape hatch.** Keep a thin `Phantom.Test.Conn` module (essentially a generalized `TestDispatcher`) for users who want to verify CORS, origin validation, or SSE framing through `Phantom.Plug`.
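
As a rough illustration of goal 3, the gate could look something like the sketch below. Everything in it is hypothetical: the module name `PubSubGateSketch`, the `require_pubsub!/1` helper, and the choice to keep the option in the test process dictionary are assumptions made for illustration, not existing Phantom code. Only `Code.ensure_loaded?/1` is a real standard-library call.

```elixir
defmodule PubSubGateSketch do
  @moduledoc "Hypothetical sketch of a PubSub-optional start/1; not part of Phantom."

  # Remembers the :pubsub option (or nil) for the calling test process.
  def start(opts) do
    pubsub = Keyword.get(opts, :pubsub)

    # Fail early if pubsub: was requested but the optional dependency is absent.
    if pubsub && not Code.ensure_loaded?(Phoenix.PubSub) do
      raise ArgumentError, "pubsub: was given but phoenix_pubsub is not available"
    end

    Process.put({__MODULE__, :pubsub}, pubsub)
    :ok
  end

  # Every PubSub-dependent helper would call this first.
  def require_pubsub!(helper) do
    case Process.get({__MODULE__, :pubsub}) do
      nil ->
        raise ArgumentError,
              "#{helper} requires phoenix_pubsub; pass pubsub: MyApp.PubSub to start/1"

      pubsub ->
        pubsub
    end
  end
end

# Without :pubsub, start/1 still succeeds; only the gated helpers raise.
:ok = PubSubGateSketch.start(router: MyApp.MCP.Router)
```

With this shape, the core dispatchers never touch `require_pubsub!/1`, so a project without `phoenix_pubsub` only sees the error when it reaches for a subscribe-style assertion.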

## Proposed surface

### `Phantom.Test`

- `start/1`: calls `Phantom.Cache.register/1` for the given router and sets up a notification listener that routes `Session.respond/2`, `Session.notify_progress/3`, `Session.elicit/2`, and `ClientLogger.log/3` messages back to the calling test process. Starts `Phantom.Tracker` only if `:pubsub` is provided.
- `build_session/2`: constructs a `Phantom.Session` without going through `connect/2`; accepts `assigns:`, `allowed_tools:`, `allowed_resource_templates:`, `allowed_prompts:`.
- `build_request/2`: constructs a `Phantom.Request` for callers that need it directly.
- **Blocking dispatchers** (each returns the final response and accepts `timeout:`):
  - `call_tool/3..4`
  - `read_resource/3..4`
  - `get_prompt/3..4`
  - `complete_prompt/4..5`
  - `complete_resource/4..5`
  - `list_resources/2..3`
- `expect_elicit/1`, `expect_elicit_url/1`: register a responder for handlers that block on `Session.elicit/2` so the call can settle.
- `flush_notifications/0`: drains any pending progress/log messages from the test mailbox.
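
To make the `expect_elicit/1` idea above more concrete, here is a minimal, self-contained sketch of a responder answering a blocked elicitation. It does not use Phantom at all: `ElicitSketch.elicit/3` is a stand-in for a handler blocking on `Session.elicit/2`, and the message tags `:elicit` and `:elicit_reply` are invented for the example.

```elixir
defmodule ElicitSketch do
  @moduledoc "Hypothetical stand-in for the elicit round trip; not Phantom's API."

  # Stand-in for a handler blocking on Session.elicit/2: sends the request to
  # whichever process owns the connection and waits for its answer.
  def elicit(owner, request, timeout \\ 100) do
    ref = make_ref()
    send(owner, {:elicit, self(), ref, request})

    receive do
      {:elicit_reply, ^ref, reply} -> reply
    after
      timeout -> {:error, :timeout}
    end
  end

  # Hypothetical expect_elicit/1: spawns a responder that answers exactly one
  # request with the result of the given function.
  def expect_elicit(fun) when is_function(fun, 1) do
    spawn_link(fn ->
      receive do
        {:elicit, from, ref, request} ->
          send(from, {:elicit_reply, ref, fun.(request)})
      end
    end)
  end
end

responder =
  ElicitSketch.expect_elicit(fn _req ->
    {:ok, %{"action" => "accept", "content" => %{"name" => "Joe"}}}
  end)

# The blocked call settles as soon as the responder replies.
{:ok, %{"content" => %{"name" => "Joe"}}} =
  ElicitSketch.elicit(responder, %{message: "What is your name?"})
```

In the real framework the responder would presumably sit where the connection process normally does, so the handler under test would not need to know it is talking to a test double.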

### `Phantom.Test.Assertions`

Final-response matchers (operate on the value returned by a dispatcher):

- `assert_tool_text/2`, `assert_tool_error/2`, `assert_tool_image/2`, `assert_tool_audio/2`
- `assert_tool_resource_link/2`, `assert_tool_embedded_resource/2`
- `assert_resource_text/2`, `assert_resource_blob/2`
- `assert_prompt_message/2`
- `assert_jsonrpc_error/2`
- `assert_elicitation_required/2`
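
One possible shape for a final-response matcher is sketched below. The result layout (`%{content: [%{type: "text", text: ...}]}`, optionally wrapped in `{:ok, _}`) is an assumption made for the example and may not match what Phantom actually returns; `MatcherSketch` is a hypothetical module. Returning the input unchanged is what lets the matcher sit at the end of a pipeline.

```elixir
defmodule MatcherSketch do
  @moduledoc "Hypothetical matcher sketch; the result shape is assumed."
  import ExUnit.Assertions

  # Accepts an exact string or a regex, and returns the result it was given.
  def assert_tool_text(result, expected) do
    texts = for %{type: "text", text: text} <- content(result), do: text

    assert(
      Enum.any?(texts, &matches?(&1, expected)),
      "expected tool text matching #{inspect(expected)}, got: #{inspect(texts)}"
    )

    result
  end

  # Accept both the wrapped dispatch result and the unwrapped payload.
  defp content({:ok, %{content: content}}), do: content
  defp content(%{content: content}), do: content

  defp matches?(text, %Regex{} = regex), do: Regex.match?(regex, text)
  defp matches?(text, expected) when is_binary(expected), do: text == expected
end

result = {:ok, %{content: [%{type: "text", text: "hi"}]}}

# Both call styles work because the matcher returns its first argument.
MatcherSketch.assert_tool_text(result, "hi")
result |> MatcherSketch.assert_tool_text(~r/^h/)
```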

Side-channel matchers (drain mailbox, don't gate the dispatch):

- `assert_progress_seen/1`, `refute_progress_seen/1`
- `assert_client_log_seen/1`, `refute_client_log_seen/1`
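
A side-channel matcher could be built on a non-blocking mailbox drain, roughly as sketched here. The `{:phantom_progress, payload}` message tag and the reading of `steps: 4` as "exactly four progress notifications" are assumptions for illustration; the proposal does not pin either down.

```elixir
defmodule SideChannelSketch do
  @moduledoc "Hypothetical drain-based matcher; message tag and semantics are assumed."
  import ExUnit.Assertions

  # Collects every progress message already in the mailbox without blocking.
  # Messages with other tags are left in place.
  def drain_progress(acc \\ []) do
    receive do
      {:phantom_progress, payload} -> drain_progress([payload | acc])
    after
      0 -> Enum.reverse(acc)
    end
  end

  # Assumed meaning of `steps: n`: exactly n progress notifications arrived.
  def assert_progress_seen(steps: expected) do
    seen = drain_progress()

    assert(
      length(seen) == expected,
      "expected #{expected} progress notifications, got #{length(seen)}"
    )

    seen
  end
end

# Simulate what a notification listener might forward during a tool call.
for step <- 1..4, do: send(self(), {:phantom_progress, %{progress: step, total: 4}})

SideChannelSketch.assert_progress_seen(steps: 4)
```

Because the drain uses `after 0`, it only makes sense to call it once the blocking dispatcher has returned, which is the ordering the examples below follow.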

### `Phantom.Test.Conn` (optional, transport-level)

Generalized form of the existing `test/support/dispatcher.ex`: `request_tool/3`, `request_resource_read/2`, `assert_response/2`, `assert_sse_connected/0`, etc. For users who care about Plug behavior (CORS, origin validation, SSE framing).

## Examples

```elixir
defmodule MyApp.MCP.RouterTest do
  use ExUnit.Case, async: true
  import Phantom.Test
  import Phantom.Test.Assertions

  setup do
    # PubSub is optional. Omit it for tests that don't need Tracker.
    Phantom.Test.start(router: MyApp.MCP.Router)
    {:ok, session: build_session(MyApp.MCP.Router, assigns: %{user: %MyApp.User{id: 1}})}
  end

  test "sync tool", %{session: session} do
    result = call_tool(session, :echo_tool, %{message: "hi"})
    assert_tool_text(result, "hi")
  end

  test "async tool", %{session: session} do
    result = call_tool(session, :async_echo_tool, %{message: "hi"}, timeout: 500)
    assert_tool_text(result, "hi")
  end

  test "tool error", %{session: session} do
    result = call_tool(session, :with_error_tool, %{})
    assert_tool_error(result, "an error")
  end

  test "input-schema validation", %{session: session} do
    result = call_tool(session, :create_question, %{study_id: "nope"})
    assert_jsonrpc_error(result, code: -32602)
  end

  test "tool that requires elicitation", %{session: session} do
    result = call_tool(session, :elicitation_required_tool, %{})
    assert_elicitation_required(result, message: ~r/authenticate/)
  end

  test "tool that blocks on Session.elicit/2", %{session: session} do
    expect_elicit(fn _req ->
      {:ok, %{"action" => "accept", "content" => %{"name" => "Joe"}}}
    end)

    result = call_tool(session, :elicit_tool, %{})
    assert_tool_text(result, ~r/Joe/)
  end

  test "resource read", %{session: session} do
    result = read_resource(session, :study, %{id: 42})
    assert_resource_text(result, ~r/^# Study/)
  end

  test "prompt completion", %{session: session} do
    result = complete_prompt(session, :suggest_questions, "study_id", "1")
    assert {:ok, %{values: ids}} = result
    assert "1" in ids
  end
end
```

When PubSub is wired up, side-channel assertions become available without changing the dispatch shape:

```elixir
defmodule MyApp.MCP.LongRunningTest do
  use ExUnit.Case, async: false
  import Phantom.Test
  import Phantom.Test.Assertions

  setup do
    start_supervised!({Phoenix.PubSub, name: MyApp.PubSub})
    Phantom.Test.start(router: MyApp.MCP.Router, pubsub: MyApp.PubSub)
    {:ok, session: build_session(MyApp.MCP.Router)}
  end

  test "progress notifications fire mid-call", %{session: session} do
    result = call_tool(session, :really_long_async_tool, %{message: "hi"}, timeout: 1_000)
    assert_tool_text(result, "hi")
    assert_progress_seen(steps: 4)
  end

  test "client log emitted", %{session: session} do
    call_tool(session, :client_log_tool, %{message: "hello"})
    assert_client_log_seen(level: :info, data: %{message: "hello"})
  end
end
```

## Tradeoffs

- **Blocking inside `call_tool/3`** means a missing/slow response surfaces as a `Phantom.Test.TimeoutError` raised at the call site (better error locality than a separate `assert_response/1` two lines later), at the cost of a default per-call timeout (~100ms) that handlers exceeding it must override explicitly.
- **Final-response model** means side-channel messages (progress, log) need explicit drain helpers because they're no longer the natural return value. This matches how `Phoenix.LiveViewTest` treats `Phoenix.PubSub` broadcasts triggered during a render.
- **No `use` macro** means a tiny bit more boilerplate per test module (two `import`s and a `setup`) in exchange for not boxing users into a case template hierarchy.
- **PubSub-optional** means `Phantom.Test` ships in two tiers: a core that works without `phoenix_pubsub` (covers `tools/call`, `resources/read`, `prompts/get`, `completion/complete`, validation errors, elicitation), and PubSub-gated helpers (`assert_progress_seen/1`, `assert_client_log_seen/1`, anything that exercises `Phantom.Tracker`) that raise a clear error if invoked without `pubsub:` having been passed to `start/1`.
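
To illustrate the first tradeoff, the sketch below shows how a per-call timeout can surface as an exception at the call site. `TimeoutSketch`, its nested `TimeoutError`, the `{:response, ref, payload}` tag, and the 100 ms default are all assumptions mirroring the text above, not the proposed implementation.

```elixir
defmodule TimeoutSketch do
  @moduledoc "Hypothetical sketch of a call-site timeout; not Phantom's API."

  defmodule TimeoutError do
    defexception [:message]
  end

  # Assumed default, matching the ~100ms mentioned above.
  @default_timeout 100

  # Waits for the final response and raises where the test called it.
  def await_response(ref, opts \\ []) do
    timeout = Keyword.get(opts, :timeout, @default_timeout)

    receive do
      {:response, ^ref, payload} -> payload
    after
      timeout ->
        raise TimeoutError,
          message: "no response within #{timeout}ms; pass timeout: to wait longer"
    end
  end
end

ref = make_ref()
Process.send_after(self(), {:response, ref, :done}, 200)

# The default timeout would raise here, so a slow handler overrides it explicitly.
:done = TimeoutSketch.await_response(ref, timeout: 500)
```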

## Implementation notes

- Most of `Phantom.Test.Conn` already exists in `test/support/dispatcher.ex` — generalizing it (router-agnostic, public docs, configurable PubSub) is mechanical.
- The blocking dispatchers can be implemented as follows: invoke the handler in a linked task that sends the resolved payload to the calling process under a known message tag, then `assert_receive` that tag in the calling process. This handles `{:reply, _, _}`, `{:noreply, _}` + `Session.respond/2`, and `{:elicitation_required, _}` uniformly.
- `expect_elicit/1` registers a responder pid that intercepts the message `Session.elicit/2` would normally send to the connection process.
- Content matchers should accept both the unwrapped struct and the raw dispatch return value, so both `result = call_tool(...); assert_tool_text(result, "x")` and `call_tool(...) |> assert_tool_text("x")` read naturally.
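
The blocking-dispatcher note above can be sketched without Phantom as follows. This is a simplified model under stated assumptions: the "session" is a plain map, `DispatchSketch.respond/2` stands in for `Session.respond/2`, the `:mcp_result` tag is invented, and a plain `receive` replaces `assert_receive` so the snippet runs outside ExUnit.

```elixir
defmodule DispatchSketch do
  @moduledoc "Hypothetical model of a blocking dispatcher; not Phantom's API."

  # Stand-in for Session.respond/2: delivers the final payload to the test process.
  def respond(%{owner: owner, ref: ref}, payload) do
    send(owner, {:mcp_result, ref, payload})
  end

  # Runs the handler in a linked task and blocks until a final payload arrives,
  # whichever return shape the handler used.
  def call(handler, params, opts \\ []) when is_function(handler, 2) do
    timeout = Keyword.get(opts, :timeout, 100)
    ref = make_ref()
    session = %{owner: self(), ref: ref}

    {:ok, _pid} =
      Task.start_link(fn ->
        case handler.(params, session) do
          {:reply, payload, _session} -> respond(session, payload)
          {:elicitation_required, info} -> respond(session, {:elicitation_required, info})
          # The handler is expected to call respond/2 itself later.
          {:noreply, _session} -> :ok
        end
      end)

    receive do
      {:mcp_result, ^ref, payload} -> payload
    after
      timeout -> raise "no response within #{timeout}ms"
    end
  end
end

sync = fn %{message: msg}, session -> {:reply, {:text, msg}, session} end

async = fn %{message: msg}, session ->
  spawn(fn -> DispatchSketch.respond(session, {:text, msg}) end)
  {:noreply, session}
end

# The test sees the same final value for both handler styles.
{:text, "hi"} = DispatchSketch.call(sync, %{message: "hi"})
{:text, "hi"} = DispatchSketch.call(async, %{message: "hi"})
```

Because the task is linked, a handler that raises would take the calling test process down with it, which is one way for crashes to show up as test failures; a real implementation might prefer to trap and report them instead.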

Happy to prototype `Phantom.Test.start/1` + `call_tool/3` + `expect_elicit/1` + `assert_tool_text/2` + `assert_tool_error/2` against the existing `Test.MCP.Router` as a proof of concept if there's interest.
