README.md: 147 changes (146 additions, 1 deletion)

### Streamable HTTP Transport

> **Note**: Streamable HTTP transport is the recommended transport for production deployments. For serverless and load-balanced environments, consider using `stateless_http=True` and `json_response=True`. See [Understanding Stateless Mode](#understanding-stateless-mode) for guidance on choosing between stateful and stateless operation.

<!-- snippet-source examples/snippets/servers/streamable_config.py -->
```python
...
```

The streamable HTTP transport supports:

- JSON or SSE response formats
- Better scalability for multi-node deployments

#### Understanding Stateless Mode

The Streamable HTTP transport can operate in two modes: **stateful** (default) and **stateless**. Understanding the difference is important for choosing the right deployment model.

##### What "Stateless" Means

In **stateless mode** (`stateless_http=True`), each HTTP request creates a completely independent MCP session that exists only for the duration of that single request:

- **No session tracking**: No `Mcp-Session-Id` header is used or required
- **Per-request lifecycle**: Each request gets a fresh transport and server session, is processed, and then both are torn down
- **No state persistence**: No session state is retained between requests
- **No event store**: Resumability features are disabled

This is fundamentally different from **stateful mode** (default), where:

- A session persists across multiple requests
- The `Mcp-Session-Id` header links requests to an existing session
- Server state (e.g., subscriptions, context) is maintained between calls
- Event stores can provide resumability if the connection drops
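The difference in where state lives can be modeled in a few lines of plain Python. This is a toy sketch, not the SDK's implementation: `StatefulServer`, `stateless_handle`, and the `counter` state are hypothetical names chosen only to show what survives between requests in each mode.

```python
import uuid
from typing import Optional


class StatefulServer:
    """Toy model: sessions persist across requests, keyed by a session ID."""

    def __init__(self) -> None:
        self.sessions: dict[str, dict] = {}

    def handle(self, session_id: Optional[str]) -> tuple[str, int]:
        if session_id is None:
            # First request: issue a new ID (the real transport returns it as Mcp-Session-Id)
            session_id = uuid.uuid4().hex
            self.sessions[session_id] = {"counter": 0}
        state = self.sessions[session_id]
        state["counter"] += 1
        return session_id, state["counter"]


def stateless_handle() -> int:
    """Toy model: every request gets fresh state that is discarded afterwards."""
    state = {"counter": 0}
    state["counter"] += 1
    return state["counter"]


stateful = StatefulServer()
sid, first = stateful.handle(None)
_, second = stateful.handle(sid)
print(first, second)  # 1 2: state carried over via the session ID
print(stateless_handle(), stateless_handle())  # 1 1: nothing carried over
```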

##### MCP Features Impacted by Stateless Mode

When running in stateless mode, certain MCP features are unavailable or behave differently:

| Feature | Stateful Mode | Stateless Mode |
|---------|---------------|----------------|
| **Server Notifications** | ✅ Supported | ❌ Not available<sup>1</sup> |
| **Resource Subscriptions** | ✅ Supported | ❌ Not available<sup>1</sup> |
| **Multi-turn Context** | ✅ Maintained | ❌ Lost between requests<sup>2</sup> |
| **Long-running Tools** | ✅ Can use notifications for progress | ⚠️ Must complete within request timeout |
| **Event Resumability** | ✅ With event store | ❌ Not applicable |
| **Tools/Resources/Prompts** | ✅ Fully supported | ✅ Fully supported |
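| **Concurrent Requests** | ⚠️ Supported, but every request for a session must reach the instance that owns it | ✅ Supported; any instance can serve any request<sup>3</sup> |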

<sup>1</sup> Server-initiated notifications require a persistent connection to deliver updates.

<sup>2</sup> Each request starts fresh; the client must provide all necessary context.

<sup>3</sup> Each request is independent, enabling horizontal scaling.

##### When to Use Stateless Mode

**Stateless mode is ideal for:**

- **Serverless Deployments**: AWS Lambda, Cloud Functions, or similar FaaS platforms where instances are ephemeral
- **Load-Balanced Multi-Node**: Deploying across multiple servers without sticky sessions
- **Stateless APIs**: Services where each request is self-contained (e.g., data lookups, calculations)
- **High Concurrency**: Scenarios requiring many simultaneous independent operations
- **Simplified Operations**: Avoiding session management complexity

**Use stateful mode when:**

- Server needs to push notifications to clients (e.g., progress updates, real-time events)
- Resources require subscriptions with change notifications
- Tools maintain conversation state across multiple turns
- Long-running operations need to report progress asynchronously
- Connection resumability is required
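The guidance above reduces to one rule: if any requirement depends on a persistent session, use stateful mode. A hypothetical helper encoding that rule (`Requirements` and `recommended_mode` are illustrative names, not SDK APIs):

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist mirroring the "use stateful mode when" bullets."""

    server_push: bool = False  # notifications, async progress, real-time events
    resource_subscriptions: bool = False
    multi_turn_state: bool = False  # conversation state kept on the server
    resumability: bool = False


def recommended_mode(req: Requirements) -> str:
    # Any feature that needs a persistent session rules out stateless mode
    needs_session = (
        req.server_push
        or req.resource_subscriptions
        or req.multi_turn_state
        or req.resumability
    )
    return "stateful" if needs_session else "stateless"


print(recommended_mode(Requirements()))  # stateless
print(recommended_mode(Requirements(resource_subscriptions=True)))  # stateful
```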

##### Example: Stateless Configuration

```python
from mcp.server.fastmcp import FastMCP

# Stateless server - each request is independent
mcp = FastMCP(
    "StatelessAPI",
    stateless_http=True,  # Enable stateless mode
    json_response=True,  # Recommended for stateless
)


@mcp.tool()
def calculate(a: int, b: int, operation: str) -> int:
    """Stateless calculation tool."""
    if operation == "add":
        return a + b
    if operation == "multiply":
        return a * b
    raise ValueError(f"Unsupported operation: {operation}")


# Each request will:
# 1. Get a fresh transport and session (no Mcp-Session-Id)
# 2. Process the calculate tool call
# 3. Return the result
# 4. Tear down the transport and session
if __name__ == "__main__":
    mcp.run(transport="streamable-http")
```
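For illustration, this builds (but does not send) the HTTP request a client might POST to a stateless server like the one above. It assumes FastMCP's default port `8000` and `/mcp` path; the body is a standard JSON-RPC `tools/call` message. Real MCP clients normally send `initialize` first, so treat this as a sketch of one request's shape rather than a complete client.

```python
import json
import urllib.request

# Assumed local endpoint (default port and path)
URL = "http://localhost:8000/mcp"

body = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "calculate", "arguments": {"a": 6, "b": 7, "operation": "multiply"}},
}

request = urllib.request.Request(
    URL,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Streamable HTTP clients list both response formats they accept
        "Accept": "application/json, text/event-stream",
        # No Mcp-Session-Id header: stateless mode does not issue one
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it to a running server
```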

##### Deployment Patterns

###### Pattern 1: Pure Stateless (Recommended)

```python
from mcp.server.fastmcp import FastMCP

# Best for: Serverless, auto-scaling environments
mcp = FastMCP("MyServer", stateless_http=True, json_response=True)

# Clients can connect to any instance
# Load balancer doesn't need session affinity
```
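Because no session state lives on any instance, routing can be as simple as round-robin. A minimal sketch with hypothetical backend URLs and a hypothetical `route` helper (not part of the SDK or any load balancer's API):

```python
import itertools

# Hypothetical pool of interchangeable stateless instances
backends = ["http://mcp-a:8000/mcp", "http://mcp-b:8000/mcp", "http://mcp-c:8000/mcp"]
_next_backend = itertools.cycle(backends)


def route(headers: dict[str, str]) -> str:
    """Pick the next instance; headers are ignored because no session state exists."""
    return next(_next_backend)


print([route({}) for _ in range(4)])
# ['http://mcp-a:8000/mcp', 'http://mcp-b:8000/mcp', 'http://mcp-c:8000/mcp', 'http://mcp-a:8000/mcp']
```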

###### Pattern 2: Stateful with Sticky Sessions

```python
from mcp.server.fastmcp import FastMCP

# Best for: deployments that need notifications behind a load balancer
mcp = FastMCP("MyServer", stateless_http=False)  # Default

# The load balancer must send every request carrying a given Mcp-Session-Id
# header to the instance that issued that ID (session affinity).
# Check that your ALB/NGINX setup supports affinity keyed on this header.
```
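One way a proxy can provide this affinity is to learn the mapping from the `Mcp-Session-Id` response header, since the first request (typically `initialize`) carries no ID yet. The sketch below is a hypothetical in-process model: `forward`, `affinity`, and the backend URLs are illustrative names, not an SDK or load-balancer API, and a real proxy would also need to expire entries when sessions end.

```python
import itertools
from typing import Callable, Optional

# Hypothetical backend pool for a stateful deployment
backends = ["http://mcp-a:8000/mcp", "http://mcp-b:8000/mcp"]
round_robin = itertools.cycle(backends)
affinity: dict[str, str] = {}  # Mcp-Session-Id -> backend that issued it


def forward(headers: dict[str, str], send: Callable[[str], dict[str, str]]) -> str:
    """Pick a backend, forward via `send`, and learn newly issued session IDs."""
    session_id: Optional[str] = headers.get("Mcp-Session-Id")
    if session_id is not None and session_id in affinity:
        backend = affinity[session_id]  # known session: same instance as before
    else:
        backend = next(round_robin)  # no ID yet: any instance may start the session
    response_headers = send(backend)
    issued = response_headers.get("Mcp-Session-Id")
    if issued is not None:
        affinity[issued] = backend  # remember which instance owns this session
    return backend


# Usage with stubbed backends: the first response issues an ID, later requests reuse it
first = forward({}, lambda backend: {"Mcp-Session-Id": "sess-123"})
later = forward({"Mcp-Session-Id": "sess-123"}, lambda backend: {})
print(first, later)  # http://mcp-a:8000/mcp http://mcp-a:8000/mcp
```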

###### Pattern 3: Hybrid Approach

```python
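import contextlib
from mcp.server.fastmcp import FastMCP
from starlette.applications import Starlette
from starlette.routing import Mount

# Deploy both modes side-by-side
stateless_mcp = FastMCP("StatelessAPI", stateless_http=True)
stateful_mcp = FastMCP("StatefulAPI", stateless_http=False)

@contextlib.asynccontextmanager
async def lifespan(app: Starlette):
    # Mounted sub-apps don't run their own lifespans, so start each session manager here
    async with stateless_mcp.session_manager.run(), stateful_mcp.session_manager.run():
        yield

app = Starlette(routes=[
    Mount("/api/stateless", app=stateless_mcp.streamable_http_app()),
    Mount("/api/stateful", app=stateful_mcp.streamable_http_app()),
], lifespan=lifespan)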
```

##### Technical Details

**Session Lifecycle in Stateless Mode:**

1. The client sends an HTTP POST request to the `/mcp` endpoint (the default path)
2. The server creates an ephemeral `StreamableHTTPServerTransport` with no session ID
3. The server's `run()` is invoked with `stateless=True`, creating a fresh session for this request only (registered tools, resources, and prompts are shared across requests)
4. The request is processed over the ephemeral transport
5. The response is sent back to the client
6. The transport is terminated and the per-request session ends
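The per-request steps above can be modeled with `asyncio`. This is a simplified, hypothetical model: `EphemeralTransport`, `handle_stateless_request`, and `capture` are illustrative names, not SDK classes, and the JSON-RPC result is reduced to a bare value rather than MCP's full tool-result structure. The tool registry passed in is shared, while the transport exists only for one call.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class EphemeralTransport:
    """Hypothetical stand-in for a per-request transport."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None  # stateless: no Mcp-Session-Id is issued
        self.closed = False

    async def terminate(self) -> None:
        self.closed = True


async def handle_stateless_request(
    message: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
    send: Callable[[dict[str, Any]], Awaitable[None]],
) -> EphemeralTransport:
    transport = EphemeralTransport()  # fresh transport for this request only
    try:
        params = message["params"]
        # Process the call using the shared tool registry
        result = tools[params["name"]](**params["arguments"])
        # Send the response before teardown
        await send({"jsonrpc": "2.0", "id": message["id"], "result": result})
    finally:
        await transport.terminate()  # nothing survives the request
    return transport


sent: list[dict[str, Any]] = []


async def capture(payload: dict[str, Any]) -> None:
    sent.append(payload)  # stand-in for writing the HTTP response


transport = asyncio.run(
    handle_stateless_request(
        {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
         "params": {"name": "add", "arguments": {"a": 2, "b": 3}}},
        {"add": lambda a, b: a + b},
        capture,
    )
)
print(sent[0]["result"], transport.closed, transport.session_id)  # 5 True None
```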

**Performance Characteristics:**

- **Initialization overhead**: Each request pays the cost of creating a new transport and session
- **Memory efficiency**: No long-lived sessions consuming memory
- **Scalability**: Scales horizontally with no session state to synchronize between instances
- **Latency**: Slightly higher per-request latency due to this per-request setup

**Stateless Mode Checklist:**

When designing for stateless mode, ensure:

- ✅ Tools are self-contained and don't rely on previous calls
- ✅ All required context is passed in each request
- ✅ Tools finish within the HTTP request timeout
- ✅ No server notifications or subscriptions are needed
- ✅ Client handles any necessary state management
- ✅ Operations are idempotent where possible
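As an illustration of several checklist items at once, here is a hypothetical paginated listing tool in plain Python: the client carries an opaque cursor between calls, so any instance can serve any page, and repeating a call with the same cursor returns the same page. `ITEMS`, `list_items`, and the base64-encoded JSON cursor format are assumptions for the sketch; in a FastMCP server, a function like `list_items` could be registered with `@mcp.tool()`.

```python
import base64
import json
from typing import Optional

# Hypothetical dataset; in practice a database or external API
ITEMS = [f"item-{i}" for i in range(7)]


def encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"offset": offset}).encode()).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    if cursor is None:
        return 0
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["offset"]


def list_items(cursor: Optional[str] = None, page_size: int = 3) -> dict:
    """Self-contained and idempotent: the cursor carries all paging state."""
    offset = decode_cursor(cursor)
    page = ITEMS[offset : offset + page_size]
    next_offset = offset + page_size
    next_cursor = encode_cursor(next_offset) if next_offset < len(ITEMS) else None
    return {"items": page, "next_cursor": next_cursor}


# The client, not the server, keeps the cursor between calls
pages, cursor = [], None
while True:
    result = list_items(cursor)
    pages.append(result["items"])
    cursor = result["next_cursor"]
    if cursor is None:
        break
print(pages)  # [['item-0', 'item-1', 'item-2'], ['item-3', 'item-4', 'item-5'], ['item-6']]
```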

#### CORS Configuration for Browser-Based Clients

If you'd like your server to be accessible by browser-based MCP clients, you'll need to configure CORS headers. The `Mcp-Session-Id` header must be exposed for browser clients to access it: