diff --git a/README.md b/README.md
index 166fc52fa..aeac86be8 100644
--- a/README.md
+++ b/README.md
@@ -1235,7 +1235,7 @@ Note that `uv run mcp run` or `uv run mcp dev` only supports server using FastMC
 
 ### Streamable HTTP Transport
 
-> **Note**: Streamable HTTP transport is the recommended transport for production deployments. Use `stateless_http=True` and `json_response=True` for optimal scalability.
+> **Note**: Streamable HTTP transport is the recommended transport for production deployments. For serverless and load-balanced environments, consider using `stateless_http=True` and `json_response=True`. See [Understanding Stateless Mode](#understanding-stateless-mode) for guidance on choosing between stateful and stateless operation.
 
 ```python
@@ -1347,6 +1347,151 @@ The streamable HTTP transport supports:
 - JSON or SSE response formats
 - Better scalability for multi-node deployments
 
+#### Understanding Stateless Mode
+
+The Streamable HTTP transport can operate in two modes: **stateful** (default) and **stateless**. Understanding the difference is important for choosing the right deployment model.
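As a rough mental model of the two modes, the toy sketch below contrasts a server that keeps per-session state with one that builds fresh state for every request. It is plain standard-library Python and is not the SDK's implementation: `ToySession`, `ToyServer`, `handle`, and the call counter are hypothetical names introduced only for illustration, and the session id stands in for the `Mcp-Session-Id` header.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToySession:
    """Hypothetical per-session state; a real MCP session holds far more."""
    calls: int = 0


@dataclass
class ToyServer:
    """Toy stand-in for an HTTP server. This is NOT the MCP SDK."""
    stateless: bool = False
    sessions: dict = field(default_factory=dict)

    def handle(self, session_id: Optional[str] = None) -> tuple:
        """Process one request; return (session id, calls seen by that session)."""
        if self.stateless:
            # Stateless: fresh state per request, nothing is stored afterwards.
            session = ToySession()
            session.calls += 1
            return None, session.calls
        # Stateful: create a session on first contact, then look it up by id.
        if session_id is None:
            session_id = uuid.uuid4().hex
            self.sessions[session_id] = ToySession()
        session = self.sessions[session_id]  # unknown ids raise KeyError in this toy
        session.calls += 1
        return session_id, session.calls


stateful = ToyServer(stateless=False)
sid, first = stateful.handle()
_, second = stateful.handle(sid)   # same session: the counter advances

stateless = ToyServer(stateless=True)
_, a = stateless.handle()
_, b = stateless.handle()          # independent requests: the counter restarts

print(first, second, a, b)         # 1 2 1 1
```

The stateful server must see the same session id again to find its state, which is why it needs session affinity behind a load balancer. The stateless server keeps nothing between calls, so any instance can answer any request.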
+
+##### What "Stateless" Means
+
+In **stateless mode** (`stateless_http=True`), each HTTP request creates a completely independent MCP session that exists only for the duration of that single request:
+
+- **No session tracking**: No `Mcp-Session-Id` header is used or required
+- **Per-request lifecycle**: Each request initializes a fresh server session, processes the request, and terminates
+- **No state persistence**: No information is retained between requests
+- **No event store**: Resumability features are disabled
+
+This is fundamentally different from **stateful mode** (default), where:
+
+- A session persists across multiple requests
+- The `Mcp-Session-Id` header links requests to an existing session
+- Server state (e.g., subscriptions, context) is maintained between calls
+- Event stores can provide resumability if the connection drops
+
+##### MCP Features Impacted by Stateless Mode
+
+When running in stateless mode, certain MCP features are unavailable or behave differently:
+
+| Feature | Stateful Mode | Stateless Mode |
+|---------|---------------|----------------|
+| **Server Notifications** | ✅ Supported | ❌ Not available<sup>1</sup> |
+| **Resource Subscriptions** | ✅ Supported | ❌ Not available<sup>1</sup> |
+| **Multi-turn Context** | ✅ Maintained | ❌ Lost between requests<sup>2</sup> |
+| **Long-running Tools** | ✅ Can use notifications for progress | ⚠️ Must complete within request timeout |
+| **Event Resumability** | ✅ With event store | ❌ Not applicable |
+| **Tools/Resources/Prompts** | ✅ Fully supported | ✅ Fully supported |
+| **Concurrent Requests** | ⚠️ One per session | ✅ No per-session limit<sup>3</sup> |
+
+<sup>1</sup> Server-initiated notifications require a persistent connection to deliver updates
+<sup>2</sup> Each request starts fresh; the client must provide all necessary context
+<sup>3</sup> Each request is independent, enabling horizontal scaling
+
+##### When to Use Stateless Mode
+
+**Stateless mode is ideal for:**
+
+- **Serverless Deployments**: AWS Lambda, Cloud Functions, or similar FaaS platforms where instances are ephemeral
+- **Load-Balanced Multi-Node**: Deploying across multiple servers without sticky sessions
+- **Stateless APIs**: Services where each request is self-contained (e.g., data lookups, calculations)
+- **High Concurrency**: Scenarios requiring many simultaneous independent operations
+- **Simplified Operations**: Avoiding session management complexity
+
+**Use stateful mode when:**
+
+- The server needs to push notifications to clients (e.g., progress updates, real-time events)
+- Resources require subscriptions with change notifications
+- Tools maintain conversation state across multiple turns
+- Long-running operations need to report progress asynchronously
+- Connection resumability is required
+
+##### Example: Stateless Configuration
+
+```python
+from mcp.server.fastmcp import FastMCP
+
+# Stateless server - each request is independent
+mcp = FastMCP(
+    "StatelessAPI",
+    stateless_http=True,  # Enable stateless mode
+    json_response=True,  # Recommended for stateless
+)
+
+@mcp.tool()
+def calculate(a: int, b: int, operation: str) -> int:
+    """Stateless calculation tool."""
+    operations = {"add": a + b, "multiply": a * b}
+    if operation not in operations:
+        # Fail with a clear message instead of a bare KeyError
+        raise ValueError(f"Unsupported operation: {operation!r}")
+    return operations[operation]
+
+# Each request will:
+# 1. Initialize a new server session
+# 2. Process the calculate tool call
+# 3. Return the result
+# 4. Terminate the session
+```
+
+##### Deployment Patterns
+
+###### Pattern 1: Pure Stateless (Recommended)
+
+```python
+# Best for: Serverless, auto-scaling environments
+mcp = FastMCP("MyServer", stateless_http=True, json_response=True)
+
+# Clients can connect to any instance
+# Load balancer doesn't need session affinity
+```
+
+###### Pattern 2: Stateful with Sticky Sessions
+
+```python
+# Best for: When you need notifications but have load balancing
+mcp = FastMCP("MyServer", stateless_http=False)  # Default
+
+# Load balancer must use sticky sessions based on Mcp-Session-Id header
+# Some proxies (e.g., NGINX) can hash on a header value; check what your load balancer supports
+```
+
+###### Pattern 3: Hybrid Approach
+
+```python
+import contextlib
+from starlette.applications import Starlette
+from starlette.routing import Mount
+
+# Deploy both modes side-by-side
+stateless_mcp = FastMCP("StatelessAPI", stateless_http=True)
+stateful_mcp = FastMCP("StatefulAPI", stateless_http=False)
+
+@contextlib.asynccontextmanager
+async def lifespan(app):  # run both session managers for the app's lifetime
+    async with stateless_mcp.session_manager.run(), stateful_mcp.session_manager.run():
+        yield
+
+app = Starlette(routes=[
+    Mount("/api/stateless", app=stateless_mcp.streamable_http_app()),
+    Mount("/api/stateful", app=stateful_mcp.streamable_http_app()),
+], lifespan=lifespan)
+```
+
+##### Technical Details
+
+**Session Lifecycle in Stateless Mode:**
+
+1. The client sends an HTTP POST request to the `/mcp` endpoint
+2. The server creates an ephemeral `StreamableHTTPServerTransport` (no session ID)
+3. The server starts a fresh session by running the MCP `Server` with the `stateless=True` flag
+4. The request is processed using the ephemeral transport
+5. The response is sent back to the client
+6. The transport and session are terminated immediately
+
+**Performance Characteristics:**
+
+- **Initialization overhead**: Each request pays the cost of session initialization
+- **Memory efficiency**: No long-lived sessions consuming memory
+- **Scalability**: Scales horizontally without state synchronization between nodes
+- **Latency**: Slightly higher per-request latency due to initialization
+
+**Stateless Mode Checklist:**
+
+When designing for stateless mode, ensure:
+
+- ✅ Tools are self-contained and don't rely on previous calls
+- ✅ All required context is passed in each request
+- ✅ Tools complete within the request timeout
+- ✅ No server notifications or subscriptions are needed
+- ✅ Client handles any necessary state management
+- ✅ Operations are idempotent where possible
+
 #### CORS Configuration for Browser-Based Clients
 
 If you'd like your server to be accessible by browser-based MCP clients, you'll need to configure CORS headers. The `Mcp-Session-Id` header must be exposed for browser clients to access it: