diff --git a/README.md b/README.md
index 07bc768..c267fa7 100644
--- a/README.md
+++ b/README.md
@@ -92,6 +92,12 @@ Fairvisor integrates *alongside* Kong, nginx, and Envoy — it is not a replacem
 
 ## Quick start
 
+> **Runnable quickstart:** `examples/quickstart/` — `docker compose up -d` and run your first enforce/reject test in under a minute. See [`examples/quickstart/README.md`](examples/quickstart/README.md).
+>
+> **Recipes:** `examples/recipes/` — deployable team budgets, runaway agent guard, and provider failover examples.
+>
+> **Sample artifacts:** `fixtures/` — canonical request/response fixtures for enforce, reject (TPM, TPD, prompt-too-large), and provider-native error bodies (OpenAI, Anthropic, Gemini).
+
 ### 1. Create a policy
 
 ```bash
@@ -304,7 +310,7 @@ Policies are versioned JSON — commit them to Git, review changes in PRs, roll
 
 **No external datastore.** All enforcement state lives in in-process shared memory (`ngx.shared.dict`).
 No Redis, no Postgres, no network round-trips in the decision path.
 
-> Reproduce: `git clone https://github.com/fairvisor/benchmark && cd benchmark && ./run-all.sh`
+> Reproduce: see [fairvisor/benchmark](https://github.com/fairvisor/benchmark) — the canonical source of truth for Fairvisor Edge performance numbers.
 
 ## Deployment
@@ -348,14 +354,16 @@ If the SaaS is unreachable, the edge keeps enforcing with the last-known policy
 ## Project layout
 
 ```
-src/fairvisor/           runtime modules (OpenResty/LuaJIT)
-cli/                     command-line tooling
-spec/                    unit and integration tests (busted)
-tests/e2e/               Docker-based E2E tests (pytest)
-examples/                sample policy bundles
-helm/                    Helm chart
-docker/                  Docker artifacts
-docs/                    reference documentation
+src/fairvisor/           runtime modules (OpenResty/LuaJIT)
+cli/                     command-line tooling
+spec/                    unit and integration tests (busted)
+tests/e2e/               Docker-based E2E tests (pytest)
+examples/quickstart/     runnable quickstart (docker compose up -d)
+examples/recipes/        deployable policy recipes (team budgets, agent guard, failover)
+fixtures/                canonical request/response sample artifacts
+helm/                    Helm chart
+docker/                  Docker artifacts
+docs/                    reference documentation
 ```
 
 ## Contributing
@@ -376,3 +384,4 @@ pytest tests/e2e -v # E2E (requires Docker)
 ---
 
 **Docs:** [docs.fairvisor.com](https://docs.fairvisor.com/docs/) · **Website:** [fairvisor.com](https://fairvisor.com) · **Quickstart:** [5 minutes to enforcement](https://docs.fairvisor.com/docs/quickstart/)
+
diff --git a/examples/quickstart/README.md b/examples/quickstart/README.md
new file mode 100644
index 0000000..a23fc78
--- /dev/null
+++ b/examples/quickstart/README.md
@@ -0,0 +1,97 @@
+# Fairvisor Edge — Quickstart
+
+Go from `git clone` to working policy enforcement in one step.
+
+## Prerequisites
+
+- Docker with Compose V2 (`docker compose version`)
+- Port 8080 free on localhost
+
+## Start
+
+```bash
+docker compose up -d
+```
+
+Wait for the edge service to report healthy:
+
+```bash
+docker compose ps
+# edge should show "healthy"
+```
+
+## Verify enforcement
+
+**Allowed request** — should return `200`:
+
+```bash
+curl -s -X POST http://localhost:8080/openai/v1/chat/completions \
+  -H "Authorization: Bearer demo-client-jwt.demo-payload.demo-sig:sk-fake-key" \
+  -H "Content-Type: application/json" \
+  -d @../../fixtures/normal_request.json
+```
+
+Expected response matches `../../fixtures/allow_response.json`.
+
+**Over-limit request** — should return `429`:
+
+```bash
+curl -s -X POST http://localhost:8080/openai/v1/chat/completions \
+  -H "Authorization: Bearer demo-client-jwt.demo-payload.demo-sig:sk-fake-key" \
+  -H "Content-Type: application/json" \
+  -d @../../fixtures/over_limit_request.json
+```
+
+Expected response body matches `../../fixtures/reject_tpm_exceeded.json`.
+The response will also include:
+
+- `X-Fairvisor-Reason: tpm_exceeded`
+- `Retry-After: 60`
+- `RateLimit-Limit: 100`
+- `RateLimit-Remaining: 0`
+
+## Wrapper mode and auth
+
+This quickstart runs in `FAIRVISOR_MODE=wrapper`. The composite Bearer token format is:
+
+```
+Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
+```
+
+- `CLIENT_JWT` — a signed JWT identifying the calling client/tenant (used for policy enforcement)
+- `UPSTREAM_KEY` — the real upstream API key forwarded to the provider (e.g. `sk-...` for OpenAI)
+
+Fairvisor strips the composite header and injects the correct provider auth before forwarding. The upstream key is **never returned to the caller** — see `../../fixtures/allow_response.json` for proof (no `Authorization`, `x-api-key`, or `x-goog-api-key` headers in the response).
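As an aside, the composite header is easy to assemble in code. Here is a minimal Python sketch of building the wrapper-mode credential described above (the `composite_auth` helper and the demo values are illustrative, not a Fairvisor API):

```python
import json

def composite_auth(client_jwt: str, upstream_key: str) -> str:
    # Wrapper mode expects: Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
    return f"Bearer {client_jwt}:{upstream_key}"

# Demo placeholders from this quickstart; use real credentials in production.
headers = {
    "Authorization": composite_auth("demo-client-jwt.demo-payload.demo-sig", "sk-fake-key"),
    "Content-Type": "application/json",
}
body = json.dumps({
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 20,
})
# POST headers + body to http://localhost:8080/openai/v1/chat/completions
```

The resulting `headers` and `body` drop into any HTTP client, exactly mirroring the curl examples above.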
+
+## Provider-prefixed paths
+
+Wrapper mode routes by path prefix:
+
+| Path prefix | Upstream | Auth header |
+|---|---|---|
+| `/openai/v1/...` | `https://api.openai.com/v1/...` | `Authorization: Bearer UPSTREAM_KEY` |
+| `/anthropic/v1/...` | `https://api.anthropic.com/v1/...` | `x-api-key: UPSTREAM_KEY` |
+| `/gemini/v1beta/...` | `https://generativelanguage.googleapis.com/v1beta/...` | `x-goog-api-key: UPSTREAM_KEY` |
+
+## Anthropic example
+
+```bash
+curl -s -X POST http://localhost:8080/anthropic/v1/messages \
+  -H "Authorization: Bearer demo-client-jwt.demo-payload.demo-sig:sk-ant-fake-key" \
+  -H "Content-Type: application/json" \
+  -d @../../fixtures/anthropic_normal_request.json
+```
+
+A rejected Anthropic request returns an Anthropic-native error body — see `../../fixtures/reject_anthropic.json`.
+
+## Teardown
+
+```bash
+docker compose down
+```
+
+## Next steps
+
+- See `../recipes/` for team budgets, runaway agent guard, and provider failover scenarios
+- See `../../fixtures/` for all sample request/response artifacts
+- See [fairvisor/benchmark](https://github.com/fairvisor/benchmark) for performance benchmarks
+- See [docs/install/](../../docs/install/) for Kubernetes, VM, and SaaS deployment options
diff --git a/examples/quickstart/docker-compose.yml b/examples/quickstart/docker-compose.yml
new file mode 100644
index 0000000..0b5affa
--- /dev/null
+++ b/examples/quickstart/docker-compose.yml
@@ -0,0 +1,48 @@
+# Fairvisor Edge — Quickstart stack (standalone mode)
+#
+# Usage (run from examples/quickstart/):
+#   docker compose up -d
+#   curl -s http://localhost:8080/readyz   # health check
+#   curl -s -X POST http://localhost:8080/openai/v1/chat/completions \
+#     -H "Authorization: Bearer demo-client-jwt.demo-payload.demo-sig:sk-fake-upstream-key" \
+#     -H "Content-Type: application/json" \
+#     -d @../../fixtures/normal_request.json      # expect 200
+#   curl -s -X POST http://localhost:8080/openai/v1/chat/completions \
+#     -H "Authorization: Bearer demo-client-jwt.demo-payload.demo-sig:sk-fake-upstream-key" \
+#     -H "Content-Type: application/json" \
+#     -d @../../fixtures/over_limit_request.json  # expect 429
+#
+# This file is also the base for the e2e-smoke CI check.
+# CI extends it via tests/e2e/docker-compose.test.yml; do not diverge the
+# service name, port, or volume contract without updating CI as well.
+
+services:
+  edge:
+    image: ghcr.io/fairvisor/fairvisor-edge:latest
+    ports:
+      - "8080:8080"
+    environment:
+      FAIRVISOR_CONFIG_FILE: /etc/fairvisor/policy.json
+      FAIRVISOR_MODE: wrapper
+      FAIRVISOR_SHARED_DICT_SIZE: 32m
+      FAIRVISOR_LOG_LEVEL: info
+      FAIRVISOR_WORKER_PROCESSES: "1"
+    volumes:
+      - ./policy.json:/etc/fairvisor/policy.json:ro
+    healthcheck:
+      test: ["CMD", "curl", "-sf", "http://127.0.0.1:8080/readyz"]
+      interval: 2s
+      timeout: 2s
+      retries: 15
+      start_period: 5s
+
+  mock_llm:
+    image: nginx:1.27-alpine
+    volumes:
+      - ./mock-llm.conf:/etc/nginx/nginx.conf:ro
+    healthcheck:
+      test: ["CMD", "wget", "-q", "-O", "-", "http://127.0.0.1:80/"]
+      interval: 2s
+      timeout: 2s
+      retries: 10
+      start_period: 5s
diff --git a/examples/quickstart/mock-llm.conf b/examples/quickstart/mock-llm.conf
new file mode 100644
index 0000000..26603ab
--- /dev/null
+++ b/examples/quickstart/mock-llm.conf
@@ -0,0 +1,10 @@
+events {}
+http {
+  server {
+    listen 80;
+    location / {
+      default_type application/json;
+      return 200 '{"id":"chatcmpl-qs","object":"chat.completion","choices":[{"index":0,"message":{"role":"assistant","content":"Hello from the mock backend!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":8,"total_tokens":18}}';
+    }
+  }
+}
diff --git a/examples/quickstart/policy.json b/examples/quickstart/policy.json
new file mode 100644
index 0000000..f5520aa
--- /dev/null
+++ b/examples/quickstart/policy.json
@@ -0,0 +1,31 @@
+{
+  "bundle_version": 1,
+  "issued_at": "2026-01-01T00:00:00Z",
+  "expires_at": "2030-01-01T00:00:00Z",
+  "policies": [
+    {
+      "id": "quickstart-tpm-policy",
+      "spec": {
+        "selector": {
+          "pathPrefix": "/openai/",
+          "methods": ["POST"]
+        },
+        "mode": "enforce",
+        "rules": [
+          {
+            "name": "tpm-limit",
+            "limit_keys": ["jwt:sub"],
+            "algorithm": "token_bucket_llm",
+            "algorithm_config": {
+              "tokens_per_minute": 100,
+              "tokens_per_day": 1000,
+              "burst_tokens": 100,
+              "default_max_completion_tokens": 50
+            }
+          }
+        ]
+      }
+    }
+  ],
+  "kill_switches": []
+}
diff --git a/examples/recipes/provider-failover/README.md b/examples/recipes/provider-failover/README.md
new file mode 100644
index 0000000..520226c
--- /dev/null
+++ b/examples/recipes/provider-failover/README.md
@@ -0,0 +1,52 @@
+# Recipe: Provider Failover / Edge Control
+
+Run two provider paths under independent policy budgets. When the primary
+provider (OpenAI) trips a circuit breaker, your client-side router can
+switch to the fallback (Anthropic) — both paths enforced by the same edge.
+
+## How it works
+
+- `/openai/v1/...` — enforced by an OpenAI TPM limit + a spend-based circuit breaker
+- `/anthropic/v1/...` — enforced by an Anthropic TPM limit
+
+The circuit breaker on the OpenAI path auto-trips when cumulative spend
+exceeds the threshold in a 5-minute window, then auto-resets after 10 minutes.
+Your application can detect the 429 with `X-Fairvisor-Reason: circuit_breaker_open`
+and switch to the Anthropic path without any Fairvisor configuration change.
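The trip-and-reset behaviour can be modelled in a few lines. The sketch below is an illustrative approximation of the semantics described above (sliding spend window, threshold trip, cooldown auto-reset) using this recipe's numbers; it is not Fairvisor's actual implementation, which runs in LuaJIT inside the edge:

```python
import time
from collections import deque

class SpendCircuitBreaker:
    """Approximation of the recipe's breaker: trips when spend inside a
    sliding window exceeds a threshold, auto-resets after a cooldown."""

    def __init__(self, window_s=300, threshold=100_000, cooldown_s=600,
                 clock=time.monotonic):
        self.window_s = window_s      # spend_window_seconds
        self.threshold = threshold    # spend_threshold
        self.cooldown_s = cooldown_s  # cooldown_seconds
        self.clock = clock            # injectable for testing
        self.events = deque()         # (timestamp, spend) pairs
        self.tripped_at = None

    def allow(self) -> bool:
        now = self.clock()
        if self.tripped_at is not None:
            if now - self.tripped_at < self.cooldown_s:
                return False          # open: caller should fail over
            self.tripped_at = None    # cooldown elapsed: auto-reset
            self.events.clear()
        return True

    def record_spend(self, spend: float) -> None:
        now = self.clock()
        self.events.append((now, spend))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()     # drop spend outside the window
        if sum(s for _, s in self.events) > self.threshold:
            self.tripped_at = now     # trip the breaker
```

The injectable clock makes the trip/reset timing easy to unit-test without sleeping.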
+
+## Deploy
+
+```bash
+cp policy.json /etc/fairvisor/policy.json
+```
+
+## Client-side failover pattern
+
+```python
+import httpx
+
+EDGE = "http://localhost:8080"
+AUTH = "Bearer my-client-jwt.payload.sig:sk-my-upstream-key"
+
+# Each provider exposes a different endpoint and request shape behind the edge:
+# OpenAI uses /v1/chat/completions, Anthropic uses /v1/messages and requires
+# max_tokens. Model names here are examples; substitute your own.
+ENDPOINTS = {
+    "openai": ("/openai/v1/chat/completions", {"model": "gpt-4o"}),
+    "anthropic": ("/anthropic/v1/messages",
+                  {"model": "claude-3-5-haiku-20241022", "max_tokens": 1024}),
+}
+
+def chat(messages, provider="openai"):
+    path, base_payload = ENDPOINTS[provider]
+    resp = httpx.post(
+        f"{EDGE}{path}",
+        headers={"Authorization": AUTH, "Content-Type": "application/json"},
+        json={**base_payload, "messages": messages},
+    )
+    if resp.status_code == 429:
+        reason = resp.headers.get("X-Fairvisor-Reason", "")
+        if reason == "circuit_breaker_open" and provider == "openai":
+            return chat(messages, provider="anthropic")
+    resp.raise_for_status()
+    return resp.json()
+```
+
+## Auth note
+
+The composite `CLIENT_JWT:UPSTREAM_KEY` format is the same for all providers.
+Fairvisor injects the correct provider-native auth header:
+
+- OpenAI: `Authorization: Bearer UPSTREAM_KEY`
+- Anthropic: `x-api-key: UPSTREAM_KEY`
+
+The upstream key is stripped from responses — it never reaches your client.
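For intuition, the rewrite the edge performs can be sketched in Python. This is an illustrative model of the mapping in the auth note above; `rewrite_auth` is hypothetical, and the real rewrite happens in the edge's Lua layer:

```python
# Provider-native auth header per path prefix (Gemini shown for completeness).
PROVIDER_AUTH = {
    "openai": lambda key: {"Authorization": f"Bearer {key}"},
    "anthropic": lambda key: {"x-api-key": key},
    "gemini": lambda key: {"x-goog-api-key": key},
}

def rewrite_auth(composite_bearer: str, provider: str) -> dict:
    # Split "Bearer CLIENT_JWT:UPSTREAM_KEY" at the first colon; a JWT is
    # dot-separated base64url, so it can never itself contain ":".
    token = composite_bearer.removeprefix("Bearer ")
    client_jwt, _, upstream_key = token.partition(":")
    return PROVIDER_AUTH[provider](upstream_key)
```

The `client_jwt` half is what policy rules key on (e.g. `limit_keys: ["jwt:org_id"]`); only the upstream half is forwarded.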
diff --git a/examples/recipes/provider-failover/policy.json b/examples/recipes/provider-failover/policy.json
new file mode 100644
index 0000000..5974c77
--- /dev/null
+++ b/examples/recipes/provider-failover/policy.json
@@ -0,0 +1,62 @@
+{
+  "bundle_version": 1,
+  "issued_at": "2026-01-01T00:00:00Z",
+  "expires_at": "2030-01-01T00:00:00Z",
+  "policies": [
+    {
+      "id": "provider-failover-primary",
+      "spec": {
+        "selector": {
+          "pathPrefix": "/openai/",
+          "methods": ["POST"]
+        },
+        "mode": "enforce",
+        "rules": [
+          {
+            "name": "openai-tpm",
+            "limit_keys": ["jwt:org_id"],
+            "algorithm": "token_bucket_llm",
+            "algorithm_config": {
+              "tokens_per_minute": 200000,
+              "burst_tokens": 200000,
+              "default_max_completion_tokens": 2048
+            }
+          },
+          {
+            "name": "openai-circuit-breaker",
+            "limit_keys": ["jwt:org_id"],
+            "algorithm": "circuit_breaker",
+            "algorithm_config": {
+              "spend_window_seconds": 300,
+              "spend_threshold": 100000,
+              "cooldown_seconds": 600
+            }
+          }
+        ]
+      }
+    },
+    {
+      "id": "provider-failover-fallback",
+      "spec": {
+        "selector": {
+          "pathPrefix": "/anthropic/",
+          "methods": ["POST"]
+        },
+        "mode": "enforce",
+        "rules": [
+          {
+            "name": "anthropic-tpm",
+            "limit_keys": ["jwt:org_id"],
+            "algorithm": "token_bucket_llm",
+            "algorithm_config": {
+              "tokens_per_minute": 100000,
+              "burst_tokens": 100000,
+              "default_max_completion_tokens": 2048
+            }
+          }
+        ]
+      }
+    }
+  ],
+  "kill_switches": []
+}
diff --git a/examples/recipes/runaway-agent-guard/README.md b/examples/recipes/runaway-agent-guard/README.md
new file mode 100644
index 0000000..7b34491
--- /dev/null
+++ b/examples/recipes/runaway-agent-guard/README.md
@@ -0,0 +1,50 @@
+# Recipe: Runaway Agent Guard
+
+Stop runaway agentic workflows before they exhaust your token budget or
+billing limit.
+
+## Problem
+
+Autonomous agents (LangChain, AutoGPT, custom loops) can enter retry storms
+or infinite planning loops. Without enforcement, a single runaway agent
+can consume thousands of dollars of API budget in minutes.
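A sliding-window request counter is enough to see how such a loop gets caught. The sketch below is an illustrative model using this recipe's numbers (more than 30 requests in a 60 s window trips a 120 s cooldown); it is not Fairvisor's actual `loop_detector` implementation:

```python
import time
from collections import defaultdict, deque

class LoopDetector:
    """Per-agent sliding-window request counter with a trip cooldown."""

    def __init__(self, window_s=60, max_requests=30, cooldown_s=120,
                 clock=time.monotonic):
        self.window_s = window_s
        self.max_requests = max_requests
        self.cooldown_s = cooldown_s
        self.clock = clock              # injectable for testing
        self.hits = defaultdict(deque)  # agent_id -> request timestamps
        self.cooling = {}               # agent_id -> time it tripped

    def allow(self, agent_id: str) -> bool:
        now = self.clock()
        tripped = self.cooling.get(agent_id)
        if tripped is not None:
            if now - tripped < self.cooldown_s:
                return False            # cooling down: reject
            del self.cooling[agent_id]  # cooldown over
        window = self.hits[agent_id]
        window.append(now)
        while window and now - window[0] > self.window_s:
            window.popleft()            # evict hits outside the window
        if len(window) > self.max_requests:
            self.cooling[agent_id] = now  # too many hits in the window: trip
            return False
        return True
```

Because the window is keyed per agent, one looping agent trips its own cooldown without affecting well-behaved agents.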
+
+## How it works
+
+Two rules cooperate:
+
+1. **Loop detector** — counts requests per `agent_id` in a sliding window.
+   If the agent fires more than 30 requests in 60 seconds, it trips a
+   120-second cooldown. This catches tight retry loops.
+
+2. **TPM guard** — caps tokens per minute per agent. A burst-heavy agent
+   that passes the loop check still cannot drain the token pool.
+
+## Deploy
+
+```bash
+cp policy.json /etc/fairvisor/policy.json
+```
+
+## JWT shape expected
+
+```json
+{
+  "sub": "user-456",
+  "agent_id": "autoagent-prod-7",
+  "exp": 9999999999
+}
+```
+
+## Kill switch for incidents
+
+If an agent causes an incident, flip a kill switch without restarting the edge:
+
+```bash
+# Via CLI
+fairvisor kill-switch enable agent-id=autoagent-prod-7
+
+# Or update the policy bundle with a kill_switch entry and hot-reload
+```
+
+See `docs/cookbook/kill-switch-incident-response.md` for the full incident playbook.
diff --git a/examples/recipes/runaway-agent-guard/policy.json b/examples/recipes/runaway-agent-guard/policy.json
new file mode 100644
index 0000000..2715b60
--- /dev/null
+++ b/examples/recipes/runaway-agent-guard/policy.json
@@ -0,0 +1,40 @@
+{
+  "bundle_version": 1,
+  "issued_at": "2026-01-01T00:00:00Z",
+  "expires_at": "2030-01-01T00:00:00Z",
+  "policies": [
+    {
+      "id": "runaway-agent-guard",
+      "spec": {
+        "selector": {
+          "pathPrefix": "/",
+          "methods": ["POST"]
+        },
+        "mode": "enforce",
+        "rules": [
+          {
+            "name": "loop-detection",
+            "limit_keys": ["jwt:agent_id"],
+            "algorithm": "loop_detector",
+            "algorithm_config": {
+              "window_seconds": 60,
+              "max_requests": 30,
+              "cooldown_seconds": 120
+            }
+          },
+          {
+            "name": "agent-tpm-guard",
+            "limit_keys": ["jwt:agent_id"],
+            "algorithm": "token_bucket_llm",
+            "algorithm_config": {
+              "tokens_per_minute": 50000,
+              "burst_tokens": 50000,
+              "default_max_completion_tokens": 512
+            }
+          }
+        ]
+      }
+    }
+  ],
+  "kill_switches": []
+}
diff --git a/examples/recipes/team-budgets/README.md b/examples/recipes/team-budgets/README.md
new file mode 100644
index 0000000..54c1551
--- /dev/null
+++ b/examples/recipes/team-budgets/README.md
@@ -0,0 +1,45 @@
+# Recipe: Team Budgets
+
+Enforce per-team token and cost limits using JWT claims.
+
+## How it works
+
+Each request carries a JWT with a `team_id` claim. Fairvisor uses this as
+the bucket key for two independent rules:
+
+1. **TPM/TPD limit** — token-rate enforcement per minute and per day
+2. **Monthly cost budget** — cumulative cost cap with staged warn/throttle/reject
+
+## Deploy
+
+```bash
+# Copy policy to your edge config path
+cp policy.json /etc/fairvisor/policy.json
+
+# Or use with docker compose (standalone mode):
+FAIRVISOR_CONFIG_FILE=./policy.json FAIRVISOR_MODE=wrapper docker compose up -d
+```
+
+## JWT shape expected
+
+```json
+{
+  "sub": "user-123",
+  "team_id": "engineering",
+  "plan": "pro",
+  "exp": 9999999999
+}
+```
+
+## Staged actions at cost budget thresholds
+
+| Threshold | Action |
+|---|---|
+| 80% | Warn (allow, log, emit business event) |
+| 95% | Throttle (allow with 500 ms delay) |
+| 100% | Reject (429, `budget_exceeded`) |
+
+## Related fixtures
+
+- `../../../fixtures/reject_tpd_exceeded.json` — TPD reject body
+- `../../../fixtures/reject_tpm_exceeded.json` — TPM reject body
diff --git a/examples/recipes/team-budgets/policy.json b/examples/recipes/team-budgets/policy.json
new file mode 100644
index 0000000..d361a30
--- /dev/null
+++ b/examples/recipes/team-budgets/policy.json
@@ -0,0 +1,47 @@
+{
+  "bundle_version": 1,
+  "issued_at": "2026-01-01T00:00:00Z",
+  "expires_at": "2030-01-01T00:00:00Z",
+  "policies": [
+    {
+      "id": "team-token-budget",
+      "spec": {
+        "selector": {
+          "pathPrefix": "/openai/",
+          "methods": ["POST"]
+        },
+        "mode": "enforce",
+        "rules": [
+          {
+            "name": "per-team-tpm",
+            "limit_keys": ["jwt:team_id"],
+            "algorithm": "token_bucket_llm",
+            "algorithm_config": {
+              "tokens_per_minute": 120000,
+              "tokens_per_day": 2000000,
+              "burst_tokens": 120000,
+              "default_max_completion_tokens": 1024
+            }
+          },
+          {
+            "name": "per-team-cost-budget",
+            "limit_keys": ["jwt:team_id"],
+            "algorithm": "cost_based",
+            "algorithm_config": {
+              "budget": 50000,
+              "period": "30d",
+              "cost_key": "fixed",
+              "fixed_cost": 1,
+              "staged_actions": [
+                { "threshold_percent": 80, "action": "warn" },
+                { "threshold_percent": 95, "action": "throttle", "delay_ms": 500 },
+                { "threshold_percent": 100, "action": "reject" }
+              ]
+            }
+          }
+        ]
+      }
+    }
+  ],
+  "kill_switches": []
+}
diff --git a/fixtures/allow_response.json b/fixtures/allow_response.json
new file mode 100644
index 0000000..7cc0312
--- /dev/null
+++ b/fixtures/allow_response.json
@@ -0,0 +1,28 @@
+{
+  "_comment": "Sample 200 response for an allowed request in wrapper mode. Note: no Authorization, x-api-key, or x-goog-api-key headers — upstream auth is stripped on the response side.",
+  "_status": 200,
+  "_headers": {
+    "Content-Type": "application/json",
+    "X-Fairvisor-Reason": null,
+    "Authorization": null,
+    "x-api-key": null,
+    "x-goog-api-key": null
+  },
+  "id": "chatcmpl-example",
+  "object": "chat.completion",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "Hello! How can I help you today?"
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 10,
+    "completion_tokens": 9,
+    "total_tokens": 19
+  }
+}
diff --git a/fixtures/anthropic_normal_request.json b/fixtures/anthropic_normal_request.json
new file mode 100644
index 0000000..bcffdbf
--- /dev/null
+++ b/fixtures/anthropic_normal_request.json
@@ -0,0 +1,10 @@
+{
+  "model": "claude-3-5-haiku-20241022",
+  "max_tokens": 20,
+  "messages": [
+    {
+      "role": "user",
+      "content": "Say hello in one sentence."
+    }
+  ]
+}
diff --git a/fixtures/normal_request.json b/fixtures/normal_request.json
new file mode 100644
index 0000000..049a4e4
--- /dev/null
+++ b/fixtures/normal_request.json
@@ -0,0 +1,10 @@
+{
+  "model": "gpt-4o-mini",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Say hello in one sentence."
+    }
+  ],
+  "max_tokens": 20
+}
diff --git a/fixtures/over_limit_request.json b/fixtures/over_limit_request.json
new file mode 100644
index 0000000..b3b554f
--- /dev/null
+++ b/fixtures/over_limit_request.json
@@ -0,0 +1,10 @@
+{
+  "model": "gpt-4o",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Say hello in one sentence."
+    }
+  ],
+  "max_tokens": 200000
+}
diff --git a/fixtures/reject_anthropic.json b/fixtures/reject_anthropic.json
new file mode 100644
index 0000000..bdf468f
--- /dev/null
+++ b/fixtures/reject_anthropic.json
@@ -0,0 +1,13 @@
+{
+  "_comment": "Anthropic-native 429 reject body. Used for /anthropic/* paths.",
+  "_headers": {
+    "X-Fairvisor-Reason": "tpm_exceeded",
+    "Retry-After": "60",
+    "Content-Type": "application/json"
+  },
+  "type": "error",
+  "error": {
+    "type": "rate_limit_error",
+    "message": "Token budget exceeded for this tenant."
+  }
+}
diff --git a/fixtures/reject_gemini.json b/fixtures/reject_gemini.json
new file mode 100644
index 0000000..f0df901
--- /dev/null
+++ b/fixtures/reject_gemini.json
@@ -0,0 +1,13 @@
+{
+  "_comment": "Gemini-native 429 reject body. Used for /gemini/* paths.",
+  "_headers": {
+    "X-Fairvisor-Reason": "tpm_exceeded",
+    "Retry-After": "60",
+    "Content-Type": "application/json"
+  },
+  "error": {
+    "code": 429,
+    "message": "Token budget exceeded for this tenant.",
+    "status": "RESOURCE_EXHAUSTED"
+  }
+}
diff --git a/fixtures/reject_openai.json b/fixtures/reject_openai.json
new file mode 100644
index 0000000..eabd023
--- /dev/null
+++ b/fixtures/reject_openai.json
@@ -0,0 +1,14 @@
+{
+  "_comment": "OpenAI-native 429 reject body. Used for /openai/* paths and OpenAI-compatible providers.",
+  "_headers": {
+    "X-Fairvisor-Reason": "tpm_exceeded",
+    "Retry-After": "60",
+    "Content-Type": "application/json"
+  },
+  "error": {
+    "type": "rate_limit_error",
+    "code": "tpm_exceeded",
+    "message": "Token budget exceeded for this tenant.",
+    "param": null
+  }
+}
diff --git a/fixtures/reject_prompt_too_large.json b/fixtures/reject_prompt_too_large.json
new file mode 100644
index 0000000..9c4cf8c
--- /dev/null
+++ b/fixtures/reject_prompt_too_large.json
@@ -0,0 +1,13 @@
+{
+  "_comment": "429 body returned when the request exceeds max_prompt_tokens.",
+  "_headers": {
+    "X-Fairvisor-Reason": "prompt_too_large",
+    "Content-Type": "application/json"
+  },
+  "error": {
+    "type": "rate_limit_error",
+    "code": "prompt_too_large",
+    "message": "Request prompt exceeds the maximum allowed token count for this policy.",
+    "param": null
+  }
+}
diff --git a/fixtures/reject_tpd_exceeded.json b/fixtures/reject_tpd_exceeded.json
new file mode 100644
index 0000000..8d2bcdb
--- /dev/null
+++ b/fixtures/reject_tpd_exceeded.json
@@ -0,0 +1,16 @@
+{
+  "_comment": "429 body returned when the per-day token budget is exhausted.",
+  "_headers": {
+    "X-Fairvisor-Reason": "tpd_exceeded",
+    "Retry-After": "86400",
+    "RateLimit-Limit": "2000000",
+    "RateLimit-Remaining": "0",
+    "Content-Type": "application/json"
+  },
+  "error": {
+    "type": "rate_limit_error",
+    "code": "tpd_exceeded",
+    "message": "Token budget exceeded for this tenant.",
+    "param": null
+  }
+}
diff --git a/fixtures/reject_tpm_exceeded.json b/fixtures/reject_tpm_exceeded.json
new file mode 100644
index 0000000..26f45d0
--- /dev/null
+++ b/fixtures/reject_tpm_exceeded.json
@@ -0,0 +1,17 @@
+{
+  "_comment": "429 body returned when the per-minute token budget is exhausted.",
+  "_headers": {
+    "X-Fairvisor-Reason": "tpm_exceeded",
+    "Retry-After": "60",
+    "RateLimit-Limit": "120000",
+    "RateLimit-Remaining": "0",
+    "RateLimit-Reset": "",
+    "Content-Type": "application/json"
+  },
+  "error": {
+    "type": "rate_limit_error",
+    "code": "tpm_exceeded",
+    "message": "Token budget exceeded for this tenant.",
+    "param": null
+  }
+}