quilrai · bedapudi67 · Mar 27, 2026 · Mar 27, 2026 · Mar 27, 2026 · Mar 27, 2026
diff --git a/docs/llm-gateway/_category_.json b/docs/llm-gateway/_category_.json
@@ -3,6 +3,6 @@
   "position": 3,
   "link": {
     "type": "generated-index",
-    "description": "Unified API gateway for LLM providers — route, secure, monitor, and optimize all your LLM traffic through a single endpoint."
+    "description": "Unified API gateway for LLM providers - route, secure, monitor, and optimize all your LLM traffic through a single endpoint."
   }
 }
diff --git a/docs/llm-gateway/architecture.md b/docs/llm-gateway/architecture.md
@@ -0,0 +1,81 @@
+---
+sidebar_position: 2
+---
+
+# Architecture
+
+How the QuilrAI LLM Gateway processes every request - from your application to the LLM provider and back.
+
+<ArchitectureDiagram
+  source={{
+    label: "Your Application",
+    code: `client = OpenAI(
+    base_url='https://guardrails.quilr.ai/openai_compatible/',
+    api_key='sk-quilr-xxx'
+)
+client.chat.completions.create(
+    model='gpt-4o',
+    messages=[{'role': 'user', 'content': 'Hello!'}]
+)`,
+  }}
+  gateway={{
+    label: "QuilrAI LLM Gateway",
+    phases: [
+      {
+        label: "Validate",
+        stages: [
+          { label: "Identity & Auth", items: ["JWT / header validation", "Domain allowlist", "Per-user tracking"] },
+          { label: "Rate Limits", items: ["Req/min, hr, day limits", "Token budgets", "Key expiration"] },
+        ],
+      },
+      {
+        label: "Scan",
+        stages: [
+          { label: "PII / PHI / PCI", items: ["Contextual detection", "Exact data matching", "Block / redact / anonymize"] },
+          { label: "Adversarial Detection", items: ["Prompt injection", "Jailbreak detection", "Social engineering"] },
+          { label: "Custom Intents", items: ["User-defined categories", "Example-trained classifier"] },
+        ],
+      },
+      {
+        label: "Transform",
+        stages: [
+          { label: "Prompt Store", items: ["Centralized prompts", "Template variables", "Enforce prompt-only mode"] },
+          { label: "Token Saving", items: ["JSON compression", "HTML/MD stripping", "Input-only, same accuracy"] },
+        ],
+      },
+      {
+        label: "Route",
+        stages: [
+          { label: "Request Routing", items: ["Weighted load balancing", "Automatic failover", "Multi-provider groups"] },
+        ],
+      },
+    ],
+    footer: "Logging  ·  Cost Tracking  ·  Analytics  ·  Red Team Testing",
+  }}
+  destination={{
+    label: "LLM Providers",
+    items: ["OpenAI", "Anthropic", "Azure OpenAI", "AWS Bedrock", "Vertex AI", "Custom Endpoints"],
+  }}
+/>
+
+## Pipeline Stages
+
+Every API request flows through these stages in order. Each stage is independently configurable per API key from the dashboard.
+
+| Stage | Description | Details |
+|-------|-------------|---------|
+| **Identity & Auth** | Validates request identity via JWT, JWKS, or header. Enforces domain restrictions. | [Identity Aware →](./features/identity-aware) |
+| **Rate Limits** | Enforces request rates, token budgets, and key expiration before reaching the provider. | [Rate Limits →](./features/rate-limits) |
+| **Security Guardrails** | Detects PII, PHI, PCI, and financial data. Catches prompt injection, jailbreak, and social engineering. | [Security Guardrails →](./features/security-guardrails) |
+| **Custom Intents** | User-defined detection categories trained with positive and negative examples. | [Custom Intents →](./features/custom-intents) |
+| **Prompt Store** | Resolves centralized system prompts by ID with template variable substitution. | [Prompt Store →](./features/prompt-store) |
+| **Token Saving** | Compresses input tokens - JSON to TOON, HTML/Markdown to plain text. Responses unchanged. | [Token Saving →](./features/token-saving) |
+| **Request Routing** | Routes to the optimal provider using weighted load balancing with automatic failover. | [Request Routing →](./features/request-routing) |
+
+## Response Path
+
+Responses from the LLM provider pass back through the **security guardrails** for output scanning before being returned to your application. The same detection categories and configurable actions (block, redact, anonymize, monitor) apply to both requests and responses.
+
+## Observability
+
+Every request is logged with cost, latency, token counts, and guardrail actions. Use the **Logs** tab to review request history and the **Red Team Testing** tool to [validate your guardrail configuration](./features/red-team-testing) against adversarial prompts.
diff --git a/docs/llm-gateway/features/_category_.json b/docs/llm-gateway/features/_category_.json
@@ -3,6 +3,6 @@
   "position": 3,
   "link": {
     "type": "generated-index",
-    "description": "Detailed documentation for each LLM Gateway capability — Routing, Token Saving, Guardrails, Prompt Store, Identity Aware, and more."
+    "description": "Detailed documentation for each LLM Gateway capability - Routing, Token Saving, Guardrails, Prompt Store, Identity Aware, and more."
   }
 }
diff --git a/docs/llm-gateway/features/custom-intents.md b/docs/llm-gateway/features/custom-intents.md
@@ -20,8 +20,8 @@ Custom intents extend the guardrails system with your own detection logic. Provi
 
 1. **Name** your intent (e.g., `competitor-mentions`)
 2. **Describe** what the intent should detect
-3. **Add positive examples** — prompts that should trigger the intent
-4. **Add negative examples** — prompts that should not trigger the intent
-5. **Assign an action** — block, monitor, or redact
+3. **Add positive examples** - prompts that should trigger the intent
+4. **Add negative examples** - prompts that should not trigger the intent
+5. **Assign an action** - block, monitor, or redact
 
 The classifier learns from your examples and applies the configured action when a match is detected.
diff --git a/docs/llm-gateway/features/identity-aware.md b/docs/llm-gateway/features/identity-aware.md
@@ -8,21 +8,45 @@ Authenticate and track users behind each API key.
 
 ## How It Works
 
-1. **Request Arrives** — App sends an API call with identity info
-2. **Gateway Identifies User** — Extracts identity via header or JWT token
-3. **Per-User Tracking** — Usage tracked per user with rate limits and analytics
+<StepFlow steps={[
+  {
+    label: "Request Arrives",
+    items: [
+      "Authorization: Bearer sk-quilr-•••",
+      "X-User-Email: alice@acme.com",
+    ],
+  },
+  {
+    label: "QuilrAI Identifies",
+    items: [
+      "User: alice@acme.com",
+      "Domain: acme.com ✓",
+    ],
+  },
+  {
+    label: "Per-User Tracking",
+    items: [
+      "Requests today: 142",
+      "Rate limit: 80% used",
+    ],
+  },
+]} />
+
+1. **Request Arrives** - App sends an API call with identity info
+2. **Gateway Identifies User** - Extracts identity via header or JWT token
+3. **Per-User Tracking** - Usage tracked per user with rate limits and analytics
 
 ## Authentication Modes
 
-### Header Based — Recommended for trusted clients
+### Header Based - Recommended for trusted clients
 
-Uses the `X-User-Email` header to identify users. If your app handles user login and makes LLM calls from your own backend, this is the easiest and recommended approach — just pass the logged-in user's email as a header.
+Uses the `X-User-Email` header to identify users. If your app handles user login and makes LLM calls from your own backend, this is the easiest and recommended approach - just pass the logged-in user's email as a header.
 
 ```
 X-User-Email: user@company.com
 ```
 
-### JWKS Endpoint — For untrusted clients
+### JWKS Endpoint - For untrusted clients
 
 Validates JWT tokens using a JWKS URL for dynamic key rotation. Ideal for production OAuth/OIDC flows with providers like Auth0, Okta, or Google.
 
@@ -51,7 +75,7 @@ MIIBIjANBgkqh...
 
 ### Enforce Identity
 
-When enabled, requests without valid identity (header or JWT) are **rejected at the gateway** — bare API key access is blocked.
+When enabled, requests without valid identity (header or JWT) are **rejected at the gateway** - bare API key access is blocked.
 
 ### Allowed User Domains
 

diff --git a/docs/llm-gateway/features/prompt-store.md b/docs/llm-gateway/features/prompt-store.md
@@ -8,9 +8,33 @@ Manage and version system prompts centrally.
 
 ## How It Works
 
-1. **Create** — Store a prompt with a unique ID (e.g., `code-reviewer`)
-2. **Reference** — Use it as the system message content: `quilrai-prompt-store-code-reviewer`
-3. **Gateway Resolves** — The gateway resolves the prompt and sends the full text to the LLM
+<StepFlow steps={[
+  {
+    label: "Prompt Stored",
+    items: [
+      "ID: code-reviewer",
+      '"You are a {{tone}} reviewer"',
+    ],
+  },
+  {
+    label: "API References It",
+    items: [
+      "system: quilrai-prompt-store-code-reviewer",
+      'vars: {tone: "formal"}',
+    ],
+  },
+  {
+    label: "QuilrAI Resolves",
+    items: [
+      '"You are a formal reviewer"',
+      "Sent to LLM ✓",
+    ],
+  },
+]} />
+
+1. **Create** - Store a prompt with a unique ID (e.g., `code-reviewer`)
+2. **Reference** - Use it as the system message content: `quilrai-prompt-store-code-reviewer`
+3. **Gateway Resolves** - The gateway resolves the prompt and sends the full text to the LLM
 
 ## Template Variables
 

diff --git a/docs/llm-gateway/features/rate-limits.md b/docs/llm-gateway/features/rate-limits.md
@@ -12,10 +12,10 @@ Rate limits protect your LLM spend and availability. All limits are enforced at
 
 ## Key Features
 
-- **Per-key rate limits** — Requests per minute, hour, or day
-- **Token limits** — Input and output token budgets per request or over time
-- **API key expiration** — Configurable epoch time for automatic key expiry
-- **Response timeout** — Maximum wait time to prevent hung requests from consuming resources
+- **Per-key rate limits** - Requests per minute, hour, or day
+- **Token limits** - Input and output token budgets per request or over time
+- **API key expiration** - Configurable epoch time for automatic key expiry
+- **Response timeout** - Maximum wait time to prevent hung requests from consuming resources
 
 ## Configuration
 

diff --git a/docs/llm-gateway/features/request-routing.md b/docs/llm-gateway/features/request-routing.md
@@ -8,9 +8,34 @@ Multi-provider load balancing and failover behind a single API key.
 
 ## How It Works
 
-1. **Create Group** — Define a named routing group (e.g., `Group1`)
-2. **Add Models** — Add providers with traffic weights (e.g., `gpt-4o 60%`, `claude 40%`)
-3. **Use as Model** — Pass the group name as the `model` parameter in your API call
+<StepFlow steps={[
+  {
+    label: "API Request",
+    items: [
+      'model: "Group1"',
+      'content: "Hello!"',
+    ],
+  },
+  {
+    label: "QuilrAI Routes",
+    items: [
+      "Group1 found ✓",
+      "gpt-4o → 60% weight",
+      "claude-sonnet → 40% weight",
+    ],
+  },
+  {
+    label: "Provider Selected",
+    items: [
+      "→ gpt-4o (weighted)",
+      "Response returned ✓",
+    ],
+  },
+]} />
+
+1. **Create Group** - Define a named routing group (e.g., `Group1`)
+2. **Add Models** - Add providers with traffic weights (e.g., `gpt-4o 60%`, `claude 40%`)
+3. **Use as Model** - Pass the group name as the `model` parameter in your API call
 
 ## Weight-Based Routing
 
@@ -64,7 +89,7 @@ Group names can match actual model names. Your application keeps sending request
 |-----------|-----------|
 | `gpt-4.1` | `gpt-4.1-nano` (70%), `gpt-4.1-mini` (30%) |
 
-Your code still sends `model="gpt-4.1"` — zero code changes, but requests get routed to cheaper or faster models behind the scenes.
+Your code still sends `model="gpt-4.1"` - zero code changes, but requests get routed to cheaper or faster models behind the scenes.
 
 ## Code Examples
 

diff --git a/docs/llm-gateway/features/security-guardrails.md b/docs/llm-gateway/features/security-guardrails.md
@@ -16,10 +16,10 @@ Contextual detection identifies sensitive data categories and applies the config
 
 ### Supported Categories
 
-- **PII** — Personally Identifiable Information
-- **PHI** — Protected Health Information
-- **PCI** — Payment Card Industry data
-- **Financial data** — Financial records and account information
+- **PII** - Personally Identifiable Information
+- **PHI** - Protected Health Information
+- **PCI** - Payment Card Industry data
+- **Financial data** - Financial records and account information
 
 ### Exact Data Matching (EDM)
 
@@ -29,9 +29,9 @@ Pattern matching with custom EDM rules for specific data formats.
 
 Catches adversarial attack patterns in requests:
 
-- **Prompt injection** — Attempts to override system instructions
-- **Jailbreak** — Attempts to bypass safety controls
-- **Social engineering** — Manipulation attempts targeting the AI model
+- **Prompt injection** - Attempts to override system instructions
+- **Jailbreak** - Attempts to bypass safety controls
+- **Social engineering** - Manipulation attempts targeting the AI model
 
 ## Configurable Actions
 

diff --git a/docs/llm-gateway/features/tags.md b/docs/llm-gateway/features/tags.md
diff --git a/docs/llm-gateway/features/token-saving.md b/docs/llm-gateway/features/token-saving.md
@@ -8,23 +8,47 @@ Reduce token usage by compressing input content automatically.
 
 ## How It Works
 
-1. **Request Arrives** — Your app sends a normal API call
-2. **Gateway Compresses** — Content is transformed to use fewer tokens
-3. **Forwarded to LLM** — Optimized content sent — same accuracy, lower cost
+<StepFlow steps={[
+  {
+    label: "Request Arrives",
+    items: [
+      '{"name": "John", "age": 30}',
+      "14 input tokens",
+    ],
+  },
+  {
+    label: "QuilrAI Compresses",
+    items: [
+      "name:John|age:30",
+      "8 input tokens",
+    ],
+  },
+  {
+    label: "Sent to LLM",
+    items: [
+      "43% tokens saved",
+      "Same response quality ✓",
+    ],
+  },
+]} />
+
+1. **Request Arrives** - Your app sends a normal API call
+2. **Gateway Compresses** - Content is transformed to use fewer tokens
+3. **Forwarded to LLM** - Optimized content sent - same accuracy, lower cost
 
 ## Compression Methods
 
-### Smart JSON Compression — Up to 20% savings
+### Smart JSON Compression - Up to 20% savings
 
-Converts JSON objects in LLM inputs to TOON format — ideal for tool call responses and structured data.
+Converts JSON objects in LLM inputs to TOON format - ideal for tool call responses and structured data.
 
 | Before | After |
 |--------|-------|
 | `{"name": "John", "age": 30}` | `name:John\|age:30` |
 
 ### HTML to Text
 
-Strips HTML tags and extracts clean text — removes markup overhead from scraped pages or rich content.
+Strips HTML tags and extracts clean text - removes markup overhead from scraped pages or rich content.
 
 | Before | After |
 |--------|-------|
@@ -40,4 +64,4 @@ Removes Markdown syntax characters that consume tokens without adding meaning fo
 
 ## Seamless and Input-Only
 
-Compression is applied **only to input tokens** before they reach the LLM. Responses are returned untouched. Your application code stays exactly the same — no SDK changes, no prompt rewrites, just lower costs.
+Compression is applied **only to input tokens** before they reach the LLM. Responses are returned untouched. Your application code stays exactly the same - no SDK changes, no prompt rewrites, just lower costs.