|
| 1 | +--- |
| 2 | +sidebar_position: 2 |
| 3 | +--- |
| 4 | + |
| 5 | +# Architecture |
| 6 | + |
| 7 | +How the QuilrAI LLM Gateway processes every request, from your application to the LLM provider and back. |
| 8 | + |
| 9 | +<ArchitectureDiagram |
| 10 | + source={{ |
| 11 | + label: "Your Application", |
| 12 | + code: `client = OpenAI( |
| 13 | + base_url='https://guardrails.quilr.ai/openai_compatible/', |
| 14 | + api_key='sk-quilr-xxx' |
| 15 | +) |
| 16 | +client.chat.completions.create( |
| 17 | + model='gpt-4o', |
| 18 | + messages=[{'role': 'user', 'content': 'Hello!'}] |
| 19 | +)`, |
| 20 | + }} |
| 21 | + gateway={{ |
| 22 | + label: "QuilrAI LLM Gateway", |
| 23 | + phases: [ |
| 24 | + { |
| 25 | + label: "Validate", |
| 26 | + stages: [ |
| 27 | + { label: "Identity & Auth", items: ["JWT / header validation", "Domain allowlist", "Per-user tracking"] }, |
| 28 | + { label: "Rate Limits", items: ["Req/min, hr, day limits", "Token budgets", "Key expiration"] }, |
| 29 | + ], |
| 30 | + }, |
| 31 | + { |
| 32 | + label: "Scan", |
| 33 | + stages: [ |
| 34 | + { label: "PII / PHI / PCI", items: ["Contextual detection", "Exact data matching", "Block / redact / anonymize"] }, |
| 35 | + { label: "Adversarial Detection", items: ["Prompt injection", "Jailbreak detection", "Social engineering"] }, |
| 36 | + { label: "Custom Intents", items: ["User-defined categories", "Example-trained classifier"] }, |
| 37 | + ], |
| 38 | + }, |
| 39 | + { |
| 40 | + label: "Transform", |
| 41 | + stages: [ |
| 42 | + { label: "Prompt Store", items: ["Centralized prompts", "Template variables", "Enforce prompt-only mode"] }, |
| 43 | + { label: "Token Saving", items: ["JSON compression", "HTML/MD stripping", "Input-only, same accuracy"] }, |
| 44 | + ], |
| 45 | + }, |
| 46 | + { |
| 47 | + label: "Route", |
| 48 | + stages: [ |
| 49 | + { label: "Request Routing", items: ["Weighted load balancing", "Automatic failover", "Multi-provider groups"] }, |
| 50 | + ], |
| 51 | + }, |
| 52 | + ], |
| 53 | + footer: "Logging · Cost Tracking · Analytics · Red Team Testing", |
| 54 | + }} |
| 55 | + destination={{ |
| 56 | + label: "LLM Providers", |
| 57 | + items: ["OpenAI", "Anthropic", "Azure OpenAI", "AWS Bedrock", "Vertex AI", "Custom Endpoints"], |
| 58 | + }} |
| 59 | +/> |
| 60 | + |
| 61 | +## Pipeline Stages |
| 62 | + |
| 63 | +Every API request flows through these stages in order. Each stage is independently configurable per API key from the dashboard. |
| 64 | + |
| 65 | +| Stage | Description | Details | |
| 66 | +|-------|-------------|---------| |
| 67 | +| **Identity & Auth** | Validates request identity via JWT, JWKS, or header. Enforces domain restrictions. | [Identity Aware →](./features/identity-aware) | |
| 68 | +| **Rate Limits** | Enforces request rates, token budgets, and key expiration before reaching the provider. | [Rate Limits →](./features/rate-limits) | |
| 69 | +| **Security Guardrails** | Detects PII, PHI, PCI, and financial data. Catches prompt injection, jailbreak, and social engineering attempts. | [Security Guardrails →](./features/security-guardrails) | |
| 70 | +| **Custom Intents** | User-defined detection categories trained with positive and negative examples. | [Custom Intents →](./features/custom-intents) | |
| 71 | +| **Prompt Store** | Resolves centralized system prompts by ID with template variable substitution. | [Prompt Store →](./features/prompt-store) | |
| 72 | +| **Token Saving** | Compresses input tokens: JSON to TOON, HTML/Markdown to plain text. Responses are unchanged. | [Token Saving →](./features/token-saving) | |
| 73 | +| **Request Routing** | Routes to the optimal provider using weighted load balancing with automatic failover. | [Request Routing →](./features/request-routing) | |
| 74 | + |
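As a rough mental model of the ordering in the table above, the sketch below chains a few stages as plain functions and skips any stage that is disabled for a given key. It is not the gateway's implementation: `Blocked`, `STAGES`, `run_pipeline`, the request fields, and the stage bodies are all hypothetical names invented for illustration.

```python
from typing import Callable


class Blocked(Exception):
    """Raised by a stage that rejects the request (hypothetical)."""


Request = dict
Stage = Callable[[Request], Request]


def identity_auth(req: Request) -> Request:
    # Toy check: a real gateway would validate a JWT or header here.
    if not req.get("user"):
        raise Blocked("identity_auth: missing user")
    return req


def rate_limits(req: Request) -> Request:
    # Toy check against a token budget (field names are assumptions).
    if req.get("tokens", 0) > req.get("token_budget", 1000):
        raise Blocked("rate_limits: token budget exceeded")
    return req


def token_saving(req: Request) -> Request:
    # Toy transform: collapse runs of whitespace in the prompt.
    return {**req, "prompt": " ".join(req["prompt"].split())}


# Order mirrors the table; stages not sketched here are omitted.
STAGES: list[tuple[str, Stage]] = [
    ("identity_auth", identity_auth),
    ("rate_limits", rate_limits),
    ("token_saving", token_saving),
]


def run_pipeline(req: Request, enabled: set[str]) -> Request:
    """Apply each enabled stage in order; the first Blocked stops the run."""
    for name, stage in STAGES:
        if name in enabled:
            req = stage(req)
    return req


if __name__ == "__main__":
    out = run_pipeline(
        {"user": "alice", "tokens": 12, "prompt": "Hello   world"},
        enabled={"identity_auth", "token_saving"},
    )
    print(out["prompt"])
```

Passing a different `enabled` set per call stands in for per-key configuration: a stage left out of the set is simply skipped, and the remaining stages still run in the same fixed order.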
| 75 | +## Response Path |
| 76 | + |
| 77 | +Responses from the LLM provider pass back through the **security guardrails** for output scanning before being returned to your application. The same detection categories and configurable actions (block, redact, anonymize, monitor) apply to both requests and responses. |
| 78 | + |
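To illustrate the idea that one detector and one action setting can serve both directions, here is a minimal sketch that runs the same function over a request and a response. Everything in it is an assumption made for illustration: the email regex stands in for a real detection category, `apply_guardrail` and `Blocked` are hypothetical names, and the exact semantics of each action (especially `anonymize`) are guesses rather than documented gateway behavior.

```python
import re

# Toy detector: email addresses stand in for a "PII" category.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class Blocked(Exception):
    """Raised when the action is 'block' and a match is found (hypothetical)."""


def apply_guardrail(text: str, action: str) -> tuple[str, list[str]]:
    """Return (possibly rewritten text, findings) for one direction."""
    findings = EMAIL.findall(text)
    if not findings or action == "monitor":
        # monitor: record findings only, pass the text through unchanged
        return text, findings
    if action == "block":
        raise Blocked(f"{len(findings)} finding(s)")
    if action == "redact":
        return EMAIL.sub("[REDACTED]", text), findings
    if action == "anonymize":
        # Assumed semantics: each distinct value gets a stable placeholder.
        distinct = dict.fromkeys(findings)
        mapping = {v: f"<EMAIL_{i}>" for i, v in enumerate(distinct, 1)}
        return EMAIL.sub(lambda m: mapping[m.group()], text), findings
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    # The same call is used for the outgoing request and the returned response.
    request_text, _ = apply_guardrail("Email bob@example.com please", "redact")
    response_text, _ = apply_guardrail("Sure, writing to bob@example.com.", "redact")
    print(request_text)
    print(response_text)
```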
| 79 | +## Observability |
| 80 | + |
| 81 | +Every request is logged with cost, latency, token counts, and guardrail actions. Use the **Logs** tab to review request history and the **Red Team Testing** tool to [validate your guardrail configuration](./features/red-team-testing) against adversarial prompts. |