Commit 88a4ee8
Merge pull request #1 from quilrai/add_diagrams
Add diagrams
2 parents 723391a + 46f87f2 commit 88a4ee8

38 files changed

Lines changed: 2994 additions & 398 deletions

docs/llm-gateway/_category_.json

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,6 @@
  "position": 3,
  "link": {
    "type": "generated-index",
-    "description": "Unified API gateway for LLM providers route, secure, monitor, and optimize all your LLM traffic through a single endpoint."
+    "description": "Unified API gateway for LLM providers - route, secure, monitor, and optimize all your LLM traffic through a single endpoint."
  }
}

docs/llm-gateway/architecture.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
---
sidebar_position: 2
---

# Architecture

How the QuilrAI LLM Gateway processes every request - from your application to the LLM provider and back.

<ArchitectureDiagram
  source={{
    label: "Your Application",
    code: `client = OpenAI(
  base_url='https://guardrails.quilr.ai/openai_compatible/',
  api_key='sk-quilr-xxx'
)
client.chat.completions.create(
  model='gpt-4o',
  messages=[{'role': 'user', 'content': 'Hello!'}]
)`,
  }}
  gateway={{
    label: "QuilrAI LLM Gateway",
    phases: [
      {
        label: "Validate",
        stages: [
          { label: "Identity & Auth", items: ["JWT / header validation", "Domain allowlist", "Per-user tracking"] },
          { label: "Rate Limits", items: ["Req/min, hr, day limits", "Token budgets", "Key expiration"] },
        ],
      },
      {
        label: "Scan",
        stages: [
          { label: "PII / PHI / PCI", items: ["Contextual detection", "Exact data matching", "Block / redact / anonymize"] },
          { label: "Adversarial Detection", items: ["Prompt injection", "Jailbreak detection", "Social engineering"] },
          { label: "Custom Intents", items: ["User-defined categories", "Example-trained classifier"] },
        ],
      },
      {
        label: "Transform",
        stages: [
          { label: "Prompt Store", items: ["Centralized prompts", "Template variables", "Enforce prompt-only mode"] },
          { label: "Token Saving", items: ["JSON compression", "HTML/MD stripping", "Input-only, same accuracy"] },
        ],
      },
      {
        label: "Route",
        stages: [
          { label: "Request Routing", items: ["Weighted load balancing", "Automatic failover", "Multi-provider groups"] },
        ],
      },
    ],
    footer: "Logging · Cost Tracking · Analytics · Red Team Testing",
  }}
  destination={{
    label: "LLM Providers",
    items: ["OpenAI", "Anthropic", "Azure OpenAI", "AWS Bedrock", "Vertex AI", "Custom Endpoints"],
  }}
/>

## Pipeline Stages

Every API request flows through these stages in order. Each stage is independently configurable per API key from the dashboard.

| Stage | Description | Details |
|-------|-------------|---------|
| **Identity & Auth** | Validates request identity via JWT, JWKS, or header. Enforces domain restrictions. | [Identity Aware →](./features/identity-aware) |
| **Rate Limits** | Enforces request rates, token budgets, and key expiration before reaching the provider. | [Rate Limits →](./features/rate-limits) |
| **Security Guardrails** | Detects PII, PHI, PCI, and financial data. Catches prompt injection, jailbreak, and social engineering. | [Security Guardrails →](./features/security-guardrails) |
| **Custom Intents** | User-defined detection categories trained with positive and negative examples. | [Custom Intents →](./features/custom-intents) |
| **Prompt Store** | Resolves centralized system prompts by ID with template variable substitution. | [Prompt Store →](./features/prompt-store) |
| **Token Saving** | Compresses input tokens - JSON to TOON, HTML/Markdown to plain text. Responses unchanged. | [Token Saving →](./features/token-saving) |
| **Request Routing** | Routes to the optimal provider using weighted load balancing with automatic failover. | [Request Routing →](./features/request-routing) |
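The per-stage, per-key configurability described above can be sketched in a few lines. The stage names come from the table; the composition itself is purely illustrative (the gateway's implementation is not public):

```python
# Stage order from the Pipeline Stages table; the composition below is an
# illustration, not the gateway's actual code.
PIPELINE = [
    "identity_auth", "rate_limits", "security_guardrails", "custom_intents",
    "prompt_store", "token_saving", "request_routing",
]

def run_pipeline(request, handlers, enabled):
    """Run each enabled stage in order; stages are toggled per API key."""
    for stage in PIPELINE:
        if stage in enabled:
            request = handlers[stage](request)
    return request
```

Each handler would transform or reject the request; disabling a stage for a given API key simply skips it.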
## Response Path

Responses from the LLM provider pass back through the **security guardrails** for output scanning before being returned to your application. The same detection categories and configurable actions (block, redact, anonymize, monitor) apply to both requests and responses.

## Observability

Every request is logged with cost, latency, token counts, and guardrail actions. Use the **Logs** tab to review request history and the **Red Team Testing** tool to [validate your guardrail configuration](./features/red-team-testing) against adversarial prompts.

docs/llm-gateway/features/_category_.json

Lines changed: 1 addition & 1 deletion
@@ -3,6 +3,6 @@
  "position": 3,
  "link": {
    "type": "generated-index",
-    "description": "Detailed documentation for each LLM Gateway capability Routing, Token Saving, Guardrails, Prompt Store, Identity Aware, and more."
+    "description": "Detailed documentation for each LLM Gateway capability - Routing, Token Saving, Guardrails, Prompt Store, Identity Aware, and more."
  }
}

docs/llm-gateway/features/custom-intents.md

Lines changed: 3 additions & 3 deletions
@@ -20,8 +20,8 @@ Custom intents extend the guardrails system with your own detection logic. Provi

1. **Name** your intent (e.g., `competitor-mentions`)
2. **Describe** what the intent should detect
-3. **Add positive examples** prompts that should trigger the intent
-4. **Add negative examples** prompts that should not trigger the intent
-5. **Assign an action** block, monitor, or redact
+3. **Add positive examples** - prompts that should trigger the intent
+4. **Add negative examples** - prompts that should not trigger the intent
+5. **Assign an action** - block, monitor, or redact

The classifier learns from your examples and applies the configured action when a match is detected.
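Hypothetically, the five steps could be captured in a single definition like this. The field names are illustrative only - they are not the actual QuilrAI API schema:

```python
# Illustrative intent definition mirroring steps 1-5 above; field names are
# assumptions, not the real QuilrAI API schema.
intent = {
    "name": "competitor-mentions",                                    # step 1
    "description": "Flags prompts that discuss competitor products",  # step 2
    "positive_examples": ["How does AcmeAI compare to us?"],          # step 3
    "negative_examples": ["Summarize our product roadmap"],           # step 4
    "action": "monitor",                            # step 5: block | monitor | redact
}
```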

docs/llm-gateway/features/identity-aware.md

Lines changed: 31 additions & 7 deletions
@@ -8,21 +8,45 @@ Authenticate and track users behind each API key.

## How It Works

-1. **Request Arrives** — App sends an API call with identity info
-2. **Gateway Identifies User** — Extracts identity via header or JWT token
-3. **Per-User Tracking** — Usage tracked per user with rate limits and analytics
+<StepFlow steps={[
+  {
+    label: "Request Arrives",
+    items: [
+      "Authorization: Bearer sk-quilr-•••",
+      "X-User-Email: alice@acme.com",
+    ],
+  },
+  {
+    label: "QuilrAI Identifies",
+    items: [
+      "User: alice@acme.com",
+      "Domain: acme.com ✓",
+    ],
+  },
+  {
+    label: "Per-User Tracking",
+    items: [
+      "Requests today: 142",
+      "Rate limit: 80% used",
+    ],
+  },
+]} />
+
+1. **Request Arrives** - App sends an API call with identity info
+2. **Gateway Identifies User** - Extracts identity via header or JWT token
+3. **Per-User Tracking** - Usage tracked per user with rate limits and analytics

## Authentication Modes

-### Header Based Recommended for trusted clients
+### Header Based - Recommended for trusted clients

-Uses the `X-User-Email` header to identify users. If your app handles user login and makes LLM calls from your own backend, this is the easiest and recommended approach just pass the logged-in user's email as a header.
+Uses the `X-User-Email` header to identify users. If your app handles user login and makes LLM calls from your own backend, this is the easiest and recommended approach - just pass the logged-in user's email as a header.

```
X-User-Email: user@company.com
```

-### JWKS Endpoint For untrusted clients
+### JWKS Endpoint - For untrusted clients

Validates JWT tokens using a JWKS URL for dynamic key rotation. Ideal for production OAuth/OIDC flows with providers like Auth0, Okta, or Google.
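For the header-based mode, the identity simply travels as request headers. A minimal sketch - the endpoint and key format come from these docs, the helper itself is illustrative:

```python
def gateway_headers(api_key: str, user_email: str) -> dict:
    """Headers for an identity-aware call through the gateway (illustrative helper)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-User-Email": user_email,  # identifies the end user behind the key
    }

headers = gateway_headers("sk-quilr-xxx", "user@company.com")
# Send these with your POST to https://guardrails.quilr.ai/openai_compatible/...
```

With the OpenAI Python SDK, the same header can be attached once at client construction via the `default_headers` option instead of per request.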

@@ -51,7 +75,7 @@ MIIBIjANBgkqh...

### Enforce Identity

-When enabled, requests without valid identity (header or JWT) are **rejected at the gateway** bare API key access is blocked.
+When enabled, requests without valid identity (header or JWT) are **rejected at the gateway** - bare API key access is blocked.

### Allowed User Domains
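A domain allowlist amounts to comparing the identity's email domain against a configured set. A sketch of the check (the gateway's actual logic is not public, and `acme.com` is just an example value):

```python
ALLOWED_DOMAINS = {"acme.com"}  # example configuration value

def domain_allowed(user_email: str) -> bool:
    """True if the identity's email domain is on the allowlist (illustrative)."""
    domain = user_email.rsplit("@", 1)[-1].lower()
    return domain in ALLOWED_DOMAINS
```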

docs/llm-gateway/features/prompt-store.md

Lines changed: 27 additions & 3 deletions
@@ -8,9 +8,33 @@ Manage and version system prompts centrally.

## How It Works

-1. **Create** — Store a prompt with a unique ID (e.g., `code-reviewer`)
-2. **Reference** — Use it as the system message content: `quilrai-prompt-store-code-reviewer`
-3. **Gateway Resolves** — The gateway resolves the prompt and sends the full text to the LLM
+<StepFlow steps={[
+  {
+    label: "Prompt Stored",
+    items: [
+      "ID: code-reviewer",
+      '"You are a {{tone}} reviewer"',
+    ],
+  },
+  {
+    label: "API References It",
+    items: [
+      "system: quilrai-prompt-store-code-reviewer",
+      'vars: {tone: "formal"}',
+    ],
+  },
+  {
+    label: "QuilrAI Resolves",
+    items: [
+      '"You are a formal reviewer"',
+      "Sent to LLM ✓",
+    ],
+  },
+]} />
+
+1. **Create** - Store a prompt with a unique ID (e.g., `code-reviewer`)
+2. **Reference** - Use it as the system message content: `quilrai-prompt-store-code-reviewer`
+3. **Gateway Resolves** - The gateway resolves the prompt and sends the full text to the LLM
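The `{{tone}}` substitution shown in the StepFlow is ordinary template interpolation. A sketch of what the resolver might do (the gateway's real resolver is not public):

```python
import re

def resolve_prompt(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders; unknown names are left intact (illustrative)."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

resolved = resolve_prompt("You are a {{tone}} reviewer", {"tone": "formal"})
# resolved == "You are a formal reviewer"
```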
## Template Variables

docs/llm-gateway/features/rate-limits.md

Lines changed: 4 additions & 4 deletions
@@ -12,10 +12,10 @@ Rate limits protect your LLM spend and availability. All limits are enforced at

## Key Features

-- **Per-key rate limits** Requests per minute, hour, or day
-- **Token limits** Input and output token budgets per request or over time
-- **API key expiration** Configurable epoch time for automatic key expiry
-- **Response timeout** Maximum wait time to prevent hung requests from consuming resources
+- **Per-key rate limits** - Requests per minute, hour, or day
+- **Token limits** - Input and output token budgets per request or over time
+- **API key expiration** - Configurable epoch time for automatic key expiry
+- **Response timeout** - Maximum wait time to prevent hung requests from consuming resources

## Configuration

docs/llm-gateway/features/request-routing.md

Lines changed: 29 additions & 4 deletions
@@ -8,9 +8,34 @@ Multi-provider load balancing and failover behind a single API key.

## How It Works

-1. **Create Group** — Define a named routing group (e.g., `Group1`)
-2. **Add Models** — Add providers with traffic weights (e.g., `gpt-4o 60%`, `claude 40%`)
-3. **Use as Model** — Pass the group name as the `model` parameter in your API call
+<StepFlow steps={[
+  {
+    label: "API Request",
+    items: [
+      'model: "Group1"',
+      'content: "Hello!"',
+    ],
+  },
+  {
+    label: "QuilrAI Routes",
+    items: [
+      "Group1 found ✓",
+      "gpt-4o → 60% weight",
+      "claude-sonnet → 40% weight",
+    ],
+  },
+  {
+    label: "Provider Selected",
+    items: [
+      "→ gpt-4o (weighted)",
+      "Response returned ✓",
+    ],
+  },
+]} />
+
+1. **Create Group** - Define a named routing group (e.g., `Group1`)
+2. **Add Models** - Add providers with traffic weights (e.g., `gpt-4o 60%`, `claude 40%`)
+3. **Use as Model** - Pass the group name as the `model` parameter in your API call

## Weight-Based Routing

@@ -64,7 +89,7 @@ Group names can match actual model names. Your application keeps sending request
|-----------|-----------|
| `gpt-4.1` | `gpt-4.1-nano` (70%), `gpt-4.1-mini` (30%) |

-Your code still sends `model="gpt-4.1"` zero code changes, but requests get routed to cheaper or faster models behind the scenes.
+Your code still sends `model="gpt-4.1"` - zero code changes, but requests get routed to cheaper or faster models behind the scenes.

## Code Examples

docs/llm-gateway/features/security-guardrails.md

Lines changed: 7 additions & 7 deletions
@@ -16,10 +16,10 @@ Contextual detection identifies sensitive data categories and applies the config

### Supported Categories

-- **PII** Personally Identifiable Information
-- **PHI** Protected Health Information
-- **PCI** Payment Card Industry data
-- **Financial data** Financial records and account information
+- **PII** - Personally Identifiable Information
+- **PHI** - Protected Health Information
+- **PCI** - Payment Card Industry data
+- **Financial data** - Financial records and account information

### Exact Data Matching (EDM)

@@ -29,9 +29,9 @@ Pattern matching with custom EDM rules for specific data formats.

Catches adversarial attack patterns in requests:

-- **Prompt injection** Attempts to override system instructions
-- **Jailbreak** Attempts to bypass safety controls
-- **Social engineering** Manipulation attempts targeting the AI model
+- **Prompt injection** - Attempts to override system instructions
+- **Jailbreak** - Attempts to bypass safety controls
+- **Social engineering** - Manipulation attempts targeting the AI model

## Configurable Actions
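As a toy illustration of the **redact** action, here is a regex pass over card-like numbers. The gateway's detectors are contextual and go well beyond regexes, so treat this purely as a sketch of what redaction does to matched spans:

```python
import re

# Naive PCI-style pattern: 13-16 digits, optionally space/hyphen separated.
# Real detection is contextual; this only sketches the redact action.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def redact(text: str) -> str:
    """Replace card-like numbers with a placeholder (illustrative redact action)."""
    return CARD_RE.sub("[REDACTED]", text)
```

The block, anonymize, and monitor actions differ only in what happens after a match: reject the request, substitute a consistent surrogate, or log without modification.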

docs/llm-gateway/features/tags.md

Lines changed: 0 additions & 18 deletions
This file was deleted.

0 commit comments
