A multi-layer orchestration platform that intelligently routes customer queries through intent classification, safety validation, bilingual support, and automated tool execution — with a full-stack real-time dashboard.
Chat with a live agent — see intent classification, chosen action, confidence score, and response debug in real time
Create and manage multiple agent configurations — each with its own safety mode, confidence threshold, and system prompt
Every request is logged with intent, action, safety status, and latency — filterable by intent, action, and status
AgentOS sits between a user's message and your backend tools. Every message goes through a 9-stage orchestration pipeline:
Message → Validate → Sanitize → Load Agent Config (DB)
→ Load FAQs + Tools (DB) → Classify Intent → Safety Check
→ Execute Action (answer / tool / escalate) → Hallucination Guard → Log
It supports 10 intent types, 2 tools (order lookup + ticket creation), 3 safety levels, and bilingual English + Hinglish input — all configurable from the dashboard without touching code.
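The 10 intent types and their default actions, as listed in the intent table in this README, can be sketched as a typed lookup. This is only an illustration: the names `Intent`, `Action`, `DEFAULT_ACTION`, and `defaultActionFor` are hypothetical, and the real router in `lib/intent-router.ts` may structure this differently.

```typescript
// Illustrative sketch only; type and constant names are assumptions.
type Intent =
  | "greeting"
  | "order_status"
  | "refund_request"
  | "complaint"
  | "ticket_creation"
  | "faq_query"
  | "shipping_query"
  | "product_query"
  | "payment_query"
  | "abusive_language";

type Action = "answer" | "tool_call" | "escalate";

// Default action per intent, copied from the intent table.
const DEFAULT_ACTION: Record<Intent, Action> = {
  greeting: "answer",
  order_status: "tool_call",
  refund_request: "tool_call",
  complaint: "escalate",
  ticket_creation: "tool_call",
  faq_query: "answer",
  shipping_query: "answer",
  product_query: "escalate",
  payment_query: "escalate",
  abusive_language: "escalate",
};

function defaultActionFor(intent: Intent): Action {
  return DEFAULT_ACTION[intent];
}
```

Typing the map as `Record<Intent, Action>` makes the compiler reject a missing or misspelled intent, which helps when the intent list changes.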
flowchart TD
A[User Message] --> B[Validate & Sanitize]
B --> C[Load Agent Config from DB]
C --> D[Load FAQs + ToolConfig from DB]
D --> E[Intent Classification\n10 intents · English + Hinglish]
E --> F[Safety Check\nStrict / Balanced / Permissive]
F --> G{Action Decision}
G -- answer --> H[FAQ Match + Template Response]
G -- tool_call --> I[Tool Execution with Retry\norder_lookup · create_ticket]
G -- escalate --> J[Escalation Response]
H --> K[Hallucination Guard]
I --> K
J --> K
K --> L[Async DB Log]
L --> M[JSON Response]
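The "Tool Execution with Retry" node in the flowchart could look roughly like the helper below. This is a minimal sketch under assumptions: the name `withRetry`, the default of 3 attempts, and the optional fixed delay are illustrative, and the actual retry policy in `lib/tools.ts` is not documented here.

```typescript
// Hypothetical helper: retry an async tool call a fixed number of times.
// The attempt count and delay are assumptions, not the project's settings.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 0,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts && delayMs > 0) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Stand-in tool that fails twice before succeeding (for illustration).
let calls = 0;
async function flakyOrderLookup(): Promise<{ status: string }> {
  calls += 1;
  if (calls < 3) throw new Error("temporary failure");
  return { status: "In Transit" };
}
```

Calling `withRetry(flakyOrderLookup)` resolves on the third attempt. If every attempt fails, the last error propagates so the orchestrator can decide whether to escalate.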
| Intent | Example Queries | Action |
|---|---|---|
| `greeting` | "Hello", "Namaste" | answer |
| `order_status` | "Where is my order ORD-123?", "Mera order kaha hai?" | tool_call |
| `refund_request` | "I want a refund", "Paisa wapas chahiye" | tool_call |
| `complaint` | "Product arrived broken", "Saman kharab aaya" | escalate |
| `ticket_creation` | "Open a support ticket", "Naya ticket khol do" | tool_call |
| `faq_query` | "What is your return policy?", "Customer service number kya hai?" | answer |
| `shipping_query` | "When will my package arrive?", "Kab tak aa jayega?" | answer |
| `product_query` | "Do you have iPhone 15?", "Kya iPhone available hai?" | escalate |
| `payment_query` | "Payment failed but money deducted" | escalate |
| `abusive_language` | "Ye bakwas service hai" | escalate |
Tested on 32 queries — 16 English + 16 Hinglish — across 3 confidence thresholds:
| Threshold | Intent Accuracy | Escalation Rate | Tool Success Rate | Avg Latency |
|---|---|---|---|---|
| 0.60 | 84.4% | 50.0% | 41.7% | 35ms |
| 0.70 | 84.4% | 81.3% | 40.0% | 21ms |
| 0.80 | 84.4% | 81.3% | 66.7% | 20ms |
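Key insight: Raising the threshold makes low-confidence tool calls escalate instead of attempting execution, so the tool success rate rises from 41.7% at 0.60 to 66.7% at 0.80 (with a slight dip to 40.0% at 0.70), while the escalation rate climbs from 50.0% to 81.3%. Intent accuracy stays at 84.4% because the threshold only affects the action decision, not the classifier's output.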
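The threshold behaviour described above can be sketched as a small gating function. This is an assumption-laden illustration: `applyThreshold` is a hypothetical name, and it gates only `tool_call` actions because that is the only case the text describes; whether `answer` actions are gated the same way is not stated here.

```typescript
type Action = "answer" | "tool_call" | "escalate";

// Assumption: a tool call whose classification confidence falls below the
// agent's threshold is escalated instead of executed.
function applyThreshold(
  proposed: Action,
  confidence: number,
  threshold: number,
): Action {
  if (proposed === "tool_call" && confidence < threshold) {
    return "escalate";
  }
  return proposed;
}

// The same classification result under two thresholds.
const atLowThreshold = applyThreshold("tool_call", 0.65, 0.6);
const atHighThreshold = applyThreshold("tool_call", 0.65, 0.7);
```

In this sketch, `atLowThreshold` stays a tool call and `atHighThreshold` becomes an escalation, although the classifier output (intent and confidence) is identical in both cases.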
Hallucination Guard: The guard fires in strict mode (isHallucination: true, confidence 0.9). In balanced mode (default), template responses are intentionally permitted — this is by design, not a gap.
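One way to picture the guard's mode-dependent behaviour is the sketch below. It assumes each response carries a label for where it came from; the `ResponseSource` values and the function name `checkHallucination` are hypothetical, and the real logic in `lib/safety-guard.ts` may use different signals.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";

// Hypothetical labels for the origin of a response.
type ResponseSource = "faq" | "tool" | "template" | "reasoning";

interface HallucinationCheck {
  isHallucination: boolean;
  confidence?: number;
}

// Assumption: strict mode flags anything that is not FAQ-backed or
// tool-backed; the other modes let such responses through.
function checkHallucination(
  mode: SafetyMode,
  source: ResponseSource,
): HallucinationCheck {
  const grounded = source === "faq" || source === "tool";
  if (mode === "strict" && !grounded) {
    return { isHallucination: true, confidence: 0.9 };
  }
  return { isHallucination: false };
}
```

Under these assumptions a template response is flagged in strict mode but passes in balanced mode, which matches the behaviour described above.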
Three configurable safety levels per agent:
| Level | Blocks | Allows |
|---|---|---|
| `strict` | High + Medium risk · Any reasoning-source response | Only FAQ-backed or tool-backed responses |
| `balanced` | High risk only | Template responses + FAQ + tool |
| `permissive` | Risk score > 0.8 | Everything else |
Detection: PII (SSN, email, credit card), SQL injection, command injection, template injection, spam, abusive language (English + Hinglish).
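The blocking rules for the three levels can be sketched as a single decision function. The shape of `RiskAssessment` (a categorical level plus a numeric score in the range 0 to 1) and the name `isBlocked` are assumptions for illustration; how the detectors actually produce a risk level or score is not specified in this README.

```typescript
type SafetyLevel = "strict" | "balanced" | "permissive";
type RiskLevel = "low" | "medium" | "high";

// Hypothetical detector output: a categorical level and a score in [0, 1].
interface RiskAssessment {
  level: RiskLevel;
  score: number;
}

// Mirrors the "Blocks" column of the safety level table.
function isBlocked(safety: SafetyLevel, risk: RiskAssessment): boolean {
  switch (safety) {
    case "strict":
      return risk.level === "high" || risk.level === "medium";
    case "balanced":
      return risk.level === "high";
    case "permissive":
      return risk.score > 0.8;
  }
}

// Example: a medium-risk message under each level.
const mediumRisk: RiskAssessment = { level: "medium", score: 0.5 };
const blockedByLevel = {
  strict: isBlocked("strict", mediumRisk),
  balanced: isBlocked("balanced", mediumRisk),
  permissive: isBlocked("permissive", mediumRisk),
};
```

In this sketch only `strict` blocks the medium-risk message. The strict-mode rule about reasoning-source responses concerns the response rather than the input, so it is left out here.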
- Node.js 18+
- pnpm (or npm)
# 1. Install dependencies
pnpm install
# 2. Generate Prisma client + create database
npx prisma generate
npx prisma db push
# 3. Seed default agent, FAQs, and tool config
node prisma/seed.js
# 4. Copy environment file
cp .env.example .env.local
# 5. Start dev server
pnpm run dev
# .env.local
DATABASE_URL="file:./dev.db"
# Start dev server first, then in a second terminal:
pnpm run eval
The eval script:
- ✅ Preflight checks the server is responsive (3 retries)
- 🤖 Auto-loads the active agent ID from `/api/agents`
- 📊 Runs 32 queries × 3 thresholds (ablation)
- 🛡️ Sends a strict-mode hallucination guard probe
- 💾 Saves `eval/results.json` + `eval/results.csv`
Main orchestration endpoint — runs a message through the full 9-stage pipeline.
Request:
{
"message": "Where is my order ORD-12345?",
"agentId": "your-agent-id",
"sessionId": "optional-session-id",
"confidenceThreshold": 0.7
}
Response:
{
"success": true,
"agentId": "...",
"intentClassification": {
"intent": "order_status",
"confidence": 0.91,
"language": "english"
},
"actionDecision": "tool_call",
"toolInvocation": {
"toolId": "order_lookup",
"result": { "success": true, "data": { "status": "In Transit" } }
},
"finalResponse": "Your order ORD-12345 is currently In Transit...",
"hallucinationCheck": { "isHallucination": false },
"latencyBreakdown": {
"intentClassification": 1,
"safetyCheck": 0,
"toolExecution": 102,
"total": 105
}
}
Override Headers:
- `X-Session-ID`: session tracking
- `X-Safety-Mode: strict | balanced | permissive`: override agent safety level per-request
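A client call that combines the request body with the override headers might be assembled as follows. This is a sketch under assumptions: the `/api/run` path is inferred from `app/api/run/route.ts`, the `POST` method and the `http://localhost:3000` base URL (the Next.js dev default) are assumed, and `buildRunRequest` is a hypothetical helper rather than part of the project.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";

interface RunRequestBody {
  message: string;
  agentId: string;
  sessionId?: string;
  confidenceThreshold?: number;
}

interface RunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch options without sending anything.
function buildRunRequest(
  baseUrl: string,
  body: RunRequestBody,
  safetyMode?: SafetyMode,
): RunRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (body.sessionId) headers["X-Session-ID"] = body.sessionId;
  if (safetyMode) headers["X-Safety-Mode"] = safetyMode;
  return {
    url: `${baseUrl}/api/run`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}

const example = buildRunRequest(
  "http://localhost:3000",
  {
    message: "Where is my order ORD-12345?",
    agentId: "your-agent-id",
    sessionId: "optional-session-id",
    confidenceThreshold: 0.7,
  },
  "strict",
);
// With the dev server running: const res = await fetch(example.url, example.init);
```

Keeping request construction separate from `fetch` makes the header logic easy to check without a running server.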
AgentOS/
├── app/
│ ├── api/
│ │ ├── run/route.ts ← Main 9-stage orchestrator
│ │ ├── agents/ ← Agent CRUD
│ │ ├── agent-config/ ← Agent settings upsert
│ │ ├── tools/ ← ToolConfig API
│ │ └── knowledge/ ← FAQ management
│ └── (dashboard)/
│ ├── test-console/ ← Live agent chat UI
│ ├── agent-builder/ ← Agent config UI
│ ├── tools-knowledge/ ← FAQ + tool management
│ └── logs/ ← Execution log viewer
├── lib/
│ ├── intent-router.ts ← Intent classification (10 types)
│ ├── safety-guard.ts ← Safety + hallucination detection
│ ├── tools.ts ← Tool implementations
│ └── db-logger.ts ← Queue-based async logger
├── prisma/
│ ├── schema.prisma ← Data models
│ └── seed.js ← Default data seeder
├── scripts/
│ └── eval.js ← Ablation evaluation pipeline
├── eval/
│ ├── dataset.json ← 32-query bilingual test set
│ ├── results.json ← Latest eval output
│ └── results.csv ← CSV export
└── docs/images/ ← Screenshots
| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript 5 |
| Database | SQLite via Prisma ORM |
| Validation | Zod |
| UI Components | shadcn/ui + Radix UI + Tailwind CSS |
| Logging | Custom queue-based async DB logger |
| Evaluation | Node.js ablation script |
For full technical documentation, including the problem statement, data models, and design decisions, see PROJECT.md.


