Production-grade, open-source AI gateway that unifies Groq, Gemini, OpenRouter, and Ollama behind a single OpenAI-compatible and Anthropic-compatible endpoint. Smart failover, multi-key rotation, response caching, 4 routing strategies, and a powerful CLI, all in one.
Quick Start · Configuration · API Reference · Usage · CLI · Contributing
Building production AI apps is painful:
- Rate limits kill your app at peak traffic
- One API key = single point of failure
- Different SDKs per provider = messy codebase
- No fallback when Groq or Gemini goes down
- Redundant API costs for repeated prompts
Universal AI Router addresses each of these problems. It's a self-hosted AI gateway that sits between your app and the supported LLM providers: one endpoint, one request format, and automatic failover when a provider or key is unavailable. It is built for developers who run real workloads.
| Feature | Description |
|---|---|
| OpenAI-Compatible API | Drop-in replacement at `/v1/chat/completions`, zero SDK changes |
| Anthropic-Compatible API | Full `/v1/messages` endpoint; works with Claude Code and Anthropic SDKs |
| Smart Failover | Automatic provider switching on failure with exponential backoff |
| Multi-Key Rotation | Add unlimited keys per provider; health-scored rotation bypasses rate limits |
| Response Caching | In-memory TTL cache; the same prompt costs zero tokens the second time |
| 4 Routing Strategies | `model-based`, `priority`, `latency-aware`, `round-robin`; pick your strategy |
| Background Daemon | Runs as a persistent background process; close the terminal and the router stays alive |
| 4 Provider Support | Groq · Gemini · OpenRouter · Ollama, all unified |
| Live Metrics & Usage | `/metrics`, `/usage`, `/health` endpoints with per-key telemetry |
| Auth & Rate Limiting | Token-based auth plus a built-in sliding-window IP rate limiter |
| Admin API | Reset cooldowns and clear the cache via authenticated admin endpoints |
| Streaming SSE | Full streaming support; responses pipe directly to your client |
| Tool Call Support | OpenAI function calling / tool use, handled natively |
| Powerful Global CLI | `init`, `start`, `stop`, `restart`, `status`, `remove`: full lifecycle management |
| Multi-Router Support | Run multiple named routers on different ports simultaneously |
| n8n / Make / Zapier Ready | Works with any OpenAI-compatible no-code platform |
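To make the "Response Caching" row concrete, here is a minimal Python sketch of an in-memory TTL cache keyed by a hash of the model and messages. It is an illustration only, not the project's `ResponseCache.js`: the class name, the key derivation, and the oldest-first eviction are all assumptions.

```python
import hashlib
import json
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds, max_size=512, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_size = max_size
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self._store = {}        # key -> (expires_at, value)

    @staticmethod
    def make_key(model, messages):
        # Hash a canonical JSON form so identical requests map to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key, value):
        if key not in self._store and len(self._store) >= self.max_size:
            # Assumed eviction policy: drop the oldest inserted entry
            # (dicts preserve insertion order).
            del self._store[next(iter(self._store))]
        self._store[key] = (self.clock() + self.ttl, value)
```

A gateway would call `get` before contacting a provider and `set` after a successful response, so a repeated prompt inside the TTL never reaches the upstream API.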
Requirement: Node.js LTS (v20+), available from nodejs.org.
# 1. Clone the repository
git clone https://github.com/technicalboy2023/ai-router.git
cd ai-router
# 2. Install dependencies
npm install
# 3. Register the global CLI command
npm link

The `ai-router` command is now available globally in your terminal.
Perfect for 24/7 hosting on Linode, DigitalOcean, Vultr, Hetzner, Contabo, etc.
sudo apt update && sudo apt upgrade -y
sudo apt install -y git curl

# Add NodeSource LTS repo
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
# Install Node.js
sudo apt install -y nodejs
# Verify
node -v # v22.x.x or latest LTS
npm -v

git clone https://github.com/technicalboy2023/ai-router.git
cd ai-router
npm install
sudo npm link

sudo ufw allow 8000/tcp
sudo ufw reload
sudo ufw status

# Install PM2 globally
sudo npm install -g pm2
# Start the router
pm2 start npm --name "ai-router" -- run dev
# Save process list
pm2 save
# Enable auto-start on reboot (run the command PM2 outputs!)
pm2 startup systemd
# Verify
pm2 status
pm2 logs ai-router

The router is now running at http://YOUR_VPS_IP:8000 and survives reboots automatically.

pm2 status # Check all running processes
pm2 logs ai-router # Stream live logs
pm2 restart ai-router # Restart after config changes
pm2 stop ai-router # Stop the router
pm2 delete ai-router # Remove from PM2

# --- Groq (add multiple keys for rotation) ---
GROQ_KEY_1=gsk_your_first_groq_key
GROQ_KEY_2=gsk_your_second_groq_key
# --- Google Gemini ---
GEMINI_KEY_1=AIzaSy_your_gemini_key
# --- OpenRouter ---
OPENROUTER_KEY_1=sk-or-v1-your_key
OPENROUTER_KEY_2=sk-or-v1-your_second_key
# --- Security ---
AUTH_TOKEN=my_super_secret_token
ADMIN_TOKEN=my_admin_secret_token

Warning: Never commit `.env` to Git. Add it to `.gitignore`.
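The numbered `PROVIDER_KEY_N` variables above suggest one pool of keys per provider. The Python sketch below shows one way such variables could be grouped and rotated. It is a simplified illustration: the function and class names are hypothetical, and it uses plain round-robin with a cooldown rather than the health scoring the router itself advertises.

```python
import re
import time

# Matches names such as GROQ_KEY_1 or OPENROUTER_KEY_2.
KEY_PATTERN = re.compile(r"^([A-Z]+)_KEY_(\d+)$")


def collect_keys(env):
    """Group PROVIDER_KEY_N variables into {provider: [keys in numeric order]}."""
    found = {}
    for name, value in env.items():
        match = KEY_PATTERN.match(name)
        if match and value:
            provider, index = match.group(1).lower(), int(match.group(2))
            found.setdefault(provider, []).append((index, value))
    return {p: [v for _, v in sorted(items)] for p, items in found.items()}


class KeyPool:
    """Round-robin over keys, skipping any that are cooling down (assumed policy)."""

    def __init__(self, keys, clock=time.monotonic):
        self.keys = list(keys)
        self.clock = clock
        self.cooldown_until = {k: 0.0 for k in self.keys}
        self._next = 0

    def mark_rate_limited(self, key, cooldown_seconds=60):
        # Called after a provider rejects the key, e.g. with HTTP 429.
        self.cooldown_until[key] = self.clock() + cooldown_seconds

    def next_key(self):
        for _ in range(len(self.keys)):
            key = self.keys[self._next]
            self._next = (self._next + 1) % len(self.keys)
            if self.clock() >= self.cooldown_until[key]:
                return key
        return None  # every key is currently cooling down
```

In practice `collect_keys(os.environ)` would feed one `KeyPool` per provider; when `next_key()` returns `None`, a gateway would typically fall through to the next provider.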
{
  "name": "default",
  "port": 8000,
  "host": "0.0.0.0",
  "routing": {
    "strategy": "model-based",
    "providerOrder": ["groq", "openrouter", "gemini", "ollama"],
    "modelMapping": {
      "llama*": "groq",
      "mixtral*": "groq",
      "gemma*": "groq",
      "gemini*": "gemini",
      "gpt*": "openrouter"
    }
  },
  "fallback": {
    "providers": ["groq", "openrouter", "gemini", "ollama"],
    "maxRetries": 4,
    "backoff": { "initial": 500, "factor": 2, "max": 16000 }
  },
  "cache": { "enabled": true, "ttl": 30, "maxSize": 512 },
  "auth": { "enabled": true, "tokens": ["my_super_secret_token"], "adminTokens": ["my_admin_secret_token"] },
  "rateLimit": { "enabled": true, "windowMs": 60000, "maxRequests": 100 },
  "logging": { "level": "info", "file": "logs/gateway.log", "console": true }
}

| Key | Description |
|---|---|
| `routing.strategy` | `model-based` · `priority` · `latency-aware` · `round-robin` |
| `routing.modelMapping` | Glob pattern to provider (`"llama*": "groq"`) |
| `fallback.maxRetries` | Provider switches before giving up (default: 4) |
| `fallback.backoff` | Exponential backoff in ms (from `initial` up to `max`) |
| `cache.ttl` | Cache TTL in minutes |
| `auth.enabled` | Toggle Bearer token authentication |
| `rateLimit.windowMs` | Sliding window duration in ms |
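As a worked example of the `fallback.backoff` settings, the Python sketch below computes the delay schedule implied by `initial`, `factor`, and `max`. The function name is hypothetical, and whether the router adds jitter or waits before the first retry is not stated here, so treat this as the plain exponential formula only.

```python
def backoff_delays(initial_ms, factor, max_ms, retries):
    """Return the wait before each retry: initial * factor**n, capped at max_ms."""
    delays = []
    delay = initial_ms
    for _ in range(retries):
        delays.append(min(delay, max_ms))
        delay *= factor
    return delays


# Values taken from the config above: initial=500, factor=2, max=16000, maxRetries=4.
print(backoff_delays(500, 2, 16000, 4))  # [500, 1000, 2000, 4000]
```

With these values the cap of 16000 ms is only reached from the sixth retry onward, so a `maxRetries` of 4 never hits it.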
# Development: foreground with live logs
npm run dev
# Router live at http://localhost:8000

# Production: named instance in background
ai-router start myRouter -c config/default.json

You can run multiple routers simultaneously on the same server, each on a different port, with its own config, auth token, and provider priority.

Copy config/default.json and give it a new name:

cp config/default.json config/myrouter.json

Open config/myrouter.json and change the following values:

| Field | Where | What to Change |
|---|---|---|
| `"name"` | Top level | Change to a unique name, e.g. `"myrouter"` |
| `"port"` | Top level | Change to a different port, e.g. `8001`, `8080` |
| `"logging.file"` | `logging` block | Change to a new log file, e.g. `"logs/myrouter.log"` |

Warning: Two routers cannot share the same port. If they do, the second one will crash with "Port already in use".
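Because a duplicated name, port, or log file only shows up when the second router starts, it can help to check the config files up front. The following Python sketch does that; it is a hypothetical helper, not part of the `ai-router` CLI, and it assumes every file in `config/` uses the JSON layout shown above.

```python
import json
from pathlib import Path


def find_conflicts(configs):
    """Report values of name, port, or logging.file shared by two configs.

    configs maps a label (for example the file name) to a parsed config dict.
    """
    problems = []
    for field in ("name", "port", "logging.file"):
        seen = {}
        for label, cfg in configs.items():
            value = cfg
            for part in field.split("."):  # walk nested keys such as logging.file
                value = value.get(part) if isinstance(value, dict) else None
            if value is None:
                continue
            if value in seen:
                problems.append(f"{field}={value!r} used by both {seen[value]} and {label}")
            else:
                seen[value] = label
    return problems


def load_configs(directory="config"):
    """Parse every *.json file in the given directory."""
    return {p.name: json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))}
```

Running `find_conflicts(load_configs())` from the project root before starting a new router returns an empty list when every router has its own name, port, and log file.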
| Field | Where | Why You'd Change It |
|---|---|---|
| `"auth.tokens"` | `auth` block | Give this router a separate API password |
| `"routing.providerOrder"` | `routing` block | Prioritise a different provider first (e.g. `["gemini", "openrouter", "groq", "ollama"]`) |
| `"fallback.providers"` | `fallback` block | Control which providers act as fallbacks |
| `"rateLimit.maxRequests"` | `rateLimit` block | Set a higher/lower request cap for this router |
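To show how `rateLimit.windowMs` and `rateLimit.maxRequests` interact, here is a small Python sketch of a sliding-window limiter. It is a generic illustration of the technique, not the router's middleware: the class name is hypothetical, and keying by client IP is taken from the feature list rather than from the code.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within the last window_ms milliseconds."""

    def __init__(self, window_ms, max_requests, clock=time.monotonic):
        self.window = window_ms / 1000.0
        self.max_requests = max_requests
        self.clock = clock
        self.hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        timestamps = self.hits[client_id]
        # Forget requests that have slid out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True
```

With `windowMs: 60000` and `maxRequests: 100`, a client's 101st request inside any 60-second span is rejected, and capacity frees up as soon as the oldest request ages out.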
{
  "name": "myrouter",
  "port": 8001,
  "host": "0.0.0.0",
  "routing": {
    "strategy": "model-based",
    "providerOrder": ["openrouter", "gemini", "groq", "ollama"]
  },
  "fallback": {
    "providers": ["openrouter", "gemini", "groq", "ollama"],
    "maxRetries": 4,
    "backoff": { "initial": 500, "factor": 2, "max": 16000 }
  },
  "cache": { "enabled": true, "ttl": 30, "maxSize": 512 },
  "auth": { "enabled": true, "tokens": ["my_router2_token"], "adminTokens": ["my_router2_admin"] },
  "rateLimit": { "enabled": true, "windowMs": 60000, "maxRequests": 100 },
  "logging": { "level": "info", "file": "logs/myrouter.log", "console": true }
}

# Open firewall for the new port first (Linux VPS only)
sudo ufw allow 8001/tcp
# Start the new router
ai-router start myrouter -c config/myrouter.json

Both routers are now running: `:8000` (default) and `:8001` (myrouter), completely independent.
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| `POST` | `/v1/chat/completions` | User | Main LLM endpoint (OpenAI-compatible) |
| `POST` | `/v1/messages` | User | Anthropic Messages API (Claude Code compatible) |
| `POST` | `/v1/messages/count_tokens` | User | Token estimation (Claude Code compatible) |
| `POST` | `/v1/embeddings` | User | OpenAI-compatible text embeddings endpoint |
| `GET` | `/v1/models` | User | List all models across all providers |
| `GET` | `/health` | None | Liveness probe: provider & key summary |
| `GET` | `/metrics` | None | Per-key telemetry: requests, errors, tokens, latency |
| `GET` | `/usage` | None | Anonymized per-key usage counters |
| `GET` | `/router/status` | None | Routing engine status |
| `POST` | `/admin/reset-cooldowns` | Admin | Reset all rate-limited/cooled-down keys |
| `POST` | `/admin/cache/clear` | Admin | Flush the response cache |
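The table distinguishes unauthenticated, user, and admin endpoints. The Python sketch below builds requests for each kind using only the standard library. Two points are assumptions rather than documented behaviour: that admin endpoints accept the admin token as a `Bearer` header in the same way user endpoints do, and that responses are JSON. The helper names are hypothetical.

```python
import json
import urllib.request


def build_request(base_url, path, token=None, body=None, method="GET"):
    """Build a request for a router endpoint without sending it."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the same Bearer scheme is used for user and admin tokens.
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request, timeout=10):
    """Send the request and decode the body, assuming the router replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


health = build_request("http://localhost:8000", "/health")
models = build_request("http://localhost:8000", "/v1/models", token="my_super_secret_token")
clear = build_request("http://localhost:8000", "/admin/cache/clear",
                      token="my_admin_secret_token", method="POST")
```

Calling `send(health)` against a running router is a quick liveness check that needs no token.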
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="my_super_secret_token",
)
response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Explain neural networks simply."}],
)
print(response.choices[0].message.content)

# Streaming: reuse the same client and print each chunk as it arrives
stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Write a poem about space."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my_super_secret_token" \
  -d '{"model": "openrouter/auto", "messages": [{"role": "user", "content": "Hello!"}]}'

curl http://localhost:8000/health

The router fully supports Claude Code via the /v1/messages endpoint. Configure it:

# Set your router as the Anthropic API base URL
export ANTHROPIC_BASE_URL="http://YOUR_VPS_IP:8000"
export ANTHROPIC_API_KEY="your-router-auth-token"

Or add them to your shell config (~/.bashrc, ~/.zshrc) for persistence:

echo 'export ANTHROPIC_BASE_URL="http://YOUR_VPS_IP:8000"' >> ~/.bashrc
echo 'export ANTHROPIC_API_KEY="your-router-auth-token"' >> ~/.bashrc
source ~/.bashrc

Now launch Claude Code normally. It will route through your AI Router with full fallback support.

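For readers curious what an Anthropic-compatible endpoint has to do before it can forward a request to an OpenAI-style provider, the Python sketch below converts a text-only Messages request into a chat-completions body and maps the reply back. It is a simplified illustration under stated assumptions, not the project's `AnthropicTranslator.js`: the function names are hypothetical, and tool calls, images, and streaming events are deliberately left out.

```python
def anthropic_to_openai(request):
    """Convert an Anthropic Messages request body into an OpenAI chat body (text only)."""
    messages = []
    system = request.get("system")
    if system:
        # Anthropic carries the system prompt as a top-level field;
        # OpenAI expects it as the first message.
        if isinstance(system, list):
            system = "".join(block.get("text", "") for block in system)
        messages.append({"role": "system", "content": system})
    for msg in request["messages"]:
        content = msg["content"]
        if isinstance(content, list):
            # Keep only text blocks; other block types are out of scope for this sketch.
            content = "".join(b.get("text", "") for b in content if b.get("type") == "text")
        messages.append({"role": msg["role"], "content": content})
    body = {"model": request["model"], "messages": messages,
            "max_tokens": request["max_tokens"]}
    if request.get("stream"):
        body["stream"] = True
    return body


def openai_to_anthropic(response):
    """Wrap the first OpenAI choice in an Anthropic-style message object."""
    choice = response["choices"][0]
    stop_map = {"stop": "end_turn", "length": "max_tokens"}
    return {
        "type": "message",
        "role": "assistant",
        "model": response.get("model"),
        "content": [{"type": "text", "text": choice["message"]["content"] or ""}],
        "stop_reason": stop_map.get(choice.get("finish_reason"), "end_turn"),
    }
```

The main structural differences this covers are the top-level `system` field and the list-of-blocks content format on the Anthropic side.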
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8000",
    api_key="my_super_secret_token",
)
message = client.messages.create(
    model="openrouter/auto",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing simply."}],
)
print(message.content[0].text)

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: my_super_secret_token" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "openrouter/auto",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

To call the router from an HTTP request node in a no-code tool such as n8n:

- URL: `http://YOUR_VPS_IP:8000/v1/chat/completions`
- Method: `POST`
- Header: `Authorization: Bearer my_super_secret_token`
- Body: Standard OpenAI JSON payload

Works natively with n8n's OpenAI node: just change the base URL.
# Initialize a new named router config
ai-router init myRouter --port 8000
# Start a named router (background)
ai-router start myRouter -c config/myRouter.json
# Start ALL routers defined in config/
ai-router start-all
# Check status of all running routers
ai-router status
# Stream live logs
ai-router logs myRouter
# Restart a router (pick up config changes)
ai-router restart myRouter
# Stop a specific router
ai-router stop myRouter
# Stop ALL running routers
ai-router stop-all
# Remove a router config
ai-router remove myRouter

| Strategy | How It Works | Best For |
|---|---|---|
| `model-based` | Routes by model name glob patterns | Predictable provider assignment |
| `priority` | Tries providers in `providerOrder` sequence | Simple primary + fallback setup |
| `latency-aware` | Prefers provider with lowest avg response time | Latency-sensitive apps |
| `round-robin` | Distributes evenly across all providers | Load balancing |
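The four strategies in the table can be summarised in a few lines of Python. This is a sketch of the selection rules as described, not the project's `RoutingEngine.js`: the function name and its parameters are hypothetical, and falling back to the first entry of `providerOrder` when no glob matches is an assumption.

```python
from fnmatch import fnmatchcase


def pick_provider(strategy, model, routing, avg_latency_ms=None, request_index=0):
    """Choose the first provider to try for a request under the given strategy."""
    order = routing["providerOrder"]
    if strategy == "model-based":
        for pattern, provider in routing.get("modelMapping", {}).items():
            if fnmatchcase(model, pattern):
                return provider
        return order[0]  # assumption: unmatched models use the first provider
    if strategy == "priority":
        return order[0]
    if strategy == "latency-aware":
        latencies = avg_latency_ms or {}
        # Providers without measurements yet are treated as slowest.
        return min(order, key=lambda p: latencies.get(p, float("inf")))
    if strategy == "round-robin":
        return order[request_index % len(order)]
    raise ValueError(f"unknown strategy: {strategy}")


routing = {
    "providerOrder": ["groq", "openrouter", "gemini", "ollama"],
    "modelMapping": {"llama*": "groq", "gemini*": "gemini", "gpt*": "openrouter"},
}
print(pick_provider("model-based", "gemini-1.5-flash", routing))  # gemini
```

Whatever the strategy picks first, the fallback settings still decide which providers are tried next if that choice fails.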
ai-router/
│
├── bin/
│   └── ai-router.js                 # Global CLI entrypoint
│
├── config/
│   └── default.json                 # Full router configuration
│
├── src/
│   ├── index.js                     # Main export
│   ├── worker.js                    # Dev server entry (npm run dev)
│   │
│   ├── cli/
│   │   ├── orchestrator.js          # PM2 process manager wrapper
│   │   └── commands/                # init, start, startAll, stop, stopAll,
│   │                                # restart, status, logs, remove
│   ├── config/
│   │   ├── loader.js                # Config parser & merger
│   │   └── schema.js                # Zod validation schema
│   │
│   ├── providers/
│   │   ├── BaseProvider.js          # Abstract provider class
│   │   ├── ProviderRegistry.js      # Provider registry & lookup
│   │   ├── GroqProvider.js          # Groq
│   │   ├── GeminiProvider.js        # Google Gemini
│   │   ├── OpenRouterProvider.js    # OpenRouter
│   │   └── OllamaProvider.js        # Ollama (local)
│   │
│   ├── router_core/
│   │   ├── KeyRegistry.js           # Per-provider key pool
│   │   ├── KeyHealth.js             # Health scoring per key
│   │   ├── ResponseCache.js         # In-memory TTL cache
│   │   └── UsageStore.js            # Usage counter persistence
│   │
│   ├── server/
│   │   ├── app.js                   # Express app bootstrap
│   │   ├── middleware/              # auth, cors, rateLimiter,
│   │   │                            # errorHandler, requestId
│   │   └── routes/                  # chatCompletions, messages, models,
│   │                                # health, metrics, usage, routerStatus, admin
│   └── services/
│       ├── RoutingEngine.js         # 4-strategy routing logic
│       ├── FallbackEngine.js        # Retry + failover
│       ├── KeyManager.js            # Key selection & rotation
│       ├── ToolCallHandler.js       # OpenAI tool/function calls
│       ├── AnthropicTranslator.js   # Anthropic <-> OpenAI format conversion
│       ├── ResponseNormalizer.js    # Unified response format
│       └── ErrorNormalizer.js       # Unified error format
│
├── .env                             # Your keys (never commit!)
├── .env.example                     # Template for .env
├── package.json
└── README.md
After making changes locally (or when a new version is available on GitHub), follow these steps to update the router on your VPS or cloud platform.
# 1. SSH into your VPS
ssh user@YOUR_VPS_IP
# 2. Navigate to the project directory
cd ~/ai-router
# 3. Pull the latest changes from GitHub
git pull origin main
# 4. Install any new/updated dependencies
npm install
# 5. Restart all running routers to pick up changes
pm2 restart all
# 6. Verify everything is running
pm2 status
pm2 logs ai-router --lines 20

Tip: Your `.env` and `config/*.json` files won't be overwritten by `git pull`; they are either gitignored or exist only on your server.
If your router is deployed on Render, Railway, or similar:
- Push your changes to GitHub:
  git add .
  git commit -m "fix: update router logic"
  git push origin main
- Auto-deploy: Most cloud platforms auto-detect the push and redeploy automatically.
- Manual deploy: If auto-deploy is off, go to your platform dashboard, click "Manual Deploy", and select the latest commit.

No SSH needed: cloud platforms handle the restart for you.
If you just edited config/default.json or .env on the VPS directly:
# Just restart β no git pull needed
pm2 restart ai-router
# Or restart a specific named router
pm2 restart myrouter

To fully remove the AI Router from your system, including all processes, configs, logs, and the CLI command, run the following steps in order.
# Stop all running routers
pm2 stop all
# Delete all router processes from PM2
pm2 delete all
# Remove PM2 startup script (optional β if you don't use PM2 for anything else)
pm2 unstartup systemd
pm2 save --force

# Navigate to the project directory
cd ~/ai-router
# Remove the global 'ai-router' command
sudo npm unlink

# Go back to home directory
cd ~
# Delete the entire project folder
rm -rf ai-router

# Empty PM2 log files (this clears logs for every PM2 app, not only ai-router)
pm2 flush
# Close the firewall port (if you opened one)
sudo ufw delete allow 8000/tcp
sudo ufw delete allow 8001/tcp # if you had a second router
sudo ufw reload

# Should return "command not found"
ai-router status
# Should show no processes
pm2 status
# Should show the folder no longer exists
ls ~/ai-router

That's it: the router is fully removed, with no leftover configs, daemons, or orphan processes.
All contributions welcome!
# Fork + clone
git clone https://github.com/technicalboy2023/ai-router.git
cd ai-router
# Create feature branch
git checkout -b feature/add-mistral-provider
# Run tests
npm test
# Commit + push + open PR
git commit -m "feat: add Mistral AI provider"
git push origin feature/add-mistral-provider

Good first contributions:
- New provider adapter (Mistral, Cohere, Together AI, Anthropic)
- Web dashboard UI for metrics
- Docker / docker-compose setup
- Test coverage improvements
- Docs & usage examples
MIT License: free for personal and commercial use. See LICENSE for full details.
Built with ❤️ by AMAN
Self-hosted infrastructure enthusiast. Building open-source tools for AI developers who refuse vendor lock-in.
If this saved you time, money, or debugging pain, a star means everything.
Star · Fork · Share
Every star helps more developers discover this project. Thank you!
ai gateway · openai proxy · llm router · groq api · google gemini · openrouter · ollama · ai failover · api key rotation · self-hosted ai · open source llm · n8n ai · ai rate limit bypass · openai compatible · local ai server · llm proxy · multi-provider ai · ai load balancer