BurnCloud

The Next-Gen High-Performance AI Gateway & Aggregator

Feature Requests · Roadmap · Getting Started · Documentation

💡 What is BurnCloud?

BurnCloud is a Rust-native LLM Aggregation Gateway and Management Platform. It aims to benchmark against and surpass One API (New API), providing individual developers, teams, and enterprises with a high-performance, resource-efficient, secure, and controllable unified LLM access layer.

We are not just reinventing the wheel; we are upgrading the engine. If you are tired of the high memory consumption, GC pauses, or complex deployment dependencies of existing gateways, BurnCloud is your best choice.

✨ Why BurnCloud? (Core Values)

🚀 1. Performance First

Powered by Rust: Built on Axum and Tokio for high-concurrency request handling with a very low memory footprint (megabytes rather than gigabytes).
Zero-Overhead Passthrough: Featuring a unique "Don't Touch the Body" routing mode. In scenarios without protocol conversion, it forwards bodies byte for byte without copying or parsing them, adding near-zero latency.
Single Binary: No Runtime dependencies (No Python, No Node.js, No Java). One file is a complete platform.

🔌 2. Universal Aggregation

All to OpenAI: Unifies protocols from Anthropic (Claude), Google (Gemini), Azure, Alibaba Qwen, and other mainstream models into standard OpenAI format.
Write Once, Run Anywhere: Your LangChain, AutoGPT, or any existing application can seamlessly switch underlying models just by changing the Base URL.

⚖️ 3. Enterprise Governance

Smart Load Balancing: Supports Multi-Channel Round-Robin, Weighted Distribution, and Automatic Failover. If one gpt-4 goes down, thousands of gpt-4 stand up.
Precise Billing: Supports precise token-based billing, custom Model Ratios, and User Group Ratios.
Multi-Tenant Management: Comprehensive redemption codes, quota management, and invitation mechanisms.

🛡️ 4. Rock-Solid Reliability

Real-World E2E Testing: We do not test against mocked data. BurnCloud's CI/CD pipeline validates end-to-end against real OpenAI/Gemini APIs, ensuring core forwarding logic remains robust in real network environments.
Browser-Driven Verification: Built-in automated UI tests based on Headless Chrome verify the full rendering path from the backend API to the Dioxus LiveView frontend.
Zero-Regression Promise: Strict "API-Path Matching" testing strategy ensures every Commit passes rigorous automated auditing.

🎨 5. Fluent Experience

More Than API: Built-in local management client developed with Dioxus, featuring Windows 11 Fluent Design.
Visual Monitoring: View real-time TPS, RPM, and token consumption trends, saying goodbye to boring log files.

🏗️ Architecture

BurnCloud adopts a strict four-layer architecture to ensure high cohesion and low coupling:

Gateway Layer (crates/router): Data plane. Handles high-concurrency traffic, authentication, rate limiting, and protocol conversion.
Control Layer (crates/server): Control plane. Provides RESTful APIs for UI calls, managing configuration and state.
Service Layer (crates/service): Business logic. Encapsulates core logic like billing, monitoring, and channel speed testing.
Data Layer (crates/database): Data persistence. Based on SQLx + SQLite/PostgreSQL, with future Redis cache support.

See: Architecture Evolution

Key Principle: "Don't Touch the Body"

The router is a smart pipe, not a processor. It handles authentication and routing but streams request/response bodies through untouched, without buffering or parsing them, so it adds near-zero latency.
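To make the principle concrete, here is a minimal sketch using only the Rust standard library. It is not BurnCloud's actual Axum/Tokio code, and `forward_body` is a hypothetical name. It only shows the idea of treating a body as opaque bytes: nothing is deserialized, and only a small fixed-size buffer is in memory at any time. Note that `io::copy` does move bytes through that buffer, so this illustrates unparsed streaming rather than true zero-copy.

```rust
use std::io::{self, Read, Write};

/// Forwards a body from `upstream` to `client` as opaque bytes.
/// Nothing is parsed or re-encoded; `io::copy` moves the data through
/// a small internal buffer until the reader reports end of stream.
/// (Hypothetical helper, for illustration only.)
fn forward_body<R: Read, W: Write>(mut upstream: R, mut client: W) -> io::Result<u64> {
    let forwarded = io::copy(&mut upstream, &mut client)?;
    client.flush()?;
    // Number of bytes passed through, useful for logging or metrics.
    Ok(forwarded)
}
```

Any `Read`/`Write` pair works here, for example a `TcpStream` on each side, or a byte slice and a `Vec<u8>` in a test.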

🛠️ Getting Started

Requirements

Rust 1.75+
Windows 10/11, Linux, or macOS

Quick Start

Full guide: docs/getting-started.md (environment requirements, security configuration, verification steps, and FAQ)

# 1. Clone repository
git clone https://github.com/burncloud/burncloud.git
cd burncloud

# 2. Configure (Optional)
cp .env.example .env
# Edit .env and fill in TEST_OPENAI_KEY to enable full E2E tests

# 3. Build
cargo build --release

# 4. Run (Auto-compiles Server and Client)
cargo run                  # GUI on Windows, server with LiveView on Linux
cargo run -- router        # Server mode only
cargo run -- client        # GUI client only

Configuration

Key configuration options:

Variable	Description	Default
`PORT`	Server port	3000
`HOST`	Server host	0.0.0.0
`DATABASE_URL`	Database connection	sqlite:burncloud.db
`RUST_LOG`	Log level	info
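As an illustration of how these variables and defaults fit together, the sketch below resolves them with the Rust standard library. The `Settings` struct and both function names are hypothetical, not BurnCloud's real configuration code, and the fallback for an unparsable `PORT` is an assumption.

```rust
use std::env;

/// The variables from the table above, resolved with their defaults.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct Settings {
    port: u16,
    host: String,
    database_url: String,
    rust_log: String,
}

/// Builds `Settings` from any key lookup, so the same logic can read the
/// process environment or a test fixture.
fn load_settings<F>(lookup: F) -> Settings
where
    F: Fn(&str) -> Option<String>,
{
    Settings {
        // Assumption: a missing or unparsable PORT falls back to 3000.
        port: lookup("PORT").and_then(|v| v.parse().ok()).unwrap_or(3000),
        host: lookup("HOST").unwrap_or_else(|| "0.0.0.0".to_string()),
        database_url: lookup("DATABASE_URL")
            .unwrap_or_else(|| "sqlite:burncloud.db".to_string()),
        rust_log: lookup("RUST_LOG").unwrap_or_else(|| "info".to_string()),
    }
}

/// Reads the settings from the real process environment.
fn settings_from_env() -> Settings {
    load_settings(|key| env::var(key).ok())
}
```

Passing the lookup as a closure keeps the defaults testable without mutating the process environment.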

Basic Usage

Start the router:

cargo run -- router

Make a request:

curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Run Tests (Quality Assurance)

Experience the industrial-grade testing process:

# Run all tests
cargo test --all-features

# Run all API integration tests
cargo test -p burncloud-tests --test api_tests

# Run UI automation tests (Requires Chrome)
cargo test -p burncloud-tests --test ui_tests

# Format check
cargo fmt --all -- --check

# Lint
cargo clippy --all-targets --all-features

💰 Billing & Quota

Pricing Configuration

BurnCloud tracks token usage and calculates costs based on configurable pricing per model.

CLI Commands

List all model prices:

burncloud price list

Set price for a model (per 1 million tokens):

burncloud price set gpt-4 --input 30.0 --output 60.0

Get price for a specific model:

burncloud price get gpt-4

Delete a price:

burncloud price delete gpt-4

Pricing Format

Prices are defined per 1 million tokens:

input_price: Cost per 1M prompt tokens
output_price: Cost per 1M completion tokens

Example calculation for GPT-4 with input=$30/1M, output=$60/1M:

For 100 prompt tokens + 200 completion tokens:
cost = (100/1,000,000 * 30) + (200/1,000,000 * 60)
     = 0.003 + 0.012
     = $0.015
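The same arithmetic as a small Rust function. This is a sketch of the formula above, not BurnCloud's billing code: `ModelPrice` and `request_cost` are hypothetical names, and using `f64` is an assumption made for readability (a real billing path may prefer integer units to avoid rounding drift).

```rust
const TOKENS_PER_MILLION: f64 = 1_000_000.0;

/// Prices per 1M tokens, matching `input_price` and `output_price` above.
/// (Hypothetical type, for illustration only.)
struct ModelPrice {
    input_price: f64,
    output_price: f64,
}

/// cost = prompt/1M * input_price + completion/1M * output_price
fn request_cost(price: &ModelPrice, prompt_tokens: u64, completion_tokens: u64) -> f64 {
    (prompt_tokens as f64 / TOKENS_PER_MILLION) * price.input_price
        + (completion_tokens as f64 / TOKENS_PER_MILLION) * price.output_price
}
```

With `input_price = 30.0` and `output_price = 60.0`, calling `request_cost(&price, 100, 200)` reproduces the $0.015 from the worked example (up to floating-point rounding).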

Model Aliases

Models can be aliased to share pricing:

burncloud price set gpt-4-turbo --alias gpt-4

This makes gpt-4-turbo use the same pricing as gpt-4.
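One way to picture alias resolution is a price table whose entries are either a concrete price or a pointer to another model. The sketch below is an assumption about the behavior, not the actual storage schema: `PriceEntry`, `Price`, and `resolve_price` are hypothetical names, and following chains of aliases is a design choice made here for illustration.

```rust
use std::collections::HashMap;

/// Per-1M-token prices for one model. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct Price {
    input: f64,
    output: f64,
}

/// A table entry is either a concrete price or an alias to another model.
enum PriceEntry {
    Direct(Price),
    Alias(String),
}

/// Follows aliases until a concrete price is found.
/// Returns `None` for unknown models or alias cycles.
fn resolve_price(table: &HashMap<String, PriceEntry>, model: &str) -> Option<Price> {
    let mut current = model;
    // Bound the number of hops so an alias cycle cannot loop forever.
    for _ in 0..=table.len() {
        match table.get(current)? {
            PriceEntry::Direct(price) => return Some(*price),
            PriceEntry::Alias(target) => current = target.as_str(),
        }
    }
    None
}
```

With `gpt-4` stored as a direct price and `gpt-4-turbo` stored as an alias to it, both names resolve to the same `Price`.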

Quota Management

Each API token can have a quota limit. When a request is made:

1. The system checks whether the token has sufficient remaining quota
2. The request is processed
3. The cost is calculated from token usage
4. The quota is deducted atomically

Quota Limits

quota_limit = -1: Unlimited quota (default)
quota_limit >= 0: Maximum usage allowed
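The check-then-deduct flow and the `quota_limit` semantics can be sketched with an atomic counter. This is a simplified in-memory model, not BurnCloud's database-backed implementation. `TokenQuota` and its methods are hypothetical, quota is assumed to be tracked in integer units, and the pre-check is assumed to mean "some quota is left", since the exact cost is only known after the response.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// `quota_limit = -1` means unlimited, as described above.
const UNLIMITED: i64 = -1;

/// In-memory quota state for one API token. (Hypothetical type.)
struct TokenQuota {
    quota_limit: i64,
    used: AtomicI64,
}

impl TokenQuota {
    fn new(quota_limit: i64) -> Self {
        Self { quota_limit, used: AtomicI64::new(0) }
    }

    /// Pre-request check: unlimited tokens always pass; limited tokens
    /// pass while their usage is still below the limit.
    fn has_remaining(&self) -> bool {
        self.quota_limit == UNLIMITED
            || self.used.load(Ordering::Acquire) < self.quota_limit
    }

    /// Post-request deduction: adds the computed cost in a single atomic
    /// step, so concurrent requests cannot lose updates.
    /// Returns the total usage after this deduction.
    fn deduct(&self, cost: i64) -> i64 {
        self.used.fetch_add(cost, Ordering::AcqRel) + cost
    }
}
```

Because the cost is only known after processing, the last request before exhaustion can overshoot the limit in this model. Subsequent requests then fail the pre-check.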

Insufficient Quota Response

When quota is exhausted:

{
  "error": {
    "message": "Insufficient quota",
    "type": "insufficient_quota_error",
    "code": "insufficient_quota"
  }
}

HTTP Status: 402 Payment Required

Token Expiry

Tokens can have an expiration time:

expired_time = -1: Never expires (default)
expired_time > 0: Unix timestamp of expiration

When a token expires:

{
  "error": {
    "message": "Token has expired",
    "type": "invalid_request_error",
    "code": "token_expired"
  }
}

HTTP Status: 401 Unauthorized
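The expiry rule reduces to a single comparison. In the sketch below, `is_expired` and `unix_now` are hypothetical helpers. The timestamp is assumed to be in seconds, a token is assumed to be expired at the exact expiry second, and `expired_time = 0` (which the rules above do not define) is treated as non-expiring.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns true when the token's expiry timestamp has been reached.
/// Non-positive values (including the default -1) never expire.
fn is_expired(expired_time: i64, now_unix: i64) -> bool {
    expired_time > 0 && now_unix >= expired_time
}

/// Current Unix time in seconds; falls back to 0 if the system clock
/// is set before the epoch.
fn unix_now() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|elapsed| elapsed.as_secs() as i64)
        .unwrap_or(0)
}
```

A request handler would call `is_expired(token.expired_time, unix_now())` and return the 401 response shown above when it is true.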

📊 Streaming Token Statistics

BurnCloud parses token usage from streaming responses for accurate billing.

OpenAI

Enable usage stats in streaming:

{
  "model": "gpt-4",
  "messages": [...],
  "stream": true,
  "stream_options": { "include_usage": true }
}
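With `include_usage` enabled, OpenAI sends the usage object in one extra chunk just before the terminating `data: [DONE]` event. The sketch below isolates that last payload from a fully received event stream. `last_data_payload` is a hypothetical helper, not BurnCloud's parser. A real gateway would process the stream incrementally as chunks arrive and then decode the payload with a JSON library.

```rust
/// Returns the payload of the last `data:` event that is not the
/// `[DONE]` sentinel, or `None` if the stream carries no such event.
fn last_data_payload(stream: &str) -> Option<&str> {
    stream
        .lines()
        // Keep only server-sent-event data lines and drop the field name.
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim)
        // Skip empty payloads and the end-of-stream sentinel.
        .filter(|payload| !payload.is_empty() && *payload != "[DONE]")
        .last()
}
```

The returned slice borrows from the input, so no response data is copied while locating the usage chunk.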

Anthropic

Token counts are in message_start and message_delta events.

Gemini

Token counts are in usageMetadata field.

⚖️ Load Balancing

Weighted Random Selection

Channels can be assigned weights for traffic distribution:

Weight 80/20: ~80% traffic to channel A, ~20% to channel B
Weight 0: Falls back to round-robin

Weights are configured in the abilities table.
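A minimal sketch of weighted selection with a round-robin fallback, using only the standard library. The function names are hypothetical and this is not BurnCloud's router code. The random number is passed in as `roll` so the logic stays deterministic and testable, and "Weight 0" is interpreted here as "every candidate channel has weight 0", which is an assumption.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Picks a channel index with probability proportional to its weight.
/// `roll` is any uniformly distributed random number.
/// Returns `None` when all weights are zero (or the slice is empty).
fn pick_weighted(weights: &[u32], roll: u64) -> Option<usize> {
    let total: u64 = weights.iter().map(|&w| u64::from(w)).sum();
    if total == 0 {
        return None;
    }
    // Map the roll onto [0, total) and walk the cumulative weights.
    let mut point = roll % total;
    for (index, &weight) in weights.iter().enumerate() {
        let weight = u64::from(weight);
        if point < weight {
            return Some(index);
        }
        point -= weight;
    }
    None
}

/// Weighted pick with a round-robin fallback when no channel has weight.
fn pick_channel(weights: &[u32], roll: u64, rr_counter: &AtomicUsize) -> Option<usize> {
    if weights.is_empty() {
        return None;
    }
    pick_weighted(weights, roll)
        .or_else(|| Some(rr_counter.fetch_add(1, Ordering::Relaxed) % weights.len()))
}
```

With weights `[80, 20]`, rolls 0 through 79 select channel A and rolls 80 through 99 select channel B, which gives the roughly 80/20 split described above. The modulo step introduces a negligible bias for large random ranges.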

⚠️ Error Codes

BurnCloud returns errors in OpenAI-compatible format:

{
  "error": {
    "message": "Error description",
    "type": "error_type",
    "code": "error_code"
  }
}

HTTP Status	Code	Type	Description
401	`invalid_token`	`invalid_request_error`	Invalid or missing token
401	`token_expired`	`invalid_request_error`	Token has expired
402	`insufficient_quota`	`insufficient_quota_error`	Quota exceeded
403	`permission_denied`	`permission_error`	Permission denied
404	`not_found`	`not_found_error`	Resource not found
429	`rate_limit_exceeded`	`rate_limit_error`	Rate limited
500	`server_error`	`server_error`	Internal error
503	`service_unavailable`	`server_error`	Service unavailable
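The table translates directly into a lookup from error code to HTTP status and type. The sketch below is illustrative only: `ApiError`, `lookup_error`, and `error_body` are hypothetical names, and the hand-rolled escaping covers just quotes and backslashes, so a real implementation would serialize the body with a JSON library.

```rust
/// One row of the error table. (Hypothetical type.)
struct ApiError {
    status: u16,
    code: &'static str,
    kind: &'static str,
}

/// Maps an error code from the table to its HTTP status and type.
fn lookup_error(code: &str) -> Option<ApiError> {
    let (status, code, kind) = match code {
        "invalid_token" => (401, "invalid_token", "invalid_request_error"),
        "token_expired" => (401, "token_expired", "invalid_request_error"),
        "insufficient_quota" => (402, "insufficient_quota", "insufficient_quota_error"),
        "permission_denied" => (403, "permission_denied", "permission_error"),
        "not_found" => (404, "not_found", "not_found_error"),
        "rate_limit_exceeded" => (429, "rate_limit_exceeded", "rate_limit_error"),
        "server_error" => (500, "server_error", "server_error"),
        "service_unavailable" => (503, "service_unavailable", "server_error"),
        _ => return None,
    };
    Some(ApiError { status, code, kind })
}

/// Renders the OpenAI-compatible error body for a given message.
fn error_body(err: &ApiError, message: &str) -> String {
    // Minimal escaping for illustration: backslashes first, then quotes.
    let escaped = message.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        r#"{{"error":{{"message":"{}","type":"{}","code":"{}"}}}}"#,
        escaped, err.kind, err.code
    )
}
```

A handler would send `err.status` as the HTTP status and `error_body(&err, ...)` as the response body.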

🗺️ Roadmap

v0.1: Basic routing & AWS SigV4 signing support (Completed)
v0.2: Database integration, Basic Auth & New API Core Replication (Completed)
- Ability Smart Routing
- Channel Management API
- Async Billing & Logging
v0.3: Unified Protocol Adaptors (OpenAI/Gemini/Claude) & E2E Test Suite (Completed)
v0.4: Smart Load Balancing & Failover (In Progress)
v0.5: Web Console Frontend Polish
v1.0: Official Release, Redis Cache Integration

🤝 Contributing

Contributions of any kind are welcome! Please read our Development Constitution before submitting code.
