Kimi Proxy

⚠️ Experimental: This project is still in development. Use with caution in production.

Makes kimi-k2-thinking usable across multiple LLM providers by normalizing API formats, fixing tool call and thinking format issues, and optionally ensuring the model always uses a tool call for agentic workflows.

The proxy and its transformation pipelines are written generically, so they can be extended to support additional models and providers.
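As a rough mental model only (this sketch is not the repository's actual interfaces), a transformation pipeline can be thought of as an ordered list of response transforms:

type Transform = (response: unknown) => unknown;

// Conceptual sketch: apply each transform to the upstream response in order.
const runPipeline = (steps: Transform[], response: unknown): unknown =>
  steps.reduce((acc, step) => step(acc), response);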

Features

Multi-provider proxy for kimi-k2-thinking and other models

Seamlessly route requests to OpenAI-compatible APIs, OpenRouter, or Vertex AI using a unified client model name.
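For example, an OpenAI-style client can point its base URL at the proxy and use the client model name from model-config.yaml. A hedged TypeScript sketch, assuming an OpenAI-compatible /v1/chat/completions route on the default Quick Start address:

// Minimal sketch: talk to the proxy like any OpenAI-compatible endpoint.
// The exact route is an assumption; adjust it to what your deployment exposes.
const res = await fetch("http://127.0.0.1:8000/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "kimi-k2-thinking", // client model name from model-config.yaml
    messages: [{ role: "user", content: "What's the weather in SF?" }],
  }),
});
const data = await res.json();
console.log(data.choices?.[0]?.message);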

Format fixes for tool calls and thinking blocks

For some providers, kimi-k2-thinking returns tool calls and thinking content in non-standard formats. The proxy normalizes these into the standard OpenAI/Anthropic-style structures that clients expect.

Example: Tool call normalization from content

What the kimi-k2 provider returns (tool calls embedded in content with <|tool_call_begin|> markers):

{
  "content": "Let me search for that.     <|tool_call_begin|>    functions.lookup:42  <|tool_call_argument_begin|>   {\"term\":\"express\"}   <|tool_call_begin|>  "
}

What clients receive (normalized):

{
  "content": "Let me search for that.",
  "tool_calls": [
    {
      "id": "42",
      "type": "function",
      "function": {
        "name": "lookup",
        "arguments": "{\"term\":\"express\"}"
      }
    }
  ],
  "finish_reason": "tool_calls"
}
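Conceptually, this normalization boils down to extracting the marker-delimited spans from content and rewriting them as structured tool calls. A simplified sketch, using the marker names from the example above (the proxy's real parser handles more cases, such as streaming deltas and nested JSON arguments):

type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};

// Naive marker-based extraction; nested JSON braces would need a real parser.
const TOOL_CALL_RE =
  /<\|tool_call_begin\|>\s*functions\.(\w+):(\w+)\s*<\|tool_call_argument_begin\|>\s*(\{.*?\})\s*<\|tool_call_end\|>/gs;

function extractToolCalls(content: string): { content: string; tool_calls: ToolCall[] } {
  const tool_calls: ToolCall[] = [];
  for (const [, name, id, args] of content.matchAll(TOOL_CALL_RE)) {
    tool_calls.push({ id, type: "function", function: { name, arguments: args } });
  }
  return { content: content.replace(TOOL_CALL_RE, "").trim(), tool_calls };
}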

Example: Thinking tags extraction and cleanup

What kimi-k2 returns:

<think>  Let me break down... </think>   The answer is 42.

What clients receive:

{
  "content": "The answer is 42.",
  "thinking": "Let me break down..."
}
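Conceptually, the cleanup splits the raw text at the closing </think> tag. A simplified sketch (the real pipeline also has to deal with streaming deltas and provider-specific tag variants):

// Hedged sketch: separate reasoning from the final answer at the closing tag.
function splitThinking(raw: string): { content: string; thinking?: string } {
  const closeIdx = raw.indexOf("</think>");
  if (closeIdx === -1) return { content: raw.trim() };
  const thinking = raw.slice(0, closeIdx).replace(/^\s*<think>/, "").trim();
  const content = raw.slice(closeIdx + "</think>".length).trim();
  return { content, thinking };
}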
Tool call enforcement (optional) for reliable agentic workflows

Enable with ensure_tool_call: true in model config. The proxy detects missing tool calls and re-prompts the model with a reminder.

When enabled, the proxy also injects a termination tool named done and a system instruction telling the model to call it when finished (optionally with {"final_answer":"..."}), then strips that termination tool call from the final response.

Control the maximum number of re-prompt attempts with ENSURE_TOOL_CALL_MAX_ATTEMPTS (default: 3, max: 5).
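The exact schema of the injected tool is an implementation detail; in OpenAI tool-definition terms the done termination tool would look roughly like the following sketch (field wording is illustrative):

// Hedged sketch of an injected "done" termination tool definition.
const doneTool = {
  type: "function",
  function: {
    name: "done",
    description: "Call this tool when the task is finished.",
    parameters: {
      type: "object",
      properties: {
        final_answer: {
          type: "string",
          description: "Optional final answer to surface to the user.",
        },
      },
    },
  },
};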

Example enforcement flow:

System: You are a helpful assistant with access to tools.
        Always reply with at least one tool call so the client can continue.

User: What's the weather in SF?

Assistant: Let me check that for you.

System: Reminder: The client will not continue unless you reply with a tool call.

Assistant: {
  "tool_calls": [{
    "id": "get_weather:0",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\": \"SF\"}"
    }
  }]
}
Request/response logging with web dashboard

All requests and responses are logged to SQLite and viewable through a built-in web dashboard at the root path.

Load balancing with multiple strategies

Distribute traffic across providers using round-robin, weighted random, random, or first strategies.
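As an illustration, weighted random selection over the configured upstreams can be sketched like this (the proxy's actual selection code may differ; weight corresponds to the optional field in model-config.yaml and defaults to 1 here):

type Upstream = { provider: string; model: string; weight?: number };

// Hedged sketch: pick an upstream with probability proportional to its weight.
// Assumes a non-empty list; a missing weight counts as 1.
function pickWeighted(upstreams: Upstream[]): Upstream {
  const total = upstreams.reduce((sum, u) => sum + (u.weight ?? 1), 0);
  let roll = Math.random() * total;
  for (const u of upstreams) {
    roll -= u.weight ?? 1;
    if (roll <= 0) return u;
  }
  return upstreams[upstreams.length - 1];
}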

  • Extensible architecture for adding new models and providers
  • Provider support: OpenAI-compatible APIs, OpenRouter, Vertex AI
  • Hybrid logging pipeline: SQLite metadata with filesystem blobs, LiveStore-backed dashboard with virtualized/lazy blob loading

Quick Start (Bun)

bun install
cp .env.example .env
cp model-config.example.yaml model-config.yaml
# Edit .env and model-config.yaml with your provider keys and models
bun run dev            # backend
bun --cwd frontend dev # dashboard (Vite)

The API runs on http://127.0.0.1:8000 and serves the dashboard (built assets) at /. The dev dashboard uses VITE_API_URL to point at the backend (defaults to same origin).

Configuration

Dashboard & LiveStore

Control LiveStore sync behavior via environment variables:

  • LIVESTORE_BATCH (default: 50): Batch size for dashboard sync (range: 1-500)
  • LIVESTORE_MAX_RECORDS (default: 500): Maximum number of records kept in LiveStore's in-memory sliding window. Set to 0 to disable the window (not recommended)

Providers

Set environment variables in .env:

  • Generic OpenAI: OPENAI_BASE_URL, OPENAI_API_KEY
  • Anthropic: ANTHROPIC_API_KEY, optional ANTHROPIC_BASE_URL
  • OpenRouter: OPENROUTER_API_KEY, OPENROUTER_PROVIDERS (optional), OPENROUTER_ORDER (optional)
  • Vertex AI: VERTEX_PROJECT_ID, VERTEX_LOCATION, GOOGLE_APPLICATION_CREDENTIALS
    • GOOGLE_APPLICATION_CREDENTIALS can be a path to the JSON key file or the JSON payload itself. Use VERTEX_CHAT_ENDPOINT to point at a private MaaS endpoint if needed.

Models

Edit model-config.yaml to map client model names to upstream providers:

default_strategy: round_robin
models:
  - name: kimi-k2-thinking
    provider: vertex
    model: moonshotai/kimi-k2-thinking-maas
    # Optional: enforce tool call consistency for reliable agentic workflows
    ensure_tool_call: true
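  # Entries sharing the same client name form a load-balanced pool across providers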
  - name: kimi-k2-thinking
    provider: openrouter
    model: moonshot-ai/kimi-k2-thinking
    weight: 2

Dashboard

The web dashboard shows request/response logs and metrics. Access it at the root path when running the proxy. LiveStore metadata sync pulls from /api/livestore/pull in batches (size controlled by LIVESTORE_BATCH) and lazily fetches blobs on expansion. Build the dashboard with bun run build:all to serve static assets from the backend.

Performance Features

  • Reverse-chronological loading: Data loads from newest to oldest, providing immediate access to recent logs
  • Memory-efficient virtualization: Uses TanStack Virtual to render only visible rows
  • Configurable sliding window: Limit browser memory usage by setting LIVESTORE_MAX_RECORDS (see .env.example)
  • Automatic garbage collection: Old records beyond the window limit are automatically purged

The dashboard uses reactive queries with TanStack Table and TanStack Virtual for fast, efficient rendering of large datasets.

Development

bun run dev         # Run backend with hot reload
bun --cwd frontend dev  # Run dashboard
bun test            # Run tests
bun run build:all   # Build server + dashboard

Docker

docker compose up --build -d  # Production stack with web dashboard
docker compose -f docker-compose.dev.yml watch  # Development with hot reload
