2 changes: 2 additions & 0 deletions .env.example
@@ -74,6 +74,8 @@ CSRF_SECRET=
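# - Session tracking: 5-minute context cache optimization (avoids frequent provider switching)
# - Fail-open policy: degrades automatically when Redis is unavailable, without affecting service availability
ENABLE_RATE_LIMIT=true # Enable rate limiting (default: true)
ENABLE_MODEL_RATE_LIMIT=false # Enable per-model limits (default: false; requires ENABLE_RATE_LIMIT=true)
MODEL_RATE_LIMIT_FAIL_OPEN=true # Whether per-model limits fail open on Redis failure (default: true, consistent with mainline)
REDIS_URL=redis://localhost:6379 # Redis connection URL (use redis://redis:6379 for Docker deployments; rediss:// TLS is supported)
REDIS_TLS_REJECT_UNAUTHORIZED=true # Verify the Redis TLS certificate (default: true)
# Set to false to skip certificate verification, for self-signed or shared certificates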
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,18 @@

---

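## Unreleased

### Added

- User group × model group limits (Group Rate Limit): refactors "per-model limits" into a two-dimensional model: **model groups** (a set of models; each model belongs to exactly one group globally) × **limit subjects** (user / user group / key). Each (subject × model group) pair can carry 5-hour / daily / weekly / monthly / total cost caps. Multiple sources (a personal row plus user-group caps) are merged by **taking the maximum**, and a user-group limit is a **per-member cap**. Supports **temporary boost** grants (per user × model group × window, with a validity period, taking effect and expiring exactly on time, stacked on top of the effective cap).
- **Full separation**: once a model-group limit applies on an axis (user or key), spend on that axis both **bypasses** the mainline global cost gate and is **not counted** toward that axis's mainline global quota (implemented by tagging `usage_ledger` per axis with `counted_in_user_global` / `counted_in_key_global`; DB aggregation, Redis backfill, and the split display columns all derive from the same source). RPM and concurrency guards always apply. On Redis failure the limiter fails open according to `MODEL_RATE_LIMIT_FAIL_OPEN`, and fail-open does **not** set the bypass flag, to prevent double admission.
- New modules: five schema tables + two enums + two tagging columns on `usage_ledger`/`message_request`, a resolution snapshot cache (SWR + pub/sub invalidation), bucket lease metering, guard integration, Admin REST APIs for model groups / user groups / limits / boosts, a Dashboard management UI (model groups, user groups, per-model limits with inline boosts), and i18n in 5 languages.
- Controlled by the `ENABLE_MODEL_RATE_LIMIT` switch, off by default; when off, behavior is byte-for-byte identical to mainline. Boost activation and expiry are decided exactly in memory; adding or removing a grant takes effect for live requests after at most one cache TTL.
- Known follow-ups: the OPT-B model-dimension lease percentages (`quotaModelLeasePercent*` / `quotaModelLeaseMinSliceUsd`) currently fall back to the mainline percentages when unconfigured; integration/E2E tests against real PG+Redis remain to be added in an environment with a database.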

---

## v0.8.5 (2026-06-08)

### Added
16 changes: 14 additions & 2 deletions CLAUDE.md
@@ -6,12 +6,13 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

- **Source**: https://github.com/ding113/claude-code-hub
- **PR Target Branch**: `dev` (all pull requests must target the dev branch)
- **Branching & commit conventions**: see @CONTRIBUTING.md (Conventional Commits, `feature/*` / `fix/*` branches, squash-merge to `dev`)

## Critical Rules

1. **No Emoji in Code** - Never use emoji characters in any code, comments, or string literals
1. **No Emoji in Code** - Never use emoji characters in any code, comments, or string literals (verify: `bun run i18n:audit-messages-no-emoji`)
2. **Test Coverage** - All new features must have unit test coverage of at least 80%
3. **i18n Required** - All user-facing strings must use i18n (5 languages supported). Never hardcode display text
3. **i18n Required** - All user-facing strings must use i18n (5 languages supported). Message files live at `messages/<locale>/<section>.json`. Verify placeholders: `bun run i18n:audit-placeholders`
4. **Pre-commit Checklist** - Before committing, always run:
```bash
bun run build # Production build
@@ -44,6 +45,9 @@ bun run test:ui # Interactive test UI
bun run test:coverage # Coverage report
bunx vitest run <file> # Run single test file
bunx vitest run -t "test name" # Run specific test
bun run test:integration # Run integration tests (separate config)
bun run test:e2e # Run e2e tests (separate config)
bun run test:v1 # API v1 critical-path coverage check

# Dev environment (via dev/Makefile)
cd dev && make dev # Start all services (PG + Redis + app)
@@ -65,6 +69,7 @@ bun run db:generate # Generate Drizzle migrations from schema changes
bun run db:migrate # Apply migrations
bun run db:push # Push schema changes (dev only)
bun run db:studio # Open Drizzle Studio
bun run validate:migrations # Verify generated migration files are consistent
```

## Architecture Overview
@@ -123,6 +128,13 @@ Key components:
- **Legacy Management API**: `/api/actions/{module}/{action}` - Deprecated Server Action adapter, retained behind `ENABLE_LEGACY_ACTIONS_API`
- **Docs**: `/api/v1/scalar` (Scalar UI), `/api/v1/docs` (Swagger), `/api/v1/openapi.json`
- **OpenAPI checks**: `bun run test:v1`, `bun run openapi:check`, `bun run openapi:lint`
- **OpenAPI codegen**: `bun run openapi:generate` regenerates TypeScript types from the OpenAPI schema

### MCP Servers
Configured in `.mcp.json` — prefer these over reinventing:
- `db` (Bytebase DBHub): introspect Postgres schema/data directly
- `shadcn`: search/install shadcn/ui components into the project
- `chrome-devtools`: browser automation for E2E debugging

## Code Conventions

7 changes: 7 additions & 0 deletions docker-compose.local.yaml
@@ -0,0 +1,7 @@
services:
  app:
    image: claude-code-hub:local
    environment:
      ENABLE_RATE_LIMIT: "true"
      ENABLE_MODEL_RATE_LIMIT: "true"
      AUTO_MIGRATE: "true"
4 changes: 4 additions & 0 deletions docs/api/v1/README.md
@@ -18,6 +18,10 @@ traffic can converge without reimplementing business rules.

Every response includes `X-API-Version: 1.0.0`.

### Resource guides

- [Per-Model Limits](./model-limits.md): admin endpoints for per-model cost limits.

## Authentication

The API accepts three credential transports:
114 changes: 114 additions & 0 deletions docs/api/v1/model-limits.md
@@ -0,0 +1,114 @@
# Per-Model Limits API

Admin endpoints for managing per-model cost limits scoped to a user or an API
key. These complement the mainline user/key quotas by letting you cap spend on a
single model (or all models via a `*` wildcard) without affecting the shared
account-level budget.

See the OpenAPI surface for the authoritative schema:

- OpenAPI JSON: `/api/v1/openapi.json`
- Scalar UI: `/api/v1/scalar` (tag: `Model Limits`)

## Feature flag

Per-model limiting is opt-in and is enforced only when both flags are set:

- `ENABLE_MODEL_RATE_LIMIT=true` (default `false`)
- `ENABLE_RATE_LIMIT=true` (default `true`)

The management endpoints below are always available to admins regardless of
these flags, so limits can be configured before enforcement is enabled. When
either flag is off, configured limits are stored but never evaluated, and the
request path is unchanged.
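
As an illustration of the two-flag rule, here is a minimal sketch of how enforcement could be decided from the environment. The helper names `readFlag` and `isModelLimitEnforced` are hypothetical, and treating only the literal string `true` as enabled is an assumption; the project's actual flag parsing may differ.

```typescript
// Hypothetical helpers, not taken from the codebase.
type Env = Record<string, string | undefined>;

// Assumption: a flag is on only when its value is the string "true"
// (case-insensitive); unset or empty values fall back to the documented default.
function readFlag(env: Env, name: string, defaultValue: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return defaultValue;
  return raw.trim().toLowerCase() === "true";
}

// Per-model limits are evaluated only when both flags are on.
function isModelLimitEnforced(env: Env): boolean {
  const rateLimit = readFlag(env, "ENABLE_RATE_LIMIT", true); // default true
  const modelRateLimit = readFlag(env, "ENABLE_MODEL_RATE_LIMIT", false); // default false
  return rateLimit && modelRateLimit;
}
```

With an empty environment this evaluates to `false`, because `ENABLE_MODEL_RATE_LIMIT` defaults to off.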

## Authentication

All endpoints require `admin` access (session cookie, opaque session bearer
token, or `ADMIN_TOKEN`; user API keys are rejected unless
`ENABLE_API_KEY_ADMIN_ACCESS=true` for an admin-owned key). Cookie-authenticated
mutations must include the CSRF token from `GET /api/v1/auth/csrf`.

Errors use the standard `application/problem+json` envelope. Notable codes:

- `model_limit.not_found` (404): the targeted limit row does not exist.
- `model_limit.action_failed` (400): the underlying action rejected the input.
- `auth.forbidden` (403): caller lacks admin access.

## Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/api/v1/model-limits/users/{userId}` | List a user's per-model limits |
| `POST` | `/api/v1/model-limits/users/{userId}` | Create or update a user limit (`model` in body) |
| `DELETE` | `/api/v1/model-limits/users/{userId}/{model}` | Delete a user limit |
| `GET` | `/api/v1/model-limits/keys/{keyId}` | List a key's per-model limits |
| `POST` | `/api/v1/model-limits/keys/{keyId}` | Create or update a key limit (`model` in body) |
| `DELETE` | `/api/v1/model-limits/keys/{keyId}/{model}` | Delete a key limit |

For `DELETE`, URL-encode the `model` path segment. The wildcard `*` is
`%2A` (e.g. `/api/v1/model-limits/keys/42/%2A`).
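
One detail worth noting for JavaScript/TypeScript clients: the standard `encodeURIComponent` leaves `*` unescaped, so it does not produce `%2A` on its own. A minimal sketch of a path builder that escapes it explicitly (the function names are hypothetical):

```typescript
// Hypothetical helpers for building DELETE paths.
function encodeModelSegment(model: string): string {
  // encodeURIComponent does not escape "*", so it is replaced explicitly.
  return encodeURIComponent(model).replace(/\*/g, "%2A");
}

function keyLimitDeletePath(keyId: number, model: string): string {
  return `/api/v1/model-limits/keys/${keyId}/${encodeModelSegment(model)}`;
}

function userLimitDeletePath(userId: number, model: string): string {
  return `/api/v1/model-limits/users/${userId}/${encodeModelSegment(model)}`;
}
```

For example, `keyLimitDeletePath(42, "*")` yields `/api/v1/model-limits/keys/42/%2A`.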

### List response

```json
{
  "items": [
    {
      "scopeType": "user",
      "scopeId": 7,
      "model": "claude-opus-4",
      "rpmLimit": null,
      "limit5hUsd": 2.5,
      "limit5hResetMode": "fixed",
      "dailyLimitUsd": 10,
      "limitWeeklyUsd": null,
      "limitMonthlyUsd": 100,
      "limitTotalUsd": null,
      "limit5hCostResetAt": null
    }
  ]
}
```

### Upsert body

```json
{
  "model": "claude-opus-4",
  "limit5hUsd": 2.5,
  "limit5hResetMode": "fixed",
  "dailyLimitUsd": 10,
  "limitWeeklyUsd": null,
  "limitMonthlyUsd": 100,
  "limitTotalUsd": null
}
```

- `model` is required (1-128 chars). Use `*` for an all-models fallback.
- Each USD field is optional. Omit a field to leave it unchanged on update;
send `null` to clear it (unlimited for that window).
- `limit5hResetMode` is `fixed` or `rolling` and applies to the 5-hour window.
- `rpmLimit` is reserved for a future release and is not enforced.

The endpoint upserts on `(scope, model)` and returns the resulting row (HTTP
200). `DELETE` returns HTTP 204 with no body.
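
The omit-versus-`null` rule for USD fields can be modelled locally as follows. This is a sketch of the documented semantics only, not the server's implementation; the `UsdFields` type and `applyPatch` function are hypothetical names.

```typescript
// The five USD windows from the upsert body; null means "unlimited".
type UsdFields = {
  limit5hUsd: number | null;
  dailyLimitUsd: number | null;
  limitWeeklyUsd: number | null;
  limitMonthlyUsd: number | null;
  limitTotalUsd: number | null;
};

// An update may carry any subset of the fields.
type UpsertPatch = Partial<UsdFields>;

// Omitted (undefined) fields keep their current value; null clears the limit.
function applyPatch(current: UsdFields, patch: UpsertPatch): UsdFields {
  const next: UsdFields = { ...current };
  for (const key of Object.keys(patch) as (keyof UsdFields)[]) {
    const value = patch[key];
    if (value !== undefined) next[key] = value;
  }
  return next;
}
```

When building a request body with `JSON.stringify`, properties whose value is `undefined` are dropped from the output while `null` is preserved, which maps naturally onto "omit to keep, `null` to clear".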

## Resolution semantics

When a request is evaluated, the most specific matching limit is chosen via a
4-level lookup (first match wins; no stacking):

1. key + exact model
2. key + `*`
3. user + exact model
4. user + `*`

If none match, no per-model limit applies and the request continues under the
mainline user/key quotas only.
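
The lookup order above can be sketched as a first-match search over candidate `(scope, model)` pairs. This is an in-memory illustration under assumed types; `LimitRow` and `resolveLimit` are hypothetical names, and the real resolver presumably reads from the database or a cache instead of an array.

```typescript
type Scope = "key" | "user";

// Trimmed-down row shape for illustration; the real rows carry more fields.
interface LimitRow {
  scopeType: Scope;
  scopeId: number;
  model: string;
  dailyLimitUsd: number | null;
}

// Returns the most specific matching row, or undefined when no per-model limit applies.
function resolveLimit(
  rows: LimitRow[],
  keyId: number,
  userId: number,
  model: string,
): LimitRow | undefined {
  // Candidates in priority order: first match wins, no stacking.
  const candidates: Array<[Scope, number, string]> = [
    ["key", keyId, model],
    ["key", keyId, "*"],
    ["user", userId, model],
    ["user", userId, "*"],
  ];
  for (const [scopeType, scopeId, candidateModel] of candidates) {
    const hit = rows.find(
      (r) => r.scopeType === scopeType && r.scopeId === scopeId && r.model === candidateModel,
    );
    if (hit) return hit;
  }
  return undefined;
}
```

Note that a key-level `*` row takes precedence over a user-level exact-model row, since the key scope is checked in full before the user scope.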

Usage is metered on the resolved (post-redirect) model name, consistent with the
`model` column stored in `usage_ledger`. Limits reuse the mainline lease
mechanism (PostgreSQL as the authoritative source, Redis lease slices, atomic
decrement). On Redis failure the limiter fails open by default
(`MODEL_RATE_LIMIT_FAIL_OPEN=true`).