ding113 · GOSICK-Angel · Jun 9, 2026
diff --git a/.env.example b/.env.example
@@ -74,6 +74,8 @@ CSRF_SECRET=
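 # - Session tracking: 5-minute context cache to avoid frequently switching providers
 # - Fail Open strategy: degrades automatically when Redis is unavailable, without affecting service availability
 ENABLE_RATE_LIMIT=true                  # Enable rate limiting (default: true)
+ENABLE_MODEL_RATE_LIMIT=false           # Enable per-model limits (default: false; requires ENABLE_RATE_LIMIT=true)
+MODEL_RATE_LIMIT_FAIL_OPEN=true         # Fail open for per-model limits when Redis fails (default: true, same as mainline)
 REDIS_URL=redis://localhost:6379        # Redis connection URL (Docker deployments use redis://redis:6379; rediss:// TLS is supported)
 REDIS_TLS_REJECT_UNAUTHORIZED=true      # Verify the Redis TLS certificate (default: true)
                                         # Set to false to skip certificate verification, e.g. for self-signed or shared certificates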

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,18 @@
 
 ---
 
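+## Unreleased
+
+### Added
+
+- User-group × model-group limits (Group Rate Limit): refactors per-model limits into a two-dimensional model of **model groups** (sets of models; each model belongs to at most one group globally) × **limit subjects** (user / user group / key), so a 5-hour / daily / weekly / monthly / total cost cap can be set for each (subject × model group) pair. When multiple sources apply (the user's own row plus user-group caps), they are merged by **taking the maximum**; a user-group limit is a **per-member cap**. Supports **temporary increase** grants (per user × model group × window, with an expiry) that take effect and expire exactly on schedule and stack on top of the effective cap.
+  - **Full split**: once a request matches a model-group limit on an axis (user or key), that axis's spend both **skips** the mainline global cost gate and is **not counted** toward the axis's mainline global quota. This is implemented with the per-axis flags `counted_in_user_global` / `counted_in_key_global` on `usage_ledger`, and DB aggregation, Redis backfill, and the split display all derive from those same flags. RPM and concurrency guards always apply. On Redis failure the limiter fails open according to `MODEL_RATE_LIMIT_FAIL_OPEN`, and a fail-open does **not** set the bypass flag, so a request cannot skip both the model-group limit and the mainline gate.
+  - New modules: five schema tables, two enums, and two flag columns on `usage_ledger` / `message_request`; a resolution snapshot cache (SWR with pub/sub invalidation); bucket lease metering; guard integration; Admin REST APIs for model groups, user groups, limits, and temporary increases; Dashboard management UI (model groups, user groups, and per-model limits with inline temporary increases); i18n for 5 languages.
+  - Gated by the `ENABLE_MODEL_RATE_LIMIT` flag, off by default; when off, behavior is byte-for-byte identical to mainline. Temporary increases become active on schedule through an exact in-memory time check; adding or deleting a grant reaches live requests within at most one cache TTL.
+  - Known follow-ups: the OPT-B model-level lease percentages (`quotaModelLeasePercent*` / `quotaModelLeaseMinSliceUsd`) currently fall back to the mainline percentages when unset; integration/E2E tests against real PG + Redis still need to be added in an environment with a database.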
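The cap-merging rule in the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the shipped resolver: `Grant`, `CapSource`, and `effectiveCapUsd` are hypothetical names, several active grants are assumed to add up, and a grant is assumed active on the half-open interval `[startsAt, expiresAt)`.

```typescript
// Hypothetical shapes; the changelog does not define these types.
interface Grant {
  extraUsd: number;  // temporary increase for one (user x model group x window)
  startsAt: number;  // epoch ms; assumed inclusive
  expiresAt: number; // epoch ms; assumed exclusive
}

// Caps configured for one user, one model group, and one window (e.g. daily).
interface CapSource {
  personalUsd?: number;   // the user's own limit row, if any
  groupCapsUsd: number[]; // per-member caps from each user group the user belongs to
}

// Returns the effective cap in USD, or undefined when no source configures a cap.
function effectiveCapUsd(src: CapSource, grants: Grant[], now: number): number | undefined {
  const caps = [...src.groupCapsUsd];
  if (src.personalUsd !== undefined) caps.push(src.personalUsd);
  if (caps.length === 0) return undefined;

  // Multiple sources merge by taking the maximum, not the minimum or the sum.
  const base = Math.max(...caps);

  // Grants are checked against the current time on every call, so they take
  // effect and expire exactly on schedule without waiting for a cache refresh.
  const extra = grants
    .filter((g) => now >= g.startsAt && now < g.expiresAt)
    .reduce((sum, g) => sum + g.extraUsd, 0);

  return base + extra;
}
```

Because sources merge by maximum, a user-group cap can raise a member's personal cap but never lower it, and as a per-member cap it applies to each member in full rather than being shared across the group.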
+
+---
+
 ## v0.8.5 (2026-06-08)
 
 ### Added

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -6,12 +6,13 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 - **Source**: https://github.com/ding113/claude-code-hub
 - **PR Target Branch**: `dev` (all pull requests must target the dev branch)
+- **Branching & commit conventions**: see @CONTRIBUTING.md (Conventional Commits, `feature/*` / `fix/*` branches, squash-merge to `dev`)
 
 ## Critical Rules
 
-1. **No Emoji in Code** - Never use emoji characters in any code, comments, or string literals
+1. **No Emoji in Code** - Never use emoji characters in any code, comments, or string literals (verify: `bun run i18n:audit-messages-no-emoji`)
 2. **Test Coverage** - All new features must have unit test coverage of at least 80%
-3. **i18n Required** - All user-facing strings must use i18n (5 languages supported). Never hardcode display text
+3. **i18n Required** - All user-facing strings must use i18n (5 languages supported). Message files live at `messages/<locale>/<section>.json`. Verify placeholders: `bun run i18n:audit-placeholders`
 4. **Pre-commit Checklist** - Before committing, always run:
    ```bash
    bun run build      # Production build
@@ -44,6 +45,9 @@ bun run test:ui           # Interactive test UI
 bun run test:coverage     # Coverage report
 bunx vitest run <file>    # Run single test file
 bunx vitest run -t "test name"  # Run specific test
+bun run test:integration  # Run integration tests (separate config)
+bun run test:e2e          # Run e2e tests (separate config)
+bun run test:v1           # API v1 critical-path coverage check
 
 # Dev environment (via dev/Makefile)
 cd dev && make dev        # Start all services (PG + Redis + app)
@@ -65,6 +69,7 @@ bun run db:generate       # Generate Drizzle migrations from schema changes
 bun run db:migrate        # Apply migrations
 bun run db:push           # Push schema changes (dev only)
 bun run db:studio         # Open Drizzle Studio
+bun run validate:migrations  # Verify generated migration files are consistent
 ```
 
 ## Architecture Overview
@@ -123,6 +128,13 @@ Key components:
 - **Legacy Management API**: `/api/actions/{module}/{action}` - Deprecated Server Action adapter, retained behind `ENABLE_LEGACY_ACTIONS_API`
 - **Docs**: `/api/v1/scalar` (Scalar UI), `/api/v1/docs` (Swagger), `/api/v1/openapi.json`
 - **OpenAPI checks**: `bun run test:v1`, `bun run openapi:check`, `bun run openapi:lint`
+- **OpenAPI codegen**: `bun run openapi:generate` regenerates TypeScript types from the OpenAPI schema
+
+### MCP Servers
+Configured in `.mcp.json`. Prefer these over reimplementing the same functionality:
+- `db` (Bytebase DBHub): introspect Postgres schema/data directly
+- `shadcn`: search/install shadcn/ui components into the project
+- `chrome-devtools`: browser automation for E2E debugging
 
 ## Code Conventions
 

diff --git a/docker-compose.local.yaml b/docker-compose.local.yaml
@@ -0,0 +1,7 @@
+services:
+  app:
+    image: claude-code-hub:local
+    environment:
+      ENABLE_RATE_LIMIT: "true"
+      ENABLE_MODEL_RATE_LIMIT: "true"
+      AUTO_MIGRATE: "true"
diff --git a/docs/api/v1/README.md b/docs/api/v1/README.md
@@ -18,6 +18,10 @@ traffic can converge without reimplementing business rules.
 
 Every response includes `X-API-Version: 1.0.0`.
 
+### Resource guides
+
+- [Per-Model Limits](./model-limits.md): admin endpoints for per-model cost limits.
+
 ## Authentication
 
 The API accepts three credential transports:

diff --git a/docs/api/v1/model-limits.md b/docs/api/v1/model-limits.md
@@ -0,0 +1,114 @@
+# Per-Model Limits API
+
+Admin endpoints for managing per-model cost limits scoped to a user or an API
+key. These complement the mainline user/key quotas by letting you cap spend on a
+single model (or all models via a `*` wildcard) without affecting the shared
+account-level budget.
+
+See the OpenAPI surface for the authoritative schema:
+
+- OpenAPI JSON: `/api/v1/openapi.json`
+- Scalar UI: `/api/v1/scalar` (tag: `Model Limits`)
+
+## Feature flag
+
+Per-model limiting is opt-in and is enforced only when both flags are `true`:
+
+- `ENABLE_MODEL_RATE_LIMIT=true` (default `false`)
+- `ENABLE_RATE_LIMIT=true` (default `true`)
+
+The management endpoints below are always available to admins regardless of
+these flags, so limits can be configured before enforcement is enabled. When
+`ENABLE_MODEL_RATE_LIMIT` is off, configured limits are stored but never
+evaluated, and the request path is unchanged.
+
+## Authentication
+
+All endpoints require `admin` access (session cookie, opaque session bearer
+token, or `ADMIN_TOKEN`; user API keys are rejected unless
+`ENABLE_API_KEY_ADMIN_ACCESS=true` for an admin-owned key). Cookie-authenticated
+mutations must include the CSRF token from `GET /api/v1/auth/csrf`.
+
+Errors use the standard `application/problem+json` envelope. Notable codes:
+
+- `model_limit.not_found` (404): the targeted limit row does not exist.
+- `model_limit.action_failed` (400): the underlying action rejected the input.
+- `auth.forbidden` (403): caller lacks admin access.
+
+## Endpoints
+
+| Method | Path | Description |
+| --- | --- | --- |
+| `GET` | `/api/v1/model-limits/users/{userId}` | List a user's per-model limits |
+| `POST` | `/api/v1/model-limits/users/{userId}` | Create or update a user limit (`model` in body) |
+| `DELETE` | `/api/v1/model-limits/users/{userId}/{model}` | Delete a user limit |
+| `GET` | `/api/v1/model-limits/keys/{keyId}` | List a key's per-model limits |
+| `POST` | `/api/v1/model-limits/keys/{keyId}` | Create or update a key limit (`model` in body) |
+| `DELETE` | `/api/v1/model-limits/keys/{keyId}/{model}` | Delete a key limit |
+
+For `DELETE`, URL-encode the `model` path segment. The wildcard `*` is
+`%2A` (e.g. `/api/v1/model-limits/keys/42/%2A`).
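JavaScript's `encodeURIComponent` leaves `*` unescaped, so it will not produce the `%2A` form on its own. A minimal sketch of a path builder that encodes the segment and then escapes the wildcard explicitly; `modelLimitDeletePath` is a hypothetical helper, not part of the API.

```typescript
type Scope = "users" | "keys";

function modelLimitDeletePath(scope: Scope, scopeId: number, model: string): string {
  // encodeURIComponent leaves "*" (and ! ' ( ) ~) unescaped, so the wildcard
  // is percent-encoded explicitly to produce the documented %2A form.
  const segment = encodeURIComponent(model).replace(/\*/g, "%2A");
  return `/api/v1/model-limits/${scope}/${scopeId}/${segment}`;
}
```

For example, `modelLimitDeletePath("keys", 42, "*")` returns `/api/v1/model-limits/keys/42/%2A`, and a model name containing `/` is encoded as `%2F` so it stays a single path segment.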
+
+### List response
+
+```json
+{
+  "items": [
+    {
+      "scopeType": "user",
+      "scopeId": 7,
+      "model": "claude-opus-4",
+      "rpmLimit": null,
+      "limit5hUsd": 2.5,
+      "limit5hResetMode": "fixed",
+      "dailyLimitUsd": 10,
+      "limitWeeklyUsd": null,
+      "limitMonthlyUsd": 100,
+      "limitTotalUsd": null,
+      "limit5hCostResetAt": null
+    }
+  ]
+}
+```
+
+### Upsert body
+
+```json
+{
+  "model": "claude-opus-4",
+  "limit5hUsd": 2.5,
+  "limit5hResetMode": "fixed",
+  "dailyLimitUsd": 10,
+  "limitWeeklyUsd": null,
+  "limitMonthlyUsd": 100,
+  "limitTotalUsd": null
+}
+```
+
+- `model` is required (1-128 chars). Use `*` for an all-models fallback.
+- Each USD field is optional. Omit a field to leave it unchanged on update;
+  send `null` to clear it (unlimited for that window).
+- `limit5hResetMode` is `fixed` or `rolling` and applies to the 5-hour window.
+- `rpmLimit` is reserved for a future release and is not enforced.
+
+`POST` upserts on `(scope, model)` and returns the resulting row (HTTP
+200). `DELETE` returns HTTP 204 with no body.
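The omit / `null` / value rules and the `(scope, model)` upsert key can be modeled in memory as below. This is a sketch of the documented semantics only, not the server's storage layer: the in-memory map, the assumption that a new row starts with every window unlimited, and the names `upsertLimit` and `LimitFields` are hypothetical, and `limit5hResetMode` is left out for brevity.

```typescript
type Usd = number | null; // null = unlimited for that window

interface LimitFields {
  limit5hUsd: Usd;
  dailyLimitUsd: Usd;
  limitWeeklyUsd: Usd;
  limitMonthlyUsd: Usd;
  limitTotalUsd: Usd;
}

type UpsertBody = { model: string } & Partial<LimitFields>;

const FIELDS: (keyof LimitFields)[] = [
  "limit5hUsd", "dailyLimitUsd", "limitWeeklyUsd", "limitMonthlyUsd", "limitTotalUsd",
];

// Rows keyed by (scope, model); a stand-in for the real table.
const rows = new Map<string, LimitFields>();

function upsertLimit(scopeType: "user" | "key", scopeId: number, body: UpsertBody): LimitFields {
  // Length is measured in UTF-16 code units in this sketch.
  if (body.model.length < 1 || body.model.length > 128) {
    throw new Error("model must be 1-128 characters");
  }
  const id = `${scopeType}:${scopeId}:${body.model}`;
  const current: LimitFields = rows.get(id) ?? {
    limit5hUsd: null, dailyLimitUsd: null, limitWeeklyUsd: null,
    limitMonthlyUsd: null, limitTotalUsd: null,
  };
  const next: LimitFields = { ...current };
  for (const f of FIELDS) {
    const v = body[f];
    // Omitted (undefined) keeps the stored value; null clears it; a number sets it.
    if (v !== undefined) next[f] = v;
  }
  rows.set(id, next);
  return next;
}
```

Under these rules, sending `{ "model": "claude-opus-4", "dailyLimitUsd": null }` after the upsert body example clears only the daily cap and keeps the monthly one.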
+
+## Resolution semantics
+
+When a request is evaluated, the most specific matching limit is chosen via a
+4-level lookup (first match wins; no stacking):
+
+1. key + exact model
+2. key + `*`
+3. user + exact model
+4. user + `*`
+
+If none match, no per-model limit applies and the request continues under the
+mainline user/key quotas only.
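The lookup order can be expressed directly as an ordered list of candidates. A minimal sketch; the `Limit` shape is trimmed to the three fields the lookup needs, and `resolveLimit` is a hypothetical name.

```typescript
interface Limit {
  scopeType: "user" | "key";
  scopeId: number;
  model: string; // exact model name or "*"
}

function resolveLimit(
  limits: Limit[],
  keyId: number,
  userId: number,
  model: string, // the resolved (post-redirect) model name
): Limit | undefined {
  // Candidates in priority order: key exact, key *, user exact, user *.
  const candidates: [Limit["scopeType"], number, string][] = [
    ["key", keyId, model],
    ["key", keyId, "*"],
    ["user", userId, model],
    ["user", userId, "*"],
  ];
  for (const [scopeType, scopeId, m] of candidates) {
    const hit = limits.find(
      (l) => l.scopeType === scopeType && l.scopeId === scopeId && l.model === m,
    );
    // First match wins; lower-priority levels are never consulted or stacked.
    if (hit) return hit;
  }
  return undefined; // only the mainline user/key quotas apply
}
```

One consequence of the ordering: a key-level `*` row takes precedence over a user-level row for the exact model, because level 2 is checked before level 3.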
+
+Usage is metered on the resolved (post-redirect) model name, consistent with the
+`model` column stored in `usage_ledger`. Limits reuse the mainline lease
+mechanism (PostgreSQL as the authoritative source, Redis lease slices, atomic
+decrement). On Redis failure the limiter fails open by default
+(`MODEL_RATE_LIMIT_FAIL_OPEN=true`).
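A minimal sketch of the per-request decision, assuming a synchronous lease store whose `tryConsume` deducts from the current slice atomically and throws when Redis is unreachable. `LeaseStore`, `InMemoryLeaseStore`, `checkModelLimit`, and the key format are hypothetical; a real implementation would perform the deduction in Redis and refill slices from PostgreSQL. The `bypassMainline` field follows the CHANGELOG note that a fail-open does not set the bypass flag.

```typescript
// Hypothetical lease store: a budget slice held in Redis.
interface LeaseStore {
  // Returns true if costUsd fit in the current slice and was deducted.
  // Throws when the backing store is unavailable.
  tryConsume(key: string, costUsd: number): boolean;
}

class InMemoryLeaseStore implements LeaseStore {
  down = false; // set to true to simulate a Redis outage
  private readonly slices: Map<string, number>;

  constructor(slices: Map<string, number>) {
    this.slices = slices;
  }

  tryConsume(key: string, costUsd: number): boolean {
    if (this.down) throw new Error("redis unavailable");
    const remaining = this.slices.get(key) ?? 0;
    if (remaining < costUsd) return false; // slice exhausted; this sketch does not refill
    this.slices.set(key, remaining - costUsd);
    return true;
  }
}

interface Decision {
  allowed: boolean;
  bypassMainline: boolean; // whether the mainline global cost gate may be skipped
}

function checkModelLimit(store: LeaseStore, key: string, costUsd: number, failOpen: boolean): Decision {
  try {
    const ok = store.tryConsume(key, costUsd);
    return { allowed: ok, bypassMainline: ok };
  } catch {
    // Fail open without the bypass flag, so the mainline gate still applies
    // and the request cannot skip both checks.
    return { allowed: failOpen, bypassMainline: false };
  }
}
```

With `failOpen` set to `false`, the same outage rejects the request instead.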