feat(rate-limit): 用户组 × 模型组限额（Group Rate Limit） by GOSICK-Angel · Pull Request #1260 · ding113/claude-code-hub

GOSICK-Angel · 2026-06-09T05:43:31Z

Summary (English)

Refactors per-model rate limiting into a User Group x Model Group (Group Rate Limit) two-dimensional model:

Model Groups: Mutually exclusive groups of models (one model belongs to exactly one group).
Limit Subjects: User / User Group / Key.

Each (Subject x Model Group) pair supports cost caps across 5h / daily / weekly / monthly / total windows. Multiple sources (personal + user group) merge by taking the maximum value. User group limits are per-capita. Temporary quota boosts are also supported (per User x Model Group x Window, with expiry).

Controlled via ENABLE_MODEL_RATE_LIMIT feature flag (default off). When disabled, behavior is byte-for-byte identical to mainline (new DB columns have defaults, guard bypasses).

Key design decisions:

Billing isolation: When a model-group rate limit axis (user or key) is hit, consumption on that axis skips and is excluded from mainline global cost gates, tracked via counted_in_user_global / counted_in_key_global markers.
Resilience: Redis failures trigger fail-open via MODEL_RATE_LIMIT_FAIL_OPEN without setting bypass markers (preventing double-release).
Data layer: 5 new tables + 2 enums, 2 new columns on usage_ledger / message_request, single idempotent migration 0105_broad_crystal.sql.
Management: Admin REST API (OpenAPI-synced) + Dashboard UI for model groups, user groups, limits, and quota boosts. Full 5-language i18n.

Scope: 206 files, +32,603 / -3,669 lines. Covers schema, proxy guard pipeline, bucket lease service, admin API, dashboard UI, i18n, and documentation.

概述

将「按模型限额」重构为**用户组 × 模型组限额（Group Rate Limit）**两维度模型：

模型组：一组模型，全局互斥归属（一个模型只能属于一个组）。
限额主体：用户 / 用户组 / 密钥。

可为每个 (主体 × 模型组) 设置 5h / 每日 / 每周 / 每月 / 总额的成本上限。多来源（个人行 + 用户组上限）按取最大值合并，用户组限额为人均上限；并支持临时提额授予（按 用户 × 模型组 × 窗口，带有效期，到点即时生效/失效，叠加在有效上限之上）。

通过 ENABLE_MODEL_RATE_LIMIT 开关控制，默认关闭；关闭时与主线逐字节一致（新增列均带默认值，guard 直接 bypass）。

主要改动

计费完全切分：命中某轴（用户或密钥）模型组限额后，该轴消费既跳过主线全局成本闸门、也不计入该轴主线全局额——通过 usage_ledger / message_request 按轴打标 counted_in_user_global / counted_in_key_global 实现，DB 聚合、Redis 回填、展示分栏三处同源。RPM 与并发护栏始终生效。
韧性：Redis 故障时按 MODEL_RATE_LIMIT_FAIL_OPEN fail-open，且 fail-open 不置旁路标记，防止双重放行。
数据层：schema 新增 5 表 + 2 枚举（model_groups / model_group_members / model_group_limits / user_groups / quota_boost_grants，boost_window / limit_subject），usage_ledger / message_request 各加 2 列打标，system_settings 加 OPT-B 模型维度 lease 列。单个净增量迁移 0105_broad_crystal.sql（幂等 DDL）。
解析/计量：解析快照缓存（SWR + pub/sub 失效）、桶 lease 检查/扣减服务、guard pipeline 接入。
管理面：模型组 / 用户组 / 限额 / 提额 Admin REST API（OpenAPI 同步）+ Dashboard 管理界面，覆盖 5 语言 i18n。

Changes

New Files (by area)

Schema & Migration:

drizzle/0105_broad_crystal.sql - Idempotent DDL migration (5 tables, 2 enums, column additions)

Proxy Pipeline:

src/app/v1/_lib/proxy/model-rate-limit-guard.ts - New guard for model group rate limiting
src/app/v1/_lib/proxy/provider-session-cleanup.ts - Provider session cleanup logic

Admin REST API (OpenAPI-synced):

src/app/api/v1/resources/model-groups/ - Model group CRUD
src/app/api/v1/resources/model-limits/ - Model limit CRUD
src/app/api/v1/resources/quota-boosts/ - Quota boost CRUD
src/app/api/v1/resources/user-groups/ - User group CRUD
src/app/api/v1/resources/me/ - Self-service quota extensions

Server Actions:

src/actions/model-group.ts, src/actions/model-limit.ts, src/actions/quota-boost.ts, src/actions/user-group.ts

Dashboard UI:

src/app/[locale]/dashboard/quotas/model-limits/ - Model limits management pages
src/app/[locale]/dashboard/quotas/model-limits/model-groups/ - Model group management
src/app/[locale]/dashboard/quotas/model-limits/user-groups/ - User group management
src/app/[locale]/my-usage/_components/model-group-quota-cards.tsx - User-facing quota display

Documentation:

docs/limit/group-rate-limit.md - Feature documentation
docs/limit/README.md - Limit system overview
docs/api/v1/model-limits.md - API documentation

API Client & Schemas:

src/lib/api-client/v1/actions/ - Client-side API actions for all new resources
src/lib/api/v1/schemas/ - Zod schemas for model-groups, model-limits, quota-boosts

Modified Files (key changes)

src/drizzle/schema.ts (+132) - 5 new tables, 2 new enums, column additions
src/app/v1/_lib/proxy/guard-pipeline.ts (+71/-6) - Integrated model rate limit guard
src/app/v1/_lib/proxy/rate-limit-guard.ts (+385/-345) - Refactored for billing isolation
src/app/v1/_lib/proxy/session.ts (+99) - Extended session context
src/app/v1/_lib/proxy/response-handler.ts (+44/-19) - Response handling for model group billing
src/app/v1/_lib/proxy/forwarder.ts (+7/-2) - Forwarder integration
src/instrumentation.ts (+56) - Startup initialization for cache SWR
src/lib/api/v1/openapi-types.gen.ts (+5567/-1478) - Generated OpenAPI types
messages/*/quota.json - i18n strings for all 5 languages

Breaking Changes

None. The feature is behind ENABLE_MODEL_RATE_LIMIT (default off). All new DB columns have safe defaults. When disabled, the guard bypasses and behavior is identical to mainline.

Environment Variables

Variable	Default	Description
`ENABLE_MODEL_RATE_LIMIT`	`false`	Master switch for group rate limiting
`MODEL_RATE_LIMIT_FAIL_OPEN`	-	Redis failure behavior (fail-open when set)

测试计划

bun run typecheck 通过
bun run build 通过
bun run lint（无 error；warning 与基线同量级）
bun run i18n:audit-messages-no-emoji / i18n:audit-placeholders 通过
bun run validate:migrations 通过（迁移 0105 幂等）
纯单元测试全绿：模型组 / 用户组 / 限额 / 提额 REST、bucket-service、完全切分计费等
tests/api/* 与集成/E2E 需真实 Redis + Postgres，待在具备数据库的环境补充验证

备注 / Checklist

目标分支为 dev
Conventional Commits
Docker Build Test（CI 触发）
已知后续项：OPT-B 模型维度 lease 百分比（quotaModelLeasePercent* / quotaModelLeaseMinSliceUsd）当前未配置时回退主线百分比；真实 PG + Redis 的集成/E2E 测试待补。

Description enhanced by Claude AI

Greptile Summary

This PR introduces a two-dimensional User Group × Model Group rate limiting system behind ENABLE_MODEL_RATE_LIMIT (default off). The core design — SWR in-process snapshot, per-axis bypass flags for billing isolation, BucketLeaseService for atomic Redis enforcement, and idempotent DDL migration — is well-architected with thorough test coverage and correct failover/fail-closed semantics.

Proxy guard pipeline: modelRateLimit is inserted before rateLimit via a globalThis-backed extension registry; on a clean pass it sets bypassUser/KeyGlobalCost flags so the mainline cost gate is skipped for that axis and model-group spend is excluded from global aggregation via counted_in_*_global markers.
Billing isolation: resolveCountedFlags freezes per-axis counted flags at settle time; model bucket lease decrements run unconditionally while mainline decrements are skipped for bypassed axes; the all-null-caps edge case (a limit row with every cost window set to null) causes bypass to be set with no enforcement in place, effectively making that subject unlimited.
Schema & API: five new tables with correct indexes and safe DDL defaults; the upsert schema allows rows where all five cost limits are omitted, which can create the all-null bypass condition without a validation error.

Confidence Score: 3/5

Safe to merge only after the all-null-caps bypass is addressed; until then a misconfigured or placeholder limit row silently removes all cost enforcement for the matched subject.

The bypass flag in the guard is set unconditionally for every bucket that passes checkCostLimits, including buckets where every cost window is null (no active limit). A model_group_limits row created with only the required foreign-key fields and no dollar limits — which the upsert schema and DB both permit — causes checkCostLimits to return { allowed: true, failOpen: false }, triggering bypass and removing both the model-group and the mainline global cost gate for the matched user or key. This is a present defect on the changed enforcement path, not a theoretical future concern.

src/app/v1/_lib/proxy/model-rate-limit-guard.ts (bypass loop) and src/lib/api/v1/schemas/model-limits.ts (upsert validation)

Important Files Changed

Filename	Overview
src/app/v1/_lib/proxy/model-rate-limit-guard.ts	New guard step that resolves model-group buckets and sets per-axis bypass flags; contains the all-null-caps bypass bug where a limit row with no configured windows still bypasses the mainline global cost gate.
src/lib/api/v1/schemas/model-limits.ts	UpsertSchema allows all five cost fields to be null/omitted with no validation, enabling creation of all-null rows that trigger the bypass bug; a .refine() requiring at least one positive limit would close this.
src/lib/model-rate-limit/resolver.ts	Pure snapshot-based resolver implementing MAX-merge across user-group sources; 5h reset metadata inherits from first user-group source rather than the source contributing the maximum cap — documented design trade-off but can surprise admins.
src/lib/model-rate-limit/bucket-service.ts	Lease-based cost-window checker and decrementer for model group buckets; correctly guards against $0 misconfiguration; TOTAL_COST_CACHE_TTL tightened to 60s; logic is sound.
src/app/v1/_lib/proxy/guard-pipeline.ts	Adds globalThis-backed extension registry and bugfix #3 provider-session cleanup on guard throw; the provider-before-rateLimit reorder is intentional for post-redirect model lookup.
src/lib/model-rate-limit/cache.ts	SWR snapshot cache with Redis pub/sub invalidation; startup warm and read-your-writes on admin writes are correctly implemented.
src/app/v1/_lib/proxy/rate-limit-guard.ts	All mainline cost-gate checks and lease decrements are now wrapped in bypass-flag guards; billing isolation is correctly wired for all five windows and total.
src/app/v1/_lib/proxy/response-handler.ts	counted_in_*_global flags and model bucket lease decrements correctly integrated at billing time; bypass flags conditionally skip both fixed-5h counter writes and mainline lease decrements.
drizzle/0105_broad_crystal.sql	Idempotent DDL migration adding 5 tables, 2 enums, 4 new columns, and 9 indexes; all new columns have safe defaults and IF NOT EXISTS guards make it safe to re-run.
src/lib/model-rate-limit/backfill.ts	Correctly computes countedInUserGlobal/countedInKeyGlobal as !bypass* and fires model bucket decrements; clean separation from the guard enforcement path.

Sequence Diagram

sequenceDiagram
    participant C as Client
    participant GP as GuardPipeline
    participant MG as ModelRateLimitGuard
    participant BC as BucketCheckService
    participant RL as RateLimitGuard
    participant FW as Forwarder
    participant RH as ResponseHandler

    C->>GP: POST /v1/chat
    GP->>GP: auth / model / session / provider steps
    GP->>MG: execute(session)
    MG->>BC: resolveModelLimits(userId, keyId, model)
    BC-->>MG: buckets[]
    MG->>BC: checkCostLimits(bucket) x N
    BC-->>MG: "{allowed, failOpen}"
    alt violation
        MG-->>GP: throw RateLimitError
        GP->>GP: releaseProviderSessionRefs
    else "all passed and failOpen=false"
        MG->>MG: setBypassUser/KeyGlobalCost(true)
    end
    GP->>RL: execute(session)
    RL->>RL: "skip cost gates where bypass=true"
    GP->>FW: forward to upstream
    FW-->>RH: response + cost
    RH->>RH: resolveCountedFlags(!bypass)
    RH->>RH: updateMessageRequest(countedFlags)
    par model bucket decrements
        RH->>BC: decrementLease(bucket, cost) x N
    and mainline decrements non-bypassed axes
        RH->>RL: decrementLeaseBudget(axis, cost)
    end
    RH-->>C: proxied response

Comments Outside Diff (1)

src/app/v1/_lib/proxy/guard-pipeline.ts, line 270-280 (link)

provider step now runs before rateLimit, reversing the original cheap-gate order

In the original CHAT_PIPELINE, rateLimit ran before provider, so a rate-limited request was rejected with zero provider-selection overhead. The reordering means every rate-limited request now incurs a full provider-selection DB query and a Redis ZADD before failing. bugfix #03 correctly cleans up the ZADD, but this adds overhead for the common "quota exhausted" rejection path.

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/app/v1/_lib/proxy/guard-pipeline.ts
Line: 270-280

Comment:
**`provider` step now runs before `rateLimit`, reversing the original cheap-gate order**

In the original `CHAT_PIPELINE`, `rateLimit` ran before `provider`, so a rate-limited request was rejected with zero provider-selection overhead. The reordering means every rate-limited request now incurs a full provider-selection DB query and a Redis ZADD before failing. `bugfix #03` correctly cleans up the ZADD, but this adds overhead for the common "quota exhausted" rejection path.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
src/app/v1/_lib/proxy/model-rate-limit-guard.ts:183-187
**All-null-caps bucket bypasses mainline global cost gate**

A `model_group_limits` row with every limit column null (a valid, API-accepted state — `UsdLimitSchema` marks all fields optional/nullable) resolves into a bucket whose `checkCostLimits` returns `{ allowed: true, failOpen: false }` because every lease window is skipped and `totalUsage` is never fetched. The loop below then unconditionally sets `bypassUserGlobalCost` / `bypassKeyGlobalCost` to `true`, silently removing the mainline global cost gate with no model-group enforcement in its place. The user becomes truly unlimited: no model-bucket spend is tracked (all `decrementLease` specs are filtered out) and no global gate applies.

The bypass should only be set when the bucket actually has at least one active (non-null, positive) limit window, i.e. add `if (!hasActiveLimit(bucket.caps)) continue;` before the axis-dispatch, where `hasActiveLimit` checks any of the five USD fields is not null/zero.

### Issue 2 of 3
src/lib/model-rate-limit/resolver.ts:96-123
**5h reset metadata taken from first user-group source regardless of which source contributes the effective cap**

`metaSource = indiv ?? ugSources[0]` means `limit5hResetMode` and `limit5hCostResetAt` are always inherited from the individual row (if present) or the first user-group source (by tag iteration order). If a second user-group contributes the higher `limit5hUsd` via `groupMaxForField` but configures a different reset mode (e.g. `"fixed"` vs the first group's `"rolling"`), the merged bucket enforces the maximum dollar cap from Group B but with Group A's `"rolling"` semantics. The admin who configured Group B's `"fixed"` mode cannot predict this outcome, and the effective enforcement diverges from what either group specifies in isolation.

This is an inherent limitation of single-metadata MAX-merge and may be an acceptable trade-off, but it is not currently documented in the user-facing docs or admin UI tooltips.

### Issue 3 of 3
src/lib/api/v1/schemas/model-limits.ts:49-76
**Upsert schema permits all-null limit rows with no validation error**

`UsdLimitSchema` marks every limit field `.optional()` and `.nullable()`. A `POST /api/v1/resources/model-limits` body with only `subjectType`, `subjectId`, and `modelGroupId` (and all five cost fields omitted or `null`) passes Zod validation and creates a row that triggers the all-null-caps bypass bug described on the guard. Adding a `.refine()` that requires at least one non-null, positive cost limit on create/update would prevent such rows from being persisted and would also serve as a clear hint to API consumers.

_{Reviews (3): Last reviewed commit: "fix(rate-limit): 修复二轮代码审核问题（故障切换/总额缓存/fa..." | Re-trigger Greptile}

将「按模型限额」重构为两维度模型——模型组（一组模型，全局互斥归属）× 限额主体（用户 / 用户组 / 密钥），可为每个 (主体 × 模型组) 设置 5h/每日/每周/每月/总额成本上限。多来源（个人行 + 用户组上限）按取最大值合并，用户组限额为人均上限；支持临时提额授予（按用户 × 模型组 × 窗口，带有效期，到点即时生效/失效，叠加在有效上限之上）。 - 完全切分：命中某轴（用户或密钥）模型组限额后，该轴消费既跳过主线全局成本闸门、也不计入该轴主线全局额（usage_ledger 按轴打标 counted_in_user_global / counted_in_key_global，DB 聚合、Redis 回填、展示分栏三处同源）；RPM 与并发护栏始终生效。Redis 故障时按 MODEL_RATE_LIMIT_FAIL_OPEN fail-open，且 fail-open 不置旁路标记以防双重放行。 - 新增模块：schema 五表 + 两枚举 + usage_ledger/message_request 打标两列、解析快照缓存（SWR + pub/sub 失效）、桶 lease 计量、guard 接入、模型组/用户组/限额/提额 Admin REST API、Dashboard 管理界面，5 语言 i18n。 - 通过 ENABLE_MODEL_RATE_LIMIT 开关控制，默认关闭，关闭时与主线逐字节一致。测试：纯单元测试全绿（含模型组/用户组/限额/提额/bucket-service）；typecheck、 build、lint、i18n 审计、迁移幂等校验均通过。tests/api/* 需 Redis/DB，本地未配置。

coderabbitai · 2026-06-09T05:43:39Z

Important

Review skipped

Too many files!

This PR contains 205 files, which is 55 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5bbf7719-ec70-4db4-b1fc-10f9bd267277

📥 Commits

Reviewing files that changed from the base of the PR and between 52a3f3b and 746ec5c.

📒 Files selected for processing (205)

.env.example
CHANGELOG.md
CLAUDE.md
docker-compose.local.yaml
docs/api/v1/README.md
docs/api/v1/model-limits.md
docs/limit/README.md
docs/limit/group-rate-limit.md
drizzle/0105_broad_crystal.sql
drizzle/meta/0105_snapshot.json
drizzle/meta/_journal.json
messages/en/errors.json
messages/en/myUsage.json
messages/en/quota.json
messages/en/settings/providers/strings.json
messages/ja/errors.json
messages/ja/myUsage.json
messages/ja/quota.json
messages/ja/settings/providers/strings.json
messages/ru/errors.json
messages/ru/myUsage.json
messages/ru/quota.json
messages/ru/settings/providers/strings.json
messages/zh-CN/errors.json
messages/zh-CN/myUsage.json
messages/zh-CN/quota.json
messages/zh-CN/settings/providers/strings.json
messages/zh-TW/errors.json
messages/zh-TW/myUsage.json
messages/zh-TW/quota.json
messages/zh-TW/settings/providers/strings.json
scripts/audit-settings-placeholders.allowlist.json
src/actions/key-quota.ts
src/actions/keys.ts
src/actions/model-group.ts
src/actions/model-limit.ts
src/actions/my-usage.ts
src/actions/quota-boost.ts
src/actions/system-config.ts
src/actions/user-group.ts
src/actions/users.ts
src/app/[locale]/dashboard/_components/user/key-quota-usage-dialog.tsx
src/app/[locale]/dashboard/_components/user/user-limit-badge.tsx
src/app/[locale]/dashboard/my-quota/page.tsx
src/app/[locale]/dashboard/quotas/_components/quota-tabs-nav.tsx
src/app/[locale]/dashboard/quotas/layout.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/edit-model-limit-dialog.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/model-combobox.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/model-limits-client.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/model-limits-sub-nav.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/quota-boost-dialog.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/quota-boost-panel.tsx
src/app/[locale]/dashboard/quotas/model-limits/_components/searchable-select.tsx
src/app/[locale]/dashboard/quotas/model-limits/layout.tsx
src/app/[locale]/dashboard/quotas/model-limits/model-groups/_components/create-model-group-dialog.tsx
src/app/[locale]/dashboard/quotas/model-limits/model-groups/_components/manage-members-dialog.tsx
src/app/[locale]/dashboard/quotas/model-limits/model-groups/_components/model-group-client.tsx
src/app/[locale]/dashboard/quotas/model-limits/model-groups/_components/model-members-select.tsx
src/app/[locale]/dashboard/quotas/model-limits/model-groups/page.tsx
src/app/[locale]/dashboard/quotas/model-limits/page.tsx
src/app/[locale]/dashboard/quotas/model-limits/user-groups/_components/user-group-client.tsx
src/app/[locale]/dashboard/quotas/model-limits/user-groups/page.tsx
src/app/[locale]/my-usage/_components/model-group-quota-cards.tsx
src/app/[locale]/my-usage/_components/quota-cards.tsx
src/app/[locale]/my-usage/page.tsx
src/app/[locale]/settings/providers/_components/provider-manager.tsx
src/app/[locale]/settings/providers/_components/provider-model-view.tsx
src/app/api/v1/_root/app.ts
src/app/api/v1/resources/me/handlers.ts
src/app/api/v1/resources/me/router.ts
src/app/api/v1/resources/model-groups/handlers.ts
src/app/api/v1/resources/model-groups/router.ts
src/app/api/v1/resources/model-limits/handlers.ts
src/app/api/v1/resources/model-limits/router.ts
src/app/api/v1/resources/quota-boosts/handlers.ts
src/app/api/v1/resources/quota-boosts/router.ts
src/app/api/v1/resources/user-groups/handlers.ts
src/app/api/v1/resources/user-groups/router.ts
src/app/v1/_lib/proxy-handler.ts
src/app/v1/_lib/proxy/errors.ts
src/app/v1/_lib/proxy/forwarder.ts
src/app/v1/_lib/proxy/guard-pipeline.ts
src/app/v1/_lib/proxy/model-rate-limit-guard.ts
src/app/v1/_lib/proxy/provider-session-cleanup.ts
src/app/v1/_lib/proxy/rate-limit-guard.ts
src/app/v1/_lib/proxy/response-handler.ts
src/app/v1/_lib/proxy/session.ts
src/drizzle/schema.ts
src/instrumentation.ts
src/lib/api-client/v1/actions/model-groups.ts
src/lib/api-client/v1/actions/model-limits.ts
src/lib/api-client/v1/actions/my-usage.ts
src/lib/api-client/v1/actions/quota-boosts.ts
src/lib/api-client/v1/actions/user-groups.ts
src/lib/api-client/v1/openapi-types.gen.ts
src/lib/api/v1/action-migration-matrix.ts
src/lib/api/v1/schemas/model-groups.ts
src/lib/api/v1/schemas/model-limits.ts
src/lib/api/v1/schemas/quota-boosts.ts
src/lib/api/v1/schemas/system-config.ts
src/lib/api/v1/schemas/user-groups.ts
src/lib/api/v1/schemas/users.ts
src/lib/config/system-settings-cache.ts
src/lib/dashboard/user-limit-usage-cache.ts
src/lib/ledger-backfill/trigger.sql
src/lib/model-rate-limit/backfill.ts
src/lib/model-rate-limit/boost-cleanup.ts
src/lib/model-rate-limit/bucket-lease.ts
src/lib/model-rate-limit/bucket-service.ts
src/lib/model-rate-limit/cache.ts
src/lib/model-rate-limit/keys.ts
src/lib/model-rate-limit/lease.ts
src/lib/model-rate-limit/register.ts
src/lib/model-rate-limit/resolver.ts
src/lib/model-rate-limit/types.ts
src/lib/notification/tasks/cost-alert.ts
src/lib/quota/limit-usage-split.ts
src/lib/rate-limit/index.ts
src/lib/rate-limit/lease-service.ts
src/lib/rate-limit/lease.ts
src/lib/rate-limit/service.ts
src/lib/redis/cost-cache-cleanup.ts
src/lib/redis/cost-display-cache-singleflight.ts
src/lib/redis/cost-display-cache.ts
src/lib/redis/lua-scripts.ts
src/lib/utils/error-messages.ts
src/lib/validation/schemas.ts
src/lib/webhook/templates/cost-alert.ts
src/lib/webhook/templates/placeholders.ts
src/lib/webhook/types.ts
src/repository/_shared/transformers.test.ts
src/repository/_shared/transformers.ts
src/repository/key.ts
src/repository/message-write-buffer.ts
src/repository/message.ts
src/repository/model-group-limit.ts
src/repository/model-group.ts
src/repository/provider.ts
src/repository/quota-boost.ts
src/repository/statistics.ts
src/repository/system-config.ts
src/repository/user-group.ts
src/types/system-config.ts
tests/api/v1/model-groups/model-groups.test.ts
tests/api/v1/model-limits/model-limits.test.ts
tests/api/v1/quota-boosts/quota-boosts.test.ts
tests/api/v1/user-groups/user-groups.test.ts
tests/configs/integration.config.ts
tests/e2e/_helpers/grl-harness.ts
tests/e2e/group-rate-limit-scenarios-extra-live.test.ts
tests/e2e/group-rate-limit-scenarios-live.test.ts
tests/e2e/model-rate-limit-lease-live.test.ts
tests/e2e/quota-boosts-live.test.ts
tests/integration/cost-cache-rolling-flow.test.ts
tests/integration/model-group-complete-split.test.ts
tests/integration/usage-ledger.test.ts
tests/integration/user-5h-reset-flow.test.ts
tests/unit/actions/key-quota-concurrent-inherit.test.ts
tests/unit/actions/key-quota-cost-reset.test.ts
tests/unit/actions/my-usage-user-5h-reset-boundary.test.ts
tests/unit/actions/total-usage-semantics.test.ts
tests/unit/actions/user-all-limit-window.test.ts
tests/unit/api/v1/action-migration-matrix.test.ts
tests/unit/app/proxy/guard-pipeline-error-cleanup.test.ts
tests/unit/lib/model-rate-limit/keys.test.ts
tests/unit/lib/model-rate-limit/lease.test.ts
tests/unit/lib/model-rate-limit/register.test.ts
tests/unit/lib/model-rate-limit/types.test.ts
tests/unit/lib/rate-limit/cache-filter-consistency.test.ts
tests/unit/lib/rate-limit/cost-limits.test.ts
tests/unit/lib/rate-limit/failover-bucket-resolve.test.ts
tests/unit/lib/rate-limit/get-current-cost-batch-parallel.test.ts
tests/unit/lib/rate-limit/get-current-cost-batch.test.ts
tests/unit/lib/rate-limit/get-current-cost-rolling.test.ts
tests/unit/lib/rate-limit/lease-overspend.test.ts
tests/unit/lib/rate-limit/rolling-boundary-cache.test.ts
tests/unit/lib/rate-limit/rolling-window-5h.test.ts
tests/unit/lib/rate-limit/rolling-window-cache-warm.test.ts
tests/unit/lib/rate-limit/service-extra.test.ts
tests/unit/lib/rate-limit/user-5h-reset-boundary.test.ts
tests/unit/lib/redis/cleanup-legacy-rolling-zsets.test.ts
tests/unit/lib/redis/cost-display-cache-singleflight.test.ts
tests/unit/lib/redis/cost-display-cache.test.ts
tests/unit/model-rate-limit/backfill.test.ts
tests/unit/model-rate-limit/boost-cleanup.test.ts
tests/unit/model-rate-limit/bucket-lease-service.test.ts
tests/unit/model-rate-limit/bucket-lease.test.ts
tests/unit/model-rate-limit/bucket-service.test.ts
tests/unit/model-rate-limit/cache.test.ts
tests/unit/model-rate-limit/guard.test.ts
tests/unit/model-rate-limit/resolver.test.ts
tests/unit/notification/cost-alert-window.test.ts
tests/unit/proxy/guard-pipeline-extension.test.ts
tests/unit/proxy/guard-pipeline-registry-shared.test.ts
tests/unit/proxy/model-rate-limit-guard.test.ts
tests/unit/proxy/rate-limit-guard.test.ts
tests/unit/quota-boost-repository.test.ts
tests/unit/repository/key-resolve-outcome.test.ts
tests/unit/repository/model-group.test.ts
tests/unit/repository/statistics-cost-by-model.test.ts
tests/unit/repository/sum-provider-cost-batch.test.ts
tests/unit/repository/system-config-update-missing-columns.test.ts
tests/unit/repository/user-group.test.ts
tests/unit/validation/system-settings-quota-lease.test.ts
tests/unit/webhook/templates/templates.test.ts

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efce8162c0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T05:47:20Z

+ALTER TABLE "usage_ledger" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint
+ALTER TABLE "usage_ledger" ADD COLUMN "counted_in_key_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint


Replace the ledger trigger in this migration

When this migration runs on an existing database, adding these columns is not enough: the installed fn_upsert_usage_ledger remains the previous definition from earlier migrations, whose INSERT/UPDATE lists do not copy NEW.counted_in_*_global. Because these new usage_ledger columns default to true, a model-split request can write false on message_request while the ledger row stays counted globally, so global DB/lease aggregations start charging split spend again after any refresh/restart. Please include the CREATE OR REPLACE FUNCTION update from src/lib/ledger-backfill/trigger.sql in this migration.

Useful? React with 👍 / 👎.

已修复（0ebebe96）：在 0105 末尾追加 fn_upsert_usage_ledger 的 CREATE OR REPLACE（镜像 src/lib/ledger-backfill/trigger.sql），使 ledger 行同步写入 counted_in_*_global，避免重启后切分计费被重新计入全局额。

chatgpt-codex-connector · 2026-06-09T05:47:20Z

+    const row = await repo.createQuotaBoostGrant({
+      userId: input.userId,
+      modelGroupId: input.modelGroupId,
+      window: input.window,
+      amountUsd: input.amountUsd,


Invalidate the model-limit cache after boost writes

This write changes data that is loaded into buildModelLimitSnapshot() (listAllActiveAndFutureGrants()), but this action never calls publishModelLimitCacheInvalidation() after creating or deleting a grant. If an admin grants or revokes an immediately-active boost while a proxy pod has a fresh 30s snapshot, enforcement continues using the old boost set until the cache expires and refreshes, so grants are delayed and revoked boosts can still allow traffic briefly. Please refresh/broadcast the model-limit cache on these mutations, as the model-limit upsert/delete actions do.

Useful? React with 👍 / 👎.

已修复（0ebebe96）：createQuotaBoostGrantAction / deleteQuotaBoostGrantAction 写后调用 publishModelLimitCacheInvalidation()，提额/撤销即时生效。

gemini-code-assist

Code Review

This pull request introduces the 'Group Rate Limit' feature, refactoring per-model limits into a two-dimensional model (model groups × limit subjects) with support for temporary quota boosts and complete cost partitioning. It adds five new database tables, updates the proxy pipeline with a new model rate limit guard, implements an in-process snapshot cache with stale-while-revalidate, and provides corresponding Admin REST APIs and Dashboard UI. Feedback on the changes highlights two critical issues: first, removing the postgres service and DSN from docker-compose.yaml will break default containerized deployments; second, the createSingletonModelGroup function silently ignores member conflicts using onConflictDoNothing(), which violates the global exclusivity rule (D6) and should be resolved by pre-checking for existing group memberships.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-09T05:51:21Z

    environment:
-      NODE_ENV: production
-      # 容器内使用 Dockerfile 默认端口 3000，对外通过 APP_PORT 暴露（默认 23000）
-      DSN: postgresql://${DB_USER:-postgres}:${DB_PASSWORD:-postgres}@postgres:5432/${DB_NAME:-claude_code_hub}


在 docker-compose.yaml 中移除 postgres 依赖和 DSN 环境变量会导致容器化部署在默认情况下无法正常工作。当使用 docker-compose 部署时，app 容器需要通过 postgres 服务名（即 Docker 内部网络域名）来连接数据库。如果移除了 DSN，容器将回退到使用 .env 中的 DSN（通常配置为 localhost），这会导致 app 容器尝试连接自身内部的 5432 端口而失败。建议保留 postgres 依赖和 DSN 环境变量，以确保开箱即用的容器化部署能力。

已修复（0ebebe96）：恢复 app 的 postgres 依赖与 DSN、AUTO_MIGRATE、ENABLE_RATE_LIMIT 等环境变量；本地测试用配置已分流到 docker-compose.local.yaml。

gemini-code-assist · 2026-06-09T05:51:21Z

+export async function createSingletonModelGroup(
+  model: string,
+  name?: string
+): Promise<ModelGroupRow> {
+  return db.transaction(async (tx) => {
+    const groupName = name?.trim() ?? model;
+
+    const [row] = await tx
+      .insert(modelGroups)
+      .values({ name: groupName, isSingleton: true })
+      .returning();
+
+    await tx
+      .insert(modelGroupMembers)
+      .values({ modelGroupId: row.id, model })
+      .onConflictDoNothing();
+
+    return row;
+  });
+}


在 createSingletonModelGroup 中，如果传入的 model 已经属于其他模型组，由于 modelGroupMembers 表在 model 列上有全局唯一索引，插入操作会发生冲突。当前实现使用了 onConflictDoNothing()，这会导致冲突被静默忽略，模型不会被关联到新组，但空的 modelGroups 记录仍然会被创建并成功返回。这违反了模型组的全局互斥规则（D6）。建议在创建组之前先检查该模型是否已存在于其他组中，若存在则抛出 ModelGroupMemberConflictError，以便事务回滚并向用户返回友好的错误提示。

export async function createSingletonModelGroup( model: string, name?: string ): Promise<ModelGroupRow> { return db.transaction(async (tx) => { const existing = await tx .select({ modelGroupId: modelGroupMembers.modelGroupId, groupName: modelGroups.name, }) .from(modelGroupMembers) .innerJoin(modelGroups, eq(modelGroupMembers.modelGroupId, modelGroups.id)) .where(eq(modelGroupMembers.model, model)) .limit(1); if (existing.length > 0) { const { modelGroupId, groupName } = existing[0]; throw new ModelGroupMemberConflictError(model, modelGroupId, groupName); } const groupName = name?.trim() ?? model; const [row] = await tx .insert(modelGroups) .values({ name: groupName, isSingleton: true }) .returning(); await tx .insert(modelGroupMembers) .values({ modelGroupId: row.id, model }); return row; }); }

已修复（0ebebe96）：createSingletonModelGroup 在事务内先查模型归属冲突，命中即抛 ModelGroupMemberConflictError 回滚，遵守 D6 全局互斥；并补充了对应单测。

greptile-apps · 2026-06-09T05:55:46Z

+function leaseWindowSpecs(bucket: ModelLimitBucket): WindowSpec[] {
+  const { caps } = bucket;
+  return [
+    {


Total window cache TTL is 60× longer than lease windows

The TOTAL_COST_CACHE_TTL_SECONDS = 300 (5 minutes) means a user who reaches their total budget can continue to spend for up to 5 minutes before the stale cached value expires and the DB is re-queried. In contrast, the lease-based windows (5h, daily, weekly, monthly) are seeded from DB on a quotaDbRefreshIntervalSeconds interval (default 10 s) with atomic Redis decrements. A user can overshoot their total model-group budget by an entire 5-minute window of spend with no enforcement.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/lib/model-rate-limit/bucket-service.ts Line: 44 Comment: **Total window cache TTL is 60× longer than lease windows** The `TOTAL_COST_CACHE_TTL_SECONDS = 300` (5 minutes) means a user who reaches their total budget can continue to spend for up to 5 minutes before the stale cached value expires and the DB is re-queried. In contrast, the lease-based windows (5h, daily, weekly, monthly) are seeded from DB on a `quotaDbRefreshIntervalSeconds` interval (default 10 s) with atomic Redis decrements. A user can overshoot their total model-group budget by an entire 5-minute window of spend with no enforcement. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

已调整（0ebebe96）：TOTAL_COST_CACHE_TTL_SECONDS 由 300s 收紧为 60s，缩小全局额超支窗口。该读穿仅在配置了 total 上限时触发全量聚合，开销可控。

greptile-apps · 2026-06-09T05:55:47Z

+  description: text('description'),
+  createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
+  updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow().notNull(),
+}, (t) => ({


model_group_limits.subject_id is a polymorphic FK with no cascade

The subject_id column references users.id, keys.id, or user_groups.id depending on subject_type, but has no foreign key constraint. Deleting a user or API key leaves their model_group_limits rows intact. Every snapshot rebuild performs listAllModelGroupLimits() (full-table scan), so the snapshot grows indefinitely with phantom limit entries for deleted entities in deployments that cycle users or rotate keys frequently.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/drizzle/schema.ts Line: 1234 Comment: **`model_group_limits.subject_id` is a polymorphic FK with no cascade** The `subject_id` column references `users.id`, `keys.id`, or `user_groups.id` depending on `subject_type`, but has no foreign key constraint. Deleting a user or API key leaves their `model_group_limits` rows intact. Every snapshot rebuild performs `listAllModelGroupLimits()` (full-table scan), so the snapshot grows indefinitely with phantom limit entries for deleted entities in deployments that cycle users or rotate keys frequently. How can I resolve this? If you propose a fix, please make it concise.

暂不修改：多态 subject_id 无法建单一外键；残留行在 resolver 中是惰性的（仅按存活主体匹配），不影响判定、仅占少量内存。建议后续仿 boost-cleanup.ts 加周期清理任务，详见汇总评论。

greptile-apps · 2026-06-09T05:55:49Z

+      if (
+        totalUsage !== null &&
+        caps.limitTotalUsd !== null &&
+        caps.limitTotalUsd !== undefined &&
+        totalUsage >= caps.limitTotalUsd
+      ) {


`$0 total limit permanently blocks users

If limitTotalUsd is 0 (admin misconfiguration), totalUsage >= 0 is always true since usage is non-negative, permanently hard-blocking the user. A caps.limitTotalUsd > 0 guard prevents this edge case.

Suggested change

if (

totalUsage !== null &&

caps.limitTotalUsd !== null &&

caps.limitTotalUsd !== undefined &&

totalUsage >= caps.limitTotalUsd

) {

if (

totalUsage !== null &&

caps.limitTotalUsd !== null &&

caps.limitTotalUsd !== undefined &&

caps.limitTotalUsd > 0 &&

totalUsage >= caps.limitTotalUsd

) {

Prompt To Fix With AI

This is a comment left during a code review. Path: src/lib/model-rate-limit/bucket-service.ts Line: 141-146 Comment: **`$0 total limit permanently blocks users** If `limitTotalUsd` is 0 (admin misconfiguration), `totalUsage >= 0` is always true since usage is non-negative, permanently hard-blocking the user. A `caps.limitTotalUsd > 0` guard prevents this edge case. ```suggestion if ( totalUsage !== null && caps.limitTotalUsd !== null && caps.limitTotalUsd !== undefined && caps.limitTotalUsd > 0 && totalUsage >= caps.limitTotalUsd ) { ``` How can I resolve this? If you propose a fix, please make it concise.

已修复（0ebebe96）：加 caps.limitTotalUsd > 0 守卫，避免 $0 误配置永久封锁用户。

github-actions · 2026-06-09T06:08:58Z

+	"updated_at" timestamp with time zone DEFAULT now() NOT NULL
+);
+--> statement-breakpoint
+ALTER TABLE "message_request" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint


[HIGH] [ERROR-HANDLING] Migration ALTER TABLE ADD COLUMN / ADD CONSTRAINT statements are not idempotent

The PR description states the migration is "idempotent DDL", but the ALTER TABLE ADD COLUMN (lines 67-75) and ALTER TABLE ADD CONSTRAINT (lines 76-79) statements lack the DO $$ BEGIN ... EXCEPTION WHEN ... END $$; wrapper that the enum creation uses. Re-running this migration (e.g., after a partial failure) will fail with column already exists or constraint already exists errors.

Suggested fix:

DO $$ BEGIN ALTER TABLE "message_request" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL; EXCEPTION WHEN duplicate_column THEN null; END $$;

Apply this pattern to all 9 ALTER TABLE statements (lines 67-79).

已修复（0ebebe96）：ADD COLUMN 改为 IF NOT EXISTS，4 个 ADD CONSTRAINT 包 DO $$ ... EXCEPTION WHEN duplicate_object；validate:migrations 通过。

github-actions · 2026-06-09T06:09:17Z

+  updates.updatedAt = new Date();
+
+  const [row] = await db.update(userGroups).set(updates).where(eq(userGroups.id, id)).returning();
+  return row;


[HIGH] [LOGIC-BUG] updateUserGroup returns undefined for non-existent row, violating return type

When id does not match any row, db.update(...).returning() returns an empty array. Destructuring const [row] = [] yields undefined, which is returned despite the declared type Promise<UserGroupRow>. This will cause a runtime TypeError in any caller that accesses properties on the result (e.g., updatedGroup.name).

getUserGroup (line 51) correctly handles this by returning null, and model-group.ts throws on missing rows. This function should follow the same pattern.

Suggested fix:

const [row] = await db.update(userGroups).set(updates).where(eq(userGroups.id, id)).returning(); if (\!row) { throw new Error(`User group id=${id} not found`); } return row;

已修复（0ebebe96）：updateUserGroup 对不存在的行抛错，符合 Promise<UserGroupRow> 契约。

github-actions · 2026-06-09T06:09:24Z

+      // Re-resolve path: log only. The in-flight forwarder run already committed
+      // to the new provider; the bucket lease decrement at settle will reflect
+      // the overshoot. Mainline global gate stays in effect for safety.
+      return;


[MEDIUM] [ERROR-SILENT] Re-resolve violation path says "log only" but has no logger call

The comment on line 121 says "log only" but the return on line 124 is reached without any logger.warn(). During provider failover, if a model-group bucket violation is detected, it is silently ignored with zero audit trail. This makes it impossible to diagnose why a request that should have been rate-limited was allowed through on the re-resolve path.

Suggested fix:

if (\!options.throwOnViolation) { const { bucket, result } = violation; logger.warn("[ModelRateLimit] Bucket violation on re-resolve path (not enforced, in-flight request continues)", { axis: bucket.axis, scopeId: bucket.scopeId, modelGroupId: bucket.modelGroupId, window: result.window, currentUsage: result.currentUsage, limitValue: result.limitValue, }); return; }

已修复（0ebebe96）：re-resolve 违规路径补 logger.warn，记录 axis/scopeId/modelGroupId/window/usage/limit。

github-actions · 2026-06-09T06:09:40Z

+  modelGroupId?: number;
+}): Promise<ActionResult<ModelGroupLimitRecord[]>> {
+  const session = await getSession();
+  if (!isAdmin(session)) return { ok: false, error: "权限不足" };


[MEDIUM] [STANDARD-VIOLATION] Missing errorCode on all error returns makes handler status mapping fragile

All error returns in this action omit the errorCode field (e.g., line 29 returns { ok: false, error: "..." } without errorCode). The handler (model-limits/handlers.ts) falls back to fragile Chinese string matching on detail.includes("权限") and detail.includes("不存在") to determine HTTP status codes. If these messages change during i18n or refactoring, the status mapping silently breaks.

Suggested fix:

if (\!isAdmin(session)) return { ok: false, error: "权限不足", errorCode: "UNAUTHORIZED" }; // ... return { ok: false, error: "获取按模型限额失败", errorCode: "NOT_FOUND" };

已修复（0ebebe96）：各 error 返回补 errorCode（UNAUTHORIZED / OPERATION_FAILED / DELETE_FAILED），handler 状态映射不再依赖中文字符串匹配。

github-actions · 2026-06-09T06:09:51Z

+    const data: UserGroupWithCount[] = await Promise.all(
+      groups.map(async (g) => ({
+        ...g,
+        memberCount: await countUsersInUserGroup(g.tag),


[MEDIUM] [PERFORMANCE-ISSUE] N+1 query: countUsersInUserGroup called per group

countUsersInUserGroup(g.tag) is called inside Promise.all for each user group. Each call performs a full users table scan with a JSONB containment check (tags @> [tag]::jsonb). With many user groups, this creates N independent full-table scans. The same pattern appears at line 76 for single-group lookup.

Suggested fix:
Batch the count into a single query:

const groups = await repoListUserGroups(); if (groups.length === 0) return { ok: true, data: [] }; // Single query with GROUP BY instead of N separate counts const counts = await db .select({ tag: sql<string>`unnest(${users.tags})`, count: sql<number>`count(*)::int` }) .from(users) .where(isNull(users.deletedAt)) .groupBy(sql`unnest(${users.tags})`); const countMap = new Map(counts.map((r) => [r.tag, r.count])); const data = groups.map((g) => ({ ...g, memberCount: countMap.get(g.tag) ?? 0 }));

已修复（0ebebe96）：新增 countUsersByTags 单次扫描批量计数，listUserGroups 不再逐组查询，消除 N+1。

github-actions · 2026-06-09T06:10:27Z

+  if ttl > 0 then
+    redis.call('SETEX', key, ttl, cjson.encode(lease))
+  end
+  return {newRemaining, 1}


[LOW] [LOGIC-BUG] Redis Lua return {newRemaining, 1} truncates fractional USD values to integer

Redis Lua returns table values as long long integers. When newRemaining is 0.75 (USD), the caller receives 0. The value stored in Redis JSON via cjson.encode is correct, so enforcement is unaffected. The newRemaining return value is currently fire-and-forget in the response handler, so no immediate breakage, but this will surprise any future caller that relies on the return value for display or logging.

Suggested fix:
Return as string to preserve precision:

return {tostring(newRemaining), "1"}

And update the caller:

const result = (await redis.eval(DECREMENT_LUA_SCRIPT, 1, leaseKey, String(cost))) as [string, string]; const newRemaining = Number(result[0]); const success = result[1] === "1" ? 1 : 0;

已修复（0ebebe96）：Lua 返回字符串元组（tostring(newRemaining) / "1"），调用侧 Number() 解析，保留小数美元精度。

github-actions · 2026-06-09T06:10:28Z

+
+      const remainingBudget = calculateLeaseSlice({
+        limitAmount,
+        currentUsage,


[LOW] [ERROR-HANDLING] decrementLeaseBudget lacks try/catch, inconsistent with sibling methods

getCostLease and refreshCostLeaseFromDb both wrap their logic in try/catch with fail-open semantics. This method is the only one without error handling. While ModelLeaseService.decrementLeaseBudget catches internally, a pre-delegation error (e.g., from buildModelGroupLeaseKey) would propagate uncaught. In the response handler, this would cause the entire Promise.all lease decrement batch to fail, losing all other decrements (key/user/provider) as well.

Suggested fix:

static async decrementLeaseBudget(params: DecrementBucketLeaseParams): Promise<DecrementModelLeaseResult> { try { const leaseKey = buildModelGroupLeaseKey(axis, scopeId, modelGroupId, window, resetMode); return await ModelLeaseService.decrementLeaseBudget({...}); } catch (error) { logger.error("[BucketLease] decrementLeaseBudget failed, fail-open", { axis, scopeId, modelGroupId, window, error }); return { success: true, newRemaining: -1, failOpen: true }; } }

已修复（0ebebe96）：decrementLeaseBudget 增加 try/catch fail-open，与同类方法一致，避免污染 response handler 的 Promise.all。

github-actions

Code Review Summary

This is a substantial XL PR (206 files, 36K+ lines) implementing a two-dimensional Group Rate Limit system (user-group x model-group). The core proxy guard pipeline, bucket service, lease system, and resolver are well-designed with proper fail-open semantics and correct complete-split billing logic. The issues below are concentrated in the data layer (migration, repository) and API handler surfaces.

PR Size: XL

Lines changed: 36,272 (32,603 additions + 3,669 deletions)
Files changed: 206

Split suggestion: This PR could be split into at least 3 logical units:

Schema + Migration + Repository (~20 files): New tables, migration, repository layer
Core Rate Limit Engine (~15 files): model-rate-limit/ library, guard pipeline, proxy integration
Admin API + Dashboard UI (~170 files): REST handlers, actions, UI components, i18n

Issues Found

Category	High	Medium	Low
Logic/Bugs	2	1	1
Security	0	0	0
Error Handling	0	2	1
Types	0	0	0
Comments/Docs	0	0	0
Tests	0	0	0
Performance	0	1	0
Simplification	0	0	0

High Priority Issues (Should Fix)

Migration not idempotent (drizzle/0105_broad_crystal.sql:67-79) - ALTER TABLE ADD COLUMN/ADD CONSTRAINT statements lack DO $$ BEGIN ... EXCEPTION WHEN ... END $$; wrapper, contradicting the PR's idempotency claim. Partial failure recovery is impossible.
updateUserGroup returns undefined for non-existent row (src/repository/user-group.ts:40) - TypeScript declares Promise<UserGroupRow> but runtime returns undefined when the ID doesn't match. Inconsistent with getUserGroup which returns null and model-group.ts which throws.

Medium Priority Issues (Should Fix)

Re-resolve violation path missing logging (src/app/v1/_lib/proxy/model-rate-limit-guard.ts:124) - Comment says "log only" but no logger.warn() call exists. Provider-failover bucket violations are silently ignored with zero audit trail.
Missing errorCode in model-limit action (src/actions/model-limit.ts:29) - All error returns omit errorCode, causing handler to fall back to fragile Chinese string matching (detail.includes("权限")) for HTTP status mapping.
N+1 query in listUserGroups (src/actions/user-group.ts:49) - countUsersInUserGroup(tag) is called per group via Promise.all, each performing a full users table JSONB containment scan.

Low Priority Issues (Author's Discretion)

Redis Lua returns truncated integers for fractional USD (src/lib/model-rate-limit/lease.ts:60) - return {newRemaining, 1} truncates floats to integers. The stored value is correct; the return value is currently unused but will surprise future callers.
BucketLeaseService.decrementLeaseBudget lacks try/catch (src/lib/model-rate-limit/bucket-lease.ts:158) - Inconsistent with sibling methods. An unhandled error could cause the entire lease decrement batch to fail.

Review Coverage

Automated review by Claude AI

针对 PR ding113#1260 多轮代码审核（Codex / Gemini / Greptile / CI）修复以下问题。迁移与部署 - 0105 迁移追加 fn_upsert_usage_ledger 的 CREATE OR REPLACE，使 usage_ledger.counted_in_*_global 与 message_request 同源，避免重启后切分计费被重新计入全局额 - 0105 的 ADD COLUMN / ADD CONSTRAINT 改为幂等（IF NOT EXISTS / DO 块） - 恢复 docker-compose.yaml 中 app 的 postgres 依赖与 DSN 等环境变量，修复容器化部署的开箱即用能力数据与限额逻辑 - createSingletonModelGroup 在事务内检测模型归属冲突并抛错，遵守 D6 全局互斥 - updateUserGroup 对不存在的行抛错，符合返回类型契约 - bucket-service 为 total 上限加 >0 守卫，避免 $0 误配置永久封锁用户 - total 窗口读穿缓存 TTL 由 300s 收紧为 60s，缩小全局额超支窗口 - bucket-lease.decrementLeaseBudget 增加 try/catch fail-open，避免污染批量扣减 - lease Lua 返回字符串元组以保留小数美元精度缓存与可观测性 - 提额 create / delete 后失效并广播模型限额快照缓存，保证即时生效 - model-rate-limit-guard 的 re-resolve 违规路径补 logger.warn - model-limit action 补 errorCode，handler 状态映射不再依赖中文字符串匹配 - listUserGroups 用 countUsersByTags 批量计数，消除 N+1 查询同步更新相关单元测试（lease / bucket-service / model-group）。 Refs: ding113#1260

GOSICK-Angel · 2026-06-09T06:49:20Z

代码审核问题修复说明

已在 0ebebe96 统一处理 Codex / Gemini / Greptile / CI 的审核意见。typecheck / build / lint（无新增 error）/ validate:migrations 均通过，相关单测全绿。

已修复

#	来源	文件	处理
1	Codex P1	`drizzle/0105_broad_crystal.sql`	追加 `fn_upsert_usage_ledger` 的 `CREATE OR REPLACE`（镜像 `trigger.sql`），让 `usage_ledger.counted_in_*_global` 与 `message_request` 同源；否则重启后切分计费会被重新计入全局额
2	CI HIGH	`drizzle/0105_broad_crystal.sql`	`ADD COLUMN` → `IF NOT EXISTS`，4 个 `ADD CONSTRAINT` 包 `DO/EXCEPTION`；`validate:migrations` 不再标记 0105
3	Gemini HIGH	`docker-compose.yaml`	恢复 app 的 `postgres` 依赖 + `DSN` 及 `AUTO_MIGRATE/ENABLE_RATE_LIMIT/...` 环境变量（本地测试配置已分流到 `docker-compose.local.yaml`），修复容器化开箱即用
4	Gemini HIGH	`src/repository/model-group.ts`	`createSingletonModelGroup` 事务内先查冲突，命中即抛 `ModelGroupMemberConflictError` 回滚，遵守 D6 全局互斥（原 `onConflictDoNothing` 会静默留下空组）
5	CI HIGH	`src/repository/user-group.ts`	`updateUserGroup` 对不存在行抛错，符合 `Promise<UserGroupRow>` 契约
6	Codex P2	`src/actions/quota-boost.ts`	create/delete 后调用 `publishModelLimitCacheInvalidation()`，提额/撤销即时生效
7	Greptile P2	`src/lib/model-rate-limit/bucket-service.ts`	total 上限加 `limitTotalUsd > 0` 守卫，避免 `$0` 误配置永久封锁
8	Greptile P1	`src/lib/model-rate-limit/bucket-service.ts`	total 窗口读穿缓存 TTL `300s → 60s`，缩小全局额超支窗口（仅配置 total 上限时才触发全量聚合，开销可控）
9	CI MEDIUM	`src/app/v1/_lib/proxy/model-rate-limit-guard.ts`	re-resolve 违规「只记日志」路径补 `logger.warn`
10	CI MEDIUM	`src/actions/model-limit.ts`	补 `errorCode`（`UNAUTHORIZED/OPERATION_FAILED/DELETE_FAILED`），handler 状态映射不再依赖中文字符串匹配
11	CI MEDIUM	`src/repository/user-group.ts`、`src/actions/user-group.ts`	新增 `countUsersByTags` 单次扫描批量计数，消除 `listUserGroups` 的 N+1
12	CI LOW	`src/lib/model-rate-limit/bucket-lease.ts`	`decrementLeaseBudget` 增加 try/catch fail-open，避免污染 response handler 的 `Promise.all`
13	CI LOW	`src/lib/model-rate-limit/lease.ts`	Lua 返回字符串元组 + `Number()` 解析，保留小数美元余额精度（同步更新单测）

暂不修改（设计取舍）

provider 排在 rateLimit 之前（Greptile P2）：这是该特性必需的——模型组查找需用 redirect 后的上游模型名（匹配 usage_ledger.model），必须先选定 provider；bugfix #03 已清理失败路径的 ZADD。回退会破坏功能，代码注释已说明。
model_group_limits.subject_id 多态外键无级联（Greptile P2）：多态列无法建单一外键；残留行在 resolver 中惰性（仅按存活主体匹配），不影响判定，仅占少量内存，目标规模（≤10k）可忽略。建议后续仿 boost-cleanup.ts 加周期清理任务，不在本次审核修复范围内改动用户/密钥删除路径。

测试备注

tests/api/* 与 k8s-deploy-shell-helpers 在本地无 Redis/Postgres 环境下的失败属基础设施依赖（鉴权 403 / openssl fallback），与本次改动无关，待具备数据库的环境验证——与 PR 测试计划中已声明的待补项一致。

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0ebebe96a8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T06:52:06Z

+      listenerSetter.call(session, (s: ProxySession) =>
+        resolveAndApplyForCurrentProvider(s, { throwOnViolation: false, resetOnEmpty: true })
+      );


Enforce model limits when failing over providers

When ProxyForwarder switches to an alternative provider after retry exhaustion, it calls session.changeProvider() before sending the next upstream attempt; this listener then re-resolves with throwOnViolation: false, so a quota violation for the fallback provider's redirected model is only logged and the request is still sent. In configurations where the initial provider is allowed or outside a model group but the fallback provider maps into an exhausted group bucket, failover bypasses the new group limit entirely; the non-hedge failover path needs to propagate/enforce the violation instead of treating it like an already-committed hedge winner.

Useful? React with 👍 / 👎.

已修复（746ec5c6）：非 hedge 的 provider 故障切换在发送下一次上游请求前，经 changeProvider(provider, { enforce: true }) 让监听器以 throwOnViolation 重解析；命中回退 provider 的已耗尽组桶时抛 RateLimitError 中止请求（与初始 guard 同路径，经 ProxyErrorHandler 转 402/429，provider session ref 由 proxy-handler 兜底清理）。hedge winner 提交路径不传 enforce，仍仅记录日志（已提交不可中止）。新增对应单测。

chatgpt-codex-connector · 2026-06-09T06:52:06Z

+    const cached = await redis.get(cacheKey);
+    if (cached !== null) {
+      const value = Number(cached);
+      if (Number.isFinite(value)) return value;


Refresh total-usage cache after settled model spend

For buckets with limitTotalUsd, a Redis hit returns the cached all-time usage, but settled requests never update or invalidate this total_cost:model:* key because decrementLease() only handles the lease windows. With Redis ready, once a value below the total cap is cached, every request for up to 60 seconds checks the same stale total even after ledger writes push the bucket over the cap, allowing arbitrary overspend during that TTL. The total cache needs to be incremented/invalidated when model-bucket spend settles, or bypassed for enforcement.

Useful? React with 👍 / 👎.

已修复（746ec5c6）：decrementLease 在结算时对已存在的 total_cost:model:* 键做条件 INCRBYFLOAT（Lua EXISTS 守卫，保留 TTL；键不存在则留给下次读穿重聚合，本身已含该行），使缓存窗口内的越界即时可见，消除 TTL 内超支。新增单测覆盖「配置 total 上限则 folding / 未配置则不触碰缓存」。

chatgpt-codex-connector · 2026-06-09T06:52:06Z

+    await repoAddModelGroupMember(groupId, trimmedModel);
+    return { ok: true, data: undefined };


Invalidate model-limit snapshot after member changes

Adding a member changes buildModelLimitSnapshot()'s modelToGroupId and groupMembers, but this action returns without publishing the model-limit cache invalidation used by limit-row writes. If an admin adds a model to an already-limited group, proxy pods with a fresh 30s snapshot continue resolving that model as ungrouped until TTL expiry, so the new per-model quota is not enforced immediately; the same invalidation should run after membership mutations.

Useful? React with 👍 / 👎.

已修复（746ec5c6）：addModelGroupMember / removeModelGroupMember / createSingletonModelGroup / deleteModelGroup 成功后调用 publishModelLimitCacheInvalidation()，使快照的 modelToGroupId / groupMembers 即时更新，新限额无需等 TTL。

greptile-apps · 2026-06-09T06:57:26Z

+export function isModelRateLimitFailOpen(): boolean {
+  return !isDisabled(process.env.MODEL_RATE_LIMIT_FAIL_OPEN);
+}


MODEL_RATE_LIMIT_FAIL_OPEN env var has no effect

isModelRateLimitFailOpen() is exported here but is never called anywhere in the codebase. The bucket-service and bucket-lease service always fail open unconditionally on Redis errors, making this env var a no-op. An operator who sets MODEL_RATE_LIMIT_FAIL_OPEN=false (documented in the PR's env table as the fail-closed toggle) would see no change in behavior: violations are always allowed through on Redis failure.

Prompt To Fix With AI

This is a comment left during a code review. Path: src/lib/model-rate-limit/types.ts Line: 26-28 Comment: **`MODEL_RATE_LIMIT_FAIL_OPEN` env var has no effect** `isModelRateLimitFailOpen()` is exported here but is never called anywhere in the codebase. The bucket-service and bucket-lease service always fail open unconditionally on Redis errors, making this env var a no-op. An operator who sets `MODEL_RATE_LIMIT_FAIL_OPEN=false` (documented in the PR's env table as the fail-closed toggle) would see no change in behavior: violations are always allowed through on Redis failure. How can I resolve this? If you propose a fix, please make it concise.

已修复（746ec5c6）：在 guard 强制路径接入 isModelRateLimitFailOpen()——fail-closed（MODEL_RATE_LIMIT_FAIL_OPEN=false）下，无法评估（Redis/DB 故障，result.failOpen）的桶将拒绝请求；新增 MODEL_RATE_LIMIT_UNAVAILABLE 错误码及 5 语言文案。默认 fail-open 行为不变（保留主线全局闸门兜底）。新增 fail-closed 单测。

针对 PR ding113#1260 复评（Codex / Greptile）新增的 4 项问题修复。 - 故障切换不绕过模型组限额（Codex P1）：非 hedge 的 provider 故障切换在发送下一次上游请求前，经 `changeProvider({ enforce: true })` 让监听器以 throwOnViolation 重解析；命中回退 provider 的已耗尽组桶时抛 RateLimitError 中止请求（沿用现有 402/429 处理与 ref 清理）。hedge winner 提交路径仍仅记录日志，因其已提交。 - 总额读穿缓存随结算更新（Codex P1）：settle 时对存在的 `total_cost:model:*` 键做条件 INCRBYFLOAT（保留 TTL，不存在则交由下次读穿重聚合），避免缓存窗口内总额越过上限仍被放行的超支。 - MODEL_RATE_LIMIT_FAIL_OPEN 生效（Greptile P1）：fail-closed 时，无法评估（Redis/DB 故障）的桶在强制路径上拒绝请求，新增 MODEL_RATE_LIMIT_UNAVAILABLE 错误码及 5 语言文案；默认 fail-open 行为不变（保留主线全局闸门兜底）。 - 模型组成员变更失效快照缓存（Codex P2）：add/remove member、 createSingleton、deleteModelGroup 后 publishModelLimitCacheInvalidation()，使 modelToGroupId / groupMembers 变更即时生效。补充 guard fail-closed / 故障切换 enforce / hedge 仅记日志 / 总额缓存 folding 等单元测试。 Refs: ding113#1260

GOSICK-Angel · 2026-06-09T07:14:40Z

二轮代码审核修复（commit `746ec5c6`）

复评（Codex / Greptile）针对 0ebebe96 新增的 4 项问题已全部处理。typecheck / build / lint（无新增 error）/ i18n:audit-placeholders / i18n:audit-messages-no-emoji 均通过；model-rate-limit 单测 124 全绿。

#	来源	文件	处理
1	Codex P1	`model-rate-limit-guard.ts` / `session.ts` / `forwarder.ts`	故障切换强制限额：非 hedge 的 provider 故障切换在发送下一次上游请求前以 `changeProvider(p, { enforce: true })` 重解析并对违规抛错中止（hedge winner 提交路径不传 enforce，仍仅记日志）。RateLimitError 经现有 `ProxyErrorHandler` → 402/429，provider ref 由 proxy-handler 兜底清理
2	Codex P1	`bucket-service.ts`	总额缓存随结算更新：settle 时对已存在的 `total_cost:model:*` 键做条件 `INCRBYFLOAT`（Lua `EXISTS` 守卫 + 保留 TTL），消除缓存窗口内越界仍放行的超支
3	Greptile P1	`model-rate-limit-guard.ts` / `error-messages.ts` / `messages/*`	`MODEL_RATE_LIMIT_FAIL_OPEN` 生效：fail-closed 时无法评估的桶在强制路径拒绝请求；新增 `MODEL_RATE_LIMIT_UNAVAILABLE` 错误码 + 5 语言文案。默认 fail-open 不变
4	Codex P2	`actions/model-group.ts`	成员变更失效快照缓存：add/remove member、createSingleton、deleteModelGroup 后 `publishModelLimitCacheInvalidation()`，新成员/限额即时生效

新增单测：guard fail-closed、故障切换 enforce:true 抛错、hedge 仅记日志、总额缓存 folding（配置/未配置 total 上限两种）。

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 746ec5c6e1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-09T07:20:07Z

+    await repoDeleteUserGroup(id);
+    return { ok: true, data: undefined };


Invalidate model-limit cache after user-group deletion

When deleting a user group that has model-group limits, buildModelLimitSnapshot() has already cached both listUserGroups() and listAllModelGroupLimits() for 30 seconds, but this mutation returns without calling publishModelLimitCacheInvalidation(). During that TTL, proxy pods can keep resolving the deleted group's tag to its old limits, so users with that tag continue to be constrained or have their global quota bypassed even after the admin deletes the group; mirror the model-group/limit mutations by invalidating the model-limit snapshot here.

Useful? React with 👍 / 👎.

github-project-automation Bot added this to Claude Code Hub Roadmap Jun 9, 2026

github-project-automation Bot moved this to Backlog in Claude Code Hub Roadmap Jun 9, 2026

github-actions Bot added size/XL Extra Large PR (> 1000 lines) enhancement New feature or request area:core area:UI area:i18n labels Jun 9, 2026

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

gemini-code-assist Bot reviewed Jun 9, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

github-actions Bot reviewed Jun 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 9, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 9, 2026

View reviewed changes

		ALTER TABLE "usage_ledger" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint
		ALTER TABLE "usage_ledger" ADD COLUMN "counted_in_key_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint

		await repoAddModelGroupMember(groupId, trimmedModel);
		return { ok: true, data: undefined };

		await repoDeleteUserGroup(id);
		return { ok: true, data: undefined };

Uh oh!

Conversation

GOSICK-Angel commented Jun 9, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary (English)

概述

主要改动

Changes

New Files (by area)

Modified Files (key changes)

Breaking Changes

Environment Variables

测试计划

备注 / Checklist

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Sequence Diagram

Comments Outside Diff (1)

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

GOSICK-Angel commented Jun 9, 2026 •

edited by greptile-apps Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading