feat(rate-limit): 用户组 × 模型组限额(Group Rate Limit)#1260
Conversation
将「按模型限额」重构为两维度模型——模型组(一组模型,全局互斥归属)× 限额主体(用户 / 用户组 / 密钥),可为每个 (主体 × 模型组) 设置 5h/每日/每周/每月/总额成本上限。多来源(个人行 + 用户组上限)按取最大值合并, 用户组限额为人均上限;支持临时提额授予(按用户 × 模型组 × 窗口,带有效期, 到点即时生效/失效,叠加在有效上限之上)。 - 完全切分:命中某轴(用户或密钥)模型组限额后,该轴消费既跳过主线全局成本闸门、 也不计入该轴主线全局额(usage_ledger 按轴打标 counted_in_user_global / counted_in_key_global,DB 聚合、Redis 回填、展示分栏三处同源);RPM 与并发护栏 始终生效。Redis 故障时按 MODEL_RATE_LIMIT_FAIL_OPEN fail-open,且 fail-open 不置旁路标记以防双重放行。 - 新增模块:schema 五表 + 两枚举 + usage_ledger/message_request 打标两列、 解析快照缓存(SWR + pub/sub 失效)、桶 lease 计量、guard 接入、 模型组/用户组/限额/提额 Admin REST API、Dashboard 管理界面,5 语言 i18n。 - 通过 ENABLE_MODEL_RATE_LIMIT 开关控制,默认关闭,关闭时与主线逐字节一致。 测试:纯单元测试全绿(含模型组/用户组/限额/提额/bucket-service);typecheck、 build、lint、i18n 审计、迁移幂等校验均通过。tests/api/* 需 Redis/DB,本地未配置。
|
Important Review skippedToo many files! This PR contains 205 files, which is 55 over the limit of 150. To get a review, narrow the scope: ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (205)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efce8162c0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ALTER TABLE "usage_ledger" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint | ||
| ALTER TABLE "usage_ledger" ADD COLUMN "counted_in_key_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint |
There was a problem hiding this comment.
Replace the ledger trigger in this migration
When this migration runs on an existing database, adding these columns is not enough: the installed fn_upsert_usage_ledger remains the previous definition from earlier migrations, whose INSERT/UPDATE lists do not copy NEW.counted_in_*_global. Because these new usage_ledger columns default to true, a model-split request can write false on message_request while the ledger row stays counted globally, so global DB/lease aggregations start charging split spend again after any refresh/restart. Please include the CREATE OR REPLACE FUNCTION update from src/lib/ledger-backfill/trigger.sql in this migration.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
已修复(0ebebe96):在 0105 末尾追加 fn_upsert_usage_ledger 的 CREATE OR REPLACE(镜像 src/lib/ledger-backfill/trigger.sql),使 ledger 行同步写入 counted_in_*_global,避免重启后切分计费被重新计入全局额。
| const row = await repo.createQuotaBoostGrant({ | ||
| userId: input.userId, | ||
| modelGroupId: input.modelGroupId, | ||
| window: input.window, | ||
| amountUsd: input.amountUsd, |
There was a problem hiding this comment.
Invalidate the model-limit cache after boost writes
This write changes data that is loaded into buildModelLimitSnapshot() (listAllActiveAndFutureGrants()), but this action never calls publishModelLimitCacheInvalidation() after creating or deleting a grant. If an admin grants or revokes an immediately-active boost while a proxy pod has a fresh 30s snapshot, enforcement continues using the old boost set until the cache expires and refreshes, so grants are delayed and revoked boosts can still allow traffic briefly. Please refresh/broadcast the model-limit cache on these mutations, as the model-limit upsert/delete actions do.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
已修复(0ebebe96):createQuotaBoostGrantAction / deleteQuotaBoostGrantAction 写后调用 publishModelLimitCacheInvalidation(),提额/撤销即时生效。
There was a problem hiding this comment.
Code Review
This pull request introduces the 'Group Rate Limit' feature, refactoring per-model limits into a two-dimensional model (model groups × limit subjects) with support for temporary quota boosts and complete cost partitioning. It adds five new database tables, updates the proxy pipeline with a new model rate limit guard, implements an in-process snapshot cache with stale-while-revalidate, and provides corresponding Admin REST APIs and Dashboard UI. Feedback on the changes highlights two critical issues: first, removing the postgres service and DSN from docker-compose.yaml will break default containerized deployments; second, the createSingletonModelGroup function silently ignores member conflicts using onConflictDoNothing(), which violates the global exclusivity rule (D6) and should be resolved by pre-checking for existing group memberships.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| environment: | ||
| NODE_ENV: production | ||
| # 容器内使用 Dockerfile 默认端口 3000,对外通过 APP_PORT 暴露(默认 23000) | ||
| DSN: postgresql://${DB_USER:-postgres}:${DB_PASSWORD:-postgres}@postgres:5432/${DB_NAME:-claude_code_hub} |
There was a problem hiding this comment.
There was a problem hiding this comment.
已修复(0ebebe96):恢复 app 的 postgres 依赖与 DSN、AUTO_MIGRATE、ENABLE_RATE_LIMIT 等环境变量;本地测试用配置已分流到 docker-compose.local.yaml。
| export async function createSingletonModelGroup( | ||
| model: string, | ||
| name?: string | ||
| ): Promise<ModelGroupRow> { | ||
| return db.transaction(async (tx) => { | ||
| const groupName = name?.trim() ?? model; | ||
|
|
||
| const [row] = await tx | ||
| .insert(modelGroups) | ||
| .values({ name: groupName, isSingleton: true }) | ||
| .returning(); | ||
|
|
||
| await tx | ||
| .insert(modelGroupMembers) | ||
| .values({ modelGroupId: row.id, model }) | ||
| .onConflictDoNothing(); | ||
|
|
||
| return row; | ||
| }); | ||
| } |
There was a problem hiding this comment.
在 createSingletonModelGroup 中,如果传入的 model 已经属于其他模型组,由于 modelGroupMembers 表在 model 列上有全局唯一索引,插入操作会发生冲突。当前实现使用了 onConflictDoNothing(),这会导致冲突被静默忽略,模型不会被关联到新组,但空的 modelGroups 记录仍然会被创建并成功返回。这违反了模型组的全局互斥规则(D6)。建议在创建组之前先检查该模型是否已存在于其他组中,若存在则抛出 ModelGroupMemberConflictError,以便事务回滚并向用户返回友好的错误提示。
export async function createSingletonModelGroup(
model: string,
name?: string
): Promise<ModelGroupRow> {
return db.transaction(async (tx) => {
const existing = await tx
.select({
modelGroupId: modelGroupMembers.modelGroupId,
groupName: modelGroups.name,
})
.from(modelGroupMembers)
.innerJoin(modelGroups, eq(modelGroupMembers.modelGroupId, modelGroups.id))
.where(eq(modelGroupMembers.model, model))
.limit(1);
if (existing.length > 0) {
const { modelGroupId, groupName } = existing[0];
throw new ModelGroupMemberConflictError(model, modelGroupId, groupName);
}
const groupName = name?.trim() ?? model;
const [row] = await tx
.insert(modelGroups)
.values({ name: groupName, isSingleton: true })
.returning();
await tx
.insert(modelGroupMembers)
.values({ modelGroupId: row.id, model });
return row;
});
}There was a problem hiding this comment.
已修复(0ebebe96):createSingletonModelGroup 在事务内先查模型归属冲突,命中即抛 ModelGroupMemberConflictError 回滚,遵守 D6 全局互斥;并补充了对应单测。
| function leaseWindowSpecs(bucket: ModelLimitBucket): WindowSpec[] { | ||
| const { caps } = bucket; | ||
| return [ | ||
| { |
There was a problem hiding this comment.
Total window cache TTL is 60× longer than lease windows
The TOTAL_COST_CACHE_TTL_SECONDS = 300 (5 minutes) means a user who reaches their total budget can continue to spend for up to 5 minutes before the stale cached value expires and the DB is re-queried. In contrast, the lease-based windows (5h, daily, weekly, monthly) are seeded from DB on a quotaDbRefreshIntervalSeconds interval (default 10 s) with atomic Redis decrements. A user can overshoot their total model-group budget by an entire 5-minute window of spend with no enforcement.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/model-rate-limit/bucket-service.ts
Line: 44
Comment:
**Total window cache TTL is 60× longer than lease windows**
The `TOTAL_COST_CACHE_TTL_SECONDS = 300` (5 minutes) means a user who reaches their total budget can continue to spend for up to 5 minutes before the stale cached value expires and the DB is re-queried. In contrast, the lease-based windows (5h, daily, weekly, monthly) are seeded from DB on a `quotaDbRefreshIntervalSeconds` interval (default 10 s) with atomic Redis decrements. A user can overshoot their total model-group budget by an entire 5-minute window of spend with no enforcement.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
已调整(0ebebe96):TOTAL_COST_CACHE_TTL_SECONDS 由 300s 收紧为 60s,缩小全局额超支窗口。该读穿仅在配置了 total 上限时触发全量聚合,开销可控。
| description: text('description'), | ||
| createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(), | ||
| updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow().notNull(), | ||
| }, (t) => ({ |
There was a problem hiding this comment.
model_group_limits.subject_id is a polymorphic FK with no cascade
The subject_id column references users.id, keys.id, or user_groups.id depending on subject_type, but has no foreign key constraint. Deleting a user or API key leaves their model_group_limits rows intact. Every snapshot rebuild performs listAllModelGroupLimits() (full-table scan), so the snapshot grows indefinitely with phantom limit entries for deleted entities in deployments that cycle users or rotate keys frequently.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/drizzle/schema.ts
Line: 1234
Comment:
**`model_group_limits.subject_id` is a polymorphic FK with no cascade**
The `subject_id` column references `users.id`, `keys.id`, or `user_groups.id` depending on `subject_type`, but has no foreign key constraint. Deleting a user or API key leaves their `model_group_limits` rows intact. Every snapshot rebuild performs `listAllModelGroupLimits()` (full-table scan), so the snapshot grows indefinitely with phantom limit entries for deleted entities in deployments that cycle users or rotate keys frequently.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
暂不修改:多态 subject_id 无法建单一外键;残留行在 resolver 中是惰性的(仅按存活主体匹配),不影响判定、仅占少量内存。建议后续仿 boost-cleanup.ts 加周期清理任务,详见汇总评论。
| if ( | ||
| totalUsage !== null && | ||
| caps.limitTotalUsd !== null && | ||
| caps.limitTotalUsd !== undefined && | ||
| totalUsage >= caps.limitTotalUsd | ||
| ) { |
There was a problem hiding this comment.
`$0 total limit permanently blocks users
If limitTotalUsd is 0 (admin misconfiguration), totalUsage >= 0 is always true since usage is non-negative, permanently hard-blocking the user. A caps.limitTotalUsd > 0 guard prevents this edge case.
| if ( | |
| totalUsage !== null && | |
| caps.limitTotalUsd !== null && | |
| caps.limitTotalUsd !== undefined && | |
| totalUsage >= caps.limitTotalUsd | |
| ) { | |
| if ( | |
| totalUsage !== null && | |
| caps.limitTotalUsd !== null && | |
| caps.limitTotalUsd !== undefined && | |
| caps.limitTotalUsd > 0 && | |
| totalUsage >= caps.limitTotalUsd | |
| ) { |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/model-rate-limit/bucket-service.ts
Line: 141-146
Comment:
**`$0 total limit permanently blocks users**
If `limitTotalUsd` is 0 (admin misconfiguration), `totalUsage >= 0` is always true since usage is non-negative, permanently hard-blocking the user. A `caps.limitTotalUsd > 0` guard prevents this edge case.
```suggestion
if (
totalUsage !== null &&
caps.limitTotalUsd !== null &&
caps.limitTotalUsd !== undefined &&
caps.limitTotalUsd > 0 &&
totalUsage >= caps.limitTotalUsd
) {
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
已修复(0ebebe96):加 caps.limitTotalUsd > 0 守卫,避免 $0 误配置永久封锁用户。
| "updated_at" timestamp with time zone DEFAULT now() NOT NULL | ||
| ); | ||
| --> statement-breakpoint | ||
| ALTER TABLE "message_request" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL;--> statement-breakpoint |
There was a problem hiding this comment.
[HIGH] [ERROR-HANDLING] Migration ALTER TABLE ADD COLUMN / ADD CONSTRAINT statements are not idempotent
The PR description states the migration is "idempotent DDL", but the ALTER TABLE ADD COLUMN (lines 67-75) and ALTER TABLE ADD CONSTRAINT (lines 76-79) statements lack the DO $$ BEGIN ... EXCEPTION WHEN ... END $$; wrapper that the enum creation uses. Re-running this migration (e.g., after a partial failure) will fail with column already exists or constraint already exists errors.
Suggested fix:
DO $$ BEGIN
ALTER TABLE "message_request" ADD COLUMN "counted_in_user_global" boolean DEFAULT true NOT NULL;
EXCEPTION
WHEN duplicate_column THEN null;
END $$;Apply this pattern to all 9 ALTER TABLE statements (lines 67-79).
There was a problem hiding this comment.
已修复(0ebebe96):ADD COLUMN 改为 IF NOT EXISTS,4 个 ADD CONSTRAINT 包 DO $$ ... EXCEPTION WHEN duplicate_object;validate:migrations 通过。
| updates.updatedAt = new Date(); | ||
|
|
||
| const [row] = await db.update(userGroups).set(updates).where(eq(userGroups.id, id)).returning(); | ||
| return row; |
There was a problem hiding this comment.
[HIGH] [LOGIC-BUG] updateUserGroup returns undefined for non-existent row, violating return type
When id does not match any row, db.update(...).returning() returns an empty array. Destructuring const [row] = [] yields undefined, which is returned despite the declared type Promise<UserGroupRow>. This will cause a runtime TypeError in any caller that accesses properties on the result (e.g., updatedGroup.name).
getUserGroup (line 51) correctly handles this by returning null, and model-group.ts throws on missing rows. This function should follow the same pattern.
Suggested fix:
const [row] = await db.update(userGroups).set(updates).where(eq(userGroups.id, id)).returning();
if (\!row) {
throw new Error(`User group id=${id} not found`);
}
return row;There was a problem hiding this comment.
已修复(0ebebe96):updateUserGroup 对不存在的行抛错,符合 Promise<UserGroupRow> 契约。
| // Re-resolve path: log only. The in-flight forwarder run already committed | ||
| // to the new provider; the bucket lease decrement at settle will reflect | ||
| // the overshoot. Mainline global gate stays in effect for safety. | ||
| return; |
There was a problem hiding this comment.
[MEDIUM] [ERROR-SILENT] Re-resolve violation path says "log only" but has no logger call
The comment on line 121 says "log only" but the return on line 124 is reached without any logger.warn(). During provider failover, if a model-group bucket violation is detected, it is silently ignored with zero audit trail. This makes it impossible to diagnose why a request that should have been rate-limited was allowed through on the re-resolve path.
Suggested fix:
if (\!options.throwOnViolation) {
const { bucket, result } = violation;
logger.warn("[ModelRateLimit] Bucket violation on re-resolve path (not enforced, in-flight request continues)", {
axis: bucket.axis,
scopeId: bucket.scopeId,
modelGroupId: bucket.modelGroupId,
window: result.window,
currentUsage: result.currentUsage,
limitValue: result.limitValue,
});
return;
}There was a problem hiding this comment.
已修复(0ebebe96):re-resolve 违规路径补 logger.warn,记录 axis/scopeId/modelGroupId/window/usage/limit。
| modelGroupId?: number; | ||
| }): Promise<ActionResult<ModelGroupLimitRecord[]>> { | ||
| const session = await getSession(); | ||
| if (!isAdmin(session)) return { ok: false, error: "权限不足" }; |
There was a problem hiding this comment.
[MEDIUM] [STANDARD-VIOLATION] Missing errorCode on all error returns makes handler status mapping fragile
All error returns in this action omit the errorCode field (e.g., line 29 returns { ok: false, error: "..." } without errorCode). The handler (model-limits/handlers.ts) falls back to fragile Chinese string matching on detail.includes("权限") and detail.includes("不存在") to determine HTTP status codes. If these messages change during i18n or refactoring, the status mapping silently breaks.
Suggested fix:
if (\!isAdmin(session)) return { ok: false, error: "权限不足", errorCode: "UNAUTHORIZED" };
// ...
return { ok: false, error: "获取按模型限额失败", errorCode: "NOT_FOUND" };There was a problem hiding this comment.
已修复(0ebebe96):各 error 返回补 errorCode(UNAUTHORIZED / OPERATION_FAILED / DELETE_FAILED),handler 状态映射不再依赖中文字符串匹配。
| const data: UserGroupWithCount[] = await Promise.all( | ||
| groups.map(async (g) => ({ | ||
| ...g, | ||
| memberCount: await countUsersInUserGroup(g.tag), |
There was a problem hiding this comment.
[MEDIUM] [PERFORMANCE-ISSUE] N+1 query: countUsersInUserGroup called per group
countUsersInUserGroup(g.tag) is called inside Promise.all for each user group. Each call performs a full users table scan with a JSONB containment check (tags @> [tag]::jsonb). With many user groups, this creates N independent full-table scans. The same pattern appears at line 76 for single-group lookup.
Suggested fix:
Batch the count into a single query:
const groups = await repoListUserGroups();
if (groups.length === 0) return { ok: true, data: [] };
// Single query with GROUP BY instead of N separate counts
const counts = await db
.select({ tag: sql<string>`unnest(${users.tags})`, count: sql<number>`count(*)::int` })
.from(users)
.where(isNull(users.deletedAt))
.groupBy(sql`unnest(${users.tags})`);
const countMap = new Map(counts.map((r) => [r.tag, r.count]));
const data = groups.map((g) => ({ ...g, memberCount: countMap.get(g.tag) ?? 0 }));There was a problem hiding this comment.
已修复(0ebebe96):新增 countUsersByTags 单次扫描批量计数,listUserGroups 不再逐组查询,消除 N+1。
| if ttl > 0 then | ||
| redis.call('SETEX', key, ttl, cjson.encode(lease)) | ||
| end | ||
| return {newRemaining, 1} |
There was a problem hiding this comment.
[LOW] [LOGIC-BUG] Redis Lua return {newRemaining, 1} truncates fractional USD values to integer
Redis Lua returns table values as long long integers. When newRemaining is 0.75 (USD), the caller receives 0. The value stored in Redis JSON via cjson.encode is correct, so enforcement is unaffected. The newRemaining return value is currently fire-and-forget in the response handler, so no immediate breakage, but this will surprise any future caller that relies on the return value for display or logging.
Suggested fix:
Return as string to preserve precision:
return {tostring(newRemaining), "1"}And update the caller:
const result = (await redis.eval(DECREMENT_LUA_SCRIPT, 1, leaseKey, String(cost))) as [string, string];
const newRemaining = Number(result[0]);
const success = result[1] === "1" ? 1 : 0;There was a problem hiding this comment.
已修复(0ebebe96):Lua 返回字符串元组(tostring(newRemaining) / "1"),调用侧 Number() 解析,保留小数美元精度。
|
|
||
| const remainingBudget = calculateLeaseSlice({ | ||
| limitAmount, | ||
| currentUsage, |
There was a problem hiding this comment.
[LOW] [ERROR-HANDLING] decrementLeaseBudget lacks try/catch, inconsistent with sibling methods
getCostLease and refreshCostLeaseFromDb both wrap their logic in try/catch with fail-open semantics. This method is the only one without error handling. While ModelLeaseService.decrementLeaseBudget catches internally, a pre-delegation error (e.g., from buildModelGroupLeaseKey) would propagate uncaught. In the response handler, this would cause the entire Promise.all lease decrement batch to fail, losing all other decrements (key/user/provider) as well.
Suggested fix:
static async decrementLeaseBudget(params: DecrementBucketLeaseParams): Promise<DecrementModelLeaseResult> {
try {
const leaseKey = buildModelGroupLeaseKey(axis, scopeId, modelGroupId, window, resetMode);
return await ModelLeaseService.decrementLeaseBudget({...});
} catch (error) {
logger.error("[BucketLease] decrementLeaseBudget failed, fail-open", { axis, scopeId, modelGroupId, window, error });
return { success: true, newRemaining: -1, failOpen: true };
}
}There was a problem hiding this comment.
已修复(0ebebe96):decrementLeaseBudget 增加 try/catch fail-open,与同类方法一致,避免污染 response handler 的 Promise.all。
There was a problem hiding this comment.
Code Review Summary
This is a substantial XL PR (206 files, 36K+ lines) implementing a two-dimensional Group Rate Limit system (user-group x model-group). The core proxy guard pipeline, bucket service, lease system, and resolver are well-designed with proper fail-open semantics and correct complete-split billing logic. The issues below are concentrated in the data layer (migration, repository) and API handler surfaces.
PR Size: XL
- Lines changed: 36,272 (32,603 additions + 3,669 deletions)
- Files changed: 206
Split suggestion: This PR could be split into at least 3 logical units:
- Schema + Migration + Repository (~20 files): New tables, migration, repository layer
- Core Rate Limit Engine (~15 files):
model-rate-limit/library, guard pipeline, proxy integration - Admin API + Dashboard UI (~170 files): REST handlers, actions, UI components, i18n
Issues Found
| Category | Critical | High | Medium | Low |
|---|---|---|---|---|
| Logic/Bugs | 0 | 2 | 1 | 1 |
| Security | 0 | 0 | 0 | 0 |
| Error Handling | 0 | 0 | 2 | 1 |
| Types | 0 | 0 | 0 | 0 |
| Comments/Docs | 0 | 0 | 0 | 0 |
| Tests | 0 | 0 | 0 | 0 |
| Performance | 0 | 0 | 1 | 0 |
| Simplification | 0 | 0 | 0 | 0 |
High Priority Issues (Should Fix)
-
Migration not idempotent (
drizzle/0105_broad_crystal.sql:67-79) - ALTER TABLE ADD COLUMN/ADD CONSTRAINT statements lackDO $$ BEGIN ... EXCEPTION WHEN ... END $$;wrapper, contradicting the PR's idempotency claim. Partial failure recovery is impossible. -
updateUserGroupreturns undefined for non-existent row (src/repository/user-group.ts:40) - TypeScript declaresPromise<UserGroupRow>but runtime returnsundefinedwhen the ID doesn't match. Inconsistent withgetUserGroupwhich returnsnullandmodel-group.tswhich throws.
Medium Priority Issues (Should Fix)
-
Re-resolve violation path missing logging (
src/app/v1/_lib/proxy/model-rate-limit-guard.ts:124) - Comment says "log only" but nologger.warn()call exists. Provider-failover bucket violations are silently ignored with zero audit trail. -
Missing
errorCodein model-limit action (src/actions/model-limit.ts:29) - All error returns omiterrorCode, causing handler to fall back to fragile Chinese string matching (detail.includes("权限")) for HTTP status mapping. -
N+1 query in
listUserGroups(src/actions/user-group.ts:49) -countUsersInUserGroup(tag)is called per group viaPromise.all, each performing a fulluserstable JSONB containment scan.
Low Priority Issues (Author's Discretion)
-
Redis Lua returns truncated integers for fractional USD (
src/lib/model-rate-limit/lease.ts:60) -return {newRemaining, 1}truncates floats to integers. The stored value is correct; the return value is currently unused but will surprise future callers. -
BucketLeaseService.decrementLeaseBudgetlacks try/catch (src/lib/model-rate-limit/bucket-lease.ts:158) - Inconsistent with sibling methods. An unhandled error could cause the entire lease decrement batch to fail.
Review Coverage
- Logic and correctness
- Security (OWASP Top 10)
- Error handling
- Type safety
- Documentation accuracy
- Test coverage
- Code clarity
Automated review by Claude AI
针对 PR ding113#1260 多轮代码审核(Codex / Gemini / Greptile / CI)修复以下问题。 迁移与部署 - 0105 迁移追加 fn_upsert_usage_ledger 的 CREATE OR REPLACE,使 usage_ledger.counted_in_*_global 与 message_request 同源,避免重启后 切分计费被重新计入全局额 - 0105 的 ADD COLUMN / ADD CONSTRAINT 改为幂等(IF NOT EXISTS / DO 块) - 恢复 docker-compose.yaml 中 app 的 postgres 依赖与 DSN 等环境变量, 修复容器化部署的开箱即用能力 数据与限额逻辑 - createSingletonModelGroup 在事务内检测模型归属冲突并抛错,遵守 D6 全局互斥 - updateUserGroup 对不存在的行抛错,符合返回类型契约 - bucket-service 为 total 上限加 >0 守卫,避免 $0 误配置永久封锁用户 - total 窗口读穿缓存 TTL 由 300s 收紧为 60s,缩小全局额超支窗口 - bucket-lease.decrementLeaseBudget 增加 try/catch fail-open,避免污染批量扣减 - lease Lua 返回字符串元组以保留小数美元精度 缓存与可观测性 - 提额 create / delete 后失效并广播模型限额快照缓存,保证即时生效 - model-rate-limit-guard 的 re-resolve 违规路径补 logger.warn - model-limit action 补 errorCode,handler 状态映射不再依赖中文字符串匹配 - listUserGroups 用 countUsersByTags 批量计数,消除 N+1 查询 同步更新相关单元测试(lease / bucket-service / model-group)。 Refs: ding113#1260
代码审核问题修复说明已在 已修复
暂不修改(设计取舍)
测试备注
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0ebebe96a8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| listenerSetter.call(session, (s: ProxySession) => | ||
| resolveAndApplyForCurrentProvider(s, { throwOnViolation: false, resetOnEmpty: true }) | ||
| ); |
There was a problem hiding this comment.
Enforce model limits when failing over providers
When ProxyForwarder switches to an alternative provider after retry exhaustion, it calls session.changeProvider() before sending the next upstream attempt; this listener then re-resolves with throwOnViolation: false, so a quota violation for the fallback provider's redirected model is only logged and the request is still sent. In configurations where the initial provider is allowed or outside a model group but the fallback provider maps into an exhausted group bucket, failover bypasses the new group limit entirely; the non-hedge failover path needs to propagate/enforce the violation instead of treating it like an already-committed hedge winner.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
已修复(746ec5c6):非 hedge 的 provider 故障切换在发送下一次上游请求前,经 changeProvider(provider, { enforce: true }) 让监听器以 throwOnViolation 重解析;命中回退 provider 的已耗尽组桶时抛 RateLimitError 中止请求(与初始 guard 同路径,经 ProxyErrorHandler 转 402/429,provider session ref 由 proxy-handler 兜底清理)。hedge winner 提交路径不传 enforce,仍仅记录日志(已提交不可中止)。新增对应单测。
| const cached = await redis.get(cacheKey); | ||
| if (cached !== null) { | ||
| const value = Number(cached); | ||
| if (Number.isFinite(value)) return value; |
There was a problem hiding this comment.
Refresh total-usage cache after settled model spend
For buckets with limitTotalUsd, a Redis hit returns the cached all-time usage, but settled requests never update or invalidate this total_cost:model:* key because decrementLease() only handles the lease windows. With Redis ready, once a value below the total cap is cached, every request for up to 60 seconds checks the same stale total even after ledger writes push the bucket over the cap, allowing arbitrary overspend during that TTL. The total cache needs to be incremented/invalidated when model-bucket spend settles, or bypassed for enforcement.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
已修复(746ec5c6):decrementLease 在结算时对已存在的 total_cost:model:* 键做条件 INCRBYFLOAT(Lua EXISTS 守卫,保留 TTL;键不存在则留给下次读穿重聚合,本身已含该行),使缓存窗口内的越界即时可见,消除 TTL 内超支。新增单测覆盖「配置 total 上限则 folding / 未配置则不触碰缓存」。
| await repoAddModelGroupMember(groupId, trimmedModel); | ||
| return { ok: true, data: undefined }; |
There was a problem hiding this comment.
Invalidate model-limit snapshot after member changes
Adding a member changes buildModelLimitSnapshot()'s modelToGroupId and groupMembers, but this action returns without publishing the model-limit cache invalidation used by limit-row writes. If an admin adds a model to an already-limited group, proxy pods with a fresh 30s snapshot continue resolving that model as ungrouped until TTL expiry, so the new per-model quota is not enforced immediately; the same invalidation should run after membership mutations.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
已修复(746ec5c6):addModelGroupMember / removeModelGroupMember / createSingletonModelGroup / deleteModelGroup 成功后调用 publishModelLimitCacheInvalidation(),使快照的 modelToGroupId / groupMembers 即时更新,新限额无需等 TTL。
| export function isModelRateLimitFailOpen(): boolean { | ||
| return !isDisabled(process.env.MODEL_RATE_LIMIT_FAIL_OPEN); | ||
| } |
There was a problem hiding this comment.
MODEL_RATE_LIMIT_FAIL_OPEN env var has no effect
isModelRateLimitFailOpen() is exported here but is never called anywhere in the codebase. The bucket-service and bucket-lease service always fail open unconditionally on Redis errors, making this env var a no-op. An operator who sets MODEL_RATE_LIMIT_FAIL_OPEN=false (documented in the PR's env table as the fail-closed toggle) would see no change in behavior: violations are always allowed through on Redis failure.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/lib/model-rate-limit/types.ts
Line: 26-28
Comment:
**`MODEL_RATE_LIMIT_FAIL_OPEN` env var has no effect**
`isModelRateLimitFailOpen()` is exported here but is never called anywhere in the codebase. The bucket-service and bucket-lease service always fail open unconditionally on Redis errors, making this env var a no-op. An operator who sets `MODEL_RATE_LIMIT_FAIL_OPEN=false` (documented in the PR's env table as the fail-closed toggle) would see no change in behavior: violations are always allowed through on Redis failure.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
已修复(746ec5c6):在 guard 强制路径接入 isModelRateLimitFailOpen()——fail-closed(MODEL_RATE_LIMIT_FAIL_OPEN=false)下,无法评估(Redis/DB 故障,result.failOpen)的桶将拒绝请求;新增 MODEL_RATE_LIMIT_UNAVAILABLE 错误码及 5 语言文案。默认 fail-open 行为不变(保留主线全局闸门兜底)。新增 fail-closed 单测。
针对 PR ding113#1260 复评(Codex / Greptile)新增的 4 项问题修复。 - 故障切换不绕过模型组限额(Codex P1):非 hedge 的 provider 故障切换在 发送下一次上游请求前,经 `changeProvider({ enforce: true })` 让监听器 以 throwOnViolation 重解析;命中回退 provider 的已耗尽组桶时抛 RateLimitError 中止请求(沿用现有 402/429 处理与 ref 清理)。hedge winner 提交路径仍仅记录日志,因其已提交。 - 总额读穿缓存随结算更新(Codex P1):settle 时对存在的 `total_cost:model:*` 键做条件 INCRBYFLOAT(保留 TTL,不存在则交由下次 读穿重聚合),避免缓存窗口内总额越过上限仍被放行的超支。 - MODEL_RATE_LIMIT_FAIL_OPEN 生效(Greptile P1):fail-closed 时, 无法评估(Redis/DB 故障)的桶在强制路径上拒绝请求,新增 MODEL_RATE_LIMIT_UNAVAILABLE 错误码及 5 语言文案;默认 fail-open 行为 不变(保留主线全局闸门兜底)。 - 模型组成员变更失效快照缓存(Codex P2):add/remove member、 createSingleton、deleteModelGroup 后 publishModelLimitCacheInvalidation(), 使 modelToGroupId / groupMembers 变更即时生效。 补充 guard fail-closed / 故障切换 enforce / hedge 仅记日志 / 总额缓存 folding 等单元测试。 Refs: ding113#1260
二轮代码审核修复(commit
|
| # | 来源 | 文件 | 处理 |
|---|---|---|---|
| 1 | Codex P1 | model-rate-limit-guard.ts / session.ts / forwarder.ts |
故障切换强制限额:非 hedge 的 provider 故障切换在发送下一次上游请求前以 changeProvider(p, { enforce: true }) 重解析并对违规抛错中止(hedge winner 提交路径不传 enforce,仍仅记日志)。RateLimitError 经现有 ProxyErrorHandler → 402/429,provider ref 由 proxy-handler 兜底清理 |
| 2 | Codex P1 | bucket-service.ts |
总额缓存随结算更新:settle 时对已存在的 total_cost:model:* 键做条件 INCRBYFLOAT(Lua EXISTS 守卫 + 保留 TTL),消除缓存窗口内越界仍放行的超支 |
| 3 | Greptile P1 | model-rate-limit-guard.ts / error-messages.ts / messages/* |
MODEL_RATE_LIMIT_FAIL_OPEN 生效:fail-closed 时无法评估的桶在强制路径拒绝请求;新增 MODEL_RATE_LIMIT_UNAVAILABLE 错误码 + 5 语言文案。默认 fail-open 不变 |
| 4 | Codex P2 | actions/model-group.ts |
成员变更失效快照缓存:add/remove member、createSingleton、deleteModelGroup 后 publishModelLimitCacheInvalidation(),新成员/限额即时生效 |
新增单测:guard fail-closed、故障切换 enforce:true 抛错、hedge 仅记日志、总额缓存 folding(配置/未配置 total 上限两种)。
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 746ec5c6e1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| await repoDeleteUserGroup(id); | ||
| return { ok: true, data: undefined }; |
There was a problem hiding this comment.
Invalidate model-limit cache after user-group deletion
When deleting a user group that has model-group limits, buildModelLimitSnapshot() has already cached both listUserGroups() and listAllModelGroupLimits() for 30 seconds, but this mutation returns without calling publishModelLimitCacheInvalidation(). During that TTL, proxy pods can keep resolving the deleted group's tag to its old limits, so users with that tag continue to be constrained or have their global quota bypassed even after the admin deletes the group; mirror the model-group/limit mutations by invalidating the model-limit snapshot here.
Useful? React with 👍 / 👎.
Summary (English)
Refactors per-model rate limiting into a User Group x Model Group (Group Rate Limit) two-dimensional model:
Each
(Subject x Model Group)pair supports cost caps across 5h / daily / weekly / monthly / total windows. Multiple sources (personal + user group) merge by taking the maximum value. User group limits are per-capita. Temporary quota boosts are also supported (perUser x Model Group x Window, with expiry).Controlled via
ENABLE_MODEL_RATE_LIMITfeature flag (default off). When disabled, behavior is byte-for-byte identical to mainline (new DB columns have defaults, guard bypasses).Key design decisions:
counted_in_user_global/counted_in_key_globalmarkers.MODEL_RATE_LIMIT_FAIL_OPENwithout setting bypass markers (preventing double-release).usage_ledger/message_request, single idempotent migration0105_broad_crystal.sql.Scope: 206 files, +32,603 / -3,669 lines. Covers schema, proxy guard pipeline, bucket lease service, admin API, dashboard UI, i18n, and documentation.
概述
将「按模型限额」重构为**用户组 × 模型组限额(Group Rate Limit)**两维度模型:
可为每个
(主体 × 模型组)设置 5h / 每日 / 每周 / 每月 / 总额 的成本上限。多来源(个人行 + 用户组上限)按取最大值合并,用户组限额为人均上限;并支持临时提额授予(按用户 × 模型组 × 窗口,带有效期,到点即时生效/失效,叠加在有效上限之上)。主要改动
usage_ledger/message_request按轴打标counted_in_user_global/counted_in_key_global实现,DB 聚合、Redis 回填、展示分栏三处同源。RPM 与并发护栏始终生效。MODEL_RATE_LIMIT_FAIL_OPENfail-open,且 fail-open 不置旁路标记,防止双重放行。model_groups/model_group_members/model_group_limits/user_groups/quota_boost_grants,boost_window/limit_subject),usage_ledger/message_request各加 2 列打标,system_settings加 OPT-B 模型维度 lease 列。单个净增量迁移0105_broad_crystal.sql(幂等 DDL)。Changes
New Files (by area)
Schema & Migration:
drizzle/0105_broad_crystal.sql- Idempotent DDL migration (5 tables, 2 enums, column additions)Proxy Pipeline:
src/app/v1/_lib/proxy/model-rate-limit-guard.ts- New guard for model group rate limitingsrc/app/v1/_lib/proxy/provider-session-cleanup.ts- Provider session cleanup logicAdmin REST API (OpenAPI-synced):
src/app/api/v1/resources/model-groups/- Model group CRUDsrc/app/api/v1/resources/model-limits/- Model limit CRUDsrc/app/api/v1/resources/quota-boosts/- Quota boost CRUDsrc/app/api/v1/resources/user-groups/- User group CRUDsrc/app/api/v1/resources/me/- Self-service quota extensionsServer Actions:
src/actions/model-group.ts,src/actions/model-limit.ts,src/actions/quota-boost.ts,src/actions/user-group.tsDashboard UI:
src/app/[locale]/dashboard/quotas/model-limits/- Model limits management pagessrc/app/[locale]/dashboard/quotas/model-limits/model-groups/- Model group managementsrc/app/[locale]/dashboard/quotas/model-limits/user-groups/- User group managementsrc/app/[locale]/my-usage/_components/model-group-quota-cards.tsx- User-facing quota displayDocumentation:
docs/limit/group-rate-limit.md- Feature documentationdocs/limit/README.md- Limit system overviewdocs/api/v1/model-limits.md- API documentationAPI Client & Schemas:
src/lib/api-client/v1/actions/- Client-side API actions for all new resourcessrc/lib/api/v1/schemas/- Zod schemas for model-groups, model-limits, quota-boostsModified Files (key changes)
src/drizzle/schema.ts(+132) - 5 new tables, 2 new enums, column additionssrc/app/v1/_lib/proxy/guard-pipeline.ts(+71/-6) - Integrated model rate limit guardsrc/app/v1/_lib/proxy/rate-limit-guard.ts(+385/-345) - Refactored for billing isolationsrc/app/v1/_lib/proxy/session.ts(+99) - Extended session contextsrc/app/v1/_lib/proxy/response-handler.ts(+44/-19) - Response handling for model group billingsrc/app/v1/_lib/proxy/forwarder.ts(+7/-2) - Forwarder integrationsrc/instrumentation.ts(+56) - Startup initialization for cache SWRsrc/lib/api/v1/openapi-types.gen.ts(+5567/-1478) - Generated OpenAPI typesmessages/*/quota.json- i18n strings for all 5 languagesBreaking Changes
None. The feature is behind
ENABLE_MODEL_RATE_LIMIT(default off). All new DB columns have safe defaults. When disabled, the guard bypasses and behavior is identical to mainline.Environment Variables
ENABLE_MODEL_RATE_LIMITfalseMODEL_RATE_LIMIT_FAIL_OPEN测试计划
bun run typecheck通过bun run build通过bun run lint(无 error;warning 与基线同量级)bun run i18n:audit-messages-no-emoji/i18n:audit-placeholders通过bun run validate:migrations通过(迁移 0105 幂等)bucket-service、完全切分计费等tests/api/*与集成/E2E 需真实 Redis + Postgres,待在具备数据库的环境补充验证备注 / Checklist
devquotaModelLeasePercent*/quotaModelLeaseMinSliceUsd)当前未配置时回退主线百分比;真实 PG + Redis 的集成/E2E 测试待补。Description enhanced by Claude AI
Greptile Summary
This PR introduces a two-dimensional User Group × Model Group rate limiting system behind
ENABLE_MODEL_RATE_LIMIT(default off). The core design — SWR in-process snapshot, per-axis bypass flags for billing isolation, BucketLeaseService for atomic Redis enforcement, and idempotent DDL migration — is well-architected with thorough test coverage and correct failover/fail-closed semantics.modelRateLimitis inserted beforerateLimitvia aglobalThis-backed extension registry; on a clean pass it setsbypassUser/KeyGlobalCostflags so the mainline cost gate is skipped for that axis and model-group spend is excluded from global aggregation viacounted_in_*_globalmarkers.resolveCountedFlagsfreezes per-axis counted flags at settle time; model bucket lease decrements run unconditionally while mainline decrements are skipped for bypassed axes; the all-null-caps edge case (a limit row with every cost window set to null) causes bypass to be set with no enforcement in place, effectively making that subject unlimited.Confidence Score: 3/5
Safe to merge only after the all-null-caps bypass is addressed; until then a misconfigured or placeholder limit row silently removes all cost enforcement for the matched subject.
The bypass flag in the guard is set unconditionally for every bucket that passes checkCostLimits, including buckets where every cost window is null (no active limit). A model_group_limits row created with only the required foreign-key fields and no dollar limits — which the upsert schema and DB both permit — causes checkCostLimits to return { allowed: true, failOpen: false }, triggering bypass and removing both the model-group and the mainline global cost gate for the matched user or key. This is a present defect on the changed enforcement path, not a theoretical future concern.
src/app/v1/_lib/proxy/model-rate-limit-guard.ts (bypass loop) and src/lib/api/v1/schemas/model-limits.ts (upsert validation)
Important Files Changed
Sequence Diagram
sequenceDiagram participant C as Client participant GP as GuardPipeline participant MG as ModelRateLimitGuard participant BC as BucketCheckService participant RL as RateLimitGuard participant FW as Forwarder participant RH as ResponseHandler C->>GP: POST /v1/chat GP->>GP: auth / model / session / provider steps GP->>MG: execute(session) MG->>BC: resolveModelLimits(userId, keyId, model) BC-->>MG: buckets[] MG->>BC: checkCostLimits(bucket) x N BC-->>MG: "{allowed, failOpen}" alt violation MG-->>GP: throw RateLimitError GP->>GP: releaseProviderSessionRefs else "all passed and failOpen=false" MG->>MG: setBypassUser/KeyGlobalCost(true) end GP->>RL: execute(session) RL->>RL: "skip cost gates where bypass=true" GP->>FW: forward to upstream FW-->>RH: response + cost RH->>RH: resolveCountedFlags(!bypass) RH->>RH: updateMessageRequest(countedFlags) par model bucket decrements RH->>BC: decrementLease(bucket, cost) x N and mainline decrements non-bypassed axes RH->>RL: decrementLeaseBudget(axis, cost) end RH-->>C: proxied responseComments Outside Diff (1)
src/app/v1/_lib/proxy/guard-pipeline.ts, line 270-280 (link)providerstep now runs beforerateLimit, reversing the original cheap-gate orderIn the original
CHAT_PIPELINE,rateLimitran beforeprovider, so a rate-limited request was rejected with zero provider-selection overhead. The reordering means every rate-limited request now incurs a full provider-selection DB query and a Redis ZADD before failing.bugfix #03correctly cleans up the ZADD, but this adds overhead for the common "quota exhausted" rejection path.Prompt To Fix With AI
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix All With AI
Reviews (3): Last reviewed commit: "fix(rate-limit): 修复二轮代码审核问题(故障切换/总额缓存/fa..." | Re-trigger Greptile