fix(proxy): exclude upstream 400/422 from breaker#1228
Conversation
📝 WalkthroughWalkthrough新增上游错误分类逻辑,将HTTP 400和422状态码的ProxyError重新分类为不可重试的客户端错误,阻止重试循环和熔断器触发。包含相应的单元测试验证此行为。 Changes上游非重试错误处理
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a check to classify upstream 400 and 422 errors as non-retryable client errors, preventing them from triggering retries or counting against the provider's circuit breaker in the standard forwarding path. A corresponding unit test was also added. The reviewer noted that this mapping is missing in the streaming hedged path (sendStreamingWithHedge -> handleAttemptFailure), which would lead to inconsistent behavior where 400/422 errors still trigger failovers and affect the circuit breaker. It is recommended to apply this logic to the streaming path as well.
| if ( | ||
| errorCategory === ErrorCategory.PROVIDER_ERROR && | ||
| isNonRetryableUpstreamRequestError(lastError) | ||
| ) { | ||
| errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR; | ||
| } |
There was a problem hiding this comment.
While this correctly maps 400/422 upstream errors to NON_RETRYABLE_CLIENT_ERROR in the standard forwarding path, this mapping is completely missing in the streaming hedged path (sendStreamingWithHedge -> handleAttemptFailure).
As a result, if a streaming hedged request fails with an upstream 400 or 422 error:
- It will still be treated as
PROVIDER_ERROR. - It will count against the provider's circuit breaker via
recordFailure. - It will trigger
launchAlternative()and attempt to failover/retry with other providers.
To ensure consistent behavior across both paths, please apply this mapping in handleAttemptFailure (around line 3910) as well, and add a corresponding test case in proxy-forwarder-retry-limit.test.ts to cover the hedged path.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 1691-1696: The hedge/streaming failure path doesn't apply the same
400/422 reclassification as ProxyForwarder.send(); update
sendStreamingWithHedge()—specifically inside its handleAttemptFailure logic—to
check when errorCategory === ErrorCategory.PROVIDER_ERROR &&
isNonRetryableUpstreamRequestError(lastError) and then set errorCategory =
ErrorCategory.NON_RETRYABLE_CLIENT_ERROR so streaming attempts with upstream
400/422 are treated non-retryable and do not count toward circuit-breaker
metrics (same fix as in ProxyForwarder.send()).
In `@tests/unit/proxy/proxy-forwarder-retry-limit.test.ts`:
- Around line 903-929: 在 tests/unit/proxy/proxy-forwarder-retry-limit.test.ts
中补一条与现有 "upstream 400 should not retry or count against provider circuit
breaker" 对应的用例来覆盖 422 场景:复制当前测试逻辑但将 doForward.mockImplementationOnce 抛出的
ProxyError 状态码改为 422,并对 ProxyForwarder.send(session) 断言会 reject 且 statusCode 为
422,同时断言 ProxyForwarder.doForward 仅被调用一次并且 mocks.recordFailure 未被调用(保留对
createSession、createProvider、provider.maxRetryAttempts 等相同的设置和断言)。
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8668fe85-caa3-4bc3-a97a-a36122fe1d23
📒 Files selected for processing (2)
src/app/v1/_lib/proxy/forwarder.tstests/unit/proxy/proxy-forwarder-retry-limit.test.ts
| if ( | ||
| errorCategory === ErrorCategory.PROVIDER_ERROR && | ||
| isNonRetryableUpstreamRequestError(lastError) | ||
| ) { | ||
| errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR; | ||
| } |
There was a problem hiding this comment.
400/422 仅在非 hedge 路径重分类,流式请求仍可能误重试并计入熔断。
这段改动只覆盖 ProxyForwarder.send() 主路径;sendStreamingWithHedge() 的 handleAttemptFailure 没有复用同一判定,流式场景下 400/422 仍会按 PROVIDER_ERROR 处理,和本次“上游 400/422 不重试且不计熔断”的目标不一致。建议在 hedge 失败分类处同步加入同样的重分类分支。
建议补丁
diff --git a/src/app/v1/_lib/proxy/forwarder.ts b/src/app/v1/_lib/proxy/forwarder.ts
@@
if (reactiveRectifierResult.matched) {
@@
}
+
+ if (
+ errorCategory === ErrorCategory.PROVIDER_ERROR &&
+ isNonRetryableUpstreamRequestError(error)
+ ) {
+ errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR;
+ lastErrorCategory = errorCategory;
+ }
if (errorCategory === ErrorCategory.NON_RETRYABLE_CLIENT_ERROR) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/app/v1/_lib/proxy/forwarder.ts` around lines 1691 - 1696, The
hedge/streaming failure path doesn't apply the same 400/422 reclassification as
ProxyForwarder.send(); update sendStreamingWithHedge()—specifically inside its
handleAttemptFailure logic—to check when errorCategory ===
ErrorCategory.PROVIDER_ERROR && isNonRetryableUpstreamRequestError(lastError)
and then set errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR so
streaming attempts with upstream 400/422 are treated non-retryable and do not
count toward circuit-breaker metrics (same fix as in ProxyForwarder.send()).
| test("upstream 400 should not retry or count against provider circuit breaker", async () => { | ||
| const session = createSession(); | ||
| const provider = createProvider({ | ||
| providerVendorId: null, | ||
| maxRetryAttempts: 4, | ||
| }); | ||
| session.setProvider(provider); | ||
|
|
||
| const doForward = vi.spyOn( | ||
| ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown }, | ||
| "doForward" | ||
| ); | ||
| doForward.mockImplementationOnce(async () => { | ||
| throw new ProxyError("Provider returned 400: Bad Request", 400, { | ||
| body: '{"detail":"Bad Request"}', | ||
| providerId: provider.id, | ||
| providerName: provider.name, | ||
| }); | ||
| }); | ||
|
|
||
| await expect(ProxyForwarder.send(session)).rejects.toMatchObject({ | ||
| statusCode: 400, | ||
| }); | ||
|
|
||
| expect(doForward).toHaveBeenCalledTimes(1); | ||
| expect(mocks.recordFailure).not.toHaveBeenCalled(); | ||
| }); |
There was a problem hiding this comment.
建议补一条 422 用例,避免目标行为只被 400 半覆盖。
当前新增断言只覆盖 400;本次行为定义同时包含 422,建议并列增加 ProxyError(..., 422) 场景,断言同样“只调用 1 次 doForward 且不调用 recordFailure”。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/proxy/proxy-forwarder-retry-limit.test.ts` around lines 903 - 929,
在 tests/unit/proxy/proxy-forwarder-retry-limit.test.ts 中补一条与现有 "upstream 400
should not retry or count against provider circuit breaker" 对应的用例来覆盖 422
场景:复制当前测试逻辑但将 doForward.mockImplementationOnce 抛出的 ProxyError 状态码改为 422,并对
ProxyForwarder.send(session) 断言会 reject 且 statusCode 为 422,同时断言
ProxyForwarder.doForward 仅被调用一次并且 mocks.recordFailure 未被调用(保留对
createSession、createProvider、provider.maxRetryAttempts 等相同的设置和断言)。
Summary
Treat upstream 400/422
ProxyErrorresponses as non-retryable client errors, preventing unnecessary retries and unfair circuit breaker penalization of healthy providers.Problem
When an upstream provider returns HTTP 400 (Bad Request) or 422 (Unprocessable Entity), the proxy previously classified these as
PROVIDER_ERROR. This caused two issues:This also contributes to the retry storm behavior described in #854, where upstream 400 errors (e.g., malformed image URLs with large base64 payloads) trigger repeated retries that generate oversized provider chains and stuck request records.
Related Issues:
Solution
Reclassify upstream
ProxyErrorresponses with status codes 400 or 422 fromPROVIDER_ERRORtoNON_RETRYABLE_CLIENT_ERRORbefore retry and circuit breaker accounting. The existingNON_RETRYABLE_CLIENT_ERRORpath already handles immediate return without retry or circuit breaker impact.Changes
Core Changes
src/app/v1/_lib/proxy/forwarder.ts(+11)isNonRetryableUpstreamRequestError()type guard that identifiesProxyErrorinstances with status 400 or 422PROVIDER_ERRORand matches 400/422, reclassify asNON_RETRYABLE_CLIENT_ERRORTests
tests/unit/proxy/proxy-forwarder-retry-limit.test.ts(+35)doForwardcalled exactly once)recordFailurenot called)Testing
Checklist
devDescription enhanced by Claude AI