
fix(proxy): exclude upstream 400/422 from breaker #1228

Draft

ROOOO wants to merge 1 commit into ding113:dev from ROOOO:codex/exclude-400-422-breaker

Conversation

ROOOO commented May 30, 2026


Summary

Treat upstream 400/422 ProxyError responses as non-retryable client errors, preventing unnecessary retries and unfair circuit breaker penalization of healthy providers.

Problem

When an upstream provider returns HTTP 400 (Bad Request) or 422 (Unprocessable Entity), the proxy previously classified these as PROVIDER_ERROR. This caused two issues:

  1. Wasteful retries: The same malformed request is retried against the same or different providers, but the 400/422 will recur since the problem is in the request itself, not the provider.
  2. Unfair circuit breaker impact: These errors count against the provider's health score, potentially marking healthy providers as unhealthy and excluding them from rotation.

This also contributes to the retry storm behavior described in #854, where upstream 400 errors (e.g., malformed image URLs with large base64 payloads) trigger repeated retries that generate oversized provider chains and stuck request records.
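To make the cost concrete, here is a minimal sketch of a retry loop under the old and new classification. It is a simplified model, not the project's forwarder: `forwardWithRetries`, `Outcome`, and the `isNonRetryable` predicate are hypothetical names, and the real loop also handles provider switching.

```typescript
// Simplified model of a retry loop. All names here are hypothetical
// illustrations, not the project's actual API.
type Outcome = { attempts: number; breakerFailures: number };

function forwardWithRetries(
  upstreamStatus: number,
  maxRetryAttempts: number,
  isNonRetryable: (status: number) => boolean,
): Outcome {
  let attempts = 0;
  let breakerFailures = 0;
  while (attempts < maxRetryAttempts) {
    attempts += 1;
    if (upstreamStatus < 400) break; // success: stop
    if (isNonRetryable(upstreamStatus)) break; // return at once, breaker untouched
    breakerFailures += 1; // treated as a provider fault and retried
  }
  return { attempts, breakerFailures };
}

// Old behavior: nothing is treated as a non-retryable client error.
const before = forwardWithRetries(400, 4, () => false);
// New behavior: 400 and 422 short-circuit the loop.
const after = forwardWithRetries(400, 4, (s) => s === 400 || s === 422);

console.log(before); // { attempts: 4, breakerFailures: 4 }
console.log(after); // { attempts: 1, breakerFailures: 0 }
```

In this model a single malformed request costs four upstream calls and four breaker failures before the change, and one call with no breaker impact after it.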

Related Issues:

Solution

Reclassify upstream ProxyError responses with status codes 400 or 422 from PROVIDER_ERROR to NON_RETRYABLE_CLIENT_ERROR before retry and circuit breaker accounting. The existing NON_RETRYABLE_CLIENT_ERROR path already handles immediate return without retry or circuit breaker impact.

Changes

Core Changes

  • src/app/v1/_lib/proxy/forwarder.ts (+11)
    • Add isNonRetryableUpstreamRequestError() type guard that identifies ProxyError instances with status 400 or 422
    • Add reclassification logic after error categorization: if error is PROVIDER_ERROR and matches 400/422, reclassify as NON_RETRYABLE_CLIENT_ERROR
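The two changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the diff: the `ProxyError` class and `ErrorCategory` values here are simplified stand-ins for the project's own definitions, and `reclassify` is a hypothetical wrapper around the inline `if` that the PR adds to `forwarder.ts`.

```typescript
// Stand-in for the project's ErrorCategory; only the two relevant values.
const ErrorCategory = {
  PROVIDER_ERROR: "PROVIDER_ERROR",
  NON_RETRYABLE_CLIENT_ERROR: "NON_RETRYABLE_CLIENT_ERROR",
} as const;
type ErrorCategory = (typeof ErrorCategory)[keyof typeof ErrorCategory];

// Stand-in for the project's ProxyError; the real class carries more fields.
class ProxyError extends Error {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message);
    this.name = "ProxyError";
    this.statusCode = statusCode;
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Type guard: true only for ProxyError instances with status 400 or 422.
function isNonRetryableUpstreamRequestError(error: unknown): error is ProxyError {
  return (
    error instanceof ProxyError &&
    (error.statusCode === 400 || error.statusCode === 422)
  );
}

// Hypothetical helper mirroring the reclassification step: only
// PROVIDER_ERROR is rewritten, every other category passes through.
function reclassify(category: ErrorCategory, error: unknown): ErrorCategory {
  if (
    category === ErrorCategory.PROVIDER_ERROR &&
    isNonRetryableUpstreamRequestError(error)
  ) {
    return ErrorCategory.NON_RETRYABLE_CLIENT_ERROR;
  }
  return category;
}
```

Checking `instanceof ProxyError` first means a plain `Error` or a network failure is never reclassified, so those keep their existing retry and breaker behavior.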

Tests

  • tests/unit/proxy/proxy-forwarder-retry-limit.test.ts (+35)
    • New test: verify upstream 400 does not trigger retry (doForward called exactly once)
    • New test: verify upstream 400 does not record failure against circuit breaker (recordFailure not called)

Testing

bunx vitest run tests/unit/proxy/proxy-forwarder-retry-limit.test.ts

Checklist

  • Code follows project conventions
  • Tests added for upstream 400 behavior
  • Target branch is dev

Description enhanced by Claude AI

coderabbitai Bot commented May 30, 2026


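📝 Walkthrough

Adds upstream error classification logic that reclassifies ProxyError responses with HTTP status 400 or 422 as non-retryable client errors, preventing retry loops and circuit breaker triggers. Includes unit tests that verify this behavior.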

Changes

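Upstream non-retryable error handling

| Layer / File(s) | Summary |
| --- | --- |
| Error classification function and its application (src/app/v1/_lib/proxy/forwarder.ts) | Adds the isNonRetryableUpstreamRequestError() guard function, which returns true when a ProxyError has statusCode 400 or 422. The exception-handling branch of send() calls this function to rewrite the PROVIDER_ERROR classification to NON_RETRYABLE_CLIENT_ERROR, skipping retries and circuit breaker counting. |
| Retry behavior verification tests (tests/unit/proxy/proxy-forwarder-retry-limit.test.ts) | Adds the "ProxyForwarder - client request upstream errors" test suite, verifying that for an upstream 400 error doForward is called only once (no retry) and recordFailure is not called (circuit breaker not triggered). |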

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

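  • ding113/claude-code-hub#651: Both PRs modify the error-category-based retry control flow in ProxyForwarder.send(), touching the same retry/endpoint-switching logic.
  • ding113/claude-code-hub#649: Both PRs modify classification and retry control for specific upstream error types in src/app/v1/_lib/proxy/forwarder.ts, affecting the same retry routing logic.
  • ding113/claude-code-hub#751: Both PRs modify how ProxyForwarder.send() handles ProxyError and are directly connected on the upstream error path.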

Suggested reviewers

  • ding113

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly summarizes the main change: excluding upstream 400/422 errors from circuit breaker counting. It closely matches the implementation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description clearly explains the problem, the solution, and the specific code changes, and is fully consistent with the file changes provided. |


coderabbitai Bot requested a review from ding113 May 30, 2026 12:22

gemini-code-assist Bot left a comment


Code Review

This pull request introduces a check to classify upstream 400 and 422 errors as non-retryable client errors, preventing them from triggering retries or counting against the provider's circuit breaker in the standard forwarding path. A corresponding unit test was also added. The reviewer noted that this mapping is missing in the streaming hedged path (sendStreamingWithHedge -> handleAttemptFailure), which would lead to inconsistent behavior where 400/422 errors still trigger failovers and affect the circuit breaker. It is recommended to apply this logic to the streaming path as well.

Comment on lines +1691 to +1696
    if (
      errorCategory === ErrorCategory.PROVIDER_ERROR &&
      isNonRetryableUpstreamRequestError(lastError)
    ) {
      errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR;
    }


high

While this correctly maps 400/422 upstream errors to NON_RETRYABLE_CLIENT_ERROR in the standard forwarding path, this mapping is completely missing in the streaming hedged path (sendStreamingWithHedge -> handleAttemptFailure).

As a result, if a streaming hedged request fails with an upstream 400 or 422 error:

  1. It will still be treated as PROVIDER_ERROR.
  2. It will count against the provider's circuit breaker via recordFailure.
  3. It will trigger launchAlternative() and attempt to failover/retry with other providers.

To ensure consistent behavior across both paths, please apply this mapping in handleAttemptFailure (around line 3910) as well, and add a corresponding test case in proxy-forwarder-retry-limit.test.ts to cover the hedged path.
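A rough sketch of what the requested behavior on the hedged path could look like is shown below. It assumes nothing about the real `handleAttemptFailure` beyond what the comment states: `HedgeDeps`, `HedgeOutcome`, `isUpstreamRequestErrorStatus`, and `handleAttemptFailureSketch` are hypothetical names, and the injected `recordFailure` and `launchAlternative` callbacks stand in for the project's own functions.

```typescript
// Hypothetical dependencies; the real handleAttemptFailure has a different shape.
interface HedgeDeps {
  recordFailure: (providerId: number) => void;
  launchAlternative: () => void;
}

type HedgeOutcome = "fail-fast" | "failover";

// Same status check as the standard path, expressed on the raw status code.
function isUpstreamRequestErrorStatus(statusCode: number): boolean {
  return statusCode === 400 || statusCode === 422;
}

function handleAttemptFailureSketch(
  providerId: number,
  statusCode: number,
  deps: HedgeDeps,
): HedgeOutcome {
  if (isUpstreamRequestErrorStatus(statusCode)) {
    // The request itself is at fault: skip breaker accounting and failover.
    return "fail-fast";
  }
  // Anything else is still treated as a provider fault.
  deps.recordFailure(providerId);
  deps.launchAlternative();
  return "failover";
}
```

Keeping the status check in one shared predicate, rather than repeating the literal codes in each path, is one way to avoid the two paths drifting apart again.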

github-actions Bot added the bug and area:provider labels May 30, 2026

coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/app/v1/_lib/proxy/forwarder.ts`:
- Around line 1691-1696: The hedge/streaming failure path doesn't apply the same
400/422 reclassification as ProxyForwarder.send(); update
sendStreamingWithHedge()—specifically inside its handleAttemptFailure logic—to
check when errorCategory === ErrorCategory.PROVIDER_ERROR &&
isNonRetryableUpstreamRequestError(lastError) and then set errorCategory =
ErrorCategory.NON_RETRYABLE_CLIENT_ERROR so streaming attempts with upstream
400/422 are treated non-retryable and do not count toward circuit-breaker
metrics (same fix as in ProxyForwarder.send()).

In `@tests/unit/proxy/proxy-forwarder-retry-limit.test.ts`:
- Around line 903-929: In tests/unit/proxy/proxy-forwarder-retry-limit.test.ts,
add a test case that mirrors the existing "upstream 400 should not retry or
count against provider circuit breaker" case to cover the 422 scenario: copy the
current test logic but change the status code of the ProxyError thrown by
doForward.mockImplementationOnce to 422, assert that ProxyForwarder.send(session)
rejects with statusCode 422, and assert that ProxyForwarder.doForward is called
only once and mocks.recordFailure is not called (keep the same setup and
assertions for createSession, createProvider, provider.maxRetryAttempts, etc.).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8668fe85-caa3-4bc3-a97a-a36122fe1d23

📥 Commits

Reviewing files that changed from the base of the PR and between ed95b48 and bfb50d8.

📒 Files selected for processing (2)
  • src/app/v1/_lib/proxy/forwarder.ts
  • tests/unit/proxy/proxy-forwarder-retry-limit.test.ts

Comment on lines +1691 to +1696
    if (
      errorCategory === ErrorCategory.PROVIDER_ERROR &&
      isNonRetryableUpstreamRequestError(lastError)
    ) {
      errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR;
    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

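400/422 is reclassified only on the non-hedge path; streaming requests may still be retried incorrectly and counted against the circuit breaker.

This change covers only the main ProxyForwarder.send() path. handleAttemptFailure in sendStreamingWithHedge() does not reuse the same check, so in streaming scenarios 400/422 is still handled as PROVIDER_ERROR, which is inconsistent with this PR's goal that upstream 400/422 is neither retried nor counted against the circuit breaker. Consider adding the same reclassification branch where hedge failures are classified.

Suggested patch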
diff --git a/src/app/v1/_lib/proxy/forwarder.ts b/src/app/v1/_lib/proxy/forwarder.ts
@@
       if (reactiveRectifierResult.matched) {
@@
       }
+
+      if (
+        errorCategory === ErrorCategory.PROVIDER_ERROR &&
+        isNonRetryableUpstreamRequestError(error)
+      ) {
+        errorCategory = ErrorCategory.NON_RETRYABLE_CLIENT_ERROR;
+        lastErrorCategory = errorCategory;
+      }
 
       if (errorCategory === ErrorCategory.NON_RETRYABLE_CLIENT_ERROR) {

Comment on lines +903 to +929
    test("upstream 400 should not retry or count against provider circuit breaker", async () => {
      const session = createSession();
      const provider = createProvider({
        providerVendorId: null,
        maxRetryAttempts: 4,
      });
      session.setProvider(provider);

      const doForward = vi.spyOn(
        ProxyForwarder as unknown as { doForward: (...args: unknown[]) => unknown },
        "doForward"
      );
      doForward.mockImplementationOnce(async () => {
        throw new ProxyError("Provider returned 400: Bad Request", 400, {
          body: '{"detail":"Bad Request"}',
          providerId: provider.id,
          providerName: provider.name,
        });
      });

      await expect(ProxyForwarder.send(session)).rejects.toMatchObject({
        statusCode: 400,
      });

      expect(doForward).toHaveBeenCalledTimes(1);
      expect(mocks.recordFailure).not.toHaveBeenCalled();
    });


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

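Consider adding a 422 test case so the target behavior is not only half-covered by the 400 case.

The new assertions cover only 400, but the behavior defined in this PR also includes 422. Consider adding a parallel ProxyError(..., 422) scenario that likewise asserts doForward is called only once and recordFailure is not called.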



Labels

area:provider, bug

Projects

Status: Backlog
