fix(proxy): distinguish an upstream no-content stream from a first-byte timeout #186
Merged
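Previously, the streamDoneBeforeContentPromise branch of waitForFirstStreamContent also threw FirstByteTimeoutError(timeoutMs). When an upstream SSE stream closed normally within 2 to 5 seconds without producing a content-bearing chunk, the log always read "Upstream first byte timed out after 30s", rendering the configured threshold as if it were the actual wait time. This misleading message was reproduced many times in production logs.

This change adds UpstreamNoContentStreamError to express the real semantics, "the stream ended without content". Its constructor takes both the actual elapsed time and the configured threshold. FailoverErrorType gains upstream_no_content_stream, and getErrorType / isFailoverableError recognize the new class separately, so failover behavior stays the same.

It also fills in the Chinese and English entries for first_byte_timeout, stream_idle_timeout, and stream_error that were missing from the retryErrorType dictionary, so lookups no longer fall back to the raw enum names at runtime.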
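To make the distinction concrete, here is a minimal TypeScript sketch of the two error paths. Only the class names, the elapsed-time and threshold constructor arguments, and the message wording come from this PR. The function name waitForFirstContent, its promise-based parameters, and the race structure are assumptions for illustration, not the repository's actual implementation.

```typescript
// Thrown only when the configured first-byte timer really fires.
class FirstByteTimeoutError extends Error {
  constructor(public readonly timeoutMs: number) {
    super(`Upstream first byte timed out after ${timeoutMs / 1000}s`);
    this.name = "FirstByteTimeoutError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Thrown when the stream ends cleanly before any content-bearing chunk.
// It carries both the measured elapsed time and the configured threshold.
class UpstreamNoContentStreamError extends Error {
  constructor(
    public readonly elapsedMs: number,
    public readonly firstByteTimeoutMs: number,
  ) {
    super(
      `Upstream closed SSE stream after ${(elapsedMs / 1000).toFixed(2)}s ` +
        `without producing any content-bearing chunk ` +
        `(first-byte timeout config: ${firstByteTimeoutMs / 1000}s)`,
    );
    this.name = "UpstreamNoContentStreamError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: races "first content arrived" against the timer
// and against "stream finished", and raises a different error for each
// failure path instead of reusing the timeout error for both.
async function waitForFirstContent(
  firstContent: Promise<void>, // assumed: resolves on the first content chunk
  streamDone: Promise<void>, // assumed: resolves when the upstream closes
  firstByteTimeoutMs: number,
): Promise<void> {
  const startedAt = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new FirstByteTimeoutError(firstByteTimeoutMs)),
      firstByteTimeoutMs,
    );
  });

  const doneBeforeContent = streamDone.then(() => {
    // Report the measured duration, not the configured threshold.
    throw new UpstreamNoContentStreamError(
      Date.now() - startedAt,
      firstByteTimeoutMs,
    );
  });

  try {
    await Promise.race([firstContent, timeout, doneBeforeContent]);
  } finally {
    clearTimeout(timer);
  }
}
```

With this shape, a stream that closes after roughly 3 seconds rejects with UpstreamNoContentStreamError, while FirstByteTimeoutError is reserved for the case where the timer itself expires.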
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #186 +/- ##
==========================================
+ Coverage 74.14% 74.15% +0.01%
==========================================
Files 145 145
Lines 11043 11048 +5
Branches 3832 3832
==========================================
+ Hits 8188 8193 +5
Misses 1657 1657
Partials 1198 1198
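routing-decision-timeline.tsx now renders upstream_no_content_stream with the same Clock icon and warning color as the timeout types, so it no longer differs in color from its siblings first_byte_timeout / stream_idle_timeout.

failover-circuit.md adds UpstreamNoContentStreamError to the list of exception classes, along with a paragraph explaining how its semantics differ from FirstByteTimeoutError. It also corrects the line-number range cited for isFailoverableError, which shifted when the previous commit expanded the function body.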
This was referenced May 24, 2026
g1331 added a commit that referenced this pull request May 25, 2026
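…current state of volume mounts (#167, #188)

A self-review spot check found four factual statements that contradict the current state of the repository; this commit corrects all of them:

1. The statements in database.md / upgrade-rollback.md that "the container does not run migrations automatically" and "the deployer must trigger pnpm db:migrate manually" contradict the current scripts/docker-entrypoint.sh. Before the application starts, that entrypoint automatically runs an embedded migration runner (it does not depend on drizzle-kit, applies drizzle/*.sql in filename order, and deduplicates via the __drizzle_migrations table). The docs now describe this automatic apply behavior, and the destructive-migration section is rewritten as "the entrypoint still forward-applies; rollback must rely on pg_dump".

2. The command suggested in database.md / upgrade-rollback.md, `docker compose exec autorouter node node_modules/drizzle-kit/bin.cjs migrate`, cannot run inside the production image. The Dockerfile's standalone runner stage copies only one node_modules subpackage, postgres; drizzle-kit is a devDependency and is not in the image. The docs now recommend either "restart autorouter so the entrypoint reruns" or a temporary container that decouples the entrypoint from server.js, such as "docker run --rm --entrypoint /app/docker-entrypoint.sh ghcr.io/g1331/autorouter:vN.N.N true".

3. The statement in persistence-backup.md that "RECORDER_FIXTURES_DIR is usually mounted into the autorouter-data named volume (as in the default compose file)" is wrong. In docker-compose.yml the default value of RECORDER_FIXTURES_DIR is `tests/fixtures`, relative to /app/ inside the container, so it actually writes to /app/tests/fixtures, which is not on any named volume and is lost whenever the container is rebuilt. A ::: danger ::: container warning is added, along with an explicit fix: "point RECORDER_FIXTURES_DIR explicitly at /app/data/...".

4. The statement in contributing.md that "squash merge is recommended" conflicts with the repository's actual merge-commit history (PRs #184/#185/#186 all take the Merge pull request form). It now states that "recent history consists mainly of merge commits, and cliff.toml explicitly skips such commits", leaving the choice of strategy to the reviewer.

The source cross-reference section also adds scripts/docker-entrypoint.sh and the Dockerfile as evidence.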
Summary
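- Fix a semantic bug in which the waitForFirstStreamContent error message presented the configured threshold as the actual wait time. The streamDoneBeforeContentPromise branch shared FirstByteTimeoutError(timeoutMs) with the real timeout branch, so when an upstream SSE stream closed normally within 2–5 seconds without producing a content-bearing chunk, the UI and logs always showed `Upstream first byte timed out after 30s`, even though the actual wait was only 3 seconds.
- Add UpstreamNoContentStreamError, whose constructor takes both elapsedMs (actual elapsed time) and firstByteTimeoutMs (configured threshold), so the message reports what happened: `Upstream closed SSE stream after 3.58s without producing any content-bearing chunk (first-byte timeout config: 30s)`.
- Add upstream_no_content_stream to FailoverErrorType; getErrorType and isFailoverableError recognize the class separately, and failover behavior is the same as before.
- Fill in the Chinese and English entries for first_byte_timeout / stream_idle_timeout / stream_error that were missing from the retryErrorType dictionary, so lookups no longer fall back to the raw enum names at runtime.

Investigation evidence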
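Measured duration vs. error message for six 503 responses from the production upstream rc-cx-pro: every one of the six rows reads `timed out after 30s`.

firstByteTimeout is set to the default of 30 seconds, yet every request ended within 2–5 seconds. The setTimeout timer never fired. The path taken was the streamDoneBeforeContentPromise branch: the upstream completed the SSE stream normally but sent only metadata events.

Behavior changes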
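- For this scenario, error_type changes from first_byte_timeout to upstream_no_content_stream. Failure rules that previously relied on matching first_byte_timeout need to add upstream_no_content_stream to keep covering it.
- Existing request_logs rows keep their original first_byte_timeout value; no migration is performed.
- No recordFailure calls change (circuit-breaker behavior is unchanged). Counting this error toward the breaker's failure count should be evaluated in a separate PR.

Test plan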
- pnpm exec tsc --noEmit
- pnpm lint
- pnpm format:check
- pnpm test:run (147 files / 2487 test cases all green; the path B case in tests/unit/services/proxy-client.test.ts now expects the new error class)
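The rule-matching consequence noted under Behavior changes can be sketched as follows. This is an illustration under stated assumptions: the four type names and the function names getErrorType / isFailoverableError appear in this PR, but their bodies here, the stub error classes, and the FailureRule shape with its ruleMatches helper are hypothetical and may not match the repository.

```typescript
// Only these four members are named in the PR; the real union may be larger.
type FailoverErrorType =
  | "first_byte_timeout"
  | "stream_idle_timeout"
  | "stream_error"
  | "upstream_no_content_stream";

// Stub classes standing in for the real error classes.
class FirstByteTimeoutError extends Error {
  readonly kind = "first-byte-timeout";
}
class UpstreamNoContentStreamError extends Error {
  readonly kind = "no-content-stream";
}

// Assumed classification: each class maps to its own error type.
function getErrorType(err: unknown): FailoverErrorType | undefined {
  if (err instanceof UpstreamNoContentStreamError) {
    return "upstream_no_content_stream";
  }
  if (err instanceof FirstByteTimeoutError) {
    return "first_byte_timeout";
  }
  return undefined;
}

// Both classes remain failoverable, so failover behavior is unchanged.
function isFailoverableError(err: unknown): boolean {
  return getErrorType(err) !== undefined;
}

// Hypothetical rule shape: a rule lists the error types it reacts to.
interface FailureRule {
  name: string;
  errorTypes: FailoverErrorType[];
}

function ruleMatches(rule: FailureRule, err: unknown): boolean {
  const type = getErrorType(err);
  return type !== undefined && rule.errorTypes.some((t) => t === type);
}

// A rule written before this PR no longer covers the no-content case...
const legacyRule: FailureRule = {
  name: "timeouts",
  errorTypes: ["first_byte_timeout"],
};

// ...until the new type is listed explicitly.
const updatedRule: FailureRule = {
  name: "timeouts",
  errorTypes: ["first_byte_timeout", "upstream_no_content_stream"],
};
```

Under these assumptions, legacyRule stops matching an UpstreamNoContentStreamError while updatedRule matches both error classes, which is why existing rules need the extra entry.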