Adds an `enableOpenaiResponsesWebsocket` system setting (default on) that lets clients connect to `/v1/responses` over WebSocket. CCH accepts the upgrade via a new custom Node server that wraps the Next.js handler, then tunnels each client `response.create` frame through the existing HTTP proxy pipeline using an `x-cch-client-transport: websocket` marker. For Codex providers (and only Codex), the forwarder pre-flights an upstream WebSocket dial; on handshake rejection or close-before-first-event it gracefully falls back to the existing HTTP path while keeping the client WebSocket open. Fallbacks are recorded on the decision chain (`responses_ws_attempted` / `responses_ws_fallback`) and do NOT count toward provider/endpoint/vendor circuit-breaker accounting, mirroring the existing `http2_fallback` isolation pattern. Non-WebSocket clients, non-Codex providers, and all existing HTTP/SSE behavior are unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
server.js:
- Cap per-connection pending-frame queue (count + bytes) to bound DoS surface;
close with 1008 when exceeded.
- Log `error` events instead of silently swallowing them, and add `.catch`
branches around `drain()` so async failures cannot become unhandled rejections.
- Track the in-flight internal `http.ClientRequest` and `req.destroy()` it on
client WebSocket close/error so orphaned upstream streams stop leaking
TCP/provider concurrency slots.
- Resolve the internal tunnel host from the configured `HOSTNAME` (with
loopback fallback for `0.0.0.0`/`::`) so containerized deployments that bind
to a non-loopback interface don't ECONNREFUSED their own tunnel.
- Sanitize `req.url` before logging on `ws_client_connected` (only `model` is
allow-listed; everything else is masked) so api_key/token query values do
not land in structured logs.
- Parse SSE event boundaries with /\r?\n\r?\n/ and trim `[DONE]` so
CRLF-terminated streams (common in real upstreams) are decoded correctly.
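A minimal sketch of the CRLF-tolerant SSE decoding described above, assuming the tunnel accumulates upstream bytes as text. `splitSseEvents` and `sseDataOf` are hypothetical names for illustration, not the helpers in `server.js`.

```typescript
// Hypothetical helpers illustrating CRLF-tolerant SSE parsing.

// Split a text buffer into complete SSE events. A blank line ends an event and
// may be written as "\n\n", "\r\n\r\n", or a mix of both.
function splitSseEvents(buffer: string): { events: string[]; rest: string } {
  const parts = buffer.split(/\r?\n\r?\n/);
  // The last part is not yet terminated by a blank line; keep it for the next chunk.
  const rest = parts.pop() ?? "";
  return { events: parts.filter((part) => part.length > 0), rest };
}

// Join the `data:` lines of one event. Returns null for comment-only events
// (for example heartbeats) and for the `[DONE]` sentinel, compared after trimming.
function sseDataOf(event: string): string | null {
  const data = event
    .split(/\r?\n/)
    .filter((line) => line.startsWith("data:"))
    .map((line) => line.slice("data:".length).replace(/^ /, ""))
    .join("\n");
  if (data.length === 0 || data.trim() === "[DONE]") return null;
  return data;
}
```

Keeping the unterminated tail in `rest` lets the caller prepend it to the next chunk, so an event split across TCP reads is decoded only once it is complete.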
upstream-adapter.ts:
- Share the hop-by-hop / shape-header filter between the `Headers` and plain
`Record<string,string>` branches, so a future caller passing a plain object
can't leak `connection`/`host`/`content-length` etc. into the WS upgrade.
- Add a 20s first-event timeout to bound the wait after a successful upgrade
but before any frame arrives — silent upstreams no longer hang the request.
- When the upstream WS closes mid-stream without emitting a terminal event,
enqueue a synthetic `{type:"error", error:{code:"upstream_ws_mid_stream_error"}}`
frame so the downstream pipeline (fake-200 detection, finalization) treats
the truncated stream as an error rather than a clean success.
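The shared-filter idea can be sketched as follows. This is illustrative only: `toUpgradeHeaders` is a hypothetical name, and the blocked-name set is an assumption built from the names the commit lists (`connection`, `host`, `content-length`) plus standard hop-by-hop headers; the real filter in `upstream-adapter.ts` may differ.

```typescript
// Assumed block list: standard hop-by-hop headers plus names that describe the
// HTTP request body/shape and are meaningless on a WebSocket upgrade.
const BLOCKED_UPGRADE_HEADERS = new Set<string>([
  "connection",
  "keep-alive",
  "proxy-connection",
  "transfer-encoding",
  "te",
  "trailer",
  "upgrade",
  "host",
  "content-length",
  "content-type",
]);

type HeaderInput = Headers | Record<string, string>;

// Normalize either input shape into [name, value] pairs first, so one filter
// runs regardless of which shape the caller passed.
function toUpgradeHeaders(input: HeaderInput): Record<string, string> {
  const entries: Array<[string, string]> = [];
  if (input instanceof Headers) {
    input.forEach((value, name) => entries.push([name, value]));
  } else {
    for (const [name, value] of Object.entries(input)) entries.push([name, value]);
  }

  const out: Record<string, string> = {};
  for (const [name, value] of entries) {
    const key = name.toLowerCase();
    if (!BLOCKED_UPGRADE_HEADERS.has(key)) out[key] = value;
  }
  return out;
}
```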
eligibility.ts:
- Header lookup on a plain `Record<string,string>` is now case-insensitive,
matching HTTP header semantics.
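A minimal sketch of case-insensitive lookup over both header shapes, using a hypothetical `getHeaderValue` helper (the actual function in `eligibility.ts` is not shown here). `Headers#get` is already case-insensitive, so only the plain-object branch needs a scan.

```typescript
// Hypothetical helper: HTTP header names are case-insensitive, so a plain
// object must be scanned rather than indexed directly.
function getHeaderValue(
  headers: Headers | Record<string, string | undefined>,
  name: string,
): string | undefined {
  if (headers instanceof Headers) {
    return headers.get(name) ?? undefined;
  }
  const target = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === target) return value;
  }
  return undefined;
}
```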
forwarder.ts:
- Pass the real `endpointAudit.endpointId` into both
`evaluateResponsesWsEligibility` and `markResponsesWsUnsupported`, restoring
per-endpoint isolation in the unsupported cache.
- Decode the *final* outgoing `requestBody` (Buffer/string) into JSON for the
WS frame so private-parameter filtering and `requestFilterEngine.applyFinal`
rewrites apply identically on the WS and HTTP paths.
- Drop a stray decorative emoji from a code comment (CLAUDE.md rule 1).
settings UI:
- Add an aria-label on the new toggle so screen readers can identify it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e WS bounds
Address second-round PR review:
Header spoofing (P1): the `x-cch-client-transport` marker was previously trusted from any inbound HTTP caller, so an external client could `curl` /v1/responses with that header and trick the forwarder into attempting an upstream WebSocket dial. Fixed in three layers:
- New `responses-ws/internal-secret.ts` exposes a per-process random secret (env-overridable) plus `verifyInternalRequest()`.
- `server.js` strips ALL inbound `x-cch-*` headers from the client WS request before tunneling to the internal HTTP loopback, then sets the internal markers (`x-cch-client-transport`, `x-cch-responses-ws-forward`, `x-cch-internal-secret`) itself. The secret is generated at server startup and lives only in this process.
- `eligibility.ts#isWebsocketClientRequest` now requires both the public marker AND a successful `verifyInternalRequest()`, so spoofed external requests fall straight to the HTTP path with `isWebsocketClient: false`.
WebSocket frame size: `WebSocketServer` now uses `maxPayload: 1 MiB` (the default is 100 MiB); public WS endpoints should not accept arbitrarily large frames.
Upstream queue bounds: the upstream adapter caps buffered upstream bytes at 8 MiB. If the SSE consumer stalls or the upstream blasts faster than we can drain, we tear down the WS and emit a synthetic `upstream_ws_queue_overflow` error rather than growing the heap unbounded. The existing pop path now also decrements the byte counter.
Unsupported-cache scoping: `UpstreamWsFailure` carries a new `cacheableAsUnsupported` flag. We only mark an endpoint as WS-unsupported when the upgrade is rejected with HTTP 400/404/405/426/501; transient failures (401/403, 5xx, network errors, silent upstream, first-event timeout, mid-handshake aborts) re-probe on the next request. The forwarder honors the flag.
Tracing slim-down: `outputFileTracingIncludes` no longer pulls in the entire `node_modules/next/**/*` tree. Only `next/dist/**/*` and `next/package.json` (plus `ws/**/*`) are explicitly included; everything else is left to Next's static analysis.
Tests:
- `internal-secret.test.ts`: full coverage of generate/honor/verify paths, including missing/wrong/case-mixed headers.
- `eligibility.test.ts`: dedicated spoofing-prevention cases (no secret, wrong secret, missing forward flag) all return `isWebsocketClient:false`, not a WS-downgrade.
- `upstream-adapter.test.ts`: `previous_response_id` + `store=false` continuity across two consecutive turns; HTTP 426/404/501 classified as cacheable-unsupported; HTTP 401/503 classified as NOT cacheable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
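The two server-side checks above can be sketched as below. Everything here is hypothetical: `CCH_INTERNAL_SECRET_EXAMPLE` stands in for the unnamed override variable, and `verifyInternalSecret` / `isCacheableAsUnsupported` are illustrative stand-ins for `verifyInternalRequest()` and the `cacheableAsUnsupported` classification, not their actual code. The constant-time comparison via Node's `crypto.timingSafeEqual` is a hardening choice the PR text does not mention.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Per-process secret: random at startup unless an operator overrides it.
// The env var name is an assumption for illustration.
const INTERNAL_SECRET: string =
  process.env.CCH_INTERNAL_SECRET_EXAMPLE || randomBytes(32).toString("hex");

// Constant-time comparison of the presented header value against the secret.
function verifyInternalSecret(
  presented: string | null | undefined,
  expected: string = INTERNAL_SECRET,
): boolean {
  if (typeof presented !== "string" || presented.length === 0) return false;
  const a = Buffer.from(presented, "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws when lengths differ, so reject those first.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

// Only upgrade rejections that signal "this endpoint does not speak WS" are
// cached; auth errors, 5xx, timeouts, and network failures re-probe next time.
const CACHEABLE_UNSUPPORTED_STATUSES = new Set<number>([400, 404, 405, 426, 501]);

function isCacheableAsUnsupported(upgradeStatus: number | undefined): boolean {
  return upgradeStatus !== undefined && CACHEABLE_UNSUPPORTED_STATUSES.has(upgradeStatus);
}
```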
Adds a system-configured fake-streaming whitelist that keeps long synchronous generation requests (image / video) alive past Cloudflare's 120s no-body timeout. For whitelisted requests the proxy returns text/event-stream immediately, sends an SSE comment heartbeat (`: ping\n\n`) every 5 seconds while it serially calls upstream providers, validates that the buffered upstream response is non-empty, and only then emits a final protocol-compatible stream (or a protocol-compatible error event on terminal failure). Non-stream requests follow the same buffer + validate semantics without heartbeat.
Behavior:
- Default whitelist (auto-enabled when system_settings has no persisted value): gpt-image-2, gpt-image-1.5, gemini-3.1-flash-image-preview, gemini-3-pro-image-preview.
- Persisted empty array is preserved as explicit opt-out.
- Provider-group restriction supported per model entry; empty groupTags = all groups.
- Reuses ProxyForwarder's existing multi-provider loop and fake-200 detection for fallback; the validator catches additional cases (empty content array, comment-only SSE, etc.) and returns 502 if the response is undeliverable.
- Client abort cleans up heartbeat timer and stops further attempts; abort is not counted as a provider failure.
Includes:
- Drizzle migration 0099 adds `fake_streaming_whitelist` JSONB column.
- Eligibility matcher, stream intent / non-stream clone helpers, response validator, and protocol emitters for Anthropic / OpenAI Chat / OpenAI Responses / Gemini.
- Settings UI section with model + group multi-select, all-groups hint, and i18n in zh-CN / zh-TW / en / ja / ru.
- 121 unit tests covering settings persistence, UI, eligibility matching, stream-intent detection, response validation, protocol emission, the orchestrator (no-race + abort cleanup), and the runner (heartbeat + final emission + non-stream JSON path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
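The whitelist matching rules above (exact model match, empty `groupTags` meaning all groups, empty array meaning opt-out) can be sketched like this. `FakeStreamingEntry` and `matchesFakeStreamingWhitelist` are hypothetical names, and the assumption that a request resolves to a single provider group tag is ours, not the PR's. Treating a missing whitelist as empty also mirrors the later null-safety fix for partial settings mocks.

```typescript
// Hypothetical shape of one whitelist entry.
interface FakeStreamingEntry {
  model: string;
  groupTags: string[]; // empty array = applies to all provider groups
}

function matchesFakeStreamingWhitelist(
  whitelist: readonly FakeStreamingEntry[] | null | undefined,
  model: string,
  providerGroup: string | null,
): boolean {
  // Missing or empty whitelist: nothing is eligible (explicit opt-out).
  if (!whitelist || whitelist.length === 0) return false;
  return whitelist.some((entry) => {
    if (entry.model !== model) return false;
    if (entry.groupTags.length === 0) return true;
    return providerGroup !== null && entry.groupTags.includes(providerGroup);
  });
}
```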
Persist providers.custom_headers (jsonb) and merge it into outbound proxy
requests with a strict precedence: default outbound overrides ->
provider customHeaders -> auth headers -> final request filter. Both the
provider edit form (Options section) and the API test button render a
JSON textarea wired to a shared parser/validator that returns stable
error codes for localized messages.
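A minimal sketch of the stated precedence (later layers win), assuming every layer is a plain header map. `mergeOutboundHeaders` and its parameters are hypothetical, and the final request filter is modeled as a simple function. Protected auth names are dropped from the custom layer, matching the defensive re-strip the forwarder applies to stored rows.

```typescript
type HeaderMap = Record<string, string>;

// Auth names a provider's customHeaders must never override.
const PROTECTED_AUTH_HEADERS = new Set(["authorization", "x-api-key", "x-goog-api-key"]);

// Precedence: default overrides -> provider customHeaders -> auth headers -> final filter.
function mergeOutboundHeaders(
  defaults: HeaderMap,
  custom: HeaderMap | null,
  auth: HeaderMap,
  finalFilter: (headers: HeaderMap) => HeaderMap,
): HeaderMap {
  const merged: HeaderMap = {};
  const apply = (layer: HeaderMap, stripProtected: boolean): void => {
    for (const [name, value] of Object.entries(layer)) {
      const key = name.toLowerCase();
      if (stripProtected && PROTECTED_AUTH_HEADERS.has(key)) continue;
      merged[key] = value; // later layers overwrite earlier ones
    }
  };
  apply(defaults, false);
  apply(custom ?? {}, true); // defensive re-strip, even for stale stored rows
  apply(auth, false);
  return finalFilter(merged);
}
```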
Validation rejects protected auth names (authorization, x-api-key,
x-goog-api-key), CRLF, non-string values, malformed JSON, duplicate
names (case-insensitive), and invalid HTTP token names. Empty input
and {} normalize to null. The forwarder defensively re-strips protected
names so a stale DB row cannot bypass auth.
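The validation rules above map naturally onto a parser that returns stable error codes. This is a hedged sketch: `parseCustomHeadersInput` and the code strings are hypothetical (the PR does not list the real codes), and the name check uses the RFC 9110 `token` character set as our reading of "invalid HTTP token names".

```typescript
type CustomHeaderErrorCode =
  | "invalid_json"
  | "not_object"
  | "invalid_name"
  | "protected_name"
  | "duplicate_name"
  | "non_string_value"
  | "crlf_in_value";

type CustomHeaderParseResult =
  | { ok: true; value: Record<string, string> | null }
  | { ok: false; code: CustomHeaderErrorCode; name?: string };

const PROTECTED_NAMES = new Set(["authorization", "x-api-key", "x-goog-api-key"]);
// RFC 9110 token: one or more tchar characters.
const HTTP_TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

function parseCustomHeadersInput(input: string): CustomHeaderParseResult {
  const text = input.trim();
  if (text === "") return { ok: true, value: null }; // empty input normalizes to null

  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { ok: false, code: "invalid_json" };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { ok: false, code: "not_object" };
  }

  const seen = new Set<string>();
  const out: Record<string, string> = {};
  // JSON.parse already collapses byte-identical duplicate keys, so only
  // case-variant duplicates ("X-A" vs "x-a") can be detected here.
  for (const [name, value] of Object.entries(parsed as Record<string, unknown>)) {
    if (!HTTP_TOKEN.test(name)) return { ok: false, code: "invalid_name", name };
    const lower = name.toLowerCase();
    if (PROTECTED_NAMES.has(lower)) return { ok: false, code: "protected_name", name };
    if (seen.has(lower)) return { ok: false, code: "duplicate_name", name };
    seen.add(lower);
    if (typeof value !== "string") return { ok: false, code: "non_string_value", name };
    if (/[\r\n]/.test(value)) return { ok: false, code: "crlf_in_value", name };
    out[name] = value;
  }
  // "{}" normalizes to null, like empty input.
  return { ok: true, value: Object.keys(out).length === 0 ? null : out };
}
```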
Includes 81 new tests across the shared parser, validation schemas,
proxy forwarder, provider test action, and the two textareas. The audit
emitter redacts custom_headers/customHeaders to keep secrets out of audit
trails. Five-locale i18n is added under apiTest.customHeaders and
sections.routing.customHeaders.
Closes #943, closes #944.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- proxy-integration: invoke ProxyForwarder cleanup hooks (clearResponseTimeout, releaseAgent) after consuming the buffered response to avoid leaking response timeout timers and agent-pool reservations.
- proxy-handler: stop swallowing fake-streaming errors. Falling back to the normal forwarder path with a mutated session would either double-hit upstream (duplicating cost / message context) or run the request with a rewritten Gemini URL. Let the outer error handler turn the failure into a protocol error response instead.
- orchestrator: align isStream default with documented semantics (defaults to false / non-stream validation), and close the race window between the loop-top abort check and the listener binding by re-checking abortSignal immediately after addEventListener.
- runner: propagate `isStream: false` to the orchestrator (the buffered body is JSON, not SSE), and remove the synchronous non-stream placeholder that silently returned 200 on upstream failure. `buildFakeStreamingResponse` now throws fast on `isStream: false`; production callers go through `buildFakeStreamingNonStreamResponse`, which returns accurate 200 / 502 / 499.
- response-validator: drop the response.created false positive so streams that contain only metadata events (no output) are no longer accepted as deliverable.
- system-settings-form: merge groupTags when the same model appears multiple times instead of silently dropping the later entry; the server schema rejects duplicates, so the client should aggregate intent before submission.
- transformers: collapse the duplicated null/undefined and non-array branches for fakeStreamingWhitelist normalization.
- proxy-integration: log a warning when session.clientAbortSignal is null so the silent abort-propagation degradation in edge / test environments is visible.
All 121 fake-streaming unit tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
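The orchestrator's abort race fix follows a common pattern: bind the listener, then re-check `signal.aborted`, because an `AbortSignal` that is already aborted never dispatches `abort` again. A minimal sketch, with `onAbortOnce` as a hypothetical helper name:

```typescript
// Hypothetical helper: run `handler` exactly once when `signal` aborts,
// including the case where it aborted before the listener was attached.
// Returns a cleanup function that detaches the listener.
function onAbortOnce(signal: AbortSignal, handler: () => void): () => void {
  let fired = false;
  const fire = (): void => {
    if (fired) return;
    fired = true;
    handler();
  };
  signal.addEventListener("abort", fire, { once: true });
  // Close the race window: an abort between the caller's loop-top check and
  // addEventListener has already dispatched its event, so check the flag too.
  if (signal.aborted) fire();
  return () => signal.removeEventListener("abort", fire);
}
```

The `fired` guard keeps the handler idempotent when both the event and the re-check could reach it.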
Existing tests mock `getCachedSystemSettings` with a partial SystemSettings shape that doesn't include the new `fakeStreamingWhitelist` field. When my proxy-handler hook reaches into that field, `isFakeStreamingEligible` would crash with "Cannot read properties of undefined" before the request ever reached the normal forwarder path, breaking unrelated CI tests (hedge-error-pipeline.test.ts). Treat undefined / null whitelist the same as empty (= no eligibility) so legacy callers and partial mocks remain safe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenAI Chat Completions and Responses report `cached_tokens` as a subset of `prompt_tokens`/`input_tokens`, but internal cost buckets treat `input_tokens` and `cache_read_input_tokens` as disjoint. Without subtraction the cached portion was billed twice for openai-compatible upstreams. Extends the existing codex-only adjustment to cover openai-compatible via a shared Set, with regression tests for non-stream Chat/Responses, SSE, the clamp edge case, and a Claude reverse-protection guard. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
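A minimal sketch of the adjustment, assuming a simplified usage shape. `toCostBuckets`, the field layout, and the provider-type strings `"codex"` / `"openai-compatible"` are assumptions for illustration; the clamp keeps a malformed report (cached greater than input) from producing negative billable input.

```typescript
interface ReportedUsage {
  input_tokens: number; // for OpenAI-style upstreams this INCLUDES cached tokens
  cached_tokens?: number;
}

interface CostBuckets {
  input_tokens: number; // billed at the normal input rate
  cache_read_input_tokens: number; // billed at the cache-read rate
}

// Provider types whose cached_tokens are a subset of input_tokens (assumed ids).
const CACHED_SUBSET_PROVIDER_TYPES = new Set(["codex", "openai-compatible"]);

function toCostBuckets(providerType: string, usage: ReportedUsage): CostBuckets {
  const cached = Math.max(0, usage.cached_tokens ?? 0);
  if (!CACHED_SUBSET_PROVIDER_TYPES.has(providerType)) {
    // e.g. Claude: input and cache-read counts are already disjoint.
    return { input_tokens: usage.input_tokens, cache_read_input_tokens: cached };
  }
  // Subtract the cached subset so it is not billed twice, clamping at zero.
  return {
    input_tokens: Math.max(0, usage.input_tokens - cached),
    cache_read_input_tokens: cached,
  };
}
```

For 1,000 reported input tokens of which 800 were cached, an openai-compatible upstream yields 200 normal-rate input tokens plus 800 cache-read tokens, instead of 1,000 + 800.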
Review pass on the fake-streaming whitelist work in #1132. Fixes issues that were verified to still exist on top of e3f0ae1 / 6674f70; already-addressed and false-positive bot comments were left untouched.
- proxy-handler: reuse the cached system settings already loaded for the highConcurrency / rawCrossProviderFallback flags instead of re-reading the cache; the second read had no fallback path, so a transient cache miss escalated otherwise-routable requests into errors.
- emitters (Gemini): split the buffered JSON across `data:` lines so pretty-printed (multi-line) upstream bodies are not silently truncated by SSE consumers.
- emitters (Anthropic tool_use): emit `content_block_start` with empty `input: {}`, then an `input_json_delta` carrying the serialized input, then `content_block_stop`. Anthropic SDKs ignore inline input on `content_block_start` and only accumulate from deltas.
- response-validator: replace the empty `if` for SSE `id:`/`retry:` fields with an explicit comment so the no-op intent is visible.
- stream-intent: handle multi-valued `alt` (e.g. `?alt=json&alt=sse`) in both detection and stripping; only `sse` occurrences are removed.
- runner (non-stream path): wrap orchestrator in try/catch and surface a structured 502 JSON `runner_error` instead of letting non-abort exceptions bubble past the fake-streaming contract.
- system-settings-form: when merging duplicate model rows, treat any empty `groupTags` as the broader "all groups" scope and never narrow it by unioning explicit tags from sibling rows.
- system-config (legacy update fallback): clone the deeper retry payloads from the already-pruned object instead of the original `updates`; otherwise the freshly-removed `fakeStreamingWhitelist` (and other newer columns) get reintroduced and break partially-migrated schemas.
Tests: tool_use input_json_delta + null-input; Gemini multi-line JSON SSE round-trip; multi-valued `alt` parsing; "all groups" merge precedence over explicit tags.
All 5353 unit tests pass; typecheck, lint, and build are clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
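Two of the fixes above, multi-valued `alt` handling and splitting pretty-printed JSON across `data:` lines, can be sketched with the standard `URL` / `URLSearchParams` API. The function names are hypothetical; note that this sketch re-appends the surviving `alt` values at the end of the query string, which may differ from the real stripping order.

```typescript
// True if any `alt` value requests SSE, e.g. ?alt=json&alt=sse.
function wantsSse(url: URL): boolean {
  return url.searchParams.getAll("alt").includes("sse");
}

// Remove only the `alt=sse` occurrences and keep every other `alt` value.
function stripSseAlt(url: URL): URL {
  const next = new URL(url.toString());
  const kept = next.searchParams.getAll("alt").filter((value) => value !== "sse");
  next.searchParams.delete("alt");
  for (const value of kept) next.searchParams.append("alt", value);
  return next;
}

// Emit one SSE event whose payload may span several lines. Each line gets its
// own `data:` prefix; SSE consumers re-join them with "\n".
function toSseEvent(payload: string): string {
  return payload.split(/\r?\n/).map((line) => `data: ${line}`).join("\n") + "\n\n";
}
```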
Resolves migration index 99 collision with PR #1131 (custom_headers). Renumbered fake_streaming_whitelist to migration idx 100.
…odex providers
Renumbered enableOpenaiResponsesWebsocket migration to idx 101 after the custom_headers (99) and fake_streaming_whitelist (100) migrations from PRs #1131 and #1132. Reordered the system_settings cascade fallback so the new WS column is the first to drop, then fakeStreamingWhitelist, then allowNonConversationEndpointProviderFallback, etc.
Resolved conflicts in:
- drizzle/meta/_journal.json + 0101_snapshot.json (chained after 0100)
- package.json (kept dev's @typescript/native-preview version, added @types/ws)
- src/app/[locale]/settings/config/_components/system-settings-form.tsx (Plus icon import + both toggles)
- src/repository/system-config.ts (extended cascade to handle WS column at the head)
…1142)
* docs: update changelog for v0.7.4 [skip ci]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: dedupe multipart mask field and disable provider form autofill
- proxy: skip logicalBody keys backed by a preserved file part in syncOpenAIImageMultipartFromLogicalBody, preventing the "[file]" placeholder from being re-emitted as a duplicate text part on /v1/images/edits requests
- settings/providers: add autoComplete="off" on the form and text inputs (name, website, endpoint, proxy URL, MCP passthrough); use autoComplete="new-password" on the API key field to suppress browser/password-manager autofill that was overwriting user input
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: add v1 REST management OpenAPI
* fix: address v1 management API review feedback
* fix: harden v1 public errors and client decoding
* fix: address v1 provider secret and csrf cache review
* fix: preserve redacted v1 secrets on writes
* fix: preserve cursor pagination for usage logs
* fix(auth): preserve legacy action compatibility
* fix(auth): preserve legacy opaque session provenance
* fix(api): address v1 management review feedback
* fix(api): preserve usage log cursor pagination
* fix(config): tolerate optional env values with zod openapi
* fix(validation): preserve optional preprocess schemas with zod openapi
* fix(api): address remaining v1 review findings
* fix(api): handle provider query validation and webhook secret clears
* fix(api): reject unresolved webhook header redactions
* fix: restore dashboard API compatibility
* fix: address dashboard compat review feedback
* fix: preserve epoch user timestamps
* fix: filter hidden provider types in vendor detail
Stop sending deprecated tpm/rpm/rpd/cc fields and coalesce null limit_concurrent_sessions to undefined so create/update no longer trips the strict Zod schema introduced by the v1 OpenAPI refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…1149)
* fix: keys:reveal endpoint, redirect fallback regression test, bill-on-failure toggle
Bundles four fixes raised against dev:
1. **keys:reveal endpoint (Issue #1)**: Add `GET /api/v1/keys/{id}:reveal` mirroring the provider key:reveal pattern (audit log, no-store cache, admin-only). Rewire dashboard key row + key list to fetch the unmasked key on click instead of pre-loading it in user-list payloads. Removes `fullKey` from `UserKeyDisplay` and replaces it with a `canReveal` permission flag; unmasked keys never leave the server unless explicitly requested.
2. **Model redirect fallback regression test (Issue #2)**: Pin the contract. When fallback occurs, the next provider's redirect rules MUST match against the user-requested model (not the model the previous provider rewrote it to). Static analysis + 4 new test scenarios confirm the existing `ModelRedirector.apply` already implements the contract. The test file now guards against future regression.
3. **499 frequency investigation (Issue #3)**: Read-only audit of the recent abort-listener cleanup commits (bcba5d0, 8968a42). Conclusion: no regression; those commits only fixed listener leaks in normal-completion paths. The user's log (Codex CLI cancelling stream at ~3s) is genuine and correctly classified as 499. No code change for this issue; the new bill-on-failure toggle (#4) gives operators a way to recover token billing for these cases.
4. **billNonSuccessfulRequests toggle (Issue #4)**: New system setting (default OFF). When ON, requests with non-2xx status that received positive token usage from the upstream (e.g., 499 mid-stream client abort with partial usage) are billed normally. The existing fake-200 detector still skips billing for fake-success error payloads regardless of toggle state. Schema migration + cache + UI toggle (5 locales) + unit tests.
Tests added:
- `tests/unit/actions/keys-reveal.test.ts`: admin/auth/404 paths and audit redaction for the new action.
- `tests/unit/proxy/model-redirect-fallback.test.ts`: 4 scenarios for fallback redirect behavior.
- `tests/unit/proxy/response-handler-bill-non-success.test.ts`: toggle semantics, fake-200 still skipped, setting-read failure fail-closed.
Migration: 0102_useful_lionheart.sql adds `bill_non_successful_requests boolean default false not null` to `system_settings` (idempotent ADD COLUMN IF NOT EXISTS).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(pr-review): address review feedback on keys:reveal
Resolves feedback from CodeRabbit, Greptile, gemini-code-assist, and chatgpt-codex on PR #1149:
1. **Allow key owner to reveal their own key (HIGH)**: `canExposeFullKey` already marks an owner's keys as `canReveal: true` in list payloads, but the new `getUnmaskedKey` action and v1 router gated everything behind admin, causing a 403 regression for self-service users.
   - `getUnmaskedKey`: replace admin-only check with admin-OR-owner.
   - v1 `:reveal` route: relax `requireAuth("admin")` -> `requireAuth("read")`, update OpenAPI `x-required-access` to `admin | owner` and the description to reflect that ownership is checked at the action layer.
   - Test: replace "rejects non-admin" with two cases: owner allowed, non-owner rejected.
2. **Toast feedback on reveal/copy failure in key-list.tsx (HIGH)**: `fetchUnmaskedKey` was silently returning null. Mirror the key-row-item pattern: import `toast` from sonner and surface failures via `tCommon("copyFailed")`.
3. **Drop stale revealedKey cache when row identity changes (MAJOR)**: two call sites:
   - `key-row-item.tsx`: `useEffect` resets `revealedKey` and closes the dialog whenever `keyData.id` or `keyData.maskedKey` changes (parent may reuse the same component instance for a different row).
   - `key-list.tsx`: prune `revealedKeys` and `visibleKeyIds` whenever the incoming `keys` set changes so removed/replaced ids cannot leak their cached plaintext onto a new row.
4. **Missing assertion in model-redirect-fallback test 3 (MINOR)**: the "no rules" reset path must rewrite both `request.model` and `request.message.model`. Add the missing `request.message.model` check to guard against half-reset regressions.
Tests: 16/16 pass (3 files), typecheck clean, lint clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(pr-review): align keys:reveal x-required-access with the contract enum
CI on commit 12bc4d2 failed two checks:
- openapi-contract: 'admin | owner' is not in the allowed enum (public/read/admin) for x-required-access.
- openapi-types-drift: generated types out of date because the route description / annotation changed.
Fix: keep the auth tier as 'read' (already what `requireAuth("read")` applies on the route) and document the admin-OR-owner contract in the OpenAPI 'description' field instead. Regenerate openapi-types.gen.ts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1151)
* fix(providers): inject statistics on REST list when include=statistics
The provider management page showed today's usage and call counts as 0 after migrating to /api/v1/providers. The handler parsed include=statistics but never loaded the data, so the dashboard's expected { items: [{ id, statistics: {...} }] } payload was missing the statistics field on every item.
Mirror the keys endpoint pattern: when include=statistics is requested, call getProviderStatisticsAsync and merge results into each item under a nested statistics object. Also extend ProviderSummarySchema and regenerate openapi-types.gen.ts so external API consumers see the new field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(providers): mirror statistics into deprecated top-level fields
Address review feedback:
- sanitizeProvider now also overwrites the legacy top-level fields (todayTotalCostUsd / todayCallCount / lastCallTime / lastCallModel) with statistics values when present, so the response cannot show zeroed top-level fields alongside a populated nested statistics object. Existing dashboard code that still reads the flat fields keeps working.
- ProviderSummarySchema descriptions for the four legacy fields are marked deprecated and point readers to statistics.* as the canonical source.
- Provider read test covers the sanitizeProvider(p, undefined) branch by mocking statistics for only one of the visible providers, and asserts the legacy top-level fields mirror the nested values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…yload cap (#1150) (#1153)
* fix(responses-ws): send close frame after terminal event; raise WS payload cap
Codex (tungstenite) clients reported "Connection reset without closing handshake" against /v1/responses despite the WS upgrade succeeding. Two issues compounded:
1. server.js never initiated a closing handshake after delivering the terminal SSE event, so the underlying socket eventually died without a close frame. Now sends close(1000) on response.completed and close(1011) on stream errors / HTTP failures.
2. WS_MAX_PAYLOAD_BYTES was set to 1 MiB, which the `ws` library enforces by socket.destroy() (TCP RST, no close frame). Raise to 32 MiB to fit Codex requests carrying conversation context; raise MAX_PENDING_BYTES to 64 MiB to match.
Also tighten upstream-adapter cleanup so every exit path closes the upstream socket, and record a synthetic mid-stream error when upstream closes after the first event without a terminal type.
Adds unit coverage for both the upstream-adapter mid-stream-close and the server.js close-handshake scenarios, plus a 4 MiB payload regression test. Documents the nginx Upgrade/Connection/timeout configuration required when running CCH behind a reverse proxy.
Closes #1150
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(responses-ws): address PR review: drain race, terminal-close sentinel, redundant ternary
- server.js: replace the module-level `closeClient` helper with a per-connection `requestClose(code, reason)` callback wired through `forwardToInternalHttp`. It synchronously sets `closed = true` and drops queued frames *before* initiating the WS close handshake, so a client that pipelined a second `response.create` after the first cannot trigger an extra upstream call (and provider charge) during the gap between `ws.close()` and the async `ws.on("close")` event. Adds a regression test that pipelines two frames and asserts exactly one upstream invocation. (codex P1)
- upstream-adapter.ts: hoist a `terminalEventSeen` twin of the start()-scoped `sawTerminalEvent` and gate the `ws.on("close")` mid-stream-error branch on it. Without this, our own clean `closeUpstream(1000)` after a terminal event was being misclassified as a mid-stream drop, which could have injected a spurious `error` SSE frame after a successful response. (greptile / github-actions P1)
- upstream-adapter.ts: simplify the redundant `typeof reason === "string" ? reason.length : reason.length` ternary in the close-reason text extraction. (gemini / coderabbit / greptile P2)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
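The per-connection close sentinel can be sketched as a small state object. This is a hypothetical TypeScript model, not the code in `server.js`: `ConnectionState`, `closeCodeForEvent`, and the minimal `ClosableSocket` interface are illustrative, and only the 1000 (normal closure) and 1011 (internal error) codes come from the PR text.

```typescript
interface ClosableSocket {
  close(code: number, reason: string): void;
}

class ConnectionState {
  closed = false;
  private readonly queue: string[] = [];
  private readonly ws: ClosableSocket;

  constructor(ws: ClosableSocket) {
    this.ws = ws;
  }

  // Returns false once closing has started, so late frames are rejected.
  enqueue(frame: string): boolean {
    if (this.closed) return false;
    this.queue.push(frame);
    return true;
  }

  nextFrame(): string | undefined {
    return this.closed ? undefined : this.queue.shift();
  }

  // Flip the flag and drop queued frames synchronously, before the async
  // close handshake, so a pipelined response.create cannot trigger another
  // upstream call in the gap before the "close" event fires.
  requestClose(code: number, reason: string): void {
    if (this.closed) return;
    this.closed = true;
    this.queue.length = 0;
    this.ws.close(code, reason);
  }
}

// 1000 after a successful terminal event, 1011 after a stream error.
function closeCodeForEvent(eventType: string): number | null {
  if (eventType === "response.completed") return 1000;
  if (eventType === "error") return 1011;
  return null;
}
```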
Closes #1147.
- Eliminate destroy/end race in ProxyForwarder gunzip path; replace pipe() with stream.pipeline() and switch gunzip error handler from end() to destroy(err).
- Extract nodeStreamToWebStreamSafe to a standalone module with a single-fire settled flag, listener detach on every settlement path, and a leak-safe cancel() (close-listener cleanup for the destroy-emitted error swallow).
- Add lifecycle markers and crash-diagnostics: process.on(uncaughtException/unhandledRejection) write Node diagnostic reports, log fatal, and exit(1) (preserves Node 20 fail-fast); synchronous stderr fallback for async logger transports.
- Dockerfile: enable --report-on-fatalerror / --report-uncaught-exception / --report-directory=/app/reports.
- docker-compose: mount ./data/reports for crash-report persistence.
- Tests: 7 unit cases for nodeStreamToWebStreamSafe (normal end, source error, cancel mid-stream, cancel without reason, double-cancel, already-destroyed source, end+close).
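* fix(ws): complete the close paths and keep the Codex E2E probe
* fix(ws): abort internal requests on close paths
* test: standardize GPT-5 test models on gpt-5.4
* test: fix WS E2E transport detection and model names in docs
* test: improve the Codex E2E "missing" hint on Windows
* fix(ws): reuse the Responses upstream WebSocket
* fix(server): make the default start use production mode
* fix(ws): complete Codex failure fallback and E2E
* test(k8s): harden shell helpers for cross-platform execution
* test(ws): extend Responses WebSocket edge-case E2E
* test(ws): harden the Responses WebSocket E2E test scaffolding
* test(ws): fix Codex E2E Windows shim resolution
* test(ws): harden Codex WS probe control-frame compatibility
* fix(ws): avoid closing active Responses upstream sessions
* docs(model): update the GPT-5.4 placeholder text in the pricing form
* fix(ws): stabilize upstream session cleanup state across hot reloads
* test(ws): avoid E2E port allocation races
* test(ws): remove redundant variables from the E2E probe
* fix(ws): fix concurrent upstream session cleanup and deprecated events
* test(ws): stabilize binary close regression assertions
* test(ws): make close-race assertions wait for business logic to finish
* ci: restore the default model for Codex automation
---------
Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>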
* fix(keys): register :reveal/:enable/:renew before generic CRUD to fix 400 NaN
Hono's RegExpRouter resolves overlapping route matches in registration order.
The generic `GET /keys/{keyId}` registered via `keysRouter.openapi(...)`
auto-installs a Zod-validating middleware; with the more specific
`/keys/:keyId{[0-9]+:reveal}` registered AFTER it, every reveal request
matched the generic route first, captured `keyId="155:reveal"`, and was
rejected as `expected number, received NaN` before ever reaching `revealKey`.
The handler-level suffix strip in `parseKeyParams` was correct but never ran.
Move the three custom-method routes (`:enable`, `:renew`, `:reveal`) ahead
of the generic GET/PATCH/DELETE block. Add a route-level integration test
for `:reveal` (the gap that let this regression slip through) and an
OpenAPI doc assertion. Regenerated `openapi-types.gen.ts` reflects only the
path-ordering change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(keys): address PR review on schema accuracy
- expiresAt: switch from `z.union([z.string(), z.null()])` to
`z.string().nullable()` so the generated TS type is `string | null`
instead of `string | unknown` (which collapses to `unknown` and
silently disables type checking on the field).
- KeyRevealResponseSchema.key: correct the description — admins AND the
key's owner can reveal; non-owners receive 403. The previous
"Returned only to admin callers" was misleading consumers.
Regenerated openapi-types.gen.ts to reflect both source changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
testProviderUnified was the only test-* api-client helper missing the toActionResult wrapper after the v1 REST refactor (7e2a722). The backend handler unwraps {ok,data} via actionJson, so apiPost returns the raw test-result body. The frontend api-test-button.tsx still expected the ActionResult shape, so response.ok was always undefined, the failure branch fired, and TestResultCard was never rendered regardless of the backend outcome. Wrap with toActionResult to match every other test* function in the same file and restore the expected success/failure UI plus details panel (requestUrl, rawResponse, validationDetails). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ings UI (#1162)
- DEFAULT_FAKE_STREAMING_WHITELIST changed to []
- Removed Fake Streaming Whitelist editor from settings form
- Removed OpenAI Responses WebSocket toggle from settings form
- Keep state vars and save payload intact
- Update related tests
Co-authored-by: ding113 <ding113@users.noreply.github.com>
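* fix(proxy): make the 1h TTL override cover the top-level system field, and add unit tests
`applyCacheTtlOverrideToMessage` previously walked only `messages[].content[]` and did
not handle the top-level `system` field of Anthropic request bodies at all. Clients such
as Claude Code put the system prompt on `system` as an array of content blocks marked
with `cache_control: { type: "ephemeral" }`. When the key/provider was configured for a
1h TTL, breakpoints on that field were not explicitly rewritten to `ttl: "1h"`; once the
upstream enabled the `extended-cache-ttl-2025-04-11` beta, they became inconsistent with
the 1h markers on the messages side and Anthropic returned an error.
Changes:
- Extract an internal helper `applyTtlToContentBlocks` so the top-level `system` and
`messages[].content[]` share the same rewrite logic; `system` is skipped when it is a string.
- Keep the "only rewrite existing ephemeral breakpoints" semantics; no new breakpoints are injected.
- Extract the anthropic-beta header 1h fallback logic into a pure function `mergeAnthropicCacheTtlBetaFlag`
so it can be pinned by unit tests (behavior unchanged: it still adds
`prompt-caching-2024-07-31` when betaFlags contains only the new flag).
- Add `tests/unit/proxy/cache-ttl-override.test.ts` with 14 cases covering:
rewrites on both system and messages, string-system skip, 5m reverse override, non-ephemeral blocks untouched,
beta header merge/dedupe/empty-header fallback, etc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>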
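The beta-header fallback described above can be sketched as a pure function. `mergeCacheTtlBeta` is a hypothetical stand-in for `mergeAnthropicCacheTtlBetaFlag`, and splitting / joining the header value on commas is an assumption about how the real helper handles it.

```typescript
const EXTENDED_CACHE_TTL_BETA = "extended-cache-ttl-2025-04-11";
const PROMPT_CACHING_BETA = "prompt-caching-2024-07-31";

// Merge the 1h-TTL beta flag into an existing anthropic-beta header value.
function mergeCacheTtlBeta(existing: string | null | undefined): string {
  const flags = new Set(
    (existing ?? "")
      .split(",")
      .map((flag) => flag.trim())
      .filter((flag) => flag.length > 0),
  );
  flags.add(EXTENDED_CACHE_TTL_BETA); // Set insertion also de-duplicates
  // If the TTL flag is the only one present, add the base prompt-caching flag.
  // This also fires when the client sent only the TTL flag (size stays 1).
  if (flags.size === 1) flags.add(PROMPT_CACHING_BETA);
  return Array.from(flags).join(",");
}
```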
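* fix(proxy): write back system/content only on a hit; add the beta size===1 edge-case test
Responds to two points of consensus feedback in the PR #1161 review:
1) Several reviewers (gemini-code-assist / coderabbitai / greptile-apps) pointed out that
`applyTtlToContentBlocks` uses `.map()` internally and therefore returns a new array even when
nothing matches. The original implementation unconditionally wrote `result.blocks` back to
`message.system` / `msgObj.content`. That broke reference stability in the no-hit case and could
falsely trigger reference-based dirty checks in upstream code. It now writes back only when
`result.applied` is true, and the corresponding unit tests were upgraded from `toEqual` to
`toBe` to assert explicitly that the reference is unchanged on the no-hit path.
2) Add a `mergeAnthropicCacheTtlBetaFlag` unit test for the edge case greptile raised: when the
client sends only `extended-cache-ttl-2025-04-11`, the Set still has size 1 after `add()`,
which triggers the `prompt-caching-2024-07-31` fallback. This behavior is inherited from the
original inline implementation; this change only pins the semantics with a test and does not
alter code behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>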
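The write-back-on-hit rule can be sketched in TypeScript as follows. This is a minimal illustration, not the project's code. The block and body types are simplified assumptions. `applyCacheTtlOverrideSketch` is a hypothetical stand-in for `applyCacheTtlOverrideToMessage`. Treating a block whose TTL already matches as a non-hit is also an assumption.

```typescript
// Simplified, hypothetical shapes; the project's real request types are richer.
type Ttl = "5m" | "1h";
type CacheControl = { type: string; ttl?: Ttl };
type ContentBlock = { type: string; text?: string; cache_control?: CacheControl };
type Message = { role: string; content: string | ContentBlock[] };
type AnthropicBody = { system?: string | ContentBlock[]; messages?: Message[] };

// Rewrites the TTL only on blocks that already carry an ephemeral breakpoint;
// it never injects new breakpoints. `applied` reports whether anything changed.
function applyTtlToContentBlocks(
  blocks: ContentBlock[],
  ttl: Ttl,
): { blocks: ContentBlock[]; applied: boolean } {
  let applied = false;
  const next = blocks.map((block) => {
    const cc = block.cache_control;
    if (!cc || cc.type !== "ephemeral" || cc.ttl === ttl) return block;
    applied = true;
    return { ...block, cache_control: { ...cc, ttl } };
  });
  return { blocks: next, applied };
}

// Hypothetical wrapper: one override path for the top-level `system` field and
// for `messages[].content[]`, writing back only on a hit so untouched arrays
// keep their original references.
function applyCacheTtlOverrideSketch(body: AnthropicBody, ttl: Ttl): void {
  if (Array.isArray(body.system)) {
    const result = applyTtlToContentBlocks(body.system, ttl);
    if (result.applied) body.system = result.blocks;
  }
  for (const message of body.messages ?? []) {
    if (!Array.isArray(message.content)) continue; // string content has no blocks
    const result = applyTtlToContentBlocks(message.content, ttl);
    if (result.applied) message.content = result.blocks;
  }
}
```

Because `.map()` always allocates, the `applied` flag, not the array identity, decides whether the caller writes back. This is exactly what the `toBe` assertions in the tests pin down.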
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ize (#1163) * fix(server): load standalone next config to honor proxyClientMaxBodySize The custom server bypasses Next's generated standalone server.js, which means `loadConfig()` looks for a `next.config.{js,ts,mjs}` next to the entrypoint, finds none in the standalone output, and silently falls back to defaults. As a result `experimental.proxyClientMaxBodySize: "100mb"` (and `serverActions.bodySizeLimit`) never reach the runtime, so any proxied request body matched by `src/proxy.ts` is clamped to the 10MB DEFAULT_BODY_CLONE_SIZE_LIMIT in next/dist/server/body-streams.js (`Request body exceeded 10MB for /v1/messages...` in pod logs). Mirror what Next's own standalone template does (next/dist/build/utils.ts → writeStandaloneDirectory): load `.next/required-server-files.json` and surface the resolved config via the `__NEXT_PRIVATE_STANDALONE_CONFIG` env var that `loadConfig()` already checks before falling back. Also exclude `/v1` (covers `/v1beta`) from the proxy matcher: the handler already no-ops for these paths, but matching them forces Next to clone the request body into a `PassThrough` for nothing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(server,proxy): regression tests for standalone config + v1 matcher Extract the two fix points into testable units and lock them down with unit tests so future refactors can't silently re-truncate request bodies. server.js → server-lib/standalone-config.js Pull the `__NEXT_PRIVATE_STANDALONE_CONFIG` injection into a pure helper. Tested cases: happy path (env var populated from manifest), preset env var (operator override wins), missing manifest (dev mode warning), manifest missing the `config` field, misuse guards. scripts/copy-custom-server-to-standalone.cjs now also copies server-lib/ since Next's traced files don't follow our custom server. 
src/proxy.ts → src/proxy.matcher.ts Pull the matcher pattern into its own module so the regression test doesn't have to mock next-intl, the auth module, the logger, etc. Tested cases: /v1/* and /v1beta/* must NOT match (would otherwise trip cloneBodyStream's 10MB clamp), /api/*, /_next/*, /favicon.ico still skipped, locale-prefixed app routes still matched. Verified locally: regenerating the bug (removing `v1` from the matcher) fails exactly the 6 new v1 assertions; restoring it makes all 23 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: inline proxy matcher pattern for Next.js static analysis Next.js requires matcher patterns in config exports to be statically analyzable strings. The previous implementation imported the pattern from another module, which Next.js couldn't parse at compile-time. Changes: - Inlined the matcher pattern as a static string literal in src/proxy.ts - Added detailed comment explaining the pattern's purpose and exclusions - The original pattern remains in src/proxy.matcher.ts for test use Fixes Docker Build Test failure in CI run: https://github.com/ding113/claude-code-hub/actions/runs/25564982177 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(proxy): tighten v1 matcher boundary, drop dead import, guard drift Address PR review feedback: - src/proxy.matcher.ts + src/proxy.ts: anchor the `v1` / `v1beta` exclusion to a path-segment boundary (`(?:/|$)`) so look-alike paths such as `/v10/foo`, `/v1foo`, `/v1beta-extra` keep going through middleware as intended (CodeRabbit P1). - src/proxy.ts: drop the now-unused `proxyMatcherPattern` import that Greptile/CodeRabbit flagged after the inline fix in ea3a0ca. - tests/unit/proxy-matcher.test.ts: extend coverage with the new bare segments (`/v1`, `/v1beta`) and the look-alike paths that must still match; add a drift-guard test that fails if the inlined literal in src/proxy.ts diverges from `proxyMatcherPattern`. 
- scripts/copy-custom-server-to-standalone.cjs: fail fast when `server-lib/` is missing instead of producing a standalone artifact that crashes on first request (CodeRabbit P1). The greptile P1 about `JSON.stringify(undefined)` poisoning the env var was already fixed in 8c753ce when the helper was extracted — the `!manifest.config` guard short-circuits before the assignment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
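The segment-boundary rule can be checked in isolation with a plain RegExp equivalent of the matcher. Both the pattern and `middlewareRunsFor` below are hypothetical. The real matcher in src/proxy.ts is a Next.js path-to-regexp string, and its full exclusion list is not reproduced here.

```typescript
// Hypothetical RegExp approximation of the middleware matcher: skip /api,
// /_next, /favicon.ico, and the /v1 and /v1beta segments, but only when the
// segment ends at "/" or at the end of the path (the `(?:/|$)` boundary).
const PROXY_MATCHER_SKETCH =
  /^\/(?!(?:api|_next)(?:\/|$)|favicon\.ico$|v1(?:beta)?(?:\/|$)).*$/;

// Returns true when the middleware (and therefore its request-body clone)
// would run for `pathname`.
function middlewareRunsFor(pathname: string): boolean {
  return PROXY_MATCHER_SKETCH.test(pathname);
}
```

With this pattern, `/v1`, `/v1/messages`, and `/v1beta/models` skip the middleware. Look-alikes such as `/v10/foo`, `/v1foo`, and `/v1beta-extra` still match, which is the behavior the boundary fix is meant to preserve.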
* docs: update changelog for v0.7.4 [skip ci] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * update AD [skip ci] * Update deployment.yaml: remove the FETCH_HEADERS_TIMEOUT and FETCH_BODY_TIMEOUT timeout settings. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: ding113 <h.ding.262@gmail.com> Co-authored-by: Ding <44717411+ding113@users.noreply.github.com>
Co-authored-by: mci77777 <mci77777@users.noreply.github.com>
Fixed: - biome.json: bumped schema version 2.4.13 -> 2.4.15 to match CLI - forwarder.ts: auto-fixed import ordering via biome --write - key-row-item.tsx: added biome-ignore for intentional useEffect deps (resets state on row change) - response-handler-bill-non-success.test.ts: removed 7 unused suppression comments CI Run: https://github.com/ding113/claude-code-hub/actions/runs/25781662008 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nded timeouts (#1178) * fix(lifecycle): orchestrate graceful shutdown with server.close + bounded timeouts
K8s/Docker rolling deploys were losing in-flight requests and being SIGKILL'd because the existing SIGTERM handler had three gaps:
1. server.close() was never called -> the HTTP listener stayed open accepting new connections until SIGKILL, and in-flight requests were torn down abruptly.
2. No explicit process.exit() -> the process relied on event-loop exhaustion, but residual sockets / undici dispatchers kept it alive past the grace period.
3. Langfuse forceFlush() had no upper bound -> a 3-minute hang blocked every other cleanup step (Redis, message buffer) and exhausted the K8s 30s grace period, so SIGKILL stole the remaining work.
Also: async-task-manager registered its own SIGTERM listener that immediately abort()-ed all in-flight tasks, including SSE streams. This made any drain attempt meaningless because streaming responses were cancelled before server.close() finished. Move that cancellation into the cleanup stage so drain actually drains.
New flow (orchestrated in server.js, which owns the http.Server handle):
SIGTERM
-> markShuttingDown() so /api/health/ready returns 503 -> Service drains
-> server.close() + wss.close() (bounded by SHUTDOWN_DRAIN_MS, default 15s)
-> runApplicationCleanup() (bounded by SHUTDOWN_CLEANUP_MS, default 10s): schedulers, async tasks, message buffer, Langfuse, Redis (each with its own timeout)
-> process.exit(0)
-> hard-exit watchdog at SHUTDOWN_HARD_EXIT_MS (default 25s)
Bridging server.js (CommonJS entry) and TypeScript modules uses globalThis, following the existing pattern (see __cchCleanupResponsesWsSession).
Tunable envs (default values picked to fit a K8s 30s grace period with a ~5s buffer):
SHUTDOWN_DRAIN_MS=15000
SHUTDOWN_CLEANUP_MS=10000
SHUTDOWN_HARD_EXIT_MS=25000
LANGFUSE_SHUTDOWN_TIMEOUT_MS=3000
Tests added:
- tests/unit/lib/shutdown.test.ts (cleanup pipeline + timeouts)
- tests/unit/server-shutdown.test.ts (orchestrator end-to-end)
- tests/unit/langfuse/shutdown-timeout.test.ts (bounded forceFlush)
- tests/unit/api/health-ready-shutdown.test.ts (readiness 503)
- tests/unit/lib/async-task-manager-edge-runtime.test.ts (no SIGTERM listener)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(lifecycle): widen hard-exit gap, parse SHUTDOWN_*_MS strictly, clear drain timer on natural close
Addressing review feedback on PR #1178:
- Watchdog raced normal exit: with defaults drainMs (15s) + cleanupMs (10s) == hardExitMs (25s), both timers expire at the same wall-clock instant and the earlier-registered watchdog fired first, falsely logging the shutdown as failed. Raise default SHUTDOWN_HARD_EXIT_MS to 28000 for a 3s gap.
- `Number(env) || default` silently coerced an intentional 0 to the default. Adopt the positive-integer parser already used in src/lib/langfuse/index.ts so operator overrides behave consistently.
- Drain timer was not cleared when server.close() resolved naturally, so a misleading "shutdown_drain_timeout" warning could fire during the cleanup phase. Cancel it on natural close.
- Test cleanup: drop unused checkDatabase/checkRedis/checkProxy mocks in the readiness shutdown test; replace `function () { ... }` mock constructors with class + Object.assign(this, fake) to satisfy biome useArrowFunction while remaining new-callable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: ding113 <ding113@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
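The "bounded by" wording above amounts to racing each cleanup step against a timer and clearing the timer once the step settles. Below is a minimal TypeScript sketch. `runWithTimeout` and `runCleanupSketch` are hypothetical names, not the project's runApplicationCleanup internals.

```typescript
type StepOutcome = "ok" | "error" | "timeout";

// Runs one cleanup step but never waits longer than `ms` for it.
async function runWithTimeout(task: () => Promise<unknown>, ms: number): Promise<StepOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<StepOutcome>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  // Promise.resolve().then(task) also turns a synchronous throw into "error".
  const work = Promise.resolve()
    .then(task)
    .then(
      (): StepOutcome => "ok",
      (): StepOutcome => "error",
    );
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a finished step does not keep the event loop alive
    // (the same idea as cancelling the drain timer on natural close).
    clearTimeout(timer);
  }
}

// Runs steps in order; a hung step (e.g. a stuck flush) costs at most
// `perStepMs` and cannot block the steps after it.
async function runCleanupSketch(
  steps: ReadonlyArray<{ name: string; run: () => Promise<unknown> }>,
  perStepMs: number,
): Promise<Record<string, StepOutcome>> {
  const outcomes: Record<string, StepOutcome> = {};
  for (const step of steps) {
    outcomes[step.name] = await runWithTimeout(step.run, perStepMs);
  }
  return outcomes;
}
```

The timed-out step is not cancelled, only abandoned. This is why a separate hard-exit watchdog is still needed on top of per-step bounds.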
🧪 Test Results
Overall result: ✅ All tests passed
Four issues surfaced by automated review on #1175:
1. auth-middleware: classify cookie-sourced legacy tokens as a `session` credential, not `user-api-key`. A cookie that survived validateAuthToken arrived through the login flow, so it should be treated as a browser session regardless of token format. The previous behavior locked admins out of the `/api/v1` management surface in legacy/dual SESSION_TOKEN_MODE deployments whenever ENABLE_API_KEY_ADMIN_ACCESS was disabled. Header-sourced raw keys are still classified as user-api-key. Thread: discussion_r3231984448
2. forwarder.applyProviderCustomHeaders: also strip transport-layer reserved names (host, content-length, connection, transfer-encoding, RESERVED_INTERNAL_HEADERS) when merging provider customHeaders into the outbound overrides map. HeaderProcessor.process applies overrides after the blacklist filter, so without this guard a dirty provider record could rewrite the upstream Host (sending credentials to an attacker-chosen target) or inject hop-by-hop headers that break request framing. Threads: discussion_r3232010260, discussion_r3232104658
3. forwarder.mergeAnthropicCacheTtlBetaFlag: always add the prompt-caching-2024-07-31 dependency, not only when the resulting set has exactly one element. If the client sent an unrelated beta flag (e.g. messages-2023-12-15), the previous size===1 guard skipped the backfill and the upstream rejected the request for a missing dependency. Thread: discussion_r3232008615
4. api-client/v1/fetcher.getCsrfToken: propagate transport/network failures from the CSRF endpoint instead of silently mapping them to a null token. Swallowing the error caused subsequent mutations to omit the X-CCH-CSRF header, return auth.csrf_invalid, and surface in the UI as PERMISSION_DENIED, masking the real cause as an auth problem. Thread: discussion_r3232281037
Tests: 5938 passed / 13 skipped / 0 failed. Updated existing assertions that codified the legacy lockout behavior (#1 in auth-middleware-api-key-admin.test and keys.test) and added regression coverage for transport-header stripping, unconditional prompt-caching backfill, and CSRF network-error propagation.
Note: `bun run build` fails on a pre-existing tsgo error at src/lib/langfuse/trace-proxy-request.ts:411 (setTraceIO does not exist on LangfuseSpan; the SDK's API is updateTrace). That regression landed in 454254b and is unrelated to these review fixes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
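Item 2 can be sketched as a filter over a provider's customHeaders, applied before they reach the overrides map. `sanitizeProviderCustomHeaders` is a hypothetical name. The `reservedInternal` parameter stands in for RESERVED_INTERNAL_HEADERS, whose real contents are not shown in this commit message.

```typescript
// Transport-layer names listed in item 2 above.
const TRANSPORT_RESERVED = ["host", "content-length", "connection", "transfer-encoding"];

// Hypothetical helper: drops reserved names from provider-supplied headers so
// overrides applied after the blacklist filter cannot reintroduce them.
function sanitizeProviderCustomHeaders(
  custom: Record<string, string> | null | undefined,
  reservedInternal: readonly string[],
): Record<string, string> {
  const blocked = new Set(
    [...TRANSPORT_RESERVED, ...reservedInternal].map((name) => name.toLowerCase()),
  );
  const safe: Record<string, string> = {};
  for (const [name, value] of Object.entries(custom ?? {})) {
    // HTTP header names are case-insensitive, so compare trimmed lowercase names.
    if (blocked.has(name.trim().toLowerCase())) continue;
    safe[name] = value;
  }
  return safe;
}
```

Filtering at merge time, rather than trusting stored provider records, keeps a single dirty row from redirecting credentials via `Host` or breaking request framing.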
The Langfuse tracing SDK exposes `updateTrace`, not `setTraceIO`. Commit 454254b renamed the test references but the corresponding rename in trace-proxy-request.ts called the non-existent `setTraceIO` method, which broke `bun run build` (tsgo error) and would have thrown at runtime — silently swallowed by the surrounding try/catch so trace-level input/output went unset. Rename src back to `updateTrace`, and rename the test mocks (`mockSetTraceIO` → `mockUpdateTrace`, mock key `setTraceIO` → `updateTrace`) so they continue to exercise the real SDK contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 6e1c978.
Switch the default install registry from npmjs.org to the Yarn registry (Cloudflare CDN). Picked after `bun install` proved unreliably slow from the default registry; npmmirror (Taobao) was tested first but slower from US-region runtimes than yarnpkg. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
antd@6.4.0 (released 2026-05-14) bumped `@rc-component/notification` from `~1.2.0` to `~2.0.6`. notification@2.x's NotificationList imports `@rc-component/util/es/ref` without the `.js` suffix, and `@rc-component/util` ships ESM syntax under `es/` without `"type": "module"`. Node 20+ strict ESM resolution rejects the chain with `ERR_MODULE_NOT_FOUND`, which broke 22 UI test files locally and on CI immediately after antd@6.4.0 published. vitest's `server.deps.inline`, `fallbackCJS`, and `pool: vmThreads`/`forks` all failed to bypass this for transitive node_modules → node_modules imports (vite never enters that chain when both ends are externalized). Pin to `~6.3` so installs resolve to antd@6.3.7 + notification@1.2.0, which keeps Node ESM resolution happy. Revisit when antd ships a 6.4.x patch that addresses the upstream ESM packaging or when we adopt a test runner that transpiles node_modules (next/jest, swc-loader, etc.). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9ee3cda73
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
// validateAuthToken came through the login flow, so it is a browser session — not a
// programmatic API key call. The ENABLE_API_KEY_ADMIN_ACCESS gate is meant to fence off
// header-based key usage, not to lock admins out of the management UI during migration.
if (source === "cookie") return "session";
Classify legacy cookie API keys as user-api-key
Returning "session" for every non-opaque cookie token lets callers bypass ENABLE_API_KEY_ADMIN_ACCESS=false by moving an admin API key from Authorization into Cookie (auth-token=<admin-key>). In SESSION_TOKEN_MODE=legacy/dual, validateAuthToken accepts raw keys, and admin GET routes (for example key reveal endpoints) do not require CSRF, so this re-enables sensitive admin access that the flag is supposed to block for user-issued keys.
🧪 Test Results
Overall result: ✅ All tests passed
) Walks every operation in /api/v1/openapi.json and verifies the documented x-required-access tier matches actual middleware enforcement, so any new endpoint that drifts from the contract surfaces immediately. - openapi-auth-sweep.test.ts: 175 ops × no-auth probe. public must not 401; read/admin must 401 with application/problem+json and auth.missing. - openapi-tier-sweep.test.ts: with validateAuthToken mocked to a non-admin session, 138 admin ops must 403 with auth.forbidden, and 35 read ops must accept the credential (no 401 / no 403 at the boundary). Co-authored-by: ding113 <ding113@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 724f070dca
  return sessionData?.credentialType ?? "user-api-key";
} catch (error) {
  logger.warn("[V1Auth] Failed to classify opaque session credential", {
    error: error instanceof Error ? error.message : String(error),
  });
  return "user-api-key";
Avoid defaulting opaque sessions to user API keys on read errors
resolveAuth runs validateAuthToken() and classifyCredential() in parallel, and the opaque classifier does a second Redis read that falls back to "user-api-key" on any read error. When Redis is flaky, one read can succeed (auth passes) while the classifier read fails, which makes admin requests hit auth.api_key_admin_disabled whenever ENABLE_API_KEY_ADMIN_ACCESS=false even for valid opaque admin sessions. This creates intermittent admin lockouts under transient Redis failures; the classifier should not downgrade to user-api-key on infrastructure errors after token validation succeeds.
🧪 Test Results
Overall result: ✅ All tests passed
…Redis leak (#1187) * fix(auth): return disabled/expired key errors instead of 429 lockout Disabled keys hit the proxy pre-auth rate limiter on every request, incremented the failure counter, and locked the IP/key out with HTTP 429 "Too many authentication failures" after 20 attempts. The legitimate "key disabled" 401 was masked forever once the lockout tripped. Introduce `resolveApiKeyAuthOutcome` returning a discriminated union (`not_found` / `key_disabled` / `key_expired` / ok), map each reason to its own 401 error, and only feed `credentials` failures to the rate-limiter — admin-disabled or expired keys/users now bypass it entirely. `validateApiKeyAndGetUser` stays as a backwards-compatible wrapper. The /v1/models chain is updated the same way. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(availability): inline finalized predicate to restore index use The availability dashboard query was slow because the WHERE clause called the PL/pgSQL function `fn_is_message_request_finalized(...)`. PostgreSQL never inlines PL/pgSQL functions, so the predicate became opaque to the planner and the partial index `idx_message_request_provider_created_at_finalized_active` (predicate `status_code IS NOT NULL AND deleted_at IS NULL`) was no longer usable — the dashboard fell back to a sequential scan that re-evaluated the function per row. Inline a semantically-equivalent SQL expression so `status_code IS NOT NULL` becomes the dominant SARGable branch. The SQL function definition is unchanged (still called by the upsert trigger on the write path); a header comment marks the keep-in-sync requirement. Tests assert on the inlined form to guard against the slow function-call form returning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auth,availability): address PR #1187 review feedback - key.ts: classify duplicate-row matches deterministically. 
The relaxed WHERE clause can return multiple non-deleted rows for one key string (no unique constraint on keys.key) and result[0] was non-deterministic; prefer an active row, fall back to "any enabled = key_expired", else key_disabled. (chatgpt-codex P1) - availability-service.ts: wrap the provider_chain jsonb branch in a CASE expression so jsonb_array_length cannot run on a non-array row. PostgreSQL does not guarantee AND short-circuit, so a single non-array historical row would otherwise crash the dashboard query. Extract the finalized provider_chain reason list into FINALIZED_PROVIDER_CHAIN_REASONS and document the JSONB `?` operator's driver assumption. (coderabbit P2, gemini, greptile) - auth-guard.ts / available-models.ts: convert the outcome.reason branch to an exhaustive switch with assertNever, and introduce a buildAuthFailure factory so every failure path is forced to tag its failureKind at compile time. Adding a new ApiKeyAuthFailureReason now produces a TypeScript error until the new branch is handled. (greptile P2 ×2) - Tests cover the duplicate-row cases (ok / key_expired / key_disabled across mixed-state rows) and assert the CASE guard appears in the generated SQL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(public-status): expire config snapshots so Redis stops accumulating publishPublicStatusConfigSnapshot, publishInternalPublicStatusConfigSnapshot, and publishCurrentPublicStatusConfigPointers all wrote keys with bare redis.set(...) and no TTL. Every config-version mint (provider/group/system settings change) created a new versioned snapshot key that never expired — on a busy operator the public-status:v1:config:* and config-internal:* key namespaces grow without bound. Neighbouring projection writers in rebuild-worker.ts already use a 30-day TTL via setWithTtl; only the config publishing path was missed when that pattern was introduced (#1056). 
Add PUBLIC_STATUS_CONFIG_TTL_SECONDS (30 days, matching GENERATION_PROJECTION_TTL_SECONDS in rebuild-worker.ts), widen the local RedisWriter type to the (key, value, "EX", seconds) ioredis overload, and apply the TTL to all four call sites — including the Lua script used by the pointer publisher so SET ... EX is atomic with the version compare. Each successful publish refreshes the TTL on the live pointer keys, so as long as configs are published at least every 30 days the active pointer never expires while stale versioned snapshots get cleaned up naturally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(auth,models): i18n new auth error messages + cover models auth outcomes CodeRabbit and CI flagged two remaining gaps in the previous fix: - The new 401 messages for invalid_api_key / key_disabled / key_expired were hardcoded Chinese strings, ignoring the project's i18n guideline (5 locales: zh-CN, zh-TW, en, ja, ru). The pre-existing strings in the same files (e.g. user_disabled, user_expired) were already hardcoded before this PR and remain so — see PR scope note — but the new branches should follow the established pattern. - /v1/models had no unit coverage for the new key_disabled / key_expired branches in handleAvailableModels.authenticateRequest. A regression back to a generic invalid_api_key would have gone undetected. Add PROXY_INVALID_API_KEY, PROXY_API_KEY_DISABLED, PROXY_API_KEY_EXPIRED codes to AUTH_ERRORS plus translations across all 5 locales. Wire auth-guard.ts and available-models.ts through getErrorMessageServer + next-intl/server's getLocale — same pattern the neighbouring rate-limit-guard.ts already uses. Add tests/unit/models/available-models-auth-outcome.test.ts covering all five 401 branches (key_disabled, key_expired, not_found, user_disabled, user_expired). 
Existing auth-guard tests mock next-intl/server + getErrorMessageServer so the unit tests can run outside a Next.js request context, and message assertions pin on the ERROR_CODES key (the localized text lives in messages/<locale>/errors.json). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(public-status): drop TTL from current-pointer Redis keys greptile flagged a P1: applying the same 30-day TTL to the three "current pointer" keys would dark out any deployment that goes longer than the TTL without publishing a new config. Pointer keys don't accumulate — only one entry per pointer name exists, overwritten atomically on every publish — so they MUST persist until explicitly overwritten. Only the versioned snapshot keys (`public-status:v1:config:<version>` and `:config-internal:<version>`) keep the 30-day TTL — those are the ones that accumulate as new config versions are minted. The pointer publisher's Lua script and JS fallback now write bare `SET` without `EX`. Tests assert the split: the versioned write carries `EX <ttl>` and the pointer write is a bare two-arg `set(key, value)` with no TTL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: ding113 <ding113@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
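The discriminated-union outcome, the duplicate-row tie-break, and the exhaustive `assertNever` switch described above could look roughly like the sketch below. It rests on several assumptions. The row shape, the `account_state` failure kind, and every `*Sketch` name are hypothetical. Only the reason strings, the `credentials` failure kind, and the PROXY_* error codes come from the commit messages.

```typescript
type KeyRow = { id: number; enabled: boolean; expiresAt: Date | null };
type FailureReason = "not_found" | "key_disabled" | "key_expired";
type KeyOutcome = { ok: true; key: KeyRow } | { ok: false; reason: FailureReason };

function isExpired(row: KeyRow, now: Date): boolean {
  return row.expiresAt !== null && row.expiresAt.getTime() <= now.getTime();
}

// Deterministic pick across duplicate rows for one key string: prefer an
// active row, else "any enabled row" means expired, else disabled.
function resolveKeyOutcomeSketch(rows: KeyRow[], now: Date): KeyOutcome {
  if (rows.length === 0) return { ok: false, reason: "not_found" };
  const active = rows.find((row) => row.enabled && !isExpired(row, now));
  if (active) return { ok: true, key: active };
  if (rows.some((row) => row.enabled)) return { ok: false, reason: "key_expired" };
  return { ok: false, reason: "key_disabled" };
}

function assertNever(value: never): never {
  throw new Error(`Unhandled auth failure reason: ${String(value)}`);
}

type AuthFailure = { status: 401; code: string; failureKind: "credentials" | "account_state" };

// Adding a new FailureReason without a case here is a compile-time error.
function buildAuthFailureSketch(reason: FailureReason): AuthFailure {
  switch (reason) {
    case "not_found":
      return { status: 401, code: "PROXY_INVALID_API_KEY", failureKind: "credentials" };
    case "key_disabled":
      return { status: 401, code: "PROXY_API_KEY_DISABLED", failureKind: "account_state" };
    case "key_expired":
      return { status: 401, code: "PROXY_API_KEY_EXPIRED", failureKind: "account_state" };
    default:
      return assertNever(reason);
  }
}

// Only credential failures feed the pre-auth rate limiter.
const shouldFeedRateLimiter = (failure: AuthFailure): boolean =>
  failure.failureKind === "credentials";
```

Disabled or expired keys therefore keep returning their own 401 instead of eventually tripping the 429 lockout.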
🧪 Test Results
Overall result: ✅ All tests passed
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0c9eaf57b
@@ -0,0 +1 @@
ALTER TABLE "providers" ADD COLUMN "custom_headers" jsonb;
\ No newline at end of file
Make additive migrations idempotent
This migration adds a new column without IF NOT EXISTS, so bun run db:migrate will hard-fail with duplicate column in environments where the column was pre-created (for example, hotfix backports, manual schema sync, or partial rollout drift). In the same release, later migrations (0101/0102) already use IF NOT EXISTS, so leaving 0099/0100 non-idempotent creates an avoidable startup/deploy risk when AUTO_MIGRATE=true.
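…1189) * fix(availability): restore status_code IS NOT NULL terminal filter The availability monitor's provider + time-range aggregation now uses status_code IS NOT NULL to define the finalized (terminal) boundary. This aligns it with the partial index idx_message_request_provider_created_at_finalized_active (deleted_at IS NULL AND status_code IS NOT NULL), so hot-path queries reliably hit the index. It also stops counting in-flight records in the aggregate, meaning records that have written only providerChain / errorMessage fragments while statusCode is still NULL. The previous inlined terminal predicate replicated fn_is_message_request_finalized semantics, which caused the classification function to miscount those records as failures. Success/failure/excluded classification of finalized records is still done by fn_compute_message_request_success_rate_outcome(...). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(availability): factor finalized-boundary asserts, cover error_message Extract the 5 assertions that "the finalized boundary looks only at status_code IS NOT NULL" into an expectStatusCodeOnlyFinalizedBoundary helper, and add a negative assertion for "error_message" is not null. Previously only the blocked_by / provider_chain paths were ruled out. If finalized ever regressed to "status_code" is not null or "error_message" is not null, in-flight records that had written only errorMessage would silently be counted in availability statistics again. Addresses CodeRabbit nitpick on #1189. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: ding113 <ding113@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>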
* fix(install): clarify branch selection prompt in deploy scripts
Users were confused by "Enter your choice [1]:" — unclear whether to
type a digit, use arrow keys, or type the branch name. Now the prompt
spells out the input ("Type 1 or 2 (or 'main'/'dev') and press Enter
[default: 1]:") and accepts the branch names as case-insensitive
aliases for the numeric choices.
Applies to both deploy.sh and deploy.ps1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(install): address PR1188 review nits and bugs
- deploy.sh: add -r to read (shell hygiene; ShellCheck) and apply the
default after whitespace normalization so " " or "\t" also fall back
to "1" (Greptile P1 — bare ${choice:-1} would not trigger on
whitespace-only input, which then hit the invalid branch).
- deploy.ps1: replace `switch` + trailing `break` with `if`/`return`.
PowerShell `break` inside a switch exits the switch, not the
enclosing `while`, so the previous structure exited the loop after a
single iteration regardless of validity — invalid input never
re-prompted (Gemini, high).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: ding113 <ding113@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
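The deploy-script input handling is easy to express as a pure function: normalize whitespace, then apply the default, then accept digits or names, and keep re-prompting on invalid input. This TypeScript version is only an illustration. The real logic lives in deploy.sh and deploy.ps1, and mapping `1` to `main` and `2` to `dev` is an assumption based on the prompt text.

```typescript
type Branch = "main" | "dev";

// Hypothetical rendering of the prompt logic: trimming happens before the
// default is applied, so whitespace-only input falls back to "1".
function parseBranchChoice(raw: string): Branch | null {
  const choice = raw.trim().toLowerCase() || "1";
  if (choice === "1" || choice === "main") return "main";
  if (choice === "2" || choice === "dev") return "dev";
  return null; // invalid: the caller should re-prompt
}

// Keeps consuming answers until one is valid, instead of giving up after the
// first attempt (the PowerShell `break`-inside-`switch` bug described above).
function pickBranch(answers: Iterable<string>): Branch | null {
  for (const answer of answers) {
    const branch = parseBranchChoice(answer);
    if (branch !== null) return branch;
  }
  return null;
}
```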
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20a05b4852
const pathname = new URL(request.url).pathname;
const env = getEnvConfig();
const docsHidden = env.LEGACY_ACTIONS_DOCS_MODE === "hidden" && isLegacyDocsPath(pathname);
const actionExecutionDisabled = !isLegacyActionsApiEnabled() && !isLegacyDocsPath(pathname);
Keep health route alive when disabling legacy actions API
actionExecutionDisabled treats every non-doc /api/actions/* path as gone when ENABLE_LEGACY_ACTIONS_API=false, so /api/actions/health starts returning 410 as well. That breaks existing probe wiring that still targets this endpoint (deploy/k8s/app/deployment.yaml readiness/liveness/startup probes, docker-compose.yaml healthcheck, and .github/workflows/test.yml wait loop), turning a feature-flag toggle into immediate unhealthy/restart behavior in those environments. Please exempt the health endpoint (or migrate all probes first) before gating the legacy surface.
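One way to follow the review's first option, exempting the health endpoint, is an explicit check ahead of the legacy gate. This is a hypothetical sketch. `legacyActionsDecision` and the injected `isDocsPath` predicate stand in for the real route logic, whose exact behavior is not shown here.

```typescript
// Path the k8s probes, docker-compose healthcheck, and CI wait loop target.
const LEGACY_HEALTH_PATH = "/api/actions/health";

// Hypothetical gate: decides whether a legacy /api/actions/* request is served
// or answered with 410 Gone when the legacy API is disabled.
function legacyActionsDecision(
  pathname: string,
  legacyApiEnabled: boolean,
  isDocsPath: (path: string) => boolean,
): "serve" | "gone" {
  const normalized = pathname.replace(/\/+$/, "") || "/"; // ignore trailing slashes
  if (normalized === LEGACY_HEALTH_PATH) return "serve"; // keep probes healthy
  if (legacyApiEnabled || isDocsPath(normalized)) return "serve";
  return "gone"; // surfaced as HTTP 410
}
```

Migrating every probe to a new path first, the reviewer's other option, would make this exemption unnecessary.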
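* Fix the admin login state's hard dependency on Redis * Address review feedback and harden signed admin sessions * Refine the TTL-tightening logic for signed admin sessions * Complete the short-circuit on signed admin session invalidation * Document the auth session TTL configuration * Mark the legacy session TTL constant as deprecated --------- Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>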
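* fix: isolate the lifecycle of active AgentPool dispatchers * fix: tighten AgentPool retirement boundaries * fix: tidy up AgentPool retirement test details * fix: cap the number of live AgentPool dispatchers * chore: re-run CI * fix: settle AgentPool shutdown pending state * fix: improve observability of AgentPool capacity reclamation * test: unify the style of AgentPool shutdown race stubs * fix: keep capacity rejections from polluting the AgentPool miss metric * test: always release AgentPool pending-creation stubs as a fallback * fix: allow same-key retire-and-replace in AgentPool * fix: keep capacity rejections from polluting AgentPool request metrics * fix: reuse replacement credit to bound AgentPool reclamation * fix: resolve the AgentPool endpoint eviction race * fix: complete AgentPool metric semantics and the credit index * fix: keep the legacy markUnhealthy signature for safe compatibility * fix: settle AgentPool pending and failure metrics --------- Co-authored-by: tesgth032 <tesgth032@users.noreply.github.com>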
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e3c9e7db2d
  : obj.userId === -1
    ? "admin-token"
    : "session";
Classify legacy opaque sessions as restricted credentials
When a stored session payload lacks credentialType, this parser coerces it to "session" for every non-userId=-1 record. Pre-upgrade opaque sessions do not have this field, so sessions originally minted from API keys are reclassified as browser sessions; in resolveAuth, that bypasses the ENABLE_API_KEY_ADMIN_ACCESS=false block (which only triggers for "user-api-key") and can reopen admin /api/v1 access for old API-key-backed cookies until they expire. Missing/unknown credential provenance should default to the restricted path (or force re-login) instead of being treated as session.
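The review's suggested default can be sketched as a parser that trusts only known `credentialType` values and sends missing or unknown provenance down the restricted `user-api-key` path. The function name and the stored-payload shape are assumptions. Forcing a re-login, which the reviewer also mentions, is an alternative not shown here.

```typescript
type CredentialType = "session" | "admin-token" | "user-api-key";

const KNOWN_CREDENTIAL_TYPES: ReadonlySet<string> = new Set([
  "session",
  "admin-token",
  "user-api-key",
]);

// Hypothetical parser for a stored opaque-session payload.
function parseStoredCredentialType(stored: {
  userId: number;
  credentialType?: unknown;
}): CredentialType {
  const raw = stored.credentialType;
  if (typeof raw === "string" && KNOWN_CREDENTIAL_TYPES.has(raw)) {
    return raw as CredentialType;
  }
  if (stored.userId === -1) return "admin-token"; // mirrors the excerpt above
  // Missing or unknown provenance (e.g. pre-upgrade sessions): fail closed
  // instead of treating the record as a browser session.
  return "user-api-key";
}
```

Failing closed means an old API-key-backed cookie stays subject to ENABLE_API_KEY_ADMIN_ACCESS=false until it expires. The cost is that legitimate pre-upgrade browser sessions may also lose admin access until their users log in again.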
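The suggestion above can be sketched as a parser that only trusts provenance it can prove. This is a hypothetical illustration. `StoredSession`, `parseCredentialType`, `passesApiKeyAdminGate` and the `"unknown"` variant are assumed names, and only the `userId === -1` rule and the three credential strings come from the review. Legacy records without `credentialType` become `"unknown"`, and the gate denies them regardless of the API-key flag.

```typescript
type CredentialType = "admin-token" | "session" | "user-api-key" | "unknown";

// Hypothetical stored-session shape: only the fields the review discusses.
interface StoredSession {
  userId: number;
  credentialType?: string;
}

const KNOWN: ReadonlySet<string> = new Set(["admin-token", "session", "user-api-key"]);

function parseCredentialType(obj: StoredSession): CredentialType {
  if (obj.credentialType !== undefined) {
    // Unrecognized values are not trusted either.
    return KNOWN.has(obj.credentialType) ? (obj.credentialType as CredentialType) : "unknown";
  }
  // Legacy record without provenance: only the admin-token marker is unambiguous.
  return obj.userId === -1 ? "admin-token" : "unknown";
}

// Hypothetical gate: "unknown" is always denied (forcing a re-login), and
// API-key credentials pass only when ENABLE_API_KEY_ADMIN_ACCESS is on.
function passesApiKeyAdminGate(type: CredentialType, enableApiKeyAdminAccess: boolean): boolean {
  if (type === "admin-token" || type === "session") return true;
  return enableApiKeyAdminAccess && type === "user-api-key";
}
```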
🧪 Test Results
Overall result: ✅ All tests passed
…-only scans (#1194)

* fix(db): cover endpoint in usage_ledger cost indexes to restore index-only scans

#1091 added a non-billing-endpoint filter to LEDGER_BILLING_CONDITION, so every SUM(cost_usd) query now also references usage_ledger.endpoint. That column was not part of the three cost covering indexes (idx_usage_ledger_user_cost_cover, idx_usage_ledger_provider_cost_cover, idx_usage_ledger_key_cost), so the rate-limit and Quotas-page hot-path queries lost their Index Only Scan and degraded to a Bitmap Heap Scan with one heap fetch per matching row.

Reproduced on Postgres 18 (1,000,000 rows, 200 users). The per-user SUM(cost_usd) query (sumUserTotalCost):

- pre-#1091 condition: Index Only Scan, 46 shared buffers, Heap Fetches: 0, ~1ms
- post-#1091 condition: Bitmap Heap Scan, 5027 shared buffers, 5000 heap blocks, ~15ms

Add endpoint as a trailing column to the three covering indexes so the endpoint filter can be evaluated from the index. After applying the migration, the post-#1091 query is back to Index Only Scan / Heap Fetches: 0 / ~40 buffers. Drizzle has no INCLUDE support, so endpoint is added as a trailing key column, matching the existing convention on idx_usage_ledger_key_created_at_desc_cover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(db): make the 0103 cost-index migration idempotent and operator-safe

Addresses PR review feedback on the index rebuild. A plain CREATE INDEX on usage_ledger holds a SHARE lock that blocks writes for the whole rebuild, and Drizzle's migrator is transactional, so CREATE INDEX CONCURRENTLY cannot be inlined. Wrap each rebuild in a guarded DO block that skips when the index definition already contains `endpoint`. Operators on a large or busy database can now rebuild the three indexes ahead of time with CREATE INDEX CONCURRENTLY, and the migration becomes a no-op. This is the escape hatch documented on migrations 0082 / 0088, extended to the drop+recreate case.

Verified on Postgres 18: each index goes from 3 columns to 4 on the first apply, and a second apply is a clean no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ding113 <ding113@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
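The guarded DO block described in the second commit can be illustrated with a small generator. This is a hypothetical helper, not the project's migration file. It assumes the guard checks `pg_indexes.indexdef` for the substring `endpoint`, which mirrors the commit's wording, and the column list used in the test is illustrative only. The PR does not list the existing index columns.

```typescript
// Hypothetical helper: emits a DO block that drops and recreates an index only
// when its current definition does not mention `endpoint`, so an index that an
// operator pre-built with CREATE INDEX CONCURRENTLY turns the migration into a no-op.
function guardedIndexRebuild(indexName: string, table: string, columns: string[]): string {
  // Identifiers are interpolated into SQL, so accept only plain lowercase names.
  const ident = /^[a-z_][a-z0-9_]*$/;
  for (const name of [indexName, table, ...columns]) {
    if (!ident.test(name)) throw new Error(`unsafe identifier: ${name}`);
  }
  return [
    "DO $$",
    "BEGIN",
    "  IF NOT EXISTS (",
    "    SELECT 1 FROM pg_indexes",
    `    WHERE indexname = '${indexName}' AND indexdef LIKE '%endpoint%'`,
    "  ) THEN",
    `    DROP INDEX IF EXISTS ${indexName};`,
    `    CREATE INDEX ${indexName} ON ${table} (${columns.join(", ")});`,
    "  END IF;",
    "END $$;",
  ].join("\n");
}
```

A production guard would probably also filter `pg_indexes` by `schemaname`. This sketch omits that for brevity.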
🧪 Test Results
Overall result: ✅ All tests passed
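* feat(providers): raise the provider key length limit to 1 MiB

Provider API keys were previously capped at 1024 characters by Zod validation, so long tokens, JSON credentials, base64-encoded certificates and similar values were rejected. The database column providers.key is an unbounded varchar with no index, so the storage layer never imposed this limit.

- Add the constant PROVIDER_KEY_MAX_LENGTH (1 MiB) as the single source for the key length limit
- Update the 4 Provider create/update Zod schemas across the admin console and the v1 REST API
- Add key-length unit tests for CreateProvider/UpdateProvider and the v1 schemas

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(providers): unify the validation error messages for UpdateProviderSchema.key

Align with CreateProviderSchema.key by adding Chinese-language error messages for the min/max checks. Addresses code review feedback on PR #1195 (gemini-code-assist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ding113 <ding113@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>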
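The shared limit can be sketched without Zod. In Zod this would correspond to a rule like `z.string().min(1).max(PROVIDER_KEY_MAX_LENGTH)`. The sketch rests on three assumptions: "1 MiB" is read as 1024 × 1024 characters, the minimum is 1, and `checkProviderKey` is a hypothetical stand-in for the real schemas. `String.prototype.length` counts UTF-16 code units, which is the same measure Zod's string `max` uses.

```typescript
// Assumed value: "1 MiB" interpreted as 1024 * 1024 characters.
const PROVIDER_KEY_MAX_LENGTH = 1024 * 1024;

type KeyCheck = { ok: true } | { ok: false; error: string };

// Hypothetical dependency-free stand-in for the Zod min/max key rule.
function checkProviderKey(key: string): KeyCheck {
  if (key.length < 1) return { ok: false, error: "API key must not be empty" };
  if (key.length > PROVIDER_KEY_MAX_LENGTH) {
    return { ok: false, error: `API key must be at most ${PROVIDER_KEY_MAX_LENGTH} characters` };
  }
  return { ok: true };
}
```

Keeping the number in one constant means the four schemas cannot drift apart the next time the limit changes.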
🧪 Test Results
Overall result: ✅ All tests passed
Summary
Release v0.8.1: merges `dev` into `main` with 18+ bundled PRs spanning proxy WebSocket support, the v1 REST management API, provider custom headers, fake streaming, billing fixes, and stability hardening.

Included Changes
Proxy & Core Features
- `stream_options.include_usage=true` on forwarded streaming chat completions so token accounting is never empty.
- `system` content blocks, not just `messages[].content[]`.
- `pipe()` with `stream.pipeline()`, adds single-fire settled flags, and registers Node diagnostic report hooks to eliminate SIGSEGV crashes under load.
- `cached_tokens` are now correctly subtracted from `input_tokens` for `openai-compatible` providers (previously only `codex` was handled).

Management API (v1 REST)
- `/api/v1/*` REST surface: full CRUD for providers, keys, users, webhooks, and notification settings, with OpenAPI/Scalar docs, RFC 9457 problem responses, and TanStack Query client wrappers.
- `GET /api/v1/providers?include=statistics` now returns nested `statistics` objects and mirrors them into legacy top-level fields.
- `:reveal` endpoint: an admin or owner can fetch an unmasked API key on demand; unmasked keys are no longer pre-loaded in list payloads.

Provider Features
- `cf-aig-authorization`); merged into outbound proxy requests.

Billing & Settings
- `billNonSuccessfulRequests` (default OFF); when ON, non-2xx responses with positive upstream usage are billed normally.

Infrastructure & Deployment
- `.next/required-server-files.json` so `proxyClientMaxBodySize: 100mb` is honored.
- `FETCH_HEADERS_TIMEOUT` / `FETCH_BODY_TIMEOUT` env vars (were interpreted as 600ms instead of 600s).
- `--report-on-fatalerror`, `--report-uncaught-exception`; docker-compose mounts `./data/reports`.

Database Migrations
| Migration | Change |
| --- | --- |
| `nervous_squadron_sinister` | `custom_headers` JSONB on `providers` |
| `equal_expediter` | `fake_streaming_whitelist` JSONB on `system_settings` |
| `worthless_gauntlet` | `enable_openai_responses_websocket` on `system_settings` |
| `useful_lionheart` | `bill_non_successful_requests` boolean on `system_settings` |

Run `bun run db:migrate` after deploy (or set `AUTO_MIGRATE=true`).

Fixed Issues
- #1150: `dev` branch + CPA upstream: Codex still reports "Connection reset without closing handshake" after the `/v1/responses` WebSocket connects successfully (WS connection reset without closing handshake).

Breaking Changes
None. All changes are additive or backward-compatible fixes. The legacy `/api/actions/*` surface remains functional (deprecated behind feature flags).

Testing
- `ADD COLUMN IF NOT EXISTS`)

Post-Deploy Checklist
- `bun run db:migrate` (or verify `AUTO_MIGRATE=true`)
- `[Lifecycle] Process started` appears in container logs
- `./data/reports` is writable for Node diagnostic reports
- `GET /api/v1/keys/{id}:reveal` returns 200 (not 400)

This release bundles PRs #1101, #1131, #1132, #1133, #1140, #1142, #1149, #1151, #1152, #1153, #1154, #1155, #1158, #1161, #1162, #1163, #1166, #1174