feat(realtime): /v1/realtime live transcription endpoint (gpt-realtime-whisper) by timcsy · Pull Request #86 · timcsy/ai-api

timcsy · 2026-06-12T14:52:09Z

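Phase 32 (043): live captioning endpoint `/v1/realtime`

An OpenAI-compatible realtime transcription WebSocket endpoint: clients stream audio in and receive text in real time. Usage is metered by duration, attributed to the caller's allocation, and revocable mid-connection.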

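What was done

WS relay (proxy/realtime.py): a thin bidirectional relay that borrows the structure of litellm's RealTimeStreaming but does not go through litellm realtime. The latter is Proxy form, so audio would bypass the gateway and we would lose allocation attribution and live revocation. Includes a side-channel revocation watcher that re-checks the allocation every N seconds and actively disconnects as soon as it is no longer active.
Metering (research R2): the gateway computes duration itself from the PCM bytes in the client's input_audio_buffer.append events. Every disconnect path (normal, abnormal, revoked) writes one billing record, so no usage goes unrecorded.
Billing unit aligned with litellm: gpt-realtime-whisper (PR "Add gpt-realtime-whisper Realtime transcription support (OpenAI + Azure)", BerriAI/litellm#29775) is priced per second (input_cost_per_second). Records are written in the PriceList's unit (second or minute), reusing call_records.{quantity,unit} from increment ②, so zero migrations.
realtime is a capability axis, not a mode: litellm marks gpt-realtime-whisper as mode=audio_transcription with supported_endpoints containing /v1/realtime. model_kind classifies a model as realtime from that (or from an admin realtime capability marker), the same shape as responses_support.
Testable from the UI: on the model detail page, "Test model" runs a WS smoke test against realtime models (handshake + a tiny amount of audio + wait for the first event), which amounts to a real protocol check after deployment.
Frontend: realtime kind label, /v1/realtime WS connection example, "minute" added to the price-list units, and a hint about the realtime marker in the capabilities field.
nginx: WS upgrade added for /v1/realtime (HTTP/1.1 Upgrade + no buffering + long timeout).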
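
The metering and unit-alignment notes above can be sketched roughly as follows. This is an illustration, not the code in proxy/realtime.py. It assumes 16-bit mono PCM at 24 kHz (the sample format is not stated in this PR) and that each input_audio_buffer.append event carries base64-encoded audio in an `audio` field. The names pcm_bytes_to_seconds and session_quantity appear in the commit notes, but the signatures here are guesses, and `SessionMeter` is hypothetical.

```python
import base64
import json
from dataclasses import dataclass

# Assumed audio format: 16-bit mono PCM at 24 kHz. Adjust if the session differs.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def pcm_bytes_to_seconds(n_bytes: int) -> float:
    """Convert a raw PCM byte count to audio duration in seconds."""
    return n_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def session_quantity(n_bytes: int, unit: str = "second") -> float:
    """Express the session's audio duration in the price list's unit."""
    seconds = pcm_bytes_to_seconds(n_bytes)
    if unit == "second":
        return seconds
    if unit == "minute":
        return seconds / 60
    raise ValueError(f"unsupported billing unit: {unit!r}")


@dataclass
class SessionMeter:
    """Hypothetical accumulator fed with every client-to-upstream frame."""

    pcm_bytes: int = 0

    def observe(self, raw_frame: str) -> None:
        event = json.loads(raw_frame)
        if event.get("type") == "input_audio_buffer.append":
            # Count decoded bytes, not the length of the base64 text.
            self.pcm_bytes += len(base64.b64decode(event.get("audio", "")))


meter = SessionMeter()
one_second = base64.b64encode(b"\x00" * 48_000).decode()
for _ in range(90):
    meter.observe(json.dumps({"type": "input_audio_buffer.append", "audio": one_second}))
meter.observe(json.dumps({"type": "input_audio_buffer.commit"}))  # not audio, ignored

print(session_quantity(meter.pcm_bytes, "second"))  # 90.0
print(session_quantity(meter.pcm_bytes, "minute"))  # 1.5
```

Because the byte count accumulates as frames pass through the relay, the same total is available whichever path ends the connection.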

No leakage (FR-006)

The upstream key and endpoint are never sent downstream; tests verify this.

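Tests

Contracts 1–7 (invalid/revoked key, non-realtime model, delta relay, billing on close, abnormal abort, in-flight revocation, no-leak) + pure metering unit tests + capability-detection unit tests + WS smoke unit/integration tests.
Full suite: 742 passed (zero regressions); ruff + mypy clean; frontend tsc + 164 vitest + build green.
SC-006: git diff of the existing contract tests is empty.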

Deployment

No migration and no new packages (websockets is already declared as a direct dependency). migrationJob.enabled=false.

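Live verification (after merge and deployment)

CI uses a mock provider WS throughout. A real connection to the Azure realtime WS (protocol connectivity, metering reconciliation, and the exact Azure URL built by upstream._build_realtime_url, which may need intent=transcription) is to be verified after deployment via the "Test model" button / quickstart.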

🤖 Generated with Claude Code

…minute unit, model_kind)

Phase 32 realtime transcription endpoint: full speckit spec package (spec, plan, research, data-model, WS event contract, quickstart, tasks) plus the self-contained foundational layer:

- websockets declared as a direct dependency (was transitive via uvicorn); needed to relay /v1/realtime to the provider's realtime WS (Constitution Deviation noted).
- model_kind: add `realtime` kind (mode→kind) so the catalog labels realtime models honestly; full suite re-run green (715 passed) per the model_kind lesson.
- minute billing unit verified through the existing unit-billing path (calculate_unit_cost is unit-agnostic; `minute` is a new string value, no schema change) + test.

Foundational logic (T001/T003/T006) done & green. The WS core is the next focused block: upstream WS client, mock provider WS server, bidirectional relay (US1), per-minute metering (US2), in-flight revocation (US3). T027 real Azure WS smoke needs credentials.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…per-minute billing, in-flight revocation

WS core for the realtime transcription endpoint (US1/US2/US3):

- proxy/realtime.py: thin bidirectional relay (borrowing litellm RealTimeStreaming structure, not its Proxy-form realtime) + side-channel revocation watcher + self-counted per-minute metering from input_audio_buffer.append PCM bytes (R2). Any close path bills one CallRecord(unit="minute") attributed to the allocation (FR-004: abnormal abort never loses usage). Never leaks upstream key/endpoint.
- upstream.open_realtime_ws: websockets client to the provider realtime WS, injecting the credential as api-key/Bearer (exact Azure URL validated in T027).
- handle_realtime takes an injectable open_upstream/check_active so CI exercises the full preflight→relay→metering→revocation path against a fake provider WS in-loop (engine is bound to the test loop; a TestClient portal would break the DB).
- Frontend: realtime KIND_LABEL, /v1/realtime WS usage example, prices 'minute' unit.
- nginx: /v1/realtime WS upgrade (HTTP/1.1 Upgrade + no buffering + long timeout).

Tests: contract 1-7 (invalid/revoked key, non-realtime model, delta relay, clean-close billing, abnormal-abort billing, in-flight revoke, no-leak) + pure metering unit tests. Full suite 731 passed (715→731), zero regression; ruff+mypy clean; frontend tsc + 164 vitest + build green. SC-006: existing contract tests untouched. T027 (real Azure realtime WS smoke) remains for a credentialed environment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
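
A rough sketch of the side-channel watcher and the bill-on-every-close-path idea described in this commit. Only the injectable check_active idea is taken from the text; `watch_revocation`, `run_session`, `relay`, and `bill` are hypothetical names, and the real handle_realtime signature is not shown in this PR.

```python
import asyncio
from typing import Awaitable, Callable


async def watch_revocation(
    check_active: Callable[[], Awaitable[bool]],
    interval_s: float,
) -> None:
    """Poll the allocation; return once it is no longer active."""
    while True:
        await asyncio.sleep(interval_s)
        if not await check_active():
            return


async def run_session(
    relay: Callable[[], Awaitable[None]],
    check_active: Callable[[], Awaitable[bool]],
    bill: Callable[[str], None],
    interval_s: float = 5.0,
) -> str:
    """Run relay and watcher side by side; bill exactly once on any exit."""
    relay_task = asyncio.create_task(relay())
    watch_task = asyncio.create_task(watch_revocation(check_active, interval_s))
    reason = "abnormal"  # default if anything below fails or is cancelled
    try:
        done, _ = await asyncio.wait(
            {relay_task, watch_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if watch_task in done:
            reason = "revoked"
        elif relay_task.exception() is None:
            reason = "closed"
    finally:
        for task in (relay_task, watch_task):
            task.cancel()  # no-op for the task that already finished
        await asyncio.gather(relay_task, watch_task, return_exceptions=True)
        bill(reason)  # runs for clean close, abnormal abort, and revocation
    return reason


async def demo() -> list[str]:
    billed: list[str] = []
    checks = iter([True, True, False])  # allocation revoked on the third poll

    async def relay() -> None:
        await asyncio.sleep(3600)  # stand-in for a long-lived audio stream

    async def check_active() -> bool:
        return next(checks)

    await run_session(relay, check_active, billed.append, interval_s=0.01)
    return billed


print(asyncio.run(demo()))  # ['revoked']
```

Putting the billing call in `finally` is one way to get the "any close path bills one record" property: the relay raising, the client closing cleanly, and the watcher firing all pass through the same exit.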

…st model" button

realtime was excluded from the recipe table (it's a bidirectional WS, not a one-shot call), so the UI test button was disabled. Add a WS-smoke recipe instead:

- upstream.realtime_smoke: opens the upstream realtime WS, runs the session handshake + a tiny silent-audio append, awaits the first server event. A non-error event proves egress (wss:443) + key + deployment + protocol, i.e. the T027 reachability check, now runnable straight from the deployed UI. Raises on error/timeout.
- RECIPES["realtime"] = WS smoke, billable=True (gated by the existing confirm dialog; admin test writes only an audit event, never a member CallRecord).

Now the model-detail page shows kind "即時字幕（realtime）" ("live captions (realtime)") with an enabled (billable-confirmed) test button. Full suite 735 passed (731→735); ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
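
The smoke recipe in this commit can be pictured along these lines. This is a sketch under assumptions, not upstream.realtime_smoke itself: the connection is duck-typed (any object with async `send`/`recv`, which a websockets client connection also provides), the handshake payload is supplied by the caller because its exact shape is not given here, and the 24 kHz pcm16 format plus the `session.update` / `session.created` names in the demo are placeholders.

```python
import asyncio
import base64
import json
from typing import Any, Protocol


class RealtimeConn(Protocol):
    """Minimal connection surface needed by the smoke check."""

    async def send(self, message: str) -> None: ...
    async def recv(self) -> str: ...


async def realtime_smoke(
    ws: RealtimeConn,
    handshake: dict[str, Any],
    timeout_s: float = 10.0,
) -> dict[str, Any]:
    """Send the handshake and a sliver of silence; return the first server event.

    Raises RuntimeError on an error event and asyncio.TimeoutError if nothing arrives.
    """
    await ws.send(json.dumps(handshake))
    # About 0.1 s of silence, assuming 24 kHz 16-bit mono PCM.
    silence = base64.b64encode(b"\x00" * 4_800).decode()
    await ws.send(json.dumps({"type": "input_audio_buffer.append", "audio": silence}))
    event = json.loads(await asyncio.wait_for(ws.recv(), timeout_s))
    if event.get("type") == "error":
        raise RuntimeError(f"upstream rejected the session: {event.get('error')}")
    return event


class FakeProvider:
    """In-memory stand-in for the provider WS, as a CI mock might look."""

    def __init__(self, first_event: dict[str, Any]) -> None:
        self.sent: list[dict[str, Any]] = []
        self._first_event = first_event

    async def send(self, message: str) -> None:
        self.sent.append(json.loads(message))

    async def recv(self) -> str:
        return json.dumps(self._first_event)


ok = FakeProvider({"type": "session.created"})
event = asyncio.run(realtime_smoke(ok, {"type": "session.update"}))
print(event["type"], len(ok.sent))  # session.created 2
```

Treating any non-error first event as success keeps the check cheap: it exercises egress, credentials, and protocol without waiting for a transcript.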

…nts), not a litellm mode

litellm PR #29775 (gpt-realtime-whisper, merged 2026-06-11) ships the model as mode=audio_transcription and signals realtime via supported_endpoints containing /v1/realtime, i.e. realtime is a capability axis, not a mode (same shape as responses_support). The earlier mode==realtime gate would never match any Azure model. Fix:

- model_kind: realtime is capability-derived: raw.supported_endpoints lists /v1/realtime OR an admin `realtime` capability marker (`realtime:blocked` force-disables, manual wins). gpt-realtime-whisper (audio_transcription) → realtime; whisper-1 (no /v1/realtime) stays stt. Everything keyed on model_kind (endpoint gate, test recipe, catalog label) now works for the real model.
- billing: bill in the PriceList's unit. litellm prices realtime transcription per SECOND (input_cost_per_second), so default to `second` when unpriced; `minute` still honoured. Adds pcm_bytes_to_seconds + session_quantity.
- model-detail: hint that adding the `realtime` capability marks a manually-added model as realtime (needed until litellm's price-map entry, currently clobbered by a json regen on main, is restored, after which import auto-detects it).

Full suite 742 passed; ruff+mypy clean; frontend tsc + 164 vitest green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
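
The capability-derived rule in this commit could look roughly like the following. The function signatures, the `capabilities` argument, the `"unknown"` fallback, and the non-realtime endpoint string in the demo are assumptions for illustration; only the rule itself (supported_endpoints containing /v1/realtime, or a `realtime` marker, with `realtime:blocked` winning) comes from the text.

```python
from typing import Any, Iterable, Mapping


def is_realtime(raw: Mapping[str, Any], capabilities: Iterable[str] = ()) -> bool:
    """Capability-derived realtime check; admin markers override the price map."""
    caps = set(capabilities)
    if "realtime:blocked" in caps:  # manual force-disable wins
        return False
    if "realtime" in caps:  # manual marker for hand-added models
        return True
    return "/v1/realtime" in (raw.get("supported_endpoints") or [])


def model_kind(raw: Mapping[str, Any], capabilities: Iterable[str] = ()) -> str:
    """Simplified kind resolution: only the realtime and stt branches are sketched."""
    if is_realtime(raw, capabilities):
        return "realtime"
    if raw.get("mode") == "audio_transcription":
        return "stt"
    return str(raw.get("mode") or "unknown")


# Illustrative price-map entries, not copied from litellm.
whisper_rt = {"mode": "audio_transcription", "supported_endpoints": ["/v1/realtime"]}
whisper_1 = {"mode": "audio_transcription", "supported_endpoints": ["/v1/audio/transcriptions"]}

print(model_kind(whisper_rt))                        # realtime
print(model_kind(whisper_1))                         # stt
print(model_kind(whisper_1, ["realtime"]))           # realtime
print(model_kind(whisper_rt, ["realtime:blocked"]))  # stt
```

Checking the capability before the mode is what lets a model with mode=audio_transcription still resolve to realtime, which a mode==realtime gate could never do.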

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

timcsy and others added 4 commits June 12, 2026 14:53

Copilot AI review requested due to automatic review settings June 12, 2026 14:52

Copilot AI reviewed Jun 12, 2026

timcsy merged commit 0dba616 into main Jun 12, 2026
5 of 6 checks passed

timcsy mentioned this pull request Jun 12, 2026

fix(realtime): correct Azure transcription WS URL (intent=transcription, no deployment) #87

Merged

timcsy merged 4 commits into main from 043-realtime-transcription
