feat(realtime): /v1/realtime live transcription endpoint (gpt-realtime-whisper) #86
Merged
…minute unit, model_kind)

Phase 32 realtime transcription endpoint: full speckit spec package (spec, plan, research, data-model, WS event contract, quickstart, tasks) plus the self-contained foundational layer:

- websockets declared as a direct dependency (was transitive via uvicorn); needed to relay /v1/realtime to the provider's realtime WS (Constitution Deviation noted).
- model_kind: add `realtime` kind (mode→kind) so the catalog labels realtime models honestly; full suite re-run green (715 passed) per the model_kind lesson.
- minute billing unit verified through the existing unit-billing path (calculate_unit_cost is unit-agnostic; `minute` is a new string value, no schema change) + test.

Foundational logic (T001/T003/T006) done & green. The WS core is the next focused block: upstream WS client, mock provider WS server, bidirectional relay (US1), per-minute metering (US2), in-flight revocation (US3). T027 real Azure WS smoke needs credentials.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
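To illustrate why a unit-agnostic cost path needs no schema change for `minute`, here is a minimal sketch. The function name `unit_cost_sketch`, the record shape, and the price are hypothetical; the real `calculate_unit_cost` signature is not shown in this PR.

```python
from decimal import Decimal


def unit_cost_sketch(quantity: Decimal, price_per_unit: Decimal) -> Decimal:
    """Cost is quantity times unit price; the unit name never enters the math."""
    return quantity * price_per_unit


def make_record(quantity: Decimal, unit: str, price_per_unit: Decimal) -> dict:
    # The unit travels along as an opaque string label, so "minute" is just
    # one more value next to whatever units already exist.
    return {
        "quantity": quantity,
        "unit": unit,
        "cost": unit_cost_sketch(quantity, price_per_unit),
    }


# Hypothetical price of 0.006 per minute, 2.5 minutes of audio.
record = make_record(Decimal("2.5"), "minute", Decimal("0.006"))
```

Because the unit is only a label on the record, adding a new unit is a data change rather than a code or schema change.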
…per-minute billing, in-flight revocation

WS core for the realtime transcription endpoint (US1/US2/US3):

- proxy/realtime.py: thin bidirectional relay (borrowing litellm RealTimeStreaming structure, not its Proxy-form realtime) + side-channel revocation watcher + self-counted per-minute metering from input_audio_buffer.append PCM bytes (R2). Any close path bills one CallRecord(unit="minute") attributed to the allocation (FR-004: abnormal abort never loses usage). Never leaks upstream key/endpoint.
- upstream.open_realtime_ws: websockets client to the provider realtime WS, injecting the credential as api-key/Bearer (exact Azure URL validated in T027).
- handle_realtime takes an injectable open_upstream/check_active so CI exercises the full preflight→relay→metering→revocation path against a fake provider WS in-loop (engine is bound to the test loop; a TestClient portal would break the DB).
- Frontend: realtime KIND_LABEL, /v1/realtime WS usage example, prices 'minute' unit.
- nginx: /v1/realtime WS upgrade (HTTP/1.1 Upgrade + no buffering + long timeout).

Tests: contract 1-7 (invalid/revoked key, non-realtime model, delta relay, clean-close billing, abnormal-abort billing, in-flight revoke, no-leak) + pure metering unit tests. Full suite 731 passed (715→731), zero regression; ruff+mypy clean; frontend tsc + 164 vitest + build green. SC-006: existing contract tests untouched. T027 (real Azure realtime WS smoke) remains for a credentialed environment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
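The "any close path bills one record" rule and the side-channel revocation watcher can be sketched with plain asyncio. This is a minimal illustration, not the code in proxy/realtime.py: `run_session`, `revocation_watcher`, the `bill` callback, and the reason strings are hypothetical names, and the relay itself is passed in as an opaque coroutine factory.

```python
import asyncio
from typing import Awaitable, Callable


async def revocation_watcher(
    check_active: Callable[[], Awaitable[bool]], interval_s: float
) -> None:
    """Return as soon as the allocation is no longer active."""
    while await check_active():
        await asyncio.sleep(interval_s)


async def run_session(
    relay: Callable[[], Awaitable[None]],
    check_active: Callable[[], Awaitable[bool]],
    bill: Callable[[str], Awaitable[None]],
    interval_s: float = 5.0,
) -> str:
    """Race the relay against the watcher; bill exactly once on every exit."""
    relay_task = asyncio.ensure_future(relay())
    watch_task = asyncio.ensure_future(revocation_watcher(check_active, interval_s))
    reason = "aborted"  # stays in place if anything below raises
    try:
        done, _ = await asyncio.wait(
            {relay_task, watch_task}, return_when=asyncio.FIRST_COMPLETED
        )
        if relay_task in done:
            relay_task.result()  # re-raises a relay failure
            reason = "closed"
        else:
            watch_task.result()  # re-raises a failed allocation check
            reason = "revoked"
        return reason
    finally:
        # Stop whichever side is still running, then write the usage record.
        for task in (relay_task, watch_task):
            task.cancel()
        await asyncio.gather(relay_task, watch_task, return_exceptions=True)
        await bill(reason)
```

Putting the billing call in `finally` is what makes clean close, abnormal abort, and revocation share one accounting path.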
…st model" button

realtime was excluded from the recipe table (it's a bidirectional WS, not a one-shot call), so the UI test button was disabled. Add a WS-smoke recipe instead:

- upstream.realtime_smoke: opens the upstream realtime WS, runs the session handshake + a tiny silent-audio append, awaits the first server event. A non-error event proves egress(wss:443)+key+deployment+protocol, i.e. the T027 reachability check, now runnable straight from the deployed UI. Raises on error/timeout.
- RECIPES["realtime"] = WS smoke, billable=True (gated by the existing confirm dialog; admin test writes only an audit event, never a member CallRecord).

Now the model-detail page shows kind "即時字幕(realtime)" with an enabled (billable-confirmed) test button. Full suite 735 passed (731→735); ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
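A rough sketch of the smoke check's core step follows: send a short silent-audio append, wait for the first server event, and fail on an error event or timeout. It is not the actual `upstream.realtime_smoke`. The `WsLike` protocol and `smoke_first_event` are hypothetical names, the connection is injected so no network is needed, the session handshake is left out because its payload is not given in this PR, and the 0.1 s of 16-bit 24 kHz mono silence is an assumed audio format.

```python
import asyncio
import base64
import json
from typing import Any, Protocol


class WsLike(Protocol):
    """Smallest connection surface the sketch needs (hypothetical)."""

    async def send(self, message: str) -> None: ...

    async def recv(self) -> str: ...


async def smoke_first_event(ws: WsLike, timeout_s: float = 10.0) -> dict[str, Any]:
    # Assumed format: 0.1 s of 16-bit mono PCM at 24 kHz = 4800 zero bytes.
    silence = base64.b64encode(b"\x00" * 4800).decode("ascii")
    await ws.send(json.dumps({"type": "input_audio_buffer.append", "audio": silence}))

    # A timeout here surfaces as asyncio.TimeoutError to the caller.
    raw = await asyncio.wait_for(ws.recv(), timeout=timeout_s)
    event = json.loads(raw)
    if event.get("type") == "error":
        raise RuntimeError("realtime smoke failed: upstream returned an error event")
    return event
```

The error message deliberately carries no upstream detail, in line with the no-leak requirement stated elsewhere in this PR.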
…nts), not a litellm mode

litellm PR #29775 (gpt-realtime-whisper, merged 2026-06-11) ships the model as mode=audio_transcription and signals realtime via supported_endpoints containing /v1/realtime, i.e. realtime is a capability axis, not a mode (same shape as responses_support). The earlier mode==realtime gate would never match any Azure model. Fix:

- model_kind: realtime is capability-derived: raw.supported_endpoints lists /v1/realtime OR an admin `realtime` capability marker (`realtime:blocked` force-disables, manual wins). gpt-realtime-whisper (audio_transcription) → realtime; whisper-1 (no /v1/realtime) stays stt. Everything keyed on model_kind (endpoint gate, test recipe, catalog label) now works for the real model.
- billing: bill in the PriceList's unit. litellm prices realtime transcription per SECOND (input_cost_per_second), so default to `second` when unpriced; `minute` still honoured. Adds pcm_bytes_to_seconds + session_quantity.
- model-detail: hint that adding the `realtime` capability marks a manually-added model as realtime (needed until litellm's price-map entry, currently clobbered by a json regen on main, is restored, after which import auto-detects it).

Full suite 742 passed; ruff+mypy clean; frontend tsc + 164 vitest green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
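The capability-derived rule described above might look like the sketch below. It assumes the raw litellm entry is a plain dict and that admin markers arrive as a list of strings; the function names and the `"other"` fallback are illustrative, not the project's actual `model_kind` implementation.

```python
from typing import Any, Iterable

REALTIME_ENDPOINT = "/v1/realtime"


def is_realtime(raw: dict[str, Any], capabilities: Iterable[str] = ()) -> bool:
    """Manual markers win; otherwise fall back to supported_endpoints."""
    caps = set(capabilities)
    if "realtime:blocked" in caps:
        return False  # admin force-disable beats everything else
    if "realtime" in caps:
        return True  # admin marker for manually added models
    return REALTIME_ENDPOINT in (raw.get("supported_endpoints") or [])


def model_kind_sketch(raw: dict[str, Any], capabilities: Iterable[str] = ()) -> str:
    if is_realtime(raw, capabilities):
        return "realtime"
    if raw.get("mode") == "audio_transcription":
        return "stt"
    return "other"  # placeholder for the kinds this sketch does not model


whisper_rt = {
    "mode": "audio_transcription",
    "supported_endpoints": ["/v1/audio/transcriptions", "/v1/realtime"],
}
whisper_1 = {"mode": "audio_transcription"}
```

With these inputs, `whisper_rt` classifies as realtime and `whisper_1` stays stt, matching the two cases named in the commit message.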
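Phase 32 (043): live captioning endpoint
/v1/realtime: an OpenAI-compatible realtime transcription WebSocket endpoint. Clients stream audio and receive text in real time; usage is metered by time, attributed to the allocation, and revocable mid-connection.
What was done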
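- proxy/realtime.py: thin bidirectional relay. It borrows the structure of litellm's RealTimeStreaming but does not go through litellm realtime, which is Proxy form: audio bypasses the gateway, so allocation attribution and real-time revocation would be lost. It includes a side-channel revocation watcher that re-checks the allocation every N seconds and actively disconnects once the allocation is no longer active.
- The PCM bytes of input_audio_buffer.append are converted to duration; every disconnect path (normal, abnormal, revoked) writes one billing record, so no usage goes unrecorded.
- gpt-realtime-whisper (PR "Add gpt-realtime-whisper Realtime transcription support (OpenAI + Azure)", BerriAI/litellm#29775) is priced per second (input_cost_per_second); billing follows the PriceList's unit (second/minute) and reuses call_records.{quantity,unit} from increment ②, so zero migration.
- gpt-realtime-whisper is tagged mode=audio_transcription with supported_endpoints containing /v1/realtime. model_kind classifies it as realtime from that (or from an admin `realtime` capability marker), the same shape as responses_support.
- Frontend: /v1/realtime WS connection example, "minute" added to the price units, and a hint about the `realtime` marker in the capabilities field.
- nginx: /v1/realtime gets WS upgrade (HTTP/1.1 Upgrade + no buffering + long timeout).

No leak (FR-006)
The upstream key/endpoint is never sent downstream; tests verify this.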
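The byte-to-duration metering mentioned in the list above can be sketched as follows. The names `pcm_bytes_to_seconds` and `session_quantity` come from the commit message, but the signatures here are assumptions, `AudioMeter` is a hypothetical helper, and the 16-bit mono 24 kHz PCM format is an assumed default that should follow whatever the session actually negotiates.

```python
import base64
import json
from decimal import Decimal

# Assumed input format: 16-bit mono PCM at 24 kHz.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2


def pcm_bytes_to_seconds(
    n_bytes: int,
    sample_rate: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
) -> Decimal:
    return Decimal(n_bytes) / Decimal(sample_rate * bytes_per_sample)


def session_quantity(n_bytes: int, unit: str = "second") -> Decimal:
    """Express the session's audio in the price list's unit."""
    seconds = pcm_bytes_to_seconds(n_bytes)
    if unit == "second":
        return seconds
    if unit == "minute":
        return seconds / Decimal(60)
    raise ValueError(f"unsupported billing unit: {unit!r}")


class AudioMeter:
    """Counts decoded PCM bytes from client input_audio_buffer.append events."""

    def __init__(self) -> None:
        self.pcm_bytes = 0

    def observe(self, raw_message: str) -> None:
        try:
            event = json.loads(raw_message)
        except ValueError:
            return  # not JSON: relay it untouched, nothing to meter
        if not isinstance(event, dict):
            return
        if event.get("type") != "input_audio_buffer.append":
            return
        audio = event.get("audio")
        if isinstance(audio, str):
            try:
                self.pcm_bytes += len(base64.b64decode(audio))
            except ValueError:
                pass  # malformed base64 carries no billable audio
```

Counting on the gateway side, rather than trusting an upstream usage report, means an abruptly dropped connection still has a byte total to bill from.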
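Tests
Deployment
No migration and no new packages (websockets is already declared as a direct dependency). migrationJob.enabled=false.
Live verification (after merge and deploy)
CI uses a mock provider WS throughout. The real connection to the Azure realtime WS (protocol handshake, metering reconciliation, and the exact Azure URL built by upstream._build_realtime_url, which may need intent=transcription) is verified after deployment via the "Test model" button or the quickstart.
🤖 Generated with Claude Code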