feat(realtime): /v1/realtime live transcription endpoint (gpt-realtime-whisper)#86

Merged
timcsy merged 4 commits into main from 043-realtime-transcription
Jun 12, 2026
@timcsy timcsy commented Jun 12, 2026

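Phase 32 (043): live captioning endpoint /v1/realtime

An OpenAI-compatible realtime transcription WebSocket endpoint: the client streams audio and receives text in real time; usage is billed by time, attributed to the caller's allocation, and access can be revoked mid-connection.

What this does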

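  • WS relay (proxy/realtime.py): a thin bidirectional relay that borrows the structure of litellm's RealTimeStreaming but does not go through litellm realtime. The litellm version is Proxy form, so audio would bypass the gateway and we would lose allocation attribution and in-flight revocation. Includes a side-channel revocation watcher that re-checks the allocation every N seconds and actively closes the connection once it is no longer active.
  • Metering (research R2): self-counts the PCM bytes in client input_audio_buffer.append events and converts them to duration; every disconnect path (normal, abnormal, revoked) writes one billing record, so no usage is lost.
  • Billing unit aligned with litellm: gpt-realtime-whisper (PR "Add gpt-realtime-whisper Realtime transcription support (OpenAI + Azure)", BerriAI/litellm#29775) is priced per second (input_cost_per_second); records are written in the PriceList's unit (second/minute), reusing call_records.{quantity,unit} from increment ②, so zero migrations.
  • realtime is a capability axis, not a mode: litellm ships gpt-realtime-whisper as mode=audio_transcription with supported_endpoints containing /v1/realtime; model_kind uses this (or an admin realtime capability marker) to classify a model as realtime, the same shape as responses_support.
  • Testable from the UI: the "Test model" action on the model detail page runs a WS smoke test against realtime models (handshake + a tiny amount of audio + wait for the first event), which amounts to a real protocol check after deployment.
  • Frontend: realtime kind label, /v1/realtime WS connection example, "minute" added to the price units, and a hint about the realtime marker in the capabilities field.
  • nginx: /v1/realtime gets WS upgrade (HTTP/1.1 Upgrade + no buffering + long timeout).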
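
The metering bullet above can be sketched roughly as follows. This is a minimal illustration, not the code in proxy/realtime.py: the names pcm_bytes_to_seconds and session_quantity appear in the commits, but their signatures here are guesses, AudioMeter is a hypothetical helper, and the audio format (16-bit mono PCM at 24 kHz, carried base64-encoded in the event's `audio` field) is an assumption about the negotiated session.

```python
import base64
import json

# Assumed audio format: 16-bit mono PCM at 24 kHz. Adjust these constants if
# the session negotiates a different format.
SAMPLE_RATE_HZ = 24_000
BYTES_PER_SAMPLE = 2


def pcm_bytes_to_seconds(n_bytes: int) -> float:
    """Convert a raw PCM byte count to audio duration in seconds."""
    return n_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def session_quantity(n_bytes: int, unit: str) -> float:
    """Express the metered duration in the price list's unit."""
    seconds = pcm_bytes_to_seconds(n_bytes)
    if unit == "second":
        return seconds
    if unit == "minute":
        return seconds / 60
    raise ValueError(f"unsupported billing unit: {unit!r}")


class AudioMeter:
    """Hypothetical accumulator for bytes seen in client append events."""

    def __init__(self) -> None:
        self.pcm_bytes = 0

    def observe(self, raw_message: str) -> None:
        """Inspect one client-to-upstream text frame; ignore anything else."""
        try:
            event = json.loads(raw_message)
        except json.JSONDecodeError:
            return
        if isinstance(event, dict) and event.get("type") == "input_audio_buffer.append":
            # Count decoded PCM bytes, not the length of the base64 text.
            self.pcm_bytes += len(base64.b64decode(event.get("audio") or ""))
```

Counting on the relay side means the bill does not depend on the provider reporting usage, which is what lets an abnormal close still produce a record from whatever bytes were seen so far.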

No leakage (FR-006)

The upstream key/endpoint is never sent downstream; a test proves it.

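Tests

  • Contracts 1–7 (invalid/revoked key, non-realtime model, delta relay, billing record, abnormal abort, in-flight revocation, no-leak) + pure metering unit tests + capability-detection unit tests + WS smoke unit/integration tests.
  • Full suite: 742 passed (zero regressions); ruff + mypy clean; frontend tsc + 164 vitest + build green.
  • SC-006: the git diff of the existing contract tests is empty.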

Deployment

No migration and no new packages (websockets is already declared as a direct dependency). migrationJob.enabled=false

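Live verification (after merge and deploy)

CI uses a mock provider WS throughout. The real connection to the Azure realtime WS (protocol handshake, metering reconciliation, and the exact Azure URL built by upstream._build_realtime_url, which may need intent=transcription) is verified after deployment via the "Test model" button / quickstart.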

🤖 Generated with Claude Code

timcsy and others added 4 commits June 12, 2026 14:53
…minute unit, model_kind)

Phase 32 realtime transcription endpoint — full speckit spec package (spec, plan,
research, data-model, WS event contract, quickstart, tasks) plus the self-contained
foundational layer:

- websockets declared as a direct dependency (was transitive via uvicorn) — needed
  to relay /v1/realtime to the provider's realtime WS (Constitution Deviation noted).
- model_kind: add `realtime` kind (mode→kind) so the catalog labels realtime models
  honestly; full suite re-run green (715 passed) per the model_kind lesson.
- minute billing unit verified through the existing unit-billing path (calculate_unit_cost
  is unit-agnostic; `minute` is a new string value, no schema change) + test.

Foundational logic (T001/T003/T006) done & green. The WS core — upstream WS client,
mock provider WS server, bidirectional relay (US1), per-minute metering (US2), in-flight
revocation (US3) — is the next focused block; T027 real Azure WS smoke needs credentials.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…per-minute billing, in-flight revocation

WS core for the realtime transcription endpoint (US1/US2/US3):
- proxy/realtime.py: thin bidirectional relay (borrowing litellm RealTimeStreaming
  structure, not its Proxy-form realtime) + side-channel revocation watcher +
  self-counted per-minute metering from input_audio_buffer.append PCM bytes (R2).
  Any close path bills one CallRecord(unit="minute") attributed to the allocation
  (FR-004: abnormal abort never loses usage). Never leaks upstream key/endpoint.
- upstream.open_realtime_ws: websockets client to the provider realtime WS,
  injecting the credential as api-key/Bearer (exact Azure URL validated in T027).
- handle_realtime takes an injectable open_upstream/check_active so CI exercises
  the full preflight→relay→metering→revocation path against a fake provider WS
  in-loop (engine is bound to the test loop; a TestClient portal would break the DB).
- Frontend: realtime KIND_LABEL, /v1/realtime WS usage example, prices 'minute' unit.
- nginx: /v1/realtime WS upgrade (HTTP/1.1 Upgrade + no buffering + long timeout).
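
A rough sketch of how the side-channel revocation watcher and the bill-on-any-close-path rule could fit together. It is illustrative only, not the contents of proxy/realtime.py: check_active mirrors the injectable named above, while watch_revocation, run_session, relay, close, bill and the 5-second default interval are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def watch_revocation(
    check_active: Callable[[], Awaitable[bool]],
    close: Callable[[], Awaitable[None]],
    interval_s: float = 5.0,
) -> None:
    """Poll the allocation; close the session once it is no longer active."""
    while True:
        await asyncio.sleep(interval_s)
        if not await check_active():
            await close()
            return


async def run_session(
    relay: Callable[[], Awaitable[None]],
    check_active: Callable[[], Awaitable[bool]],
    close: Callable[[], Awaitable[None]],
    bill: Callable[[], Awaitable[None]],
    interval_s: float = 5.0,
) -> None:
    """Run the relay beside the watcher; bill exactly once on any exit path."""
    relay_task = asyncio.create_task(relay())
    watcher = asyncio.create_task(watch_revocation(check_active, close, interval_s))
    try:
        # Whichever finishes first (clean close, relay error, revocation)
        # ends the session.
        await asyncio.wait({relay_task, watcher}, return_when=asyncio.FIRST_COMPLETED)
    finally:
        for task in (relay_task, watcher):
            task.cancel()  # no-op for a task that has already finished
        # Collect results so cancellations and relay errors are not left unretrieved.
        await asyncio.gather(relay_task, watcher, return_exceptions=True)
        await bill()
```

Putting the billing call in `finally` is what makes the clean-close, abnormal-abort and revoked paths converge on a single record.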

Tests: contract 1-7 (invalid/revoked key, non-realtime model, delta relay, clean-
close billing, abnormal-abort billing, in-flight revoke, no-leak) + pure metering
unit tests. Full suite 731 passed (715→731), zero regression; ruff+mypy clean;
frontend tsc + 164 vitest + build green. SC-006: existing contract tests untouched.

T027 (real Azure realtime WS smoke) remains for a credentialed environment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…st model" button

realtime was excluded from the recipe table (it's a bidirectional WS, not a one-shot
call), so the UI test button was disabled. Add a WS-smoke recipe instead:
- upstream.realtime_smoke: opens the upstream realtime WS, runs the session
  handshake + a tiny silent-audio append, awaits the first server event. A non-error
  event proves egress(wss:443)+key+deployment+protocol — i.e. the T027 reachability
  check, now runnable straight from the deployed UI. Raises on error/timeout.
- RECIPES["realtime"] = WS smoke, billable=True (gated by the existing confirm
  dialog; admin test writes only an audit event, never a member CallRecord).

Now the model-detail page shows kind "即時字幕(realtime)" with an enabled (billable-
confirmed) test button. Full suite 735 passed (731→735); ruff+mypy clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nts), not a litellm mode

litellm PR #29775 (gpt-realtime-whisper, merged 2026-06-11) ships the model as
mode=audio_transcription and signals realtime via supported_endpoints containing
/v1/realtime — i.e. realtime is a capability axis, not a mode (same shape as
responses_support). The earlier mode==realtime gate would never match any Azure
model. Fix:

- model_kind: realtime is capability-derived — raw.supported_endpoints lists
  /v1/realtime OR an admin `realtime` capability marker (`realtime:blocked` force-
  disables, manual wins). gpt-realtime-whisper (audio_transcription) → realtime;
  whisper-1 (no /v1/realtime) stays stt. Everything keyed on model_kind (endpoint
  gate, test recipe, catalog label) now works for the real model.
- billing: bill in the PriceList's unit — litellm prices realtime transcription per
  SECOND (input_cost_per_second), so default to `second` when unpriced; `minute`
  still honoured. Adds pcm_bytes_to_seconds + session_quantity.
- model-detail: hint that adding the `realtime` capability marks a manually-added
  model as realtime (needed until litellm's price-map entry — currently clobbered by
  a json regen on main — is restored, after which import auto-detects it).
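
The capability-derived rule described in this commit can be sketched as below. This is an assumption-laden illustration rather than the project's model_kind: the function signatures, the shape of `raw`, passing admin markers as a list of strings, and the "other" fallback are all hypothetical; only the precedence (blocked marker, then manual marker, then supported_endpoints) follows the description.

```python
from typing import Any, Iterable, Mapping

REALTIME_ENDPOINT = "/v1/realtime"


def is_realtime(raw: Mapping[str, Any], admin_capabilities: Iterable[str] = ()) -> bool:
    """Decide realtime from capability signals rather than from raw['mode']."""
    caps = set(admin_capabilities)
    if "realtime:blocked" in caps:
        return False  # manual force-disable wins over everything else
    if "realtime" in caps:
        return True  # manual marker for models the price map does not describe
    return REALTIME_ENDPOINT in (raw.get("supported_endpoints") or [])


def model_kind(raw: Mapping[str, Any], admin_capabilities: Iterable[str] = ()) -> str:
    """Simplified kind resolution covering only the cases discussed here."""
    if is_realtime(raw, admin_capabilities):
        return "realtime"
    if raw.get("mode") == "audio_transcription":
        return "stt"
    return "other"  # placeholder for the kinds this sketch does not model


# Example entries shaped like the two models named in the commit message.
realtime_entry = {"mode": "audio_transcription", "supported_endpoints": ["/v1/realtime"]}
whisper_entry = {"mode": "audio_transcription"}
```

With these entries, `model_kind(realtime_entry)` resolves to realtime while `model_kind(whisper_entry)` stays stt, and adding the `realtime` marker to the second flips it.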

Full suite 742 passed; ruff+mypy clean; frontend tsc + 164 vitest green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 12, 2026 14:52

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@timcsy timcsy merged commit 0dba616 into main Jun 12, 2026
5 of 6 checks passed