Merged
6 changes: 4 additions & 2 deletions CLAUDE.md
@@ -1,6 +1,6 @@
# ai-api Development Guidelines

Auto-generated from all feature plans. Last updated: 2026-06-11
Auto-generated from all feature plans. Last updated: 2026-06-12

## Active Technologies
- Python 3.11+ (same as Phase 1) (002-auth-membership)
@@ -66,6 +66,8 @@ Auto-generated from all feature plans. Last updated: 2026-06-11
- PostgreSQL (production) / SQLite (dev, CI); **no new tables, no new migrations**: reuses `call_records.quantity/unit` and `price_list.price_unit/price_per_unit_usd` from increment ② (0019); the new units (query / character) are string values (041-multi-endpoint-complete)
- Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (a few frontend examples) + FastAPI (including `UploadFile` multipart, existing), SQLAlchemy 2.x async, Pydantic v2, `litellm` (existing functions `amoderation`/`asearch`/`aimage_edit`); TanStack Query, shadcn/ui (frontend); **all existing, no new packages** (042-endpoint-registry)
- PostgreSQL (production) / SQLite (dev, CI); **no new tables, columns, or migrations**: reuses `call_records.{quantity,unit}` and `price_list.{price_unit,price_per_unit_usd}` from 0019; the new units `image`/`query` are string values (042-endpoint-registry)
- Python 3.11+ (mainly backend) / TypeScript strict + React 19 (the frontend only shows the realtime kind in the catalog plus a connection example; minimal) + FastAPI (WebSocket, built into starlette, **first use in this project**), SQLAlchemy 2.x async, Pydantic v2 (all existing); **`websockets` (async client for connecting directly to the Azure realtime WS; already in the image via uvicorn/litellm, now declared as a direct dependency)**; existing `proxy/preflight.py`, billing (`calculate_unit_cost` in `services/pricing.py`), audit. **realtime does not go through litellm** (its realtime support is Proxy form / direct client connection, which violates project principles; a thin relay is written in-house, borrowing the structure of its `RealTimeStreaming`). (043-realtime-transcription)
- PostgreSQL (production) / SQLite (dev, CI); **no new tables, no new migrations**: reuses `call_records.{quantity,unit}` and `price_list.{price_unit,price_per_unit_usd}` from increment ② (0019); the new unit `minute` is a string value. (043-realtime-transcription)

- Python 3.11+ + LiteLLM (proxy core), FastAPI (admin API), (001-gateway-core)

@@ -86,9 +88,9 @@ cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLO
Python 3.11+: Follow standard conventions

## Recent Changes
- 043-realtime-transcription: Added Python 3.11+ (mainly backend) / TypeScript strict + React 19 (the frontend only shows the realtime kind in the catalog plus a connection example; minimal) + FastAPI (WebSocket, built into starlette, **first use in this project**), SQLAlchemy 2.x async, Pydantic v2 (all existing); **`websockets` (async client for connecting directly to the Azure realtime WS; already in the image via uvicorn/litellm, now declared as a direct dependency)**; existing `proxy/preflight.py`, billing (`calculate_unit_cost` in `services/pricing.py`), audit. **realtime does not go through litellm** (its realtime support is Proxy form / direct client connection, which violates project principles; a thin relay is written in-house, borrowing the structure of its `RealTimeStreaming`).
- 042-endpoint-registry: Added Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (a few frontend examples) + FastAPI (including `UploadFile` multipart, existing), SQLAlchemy 2.x async, Pydantic v2, `litellm` (existing functions `amoderation`/`asearch`/`aimage_edit`); TanStack Query, shadcn/ui (frontend); **all existing, no new packages**
- 041-multi-endpoint-complete: Added Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (frontend) + FastAPI (including `UploadFile` multipart), SQLAlchemy 2.x async, Pydantic v2, `litellm` (`aimage_generation`/`arerank`/`aspeech`/`atranscription`, library form); TanStack Query, shadcn/ui (frontend); **all existing, no new packages**
- 040-ocr-billing-units: Added Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (frontend) + FastAPI, SQLAlchemy 2.x async, Alembic, Pydantic v2, `litellm` (library: existing function `aocr`); TanStack Query, shadcn/ui (frontend); **all existing, no new packages**


<!-- MANUAL ADDITIONS START -->
12 changes: 12 additions & 0 deletions deploy/nginx/default.conf.template
@@ -42,6 +42,18 @@ server {
proxy_set_header Connection "";
proxy_http_version 1.1;
}
# Realtime transcription is a bidirectional WebSocket (/v1/realtime). It needs
# the HTTP/1.1 Upgrade dance + no buffering + a long read timeout so the relay
# stays open while audio streams. Must precede the generic /v1 location.
location /v1/realtime {
proxy_pass http://${BACKEND_UPSTREAM};
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 3600s;
}
location /v1 { proxy_pass http://${BACKEND_UPSTREAM}; }
location /docs { proxy_pass http://${BACKEND_UPSTREAM}; }
location /openapi.json { proxy_pass http://${BACKEND_UPSTREAM}; }
22 changes: 22 additions & 0 deletions frontend/src/components/api-usage-example.tsx
@@ -81,6 +81,28 @@ export function ApiUsageExample({
-H "Authorization: Bearer $TOKEN" \\
-F "model=${m}" -F "image=@input.png" -F "prompt=make it red"`,
},
realtime: {
path: "/realtime",
desc: "即時字幕(realtime)模型,用 WebSocket 串流音訊、即時收文字(OpenAI realtime transcription 相容)。用量按分鐘計",
// WebSocket — not curl. Replace https:// with wss:// in the endpoint URL.
curl: `# pip install websockets — 串麥克風 PCM、即時收字幕(把 https 換成 wss)
import asyncio, base64, json, websockets

async def main():
    url = "${base}/realtime".replace("https://", "wss://").replace("http://", "ws://")
    async with websockets.connect(url, additional_headers={"Authorization": "Bearer $TOKEN"}) as ws:
        await ws.send(json.dumps({"type": "session.update", "session": {
            "type": "transcription", "model": "${m}",
            "audio": {"input": {"format": {"type": "audio/pcm", "rate": 24000}}}}}))
        await ws.send(json.dumps({"type": "input_audio_buffer.append",
                                  "audio": base64.b64encode(pcm_chunk).decode()}))
        async for msg in ws:
            ev = json.loads(msg)
            if ev.get("type") == "conversation.item.input_audio_transcription.delta":
                print(ev["delta"], end="", flush=True)

asyncio.run(main())`,
},
};
if (kind && endpointInfo[kind]) {
const info = endpointInfo[kind]!;
5 changes: 5 additions & 0 deletions frontend/src/routes/admin/model-detail.tsx
@@ -109,6 +109,7 @@ const KIND_LABEL: Record<string, string> = {
moderation: "內容審核(moderation)",
search: "網路搜尋(search)",
image_edit: "圖片編輯(image edit)",
realtime: "即時字幕(realtime)",
unknown: "未知",
};

@@ -716,6 +717,10 @@ function EditBasicsDialog({
<div>
<Label htmlFor="b-cap">能力(逗號分隔)</Label>
<Input id="b-cap" className="mt-1" placeholder="chat, vision, function-calling" value={capabilities} onChange={(e) => setCapabilities(e.target.value)} />
<p className="text-xs text-muted-foreground mt-1">
加 <code>realtime</code> 把模型標為「即時字幕」(走 /v1/realtime WS、可在此頁測試);
手動加入、litellm 尚未帶入 supported_endpoints 時用得到。<code>realtime:blocked</code> 可強制關閉。
</p>
</div>
<div>
<Label htmlFor="b-rec">適用情境(逗號分隔)</Label>
5 changes: 3 additions & 2 deletions frontend/src/routes/admin/prices.tsx
@@ -73,7 +73,7 @@ const fmtDate = (iso: string) => new Date(iso).toLocaleString("zh-TW");

// Phase 31: non-token billing unit labels.
const UNIT_ZH: Record<string, string> = {
page: "頁", query: "查詢", character: "字元", image: "張", second: "秒",
page: "頁", query: "查詢", character: "字元", image: "張", second: "秒", minute: "分鐘",
};

/** Local "now" formatted for a <input type="datetime-local"> (YYYY-MM-DDTHH:mm). */
@@ -448,13 +448,14 @@ function AddPriceDialog({
<SelectItem value="character">每字元</SelectItem>
<SelectItem value="image">每張</SelectItem>
<SelectItem value="second">每秒</SelectItem>
<SelectItem value="minute">每分鐘</SelectItem>
</SelectContent>
</Select>
<Input id="p-perpage" className="font-mono flex-1" placeholder="0.003"
value={perPage} onChange={(e) => setPerPage(e.target.value)} />
</div>
<p className="text-xs text-muted-foreground mt-1">
非 token 模型(OCR=頁、rerank/search=查詢、TTS=字元、圖片編輯=張)依該單位計費,填此欄;token 欄可填 0。一筆價格只用一種單位。可按上方「從 LiteLLM 帶入建議價」自動填。
非 token 模型(OCR=頁、rerank/search=查詢、TTS=字元、圖片編輯=張、即時字幕=分鐘)依該單位計費,填此欄;token 欄可填 0。一筆價格只用一種單位。可按上方「從 LiteLLM 帶入建議價」自動填。
</p>
</div>

5 changes: 5 additions & 0 deletions pyproject.toml
@@ -25,6 +25,11 @@ dependencies = [
# multipart form parsing — required by FastAPI for /v1/audio/transcriptions
# (STT) audio file upload. FastAPI's official optional dependency.
"python-multipart>=0.0.18",
# async WebSocket client — required to relay /v1/realtime (live transcription)
# directly to the upstream provider's realtime WS. Already present transitively
# via uvicorn[standard]; declared directly so it can't vanish on an upstream
# change (Constitution Deviation: justified — direct provider WS needs a client).
"websockets>=13.0",
]

[project.optional-dependencies]
37 changes: 37 additions & 0 deletions specs/043-realtime-transcription/checklists/requirements.md
@@ -0,0 +1,37 @@
# Specification Quality Checklist: realtime live-captioning endpoint

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-12
**Feature**: [spec.md](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Notes

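- 0 [NEEDS CLARIFICATION] markers: every undecided detail has a reasonable default that follows existing project conventions and is recorded under Assumptions (the metering unit prefers the provider-reported value, otherwise stream duration is estimated; the revocation SLO matches the existing one; quota is checked when the connection is established; credit is bound to the allocation with no limit on the number of connections).
- Three technical unknowns (the protocol for connecting directly to the provider's realtime service, the metering source when a connection ends, and the mechanism for relaying over a persistent connection and revoking mid-connection) are deliberately **not** in the spec. They are capability boundaries to pin down first in the planning phase (research/plan), not ambiguity at the requirements level.
- SC-004 deliberately leaves the "agreed upper time limit" without a concrete number of seconds: the concrete revocation SLO follows the existing allocation-revocation mechanism and is set in the planning phase; the spec does not hard-code it.
- The Input line keeps the user's original wording (including terms such as WebSocket / gpt-realtime-whisper / litellm Proxy), per speckit convention (record the original description); the body uses business language (persistent connection / streaming / compatible endpoint) and does not leak implementation details.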
@@ -0,0 +1,65 @@
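# Contract: realtime live-captioning WebSocket endpoint

**Endpoint**: `GET /v1/realtime` (WebSocket upgrade), OpenAI-compatible realtime transcription
**Authentication**: `Authorization: Bearer <application key>` (connection header, reusing existing keys) or the subprotocol header used by OpenAI realtime convention (to be aligned with OpenAI client conventions in the tasks phase)
**Shape**: bidirectional WebSocket. The client sends audio upstream; the platform sends text events downstream.

## Connection lifecycle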

```
client → (WS upgrade + Bearer key)
platform: run_preflight(key → allocation → access → quota → model)
├─ fail → close(code, reason); streaming never starts (FR-002/005/007)
└─ pass → accept; open one platform↔Azure WS; enter bidirectional relay
client → session.update {type:"transcription", model, audio.format}
client → input_audio_buffer.append {audio: <base64 PCM>} (repeated, streaming)
platform→ conversation.item.input_audio_transcription.delta {delta} (real time, SC-001 <1s)
platform→ conversation.item.input_audio_transcription.completed {transcript}
...
(either side closes / revocation re-check fires) → platform: write CallRecord(unit=minute) → close
```

## Client → Server events (accepted by the platform and relayed upstream)

| Event | Required fields | Platform behavior |
|---|---|---|
| `session.update` | `type:"transcription"`, `model`, `audio.format{type,rate}` | Verifies that the model is of the realtime kind (otherwise close, FR-007); records sample_rate/format for metering; relays upstream |
| `input_audio_buffer.append` | `audio` (base64 PCM) | **Accumulates audio_bytes (metering source, R2)**; relays upstream |
| `input_audio_buffer.commit` | (none) | Relays upstream (manual turn detection) |
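
The byte accumulation described for `input_audio_buffer.append` can be sketched as below. This is a minimal illustration, not the relay itself: `pcm_bytes_in_event` is a hypothetical helper name, and it assumes metering counts the decoded PCM bytes rather than the base64 text length.

```python
import base64
import json


def pcm_bytes_in_event(raw_message: str) -> int:
    """Return the decoded PCM byte count carried by one client event.

    Hypothetical helper: events other than input_audio_buffer.append
    carry no audio and therefore contribute 0 bytes.
    """
    event = json.loads(raw_message)
    if event.get("type") != "input_audio_buffer.append":
        return 0
    # `audio` is base64 text; count the decoded PCM bytes, not the text length.
    return len(base64.b64decode(event.get("audio", "")))


# Usage: 4800 bytes is 0.1 s of audio at an assumed 24 kHz, 16-bit, mono format.
chunk = bytes(4800)
append_msg = json.dumps({
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(chunk).decode(),
})
commit_msg = json.dumps({"type": "input_audio_buffer.commit"})

audio_bytes = pcm_bytes_in_event(append_msg) + pcm_bytes_in_event(commit_msg)
```

Counting at the point where each frame is relayed keeps the total current at all times, which is what allows a record to be written on any close path.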

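## Server → Client events (relayed back from upstream by the platform)

| Event | Content | Notes |
|---|---|---|
| `conversation.item.input_audio_transcription.delta` | `delta` (incremental text) | Main live-caption output; SC-001 first segment <1s |
| `conversation.item.input_audio_transcription.completed` | `transcript` (full text) | One utterance finished; the platform can record observability data on this path |
| `error` | `error{code,message}` | Upstream errors are passed back transparently; upstream keys are never leaked (FR-006) |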

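## Connection close codes (when the platform closes the connection)

| Scenario | Close code / reason | Maps to |
|---|---|---|
| Key invalid or revoked, no valid allocation, quota exhausted | policy violation + understandable reason | FR-002, SC-005 |
| Model is not of the realtime kind | unsupported + reason | FR-007 |
| Allocation revoked / paused / quarantined mid-connection | revoked + reason | FR-005, SC-004 |
| Upstream disconnect / failure | upstream_error + transparent reason | FR-009 |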
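
The contract names the close situations but not numeric codes. One possible mapping is sketched below; the numbers are assumptions, not values from the contract. 1008, 1003 and 1011 are standard RFC 6455 codes, while 4001 is a hypothetical pick from the private-use range (4000 to 4999). RFC 6455 also caps the close reason at 123 bytes of UTF-8, so the helper truncates it.

```python
# Assumed mapping from the contract's close situations to WebSocket close codes.
CLOSE_CODES = {
    "policy_violation": 1008,  # RFC 6455 "Policy Violation"
    "unsupported": 1003,       # RFC 6455 "Unsupported Data" (loose fit, assumption)
    "revoked": 4001,           # hypothetical private-use code
    "upstream_error": 1011,    # RFC 6455 "Internal Error"
}

MAX_REASON_BYTES = 123  # 125-byte control frame payload minus the 2-byte code


def close_frame(kind: str, reason: str) -> tuple[int, str]:
    """Return (code, reason) with the reason trimmed to fit a close frame."""
    trimmed = reason.encode("utf-8")[:MAX_REASON_BYTES]
    # errors="ignore" drops a multi-byte character cut in half by the trim.
    return CLOSE_CODES[kind], trimmed.decode("utf-8", errors="ignore")


code, reason = close_frame("revoked", "allocation was revoked during the session")
```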

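## Metering contract

- Metering unit: `minute`; quantity = `ceil(Σ append PCM bytes / (rate × bytes_per_sample × channels) / 60)` (exact rounding to be decided in tasks).
- When the record is written: **at connection close (for any reason, including abnormal ones)**; `audio_bytes` is accumulated in real time so nothing goes unrecorded (FR-004/SC-003).
- Attribution: the allocation resolved by preflight; cost = `calculate_unit_cost` (existing).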
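
The quantity formula can be worked through as follows. This is a sketch under stated assumptions: `billable_minutes` is a hypothetical name, the defaults (24 kHz, 16-bit, mono) are illustrative, and ceiling rounding is used even though the contract leaves the exact rounding to the tasks phase. Integer arithmetic avoids floating-point edge cases at exact minute boundaries.

```python
def billable_minutes(
    audio_bytes: int,
    sample_rate: int = 24000,   # assumed default; comes from session.update
    bytes_per_sample: int = 2,  # assumed 16-bit PCM
    channels: int = 1,          # assumed mono
) -> int:
    """ceil(audio_bytes / (rate * bytes_per_sample * channels) / 60) in integers."""
    if audio_bytes <= 0:
        return 0
    bytes_per_minute = sample_rate * bytes_per_sample * channels * 60
    # Integer ceiling division: any partial minute rounds up to a full one.
    return (audio_bytes + bytes_per_minute - 1) // bytes_per_minute


one_minute = 24000 * 2 * 1 * 60  # 2,880,000 bytes at the assumed format
exact = billable_minutes(one_minute)
just_over = billable_minutes(one_minute + 1)
```

With these inputs, exactly one minute of audio bills as 1 and a single extra byte bills as 2, which is the behavior to confirm or change once the rounding rule is fixed.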

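## No-leak contract (FR-006)

No downstream event, error, or close reason may contain the upstream endpoint, key, or internal deployment name (MUST NOT); upstream errors are translated into messages the user can understand.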
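
One way to meet this contract is an allow-list: upstream text is never forwarded, and only known error codes map to fixed client-facing messages. The sketch below is an assumption about approach, not the PR's implementation; the error codes, the messages, and the `to_client_error` name are all hypothetical.

```python
# Hypothetical allow-list of upstream error codes with fixed, safe messages.
SAFE_MESSAGES = {
    "rate_limit_exceeded": "The transcription service is busy. Please retry shortly.",
    "invalid_audio_format": "The audio format is not supported.",
}
DEFAULT_MESSAGE = "The transcription service reported an error."


def to_client_error(upstream_event: dict) -> dict:
    """Build a client-facing error event without copying any upstream text."""
    error = upstream_event.get("error") or {}
    code = error.get("code")
    if isinstance(code, str) and code in SAFE_MESSAGES:
        return {"type": "error", "error": {"code": code, "message": SAFE_MESSAGES[code]}}
    # Unknown codes collapse to a generic error so nothing upstream leaks through.
    return {"type": "error", "error": {"code": "upstream_error", "message": DEFAULT_MESSAGE}}


leaky = {
    "type": "error",
    "error": {
        "code": "deployment_not_found",
        "message": "https://internal.example.net/deployments/x api-key=abc123",
    },
}
safe = to_client_error(leaky)
```

Redacting patterns out of the upstream message is the alternative, but an allow-list fails closed when an unexpected message format appears.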

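## Contract tests (must pass before merge)

1. Connecting with an invalid or revoked key → closed; streaming never starts.
2. Non-realtime model → close(unsupported).
3. Valid connection + send append → delta received (mock provider WS replies with a pre-recorded delta).
4. Connection closes → one `CallRecord(unit="minute")` is written, and its quantity matches the duration of the audio sent.
5. Allocation revoked (mocked) mid-connection → the platform closes with close(revoked) within N seconds, and the accumulated duration is recorded.
6. Abnormal termination (client drops the connection) → the accumulated duration is still recorded (nothing missed).
7. No error or close message contains the upstream key/endpoint.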
54 changes: 54 additions & 0 deletions specs/043-realtime-transcription/data-model.md
@@ -0,0 +1,54 @@
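# Phase 1 Data Model: realtime live-captioning endpoint

**Core conclusion: no new tables, no new migrations.** A realtime connection is an in-memory lifecycle object (not persisted to a table); usage reuses the existing `call_records` (`quantity`/`unit` from increment ② 0019) and `price_list` (`price_unit`/`price_per_unit_usd`); the new unit `minute` is a string value.

## 1. RealtimeSession (in-memory, not persisted)

The runtime state of one WS connection, **not written to any table**: it lives only while the connection is alive, and on disconnect the accumulated result is written as one `CallRecord`.

| Field | Type | Description |
|---|---|---|
| `allocation_id` | str | Allocation resolved by preflight (target of the usage record) |
| `credential_id` | str | Application key that opened the connection (for audit) |
| `member_id` | str | Owner (for audit) |
| `resource_model` | str | Requested realtime model slug |
| `upstream_model` | str | Model string mapped to the upstream |
| `started_at` | datetime (tz-aware) | Connection start time |
| `audio_bytes` | int | Accumulated PCM audio bytes received (metering source, R2) |
| `sample_rate` / `bytes_per_sample` / `channels` | int | Set by the format in `session.update`; used to convert bytes to duration |
| `close_reason` | enum | `normal` / `client_abort` / `upstream_error` / `revoked` |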

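**Derived**: `duration_seconds = audio_bytes / (sample_rate × bytes_per_sample × channels)`; `quantity_minutes = ceil(duration_seconds / 60)` or exact minutes (rounding to be decided in the tasks phase, in line with billing conventions).

**State transitions**: `connecting` (preflight in progress) → `streaming` (relaying, accumulating audio_bytes, periodic re-check) → `closing` (either side closes, or revocation fires) → write `CallRecord` → `closed`.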
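
The session fields and the linear lifecycle can be sketched as a dataclass. This is an illustration only: the `state` field, the `advance` method, the format defaults, and the sample identifiers are assumptions, and the failed-preflight path (closing before streaming starts) is not modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Linear lifecycle from the data model; a failed preflight is not modeled here.
_NEXT_STATE = {
    "connecting": "streaming",
    "streaming": "closing",
    "closing": "closed",
}


@dataclass
class RealtimeSession:
    allocation_id: str
    credential_id: str
    member_id: str
    resource_model: str
    upstream_model: str
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    audio_bytes: int = 0
    sample_rate: int = 24000    # assumed default until session.update arrives
    bytes_per_sample: int = 2   # assumed 16-bit PCM
    channels: int = 1           # assumed mono
    close_reason: str = "normal"
    state: str = "connecting"   # hypothetical field tracking the lifecycle

    def advance(self) -> str:
        """Move to the next lifecycle state; raise once the session is closed."""
        if self.state not in _NEXT_STATE:
            raise ValueError(f"session already {self.state}")
        self.state = _NEXT_STATE[self.state]
        return self.state

    @property
    def duration_seconds(self) -> float:
        return self.audio_bytes / (self.sample_rate * self.bytes_per_sample * self.channels)


# Identifiers below are placeholders for illustration.
session = RealtimeSession("alloc-1", "cred-1", "member-1", "gpt-realtime-whisper", "upstream-model")
session.advance()              # connecting -> streaming
session.audio_bytes += 48_000  # one second of audio at the assumed format
```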

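## 2. CallRecord (existing, reused)

**One** record is written on disconnect, using the same mechanism as the other non-token endpoints:

| Field | Value |
|---|---|
| `allocation_id` | RealtimeSession.allocation_id (attribution; still written on abnormal termination) |
| `quantity` | Accumulated minutes (computed in-house, R2) |
| `unit` | `"minute"` (new string value, **not a new column**; 0019 already has the unit column) |
| `cost_usd` | `calculate_unit_cost(quantity, price_per_unit)` (existing function) |
| `outcome` | Mapped from close_reason (`success` / `upstream_error` …, reusing the existing enum) |
| token columns | NULL (non-token endpoint; follows the 0019 semantics where NULL ⇒ non-token) |

**FR-004, nothing unrecorded**: `audio_bytes` is accumulated in real time inside the relay loop, so every disconnect path (normal / abnormal / revoked) has a value when the record is written.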
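
The "record on every close path" guarantee can be illustrated with a `try`/`finally` around the relay loop. This is a sketch, not the project's relay: `relay_and_settle`, the in-memory `ledger` list standing in for the `call_records` write, and the fixed 24 kHz 16-bit mono format are all assumptions.

```python
import asyncio
from typing import AsyncIterator

BYTES_PER_MINUTE = 24000 * 2 * 1 * 60  # assumes 24 kHz, 16-bit, mono


async def relay_and_settle(chunks: AsyncIterator[bytes], ledger: list[dict]) -> None:
    """Accumulate audio bytes while relaying; always write one record on exit."""
    audio_bytes = 0
    close_reason = "normal"
    try:
        async for chunk in chunks:
            audio_bytes += len(chunk)  # counted before anything else can fail
    except ConnectionError:
        close_reason = "client_abort"
    finally:
        # Runs on every exit path, so the accumulated usage is never dropped.
        quantity = -(-audio_bytes // BYTES_PER_MINUTE)  # integer ceiling
        ledger.append({"unit": "minute", "quantity": quantity, "close_reason": close_reason})


async def dropped_client() -> AsyncIterator[bytes]:
    """Simulated client that sends just over one minute of audio, then vanishes."""
    yield bytes(BYTES_PER_MINUTE)
    yield bytes(10)
    raise ConnectionError("client went away")


ledger: list[dict] = []
asyncio.run(relay_and_settle(dropped_client(), ledger))
```

Because the counter is updated inside the loop, the `finally` block sees whatever was received before the failure, which is the property FR-004 relies on.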

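## 3. PriceList (existing, reused)

The price of a realtime model is stored as one point-in-time version (append-only) with `price_unit="minute"` + `price_per_unit_usd` (e.g. gpt-realtime-whisper $0.017). Admins set it on the existing `/prices` page (`minute` is added to the unit dropdown, reusing the unit-aware UI from phase 29 unit billing). **LiteLLM only suggests; PriceList is the source of truth for billing** (unchanged).

## 4. Allocation (existing, reused)

Attribution target, quota carrier, and the state source for mid-connection re-checks (active / revoked / paused / quarantined). **Schema unchanged.**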

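## 5. model_kind: the realtime kind

The mode→kind mapping in `services/model_kind.py` gains `realtime` (when litellm `mode` is realtime/realtime-transcription). This supports catalog honesty (FR-008): realtime models show the correct kind and do not pose as chat. **After changing the model_kind mapping, the full test suite must be rerun** (lesson from experience: an "unknown mode counter-example" integration test will trip).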
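
The mapping change can be pictured as a small lookup table. The sketch below is hypothetical and does not reproduce `services/model_kind.py`: only the two realtime modes come from the text above, while the `chat` entry, the `unknown` fallback, and the `kind_for_mode` name are illustrative assumptions.

```python
from typing import Optional

# Hypothetical mode -> kind table; only the realtime entries come from the data model.
MODE_TO_KIND = {
    "chat": "chat",
    "realtime": "realtime",
    "realtime-transcription": "realtime",
}


def kind_for_mode(mode: Optional[str]) -> str:
    """Resolve a litellm mode string to a catalog kind, defaulting to 'unknown'."""
    return MODE_TO_KIND.get(mode or "", "unknown")


kinds = [kind_for_mode(m) for m in ("realtime", "realtime-transcription", "made-up-mode", None)]
```

Keeping unmapped modes on the fallback is what an "unknown mode" test would exercise, so any mode picked as a counter-example must stay out of the table.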

---

**Migration conclusion**: **none**. Reuses `call_records.{quantity,unit}` and `price_list.{price_unit,price_per_unit_usd}` from 0019; `minute` is a data value, not a schema change. RealtimeSession is not persisted.