6 changes: 4 additions & 2 deletions CLAUDE.md
@@ -1,6 +1,6 @@
# ai-api Development Guidelines

Auto-generated from all feature plans. Last updated: 2026-06-12
Auto-generated from all feature plans. Last updated: 2026-06-13

## Active Technologies
- Python 3.11+ (same as Phase 1) (002-auth-membership)
@@ -68,6 +68,8 @@ Auto-generated from all feature plans. Last updated: 2026-06-12
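- PostgreSQL (production) / SQLite (dev, CI); **no new tables, columns, or migrations**: reuses 0019's `call_records.{quantity,unit}` and `price_list.{price_unit,price_per_unit_usd}`; the new units `image`/`query` are plain string values (042-endpoint-registry)
- Python 3.11+ (mostly backend) / TypeScript strict + React 19 (the frontend only shows the realtime type in the catalog plus a connection example; minimal) + FastAPI (WebSocket, built into starlette, **first use in this project**), SQLAlchemy 2.x async, Pydantic v2 (all existing); **`websockets` (async client for connecting directly to the Azure realtime WS, promoted to a direct dependency: it already ships in the image via uvicorn/litellm and is now declared directly)**; existing `proxy/preflight.py`, billing (`calculate_unit_cost` in `services/pricing.py`), audit. **Realtime does not go through litellm** (its realtime support is Proxy form / direct client connection, which violates the project's principles; a thin relay is written in-house, borrowing the structure of its `RealTimeStreaming`). (043-realtime-transcription)
- PostgreSQL (production) / SQLite (dev, CI); **no new tables, no new migrations**: reuses `call_records.{quantity,unit}` and `price_list.{price_unit,price_per_unit_usd}` from increment ② (0019); the new unit `minute` is a plain string value. (043-realtime-transcription)
- Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (frontend) + FastAPI, SQLAlchemy 2.x async, Alembic, Pydantic v2 (backend); TanStack Query, shadcn/ui (frontend); **all existing, no new packages** (046-cost-quota)
- PostgreSQL (production) / SQLite (dev, CI); **new migration `0020`**: adds one nullable column `quota_cost_usd_per_month` to `allocations` (purely additive). The running total reuses the existing `call_records.cost_usd` (present since 0019). (046-cost-quota)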

- Python 3.11+ + LiteLLM (proxy core), FastAPI (admin API), (001-gateway-core)

@@ -88,9 +90,9 @@ cd src [ONLY COMMANDS FOR ACTIVE TECHNOLOGIES][ONLY COMMANDS FOR ACTIVE TECHNOLO
Python 3.11+: Follow standard conventions

## Recent Changes
- 046-cost-quota: Added Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (frontend) + FastAPI, SQLAlchemy 2.x async, Alembic, Pydantic v2 (backend); TanStack Query, shadcn/ui (frontend); **all existing, no new packages**
- 043-realtime-transcription: Added Python 3.11+ (mostly backend) / TypeScript strict + React 19 (the frontend only shows the realtime type in the catalog plus a connection example; minimal) + FastAPI (WebSocket, built into starlette, **first use in this project**), SQLAlchemy 2.x async, Pydantic v2 (all existing); **`websockets` (async client for connecting directly to the Azure realtime WS, promoted to a direct dependency: it already ships in the image via uvicorn/litellm and is now declared directly)**; existing `proxy/preflight.py`, billing (`calculate_unit_cost` in `services/pricing.py`), audit. **Realtime does not go through litellm** (its realtime support is Proxy form / direct client connection, which violates the project's principles; a thin relay is written in-house, borrowing the structure of its `RealTimeStreaming`).
- 042-endpoint-registry: Added Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (a few frontend examples) + FastAPI (including `UploadFile` multipart, existing), SQLAlchemy 2.x async, Pydantic v2, `litellm` (existing functions `amoderation`/`asearch`/`aimage_edit`); TanStack Query, shadcn/ui (frontend); **all existing, no new packages**
- 041-multi-endpoint-complete: Added Python 3.11+ (backend) / TypeScript strict + React 19 + Vite 6 (frontend) + FastAPI (including `UploadFile` multipart), SQLAlchemy 2.x async, Pydantic v2, `litellm` (`aimage_generation`/`arerank`/`aspeech`/`atranscription`, library form); TanStack Query, shadcn/ui (frontend); **all existing, no new packages**


<!-- MANUAL ADDITIONS START -->
29 changes: 29 additions & 0 deletions alembic/versions/0020_cost_quota.py
@@ -0,0 +1,29 @@
"""Phase 33 (046): cost-based monthly quota — per-allocation USD spend cap.

Additive, nullable column (zero regression for token quota):
allocations: quota_cost_usd_per_month (NULL ⇒ no cost cap)
Existing rows stay NULL and keep using quota_tokens_per_month unchanged.
"""
from __future__ import annotations

from collections.abc import Sequence

import sqlalchemy as sa

from alembic import op

revision: str = "0020_cost_quota"
down_revision: str | Sequence[str] | None = "0019_billing_units"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None


def upgrade() -> None:
    op.add_column(
        "allocations",
        sa.Column("quota_cost_usd_per_month", sa.Numeric(10, 6), nullable=True),
    )


def downgrade() -> None:
    op.drop_column("allocations", "quota_cost_usd_per_month")
16 changes: 16 additions & 0 deletions frontend/src/components/allocation-list.tsx
@@ -34,6 +34,8 @@ interface Allocation {
revoked_at: string | null;
token_prefix: string;
quota_tokens_per_month?: number | null;
quota_cost_usd_per_month?: string | null;
cost_used_this_month?: string | null;
price?: { input_per_1k: string; output_per_1k: string; cached_input_per_1k?: string } | null;
}

@@ -225,6 +227,20 @@ export function AllocationList() {
<div>配額:無上限</div>
)
)}
{a.status === "active" && a.quota_cost_usd_per_month != null && (() => {
  const used = Number(a.cost_used_this_month ?? 0);
  const cap = Number(a.quota_cost_usd_per_month);
  const near = cap > 0 && used / cap >= 0.8;
  return (
    <div className="space-y-1">
      <div className={near ? "text-destructive" : "text-foreground"}>
        本月花費 ${used.toFixed(2)} / 上限 ${cap.toFixed(2)}
        {near && "(接近上限)"}
      </div>
      <Progress value={cap > 0 ? Math.min(100, Math.round((used / cap) * 100)) : 0} />
    </div>
  );
})()}
<div>
現價(每 1M):
{a.price
34 changes: 31 additions & 3 deletions frontend/src/routes/admin/allocations.tsx
@@ -49,6 +49,7 @@ interface AdminAllocation {
display_name?: string | null;
status: string;
quota_tokens_per_month: number | null;
quota_cost_usd_per_month: string | null;
is_service_allocation: boolean;
quota_locked: boolean;
token_prefix: string;
@@ -86,6 +87,7 @@ export function AdminAllocationsPage() {
const [showRevoked, setShowRevoked] = React.useState(false);
const [quotaTarget, setQuotaTarget] = React.useState<AdminAllocation | null>(null);
const [quotaValue, setQuotaValue] = React.useState("");
const [costValue, setCostValue] = React.useState("");

const allocsQuery = useQuery<AdminAllocation[], ApiError>({
queryKey: ["admin", "allocations"],
@@ -257,6 +259,7 @@ export function AdminAllocationsPage() {
onClick={() => {
setQuotaTarget(a);
setQuotaValue(a.quota_tokens_per_month != null ? String(a.quota_tokens_per_month) : "");
setCostValue(a.quota_cost_usd_per_month != null ? String(Number(a.quota_cost_usd_per_month)) : "");
}}
>
調整配額
@@ -403,8 +406,9 @@ export function AdminAllocationsPage() {
<DialogContent>
<DialogHeader>
<DialogTitle>調整月度配額</DialogTitle>
<DialogDescription>留空=無限額;否則填非負整數 tokens。</DialogDescription>
<DialogDescription>兩種上限可同時設、任一達到即擋;留空=該項無上限。</DialogDescription>
</DialogHeader>
<label className="text-sm font-medium">每月 token 上限</label>
<Input
type="number"
min={0}
@@ -416,17 +420,41 @@ export function AdminAllocationsPage() {
{quotaValue.trim() !== "" && !/^\d+$/.test(quotaValue.trim()) && (
<p className="text-xs text-destructive">請填非負整數,或留空表示無限額。</p>
)}
<label className="text-sm font-medium mt-2">每月花費上限(USD)</label>
<Input
type="number"
min={0}
step="0.01"
value={costValue}
placeholder="無上限"
aria-label="每月花費上限"
onChange={(e) => setCostValue(e.target.value)}
/>
{costValue.trim() !== "" && !(Number(costValue) >= 0) && (
<p className="text-xs text-destructive">請填非負金額,或留空表示無上限。</p>
)}
<p className="text-xs text-muted-foreground">
花費上限以 USD 統一治理所有端點(token / 頁 / 張 / 秒 / 分…);只治理「已定價」的用量。
</p>
<DialogFooter>
<Button variant="outline" onClick={() => setQuotaTarget(null)}>取消</Button>
<Button
disabled={quotaValue.trim() !== "" && !/^\d+$/.test(quotaValue.trim())}
disabled={
(quotaValue.trim() !== "" && !/^\d+$/.test(quotaValue.trim())) ||
(costValue.trim() !== "" && !(Number(costValue) >= 0))
}
onClick={() => {
if (!quotaTarget) return;
const v = quotaValue.trim();
const c = costValue.trim();
if (v !== "" && !/^\d+$/.test(v)) return;
if (c !== "" && !(Number(c) >= 0)) return;
patchMut.mutate({
id: quotaTarget.id,
body: { quota_tokens_per_month: v === "" ? null : Number(v) },
body: {
quota_tokens_per_month: v === "" ? null : Number(v),
quota_cost_usd_per_month: c === "" ? null : Number(c),
},
});
setQuotaTarget(null);
}}
40 changes: 40 additions & 0 deletions specs/046-cost-quota/checklists/requirements.md
@@ -0,0 +1,40 @@
# Specification Quality Checklist: Cost-Based Quota (Unified Cross-Endpoint Spending Cap)

**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-06-13
**Feature**: [spec.md](../spec.md)

## Content Quality

- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed

## Requirement Completeness

- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified

## Feature Readiness

- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification

## Notes

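- 0 [NEEDS CLARIFICATION] markers: every key decision has a reasonable default, derived from the brief already converged in knowie-next and from existing project patterns:
  - "Spend (USD) as the common denominator across units" comes from the conclusion already established in vision phase 29.
  - "The spend cap stays out of the adaptive quota pool" is an explicit exclusion, so that two redistribution mechanisms do not collide.
  - "Unpriced calls cost 0 and are not governed" follows from "PriceList is the single source of truth for billing".
  - "Mid-connection realtime enforcement reuses the existing revocation re-check coroutine" corresponds to principle 3 plus the existing phase 32 mechanism.
- The "allocation / usage record" mentioned under Key Entities are **domain entities** (not technical frameworks), consistent with spec conventions.
- Ready to proceed to `/speckit-plan`.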
69 changes: 69 additions & 0 deletions specs/046-cost-quota/contracts/cost-quota.md
@@ -0,0 +1,69 @@
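# Contract: Cost-Based Quota

Four contract surfaces. The error envelope reuses the existing `{error:{code,message,request_id}}`.

## 1. Admin allocation create / update: accepts an optional spend cap

**Endpoints**: existing `POST /admin/allocations` and `PATCH /admin/allocations/{id}` (or the existing quota-editing endpoint)
**New field (request)**:

```jsonc
{ "quota_cost_usd_per_month": 5.00 } // optional; null/omitted = no cap; < 0 → 422
```

- Accepts `null` (clears the cap), a non-negative number (sets the cap; 0 is allowed), or omission (leaves the value unchanged).
- Changes are audited (reusing the existing allocation-update audit; the new value is recorded).
- **Response**: the serialized allocation gains `quota_cost_usd_per_month`.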

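## 2. Proxy synchronous endpoints: reject when over the spend cap

**Endpoints**: all billed endpoints (chat/responses + registry: embedding/ocr/image/rerank/audio/moderation/search/image_edit) share one preflight.

```
preflight: if allocation.quota_cost_usd_per_month is not null and current_month_cost ≥ cap
  → 403 { "error": { "code": "cost_quota_exceeded",
                     "message": "已達本月花費上限($X / $Y)" } }   // "Monthly spend cap reached ($X / $Y)"
  → record one CallRecord (outcome=rejected_cost_quota_exceeded, status=403, bound to the allocation)
```

- The token cap and the spend cap apply **side by side**: whichever is reached first blocks the call (the stricter one wins).
- No spend cap set (null) → this check is skipped and behavior is identical to today.
- Unpriced calls (`cost_usd` NULL) do not add to the running total → they are not blocked because of the spend cap.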
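
The preflight rule above can be sketched in a few lines of Python. This is a minimal illustration, not the PR's actual `proxy/preflight.py`: the function `preflight_cost_check`, its signature, and the English message text are assumptions made for the example; only `is_over_cost_quota`, the 403 status, the `cost_quota_exceeded` code, and the error envelope shape come from the spec.

```python
from decimal import Decimal
from typing import Optional, Tuple


def is_over_cost_quota(cap: Optional[Decimal], spent: Decimal) -> bool:
    """True when a cap is set and this month's spend has reached it."""
    return cap is not None and spent >= cap


def preflight_cost_check(
    cap: Optional[Decimal], spent: Decimal, request_id: str
) -> Optional[Tuple[int, dict]]:
    """Hypothetical helper: None lets the call through, else (status, body)."""
    if not is_over_cost_quota(cap, spent):
        return None  # no cap set, or still under the cap
    body = {
        "error": {
            "code": "cost_quota_exceeded",
            # Message wording is an assumption; the spec's literal is in Chinese.
            "message": f"Monthly spend cap reached (${spent:.2f} / ${cap:.2f})",
            "request_id": request_id,
        }
    }
    return 403, body
```

Because the comparison is `>=`, a cap of 0 rejects the very first call, which matches the "0 = blocks immediately" validation rule in the data model.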

## 3. Usage display: spend this month / cap

**Endpoints**: `/me/usage`, `/me/allocations`, and admin per-allocation usage.
**New fields (response, per allocation)**:

```jsonc
{
  "cost_used_this_month": "4.90",          // Decimal string
  "quota_cost_usd_per_month": "5.000000"   // or null
}
```

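## 4. Realtime: mid-connection spend enforcement

**Endpoint**: `/v1/realtime` (phase 32)
**Behavior**:

```
On connect: preflight includes the cost check (same as §2; if over the cap, close without opening the stream)
During the connection: a side-channel watcher, every N seconds:
    committed = current_month_cost(allocation)      // already booked (excludes this connection)
    running   = session_running_cost(sess, price)   // accrued so far on this connection
    if committed + running ≥ cap → close(policy violation 1008, reason="本月花費上限")   // "monthly spend cap"
                                   + book the duration accrued so far (reusing "every close path books usage")
```

- Tolerance: at most a small overshoot of "cap + one re-check interval" (same semantics as the existing revocation delay).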
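
The watcher loop described above might look roughly like the following sketch. It assumes the relay exposes three callables (`get_committed`, `get_running`, `close`), which are hypothetical names for this example, and it simplifies `session_running_cost` to take a quantity and a unit price directly rather than the spec's `(sess, price)` arguments. Booking the accrued duration is left to the close path, as the spec states.

```python
import asyncio
from decimal import Decimal
from typing import Awaitable, Callable, Optional

WS_POLICY_VIOLATION = 1008  # WebSocket close code named in the spec


def session_running_cost(quantity: Decimal, price_per_unit: Decimal) -> Decimal:
    """Cost accrued so far on the live connection (simplified signature)."""
    return quantity * price_per_unit


def should_close(cap: Optional[Decimal], committed: Decimal, running: Decimal) -> bool:
    """Booked spend plus in-flight spend has reached the cap."""
    return cap is not None and committed + running >= cap


async def watch_cost(
    cap: Optional[Decimal],
    get_committed: Callable[[], Awaitable[Decimal]],  # hypothetical: DB lookup
    get_running: Callable[[], Decimal],               # hypothetical: session state
    close: Callable[[int, str], Awaitable[None]],     # hypothetical: closes the WS
    interval_s: float,
) -> None:
    """Re-check every interval_s seconds; close once the cap is reached."""
    if cap is None:
        return  # no spend cap, so nothing to watch
    while True:
        if should_close(cap, await get_committed(), get_running()):
            await close(WS_POLICY_VIOLATION, "monthly spend cap")
            return
        await asyncio.sleep(interval_s)
```

Since the check only runs once per interval, spend can exceed the cap by whatever accrues during one interval, which is the tolerance the bullet above describes.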

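## Contract tests (must pass before merge)

1. An allocation has spend cap $X; mixed chat (token) + OCR/realtime (non-token) usage brings cumulative spend to $X → subsequent calls get 403 `cost_quota_exceeded`, and one `rejected_cost_quota_exceeded` record is written.
2. An allocation has **no** spend cap → heavy non-token usage is not blocked (zero regression in token-quota behavior).
3. An allocation has both token and cost caps → whichever is reached first blocks (verify each case once).
4. A call to an unpriced model → does not add to the running total and is not blocked by the spend cap.
5. Cumulative spend exceeds the cap during a realtime connection → the connection closes within N seconds (mock provider WS) and the accrued duration is booked.
6. One round of the adaptive quota pool → each allocation's `quota_cost_usd_per_month` is unchanged (only the token quota is redistributed).
7. `current_month_cost` = the `Decimal` sum of `cost_usd` over the allocation's successful calls this month (non-token included; unpriced counted as 0).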
44 changes: 44 additions & 0 deletions specs/046-cost-quota/data-model.md
@@ -0,0 +1,44 @@
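# Phase 1 Data Model: Cost-Based Quota

**Core conclusion: one additive migration (0020), zero new tables.** The existing `call_records.cost_usd` (0019) is reused as the source of the running total; allocations gain a single optional cap column; the new rejection outcome is a VARCHAR enum value and needs no migration.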

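## 1. Allocation (existing, one new column)

| Column | Type | Description |
|---|---|---|
| `quota_cost_usd_per_month` | `Numeric(10,6)` **nullable** | Monthly spend cap (USD). NULL ⇒ no spend cap (current behavior preserved). **Migration 0020 is purely additive.** |

The existing `quota_tokens_per_month` (token cap), `quota_locked`, `is_service_allocation`, etc. are **unchanged**. The two caps can coexist.

**Validation rule**: `>= 0` (0 = blocks immediately; negative values are rejected). Admin create/update accepts an optional value; empty = NULL (no cap).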
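
As an illustration of the validation rule, the sketch below normalizes an incoming cap value using only the standard library. The function name `parse_cost_cap` and the choice to raise `ValueError` are assumptions for this example; in the real API the rejection would surface as the 422 described in the contract, presumably through the existing Pydantic v2 schemas.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Union


def parse_cost_cap(raw: Union[None, str, int, float, Decimal]) -> Optional[Decimal]:
    """Hypothetical helper: None or empty means no cap; otherwise a Decimal >= 0."""
    if raw is None or (isinstance(raw, str) and raw.strip() == ""):
        return None  # stored as NULL: no spend cap
    try:
        # Go through str() so floats such as 5.0 do not pick up binary noise.
        value = Decimal(str(raw).strip())
    except InvalidOperation:
        raise ValueError("quota_cost_usd_per_month must be a number") from None
    # Check finiteness first: ordering comparisons on a Decimal NaN raise.
    if not value.is_finite() or value < 0:
        raise ValueError("quota_cost_usd_per_month must be >= 0")
    return value
```

Note that the column is `Numeric(10,6)`, so values with more than six decimal places or more than four integer digits would not fit the column; this sketch does not enforce that bound.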

**Adaptive pool**: `quota_cost_usd_per_month` is **outside** the read/write scope of `quota_pool` (the pool only touches `quota_tokens_per_month`) → it is never redistributed (SC-005).

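## 2. CallRecord (existing; reused, plus a new outcome value)

- Source of the running total: `sum(cost_usd) where allocation_id=? and outcome=success and started_at>=start of month (UTC)`. A NULL `cost_usd` (unpriced) → `coalesce 0`, so it contributes nothing.
- The `CallOutcome` enum gains **`rejected_cost_quota_exceeded`** (`Enum(native_enum=False, length=32)` is stored as VARCHAR → **no migration**). A call blocked by the spend cap is recorded with this outcome (bound to the allocation, status 403), which keeps it distinguishable from `rejected_quota_exceeded` (token).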
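
The aggregation rule above can be illustrated in plain Python over in-memory rows. This is only a sketch of the semantics: the real implementation presumably runs as a SQLAlchemy query, and the `CallRecordRow` dataclass and `month_start_utc` helper are stand-ins invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Iterable, Optional


@dataclass(frozen=True)
class CallRecordRow:
    """Stand-in for the call_records columns the sum needs."""
    allocation_id: int
    outcome: str
    started_at: datetime          # timezone-aware
    cost_usd: Optional[Decimal]   # None for unpriced calls


def month_start_utc(now: datetime) -> datetime:
    """First instant of the current month in UTC (now must be tz-aware)."""
    utc_now = now.astimezone(timezone.utc)
    return utc_now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def current_month_cost(
    rows: Iterable[CallRecordRow], allocation_id: int, now: datetime
) -> Decimal:
    """Sum cost_usd over this month's successful calls; NULL counts as 0."""
    start = month_start_utc(now)
    return sum(
        (
            r.cost_usd if r.cost_usd is not None else Decimal("0")
            for r in rows
            if r.allocation_id == allocation_id
            and r.outcome == "success"
            and r.started_at >= start
        ),
        Decimal("0"),  # keeps the result a Decimal even with no matching rows
    )
```

Rejected calls, calls from earlier months, and other allocations' calls are all filtered out before summing, so only booked, successful spend counts toward the cap.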

## 3. Derived computations (not persisted)

| Name | Definition | Purpose |
|---|---|---|
| `current_month_cost(allocation_id)` | `Σ cost_usd` (successful, this month, coalesce 0) → `Decimal` | preflight check + usage display |
| `is_over_cost_quota(allocation, spent)` | `cap is not None and spent >= cap` | check shared by preflight and the watcher |
| `session_running_cost(sess, price)` | cost accrued so far on a live realtime connection = `session_quantity(sess, price.unit) × price.per_unit` | mid-connection enforcement (committed + this value ≥ cap) |

## 4. New serialized fields (API output)

| Endpoint | New field |
|---|---|
| admin allocations (list/detail) + create/update input | `quota_cost_usd_per_month` (optional) |
| `/me/usage`, `/me/allocations`, admin per-allocation usage | `cost_used_this_month` (Decimal string) + `quota_cost_usd_per_month` |

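## 5. State / flow

- **Synchronous endpoints**: at preflight, `current_month_cost ≥ cap` → reject with `cost_quota_exceeded` (403) and record one `rejected_cost_quota_exceeded`.
- **realtime**: preflight on connect includes the cost check; during the connection the watcher checks every N seconds and closes when `committed + running ≥ cap` (the accrued duration is booked, reusing "every close path books usage").

---

**Migration conclusion**: **`0020` only adds `allocations.quota_cost_usd_per_month` (nullable)**. The token column and `call_records` are untouched (`cost_usd`/`unit`/`quantity` have been in place since 0019); the new outcome is a VARCHAR enum value, not a schema change.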