fix: account for cache tokens in channel tests #4979

Open
gtxx3600 wants to merge 1 commit into QuantumNous:main from gtxx3600:fix/channel-test-cache-billing

gtxx3600 commented May 19, 2026

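⚠️ PR Notice

Important

  • Please provide a concise, human-written summary; avoid pasting unedited AI output.

📝 Description

(Briefly: what did you change, and why does it work? Write from your own understanding of the code logic; avoid pasting unedited content.)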

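Non-tiered billing for channel model tests previously used a simplified formula that counted only regular input and output tokens. It did not reuse the text billing logic from the production request path, so cache read/write tokens went unbilled. This change makes channel test settlement reuse the quota calculation for normal text requests, and adds cache token semantic inference and regression tests, so that the cost shown in test logs matches actual model usage.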
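
To illustrate the gap described above, here is a minimal Go sketch that contrasts a simplified formula with a cache-aware one. The `Usage` and `Ratios` types, both function names, and all ratio values are hypothetical and chosen for illustration only; they are not the project's actual types or pricing.

```go
package main

import "fmt"

// Usage is a hypothetical usage record in which cache tokens are
// reported separately from regular input tokens.
type Usage struct {
	InputTokens      int
	CacheReadTokens  int
	CacheWriteTokens int
	OutputTokens     int
}

// Ratios are illustrative price multipliers, expressed relative to the
// cost of one regular input token. The values used below are made up.
type Ratios struct {
	Completion float64
	CacheRead  float64
	CacheWrite float64
}

// simplifiedQuota counts only regular input and output tokens, which is
// the kind of formula the description says missed cache usage.
func simplifiedQuota(u Usage, r Ratios) float64 {
	return float64(u.InputTokens) + float64(u.OutputTokens)*r.Completion
}

// cacheAwareQuota adds cache read and cache write tokens, each weighted
// by its own ratio, on top of the simplified result.
func cacheAwareQuota(u Usage, r Ratios) float64 {
	return simplifiedQuota(u, r) +
		float64(u.CacheReadTokens)*r.CacheRead +
		float64(u.CacheWriteTokens)*r.CacheWrite
}

func main() {
	u := Usage{InputTokens: 100, CacheReadTokens: 1000, CacheWriteTokens: 200, OutputTokens: 50}
	r := Ratios{Completion: 4, CacheRead: 0.25, CacheWrite: 1.25}
	fmt.Println(simplifiedQuota(u, r)) // prints 300
	fmt.Println(cacheAwareQuota(u, r)) // prints 800
}
```

With these made-up numbers, the simplified formula reports 300 units while the cache-aware one reports 800, so a cache-heavy test request could be under-billed by a wide margin.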

🚀 Type of change

  • 🐛 Bug fix - Please link the corresponding Issue; do not classify design trade-offs, misunderstandings, or mismatched expectations as bugs
  • ✨ New feature - For major features, please discuss in an Issue first
  • ⚡ Performance optimization / Refactor
  • 📝 Documentation

🔗 Related Issue

  • Closes # (if any)

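✅ Checklist

  • Human-reviewed: I organized and wrote this description myself and did not paste unprocessed AI output.
  • Not a duplicate: I searched existing Issues and PRs and confirmed this is not a duplicate submission.
  • Bug fix justification: If this PR is marked as a Bug fix, I have filed or linked the corresponding Issue, and I am not classifying design trade-offs, mismatched expectations, or misunderstandings as bugs.
  • Understanding of changes: I understand how these changes work and what they may affect.
  • Focused scope: This PR contains no code changes unrelated to the current task.
  • Local verification: I ran and passed tests locally or verified manually, so maintainers can reproduce the result.
  • Security and compliance: The code contains no sensitive credentials and follows the project's coding standards.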

📸 Proof of Work

(Paste screenshots, key logs, or test reports here to show that the change works)

Local regression tests passed:

docker run --rm -v /Users/dave/workspace/new-api:/workspace -w /workspace golang:1.25.1 go test ./service ./controller -count=1

Summary by CodeRabbit

  • Bug Fixes

    • Quota calculations now correctly include cached input tokens, ensuring accurate usage metrics when leveraging token cache functionality.
  • New Features

    • Enhanced support for Anthropic Claude models with automatic detection of cache-separated token usage patterns.
  • Tests

    • Added tests validating cached token handling in quota calculations.


Reuse normal text quota calculation for channel model tests so cache read/write tokens and other text quota adjustments are included. Also infer separated cache usage when compatible upstreams report cache tokens outside prompt tokens.

Co-authored-by: Codex <noreply@openai.com>
Contributor

coderabbitai Bot commented May 19, 2026

Walkthrough

This PR extends quota calculation to recognize and correctly bill Anthropic-format separated cache-read tokens. It adds cache-detection logic, exposes a new public CalculateTextQuotaForUsage helper, and refactors the test quota handler to use this helper while adding test coverage for cache-inclusive quota computation.

Changes

Cache-aware quota calculation and integration

| Layer / File(s) | Summary |
|---|---|
| Cache detection and quota calculation helper<br>service/text_quota.go, service/text_quota_test.go | New usageLooksLikeSeparatedInputCache helper detects when cached prompt tokens exceed regular prompt tokens (Claude separated cache format). usageSemanticFromUsage now infers "anthropic" semantic from this detection. New exported CalculateTextQuotaForUsage returns computed quota directly without logging, delegating to calculateTextQuotaSummary. Test validates inferred semantic flags and quota computation for separated cache usage. |
| Test quota handler integration and validation<br>controller/channel-test.go, controller/channel_test_internal_test.go | settleTestQuota refactored to use the new CalculateTextQuotaForUsage helper instead of inline quota math, eliminating manual price/ratio calculations. time import added. New test TestSettleTestQuotaIncludesCacheReadTokens validates that cache-read tokens are correctly included in quota computation with expected value assertions. |
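
The detection rule summarized above (cached prompt tokens exceeding regular prompt tokens) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the `Usage` type, the function names `looksLikeSeparatedCache` and `inferSemantic`, and the `"openai"` fallback label are all hypothetical. The reasoning behind it: when cached tokens are reported as a subset of prompt tokens, they can never exceed the prompt count, so a larger cached count suggests the upstream reports cache tokens separately.

```go
package main

import "fmt"

// Usage is a hypothetical, trimmed-down usage record.
type Usage struct {
	PromptTokens int // prompt tokens as reported by the upstream
	CachedTokens int // cache-read tokens as reported by the upstream
}

// looksLikeSeparatedCache reports whether cached tokens cannot be a
// subset of prompt tokens, which hints that the upstream counts them
// outside the prompt total.
func looksLikeSeparatedCache(u Usage) bool {
	return u.CachedTokens > u.PromptTokens
}

// inferSemantic returns "anthropic" when the usage looks separated.
// The "openai" fallback label is an assumption made for this sketch.
func inferSemantic(u Usage) string {
	if looksLikeSeparatedCache(u) {
		return "anthropic"
	}
	return "openai"
}

func main() {
	fmt.Println(inferSemantic(Usage{PromptTokens: 1200, CachedTokens: 1000})) // prints openai
	fmt.Println(inferSemantic(Usage{PromptTokens: 100, CachedTokens: 1000}))  // prints anthropic
}
```

A heuristic of this shape cannot distinguish the two formats when the separated cached count happens to be at or below the prompt count, so it is best read as an inference for otherwise unlabeled usage rather than a guarantee.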

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • QuantumNous/new-api#3438: Adjusts quota accounting by guarding OpenRouter-specific handling with IsClaudeUsageSemantic flag that this PR sets during separated-cache detection.
  • QuantumNous/new-api#2798: Populates cached token fields for Gemini via cache ratios—those tokens are now correctly detected and billed by this PR's separated-cache inference logic.
  • QuantumNous/new-api#2811: Ensures cached input tokens are properly represented so downstream quota calculation can detect separated cache-read usage patterns.

Suggested reviewers

  • seefs001
  • Calcium-Ion

Poem

🐰 Hops through the cache, tokens aligned,
Quota logic finds the hidden kind—
Anthropic reads now count just right,
Billing shines with separated light!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: accounting for cache tokens in channel tests by reusing normal quota calculation logic. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


@gtxx3600 gtxx3600 changed the title Fix channel test cache token billing fix: account for cache tokens in channel tests May 19, 2026