Description
Is it a request payload issue?
[ ] Yes, this is a request payload issue. I am using a client/cURL to send a request payload, but I received an unexpected error.
[x] No, it's another issue.
Describe the bug
CLI Proxy API Plus reports prompt_tokens in the usage object as only non-cached tokens, while cached_tokens is reported separately. This breaks the OpenAI API standard, where prompt_tokens should represent the total number of input tokens (including cached ones), and cached_tokens should be a subset of prompt_tokens.
This causes downstream cost-tracking systems (LiteLLM, Langfuse, etc.) to calculate negative costs, because they subtract cached_tokens from prompt_tokens to determine non-cached token count.
Current (incorrect) behavior:
"usage": {
"prompt_tokens": 1678,
"completion_tokens": 370,
"total_tokens": 2048,
"prompt_tokens_details": {
"cached_tokens": 105530
}
}Cost calculation by LiteLLM:
(prompt_tokens - cached_tokens) × price = (1678 - 105530) × $0.000005 = -$0.519 ❌
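The negative spend can be reproduced with the reported values; the per-token price below is the same illustrative rate used in the calculation above, not official pricing:

```python
# Reproduce LiteLLM's cache-aware cost formula with the usage values
# currently reported by CLI Proxy API Plus (illustrative price only).
PRICE_INPUT = 0.000005        # example $ per non-cached input token

prompt_tokens = 1678          # reported as non-cached tokens only
cached_tokens = 105530        # reported separately in prompt_tokens_details

non_cached = prompt_tokens - cached_tokens   # -103852: already impossible
cost = non_cached * PRICE_INPUT
print(f"{cost:.3f}")          # -0.519 -> negative spend in the dashboard
```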
Expected behavior (OpenAI API standard):
"usage": {
"prompt_tokens": 107208,
"completion_tokens": 370,
"total_tokens": 107578,
"prompt_tokens_details": {
"cached_tokens": 105530
}
}Where prompt_tokens = non-cached (1678) + cached (105530) = 107208
Cost calculation:
(107208 - 105530) × $0.000005 + 105530 × $0.0000005 = $0.061 ✅
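One way the proxy could emit the standard-conformant shape is to fold cached tokens back into prompt_tokens before returning the usage object. This is a hypothetical sketch of that normalization, not code from CLI Proxy API Plus:

```python
# Hypothetical normalization: make prompt_tokens the *total* input count,
# with cached_tokens a subset of it, as the OpenAI usage format expects.
def normalize_usage(usage: dict) -> dict:
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    # If cached_tokens exceeds prompt_tokens, the upstream value was the
    # non-cached count only; add the cached portion back in.
    if cached > usage["prompt_tokens"]:
        usage["prompt_tokens"] += cached
        usage["total_tokens"] += cached
    return usage

usage = {
    "prompt_tokens": 1678,
    "completion_tokens": 370,
    "total_tokens": 2048,
    "prompt_tokens_details": {"cached_tokens": 105530},
}
fixed = normalize_usage(usage)
# fixed["prompt_tokens"] -> 107208, fixed["total_tokens"] -> 107578
```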
Reference: https://platform.openai.com/docs/api-reference/chat/object
CLI Type
claude code (openai-compatibility)
Model Name
claude-opus-4-6, claude-sonnet-4-6, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, possibly others (not tested)
LLM Client
LiteLLM Proxy (v1.81.12) + Open WebUI + claude code
Request Information
Extracted from LiteLLM SpendLogs metadata:
```json
{
  "usage_object": {
    "total_tokens": 2048,
    "prompt_tokens": 1678,
    "completion_tokens": 370,
    "prompt_tokens_details": {
      "cached_tokens": 105530
    }
  }
}
```

Note: cached_tokens (105530) > prompt_tokens (1678), which is impossible under the OpenAI standard.
Expected behavior
prompt_tokens should include cached tokens in its count (as per OpenAI API standard). cached_tokens in prompt_tokens_details should be a subset of prompt_tokens, not an additional separate count.
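With the subset semantics described above, the same cache-aware formula produces a sensible positive cost (same illustrative rates as in the examples earlier; the cached-token rate is an assumed 10x discount):

```python
# Cache-aware cost with standard-conformant usage: cached_tokens is a
# subset of prompt_tokens, so the subtraction stays non-negative.
PRICE_INPUT = 0.000005     # example $ per non-cached input token
PRICE_CACHED = 0.0000005   # example $ per cached input token (assumed rate)

prompt_tokens = 107208     # total input tokens, cached included
cached_tokens = 105530     # subset of prompt_tokens

cost = (prompt_tokens - cached_tokens) * PRICE_INPUT + cached_tokens * PRICE_CACHED
print(f"{cost:.3f}")       # 0.061
```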
Screenshots
LiteLLM dashboard showing negative spend due to this issue:
OS Type
- OS: Linux (Ubuntu)
- Docker image: `eceasy/cli-proxy-api-plus:latest`
Additional context
This issue causes all cost-tracking tools that follow the OpenAI usage format to produce negative cost calculations. The only workaround currently is to disable cache-aware cost tracking entirely in LiteLLM, which results in less accurate cost reporting.
Multiple models are affected — the issue is consistent across all Anthropic models proxied through CLI Proxy API Plus.