
prompt_tokens reports only non-cached tokens instead of total input tokens (breaks OpenAI API standard) #391

@gigipl

Description

Is it a request payload issue?
[ ] Yes, this is a request payload issue. I am using a client/cURL to send a request payload, but I received an unexpected error.
[x] No, it's another issue.

Describe the bug

CLI Proxy API Plus reports prompt_tokens in the usage object as only non-cached tokens, while cached_tokens is reported separately. This breaks the OpenAI API standard, where prompt_tokens should represent the total number of input tokens (including cached ones), and cached_tokens should be a subset of prompt_tokens.

This causes downstream cost-tracking systems (LiteLLM, Langfuse, etc.) to calculate negative costs, because they subtract cached_tokens from prompt_tokens to determine the non-cached token count.

Current (incorrect) behavior:

"usage": {
  "prompt_tokens": 1678,
  "completion_tokens": 370,
  "total_tokens": 2048,
  "prompt_tokens_details": {
    "cached_tokens": 105530
  }
}

Cost calculation by LiteLLM:
(prompt_tokens - cached_tokens) × price = (1678 - 105530) × $0.000005 = -$0.519

Expected behavior (OpenAI API standard):

"usage": {
  "prompt_tokens": 107208,
  "completion_tokens": 370,
  "total_tokens": 107578,
  "prompt_tokens_details": {
    "cached_tokens": 105530
  }
}

Where prompt_tokens = non-cached (1678) + cached (105530) = 107208

Cost calculation:
(107208 - 105530) × $0.000005 + 105530 × $0.0000005 = $0.061
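Both calculations above can be reproduced with a short Python sketch (the per-token prices are the illustrative rates used in this report, not official pricing):

```python
# Reproduces the two cost calculations above.
# Prices are the illustrative per-token rates from this report, not official pricing.
INPUT_PRICE = 0.000005    # $ per non-cached input token
CACHED_PRICE = 0.0000005  # $ per cached input token

# Current (incorrect) behavior: prompt_tokens excludes cached tokens,
# so subtracting cached_tokens yields a negative token count.
prompt_tokens, cached_tokens = 1678, 105530
broken_cost = (prompt_tokens - cached_tokens) * INPUT_PRICE
print(round(broken_cost, 3))   # -0.519

# Expected behavior: prompt_tokens includes cached tokens,
# so the subtraction recovers the non-cached count (1678).
prompt_tokens = 1678 + 105530  # 107208
correct_cost = (prompt_tokens - cached_tokens) * INPUT_PRICE \
    + cached_tokens * CACHED_PRICE
print(round(correct_cost, 3))  # 0.061
```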

Reference: https://platform.openai.com/docs/api-reference/chat/object

CLI Type
claude code (openai-compatibility)

Model Name
claude-opus-4-6, claude-sonnet-4-6, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001; possibly others (not tested)

LLM Client
LiteLLM Proxy (v1.81.12) + Open WebUI + claude code

Request Information
Extracted from LiteLLM SpendLogs metadata:

{
  "usage_object": {
    "total_tokens": 2048,
    "prompt_tokens": 1678,
    "completion_tokens": 370,
    "prompt_tokens_details": {
      "cached_tokens": 105530
    }
  }
}

Note: cached_tokens (105530) > prompt_tokens (1678), which is impossible under the OpenAI standard.

Expected behavior
prompt_tokens should include cached tokens in its count (as per OpenAI API standard). cached_tokens in prompt_tokens_details should be a subset of prompt_tokens, not an additional separate count.
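A minimal sketch of the fix on the proxy side, assuming the proxy translates an upstream usage report into the OpenAI format (the function name and parameters below are hypothetical; I have not looked at the CLI Proxy API Plus source):

```python
# Hypothetical normalization step: field names follow the OpenAI usage
# object; the function and its parameters are illustrative only.
def normalize_usage(non_cached_input: int, cached: int, completion: int) -> dict:
    """Build an OpenAI-style usage object in which cached_tokens
    is a subset of prompt_tokens, not a separate additional count."""
    prompt_tokens = non_cached_input + cached  # total input tokens
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion,
        "total_tokens": prompt_tokens + completion,
        "prompt_tokens_details": {"cached_tokens": cached},
    }

usage = normalize_usage(1678, 105530, 370)
# usage["prompt_tokens"] == 107208, matching the expected payload above
```

With this shape, downstream consumers that compute `prompt_tokens - cached_tokens` always get a non-negative non-cached token count.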

Screenshots
LiteLLM dashboard showing negative spend due to this issue:


OS Type

  • OS: Linux (Ubuntu)
  • Docker image: eceasy/cli-proxy-api-plus:latest

Additional context
This issue causes all cost-tracking tools that follow the OpenAI usage format to produce negative cost calculations. The only workaround currently is to disable cache-aware cost tracking entirely in LiteLLM, which results in less accurate cost reporting.

Multiple models are affected — the issue is consistent across all Anthropic models proxied through CLI Proxy API Plus.
