Description
Is it a request payload issue?
[ ] Yes, this is a request payload issue. I am using a client/cURL to send a request payload, but I received an unexpected error.
[x] No, it's another issue.
Describe the bug
CLI Proxy API Plus reports prompt_tokens in the usage object as only non-cached tokens, while cached_tokens is reported separately. This breaks the OpenAI API standard, where prompt_tokens should represent the total number of input tokens (including cached ones), and cached_tokens should be a subset of prompt_tokens.
This causes downstream cost-tracking systems (LiteLLM, Langfuse, etc.) to calculate negative costs, because they subtract cached_tokens from prompt_tokens to determine non-cached token count.
Current (incorrect) behavior:
"usage": {
"prompt_tokens": 1678,
"completion_tokens": 370,
"total_tokens": 2048,
"prompt_tokens_details": {
"cached_tokens": 105530
}
}Cost calculation by LiteLLM:
(prompt_tokens - cached_tokens) × price = (1678 - 105530) × $0.000005 = -$0.519 ❌
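The negative spend can be reproduced with the reported values; the per-token price below is the same illustrative rate used in the calculation above, not official pricing:

```python
# Reproduce LiteLLM's cache-aware cost formula with the usage values
# currently reported by CLI Proxy API Plus (illustrative price only).
PRICE_INPUT = 0.000005        # example $ per non-cached input token

prompt_tokens = 1678          # reported as non-cached tokens only
cached_tokens = 105530        # reported separately in prompt_tokens_details

non_cached = prompt_tokens - cached_tokens   # -103852: already impossible
cost = non_cached * PRICE_INPUT
print(f"{cost:.3f}")          # -0.519 -> negative spend in the dashboard
```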
Expected behavior (OpenAI API standard):
"usage": {
"prompt_tokens": 107208,
"completion_tokens": 370,
"total_tokens": 107578,
"prompt_tokens_details": {
"cached_tokens": 105530
}
}Where prompt_tokens = non-cached (1678) + cached (105530) = 107208
Cost calculation:
(107208 - 105530) × $0.000005 + 105530 × $0.0000005 = $0.061 ✅
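One way the proxy could emit the standard-conformant shape is to fold cached tokens back into prompt_tokens before returning the usage object. This is a hypothetical sketch of that normalization, not code from CLI Proxy API Plus:

```python
# Hypothetical normalization: make prompt_tokens the *total* input count,
# with cached_tokens a subset of it, as the OpenAI usage format expects.
def normalize_usage(usage: dict) -> dict:
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    # If cached_tokens exceeds prompt_tokens, the upstream value was the
    # non-cached count only; add the cached portion back in.
    if cached > usage["prompt_tokens"]:
        usage["prompt_tokens"] += cached
        usage["total_tokens"] += cached
    return usage

usage = {
    "prompt_tokens": 1678,
    "completion_tokens": 370,
    "total_tokens": 2048,
    "prompt_tokens_details": {"cached_tokens": 105530},
}
fixed = normalize_usage(usage)
# fixed["prompt_tokens"] -> 107208, fixed["total_tokens"] -> 107578
```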
Reference: https://platform.openai.com/docs/api-reference/chat/object
CLI Type
claude code (openai-compatibility)
Model Name
claude-opus-4-6, claude-sonnet-4-6, claude-sonnet-4-5-20250929, claude-haiku-4-5-20251001, possibly others (not tested)
LLM Client
LiteLLM Proxy (v1.81.12) + Open WebUI + claude code
Request Information
Extracted from LiteLLM SpendLogs metadata:
```json
{
  "usage_object": {
    "total_tokens": 2048,
    "prompt_tokens": 1678,
    "completion_tokens": 370,
    "prompt_tokens_details": {
      "cached_tokens": 105530
    }
  }
}
```

Note: cached_tokens (105530) > prompt_tokens (1678), which is impossible under the OpenAI standard.
Expected behavior
prompt_tokens should include cached tokens in its count (as per OpenAI API standard). cached_tokens in prompt_tokens_details should be a subset of prompt_tokens, not an additional separate count.
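With the subset semantics described above, the same cache-aware formula produces a sensible positive cost (same illustrative rates as in the examples earlier; the cached-token rate is an assumed 10x discount):

```python
# Cache-aware cost with standard-conformant usage: cached_tokens is a
# subset of prompt_tokens, so the subtraction stays non-negative.
PRICE_INPUT = 0.000005     # example $ per non-cached input token
PRICE_CACHED = 0.0000005   # example $ per cached input token (assumed rate)

prompt_tokens = 107208     # total input tokens, cached included
cached_tokens = 105530     # subset of prompt_tokens

cost = (prompt_tokens - cached_tokens) * PRICE_INPUT + cached_tokens * PRICE_CACHED
print(f"{cost:.3f}")       # 0.061
```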
Screenshots
LiteLLM dashboard showing negative spend due to this issue:
OS Type
- OS: Linux (Ubuntu)
- Docker image: `eceasy/cli-proxy-api-plus:latest`
Additional context
This issue causes all cost-tracking tools that follow the OpenAI usage format to produce negative cost calculations. The only workaround currently is to disable cache-aware cost tracking entirely in LiteLLM, which results in less accurate cost reporting.
Multiple models are affected — the issue is consistent across all Anthropic models proxied through CLI Proxy API Plus.