## Bug Description

When passing custom skills in the `container` parameter, prompt caching fails on the message prefix. The skills list is identical on every request (same `skill_id`s, same types), yet `cache_read_input_tokens` shows the message-level cache is never read. The same request WITHOUT skills in the container (`container={"skills": []}`) caches correctly.
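A minimal sketch of the client-side payload, showing that the container sent on both turns is byte-identical (the skill IDs and the `{"type": "custom", "skill_id": ...}` entry shape are placeholders, not taken verbatim from the failing requests):

```python
import json

def build_container(with_skills: bool) -> dict:
    """Container payload as sent on every turn -- nothing varies between requests."""
    if not with_skills:
        return {"skills": []}
    return {
        "skills": [
            # Hypothetical skill entries standing in for the 8 custom skills.
            {"type": "custom", "skill_id": f"skill_{i:02d}"}
            for i in range(8)
        ],
    }

turn1 = build_container(with_skills=True)
turn2 = build_container(with_skills=True)

# The serialized container is identical across turns, so any cache-key
# difference must be introduced after the request leaves the client.
assert json.dumps(turn1, sort_keys=True) == json.dumps(turn2, sort_keys=True)
```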
## Evidence

| Scenario | Turn | Request ID | cache_read | cache_write | Result |
|---|---|---|---|---|---|
| `container={"skills": []}` | 1 | `req_011CZrmGrt9icQRYZk43zYuG` | 0 | 35301 | Expected (cold) |
| `container={"skills": []}` | 2 | `req_011CZrmH1kTHygE588SePDwg` | 35301 | 155 | HIT - reads full prefix |
| `container={"skills": [8 custom skills]}` | 1 | `req_011CZrhUg87qqzAC5uGD67GJ` | 0 | 37097 | Expected (cold) |
| `container={"skills": [8 custom skills]}` | 2 | `req_011CZrhV5bVzxmFwvRYRAfvD` | 0 | 37327 | MISS - should have read ~37097 |
With skills present, Turn 2 reports `cache_read_input_tokens=0` when it should have read the full ~37k-token prefix written on Turn 1. Without skills, caching works correctly.
## Additional details

- Removing `"version": "latest"` from the skills config does not help
- The skills list is a compile-time constant (nothing changes between requests)
- 5-minute TTL, ~37k tokens in both scenarios
- With skills, Turn 2 does show `read=33852` at the system+tools level but `write=3448` on messages, suggesting the resolved skills content differs between requests
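The hit/miss classification used in the table above can be expressed as a small helper over consecutive usage blocks. The field names `cache_creation_input_tokens` and `cache_read_input_tokens` are the standard Messages API usage fields; the threshold logic is this report's interpretation, not an official check:

```python
def classify_turn2(turn1_usage: dict, turn2_usage: dict) -> str:
    """Classify the second turn's cache behavior from two usage blocks."""
    wrote = turn1_usage.get("cache_creation_input_tokens", 0)
    read = turn2_usage.get("cache_read_input_tokens", 0)
    if read >= wrote:
        return "HIT"      # full prefix read back
    if read == 0:
        return "MISS"     # nothing read despite an earlier write
    return "PARTIAL"      # some shorter prefix matched

# Values from the evidence table:
assert classify_turn2({"cache_creation_input_tokens": 35301},
                      {"cache_read_input_tokens": 35301}) == "HIT"
assert classify_turn2({"cache_creation_input_tokens": 37097},
                      {"cache_read_input_tokens": 0}) == "MISS"
```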
## Hypothesis
The API resolves custom skills server-side on each request, and this resolution produces non-deterministic output (different serialization, internal metadata, ordering, etc.) that changes the cache prefix hash even though the client-side input is identical.
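A toy illustration of the failure mode (not the API's actual hashing): if the server's resolved-skill serialization varies per request in key order or injected metadata, a prefix hash diverges even though the skill is semantically identical.

```python
import hashlib
import json

def prefix_hash(serialized: str) -> str:
    """Stand-in for whatever hash keys the prompt cache."""
    return hashlib.sha256(serialized.encode()).hexdigest()

skill = {"type": "custom", "skill_id": "skill_01"}  # hypothetical skill entry

# Deterministic serialization: same hash on every request -> cache hit.
a = prefix_hash(json.dumps(skill, sort_keys=True))
b = prefix_hash(json.dumps(skill, sort_keys=True))
assert a == b

# Any per-request variation breaks the hash -> cache miss:
c = prefix_hash(json.dumps(skill, sort_keys=False))             # key order differs
d = prefix_hash(json.dumps({**skill, "resolved_at": "req-2"}))  # injected metadata
assert a != c and a != d
```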
## Impact
Any API user with custom skills cannot benefit from message-level prompt caching. This significantly increases costs and latency for multi-turn conversations using skills.
## Environment

- Package: `anthropic` (Python SDK)
- Model: `claude-sonnet-4-6`
- Betas: `compact-2026-01-12`, `files-api-2025-04-14`, `skills-2025-10-02`
- Method: `client.beta.messages.stream()`