during our usage, we notice GPT models get proper cached tokens
but claude models always only do expensive cache_writes but never cache_reads, which means you pay for cache creation and never benefit from it
i'd like a way to log the object that's being sent to the upstream API to be LOGGED somehow somewhere so i can analyze what is being sent from the computer server.
Claude models are very very strict on caching. The entire thing needs to be fully immutable. if anything is altered, including user message itself, you will break cache hit on subsequent prompts.
during our usage, we notice GPT models get proper cached tokens
but claude models always only do expensive cache_writes but never cache_reads, which means you pay for cache creation and never benefit from it
i'd like a way to log the object that's being sent to the upstream API to be LOGGED somehow somewhere so i can analyze what is being sent from the computer server.
Claude models are very very strict on caching. The entire thing needs to be fully immutable. if anything is altered, including user message itself, you will break cache hit on subsequent prompts.