fix(codex): restore prompt cache continuity for Codex requests #2374
luispater merged 8 commits into router-for-me:dev from
Conversation
Prompt caching on Codex was not reliably reusable through the proxy because repeated chat-completions requests could reach the upstream without the same continuity envelope. In practice this showed up most clearly with OpenCode, where cache reads worked in the reference client but not through CLIProxyAPI, although the root cause is broader than OpenCode itself.

The proxy was breaking continuity in several ways: executor-layer Codex request preparation stripped prompt_cache_retention, chat-completions translation did not preserve that field, continuity headers used a different shape than the working client behavior, and OpenAI-style Codex requests could be sent without a stable prompt_cache_key. When that happened, session_id fell back to a fresh random value per request, so upstream Codex treated repeated requests as unrelated turns instead of as part of the same cacheable context.

This change fixes that by preserving caller-provided prompt_cache_retention on Codex execution paths, preserving prompt_cache_retention when translating OpenAI chat-completions requests to Codex, aligning Codex continuity headers to session_id, and introducing an explicit Codex continuity policy that derives a stable continuity key from the best available signal. The resolution order prefers an explicit prompt_cache_key, then execution session metadata, then an explicit idempotency key, then stable request-affinity metadata, then a stable client-principal hash, and finally a stable auth-ID hash when no better continuity signal exists. The same continuity key is applied to both prompt_cache_key in the request body and session_id in the request headers so repeated requests reuse the same upstream cache/session identity.

The auth manager also keeps auth selection sticky for repeated request sequences, preventing otherwise-equivalent Codex requests from drifting across different upstream auth contexts and accidentally breaking cache reuse.
To keep the implementation maintainable, the continuity resolution and diagnostics are centralized in a dedicated Codex continuity helper instead of being scattered across executor flow code. Regression coverage now verifies retention preservation, continuity-key precedence, stable auth-ID fallback, websocket parity, translator preservation, and auth-affinity behavior. Manual validation confirmed prompt cache reads now occur through CLIProxyAPI when using Codex via OpenCode, and the fix should also benefit other clients that rely on stable repeated Codex request continuity.
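The resolution order described above can be sketched as a small Go helper. This is an illustrative sketch, not the actual CLIProxyAPI code: `continuitySignals`, `opaque`, and `resolveContinuityKey` are invented names, and the real implementation derives its hashed fallbacks with uuid.NewSHA1 rather than the SHA-256 used here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// continuitySignals is a hypothetical bag of the per-request values the
// resolver can draw from; the real code reads these from the request body,
// execution metadata, and headers.
type continuitySignals struct {
	PromptCacheKey     string // explicit prompt_cache_key from the body
	ExecutionSessionID string // execution session metadata
	IdempotencyKey     string // explicit Idempotency-Key header
	AffinityKey        string // stable request-affinity metadata
	ClientPrincipal    string // client principal (hashed before use)
	AuthID             string // auth ID (hashed before use)
}

// opaque derives a stable, non-reversible key so raw identity material
// is never forwarded upstream.
func opaque(s string) string {
	sum := sha256.Sum256([]byte("codex:prompt-cache:" + s))
	return hex.EncodeToString(sum[:16])
}

// resolveContinuityKey returns the first non-empty signal in precedence
// order, hashing the identity-derived fallbacks.
func resolveContinuityKey(sig continuitySignals) (key, source string) {
	if v := strings.TrimSpace(sig.PromptCacheKey); v != "" {
		return v, "prompt_cache_key"
	}
	if v := strings.TrimSpace(sig.ExecutionSessionID); v != "" {
		return v, "execution_session"
	}
	if v := strings.TrimSpace(sig.IdempotencyKey); v != "" {
		return v, "idempotency_key"
	}
	if v := strings.TrimSpace(sig.AffinityKey); v != "" {
		return v, "request_affinity"
	}
	if v := strings.TrimSpace(sig.ClientPrincipal); v != "" {
		return opaque(v), "client_principal"
	}
	if v := strings.TrimSpace(sig.AuthID); v != "" {
		return opaque(v), "auth_id"
	}
	return "", ""
}

func main() {
	key, source := resolveContinuityKey(continuitySignals{ClientPrincipal: "sk-example"})
	fmt.Println(source, key != "")
}
```

Because every fallback is deterministic for the same caller, two otherwise-identical requests resolve to the same key, which is what lets the upstream treat them as one cacheable context.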
Code Review
This pull request introduces 'Auth Affinity' and 'Codex Continuity' to improve request routing and prompt cache management. Auth Affinity allows the system to pin a specific authentication identity to a session or idempotency key, while Codex Continuity provides a unified way to resolve and apply prompt cache keys across HTTP and WebSocket executors. Feedback suggests aligning the WebSocket continuity logic with the HTTP implementation, ensuring stable string representations for API keys used in metadata, and refactoring repetitive logic for metadata extraction and affinity application to improve maintainability.
```go
if cache.ID != "" {
	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
	headers.Set("Conversation_id", cache.ID)
	headers.Set("Session_id", cache.ID)
	headers.Set("session_id", cache.ID)
}
```
The continuity logic for websockets in applyCodexPromptCacheHeaders doesn't seem to be fully aligned with the HTTP path in codex_executor.go. The cacheHelper function was updated to use the new resolveCodexContinuity helper for openai format requests, but applyCodexPromptCacheHeaders was not. To ensure consistent behavior for prompt cache continuity, consider updating this function to also use resolveCodexContinuity for openai requests. This would also allow you to correctly populate the Source field of the codexContinuity struct for logging.
sdk/api/handlers/handlers.go
Outdated
```go
if apiKey, exists := ginCtx.Get("apiKey"); exists && apiKey != nil {
	meta[authAffinityMetadataKey] = fmt.Sprintf("principal:%v", apiKey)
}
```
Using fmt.Sprintf("principal:%v", apiKey) to generate authAffinityMetadataKey can be risky if apiKey is a pointer or a struct, as the string representation might not be stable across requests, leading to auth affinity failures. The value of auth_affinity_key is used as a map key in conductor.go and needs to be consistent.
Consider handling the apiKey value with type switches to ensure a stable string representation, similar to how it's handled in resolveCodexContinuity for client_principal.
```go
if apiKey, exists := ginCtx.Get("apiKey"); exists && apiKey != nil {
	var keyStr string
	switch v := apiKey.(type) {
	case string:
		keyStr = v
	case fmt.Stringer:
		keyStr = v.String()
	default:
		// Using %v for other types can lead to unstable keys if they are pointers or maps.
		// This should be handled carefully.
		keyStr = fmt.Sprintf("%v", v)
	}
	if trimmedKey := strings.TrimSpace(keyStr); trimmedKey != "" {
		meta[authAffinityMetadataKey] = "principal:" + trimmedKey
	}
}
```

```go
if v, exists := ginCtx.Get("apiKey"); exists && v != nil {
	switch value := v.(type) {
	case string:
		if trimmed := strings.TrimSpace(value); trimmed != "" {
			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
		}
	case fmt.Stringer:
		if trimmed := strings.TrimSpace(value.String()); trimmed != "" {
			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
		}
	default:
		trimmed := strings.TrimSpace(fmt.Sprintf("%v", value))
		if trimmed != "" {
			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
		}
	}
}
```
The logic for handling apiKey from the gin context is repetitive across the string, fmt.Stringer, and default cases in the switch statement. This can be refactored to reduce code duplication and improve maintainability. You can extract the string value first and then perform the check and hashing logic once.
```go
if v, exists := ginCtx.Get("apiKey"); exists && v != nil {
	var keyStr string
	switch value := v.(type) {
	case string:
		keyStr = value
	case fmt.Stringer:
		keyStr = value.String()
	default:
		keyStr = fmt.Sprintf("%v", value)
	}
	if trimmed := strings.TrimSpace(keyStr); trimmed != "" {
		return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
	}
}
```

```go
if opts.Metadata != nil {
	if raw, ok := opts.Metadata[cliproxyexecutor.SelectedAuthMetadataKey]; ok && raw != nil {
		switch v := raw.(type) {
		case string:
			selectedAuthID = strings.TrimSpace(v)
		case []byte:
			selectedAuthID = strings.TrimSpace(string(v))
		}
	}
	if raw, ok := opts.Metadata[cliproxyexecutor.ExecutionSessionMetadataKey]; ok && raw != nil {
		switch v := raw.(type) {
		case string:
			executionSessionID = strings.TrimSpace(v)
		case []byte:
			executionSessionID = strings.TrimSpace(string(v))
		}
	}
}
```
The logic for extracting selectedAuthID and executionSessionID from opts.Metadata is duplicated. You could create a helper function to extract a string value from the metadata map, which would handle the type switching and trimming in one place. This would make the code more concise and easier to maintain.
Example helper:
```go
func getStringFromMetadata(meta map[string]any, key string) string {
	if raw, ok := meta[key]; ok && raw != nil {
		switch v := raw.(type) {
		case string:
			return strings.TrimSpace(v)
		case []byte:
			return strings.TrimSpace(string(v))
		}
	}
	return ""
}
```

Then you could use it like this:

```go
selectedAuthID := getStringFromMetadata(opts.Metadata, cliproxyexecutor.SelectedAuthMetadataKey)
executionSessionID := getStringFromMetadata(opts.Metadata, cliproxyexecutor.ExecutionSessionMetadataKey)
```
sdk/cliproxy/auth/conductor.go
Outdated
```go
if pinnedAuthID := pinnedAuthIDFromMetadata(opts.Metadata); pinnedAuthID == "" {
	if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
		if affinityAuthID := m.AuthAffinity(affinityKey); affinityAuthID != "" {
			meta := opts.Metadata
			if meta == nil {
				meta = make(map[string]any)
				opts.Metadata = meta
			}
			meta[cliproxyexecutor.PinnedAuthMetadataKey] = affinityAuthID
		}
	}
}
```
The logic for applying auth affinity by checking for a pinned auth ID and then resolving affinity from auth_affinity_key is duplicated in pickNext and pickNextMixed. To improve maintainability and reduce code duplication, you could extract this logic into a private helper method on the Manager.
Example helper:
```go
func (m *Manager) applyAuthAffinity(opts *cliproxyexecutor.Options) {
	if opts == nil || pinnedAuthIDFromMetadata(opts.Metadata) != "" {
		return
	}
	if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
		if affinityAuthID := m.AuthAffinity(affinityKey); affinityAuthID != "" {
			if opts.Metadata == nil {
				opts.Metadata = make(map[string]any)
			}
			opts.Metadata[cliproxyexecutor.PinnedAuthMetadataKey] = affinityAuthID
		}
	}
}
```

Then you can call m.applyAuthAffinity(&opts) at the beginning of pickNext and pickNextMixed.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 511b8a992e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```go
if cache.ID != "" {
	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
}
rawJSON = applyCodexContinuityBody(rawJSON, continuity)
```
Preserve Claude cache ID when applying continuity
cacheHelper still computes a Claude-scoped cache.ID, but this refactor now applies body/header continuity only through continuity; because the Claude branch never assigns continuity, prompt_cache_key and session_id stop being forwarded for Claude-origin requests. That regresses previously working prompt-cache/session continuity for Claude→Codex flows and causes repeated turns to miss upstream cache reuse.
Useful? React with 👍 / 👎.
sdk/cliproxy/auth/conductor.go
Outdated
```go
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, auth.ID)
	if log.IsLevelEnabled(log.DebugLevel) {
```
Pin auth affinity only after a successful execution
Affinity is written before the selected auth is proven healthy, so a transient failure on that first attempt can lock the request key to a bad auth ID; on the next pick, the pinned ID is forced while tried already contains it, which short-circuits fallback to other available credentials and returns failure. This makes retry behavior worse exactly when one credential is flaky and others could succeed.
Align websocket continuity resolution with the HTTP Codex path, make auth-affinity principal keys use a stable string representation, and extract small helpers that remove duplicated continuity and affinity logic without changing the validated cache-hit behavior.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62b17f40a1
```go
	continuity = resolveCodexContinuity(ctx, auth, req, opts)
	cache.ID = continuity.Key
}

if cache.ID != "" {
	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
}
rawJSON = applyCodexContinuityBody(rawJSON, continuity)
```
Carry Claude cache ID into continuity before applying headers
The Claude branch still computes cache.ID, but only continuity is applied to prompt_cache_key/session_id now. Because continuity is never set for from == "claude", this path silently drops continuity fields that were previously sent, so repeated Claude→Codex requests lose cache/session reuse. The same pattern is also present in websocket continuity handling (applyCodexPromptCacheHeaders).
```go
if affinityKey := metadataString(opts.Metadata, codexAuthAffinityMetadataKey); affinityKey != "" {
	return codexContinuity{Key: affinityKey, Source: "auth_affinity"}
```
Do not forward auth affinity key directly as continuity ID
Returning auth_affinity_key verbatim as the Codex continuity key can leak sensitive caller identity into upstream session_id/prompt_cache_key. In this commit, handlers populate auth_affinity_key from stablePrincipalMetadataKey(...) (e.g., principal:<apiKey>), so this branch can send raw principal/API-key material upstream instead of the previous SHA1-derived opaque ID.
sdk/cliproxy/auth/conductor.go
Outdated
```go
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, auth.ID)
```
Pin auth affinity only after a successful execution
Affinity is persisted immediately after selection, before the request succeeds. If that auth then errors, the next retry sees a pinned auth via applyAuthAffinity and cannot rotate to other credentials, which defeats multi-auth fallback under failures/rate limits and can return errors even when another auth is healthy.
Restore Claude continuity after the continuity refactor, keep auth-affinity keys out of upstream Codex session identifiers, and only persist affinity after successful execution so retries can still rotate to healthy credentials when the first auth fails.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26eca8b6ba
sdk/cliproxy/auth/conductor.go
Outdated
```go
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, authID)
	if entry != nil && log.IsLevelEnabled(log.DebugLevel) {
		entry.Debugf("auth affinity pinned key=%s auth_id=%s provider=%s model=%s", affinityKey, authID, provider, model)
```
Avoid logging raw auth affinity keys
persistAuthAffinity logs affinityKey verbatim, but that key can include principal:<api key> from requestExecutionMetadata (ginCtx.Get("apiKey")), and apiKey is populated from result.Principal in AuthMiddleware (which is the raw credential for config-based access). When debug logging is enabled, this emits client secrets into logs, creating a credential disclosure path.
sdk/api/handlers/handlers.go
Outdated
```go
} else if explicitIdempotencyKey != "" {
	meta[authAffinityMetadataKey] = explicitIdempotencyKey
```
Stop keying long-lived affinity by Idempotency-Key
This stores the request's explicit Idempotency-Key as auth_affinity_key, which is then persisted in Manager.affinity on successful executions. Because idempotency keys are commonly unique per operation and the new affinity map has no eviction path in production code, sustained traffic with unique keys will grow affinity state without bound and eventually increase memory usage.
Stop using one-shot idempotency keys as long-lived auth-affinity identifiers and remove raw affinity-key values from debug logs so sticky routing keeps its continuity benefits without creating avoidable memory growth or credential exposure risks.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c4cbd44da
sdk/cliproxy/auth/conductor.go
Outdated
```go
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, authID)
```
Scope persisted auth affinity by provider
persistAuthAffinity saves a single global mapping from auth_affinity_key to authID, but the key built in request metadata is caller-scoped (e.g., principal:<apiKey>) rather than provider-scoped. After one successful request, applyAuthAffinity will pin that auth for later requests even when they target a different provider, and scheduler selection treats pinned IDs as strict (returning auth_not_found when provider/auth don’t match), so a client that switches providers can be forced onto an incompatible auth and fail.
sdk/api/handlers/handlers.go
Outdated
```go
meta[coreexecutor.ExecutionSessionMetadataKey] = executionSessionID
meta[authAffinityMetadataKey] = executionSessionID
```
Avoid leaking affinity entries for closed execution sessions
This adds execution_session_id as auth_affinity_key, which is unique per websocket passthrough session, but the manager’s affinity map has no lifecycle tie-in to session closure. Because successful requests persist these keys and CloseExecutionSession does not clear them, every completed websocket session leaves a stale map entry behind, causing unbounded in-memory growth over time under normal streaming traffic.
Keep sticky auth affinity limited to matching providers and stop persisting execution-session IDs as long-lived affinity keys so provider switching and normal streaming traffic do not create incorrect pins or stale affinity state.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6962e09dd9
sdk/cliproxy/auth/conductor.go
Outdated
```go
key = strings.TrimSpace(key)
if provider == "" || key == "" {
	return ""
}
return provider + "|" + key
```
Scope auth affinity by model to avoid cross-model lockouts
The affinity map key is currently only provider|principal, so one successful request can pin a principal to an auth for that provider across all models. If that auth does not support a later requested model, the pickers still enforce the pinned ID and return auth_not_found instead of selecting another auth that supports the model. This turns normal multi-model routing into hard failures for affected principals; include model (or model family) in the affinity scope, or fall back when the pinned auth is incompatible.
sdk/cliproxy/auth/conductor.go
Outdated
```go
if affinityAuthID := m.AuthAffinity(provider, affinityKey); affinityAuthID != "" {
	if opts.Metadata == nil {
		opts.Metadata = make(map[string]any)
	}
	opts.Metadata[cliproxyexecutor.PinnedAuthMetadataKey] = affinityAuthID
```
Treat affinity as preference instead of hard pin
Writing the affinity result into PinnedAuthMetadataKey makes affinity strict, not sticky-preferred. In the execute*MixedOnce retry loops, once that pinned auth is attempted and added to tried, subsequent retries cannot pick any other credential, so transient failures on one auth (quota/cooldown/network) cause request failure even when other auths are healthy. This regresses credential failover behavior and should be changed to advisory pinning or automatic unpinning on retry.
Remove the experimental auth-affinity routing changes from this PR so it stays focused on the validated Codex continuity fix. This keeps the prompt-cache repair while avoiding unrelated routing-policy concerns such as provider/model affinity scope, lifecycle cleanup, and hard-pin fallback semantics.
@codex please re-review the current PR head only. This PR has been narrowed substantially since the earlier review comments. The auth-affinity routing changes that triggered the later scheduler/pinning concerns have been removed from the branch. The remaining diff is intentionally limited to the Codex continuity/cache fix:
Please evaluate the current head commit as a scoped Codex prompt-cache continuity fix, not the earlier affinity-routing iterations.
Codex Review: Didn't find any major issues. Keep it up!
Drop the chat-completions translator edits from this PR so the branch complies with the repository policy that forbids pull-request changes under internal/translator. The remaining PR stays focused on the executor-level Codex continuity fix that was validated to restore cache reuse.
@codex please re-review the current head commit only. The PR has been narrowed further to comply with repository policy: all internal/translator changes were removed. The remaining diff is only the executor-level Codex continuity/cache fix.

Codex Review: Didn't find any major issues. 🚀
@luispater this makes OpenCode work with OpenAI and the prompt cache hit. It works for me when compiling the .exe, while the originally released .exe does not...
xkonjin
left a comment
Automated review focusing on bugs, security, and tests.
Security:
- Potential secret in diff at line 219.
- Potential secret in diff at line 529.
Drop the last affinity-related executor artifacts so the PR stays focused on the minimal Codex continuity fix set: stable prompt cache identity, stable session_id, and the executor-only behavior that was validated to restore cache reads.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e5d3541b5a
```go
if v := strings.TrimSpace(ginCtx.GetHeader("Idempotency-Key")); v != "" {
	return codexContinuity{Key: v, Source: "idempotency_key"}
```
Avoid request-scoped Idempotency-Key as primary continuity key
In resolveCodexContinuity, Idempotency-Key is chosen before stable identities (client_principal / auth_id), so any client that sends a different idempotency key per request will get a different prompt_cache_key/session_id on every turn. That breaks the continuity this change is trying to restore (cache/session reuse across turns) and can regress performance for common retry-safe clients that auto-generate per-request idempotency keys. Consider demoting this source below stable caller/auth-derived keys (or only using it for explicit retry contexts).
@luispater I made it work for myself, but I wanted it to work for all the other people. I can't keep battling these agents forever, because they will ALWAYS find something to latch onto, as they are inherently non-deterministic.
luispater
left a comment
Summary:
This PR cleanly addresses Codex prompt-cache continuity by preserving prompt_cache_retention, centralizing continuity resolution, and consistently applying continuity to prompt_cache_key + session_id across HTTP and websocket paths.
Key findings:
- Blocking: none.
- Non-blocking: continuity behavior now depends on precedence logic (prompt_cache_key -> execution session -> idempotency key -> client principal -> auth id). This is reasonable and covered by targeted unit tests, but production traffic diversity remains the main validation surface.
Test plan:
- Reviewed changed files and continuity flow end-to-end in executor/websocket paths.
- Ran: go test ./internal/runtime/executor (pass).
Follow-ups (optional):
- Consider adding an integration-style test fixture that simulates repeated mixed-source requests (HTTP + websocket) with/without gin context to further harden precedence behavior.
This is an automated Codex review result and still requires manual verification by a human reviewer.
Summary
This PR fixes Codex prompt cache continuity through CLIProxyAPI.
While this PR fixes #2373 directly, it also addresses the same underlying class of continuity problems discussed in #1962 and #1998, and is adjacent to the Codex session/account pinning issues in #2230 and #2256.
In the broken state, repeated Codex requests could reach the upstream without the same continuity envelope, so upstream prompt caching did not reliably reuse prior prompt state. The most visible repro was with OpenCode using Codex through CLIProxyAPI, where cache reads worked in the reference client but not through the proxy. The fix is not specific to OpenCode, but that integration made the failure mode easy to validate.
At a high level, this PR:
- Preserves `prompt_cache_retention` on Codex request paths instead of stripping it
- Preserves `prompt_cache_retention` when translating OpenAI Chat Completions requests to Codex
- Aligns Codex continuity headers to `session_id`

Why this change is needed
Repeated Codex requests were missing the stable continuity signals needed for upstream cache reuse.
Observed failure modes included:
- Executor-layer Codex request preparation stripped `prompt_cache_retention`
- Chat-completions translation did not preserve `prompt_cache_retention`
- OpenAI-style Codex requests could be sent without a stable `prompt_cache_key`
- `session_id` fell back to a fresh random UUID per request

The combined effect was that repeated requests looked like unrelated turns to the upstream.
What this PR changes
- Preserves `prompt_cache_retention` on Codex HTTP, stream, and websocket request preparation paths
- Preserves `prompt_cache_retention` in the OpenAI Chat Completions -> Codex translator
- Aligns Codex continuity headers to `session_id`
- Derives a stable `prompt_cache_key` for OpenAI-style requests, including from an explicit `Idempotency-Key` when present
- Applies the same continuity key to `prompt_cache_key` in the body and `session_id` in the headers

Validation result
This change was validated in both tests and live runtime behavior.
Expected outcome after the fix:
- `session_id` is stable across repeated requests in the same continuity context

Tests run
go test ./sdk/cliproxy/auth ./internal/runtime/executor ./internal/translator/codex/openai/responses ./internal/translator/codex/openai/chat-completions

Result:
Built artifact
Detailed root cause analysis
The investigation showed that the problem was not just `prompt_cache_retention`. The more important issue was that repeated Codex requests did not share a stable continuity identity when forwarded by the proxy. In the failing path:

- `prompt_cache_key` could be absent
- `session_id` could change on every request

This meant that even identical prompts could be treated by the upstream as unrelated requests.
The final working behavior required preserving a continuity envelope that includes:
- a stable `session_id`
- a stable `prompt_cache_key`

The fix encoded that continuity policy explicitly instead of relying on incidental request context.
Implementation notes for gatekeeper review
This PR intentionally keeps the scope focused on Codex continuity and avoids broader redesign.
Architectural choices:
- Preserve caller-provided `prompt_cache_retention` instead of stripping it

The goal was to fix the real bug with the smallest behaviorally complete change set, not to redesign request routing or provider architecture.
Behavioral details of continuity resolution
For OpenAI-style requests forwarded to Codex, continuity resolution now prefers the strongest available stable signal in this order:
1. an explicit `prompt_cache_key`
2. execution session metadata
3. an explicit `Idempotency-Key`
4. stable request-affinity metadata
5. a stable client-principal hash
6. a stable auth-ID hash

The chosen continuity key is then applied to:

- `prompt_cache_key` in the request body
- `session_id` in the request headers

This ensures repeated requests reuse the same upstream continuity identity even when the original caller does not provide all continuity fields directly.
Runtime validation outcome
Manual runtime validation during debugging confirmed the pre-fix and post-fix behavior difference.
Before the fix, repeated Codex requests showed continuity drift such as:
- a missing or unstable `prompt_cache_key`
- a fresh random `session_id` per request

After the fix, repeated requests in the validated scenario reused stable continuity values and upstream cache reads started working through CLIProxyAPI.
This was specifically confirmed in the OpenCode -> CLIProxyAPI -> Codex flow, which previously missed prompt cache reuse.
Risk assessment
This PR does change request continuity behavior for Codex, so it is not zero-risk, but the risk is bounded and justified.
Main considerations:
Despite that, the changes are targeted, test-covered, and constrained to the exact area that was failing.
Out of scope
This PR does not attempt to: