fix(codex): restore prompt cache continuity for Codex requests #2374

Merged
luispater merged 8 commits into router-for-me:dev from VooDisss:codex-cache-clean
Mar 28, 2026

Conversation

@VooDisss
Contributor

Summary

This PR fixes Codex prompt cache continuity through CLIProxyAPI.

While this PR fixes #2373 directly, it also addresses the same underlying class of continuity problems discussed in #1962 and #1998, and is adjacent to the Codex session/account pinning issues in #2230 and #2256.

In the broken state, repeated Codex requests could reach the upstream without the same continuity envelope, so upstream prompt caching did not reliably reuse prior prompt state. The most visible repro was with OpenCode using Codex through CLIProxyAPI, where cache reads worked in the reference client but not through the proxy. The fix is not specific to OpenCode, but that integration made the failure mode easy to validate.

At a high level, this PR:

  • preserves prompt_cache_retention on Codex request paths instead of stripping it
  • preserves prompt_cache_retention when translating OpenAI Chat Completions requests to Codex
  • aligns Codex continuity headers to session_id
  • introduces explicit Codex continuity resolution so repeated requests reuse a stable upstream cache/session identity
  • keeps auth selection sticky for repeated request sequences so continuity is not accidentally broken by auth rotation
  • adds targeted regression tests covering the continuity policy and retention preservation

Why this change is needed

Repeated Codex requests were missing the stable continuity signals needed for upstream cache reuse.

Observed failure modes included:

  • executor-layer Codex request preparation removed prompt_cache_retention
  • Chat Completions -> Codex translation dropped prompt_cache_retention
  • Codex request preparation used header shapes that did not match the working client behavior
  • OpenAI-style Codex requests could be sent without a stable prompt_cache_key
  • when no stable continuity key existed, session_id fell back to a fresh random UUID per request
  • auth selection could drift across repeated requests, which changes upstream identity and can also break cache reuse

The combined effect was that repeated requests looked like unrelated turns to the upstream.

What this PR changes

  • preserves caller-provided prompt_cache_retention on Codex HTTP, stream, and websocket request preparation paths
  • preserves prompt_cache_retention in the OpenAI Chat Completions -> Codex translator
  • applies Codex continuity using exact session_id
  • centralizes continuity resolution in a dedicated helper instead of scattering the policy across executor flow code
  • derives a stable continuity key with explicit precedence:
    • request prompt_cache_key
    • execution session metadata
    • explicit Idempotency-Key
    • request/auth affinity key
    • stable client-principal hash
    • stable auth-ID hash
  • applies the chosen continuity key to both:
    • request body prompt_cache_key
    • request header session_id
  • adds sticky auth affinity so repeated request sequences reuse the same Codex auth when possible
  • keeps websocket behavior aligned with the same continuity model

Validation result

This change was validated in both tests and live runtime behavior.

Expected outcome after the fix:

  • repeated Codex requests sent through CLIProxyAPI can reuse upstream cache continuity instead of behaving like fresh turns
  • session_id is stable across repeated requests in the same continuity context
  • prompt cache reads now occur through CLIProxyAPI in the previously failing OpenCode + Codex scenario

Tests run

  • go test ./sdk/cliproxy/auth ./internal/runtime/executor ./internal/translator/codex/openai/responses ./internal/translator/codex/openai/chat-completions

Result:

  • all targeted packages passed

Built artifact

  • Windows executable was built successfully for manual verification during development

Detailed root cause analysis

The investigation showed that the problem was not just prompt_cache_retention.

The more important issue was that repeated Codex requests did not share a stable continuity identity when forwarded by the proxy. In the failing path:

  • prompt_cache_key could be absent
  • session_id could change on every request
  • auth/account selection could rotate across repeated requests

This meant that even identical prompts could be treated by the upstream as unrelated requests.

The final working behavior required preserving a continuity envelope that includes:

  • stable auth/account context
  • stable session_id
  • stable prompt_cache_key
  • stable Codex-compatible request defaults

The fix encoded that continuity policy explicitly instead of relying on incidental request context.

Implementation notes for gatekeeper review

This PR intentionally keeps the scope focused on Codex continuity and avoids broader redesign.

Architectural choices:

  • continuity policy was extracted into a dedicated helper so executor flow remains orchestration-focused
  • auth affinity was added only because auth rotation was a real, observed cache-breaking failure mode
  • translator changes are intentionally minimal and limited to preserving prompt_cache_retention
  • websocket behavior was kept aligned with the same continuity semantics

The goal was to fix the real bug with the smallest behaviorally complete change set, not to redesign request routing or provider architecture.

Behavioral details of continuity resolution

For OpenAI-style requests forwarded to Codex, continuity resolution now prefers the strongest available stable signal in this order:

  1. explicit prompt_cache_key
  2. execution session metadata
  3. explicit Idempotency-Key
  4. auth/request affinity metadata
  5. stable client-principal-derived hash
  6. stable auth-ID-derived hash

The chosen continuity key is then applied to:

  • prompt_cache_key in the request body
  • session_id in the request headers

This ensures repeated requests reuse the same upstream continuity identity even when the original caller does not provide all continuity fields directly.

Runtime validation outcome

Manual runtime validation during debugging confirmed the pre-fix and post-fix behavior difference.

Before the fix, repeated Codex requests showed continuity drift such as:

  • empty prompt_cache_key
  • changing session_id
  • occasional auth rotation across repeated requests

After the fix, repeated requests in the validated scenario reused stable continuity values and upstream cache reads started working through CLIProxyAPI.

This was specifically confirmed in the OpenCode -> CLIProxyAPI -> Codex flow, which previously missed prompt cache reuse.

Risk assessment

This PR does change request continuity behavior for Codex, so it is not zero-risk, but the risk is bounded and justified.

Main considerations:

  • continuity precedence is partly based on validated runtime behavior rather than a formal upstream contract
  • sticky auth affinity changes repeated-request behavior intentionally, but only to preserve continuity
  • diagnostics were retained because they were useful during validation and make future regressions easier to inspect

Despite that, the changes are targeted, test-covered, and constrained to the exact area that was failing.

Out of scope

This PR does not attempt to:

  • redesign provider routing generally
  • change non-Codex providers
  • redesign translator behavior beyond the necessary Codex retention pass-through
  • change unrelated cache-control semantics

Prompt caching on Codex was not reliably reusable through the proxy because repeated chat-completions requests could reach the upstream without the same continuity envelope. In practice this showed up most clearly with OpenCode, where cache reads worked in the reference client but not through CLIProxyAPI, although the root cause is broader than OpenCode itself.

The proxy was breaking continuity in several ways: executor-layer Codex request preparation stripped prompt_cache_retention, chat-completions translation did not preserve that field, continuity headers used a different shape than the working client behavior, and OpenAI-style Codex requests could be sent without a stable prompt_cache_key. When that happened, session_id fell back to a fresh random value per request, so upstream Codex treated repeated requests as unrelated turns instead of as part of the same cacheable context.

This change fixes that by preserving caller-provided prompt_cache_retention on Codex execution paths, preserving prompt_cache_retention when translating OpenAI chat-completions requests to Codex, aligning Codex continuity headers to session_id, and introducing an explicit Codex continuity policy that derives a stable continuity key from the best available signal. The resolution order prefers an explicit prompt_cache_key, then execution session metadata, then an explicit idempotency key, then stable request-affinity metadata, then a stable client-principal hash, and finally a stable auth-ID hash when no better continuity signal exists.

The same continuity key is applied to both prompt_cache_key in the request body and session_id in the request headers so repeated requests reuse the same upstream cache/session identity. The auth manager also keeps auth selection sticky for repeated request sequences, preventing otherwise-equivalent Codex requests from drifting across different upstream auth contexts and accidentally breaking cache reuse.

To keep the implementation maintainable, the continuity resolution and diagnostics are centralized in a dedicated Codex continuity helper instead of being scattered across executor flow code. Regression coverage now verifies retention preservation, continuity-key precedence, stable auth-ID fallback, websocket parity, translator preservation, and auth-affinity behavior. Manual validation confirmed prompt cache reads now occur through CLIProxyAPI when using Codex via OpenCode, and the fix should also benefit other clients that rely on stable repeated Codex request continuity.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces 'Auth Affinity' and 'Codex Continuity' to improve request routing and prompt cache management. Auth Affinity allows the system to pin a specific authentication identity to a session or idempotency key, while Codex Continuity provides a unified way to resolve and apply prompt cache keys across HTTP and WebSocket executors. Feedback suggests aligning the WebSocket continuity logic with the HTTP implementation, ensuring stable string representations for API keys used in metadata, and refactoring repetitive logic for metadata extraction and affinity application to improve maintainability.

Comment on lines 794 to 797
if cache.ID != "" {
	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
	headers.Set("Conversation_id", cache.ID)
	headers.Set("Session_id", cache.ID)
	headers.Set("session_id", cache.ID)
}

high

The continuity logic for websockets in applyCodexPromptCacheHeaders doesn't seem to be fully aligned with the HTTP path in codex_executor.go. The cacheHelper function was updated to use the new resolveCodexContinuity helper for openai format requests, but applyCodexPromptCacheHeaders was not. To ensure consistent behavior for prompt cache continuity, consider updating this function to also use resolveCodexContinuity for openai requests. This would also allow you to correctly populate the Source field of the codexContinuity struct for logging.

Comment on lines +218 to +220
if apiKey, exists := ginCtx.Get("apiKey"); exists && apiKey != nil {
	meta[authAffinityMetadataKey] = fmt.Sprintf("principal:%v", apiKey)
}

high

Using fmt.Sprintf("principal:%v", apiKey) to generate authAffinityMetadataKey can be risky if apiKey is a pointer or a struct, as the string representation might not be stable across requests, leading to auth affinity failures. The value of auth_affinity_key is used as a map key in conductor.go and needs to be consistent.

Consider handling the apiKey value with type switches to ensure a stable string representation, similar to how it's handled in resolveCodexContinuity for client_principal.

      if apiKey, exists := ginCtx.Get("apiKey"); exists && apiKey != nil {
        var keyStr string
        switch v := apiKey.(type) {
        case string:
          keyStr = v
        case fmt.Stringer:
          keyStr = v.String()
        default:
          // Using %v for other types can lead to unstable keys if they are pointers or maps.
          // This should be handled carefully.
          keyStr = fmt.Sprintf("%v", v)
        }

        if trimmedKey := strings.TrimSpace(keyStr); trimmedKey != "" {
          meta[authAffinityMetadataKey] = "principal:" + trimmedKey
        }
      }

Comment on lines +48 to +64
if v, exists := ginCtx.Get("apiKey"); exists && v != nil {
	switch value := v.(type) {
	case string:
		if trimmed := strings.TrimSpace(value); trimmed != "" {
			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
		}
	case fmt.Stringer:
		if trimmed := strings.TrimSpace(value.String()); trimmed != "" {
			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
		}
	default:
		trimmed := strings.TrimSpace(fmt.Sprintf("%v", value))
		if trimmed != "" {
			return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
		}
	}
}

medium

The logic for handling apiKey from the gin context is repetitive across the string, fmt.Stringer, and default cases in the switch statement. This can be refactored to reduce code duplication and improve maintainability. You can extract the string value first and then perform the check and hashing logic once.

    if v, exists := ginCtx.Get("apiKey"); exists && v != nil {
      var keyStr string
      switch value := v.(type) {
      case string:
        keyStr = value
      case fmt.Stringer:
        keyStr = value.String()
      default:
        keyStr = fmt.Sprintf("%v", value)
      }
      if trimmed := strings.TrimSpace(keyStr); trimmed != "" {
        return codexContinuity{Key: uuid.NewSHA1(uuid.NameSpaceOID, []byte("cli-proxy-api:codex:prompt-cache:"+trimmed)).String(), Source: "client_principal"}
      }
    }

Comment on lines +116 to +133
if opts.Metadata != nil {
	if raw, ok := opts.Metadata[cliproxyexecutor.SelectedAuthMetadataKey]; ok && raw != nil {
		switch v := raw.(type) {
		case string:
			selectedAuthID = strings.TrimSpace(v)
		case []byte:
			selectedAuthID = strings.TrimSpace(string(v))
		}
	}
	if raw, ok := opts.Metadata[cliproxyexecutor.ExecutionSessionMetadataKey]; ok && raw != nil {
		switch v := raw.(type) {
		case string:
			executionSessionID = strings.TrimSpace(v)
		case []byte:
			executionSessionID = strings.TrimSpace(string(v))
		}
	}
}

medium

The logic for extracting selectedAuthID and executionSessionID from opts.Metadata is duplicated. You could create a helper function to extract a string value from the metadata map, which would handle the type switching and trimming in one place. This would make the code more concise and easier to maintain.

Example helper:

func getStringFromMetadata(meta map[string]any, key string) string {
    if raw, ok := meta[key]; ok && raw != nil {
        switch v := raw.(type) {
        case string:
            return strings.TrimSpace(v)
        case []byte:
            return strings.TrimSpace(string(v))
        }
    }
    return ""
}

Then you could use it like this:
selectedAuthID := getStringFromMetadata(opts.Metadata, cliproxyexecutor.SelectedAuthMetadataKey)
executionSessionID := getStringFromMetadata(opts.Metadata, cliproxyexecutor.ExecutionSessionMetadataKey)

Comment on lines +2381 to +2392
if pinnedAuthID := pinnedAuthIDFromMetadata(opts.Metadata); pinnedAuthID == "" {
	if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
		if affinityAuthID := m.AuthAffinity(affinityKey); affinityAuthID != "" {
			meta := opts.Metadata
			if meta == nil {
				meta = make(map[string]any)
				opts.Metadata = meta
			}
			meta[cliproxyexecutor.PinnedAuthMetadataKey] = affinityAuthID
		}
	}
}

medium

The logic for applying auth affinity by checking for a pinned auth ID and then resolving affinity from auth_affinity_key is duplicated in pickNext and pickNextMixed. To improve maintainability and reduce code duplication, you could extract this logic into a private helper method on the Manager.

Example helper:

func (m *Manager) applyAuthAffinity(opts *cliproxyexecutor.Options) {
    if opts == nil || pinnedAuthIDFromMetadata(opts.Metadata) != "" {
        return
    }
    if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
        if affinityAuthID := m.AuthAffinity(affinityKey); affinityAuthID != "" {
            if opts.Metadata == nil {
                opts.Metadata = make(map[string]any)
            }
            opts.Metadata[cliproxyexecutor.PinnedAuthMetadataKey] = affinityAuthID
        }
    }
}

Then you can call m.applyAuthAffinity(&opts) at the beginning of pickNext and pickNextMixed.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 511b8a992e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if cache.ID != "" {
	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
}
rawJSON = applyCodexContinuityBody(rawJSON, continuity)

P1 Badge Preserve Claude cache ID when applying continuity

cacheHelper still computes a Claude-scoped cache.ID, but this refactor now applies body/header continuity only through continuity; because the Claude branch never assigns continuity, prompt_cache_key and session_id stop being forwarded for Claude-origin requests. That regresses previously working prompt-cache/session continuity for Claude→Codex flows and causes repeated turns to miss upstream cache reuse.


Comment on lines +1096 to +1098
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, auth.ID)
	if log.IsLevelEnabled(log.DebugLevel) {

P1 Badge Pin auth affinity only after a successful execution

Affinity is written before the selected auth is proven healthy, so a transient failure on that first attempt can lock the request key to a bad auth ID; on the next pick, the pinned ID is forced while tried already contains it, which short-circuits fallback to other available credentials and returns failure. This makes retry behavior worse exactly when one credential is flaky and others could succeed.


Align websocket continuity resolution with the HTTP Codex path, make auth-affinity principal keys use a stable string representation, and extract small helpers that remove duplicated continuity and affinity logic without changing the validated cache-hit behavior.
@VooDisss
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62b17f40a1


Comment on lines +623 to +627
	continuity = resolveCodexContinuity(ctx, auth, req, opts)
	cache.ID = continuity.Key
}

if cache.ID != "" {
	rawJSON, _ = sjson.SetBytes(rawJSON, "prompt_cache_key", cache.ID)
}
rawJSON = applyCodexContinuityBody(rawJSON, continuity)

P2 Badge Carry Claude cache ID into continuity before applying headers

The Claude branch still computes cache.ID, but only continuity is applied to prompt_cache_key/session_id now. Because continuity is never set for from == "claude", this path silently drops continuity fields that were previously sent, so repeated Claude→Codex requests lose cache/session reuse. The same pattern is also present in websocket continuity handling (applyCodexPromptCacheHeaders).


Comment on lines +60 to +61
if affinityKey := metadataString(opts.Metadata, codexAuthAffinityMetadataKey); affinityKey != "" {
	return codexContinuity{Key: affinityKey, Source: "auth_affinity"}

P1 Badge Do not forward auth affinity key directly as continuity ID

Returning auth_affinity_key verbatim as the Codex continuity key can leak sensitive caller identity into upstream session_id/prompt_cache_key. In this commit, handlers populate auth_affinity_key from stablePrincipalMetadataKey(...) (e.g., principal:<apiKey>), so this branch can send raw principal/API-key material upstream instead of the previous SHA1-derived opaque ID.


Comment on lines +1096 to +1097
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, auth.ID)

P1 Badge Pin auth affinity only after a successful execution

Affinity is persisted immediately after selection, before the request succeeds. If that auth then errors, the next retry sees a pinned auth via applyAuthAffinity and cannot rotate to other credentials, which defeats multi-auth fallback under failures/rate limits and can return errors even when another auth is healthy.


Restore Claude continuity after the continuity refactor, keep auth-affinity keys out of upstream Codex session identifiers, and only persist affinity after successful execution so retries can still rotate to healthy credentials when the first auth fails.
@VooDisss
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26eca8b6ba


if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, authID)
	if entry != nil && log.IsLevelEnabled(log.DebugLevel) {
		entry.Debugf("auth affinity pinned key=%s auth_id=%s provider=%s model=%s", affinityKey, authID, provider, model)

P1 Badge Avoid logging raw auth affinity keys

persistAuthAffinity logs affinityKey verbatim, but that key can include principal:<api key> from requestExecutionMetadata (ginCtx.Get("apiKey")), and apiKey is populated from result.Principal in AuthMiddleware (which is the raw credential for config-based access). When debug logging is enabled, this emits client secrets into logs, creating a credential disclosure path.


Comment on lines +214 to +215
} else if explicitIdempotencyKey != "" {
	meta[authAffinityMetadataKey] = explicitIdempotencyKey

P2 Badge Stop keying long-lived affinity by Idempotency-Key

This stores the request's explicit Idempotency-Key as auth_affinity_key, which is then persisted in Manager.affinity on successful executions. Because idempotency keys are commonly unique per operation and the new affinity map has no eviction path in production code, sustained traffic with unique keys will grow affinity state without bound and eventually increase memory usage.


Stop using one-shot idempotency keys as long-lived auth-affinity identifiers and remove raw affinity-key values from debug logs so sticky routing keeps its continuity benefits without creating avoidable memory growth or credential exposure risks.
@VooDisss
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4c4cbd44da


Comment on lines +2277 to +2278
if affinityKey := authAffinityKeyFromMetadata(opts.Metadata); affinityKey != "" {
	m.SetAuthAffinity(affinityKey, authID)

P1 Badge Scope persisted auth affinity by provider

persistAuthAffinity saves a single global mapping from auth_affinity_key to authID, but the key built in request metadata is caller-scoped (e.g., principal:<apiKey>) rather than provider-scoped. After one successful request, applyAuthAffinity will pin that auth for later requests even when they target a different provider, and scheduler selection treats pinned IDs as strict (returning auth_not_found when provider/auth don’t match), so a client that switches providers can be forced onto an incompatible auth and fail.


Comment on lines +212 to +213
meta[coreexecutor.ExecutionSessionMetadataKey] = executionSessionID
meta[authAffinityMetadataKey] = executionSessionID

P2 Badge Avoid leaking affinity entries for closed execution sessions

This adds execution_session_id as auth_affinity_key, which is unique per websocket passthrough session, but the manager’s affinity map has no lifecycle tie-in to session closure. Because successful requests persist these keys and CloseExecutionSession does not clear them, every completed websocket session leaves a stale map entry behind, causing unbounded in-memory growth over time under normal streaming traffic.


Keep sticky auth affinity limited to matching providers and stop persisting execution-session IDs as long-lived affinity keys so provider switching and normal streaming traffic do not create incorrect pins or stale affinity state.
@VooDisss
Contributor Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6962e09dd9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +2251 to +2255
key = strings.TrimSpace(key)
if provider == "" || key == "" {
	return ""
}
return provider + "|" + key

P1 Badge Scope auth affinity by model to avoid cross-model lockouts

The affinity map key is currently only provider|principal, so one successful request can pin a principal to an auth for that provider across all models. If that auth does not support a later requested model, the pickers still enforce the pinned ID and return auth_not_found instead of selecting another auth that supports the model. This turns normal multi-model routing into hard failures for affected principals; include model (or model family) in the affinity scope, or fall back when the pinned auth is incompatible.


Comment on lines +2273 to +2277
if affinityAuthID := m.AuthAffinity(provider, affinityKey); affinityAuthID != "" {
	if opts.Metadata == nil {
		opts.Metadata = make(map[string]any)
	}
	opts.Metadata[cliproxyexecutor.PinnedAuthMetadataKey] = affinityAuthID

P1: Treat affinity as preference instead of hard pin

Writing the affinity result into PinnedAuthMetadataKey makes affinity strict, not sticky-preferred. In the execute*MixedOnce retry loops, once that pinned auth is attempted and added to tried, subsequent retries cannot pick any other credential, so transient failures on one auth (quota/cooldown/network) cause request failure even when other auths are healthy. This regresses credential failover behavior and should be changed to advisory pinning or automatic unpinning on retry.

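Advisory pinning with automatic fallback could look roughly like this; `pickAuth` and its inputs are illustrative stand-ins for the executor's retry loop, not its actual code:

```go
package main

import "fmt"

// pickAuth treats a pinned auth ID as a preference rather than a hard
// requirement: the pinned auth is chosen first, but once it has been tried
// (e.g. it failed with quota/cooldown/network errors), selection falls back
// to any remaining untried auth instead of failing the request.
func pickAuth(pinned string, available []string, tried map[string]bool) string {
	if pinned != "" && !tried[pinned] {
		for _, id := range available {
			if id == pinned {
				return id
			}
		}
	}
	for _, id := range available {
		if !tried[id] {
			return id
		}
	}
	return "" // no healthy auth left
}

func main() {
	// The pinned auth already failed once, so the retry picks another auth.
	tried := map[string]bool{"auth-1": true}
	fmt.Println(pickAuth("auth-1", []string{"auth-1", "auth-2"}, tried))
}
```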

Remove the experimental auth-affinity routing changes from this PR so it stays focused on the validated Codex continuity fix. This keeps the prompt-cache repair while avoiding unrelated routing-policy concerns such as provider/model affinity scope, lifecycle cleanup, and hard-pin fallback semantics.
@VooDisss
Contributor Author

@codex please re-review the current PR head only.

This PR has been narrowed substantially since the earlier review comments. The auth-affinity routing changes that triggered the later scheduler/pinning concerns have been removed from the branch.

The remaining diff is intentionally limited to the Codex continuity/cache fix:

  • preserve prompt_cache_retention on Codex paths
  • preserve prompt_cache_retention in OpenAI Chat Completions -> Codex translation
  • stabilize prompt_cache_key
  • stabilize session_id
  • align HTTP and websocket continuity behavior
  • keep focused regression coverage around continuity and retention behavior

Please evaluate the current head commit as a scoped Codex prompt-cache continuity fix, not the earlier affinity-routing iterations.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep it up!


Drop the chat-completions translator edits from this PR so the branch complies with the repository policy that forbids pull-request changes under internal/translator. The remaining PR stays focused on the executor-level Codex continuity fix that was validated to restore cache reuse.
@VooDisss
Contributor Author

@codex please re-review the current head commit only.

The PR has been narrowed further to comply with repository policy: all internal/translator changes were removed. The remaining diff is only the executor-level Codex continuity/cache fix.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀


@VooDisss
Contributor Author

@luispater this makes OpenCode work with OpenAI and the prompt cache actually hit. It works for me when I compile the .exe myself; the originally released .exe does not...


@xkonjin xkonjin left a comment


Automated review focusing on bugs, security, and tests.

Security:

  • Potential secret in diff at line 219.
  • Potential secret in diff at line 529.

Drop the last affinity-related executor artifacts so the PR stays focused on the minimal Codex continuity fix set: stable prompt cache identity, stable session_id, and the executor-only behavior that was validated to restore cache reads.
@VooDisss VooDisss requested a review from xkonjin March 27, 2026 18:45

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5d3541b5a


Comment on lines +60 to +61
```go
if v := strings.TrimSpace(ginCtx.GetHeader("Idempotency-Key")); v != "" {
    return codexContinuity{Key: v, Source: "idempotency_key"}
```

P1: Avoid request-scoped Idempotency-Key as primary continuity key

In resolveCodexContinuity, Idempotency-Key is chosen before stable identities (client_principal / auth_id), so any client that sends a different idempotency key per request will get a different prompt_cache_key/session_id on every turn. That breaks the continuity this change is trying to restore (cache/session reuse across turns) and can regress performance for common retry-safe clients that auto-generate per-request idempotency keys. Consider demoting this source below stable caller/auth-derived keys (or only using it for explicit retry contexts).

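The demotion the reviewer suggests could look roughly like this; `resolveContinuity` and its inputs are illustrative stand-ins for the actual resolver, not the PR's code:

```go
package main

import (
	"fmt"
	"strings"
)

type continuity struct {
	Key    string
	Source string
}

// resolveContinuity sketches the suggested ordering: stable identities
// (client principal, auth ID) are consulted before the request-scoped
// Idempotency-Key, so clients that generate a fresh idempotency key per
// request no longer rotate the cache/session identity on every turn.
func resolveContinuity(principal, authID, idemKey string) continuity {
	if v := strings.TrimSpace(principal); v != "" {
		return continuity{Key: v, Source: "client_principal"}
	}
	if v := strings.TrimSpace(authID); v != "" {
		return continuity{Key: v, Source: "auth_id"}
	}
	if v := strings.TrimSpace(idemKey); v != "" {
		return continuity{Key: v, Source: "idempotency_key"}
	}
	return continuity{}
}

func main() {
	// A per-request idempotency key no longer wins over the stable principal.
	c := resolveContinuity("user-a", "auth-1", "per-request-123")
	fmt.Println(c.Source, c.Key)
}
```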

@VooDisss
Contributor Author

@luispater I made it work for myself, but I wanted it to work for everyone else too. I can't keep battling these agents forever, because they will ALWAYS find something to latch onto, as they are inherently non-deterministic.

Collaborator

@luispater luispater left a comment


Summary:
This PR cleanly addresses Codex prompt-cache continuity by preserving prompt_cache_retention, centralizing continuity resolution, and consistently applying continuity to prompt_cache_key + session_id across HTTP and websocket paths.

Key findings:

  • Blocking: none.
  • Non-blocking: continuity behavior now depends on precedence logic (prompt_cache_key -> execution session -> idempotency key -> client principal -> auth id). This is reasonable and covered by targeted unit tests, but production traffic diversity remains the main validation surface.

Test plan:

  • Reviewed changed files and continuity flow end-to-end in executor/websocket paths.
  • Ran: go test ./internal/runtime/executor (pass).

Follow-ups (optional):

  • Consider adding an integration-style test fixture that simulates repeated mixed-source requests (HTTP + websocket) with/without gin context to further harden precedence behavior.

This is an automated Codex review result and still requires manual verification by a human reviewer.
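The precedence chain named in the findings can be sketched as a first-non-empty selection; the source names mirror the list above, everything else (the helper, the sample values) is illustrative:

```go
package main

import "fmt"

// firstNonEmpty walks continuity sources in priority order and returns the
// first one with a non-empty key, mirroring the documented precedence:
// prompt_cache_key -> execution session -> idempotency key ->
// client principal -> auth id.
func firstNonEmpty(pairs [][2]string) (source, key string) {
	for _, p := range pairs {
		if p[1] != "" {
			return p[0], p[1]
		}
	}
	return "", ""
}

func main() {
	// No explicit prompt_cache_key, so the execution session wins.
	src, key := firstNonEmpty([][2]string{
		{"prompt_cache_key", ""},
		{"execution_session", "sess-42"},
		{"idempotency_key", "idem-1"},
		{"client_principal", "user-a"},
		{"auth_id", "auth-9"},
	})
	fmt.Println(src, key)
}
```

A table-driven test over permutations of empty/non-empty sources, as the follow-up suggests, would pin this ordering down across HTTP and websocket entry points.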


Successfully merging this pull request may close these issues.

Codex prompt caching misses on repeated chat-completions requests
