
feat: enable codex models #1666

Open

Kylejeong2 wants to merge 1 commit into main from kylejeong/stg-1324-enable-codex-models-through-ai-sdk

Conversation

@Kylejeong2 (Member) commented Feb 5, 2026

why

people want to try codex models in Stagehand (also see benchmarks)

what changed

Codex models need a text verbosity and reasoning effort of medium (not low), so we set these with a conditional operator.
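
A minimal sketch of the resulting logic, condensed from the diff reviewed below (the surrounding `generateObject` call is elided, and the `providerOptions.openai` nesting is the AI SDK convention assumed here):

```ts
const isGPT5 = this.model.modelId.includes("gpt-5");
const isCodex = this.model.modelId.includes("codex");
// gpt-5.1 / gpt-5.2 use low reasoning effort, but codex variants are excluded.
const usesLowReasoningEffort =
  (this.model.modelId.includes("gpt-5.1") ||
    this.model.modelId.includes("gpt-5.2")) &&
  !isCodex;

const providerOptions = isGPT5
  ? {
      openai: {
        textVerbosity: isCodex ? "medium" : "low",
        reasoningEffort: isCodex
          ? "medium"
          : usesLowReasoningEffort
            ? "low"
            : "minimal",
      },
    }
  : undefined; // non-gpt-5 models keep their provider defaults
```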

test plan

Ran evals


Summary by cubic

Enables Codex models in Stagehand via the AI SDK by setting required OpenAI provider options. Addresses STG-1324. Codex now uses medium text verbosity and medium reasoning effort; gpt-5 defaults remain unchanged.

  • New Features
    • Detects Codex models and sets textVerbosity: "medium" and reasoningEffort: "medium".
    • Applies changes in AISdkClient and AISdkClientWrapped without affecting gpt-5 behavior.

Written for commit a926fad. Summary will update on new commits.

@greptile-apps (Contributor) greptile-apps bot commented Feb 5, 2026

Greptile Overview

Greptile Summary

Enabled codex model support in Stagehand by adding special configuration for OpenAI's codex models. Changes applied consistently across both production (aisdk.ts) and evaluation (AISdkClientWrapped.ts) clients.

Key Changes:

  • Added isCodex detection for models containing "codex" in their name
  • Modified usesLowReasoningEffort logic to exclude codex models
  • Set codex-specific providerOptions: textVerbosity: "medium" and reasoningEffort: "medium" (required by codex API)
  • Previous gpt-5.1/5.2 models continue using textVerbosity: "low" and reasoningEffort: "low"
  • Other gpt-5 models use textVerbosity: "low" and reasoningEffort: "minimal"
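
Concretely, the resulting provider options by model family (example IDs illustrative):

| modelId | textVerbosity | reasoningEffort |
| --- | --- | --- |
| contains "codex" (e.g. gpt-5-codex) | "medium" | "medium" |
| gpt-5.1 / gpt-5.2 (non-codex) | "low" | "low" |
| other gpt-5 (e.g. gpt-5-2025-08-07) | "low" | "minimal" |
| no "gpt-5" in ID | not set | not set (model defaults) |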

Review Notes:
The conditional logic for determining reasoningEffort uses nested ternary operators that check isCodex first, then usesLowReasoningEffort, then default to "minimal". Upon careful analysis, this logic appears correct for all model types. The PR description states codex models require medium verbosity and reasoning effort, which is properly enforced in the code.

Confidence Score: 4/5

  • This PR is safe to merge with minor verification recommended
  • The implementation correctly adds codex model support with appropriate configuration values. The nested ternary logic is sound and handles all cases properly. Changes are consistently applied to both production and eval clients. The code follows existing patterns and is well-commented. Minor confidence reduction due to the complexity of nested ternaries and lack of unit tests visible in this PR to verify the codex-specific logic paths.
  • No files require special attention - both files have identical, straightforward changes

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/llm/aisdk.ts | Added codex model detection with special handling for textVerbosity and reasoningEffort settings; potential logic bug where codex models outside gpt-5.1/5.2 may not receive the correct reasoningEffort value |
| packages/evals/lib/AISdkClientWrapped.ts | Mirrored changes from aisdk.ts for the evaluation client; the same potential logic bug exists |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant AISdkClient
    participant ModelCheck
    participant generateObject

    Client->>AISdkClient: createChatCompletion(options)
    AISdkClient->>ModelCheck: Check model.modelId
    
    alt contains "gpt-5"
        ModelCheck->>ModelCheck: isGPT5 = true
        
        alt contains "codex"
            ModelCheck->>ModelCheck: isCodex = true
            ModelCheck->>ModelCheck: usesLowReasoningEffort = false
            Note over ModelCheck: Codex models excluded from low reasoning
        else contains "gpt-5.1" or "gpt-5.2"
            ModelCheck->>ModelCheck: isCodex = false
            ModelCheck->>ModelCheck: usesLowReasoningEffort = true
        else other gpt-5 variant
            ModelCheck->>ModelCheck: isCodex = false
            ModelCheck->>ModelCheck: usesLowReasoningEffort = false
        end
        
        ModelCheck->>generateObject: Call with providerOptions
        
        alt isCodex = true
            Note over generateObject: textVerbosity: "medium"<br/>reasoningEffort: "medium"
        else usesLowReasoningEffort = true
            Note over generateObject: textVerbosity: "low"<br/>reasoningEffort: "low"
        else default
            Note over generateObject: textVerbosity: "low"<br/>reasoningEffort: "minimal"
        end
        
    else not gpt-5
        ModelCheck->>generateObject: Call without providerOptions
        Note over generateObject: Uses model defaults
    end
    
    generateObject-->>AISdkClient: objectResponse
    AISdkClient-->>Client: Return result
```

@greptile-apps greptile-apps bot left a comment


2 files reviewed, 2 comments


Comment on lines +175 to +179
```ts
reasoningEffort: isCodex
  ? "medium"
  : usesLowReasoningEffort
    ? "low"
    : "minimal",
```

Logic bug: codex models that are `gpt-5` (not `gpt-5.1` or `gpt-5.2`) will get `reasoningEffort: "minimal"` instead of `"medium"`. The ternary checks `isCodex` first, but if the model is `gpt-5-codex`, `usesLowReasoningEffort` is false (doesn't match `gpt-5.1` or `gpt-5.2`), so it falls through to `"minimal"`.

Suggested change
```diff
- reasoningEffort: isCodex
-   ? "medium"
-   : usesLowReasoningEffort
-     ? "low"
-     : "minimal",
+ reasoningEffort: isCodex
+   ? "medium"
+   : usesLowReasoningEffort
+     ? "low"
+     : "minimal",
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/core/lib/v3/llm/aisdk.ts
Line: 175:179

Comment:
Logic bug: codex models that are `gpt-5` (not `gpt-5.1` or `gpt-5.2`) will get `reasoningEffort: "minimal"` instead of `"medium"`. The ternary checks `isCodex` first, but if the model is `gpt-5-codex`, `usesLowReasoningEffort` is false (doesn't match `gpt-5.1` or `gpt-5.2`), so it falls through to `"minimal"`.

```suggestion
                  reasoningEffort: isCodex
                    ? "medium"
                    : usesLowReasoningEffort
                      ? "low"
                      : "minimal",
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +152 to +156
```ts
reasoningEffort: isCodex
  ? "medium"
  : usesLowReasoningEffort
    ? "low"
    : "minimal",
```

Same logic bug as in `aisdk.ts`: codex models that are `gpt-5` (not `gpt-5.1` or `gpt-5.2`) will get `reasoningEffort: "minimal"` instead of `"medium"`.

Suggested change
```diff
- reasoningEffort: isCodex
-   ? "medium"
-   : usesLowReasoningEffort
-     ? "low"
-     : "minimal",
+ reasoningEffort: isCodex
+   ? "medium"
+   : usesLowReasoningEffort
+     ? "low"
+     : "minimal",
```
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/evals/lib/AISdkClientWrapped.ts
Line: 152:156

Comment:
Same logic bug as in `aisdk.ts`: codex models that are `gpt-5` (not `gpt-5.1` or `gpt-5.2`) will get `reasoningEffort: "minimal"` instead of `"medium"`.

```suggestion
                  reasoningEffort: isCodex
                    ? "medium"
                    : usesLowReasoningEffort
                      ? "low"
                      : "minimal",
```

How can I resolve this? If you propose a fix, please make it concise.

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 2 files

Confidence score: 4/5

  • The main risk is a new hardcoded model name check (isCodex) in packages/core/lib/v3/llm/aisdk.ts, which violates the rule against hardcoded LLM model name checks and could cause future model handling regressions.
  • Severity is moderate (5/10) with good confidence, so this looks safe to merge with a small policy-compliance concern rather than a likely runtime break.
  • Pay close attention to packages/core/lib/v3/llm/aisdk.ts - hardcoded model name check violates the no-list rule.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/llm/aisdk.ts">

<violation number="1" location="packages/core/lib/v3/llm/aisdk.ts:133">
P2: Rule violated: **Ensure we never check against hardcoded lists of allowed LLM model names**

This adds a new hardcoded model name check (`isCodex`) which violates the rule against hardcoded LLM model name checks. The rule states newly added code should accept any model name and let the provider handle errors. The exception only applies to guarding *against* known-bad models, not special-casing to enable models.

Consider an alternative approach such as:
- Letting the API return an error for unsupported configurations, then handling it
- Using a model capabilities/metadata system
- Adding a configuration option for these parameters rather than inferring from model name</violation>
</file>
Architecture diagram
```mermaid
sequenceDiagram
    participant App as Stagehand Application
    participant Client as AISdkClient / Wrapped
    participant SDK as AI SDK (generateObject)
    participant OpenAI as OpenAI API

    App->>Client: Request LLM Extraction (modelId, response_model)
    
    Note over Client: Internal Model Classification
    Client->>Client: Check modelId for "gpt-5" and "codex"

    alt NEW: Model is Codex-based
        Note over Client: Set specific Codex requirements
        Client->>Client: Set textVerbosity: "medium"
        Client->>Client: Set reasoningEffort: "medium"
    else Model is Standard GPT-5
        Note over Client: Apply default GPT-5 constraints
        Client->>Client: Set textVerbosity: "low"
        alt modelId is 5.1 or 5.2
            Client->>Client: Set reasoningEffort: "low"
        else 
            Client->>Client: Set reasoningEffort: "minimal"
        end
    end

    Client->>SDK: generateObject(prompt, schema, providerOptions)
    
    Note right of SDK: Includes OpenAI-specific<br/>textVerbosity & reasoningEffort
    
    SDK->>OpenAI: POST /v1/chat/completions
    OpenAI-->>SDK: JSON Response
    SDK-->>Client: Typed Object
    Client-->>App: Extraction Result
```

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


```ts
let objectResponse: Awaited<ReturnType<typeof generateObject>>;
const isGPT5 = this.model.modelId.includes("gpt-5");
const isCodex = this.model.modelId.includes("codex");
```
@cubic-dev-ai cubic-dev-ai bot commented Feb 5, 2026

P2: Rule violated: Ensure we never check against hardcoded lists of allowed LLM model names

This adds a new hardcoded model name check (isCodex) which violates the rule against hardcoded LLM model name checks. The rule states newly added code should accept any model name and let the provider handle errors. The exception only applies to guarding against known-bad models, not special-casing to enable models.

Consider an alternative approach such as:

  • Letting the API return an error for unsupported configurations, then handling it
  • Using a model capabilities/metadata system
  • Adding a configuration option for these parameters rather than inferring from model name (see the sketch below)
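
As a hedged sketch of the capabilities/metadata alternative (the names `MODEL_DEFAULTS` and `defaultsFor` are hypothetical, not from this PR):

```ts
// Hypothetical capabilities map: explicit per-model defaults instead of
// substring checks, extensible without touching client logic.
type ReasoningDefaults = {
  textVerbosity: "low" | "medium";
  reasoningEffort: "minimal" | "low" | "medium";
};

const MODEL_DEFAULTS: Record<string, ReasoningDefaults> = {
  "gpt-5-codex": { textVerbosity: "medium", reasoningEffort: "medium" },
  "gpt-5.1": { textVerbosity: "low", reasoningEffort: "low" },
  "gpt-5.2": { textVerbosity: "low", reasoningEffort: "low" },
  "gpt-5": { textVerbosity: "low", reasoningEffort: "minimal" },
};

function defaultsFor(modelId: string): ReasoningDefaults | undefined {
  // Longest-prefix match so dated IDs like "gpt-5.1-2026-01-01" still resolve.
  const key = Object.keys(MODEL_DEFAULTS)
    .filter((k) => modelId.startsWith(k))
    .sort((a, b) => b.length - a.length)[0];
  return key ? MODEL_DEFAULTS[key] : undefined;
}
```

Unknown model IDs return undefined, so the client can fall back to provider defaults rather than rejecting the model.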
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/llm/aisdk.ts, line 133:

<comment>This adds a new hardcoded model name check (`isCodex`) which violates the rule against hardcoded LLM model name checks. The rule states newly added code should accept any model name and let the provider handle errors. The exception only applies to guarding *against* known-bad models, not special-casing to enable models.

Consider an alternative approach such as:
- Letting the API return an error for unsupported configurations, then handling it
- Using a model capabilities/metadata system
- Adding a configuration option for these parameters rather than inferring from model name</comment>

<file context>
@@ -130,9 +130,11 @@ export class AISdkClient extends LLMClient {
 
     let objectResponse: Awaited<ReturnType<typeof generateObject>>;
     const isGPT5 = this.model.modelId.includes("gpt-5");
+    const isCodex = this.model.modelId.includes("codex");
     const usesLowReasoningEffort =
-      this.model.modelId.includes("gpt-5.1") ||
</file context>

Comment on lines +135 to +136
```ts
(this.model.modelId.includes("gpt-5.1") ||
  this.model.modelId.includes("gpt-5.2")) &&
```
Collaborator

can these be consolidated into a single `includes("gpt-5.")` check?

Member Author

i think so, will test if gpt-5-2025-08-07 also has the same reasoning requirements
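
A sketch of the consolidation being discussed (note: a dated base-model ID like gpt-5-2025-08-07 contains no "gpt-5." substring, so it would still fall through to "minimal" unless the testing above shows otherwise):

```ts
// Proposed consolidation: match any dotted gpt-5.x release with one check,
// still excluding codex variants.
const usesLowReasoningEffort =
  this.model.modelId.includes("gpt-5.") && !isCodex;
```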
