Skip to content

feat(sdk): thread reasoningEffort through agent spawning pipeline#1148

Open
ahhlun wants to merge 7 commits into
bradygaster:devfrom
ahhlun:squad/reasoning-effort-support
Open

feat(sdk): thread reasoningEffort through agent spawning pipeline#1148
ahhlun wants to merge 7 commits into
bradygaster:devfrom
ahhlun:squad/reasoning-effort-support

Conversation

@ahhlun

@ahhlun ahhlun commented May 22, 2026

Copy link
Copy Markdown

Summary

Threads reasoningEffort through the full agent spawning pipeline, enabling per-agent configuration of reasoning effort levels (low, medium, high, xhigh).

Problem

SquadSessionConfig already has a reasoningEffort?: SquadReasoningEffort field that passes through to the Copilot SDK, but Squad's spawning pipeline never sets it. Users cannot configure reasoning effort per-agent through charter files, config, or SDK builders.

Solution

Thread reasoningEffort through all 3 configuration layers and both spawning paths, mirroring the existing pattern for model:

Configuration Layers

  1. Charter markdown — Parse **Reasoning Effort:** from ## Model section in charter.md
  2. Config persistencedefaultReasoningEffort and agentReasoningEffortOverrides in .squad/config.json
  3. SDK buildersreasoningEffort on defineAgent() and defineDefaults()

Spawning Paths

  • Lifecycle managerSpawnAgentOptions.reasoningEffortOverrideSquadSessionConfig.reasoningEffort
  • Fan-out pipelineAgentSpawnConfig.reasoningEffortOverridecreateSession({reasoningEffort})

Resolution

New resolveReasoningEffort() with layered priority hierarchy matching model resolution:

  • Layer 0a: Per-agent config override
  • Layer 0b: Global config default
  • Layer 1: Spawn-time override
  • Layer 2: Charter preference
  • Layer 3: Default (undefined — let SDK/API decide)

The value auto is a sentinel meaning "let the downstream SDK/API decide" and resolves to undefined.

Effort Clamping

New clampReasoningEffort() caps requested effort to the model's maximum supported level. This prevents API errors when a user requests e.g. xhigh on a model that only supports up to high.

Behavior:

  • xhigh on a model with [low,medium,high]high (clamped)
  • xhigh on a model with no effort support → undefined (stripped)
  • max and xhigh treated as equivalent rank
  • resolveReasoningEffort() accepts optional supportedEfforts from listModels() and auto-clamps

Usage Example

In charter.md:

## Model

- **Preferred:** claude-opus-4.7-1m-internal
- **Reasoning Effort:** xhigh

In .squad/config.json:

{
  "defaultReasoningEffort": "high",
  "agentReasoningEffortOverrides": {
    "fenster": "xhigh"
  }
}

In squad.config.ts:

defineAgent({
  name: 'fenster',
  role: 'Core Dev',
  model: 'claude-opus-4.7-1m-internal',
  reasoningEffort: 'xhigh',
});

Files Changed (13)

File Change
packages/squad-sdk/src/agents/charter-compiler.ts Parse **Reasoning Effort:**, add to ParsedCharter, CharterConfigOverrides, CompiledCharter
packages/squad-sdk/src/agents/index.ts Add reasoningEffort to AgentCharter interface, wire into compile methods
packages/squad-sdk/src/agents/lifecycle.ts Add reasoningEffortOverride to SpawnAgentOptions, pass to SquadSessionConfig
packages/squad-sdk/src/coordinator/fan-out.ts Add reasoningEffortOverride to AgentSpawnConfig, optional resolveReasoningEffort on FanOutDependencies
packages/squad-sdk/src/config/models.ts Add read/writeReasoningEffort(), resolveReasoningEffort(), clampReasoningEffort()
packages/squad-sdk/src/config/schema.ts Add reasoningEffort to AgentConfig, defaultReasoningEffort to ModelConfig
packages/squad-sdk/src/builders/types.ts Add reasoningEffort to AgentDefinition and DefaultsDefinition
packages/squad-sdk/src/builders/index.ts Validate reasoningEffort in defineAgent() and defineDefaults()
packages/squad-sdk/templates/charter.md Add **Reasoning Effort:** auto to ## Model section
test/charter-compiler.test.ts Tests for parsing reasoning effort from charter
test/fan-out.test.ts Tests for passing reasoning effort through spawn
test/model-preference.test.ts Tests for config persistence, resolution hierarchy, and clamping
.changeset/reasoning-effort.md Minor bump for squad-sdk

Testing

Unit Tests

  • 183 tests pass across 5 test files (charter-compiler, fan-out, model-preference, builders, lifecycle)
  • ✅ Build passes (npm run build)
  • ✅ Backwards compatible — resolveReasoningEffort on FanOutDependencies is optional

Live Integration Test (claude-opus-4.7-1m-internal)

Ran a live test creating real Copilot sessions to verify reasoning effort is actually applied at the API level:

xhigh default (medium)
Reasoning delta events 8 0
assistant.reasoning event ✅ present ❌ absent
Output tokens 128 206

Key findings:

  • reasoningEffort: "xhigh" on claude-opus-4.7-1m-internal produced 8 reasoning/thinking delta events with actual thinking content. The default (medium) produced zero.
  • The model spends tokens on internal reasoning with higher effort, resulting in more concise output (128 vs 206 tokens).
  • Confirmed the full pipeline works: SquadSessionConfig.reasoningEffort → Copilot SDK → Copilot CLI → API → model thinking tokens.

Live Model Capabilities (from listModels() API)

Model Supported Efforts Default
claude-sonnet-4.6 [none,low,medium,high] medium
claude-opus-4.6 [none,low,medium,high] medium
claude-opus-4.7 [none,medium] medium
claude-opus-4.7-1m-internal [none,low,medium,high,xhigh] medium
gpt-5.5 [none,low,medium,high,xhigh] medium
gpt-5.4 [none,low,medium,high,xhigh] medium
gpt-5.2-codex [none,low,medium,high,xhigh] high
gpt-5.2 [none,low,medium,high] medium
gpt-5-mini [none,low,medium,high] medium

Clamping Validation

  • Tested sending xhigh to claude-sonnet-4.6 (max: high) — API silently accepted (no crash), confirming clamping is a nice-to-have for transparency rather than crash prevention.

Add support for configuring reasoning effort level per-agent through:

- Charter markdown: **Reasoning Effort:** field in ## Model section
- Config persistence: defaultReasoningEffort and agentReasoningEffortOverrides in .squad/config.json
- SDK builders: reasoningEffort on defineAgent() and defineDefaults()
- Layered resolution: resolveReasoningEffort() with priority hierarchy matching model resolution
- Lifecycle manager: SpawnAgentOptions.reasoningEffortOverride passes through to SquadSessionConfig
- Fan-out pipeline: AgentSpawnConfig.reasoningEffortOverride passes through to createSession()

The value 'auto' is treated as a sentinel meaning 'let the SDK/API decide' and resolves to undefined.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 22, 2026 00:03

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds end-to-end “reasoning effort” support across charter parsing, persistent config, spawn/fan-out session creation, and the public builder API.

Changes:

  • Parse **Reasoning Effort:** in charters and thread it through charter compilation + session creation.
  • Add persistent config support (defaultReasoningEffort, agentReasoningEffortOverrides) plus a layered resolveReasoningEffort() resolver.
  • Add tests covering config round-trips, charter parsing/compilation, and fan-out session parameter propagation.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
test/model-preference.test.ts Adds unit tests for reasoning-effort config read/write + layered resolution.
test/fan-out.test.ts Verifies reasoningEffort propagation (override/charter/auto) into createSession.
test/charter-compiler.test.ts Adds charter markdown parsing + compile-time resolution tests for reasoning effort.
packages/squad-sdk/templates/charter.md Updates charter template to include “Reasoning Effort: auto”.
packages/squad-sdk/src/coordinator/fan-out.ts Threads reasoningEffortOverride into spawn/session creation (with optional resolver).
packages/squad-sdk/src/config/schema.ts Extends config schema types with reasoning-effort fields.
packages/squad-sdk/src/config/models.ts Implements read/write + layered resolution for reasoning effort.
packages/squad-sdk/src/builders/types.ts Exposes reasoningEffort in builder types.
packages/squad-sdk/src/builders/index.ts Validates reasoningEffort in defineAgent/defineDefaults.
packages/squad-sdk/src/agents/lifecycle.ts Threads reasoning effort through agent lifecycle session creation.
packages/squad-sdk/src/agents/index.ts Adds reasoningEffort to AgentCharter.
packages/squad-sdk/src/agents/charter-compiler.ts Parses and compiles a resolved reasoning effort from charter/config overrides.
.changeset/reasoning-effort.md Declares a minor release and documents the feature.

Comment thread .changeset/reasoning-effort.md Outdated
Comment thread packages/squad-sdk/src/config/models.ts Outdated
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/agents/charter-compiler.ts Outdated
Comment thread packages/squad-sdk/src/agents/charter-compiler.ts
Comment thread packages/squad-sdk/src/agents/lifecycle.ts Outdated
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/config/models.ts
allai-dev and others added 2 commits May 21, 2026 17:13
Add clampReasoningEffort() that caps the requested effort to the highest
level the model supports. This prevents API errors when a user requests
e.g. 'xhigh' on a model that only supports up to 'high' (like GPT-5.5).

resolveReasoningEffort() now accepts an optional supportedEfforts array
(from listModels()) and auto-clamps the result.

Behavior:
- xhigh on GPT-5.5 [low,medium,high] -> high (clamped down)
- xhigh on GPT-5.2-codex [low,medium,high,xhigh] -> xhigh (unchanged)
- xhigh on gpt-4.1 (no effort support) -> undefined (stripped)
- max and xhigh treated as equivalent rank

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Validate reasoning effort at parse time: charter compiler now rejects
  invalid values and normalizes 'auto' to undefined early, preventing
  invalid strings from reaching SquadSessionConfig
- Validate readReasoningEffort() against allowed set (low/medium/high/xhigh/auto),
  matching the per-agent override validation for consistency
- Add isValid() guard in resolveReasoningEffort() so invalid values at
  any layer (spawn override, charter, config) fall through gracefully
- Remove duplicate extractReasoningEffort() regex in lifecycle.ts;
  use compileCharterFull() output instead (single source of truth)
- Add plain-object check after JSON.parse in write functions to prevent
  corruption when config.json contains non-object JSON (e.g. array/number)
- Fix changeset docs to match actual priority order:
  per-agent config > global config > spawn override > charter > undefined
- Add tests for invalid inputs at all resolver layers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ahhlun

ahhlun commented May 22, 2026

Copy link
Copy Markdown
Author

All 10 review comments addressed in commit 423a7b0:

  1. Changeset priority order — Fixed docs to match implementation: per-agent config > global config > spawn override > charter > undefined.

  2. Validation consistencyreadReasoningEffort() now validates against [low, medium, high, xhigh, auto], matching per-agent override validation. Invalid values (e.g. typos) are ignored.

  3. JSON.parse safety — Both write functions now verify parsed config is a plain object before mutating; falls back to { version: 1 } if config.json contains non-object JSON.

  4. auto/invalid normalization in charter compilerparseCharterMarkdown() now rejects auto and invalid values at parse time (normalized to undefined). compileCharterFull() validates config overrides separately so invalid overrides fall through to the charter value.

  5. Removed duplicate parsing in lifecycle — Replaced extractReasoningEffort() regex with compileCharterFull() output. Single source of truth for charter parsing.

  6. Invalid input tests — Added tests for invalid values at every resolver layer (spawnOverride: 'invalid', charterPreference: 'turbo', persisted defaultReasoningEffort: 'invalid'). All fall through to undefined.

190 tests pass (up from 183).

Add Per-Agent Reasoning Effort section to squad.agent.md so the
coordinator knows to use reasoningEffort as a session parameter
instead of switching to different model variants (e.g. using
claude-opus-4.7-1m-internal with xhigh effort instead of switching
to claude-opus-4.7-xhigh).

Includes:
- Layered resolution instructions matching the SDK implementation
- Config persistence commands (always use xhigh, per-agent, clear)
- Explicit rule: same model + different effort, NOT different model
- Updated spawn output format showing effort annotation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ahhlun ahhlun requested a review from Copilot May 22, 2026 05:23

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

Comment thread packages/squad-sdk/src/agents/lifecycle.ts
Comment thread packages/squad-sdk/src/coordinator/fan-out.ts
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/config/models.ts
Comment thread packages/squad-sdk/src/config/models.ts Outdated
Comment thread packages/squad-sdk/src/coordinator/fan-out.ts
- Centralize VALID_REASONING_EFFORTS as exported const from config/models.ts;
  charter-compiler and builders import it instead of maintaining duplicates
- Validate reasoning effort in lifecycle before setting on SquadSessionConfig:
  reject auto and invalid values so they never reach the SDK
- Validate in fan-out fallback path: invalid reasoningEffortOverride values
  are filtered out instead of passed through to createSession
- Fix clampReasoningEffort to clamp UP to model's minimum when requested
  effort is below the supported range (e.g. low on a model that only
  supports [medium] returns medium instead of undefined)
- Validate write APIs: writeReasoningEffort and writeAgentReasoningEffortOverrides
  now reject invalid values instead of persisting them to config.json
- Add test for clamp-up behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ahhlun

ahhlun commented May 22, 2026

Copy link
Copy Markdown
Author

All review feedback addressed across commits 423a7b0, b437d8a, and 046138a.

Round 1 (10 comments) — 423a7b0:

  • Fixed changeset priority order docs
  • Validated readReasoningEffort against allowed set
  • Added JSON.parse plain-object safety check
  • Normalized auto/invalid in charter compiler
  • Removed duplicate parsing in lifecycle
  • Added invalid-input tests at every layer

Round 2 (7 new comments) — 046138a:

  • Centralized VALID_REASONING_EFFORTS (single exported const, no duplication)
  • Validated in lifecycle and fan-out fallback path (auto/invalid never reach SDK)
  • clampReasoningEffort now clamps UP to model minimum (not just down)
  • Write APIs reject invalid values before persisting
  • Added clamp-up test

Coordinator update — b437d8a:

  • Taught coordinator about reasoning effort as a session parameter (not a model variant)

237 tests pass across 6 test files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

👋 Friendly nudge — this PR has had no activity for 16 days.

What needs attention:

  • 👀 No approving reviews yet. Request a review from a teammate.

If this PR is abandoned, please close it. If it's blocked on something external, leave a comment so the team knows.
This is an automated check that runs on weekdays. It won't nudge the same PR more than once per week.

@ahhlun

ahhlun commented Jun 9, 2026

Copy link
Copy Markdown
Author

@bradygaster @tamirdresher can you help with review?

@tamirdresher

Copy link
Copy Markdown
Collaborator

@ahhlun nice work — the thing this PR sets out to do is done. reasoningEffort threads through SquadSessionConfig end-to-end: wiring lands in both spawn paths (lifecycle.ts:226 and fan-out.ts:143), charter parsing handles missing field, casing, whitespace, and the auto sentinel cleanly, backwards compat holds, and builder validation is symmetric across defineAgent / defineDefaults. ~30 unit tests across the 3 new test files cover the resolver, clamper, and persistence round-trips well. Approving.

On the integrator-injection shape

resolveReasoningEffort() and clampReasoningEffort() sitting as optional deps on FanOutDependencies — same shape as resolveModel — is the right SDK boundary. Downstream integrators (squad-cli, custom coordinators) wire them in; the SDK just exposes the hooks. That's the existing pattern and it's correct here too.

One testing question

The live integration test in the PR description confirms reasoningEffort: "xhigh" passed directly into SquadSessionConfig reaches the Copilot SDK → CLI → API and fires 8 reasoning delta events. That proves the spawn-time override path (Layer 1).

The other two paths this PR ships — charter **Reasoning Effort:** xhigh (Layer 2) and .squad/config.json (Layers 0a/0b, once the integrator wires the resolver) — weren't live-verified the same way. Worth one more pass on the existing test harness: drive a real Copilot CLI session with **Reasoning Effort: xhigh** in the charter and no spawn-time override, confirm you see the same 8 delta events. Closes the loop for Layer 2 end-to-end without needing the integrator work for Layers 0a/0b. (Those will live-verify naturally once squad-cli is wired in the follow-up.)

Smaller nits — not blockers

  • clampReasoningEffort() clamps UP as well as down — low against a model that only supports [high] returns high. Test enshrines it (model-preference.test.ts:646-650) so it''s deliberate, but a code comment would save the next reader from filing it as a bug.
  • writeReasoningEffort() / writeAgentReasoningEffortOverrides() silently DELETE valid prefs on invalid input — no way to distinguish a typo from an intentional clear. A console.warn or thrown error would help.
  • Invalid charter values drop silently in charter-compiler.ts:280-283 — editor gets zero feedback. Same fix.
  • ''low'' | ''medium'' | ''high'' | ''xhigh'' is defined in 3 places (SquadReasoningEffort in adapter/types.ts:43, VALID_REASONING_EFFORTS in config/models.ts, inline in builders/types.ts:69,123) and then widened to bare string in schema.ts. Worth consolidating to one canonical type.

Address nits here or in a follow-up — your call. Looking forward to the integrator wiring PR.

Follow-up to @tamirdresher's review on PR bradygaster#1148:

- clampReasoningEffort: document the deliberate upward clamp (JSDoc and
  inline comment) so a request below a model's minimum isn't mistaken
  for a bug.
- writeReasoningEffort: warn and preserve the existing preference on
  invalid input instead of silently clearing it (only null clears).
- writeAgentReasoningEffortOverrides: warn about dropped invalid entries
  instead of discarding them silently.
- charter parsing: warn on an invalid Reasoning Effort value so charter
  authors get feedback instead of a silent drop.
- Consolidate the reasoning-effort union on the canonical
  SquadReasoningEffort type: builders/types.ts references it and
  VALID_REASONING_EFFORTS is tied to it via satisfies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🛫 PR Readiness Check

ℹ️ This comment updates on each push. Last checked: commit ff77480

PR Scope: 📦🔧 Mixed (product + infrastructure)

⚠️ 4 item(s) to address before review

Status Check Details
Single commit 7 commits — consider squashing before review
Not in draft Ready for review
Branch up to date dev is 164 commit(s) ahead — rebase recommended
Copilot review No Copilot review yet — it may still be processing
Changeset present Changeset file found
Scope clean No .squad/ or docs/proposals/ files
No merge conflicts No merge conflicts
Copilot threads resolved 12 active Copilot thread(s) resolved (5 outdated skipped)
CI passing 6 check(s) still running

Files Changed (19 files, +1022 −6)

File +/−
.changeset/reasoning-effort.md +17 −0
.github/agents/squad.agent.md +27 −0
.squad-templates/squad.agent.md +27 −0
packages/squad-cli/templates/squad.agent.md.template +27 −0
packages/squad-sdk/src/adapter/types.ts +4 −0
packages/squad-sdk/src/agents/charter-compiler.ts +32 −0
packages/squad-sdk/src/agents/index.ts +5 −0
packages/squad-sdk/src/agents/lifecycle.ts +20 −6
packages/squad-sdk/src/builders/index.ts +8 −0
packages/squad-sdk/src/builders/types.ts +8 −0
packages/squad-sdk/src/config/models.ts +333 −0
packages/squad-sdk/src/config/schema.ts +3 −0
packages/squad-sdk/src/coordinator/fan-out.ts +16 −0
packages/squad-sdk/templates/charter.md +1 −0
packages/squad-sdk/templates/squad.agent.md.template +27 −0
templates/squad.agent.md.template +27 −0
test/charter-compiler.test.ts +66 −0
test/fan-out.test.ts +72 −0
test/model-preference.test.ts +302 −0

Total: +1022 −6


This check runs automatically on every push. Fix any ❌ items and push again.
See CONTRIBUTING.md and PR Requirements for details.

@github-actions

Copy link
Copy Markdown
Contributor

🟠 Impact Analysis — PR #1148

Risk tier: 🟠 HIGH

📊 Summary

Metric Count
Files changed 19
Files added 1
Files modified 18
Files deleted 0
Modules touched 6
Critical files 2

🎯 Risk Factors

  • 19 files changed (6-20 → MEDIUM)
  • 6 modules touched (5-8 → HIGH)
  • Critical files touched: packages/squad-sdk/src/agents/index.ts, packages/squad-sdk/src/builders/index.ts

📦 Modules Affected

ci-workflows (1 file)
  • .github/agents/squad.agent.md
root (2 files)
  • .changeset/reasoning-effort.md
  • templates/squad.agent.md.template
squad-cli (1 file)
  • packages/squad-cli/templates/squad.agent.md.template
squad-sdk (11 files)
  • packages/squad-sdk/src/adapter/types.ts
  • packages/squad-sdk/src/agents/charter-compiler.ts
  • packages/squad-sdk/src/agents/index.ts
  • packages/squad-sdk/src/agents/lifecycle.ts
  • packages/squad-sdk/src/builders/index.ts
  • packages/squad-sdk/src/builders/types.ts
  • packages/squad-sdk/src/config/models.ts
  • packages/squad-sdk/src/config/schema.ts
  • packages/squad-sdk/src/coordinator/fan-out.ts
  • packages/squad-sdk/templates/charter.md
  • packages/squad-sdk/templates/squad.agent.md.template
templates (1 file)
  • .squad-templates/squad.agent.md
tests (3 files)
  • test/charter-compiler.test.ts
  • test/fan-out.test.ts
  • test/model-preference.test.ts

⚠️ Critical Files

  • packages/squad-sdk/src/agents/index.ts
  • packages/squad-sdk/src/builders/index.ts

This report is generated automatically for every PR. See #733 for details.

@ahhlun

ahhlun commented Jun 12, 2026

Copy link
Copy Markdown
Author

@tamirdresher thanks for the thorough review! Addressed all four nits in ff77480:

  • clampReasoningEffort clamp-up — documented the deliberate upward clamp in both the JSDoc and an inline comment, so a request below a model''s minimum reads as intentional rather than a bug.
  • writeReasoningEffort — now console.warns on invalid input and preserves the existing preference instead of silently clearing it (only null clears).
  • writeAgentReasoningEffortOverrides — now console.warns listing any dropped invalid entries.
  • Charter parsingconsole.warns on an invalid **Reasoning Effort:** value so charter authors get feedback instead of a silent drop.
  • Type consolidation — unified the union on the canonical SquadReasoningEffort (adapter/types.ts); builders/types.ts now references it and VALID_REASONING_EFFORTS is tied to it via satisfies. Left schema.ts as string to match the existing loosely-typed on-disk model field.

npm run lint is clean and all reasoning-effort tests pass. Leaving the Layer-2 charter live-verification for the follow-up wiring PR as you suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants