Description
Summary
When using ProviderConfig with type: "anthropic" (BYOK), the CLI binary sends
max_tokens: 8192 to the Anthropic API regardless of any configuration. This value
is hardcoded internally and cannot be overridden from the SDK.
Claude Sonnet 4.6 supports up to 32,000 output tokens (per models.list capabilities),
but the CLI caps it at 8,192. When the model generates a long response (e.g., writing
a large file via the create tool), the response is silently truncated at 8,192 tokens
with stop_reason: "max_tokens", the tool call is incomplete, and the session transitions
to session.idle without any error event — a silent failure.
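For illustration, a hedged sketch of the Messages API request body the CLI effectively builds (field names follow the Anthropic Messages API; the message content is a placeholder, and the point is the fixed `max_tokens`):

```python
# Hypothetical reconstruction of the request body the CLI sends to the
# Anthropic Messages API; max_tokens is fixed at 8192 regardless of config.
request_body = {
    "model": "claude-sonnet-4.6",
    "max_tokens": 8192,  # hardcoded by the CLI; cannot be overridden via the SDK
    "messages": [{"role": "user", "content": "..."}],
}

# Claude Sonnet 4.6's documented output limit (per models.list capabilities):
MODEL_OUTPUT_LIMIT = 32_000
assert request_body["max_tokens"] < MODEL_OUTPUT_LIMIT  # CLI caps far below the model's limit
```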
Environment
- Python SDK: `github-copilot-sdk>=0.2.1rc1`
- Provider: Anthropic via Azure AI Foundry (`type: "anthropic"`)
- Model: `claude-sonnet-4.6`
Reproduction

```python
from copilot import CopilotClient
from copilot.client import SubprocessConfig
from copilot.session import PermissionHandler

client = CopilotClient(SubprocessConfig(cwd="/tmp/workspace", use_logged_in_user=False))
await client.start()

session = await client.create_session(
    model="claude-sonnet-4.6",
    provider={
        "type": "anthropic",
        "base_url": "https://your-endpoint.services.ai.azure.com/anthropic/",
        "api_key": "your-key",
    },
    streaming=True,
    on_permission_request=PermissionHandler.approve_all,
)

# Ask the model to generate a large document (triggers the create tool with >8K tokens)
await session.send("Generate a comprehensive 5000-word analysis report and save it to output/report.md")
```
Key observations from the attached log (process-1774635480610-54170.log):
- The `create` tool call is incomplete — it has `path` but is missing `file_text` (the content was truncated before the model could finish writing it)
- `stop_reason: "max_tokens"` with exactly `output_tokens: 8192` — a hard ceiling
- `[WARNING] Max tokens reached` — the CLI is aware of the truncation
- No `session.error` is emitted — only `assistant.message` → `session.idle`
- The session goes idle as if everything succeeded — the SDK consumer receives `session.idle` and has no indication the response was truncated
- No `tool.execution_start` — the CLI doesn't even attempt to execute the truncated tool call, but also doesn't report the failure
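On the consumer side, the truncation can at least be detected by inspecting the raw assistant message. A minimal sketch — the message shape below is a hypothetical reconstruction from the log, not an SDK API, and `is_silently_truncated` is a helper invented for illustration:

```python
def is_silently_truncated(message: dict) -> bool:
    """Flag a response that hit the token ceiling mid tool call."""
    if message.get("stop_reason") != "max_tokens":
        return False
    # A create tool call needs both path and file_text to be executable;
    # a tool_use block missing file_text was cut off mid-generation.
    for block in message.get("content", []):
        if block.get("type") == "tool_use" and "file_text" not in block.get("input", {}):
            return True
    return False

# Reconstructed from the attached log: the ceiling was hit mid create call.
truncated = {
    "stop_reason": "max_tokens",
    "content": [{"type": "tool_use", "name": "create", "input": {"path": "output/report.md"}}],
}
assert is_silently_truncated(truncated)
```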
For comparison, the previous turn (turn 18) completed successfully with `output_tokens: 8147`
and `stop_reason: "tool_use"` — just 45 tokens below the 8,192 ceiling. The model was
generating progressively longer tool calls (file contents) across turns, and the ceiling
was hit on the next one.
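This pattern suggests a simple early-warning check: any turn whose `output_tokens` lands within a small margin of the ceiling is at risk on the next, longer turn. A sketch, where the 100-token margin is an arbitrary choice for illustration:

```python
CEILING = 8192  # the CLI's hardcoded max_tokens

def near_ceiling(output_tokens: int, margin: int = 100) -> bool:
    """True when a turn finished within `margin` tokens of the hard cap."""
    return CEILING - output_tokens <= margin

assert near_ceiling(8147)       # turn 18: only 45 tokens of headroom
assert near_ceiling(8192)       # turn 19: the ceiling itself
assert not near_ceiling(4000)   # plenty of headroom
```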