Description
Summary
When using ProviderConfig with type: "anthropic" (BYOK), the CLI binary sends
max_tokens: 8192 to the Anthropic API regardless of any configuration. This value
is hardcoded internally and cannot be overridden from the SDK.
Claude Sonnet 4.6 supports up to 32,000 output tokens (per models.list capabilities),
but the CLI caps it at 8,192. When the model generates a long response (e.g., writing
a large file via the create tool), the response is silently truncated at 8,192 tokens
with stop_reason: "max_tokens", the tool call is incomplete, and the session transitions
to session.idle without any error event — a silent failure.
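For illustration, a hedged sketch of the Messages API request body the CLI effectively builds (field names follow the Anthropic Messages API; the message content is a placeholder, and the point is the fixed `max_tokens`):

```python
# Hypothetical reconstruction of the request body the CLI sends to the
# Anthropic Messages API; max_tokens is fixed at 8192 regardless of config.
request_body = {
    "model": "claude-sonnet-4.6",
    "max_tokens": 8192,  # hardcoded by the CLI; cannot be overridden via the SDK
    "messages": [{"role": "user", "content": "..."}],
}

# Claude Sonnet 4.6's documented output limit (per models.list capabilities):
MODEL_OUTPUT_LIMIT = 32_000
assert request_body["max_tokens"] < MODEL_OUTPUT_LIMIT  # CLI caps far below the model's limit
```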
Environment
- Python SDK: `github-copilot-sdk>=0.2.1rc1`
- Provider: Anthropic via Azure AI Foundry (`type: "anthropic"`)
- Model: `claude-sonnet-4.6`
Reproduction

```python
from copilot import CopilotClient
from copilot.client import SubprocessConfig
from copilot.session import PermissionHandler

client = CopilotClient(SubprocessConfig(cwd="/tmp/workspace", use_logged_in_user=False))
await client.start()

session = await client.create_session(
    model="claude-sonnet-4.6",
    provider={
        "type": "anthropic",
        "base_url": "https://your-endpoint.services.ai.azure.com/anthropic/",
        "api_key": "your-key",
    },
    streaming=True,
    on_permission_request=PermissionHandler.approve_all,
)

# Ask the model to generate a large document (triggers the create tool with >8K tokens)
await session.send("Generate a comprehensive 5000-word analysis report and save it to output/report.md")
```
Key observations from the attached log (process-1774635480610-54170.log):
- The `create` tool call is incomplete — it has `path` but is missing `file_text` (the content was truncated before the model could finish writing it)
- `stop_reason: "max_tokens"` with exactly `output_tokens: 8192` — a hard ceiling
- `[WARNING] Max tokens reached` — the CLI is aware of the truncation
- No `session.error` is emitted — only `assistant.message` → `session.idle`
- The session goes idle as if everything succeeded — the SDK consumer receives `session.idle` and has no indication the response was truncated
- No `tool.execution_start` — the CLI doesn't even attempt to execute the truncated tool call, but also doesn't report the failure
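On the consumer side, the truncation can at least be detected by inspecting the raw assistant message. A minimal sketch — the message shape below is a hypothetical reconstruction from the log, not an SDK API, and `is_silently_truncated` is a helper invented for illustration:

```python
def is_silently_truncated(message: dict) -> bool:
    """Flag a response that hit the token ceiling mid tool call."""
    if message.get("stop_reason") != "max_tokens":
        return False
    # A create tool call needs both path and file_text to be executable;
    # a tool_use block missing file_text was cut off mid-generation.
    for block in message.get("content", []):
        if block.get("type") == "tool_use" and "file_text" not in block.get("input", {}):
            return True
    return False

# Reconstructed from the attached log: the ceiling was hit mid create call.
truncated = {
    "stop_reason": "max_tokens",
    "content": [{"type": "tool_use", "name": "create", "input": {"path": "output/report.md"}}],
}
assert is_silently_truncated(truncated)
```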
For comparison, the previous turn (turn 18) completed successfully with `output_tokens: 8147`
and `stop_reason: "tool_use"` — just 45 tokens below the 8,192 ceiling. The model was
generating progressively longer tool calls (file contents) across turns, and the ceiling
was hit on the next one.
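This pattern suggests a simple early-warning check: any turn whose `output_tokens` lands within a small margin of the ceiling is at risk on the next, longer turn. A sketch, where the 100-token margin is an arbitrary choice for illustration:

```python
CEILING = 8192  # the CLI's hardcoded max_tokens

def near_ceiling(output_tokens: int, margin: int = 100) -> bool:
    """True when a turn finished within `margin` tokens of the hard cap."""
    return CEILING - output_tokens <= margin

assert near_ceiling(8147)       # turn 18: only 45 tokens of headroom
assert near_ceiling(8192)       # turn 19: the ceiling itself
assert not near_ceiling(4000)   # plenty of headroom
```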