
Add proactive context window / token budget management #164

@rockfordlhotka

Description

Problem

RockBot currently tracks token usage after LLM requests complete (via LlmUsage in LlmResponse), but has no way to estimate token consumption before sending a request. This means the system can't proactively manage the context window — it only discovers it exceeded the budget when the LLM returns a "length" finish reason, at which point information has already been lost.

Tools like Claude Code and GitHub Copilot solve this by using client-side tokenizers to estimate consumption before each request, enabling smart context management strategies.

Proposed Solution

1. ITokenEstimator Abstraction

A provider-agnostic interface for estimating token counts before sending requests:

public interface ITokenEstimator
{
    /// <summary>Estimate token count for a single message.</summary>
    int EstimateTokens(LlmChatMessage message);

    /// <summary>Estimate token count for a full conversation + tools.</summary>
    TokenEstimate EstimateRequest(IReadOnlyList<LlmChatMessage> messages, IReadOnlyList<LlmToolDefinition>? tools = null);
}

public sealed record TokenEstimate(
    int SystemPromptTokens,
    int ConversationTokens,
    int ToolDefinitionTokens,
    int TotalTokens);

Implementations:

  • TiktokenEstimator — Uses Microsoft.ML.Tokenizers (covers OpenAI models, good general-purpose BPE approximation)
  • AnthropicCountTokensEstimator — Calls Anthropic's /v1/messages/count_tokens API for exact counts
  • CharacterHeuristicEstimator — Simple chars÷4 fallback for unknown models
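
A minimal sketch of the chars÷4 fallback estimator is below. The LlmChatMessage and LlmToolDefinition shapes are placeholders for the real RockBot types (assumed here to expose role/content and name/description/schema text), and the per-message overhead constant is an assumption:

using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder shapes for illustration; the real types live in RockBot.
public sealed record LlmChatMessage(string Role, string Content);
public sealed record LlmToolDefinition(string Name, string Description, string ParametersJson);

public sealed class CharacterHeuristicEstimator : ITokenEstimator
{
    private const int CharsPerToken = 4;    // rough average for English BPE tokenizers
    private const int MessageOverhead = 4;  // role markers and separators (assumption)

    public int EstimateTokens(LlmChatMessage message) =>
        (message.Content.Length / CharsPerToken) + MessageOverhead;

    public TokenEstimate EstimateRequest(
        IReadOnlyList<LlmChatMessage> messages,
        IReadOnlyList<LlmToolDefinition>? tools = null)
    {
        int system = 0, conversation = 0;
        foreach (var message in messages)
        {
            var tokens = EstimateTokens(message);
            if (message.Role == "system") system += tokens;
            else conversation += tokens;
        }

        // Tool schemas are serialized into the prompt, so they count against the budget too.
        int toolTokens = tools?.Sum(t =>
            (t.Name.Length + t.Description.Length + t.ParametersJson.Length) / CharsPerToken) ?? 0;

        return new TokenEstimate(system, conversation, toolTokens,
            system + conversation + toolTokens);
    }
}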

2. Model Capability Registry

A registry of context window sizes and model metadata so the orchestrator knows its budget:

public interface IModelCapabilityRegistry
{
    ModelCapabilities GetCapabilities(string modelId);
}

public sealed record ModelCapabilities(
    string ModelId,
    int ContextWindowTokens,      // e.g., 200_000 for Claude Sonnet
    int MaxOutputTokens,          // e.g., 8_192
    int EffectiveInputBudget);    // ContextWindow - MaxOutput - safety margin

Could be populated from configuration, or from provider APIs that expose model metadata.
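
A configuration-backed sketch (the class name and fallback numbers are illustrative, not part of the proposal):

using System;
using System.Collections.Generic;
using System.Linq;

public sealed class ConfiguredModelCapabilityRegistry : IModelCapabilityRegistry
{
    // Conservative fallback for models missing from configuration.
    private static readonly ModelCapabilities Unknown =
        new("unknown", ContextWindowTokens: 32_000, MaxOutputTokens: 4_096,
            EffectiveInputBudget: 32_000 - 4_096 - 2_000);

    private readonly IReadOnlyDictionary<string, ModelCapabilities> _models;

    public ConfiguredModelCapabilityRegistry(IEnumerable<ModelCapabilities> configured) =>
        _models = configured.ToDictionary(m => m.ModelId, StringComparer.OrdinalIgnoreCase);

    public ModelCapabilities GetCapabilities(string modelId) =>
        _models.TryGetValue(modelId, out var caps) ? caps : Unknown with { ModelId = modelId };
}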

3. Context Budget Tracker

Middleware or service that tracks running token estimates through the conversation lifecycle:

public interface IContextBudgetTracker
{
    /// <summary>Current estimated usage vs. budget.</summary>
    ContextBudgetStatus GetStatus(string sessionId);
    
    /// <summary>Event raised when usage crosses a threshold.</summary>
    event Action<ContextBudgetAlert> OnBudgetAlert;
}

public sealed record ContextBudgetStatus(
    int EstimatedTokensUsed,
    int BudgetTokens,
    double UtilizationPercent);

public sealed record ContextBudgetAlert(
    string SessionId,
    double UtilizationPercent,    // e.g., 0.70, 0.90
    AlertLevel Level);            // Warning, Critical
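
A sketch of an in-memory implementation. Record is a hypothetical mutator (not part of the interface above) that the orchestrator would call after estimating each turn; the 70%/90% thresholds mirror the example values in ContextBudgetAlert:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public sealed class InMemoryContextBudgetTracker : IContextBudgetTracker
{
    private readonly ConcurrentDictionary<string, int> _estimated = new();
    private readonly int _budgetTokens;

    public InMemoryContextBudgetTracker(int budgetTokens) => _budgetTokens = budgetTokens;

    public event Action<ContextBudgetAlert>? OnBudgetAlert;

    public ContextBudgetStatus GetStatus(string sessionId)
    {
        var used = _estimated.GetValueOrDefault(sessionId);
        return new ContextBudgetStatus(used, _budgetTokens, (double)used / _budgetTokens);
    }

    // Hypothetical: called by the orchestrator after each turn is estimated.
    public void Record(string sessionId, int estimatedTokens)
    {
        var before = (double)_estimated.GetValueOrDefault(sessionId) / _budgetTokens;
        var used = _estimated.AddOrUpdate(sessionId, estimatedTokens, (_, prev) => prev + estimatedTokens);
        var after = (double)used / _budgetTokens;

        // Alert only when a threshold is first crossed, not on every turn above it.
        if (before < 0.90 && after >= 0.90)
            OnBudgetAlert?.Invoke(new ContextBudgetAlert(sessionId, after, AlertLevel.Critical));
        else if (before < 0.70 && after >= 0.70)
            OnBudgetAlert?.Invoke(new ContextBudgetAlert(sessionId, after, AlertLevel.Warning));
    }
}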

4. Proactive Summarization Trigger

When the budget tracker hits a configurable threshold (e.g., 70%), automatically summarize older conversation turns before the next LLM call, rather than waiting for a "length" finish reason, by which point context has already been truncated.

This builds on the existing "sliding window with summarization" design decision from open-questions.md, making the summarization trigger proactive rather than reactive.
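
A sketch of where the check sits in the request path; PrepareAsync and the summarizeOldestTurns delegate are hypothetical hooks, not existing RockBot members:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ProactiveContextGuard
{
    // Illustrative threshold; in practice this would come from configuration.
    private const double SummarizeThreshold = 0.70;

    public static async Task<IReadOnlyList<LlmChatMessage>> PrepareAsync(
        IReadOnlyList<LlmChatMessage> messages,
        ITokenEstimator estimator,
        ModelCapabilities capabilities,
        Func<IReadOnlyList<LlmChatMessage>, CancellationToken,
            Task<IReadOnlyList<LlmChatMessage>>> summarizeOldestTurns,
        CancellationToken ct)
    {
        var estimate = estimator.EstimateRequest(messages);
        var utilization = (double)estimate.TotalTokens / capabilities.EffectiveInputBudget;

        // Compress older turns *before* the request goes out, so the provider
        // never truncates context silently.
        if (utilization >= SummarizeThreshold)
            messages = await summarizeOldestTurns(messages, ct);

        return messages;
    }
}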

How Token Counting Works (Background)

Three mechanisms, typically used in combination:

| Mechanism | Timing | Accuracy | Cost |
|-----------|--------|----------|------|
| API response usage | After request | Exact | Free (included in response) |
| Client-side tokenizer (tiktoken, Microsoft.ML.Tokenizers) | Before request | ~95-99% accurate | Free (local computation) |
| Count-tokens API (Anthropic, Google) | Before request | Exact | Small API cost, no inference |
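
For example, the client-side mechanism with Microsoft.ML.Tokenizers looks roughly like this (API shape per the package's 1.0 release; worth verifying against the version in use):

using System;
using Microsoft.ML.Tokenizers;

// Local BPE count before the request goes out; no API call involved.
var tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");
int count = tokenizer.CountTokens("How many tokens will this prompt use?");
Console.WriteLine($"Estimated prompt tokens: {count}");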

Integration Points

  • IConversationMemory — Budget tracker reads conversation history to estimate current usage
  • UserMessageHandler — Checks budget before building LLM request; triggers summarization if needed
  • ChunkingAIFunction — Already manages large tool results; a token estimator could make its fixed 16K-character threshold token-aware (see the sketch after this list)
  • ILlmClient — Could expose model capabilities alongside chat completions
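
For the chunking case, the change could be as small as this (the helper name and the 4,000-token cap are hypothetical):

// Token-aware replacement for a fixed character threshold.
public static class ToolResultBudget
{
    public static bool NeedsChunking(
        string toolResult, ITokenEstimator estimator, int maxResultTokens = 4_000) =>
        estimator.EstimateTokens(new LlmChatMessage("tool", toolResult)) > maxResultTokens;
}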

Design Considerations

  • Token estimation should be optional — the system should degrade gracefully if no estimator is configured (fall back to character heuristics or skip proactive management)
  • Different providers have different tokenizers — the estimator should be selected per model or per provider, based on the model in use (see the selector sketch after this list)
  • Estimation is inherently approximate for most approaches — design for budgets with safety margins, not exact counts
  • Microsoft.ML.Tokenizers is the canonical .NET tokenizer library and supports OpenAI models; evaluate whether it covers enough models or if provider-specific APIs are needed
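
A sketch of per-model selection (prefix matching on model ids is a simplification; a real implementation might key off provider metadata instead):

using System;

public sealed class TokenEstimatorSelector
{
    private readonly ITokenEstimator _tiktoken;
    private readonly ITokenEstimator _anthropic;
    private readonly ITokenEstimator _fallback = new CharacterHeuristicEstimator();

    public TokenEstimatorSelector(ITokenEstimator tiktoken, ITokenEstimator anthropic)
        => (_tiktoken, _anthropic) = (tiktoken, anthropic);

    public ITokenEstimator For(string modelId) => modelId switch
    {
        var m when m.StartsWith("gpt-", StringComparison.OrdinalIgnoreCase) => _tiktoken,
        var m when m.StartsWith("claude-", StringComparison.OrdinalIgnoreCase) => _anthropic,
        _ => _fallback
    };
}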

Labels: enhancement (New feature or request), wontfix (This will not be worked on)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions