[Bug] Context usage indicator always shows <1% with BYOK models via LiteLLM proxy #857
Description
When using BYOK custom models routed through a LiteLLM proxy (v1.81.15), the context usage indicator in the CLI status bar always shows <1% regardless of actual context consumption. The /cost command also shows Input: 0 tokens (related to #157).
Environment
- Droid CLI: v0.81.0
- macOS (arm64)
- BYOK config: `provider: "anthropic"` pointing to a LiteLLM proxy backed by Azure/Bedrock
- LiteLLM version: 1.81.15
- Models affected: all custom BYOK models (both Anthropic and OpenAI provider types)
- `showTokenUsageIndicator: true` in settings.json
Steps to Reproduce
- Configure a BYOK model with `provider: "anthropic"` pointing to a LiteLLM proxy
- Start a droid session using the custom model
- Run several prompts that consume meaningful context
- Observe the context usage indicator -- it always shows <1%
- Run `/cost` -- input tokens show as 0
Investigation: The proxy IS returning correct usage data
I tested the LiteLLM proxy directly with curl and confirmed all token usage fields are correctly populated in every response path:
Non-streaming:

```json
"usage": {"input_tokens": 16, "cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0, "output_tokens": 8, "total_tokens": 24}
```

Streaming (`message_start` event):

```json
"usage": {"input_tokens": 16, "cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0, "output_tokens": 1}
```

Streaming (`message_delta` event):

```json
"usage": {"output_tokens": 8}
```

Streaming (`message_stop` event):

```json
"usage": {"input_tokens": 16, "output_tokens": 8}
```

Streaming with extended thinking: also returns correct `input_tokens` and `output_tokens`.
All Anthropic SSE event types (message_start, message_delta, message_stop) contain properly structured usage data. The proxy is fully compliant with the Anthropic streaming format.
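To illustrate the suspected accumulation gap, here is a minimal sketch (not Droid's actual code; event shapes are assumptions based on the Anthropic streaming format) of how a client would merge usage across the SSE events shown above. Note that `input_tokens` only arrives on `message_start`; a client that reads usage solely from `message_delta` would report 0 input tokens, matching the observed behavior.

```python
def accumulate_usage(events):
    """events: iterable of (event_type, payload_dict) parsed from the SSE stream."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    for event_type, payload in events:
        if event_type == "message_start":
            # Input tokens are only reported here, on the initial event.
            u = payload["message"]["usage"]
            usage["input_tokens"] = u.get("input_tokens", 0)
            usage["output_tokens"] = u.get("output_tokens", 0)
        elif event_type == "message_delta":
            # Carries the cumulative output token count so far.
            usage["output_tokens"] = payload["usage"].get(
                "output_tokens", usage["output_tokens"]
            )
    return usage
```

Fed the event payloads from the curl test above, this returns `{"input_tokens": 16, "output_tokens": 8}`.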
Likely Root Causes
1. Context window size unknown for custom models: The proxy's `/v1/models` endpoint returns minimal metadata (no `max_tokens` or `context_window` field). Droid likely computes context % as `input_tokens / context_window_size`, and without knowing the context window for a custom model, it may default to 0 or an extremely large value, yielding <1%.
2. Token accumulation not working for BYOK streaming: Even though usage data is present in streaming events, Droid may not be parsing/accumulating `input_tokens` from the `message_start` event for BYOK provider models (per [Bug] /cost only shows partial token counts when using BYOK models #157).
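A quick sketch (hypothetical, not Droid's source) of how root cause 1 could collapse the indicator to <1%: if a missing context window falls back to a huge sentinel value, any realistic token count divides down to a fraction of a percent.

```python
def context_percent(input_tokens, context_window=None, default_window=10_000_000):
    # default_window is an assumed sentinel for "unknown"; the real fallback
    # value in Droid, if any, is not documented.
    window = context_window if context_window else default_window
    return 100.0 * input_tokens / window

# Known 200K window: 50K input tokens -> 25.0%.
# Unknown window: the same usage computes to 0.5%, rendered as "<1%".
```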
Expected Behavior
- Context usage indicator should reflect actual token consumption as a percentage of the model's context window
- For BYOK models where the context window is unknown, Droid could either:
  - Allow users to specify `context_window` in the BYOK config
  - Use a sensible default based on the model name (e.g., `claude-*` → 200K)
  - Fall back to showing a raw token count instead of a percentage
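The name-based fallback could look like the following sketch (hypothetical; `DEFAULT_WINDOWS`, `guess_context_window`, and `format_indicator` are illustrative names, and all window sizes except the 200K `claude-*` figure are assumptions):

```python
DEFAULT_WINDOWS = [
    ("claude-", 200_000),   # current Claude models
    ("gpt-4o", 128_000),    # illustrative
]

def guess_context_window(model_name):
    for prefix, window in DEFAULT_WINDOWS:
        if model_name.startswith(prefix):
            return window
    return None  # unknown: caller should show raw tokens, not a percentage

def format_indicator(input_tokens, model_name):
    window = guess_context_window(model_name)
    if window is None:
        return f"{input_tokens} tokens"  # raw-count fallback
    return f"{100 * input_tokens // window}%"
```

For example, `format_indicator(50_000, "claude-sonnet-4")` yields `"25%"`, while an unrecognized custom model name yields a raw token count instead of a misleading `<1%`.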
Related
- [Bug] /cost only shows partial token counts when using BYOK models #157 (`/cost` shows partial token counts for BYOK models)