[Bug] Context usage indicator always shows <1% with BYOK models via LiteLLM proxy #857

@Rubyj

Description

When using BYOK custom models routed through a LiteLLM proxy (v1.81.15), the context usage indicator in the CLI status bar always shows <1% regardless of actual context consumption. The /cost command also shows Input: 0 tokens (related to #157).

Environment

  • Droid CLI: v0.81.0
  • macOS (arm64)
  • BYOK config: provider: "anthropic" pointing to a LiteLLM proxy backed by Azure/Bedrock
  • LiteLLM version: 1.81.15
  • Models affected: All custom BYOK models (both Anthropic and OpenAI provider types)
  • showTokenUsageIndicator: true in settings.json

Steps to Reproduce

  1. Configure a BYOK model with provider: "anthropic" pointing to a LiteLLM proxy
  2. Start a droid session using the custom model
  3. Run several prompts that consume meaningful context
  4. Observe the context usage indicator -- always shows <1%
  5. Run /cost -- input tokens show as 0

Investigation: The proxy IS returning correct usage data

I tested the LiteLLM proxy directly with curl and confirmed all token usage fields are correctly populated in every response path:

Non-streaming:

"usage": {"input_tokens": 16, "cache_creation_input_tokens": 0,
          "cache_read_input_tokens": 0, "output_tokens": 8, "total_tokens": 24}

Streaming (message_start event):

"usage": {"input_tokens": 16, "cache_creation_input_tokens": 0,
          "cache_read_input_tokens": 0, "output_tokens": 1}

Streaming (message_delta event):

"usage": {"output_tokens": 8}

Streaming (message_stop event):

"usage": {"input_tokens": 16, "output_tokens": 8}

Streaming with extended thinking: Also returns correct input_tokens and output_tokens.

All Anthropic SSE event types (message_start, message_delta, message_stop) contain properly structured usage data. The proxy is fully compliant with the Anthropic streaming format.
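For reference, a client consuming this stream is expected to take `input_tokens` from `message_start` and the final `output_tokens` from `message_delta`/`message_stop`. A minimal sketch of that accumulation, using the event shapes captured above (this is illustrative, not Droid's actual code):

```python
# Minimal sketch of accumulating Anthropic streaming usage.
# Event shapes follow the proxy captures above.

def accumulate_usage(events):
    """Fold a sequence of (event_type, payload) pairs into a usage total."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for event_type, payload in events:
        # message_start nests usage under "message"; later events carry it top-level
        usage = payload.get("usage") or payload.get("message", {}).get("usage", {})
        if event_type == "message_start":
            totals["input_tokens"] = usage.get("input_tokens", 0)
        if "output_tokens" in usage:
            # message_delta/message_stop carry the running output count
            totals["output_tokens"] = usage["output_tokens"]
    return totals

# Events as captured from the LiteLLM proxy above:
events = [
    ("message_start", {"message": {"usage": {"input_tokens": 16, "output_tokens": 1}}}),
    ("message_delta", {"usage": {"output_tokens": 8}}),
    ("message_stop", {"usage": {"input_tokens": 16, "output_tokens": 8}}),
]
print(accumulate_usage(events))  # {'input_tokens': 16, 'output_tokens': 8}
```

If Droid skips the `message_start` usage for BYOK providers, `input_tokens` would stay 0 exactly as observed in `/cost`.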

Likely Root Causes

  1. Context window size unknown for custom models: The proxy's /v1/models endpoint returns minimal metadata (no max_tokens or context_window field). Droid likely computes context % as input_tokens / context_window_size, and without knowing the context window for a custom model, it may default to 0 or an extremely large value, yielding <1%.
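To illustrate hypothesis 1: if the percentage is computed as `input_tokens / context_window` and an unknown window falls back to a very large sentinel, the indicator will always render as <1%. The fallback value and function name below are assumptions for illustration, not Droid's actual code:

```python
# Illustrative only: shows how an unknown context window could pin the
# indicator at <1%. The sentinel default is an assumption.

UNKNOWN_CONTEXT_FALLBACK = 10**9  # hypothetical huge default

def context_percent(input_tokens, context_window=None):
    window = context_window or UNKNOWN_CONTEXT_FALLBACK
    return 100 * input_tokens / window

print(context_percent(50_000, 200_000))  # 25.0 -- correct with a known window
print(context_percent(50_000))           # 0.005 -- renders as "<1%"
```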

  2. Token accumulation not working for BYOK streaming: Even though usage data is present in streaming events, Droid may not be parsing/accumulating input_tokens from the message_start event for BYOK provider models (per [Bug] /cost only shows partial token counts when using BYOK models #157).

Expected Behavior

  • Context usage indicator should reflect actual token consumption as a percentage of the model's context window
  • For BYOK models where context window is unknown, Droid could either:
    • Allow users to specify context_window in the BYOK config
    • Use a sensible default based on the model name (e.g., claude-* → 200K)
    • Fall back to showing raw token count instead of a percentage
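The name-based default plus raw-count fallback could look something like the sketch below. The mapping and limits are assumptions for illustration; real values should come from the BYOK config or provider documentation:

```python
# Hypothetical fallback: infer a context window from the model name,
# otherwise report a raw token count instead of a percentage.

DEFAULT_WINDOWS = {"claude-": 200_000}  # assumed name -> window mapping

def format_context_usage(model, input_tokens, context_window=None):
    window = context_window or next(
        (w for prefix, w in DEFAULT_WINDOWS.items() if model.startswith(prefix)),
        None,
    )
    if window:
        return f"{100 * input_tokens / window:.0f}%"
    return f"{input_tokens} tokens"  # no window known: show raw count

print(format_context_usage("claude-sonnet-4", 50_000))  # "25%"
print(format_context_usage("my-custom-model", 50_000))  # "50000 tokens"
```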
