Plexir is designed to be provider-agnostic, supporting multiple LLM services with a robust failover mechanism.
Settings are stored in JSON format at ~/.plexir/config.json. This file is automatically secured with 0o600 permissions (owner read/write only).
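A minimal `config.json` might look like the following sketch. The `auth_mode` and `api_key` fields are documented below; the surrounding structure (`providers`, `name`, `type`, `model`) is an illustrative assumption, not the exact schema:

```json
{
  "providers": [
    {
      "name": "Gemini Primary",
      "type": "gemini",
      "model": "gemini-2.0-flash",
      "auth_mode": "auto",
      "api_key": "env:GEMINI_KEY"
    }
  ]
}
```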
- `gemini`: Google's Gemini models. Supports API Key (AI Studio) and OAuth (Vertex AI/Standalone).
- `groq`: Ultra-fast inference for Llama 3 and Mistral models. Requires a Groq API key.
- `cerebras`: High-performance inference provider (OpenAI-compatible). Requires a Cerebras API key.
- `openai`: Supports official OpenAI models or any OpenAI-compatible API (such as local Ollama instances).
For Gemini providers, you can specify how Plexir should authenticate:
- `auto` (Default): Attempts API Key first, then looks for Standalone OAuth tokens (`oauth_creds.json`).
- `api_key`: Strictly use the provided `api_key`.
- `oauth`: Strictly use standalone OAuth credentials (using the custom REST client bypass for AI Studio access).
Instead of plain text, you can use:
- `env:VARIABLE_NAME`: Read from environment variables.
- `keyring:username`: Read from the system keyring (service: `plexir`).
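The lookup order can be sketched as below. This is a hypothetical helper illustrating how such references resolve, not Plexir's actual implementation; the `keyring:` branch assumes the third-party `keyring` package:

```python
import os


def resolve_secret(value: str) -> str:
    """Resolve a credential reference as used in config.json (illustrative)."""
    if value.startswith("env:"):
        name = value[len("env:"):]
        secret = os.environ.get(name)
        if secret is None:
            raise KeyError(f"environment variable {name!r} is not set")
        return secret
    if value.startswith("keyring:"):
        import keyring  # third-party; `pip install keyring`
        username = value[len("keyring:"):]
        secret = keyring.get_password("plexir", username)
        if secret is None:
            raise KeyError(f"no keyring entry for {username!r}")
        return secret
    return value  # plain-text key, used as-is
```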
Plexir automatically manages the context window to prevent model errors when conversations get too long.
For Gemini providers, Plexir uses the native countTokens API for exact token counts. For other providers, it estimates usage with a word-based heuristic (~1.3 tokens per word).
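The word-based heuristic amounts to something like the following sketch (the rounding behavior is an assumption; Plexir's exact formula may differ):

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate for providers without a native counting API.

    Uses the ~1.3 tokens-per-word heuristic, rounded up.
    """
    return math.ceil(len(text.split()) * 1.3)
```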
When the current conversation history reaches 90% of the provider's context limit, Plexir will proactively:
- Summarize/Distill older messages (using the active LLM) to save space while retaining critical facts.
- Notify the user with a system message in the chat.
- Accumulate the summary at the top of the history to ensure continuity.
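The trigger condition above reduces to a simple threshold check, sketched here (function and parameter names are illustrative):

```python
def needs_distillation(used_tokens: int, context_limit: int,
                       threshold: float = 0.9) -> bool:
    """True once conversation history reaches 90% of the context limit."""
    return used_tokens >= threshold * context_limit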
You can override the default limit for any provider. This is useful for testing shorter windows or managing costs on preview models.
- Example: `/config set "Gemini Primary" context_limit 50000`
- Example: `/config set "Groq Backup" context_limit 8000`
Setting to 0 or null will use the model's default limit.
Plexir manages providers using a priority order defined in your config.
- Jittered Retries: If a provider returns a rate-limit error (429) or a transient server error (503), Plexir retries with smart exponential backoff (e.g., `2^attempt + jitter` seconds).
- Explicit Delay: Plexir parses "Retry-After" hints from the Google API (e.g., "retry in 17s") and waits exactly the duration required.
- Instant Fallback: If retries are exhausted or a hard "Fatal" error occurs, Plexir immediately switches to the next provider in the list.
Plexir allows you to configure token prices per model in config.json. This ensures cost estimation remains accurate as providers update their pricing.
"pricing": {
"gemini-2.0-flash": [0.10, 0.40],
"gpt-4o": [2.50, 10.00]
}(Format: [Input price per 1M, Output price per 1M])
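Given that format, cost estimation is a straightforward calculation. A minimal sketch (not Plexir's actual implementation):

```python
# USD per 1M tokens: (input, output), matching the config format above.
PRICING = {
    "gemini-2.0-flash": (0.10, 0.40),
    "gpt-4o": (2.50, 10.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated session cost in USD for one model."""
    inp, out = PRICING[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000
```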
You can manage your providers without leaving the TUI using slash commands:
- `/config list`
- `/config add "My Backup" groq llama3-70b-8192 api_key=YOUR_KEY`
- `/config set "Gemini Primary" api_key env:GEMINI_KEY`
- `/config set "Gemini Primary" auth_mode oauth`
- `/config reorder "Groq Backup" up`

Plexir tracks real-time token usage and estimates costs based on the active provider's pricing.
To prevent unexpected costs during long sessions, you can set a maximum dollar amount for the current session. If exceeded, Plexir will stop generating and notify you.
`/config budget 0.50`

Set to 0 to disable the limit.
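The budget semantics above can be sketched as a single check (names are illustrative, not Plexir's internals):

```python
def within_budget(session_cost: float, budget: float) -> bool:
    """A budget of 0 disables the limit; otherwise generation stops
    once the session's estimated cost exceeds it."""
    return budget == 0 or session_cost <= budget
```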
Usage metrics (Tokens and Estimated Cost) are always visible in the System Status sidebar.