A powerful command-line interface for managing and interacting with the Inference Gateway. This CLI provides tools for configuration, monitoring, and management of inference services.
Early Development Stage: This project is in its early development stage and breaking changes are expected until it reaches a stable version.
Always use pinned versions by specifying a specific version tag when downloading binaries or using install scripts.
- Features
- Installation
- Quick Start
- Commands
- Tools for LLMs
- Configuration
- Cost Tracking
- Tool Approval System
- Shortcuts
- Global Flags
- Examples
- Development
- License
## Features

- Automatic Gateway Management: Automatically downloads and runs the Inference Gateway binary (no Docker required!)
- Zero-Configuration Setup: Start chatting immediately with just your API keys in a `.env` file
- Interactive Chat: Chat with models using an interactive interface
- Status Monitoring: Check gateway health and resource usage
- Conversation History: Store and retrieve past conversations with multiple storage backends
- Conversation Storage - Detailed storage backend documentation
- Conversation Title Generation - AI-powered title generation system
- Configuration Management: Manage gateway settings via YAML config
- Project Initialization: Set up local project configurations
- Tool Execution: LLMs can execute whitelisted commands and tools - See all tools →
- Tool Approval System: User approval workflow for sensitive operations with real-time diff visualization
- Agent Modes: Three operational modes for different workflows:
- Standard Mode (default): Normal operation with all configured tools and approval checks
- Plan Mode: Read-only mode for planning and analysis without execution
- Auto-Accept Mode: All tools auto-approved for rapid execution (YOLO mode)
- Toggle between modes with Shift+Tab
- Token Usage Tracking: Accurate token counting with polyfill support for providers that don't return usage metrics
- Cost Tracking: Real-time cost calculation for API usage with per-model breakdown and configurable pricing
- Inline History Auto-Completion: Smart command history suggestions with inline completion
- Customizable Keybindings: Fully configurable keyboard shortcuts for the chat interface
- Extensible Shortcuts System: Create custom commands with AI-powered snippets - Learn more →
- MCP Server Support: Direct integration with Model Context Protocol servers for extended tool capabilities - Learn more →
## Installation

### Go Install

```bash
go install github.com/inference-gateway/cli@latest
```

This installs the binary as `cli`. To rename it to `infer`:

```bash
mv $(go env GOPATH)/bin/cli $(go env GOPATH)/bin/infer
```

Or use an alias:

```bash
alias infer="$(go env GOPATH)/bin/cli"
```

### Docker

```bash
# Create network and deploy inference gateway first
docker network create inference-gateway
docker run -d --name inference-gateway --network inference-gateway \
  --env-file .env \
  ghcr.io/inference-gateway/inference-gateway:latest

# Pull and run the CLI
docker pull ghcr.io/inference-gateway/cli:latest
docker run -it --rm --network inference-gateway ghcr.io/inference-gateway/cli:latest chat
```

### Install Script

```bash
# Latest version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash

# Specific version
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --version v0.77.0

# Custom installation directory
curl -fsSL https://raw.githubusercontent.com/inference-gateway/cli/main/install.sh | bash -s -- --install-dir $HOME/.local/bin
```

### Manual Download

Download the latest release binary for your platform from the releases page.
Verify the binary (recommended for security):

```bash
# Download binary and checksums
curl -L -o infer-darwin-amd64 \
  https://github.com/inference-gateway/cli/releases/latest/download/infer-darwin-amd64
curl -L -o checksums.txt \
  https://github.com/inference-gateway/cli/releases/latest/download/checksums.txt

# Verify checksum
shasum -a 256 infer-darwin-amd64
grep infer-darwin-amd64 checksums.txt

# Install
chmod +x infer-darwin-amd64
sudo mv infer-darwin-amd64 /usr/local/bin/infer
```

For advanced verification with Cosign signatures, see the Binary Verification Guide.
### Build from Source

```bash
git clone https://github.com/inference-gateway/cli.git
cd cli
go build -o infer cmd/infer/main.go
sudo mv infer /usr/local/bin/
```

## Quick Start

- Initialize your project:

```bash
infer init
```

This creates a `.infer/` directory with configuration and shortcuts.
- Set up your environment (create a `.env` file):

```bash
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
DEEPSEEK_API_KEY=your_key_here
```

- Start chatting:

```bash
infer chat
```

Now that you're up and running, explore these guides:
- Commands Reference - Complete command documentation
- Tools Reference - Available tools for LLMs
- Configuration Guide - Full configuration options
- Shortcuts Guide - Custom shortcuts and AI-powered snippets
- A2A Agents - Agent-to-agent communication setup
## Commands

The CLI provides several commands for different workflows. For detailed documentation, see the Commands Reference.

### infer init

Initialize a new project with configuration and shortcuts:

```bash
infer init             # Initialize project configuration
infer init --userspace # Initialize user-level configuration
```

### infer chat

Start an interactive chat session with model selection:

```bash
infer chat
```

Features: Model selection, real-time streaming, scrollable history, three agent modes (Standard/Plan/Auto-Accept).
### infer agent

Execute autonomous tasks in background mode:

```bash
infer agent "Please fix the github issue 38"
infer agent --model "openai/gpt-4" "Implement feature from issue #42"
infer agent "Analyze this UI issue" --files screenshot.png
```

Features: Autonomous execution, multimodal support (images/files), parallel tool execution.
### infer config

Manage CLI configuration settings:

```bash
# Agent configuration
infer config agent set-model "deepseek/deepseek-chat"
infer config agent set-system "You are a helpful assistant"
infer config agent set-max-turns 100
infer config agent verbose-tools enable

# Tool management
infer config tools enable
infer config tools bash enable
infer config tools safety enable

# Export configuration
infer config export set-model "anthropic/claude-4.1-haiku"
```

See the Commands Reference for all configuration options.
### infer agents

Manage A2A (Agent-to-Agent) agent configurations:

```bash
infer agents init                   # Initialize agents configuration
infer agents add browser-agent      # Add an agent from the registry with defaults
infer agents add custom https://... # Add a custom agent
infer agents list                   # List all agents
```

For detailed A2A setup, see A2A Agents Configuration.

### infer status

Check gateway health and resource usage:

```bash
infer status
```

### infer conversation-title

Manage AI-powered conversation titles:

```bash
infer conversation-title generate # Generate titles for all conversations
infer conversation-title status   # Show generation status
```

### infer version

Display CLI version information:

```bash
infer version
```

## Tools for LLMs

When tool execution is enabled, LLMs can use various tools to interact with your system. Below is a summary of available tools. For detailed documentation, parameters, and examples, see the Tools Reference.
| Tool | Purpose | Approval Required | Documentation |
|---|---|---|---|
| Bash | Execute whitelisted shell commands | Optional | Details |
| Read | Read file contents with line ranges | No | Details |
| Write | Write content to files | Yes | Details |
| Edit | Exact string replacements in files | Yes | Details |
| MultiEdit | Multiple atomic edits to files | Yes | Details |
| Delete | Delete files and directories | Yes | Details |
| Tree | Display directory structure | No | Details |
| Grep | Search files with regex (ripgrep/Go) | No | Details |
| WebSearch | Search the web (DuckDuckGo/Google) | No | Details |
| WebFetch | Fetch content from URLs | No | Details |
| Github | Interact with GitHub API | No | Details |
| TodoWrite | Create and manage task lists | No | Details |
| A2A_SubmitTask | Submit tasks to A2A agents | No | Details |
| A2A_QueryAgent | Query A2A agent capabilities | No | Details |
| A2A_QueryTask | Check A2A task status | No | Details |
| A2A_DownloadArtifacts | Download A2A task outputs | No | Details |
Tool Configuration:

Tools can be enabled/disabled and configured individually:

```bash
# Enable/disable specific tools
infer config tools bash enable
infer config tools write enable

# Configure tool settings
infer config tools grep set-backend ripgrep
infer config tools web-fetch add-domain "example.com"
```

See the Tools Reference for complete documentation.
## Configuration

The CLI uses a 2-layer configuration system (project and userspace) with environment variable support.

Create a minimal configuration:

```yaml
# .infer/config.yaml
gateway:
  url: http://localhost:8080
  docker: true # Use Docker mode (or false for binary mode)
tools:
  enabled: true
  bash:
    enabled: true
agent:
  model: "deepseek/deepseek-chat"
  max_turns: 50
chat:
  theme: tokyo-night
```

Configuration sources, from highest to lowest priority:

- Environment Variables (`INFER_*`) - Highest priority
- Command Line Flags
- Project Config (`.infer/config.yaml`)
- Userspace Config (`~/.infer/config.yaml`)
- Built-in Defaults - Lowest priority
Example:

```bash
# Set via environment variable (highest priority)
export INFER_AGENT_MODEL="openai/gpt-4"

# Or via config file
infer config agent set-model "deepseek/deepseek-chat"

# Or via command flag
infer chat --model "anthropic/claude-4"
```

Key settings:

- gateway.url - Gateway URL (default: `http://localhost:8080`)
- gateway.docker - Use Docker mode vs binary mode (default: `true`)
- tools.enabled - Enable/disable all tools (default: `true`)
- agent.model - Default model for agent operations
- agent.max_turns - Maximum turns for agent sessions (default: `50`)
- chat.theme - Chat interface theme (default: `tokyo-night`)
- chat.status_bar.enabled - Enable/disable status bar (default: `true`)
- chat.status_bar.indicators - Configure individual status indicators (all enabled by default except `max_output`)
All configuration can be set via environment variables with the `INFER_` prefix:

```bash
export INFER_GATEWAY_URL="http://localhost:8080"
export INFER_AGENT_MODEL="deepseek/deepseek-chat"
export INFER_TOOLS_BASH_ENABLED=true
export INFER_CHAT_THEME="tokyo-night"
```

Format: `INFER_<PATH>`, where dots become underscores. Example: `agent.model` → `INFER_AGENT_MODEL`.
For complete configuration documentation, including all options and environment variables, see Configuration Reference.
## Cost Tracking

The CLI automatically tracks API costs based on token usage for all providers and models. Costs are calculated in real time, with support for both aggregate totals and per-model breakdowns.

Use the /cost command in any chat session to see the cost breakdown:

```bash
# In chat, use the /cost shortcut
/cost
```

This displays:

- Total session cost in USD
- Input/output costs separately
- Per-model breakdown when using multiple models
- Token usage for each model

Status Bar: Session costs are also displayed in the status bar (e.g., 💰 $0.0234) if enabled.
The CLI includes hardcoded pricing for 30+ models across all major providers (Anthropic, OpenAI, Google, DeepSeek, Groq, Mistral, Cohere, etc.). Prices are updated regularly to match current provider pricing.
Override pricing for specific models or add pricing for custom models:

```yaml
# .infer/config.yaml
pricing:
  enabled: true
  currency: "USD"
  custom_prices:
    # Override existing model pricing
    "openai/gpt-4o":
      input_price_per_mtoken: 2.50   # Price per million input tokens
      output_price_per_mtoken: 10.00 # Price per million output tokens
    # Add pricing for custom/local models
    "ollama/llama3.2":
      input_price_per_mtoken: 0.0
      output_price_per_mtoken: 0.0
    "custom-fine-tuned-model":
      input_price_per_mtoken: 5.00
      output_price_per_mtoken: 15.00
```

Via environment variables:
```bash
# Disable cost tracking entirely
export INFER_PRICING_ENABLED=false

# Override specific model pricing (use underscores in model names)
export INFER_PRICING_CUSTOM_PRICES_OPENAI_GPT_4O_INPUT_PRICE_PER_MTOKEN=3.00
export INFER_PRICING_CUSTOM_PRICES_OPENAI_GPT_4O_OUTPUT_PRICE_PER_MTOKEN=12.00

# Hide cost from status bar
export INFER_CHAT_STATUS_BAR_INDICATORS_COST=false
```

Status Bar Configuration:

```yaml
# .infer/config.yaml
chat:
  status_bar:
    enabled: true
    indicators:
      cost: true # Show/hide cost indicator
```

Notes:

- Costs are calculated as `(tokens / 1,000,000) × price_per_million_tokens`
- Prices are per million tokens (input and output priced separately)
- Models without pricing data (Ollama, free tiers) show $0.00
- Token counts use actual usage from providers or polyfilled estimates
## Tool Approval System

The CLI includes a comprehensive approval system for sensitive tool operations, providing security and visibility into the actions LLMs take.
When a tool requiring approval is executed:
- Validation: Tool arguments are validated
- Approval Prompt: User sees tool details with:
- Tool name and parameters
- Real-time diff preview (for file modifications)
- Approve/Reject/Auto-approve options
- Execution: Tool runs only if approved
| Tool | Requires Approval | Reason |
|---|---|---|
| Write | Yes | Creates/modifies files |
| Edit | Yes | Modifies file contents |
| MultiEdit | Yes | Multiple file modifications |
| Delete | Yes | Removes files/directories |
| Bash | Optional | Executes system commands |
| Read, Grep, Tree | No | Read-only operations |
| WebSearch, WebFetch | No | External read-only |
| A2A Tools | No | Agent delegation |
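One way to picture how the three agent modes interact with the approval rules in the table (a hypothetical sketch; `decide` and `needsApproval` are invented names, and the CLI's actual logic may differ):

```go
package main

import "fmt"

// Mode mirrors the three agent modes described above.
type Mode int

const (
	Standard Mode = iota
	Plan
	AutoAccept
)

// needsApproval reflects the table: file-modifying tools require approval.
// Bash is "Optional" (configurable) and is simplified to false here.
func needsApproval(tool string) bool {
	switch tool {
	case "Write", "Edit", "MultiEdit", "Delete":
		return true
	}
	return false
}

// decide returns whether a tool may run and whether to prompt the user first.
func decide(mode Mode, tool string) (run bool, prompt bool) {
	switch mode {
	case Plan:
		return !needsApproval(tool), false // read-only: block modifying tools
	case AutoAccept:
		return true, false // everything auto-approved
	default:
		return true, needsApproval(tool) // prompt before sensitive tools
	}
}

func main() {
	run, prompt := decide(Standard, "Write")
	fmt.Println(run, prompt) // true true
	run, prompt = decide(Plan, "Write")
	fmt.Println(run, prompt) // false false
}
```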
Configure approval requirements per tool:

```bash
# Enable/disable approval for specific tools
infer config tools safety enable # Global approval
infer config tools bash enable   # Enable bash tool
```

Or via configuration file:

```yaml
tools:
  safety:
    require_approval: true # Global default
  write:
    require_approval: true
  bash:
    require_approval: false # Override for bash
```

Approval prompt keys:

- y / Enter - Approve execution
- n / Esc - Reject execution
- a - Auto-approve (disables approval for the session)
## Shortcuts

The CLI provides an extensible shortcuts system for quickly executing common commands with /shortcut-name syntax.

Core:

- `/clear` - Clear conversation history
- `/exit` - Exit chat session
- `/help [shortcut]` - Show available shortcuts
- `/switch [model]` - Switch to a different model
- `/theme [name]` - Switch chat theme
- `/cost` - Show session cost breakdown with per-model details
- `/compact` - Compact conversation
- `/export [format]` - Export conversation

Git Shortcuts (created by infer init):

- `/git-status` - Show working tree status
- `/git-commit` - Generate AI commit message from staged changes
- `/git-push` - Push commits to remote
- `/git-log` - Show commit logs

SCM Shortcuts (GitHub integration):

- `/scm-issues` - List GitHub issues
- `/scm-issue <number>` - Show issue details
- `/scm-pr-create [context]` - Generate AI-powered PR plan
Create shortcuts that use LLMs to transform data:

````yaml
# .infer/shortcuts/custom-example.yaml
shortcuts:
  - name: analyze-diff
    description: "Analyze git diff with AI"
    command: bash
    args:
      - -c
      - |
        diff=$(git diff)
        jq -n --arg diff "$diff" '{diff: $diff}'
    snippet:
      prompt: |
        Analyze this diff and suggest improvements:
        ```diff
        {diff}
        ```
      template: |
        ## Analysis
        {llm}
````

Create custom shortcuts by adding YAML files to `.infer/shortcuts/`:
```yaml
# .infer/shortcuts/custom-dev.yaml
shortcuts:
  - name: tests
    description: "Run all tests"
    command: go
    args:
      - test
      - ./...
  - name: build
    description: "Build the project"
    command: go
    args:
      - build
      - -o
      - infer
      - .
```

Use with `/tests` or `/build`.

For complete shortcuts documentation, including advanced features and examples, see the Shortcuts Guide.
## Global Flags

- `-v, --verbose` - Enable verbose output
- `--config <path>` - Specify custom config file path
## Examples

```bash
# Initialize project
infer init

# Start interactive chat
infer chat

# Execute autonomous task
infer agent "Fix the bug in issue #42"

# Check gateway status
infer status
```

Issue-driven workflow:

```bash
# Start chat
infer chat

# In chat, use shortcuts to get context
/scm-issue 123

# Discuss with AI, let it use tools to:
# - Read files
# - Search codebase
# - Make changes
# - Run tests

# Generate PR plan when ready
/scm-pr-create Fixes the authentication timeout issue
```

Configuration examples:

```bash
# Set default model
infer config agent set-model "deepseek/deepseek-chat"

# Enable bash tool
infer config tools bash enable

# Configure web search
infer config tools web-search enable

# Check current configuration
infer config show
```

## Development

For development, use Task for build automation:
```bash
task dev   # Format, build, and test
task build # Build binary
task test  # Run tests
```

See CLAUDE.md for detailed development documentation.

## License
MIT License - see LICENSE file for details.