Mini-A includes several built-in performance optimizations designed to reduce token usage, minimize LLM calls, and lower costs while maintaining high-quality results. These optimizations are enabled by default and require no configuration.
```mermaid
graph TD
Start((Goal Entered)) --> Context[Automatic Context Management]
Start --> Escalation[Dynamic Escalation]
Start --> Parallel[Parallel Action Prompting]
Start --> Planning[Two-Phase Planning]
Context -->|Compress history| Outcome
Escalation -->|Choose optimal model| Outcome
Parallel -->|Batch operations| Outcome
Planning -->|Separate plan & execution| Outcome[(Lower Tokens / Fewer Calls / Faster Results)]
classDef primary fill:#f97316,stroke:#9a3412,stroke-width:2px,color:#fff
classDef effect fill:#fde68a,stroke:#92400e,color:#78350f
class Context,Escalation,Parallel,Planning primary
class Outcome effect
```
| Feature | Token Savings | Call Reduction | User Action Required |
|---|---|---|---|
| Automatic Context Management | 30-50% | - | None (automatic) |
| Dynamic Escalation | 5-10% | 10-15% | None (automatic) |
| Parallel Action Prompting | 15-25% | 20-30% | None (automatic) |
| Two-Phase Planning | 15-25% | - | Use `useplanning=true` |
Combined Impact: 40-60% token reduction, 25-40% fewer LLM calls, 50-70% cost savings on complex goals.
Automatically manages conversation context to prevent unbounded token growth without requiring manual configuration.
Key Features:
- Smart default limit: 50,000 tokens (auto-enabled, no configuration needed)
- Two-tier compression:
  - 60% threshold: Removes duplicate observations
  - 80% threshold: Summarizes old context
- Context deduplication: Automatically removes redundant entries
```
Step 1:  5K tokens
Step 5:  15K tokens (growing linearly)
Step 10: 30K tokens → triggers deduplication (removes ~20% redundant entries)
Step 12: 40K tokens → triggers summarization (compresses to ~20K tokens)
```
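The trace above reduces to a simple threshold rule. The sketch below is illustrative only; the helper name and structure are assumptions, not Mini-A's actual internals:

```javascript
// Sketch of the two-tier compression policy (helper name hypothetical).
const LIMIT = 50000; // smart default: 50K tokens

function contextAction(tokens) {
  if (tokens >= LIMIT * 0.8) return "summarize";   // 80% threshold
  if (tokens >= LIMIT * 0.6) return "deduplicate"; // 60% threshold
  return "keep";
}
```

Matching the trace: 30K tokens crosses the 60% threshold (deduplication) and 40K crosses the 80% threshold (summarization).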
✅ No configuration required - works out of the box
✅ 30-50% token reduction on long-running goals
✅ Preserves important context (STATE, SUMMARY entries always kept)
✅ Backward compatible - existing maxcontext parameter still works
You can still override the default behavior:
```shell
# Disable automatic management (not recommended)
mini-a goal="..." maxcontext=0

# Set custom limit
mini-a goal="..." maxcontext=100000
```

Note: Setting `maxcontext=0` disables automatic context management entirely. This is only recommended for very short goals that won't exceed context limits.
Automatically adjusts when to escalate from low-cost to main model based on goal complexity.
Goal Complexity Assessment:
- Simple: Short, direct goals (e.g., "what is 2+2?")
- Medium: Multi-step or moderate length (e.g., "list files and count them")
- Complex: Long goals with conditions (e.g., "analyze files, then if errors, fix and report")
| Metric | Simple | Medium | Complex |
|---|---|---|---|
| Consecutive errors | 3 | 2 | 2 |
| Consecutive thoughts | 5 | 4 | 3 |
| Total thoughts | 8 | 6 | 5 |
| Steps without action | 6 | 4 | 3 |
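How the assessment and the table above might combine can be sketched as follows. The keyword heuristics are simplified from the rules documented later in this page, and all names are illustrative rather than Mini-A's actual code:

```javascript
// Illustrative complexity classifier plus the threshold table above.
function assessComplexity(goal) {
  const multiStep = /\b(and|then|first|step \d)\b/i.test(goal);
  const conditional = /\b(if|unless|when)\b/i.test(goal);
  if (goal.length > 200 || (multiStep && conditional)) return "complex";
  if (goal.length > 100 || multiStep) return "medium";
  return "simple";
}

// Escalation thresholds per complexity level (from the table above).
const THRESHOLDS = {
  simple:  { errors: 3, thoughts: 5, totalThoughts: 8, stepsWithoutAction: 6 },
  medium:  { errors: 2, thoughts: 4, totalThoughts: 6, stepsWithoutAction: 4 },
  complex: { errors: 2, thoughts: 3, totalThoughts: 5, stepsWithoutAction: 3 }
};
```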
Before (fixed thresholds):

```
Every goal: escalate after 2 errors or 3 thoughts
Result: wastes the main model on simple tasks OR under-utilizes the low-cost model
```

After (dynamic thresholds):

```
Simple goal: "what is the capital of France?"
→ Complexity: simple
→ Allows 5 thoughts, 3 errors
→ Stays on the low-cost model for the entire task

Complex goal: "analyze all TypeScript files, fix errors if found, create report"
→ Complexity: complex
→ Allows 3 thoughts, 2 errors
→ Escalates quickly to the main model for difficult work
```
✅ Optimizes cost/quality tradeoff automatically
✅ 10-20% better cost efficiency on varied workloads
✅ Smarter resource allocation based on task difficulty
✅ Transparent - escalation reasons logged with thresholds
Enable verbose mode to see complexity assessment:
```shell
mini-a goal="your goal" verbose=true
# Output:
# [info] Goal complexity assessed as: medium
# [info] Escalation thresholds: errors=2, thoughts=4, totalThoughts=6
```

Encourages LLMs to batch independent operations into a single step, reducing round-trips.
System prompts now include:
- Clear recommendation to use action arrays
- Concrete examples of parallel syntax
- Explicit benefits (fewer calls, faster execution)
- Guidance on when to use parallel actions
Before (sequential execution - 3 LLM calls):

```
Step 1: {"action":"read_file","params":{"path":"a.txt"}}
Step 2: {"action":"read_file","params":{"path":"b.txt"}}
Step 3: {"action":"read_file","params":{"path":"c.txt"}}
```

After (parallel execution - 1 LLM call):

```
Step 1: {
  "action": [
    {"action":"read_file","params":{"path":"a.txt"}},
    {"action":"read_file","params":{"path":"b.txt"}},
    {"action":"read_file","params":{"path":"c.txt"}}
  ]
}
```

Perfect for:
- Reading multiple files simultaneously
- Calling several independent MCP tools
- Gathering data from different sources
- Batch validation checks
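One way a runner can support both the single and batched forms uniformly is to normalize the `action` field to an array and execute the entries concurrently. This is an illustrative sketch, not Mini-A's actual executor; `executeOne` is a hypothetical per-action callback:

```javascript
// Normalize a step's action field to an array and run entries concurrently.
async function runStep(step, executeOne) {
  const actions = Array.isArray(step.action) ? step.action : [step];
  return Promise.all(actions.map(a => executeOne(a.action, a.params)));
}
```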
✅ 20-30% fewer steps for multi-file operations
✅ 15-25% token reduction from fewer round-trips
✅ Faster execution - parallel tool execution when possible
✅ Better LLM awareness - models understand when to batch
Goal: "Compare config files from dev, staging, and prod environments"
Old behavior: 3 separate read operations (3 LLM calls)
New behavior: 1 batched read operation (1 LLM call)
When `useplanning=true`, Mini-A separates plan generation from execution to reduce per-step overhead.
Traditional approach:

```
Every step: [400-token planning guidance] + action
Total overhead: N × 400 tokens
```

Two-phase approach:

```
Phase 1: Generate plan (1 LLM call, ~50 tokens)
Phase 2: Execute with light guidance (N × 80 tokens)
Total overhead: 50 + (N × 80) tokens
Savings: ~320 tokens per step after the initial plan
```
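The arithmetic is worth making concrete. Using the figures above:

```javascript
// Planning-guidance overhead in tokens, per the figures above.
const traditionalOverhead = steps => steps * 400;
const twoPhaseOverhead = steps => 50 + steps * 80;

// For a 10-step goal: 4000 vs 850 tokens of overhead.
```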
Phase 1: Planning (upfront, separate call)

```shell
mini-a goal="complex task" useplanning=true
# Generates:
# [plan] Generating execution plan using low-cost model...
# [plan] Plan generated successfully (strategy: simple)
```

Generated plan includes:
- Strategy (simple or tree)
- List of steps with dependencies
- Checkpoints for verification
- Risk assessment
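As an illustration only (field names are assumed from the list above, not taken from Mini-A's actual schema), a generated plan might look like:

```javascript
// Hypothetical plan shape; real field names may differ.
const plan = {
  strategy: "simple", // "simple" or "tree"
  steps: [
    { id: 1, desc: "Scan directory for TypeScript files", dependsOn: [], status: "pending" },
    { id: 2, desc: "Analyze files for syntax errors", dependsOn: [1], status: "pending" },
    { id: 3, desc: "Write report", dependsOn: [2], status: "pending" }
  ],
  checkpoints: [{ afterStep: 2, verify: "error list recorded" }],
  risks: ["large file count may exceed context limits"],
  meta: { needsReplan: false }
};
```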
Phase 2: Execution (reduced overhead)
Instead of full planning guidance (13 bullet points), each step receives:
```
## PLANNING:
• The execution plan has already been generated. Focus on executing tasks.
• Update step 'status' and 'progress' as you work.
• Mark 'state.plan.meta.needsReplan=true' if obstacles occur.
```
✅ 15-25% token reduction in planning mode
✅ Uses low-cost model for plan generation
✅ Clearer separation - planning vs execution
✅ Better focus - execution prompts emphasize progress updates
Enable planning mode for:
- Multi-step complex goals
- Tasks requiring coordination
- Goals with dependencies
- When you want structured progress tracking
```shell
# Enable planning
mini-a goal="analyze codebase and create report" useplanning=true

# Planning with file tracking
mini-a goal="refactor project" useplanning=true planfile="progress.md"
```

When a plan file is provided, Mini-A updates it with:
```markdown
---
## Progress Update - 2025-01-10T15:30:00Z

### Completed Tasks
- ✅ Scanned directory for TypeScript files (15 found)
- ✅ Analyzed files for syntax errors (3 errors found)

### Knowledge for Next Execution
- Files with errors: auth.ts, config.ts, utils.ts
- Error types: missing type annotations, unused variables
- Next: Fix errors then regenerate report
```

Before optimizations:

Simple Goal: "What is the capital of France?"
- Steps: 1
- Tokens: ~3K
- Model: Main (unnecessary)
Complex Goal: "Analyze 10 TypeScript files and create report"
- Steps: 15
- Tokens per step: ~8K → 15K → 25K (growing)
- Total tokens: ~180K
- Model switches: Random
- Planning overhead: 400 tokens/step
After optimizations:

Simple Goal: "What is the capital of France?"
- Steps: 1
- Tokens: ~3K
- Model: Low-cost (appropriate)
- Savings: Main model call avoided
Complex Goal: "Analyze 10 TypeScript files and create report"
- Steps: 5 (parallel reads)
- Tokens per step: ~8K → 10K → 12K (controlled)
- Total tokens: ~50K
- Planning: 1 upfront call, then 80 tokens/step
- Savings: 72% tokens, 67% fewer steps
Daily usage: 50 goals
- 30 simple (code questions, quick lookups)
- 15 medium (multi-file operations)
- 5 complex (refactoring, analysis)
Before optimizations:
- Total tokens: ~2.5M/day
- Calls: ~800/day
- Cost (GPT-4): ~$50/day
After optimizations:
- Total tokens: ~1.0M/day (-60%)
- Calls: ~550/day (-31%)
- Cost (GPT-4): ~$20/day
- Monthly savings: ~$900
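The percentages follow directly from the raw figures; a quick check:

```javascript
// Reproducing the daily-savings arithmetic above.
const before = { tokens: 2_500_000, calls: 800, costUSD: 50 };
const after = { tokens: 1_000_000, calls: 550, costUSD: 20 };

const tokenReduction = 1 - after.tokens / before.tokens;      // 60%
const callReduction = 1 - after.calls / before.calls;         // ~31%
const monthlySavings = (before.costUSD - after.costUSD) * 30; // $900
```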
Goal: "Analyze repository, identify bugs, suggest fixes"
Before optimizations:
- Steps: 25
- Total tokens: ~400K
- Main model calls: 25
- Cost: ~$8
After optimizations:
- Steps: 8 (parallel file reads)
- Total tokens: ~120K (-70%)
- Main model calls: 5 (smart escalation)
- Low-cost calls: 3
- Cost: ~$2.50 (-69%)
- Time saved: 40% faster (parallel execution)
```shell
# Good: Complex multi-step task
mini-a goal="refactor authentication system" useplanning=true planfile="auth-refactor.md"

# Not needed: Simple query
mini-a goal="what files are in this directory?" useshell=true
```

```shell
# Set low-cost model for routine operations
export OAF_LC_MODEL="(type: openai, model: gpt-4-mini, key: '...')"
# Dynamic escalation handles the rest automatically
```

```shell
# See optimization decisions in action
mini-a goal="..." verbose=true

# Watch for:
# - [compress] Removed N redundant entries
# - [warn] Escalating to main model: reason
# - [plan] Plan generated successfully
```

The optimizations are designed to work automatically. Avoid:
- Setting `maxcontext=0` (disables auto-management)
- Forcing the main model for simple tasks
- Manually batching operations (let the LLM decide)
Symptom: Context exceeds limits even with optimizations
Solution:
```shell
# Reduce context limit (triggers compression earlier)
mini-a goal="..." maxcontext=30000

# For very long-running tasks, use planning mode with file tracking
mini-a goal="..." useplanning=true planfile="progress.md"
```

Symptom: Goals assessed as "complex" when they're simple
Possible causes:
- Goal description is very long
- Multiple sentences with "and", "then", "if"
Solution: Simplify goal phrasing:
```shell
# Instead of:
mini-a goal="First, list all JavaScript files in this directory, and then count how many there are, and if there are more than 10 files then create a report summarizing them"

# Try:
mini-a goal="Count JavaScript files and report if over 10"
```

Symptom: Still seeing sequential operations for independent tasks
Cause: LLM not recognizing batching opportunity
Solution: Be explicit in goal:
```shell
# Hint at parallel operations
mini-a goal="read ALL config files (dev.json, staging.json, prod.json) simultaneously and compare"
```

| Parameter | Default | Description |
|---|---|---|
| `maxcontext` | `50000` | Auto-enables at 50K tokens. Set to `0` to disable (not recommended) |
| Parameter | Default | Description |
|---|---|---|
| `useplanning` | `false` | Enable two-phase planning mode |
| `planfile` | - | File path for progress tracking |
| `planformat` | `md` | Plan format (`md` or `json`) |
No direct parameters - escalation is automatic based on goal complexity. Use `verbose=true` to observe behavior.
All optimizations are backward compatible. Existing configurations continue to work:
```shell
# Old configurations still work
mini-a goal="..." maxcontext=100000  # Explicit limit respected
mini-a goal="..." useplanning=true   # Now uses two-phase mode

# New default behavior (if maxcontext not set)
mini-a goal="..."                    # Auto-enables at 50K tokens
```

If you previously set `maxcontext=0` to disable summarization:
Old behavior: Unlimited context growth
New behavior: Auto-enables at 50K tokens
To restore old behavior (not recommended):
```shell
mini-a goal="..." maxcontext=0
```

Better approach: Set a higher limit if needed:

```shell
mini-a goal="..." maxcontext=200000
```

- Preserve critical entries: STATE, SUMMARY never removed
- Remove exact duplicates: Normalized matching (numbers replaced with "N")
- Limit tool observations: Keep only last 2 per tool type
- Summarize old context: When over 80% of limit
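The "normalized matching" step can be sketched as follows. The function names are illustrative stand-ins for Mini-A's internal routines:

```javascript
// Numbers are replaced with "N" so entries differing only in counts,
// sizes, or timestamps compare as duplicates.
const normalize = entry => entry.replace(/\d+/g, "N");

function dedupe(entries) {
  const seen = new Set();
  // Keep only the first occurrence of each normalized form.
  return entries.filter(e => !seen.has(normalize(e)) && seen.add(normalize(e)));
}
```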
```
// Complex: token > 200 OR (multi-step AND conditions) OR (tasks AND token > 150)
// Medium:  token > 100 OR multi-step OR multiple tasks
// Simple:  everything else
```

Keywords:
- Multi-step: "and", "then", "first...second", "step 1"
- Conditions: "if", "unless", "when"
- Tasks: numbered lists, semicolons

Uses a dedicated prompt:
```
GOAL: <user goal>

Create execution plan with:
1. Strategy (simple or tree)
2. Steps with dependencies
3. Checkpoints
4. Risk assessment

Respond with JSON...
```
All optimizations expose counters through `MiniA.getMetrics()` and are registered under the `mini-a` namespace in the OpenAF metrics registry, making them available to Prometheus/Grafana scrapers.
```javascript
var agent = new MiniA()
agent.start({ goal: "...", useshell: true })
log(agent.getMetrics())

// Poll the OpenAF registry from another job:
// ow.metrics.get("mini-a")
```

The table below lists the metric groups most relevant to performance and optimization. For the complete metric reference (all groups and all counters) see USAGE.md § Metric breakdown.
| Group | Key counters | What it tells you |
|---|---|---|
| `llm_calls` | `normal`, `low_cost`, `total`, `fallback_to_main` | How often the main vs low-cost model was used, and how many escalations occurred |
| `performance` | `steps_taken`, `total_session_time_ms`, `avg_step_time_ms`, `max_context_tokens`, `llm_estimated_tokens`, `llm_actual_tokens`, `step_prompt_build_ms`, `step_llm_wait_ms`, `step_tool_exec_ms`, `step_context_maintenance_ms` | End-to-end timing broken down by phase so you can identify where time is spent |
| `summarization` | `summaries_made`, `summaries_skipped`, `context_summarizations`, `summaries_tokens_reduced`, `summaries_original_tokens`, `summaries_final_tokens` | Auto-summarization activity and tokens reclaimed |
| `context_compression` | `prompt_context_compressed`, `prompt_context_tokens_saved`, `goal_block_compressed`, `goal_block_tokens_saved`, `hook_context_compressed`, `hook_context_tokens_saved` | Per-block compression savings (prompt, goal, hook context) |
| `system_prompt` | `system_prompt_builds`, `system_prompt_tokens_last`, `system_prompt_budget_applied`, `system_prompt_budget_tokens_saved`, `system_prompt_examples_dropped`, `system_prompt_skill_descriptions_dropped`, `system_prompt_tool_details_dropped`, `system_prompt_planning_details_dropped`, `system_prompt_skills_trimmed` | System prompt construction cost and budget trimming decisions |
| `tool_cache` | `hits`, `misses`, `total_requests`, `hit_rate` | Cache effectiveness for deterministic/read-only MCP tools |
| `tool_selection` | `dynamic_used`, `keyword`, `llm_lc`, `llm_main`, `connection_chooser_lc`, `connection_chooser_main`, `fallback_all` | Which dynamic tool-selection strategy fired (`mcpdynamic=true`) |
| `mcp_resilience` | `circuit_breaker_trips`, `circuit_breaker_resets`, `lazy_init_success`, `lazy_init_failed` | MCP connection health: circuit breaker activity and lazy-init outcomes |
| `behavior_patterns` | `escalations`, `escalation_consecutive_errors`, `escalation_consecutive_thoughts`, `escalation_thought_loop`, `escalation_steps_without_action`, `escalation_similar_thoughts`, `escalation_context_window` | Escalation reasons; helps diagnose why the agent switched to the main model |
| `planning` | `plans_generated`, `plans_validated`, `plans_validation_failed`, `plans_replanned`, `planning_disabled_simple_goal` | Planning engine activity and validation pass/fail rates |
| `deep_research` | `sessions`, `cycles`, `validations_passed`, `validations_failed`, `early_success`, `max_cycles_reached` | Research-and-validate loop outcomes when `deepresearch=true` |
| `delegation` | `total`, `completed`, `failed`, `timedout`, `retried`, `worker_hint_matched`, `workers_healthy` | Sub-agent delegation lifecycle and worker pool health |
| `history` | `sessions_started`, `sessions_resumed`, `files_kept`, `files_deleted_by_period`, `files_deleted_by_count` | Console history session tracking and pruning activity |
Tip: the `behavior_patterns.escalation_*` counters identify why the agent escalated. If `escalation_context_window` is high, lower `maxcontext` or enable `useplanning`. If `escalation_consecutive_thoughts` is high, the agent is looping; review the goal phrasing or add constraints.
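These counters compose into simple health checks. The sketch below assumes the metrics object mirrors the group/counter names in the table; the exact shape returned by `getMetrics()` may differ:

```javascript
// Derived health signals from the assumed counter names.
function escalationRate(m) {
  return m.llm_calls.total > 0
    ? m.llm_calls.fallback_to_main / m.llm_calls.total
    : 0;
}

function cacheHitRate(m) {
  return m.tool_cache.total_requests > 0
    ? m.tool_cache.hits / m.tool_cache.total_requests
    : 0;
}
```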
Q: Will this break my existing workflows?
A: No, all optimizations are backward compatible. Existing configurations work unchanged.
Q: Can I disable these optimizations?
A: Yes, but it's not recommended. Set `maxcontext=0` to disable auto-management. Other optimizations (escalation, parallel actions) are prompt-based and can't be disabled.
Q: Do I need to update my code?
A: No, benefits are automatic. Updating goal phrasing can further encourage parallel actions.
Q: Will my goals behave differently?
A: Goals will complete faster with fewer tokens, but result quality is unchanged or improved.
Q: What if I'm already using maxcontext?
A: Your setting takes precedence. The new behavior only applies when `maxcontext` is unset.
- Usage Guide - Complete configuration reference
- CHEATSHEET - Quick parameter reference
- GitHub Issues - Report problems or request features
✅ Automatic - No configuration required
✅ Backward Compatible - Existing setups unchanged
✅ Significant Savings - 40-60% token reduction, 50-70% cost savings
✅ Better Performance - Faster execution, smarter model usage
✅ Transparent - Verbose mode shows all optimization decisions
The optimizations work together to provide the best balance of cost, speed, and quality for all goal types.