Local LLM Optimization Suite - Intelligent routing, RAG, and multi-agent orchestration for Ollama models
Opti-Oignon is a comprehensive optimization framework for local LLMs running on Ollama. It maximizes the performance of your local models through intelligent task routing based on a custom benchmark, RAG (Retrieval-Augmented Generation), and multi-agent orchestration.
- Intelligent Routing - Automatically selects the best model for each task type
- RAG System - Enrich prompts with context from your personal documents
- Multi-Agent Pipelines - Orchestrate multiple models for complex workflows
- Pipeline Manager - Create and manage custom pipelines via visual UI
- Context Manager - Dynamic context monitoring with model-specific limits
- Benchmarking - Evaluate models and auto-generate routing configuration
- Dark Mode UI - Modern Gradio interface with keyboard shortcuts
- Multilingual - Interface in English, responses match user's language
Create, modify, and manage custom multi-agent pipelines directly from the UI:
- Visual step editor with drag-and-drop reordering
- LLM-powered prompt generation for pipeline steps
- Import/Export pipelines in YAML format
- Automatic pipeline detection via weighted keywords
Intelligent context monitoring for optimal model utilization:
- Dynamic context limits fetched from `ollama show`
- Real-time token estimation and usage display
- Smart truncation with preserved document structure
- Visual indicators (🟢🟡🔴) for context status
- Keepalive mechanism prevents Gradio timeouts during model loading
- Threaded execution for responsive UI
- Model override per pipeline step
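To illustrate the dynamic-limit idea, here is a minimal sketch of reading a model's context window from `ollama show`. The exact output format of `ollama show` varies across Ollama versions and is an assumption here, so the parser uses a regex with a fallback default rather than a fixed layout:

```python
import re
import subprocess

def parse_context_limit(show_output: str, default: int = 8192) -> int:
    """Extract the context window size from `ollama show` text output.
    The 'context length' label is an assumption about the output format."""
    m = re.search(r"context length\s+(\d+)", show_output)
    return int(m.group(1)) if m else default

def context_limit(model: str) -> int:
    """Query Ollama for a model's context limit; fall back on any failure."""
    try:
        proc = subprocess.run(["ollama", "show", model],
                              capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return parse_context_limit("")
    return parse_context_limit(proc.stdout)
```

Falling back to a conservative default keeps the UI usable even when Ollama is unreachable or the output format changes.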
- Python 3.10+
- Ollama running locally with models installed
- 16GB+ RAM recommended for 30B+ models
```bash
# Clone the repository
git clone https://github.com/AntsAreRad/opti-oignon.git
cd opti-oignon

# Create virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or: .venv\Scripts\activate  # Windows

# Install the package
pip install -e .
```

```bash
# Start the Gradio UI
python -m opti_oignon

# Or directly
opti-oignon ui
```

The interface will open at http://localhost:7860.
The router analyzes your query to determine:
- Task type (code, debug, explanation, etc.)
- Language (R, Python, Bash, etc.)
- Complexity (simple, medium, complex)
Then selects the optimal model based on benchmark data.
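The selection step can be sketched in a few lines. The routing table and keyword heuristic below are illustrative only, not the project's actual configuration (which is generated from benchmark results):

```python
# Illustrative routing table; the real one is generated from benchmarks.
ROUTING = {
    "code": {"primary": "qwen3-coder:30b", "fast": "qwen2.5-coder:14b"},
    "reasoning": {"primary": "deepseek-r1:32b", "fast": "nemotron-3-nano:30b"},
}

def detect_task(query: str) -> str:
    # Hypothetical keyword heuristic standing in for the real analyzer.
    code_markers = ("function", "script", "debug", "refactor", "code")
    return "code" if any(m in query.lower() for m in code_markers) else "reasoning"

def route(query: str, prefer_fast: bool = False) -> str:
    entry = ROUTING[detect_task(query)]
    return entry["fast"] if prefer_fast else entry["primary"]

print(route("Write a function to calculate Shannon index in R"))
# → qwen3-coder:30b
```

The `prefer_fast` flag mirrors the idea of trading quality for latency on simple queries.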
Index your personal documents for context-aware responses:
```bash
# Index a folder
opti-oignon rag index ./docs --recursive

# Search indexed content
opti-oignon rag search "Shannon diversity"
```

Supported formats: PDF, DOCX, CSV, Excel, Markdown, Python, R
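Under the hood, indexing typically starts by splitting documents into overlapping chunks before embedding them. A minimal illustration of that chunking idea (the size and overlap values are hypothetical, not the project's defaults):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; the overlap keeps content that
    straddles a boundary intact in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "x" * 1200
print([len(c) for c in chunk_text(doc)])  # → [500, 500, 300]
```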
Orchestrate multiple models for complex tasks:
- Code Review Pipeline: Planner → Coder → Reviewer
- Research Pipeline: Search → Analyze → Synthesize
- Debug Pipeline: Analyze → Fix → Explain
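The chaining idea behind these pipelines, where each step consumes the previous step's output, can be sketched like this (the toy agents stand in for real model calls):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    agent: Callable[[str], str]  # takes the previous output, returns a new one

def run_pipeline(steps: list[Step], task: str) -> str:
    output = task
    for step in steps:
        output = step.agent(output)  # each agent sees the prior step's result
    return output

# Toy agents standing in for actual model calls:
plan = Step("Planner", lambda t: f"PLAN({t})")
code = Step("Coder", lambda t: f"CODE({t})")
review = Step("Reviewer", lambda t: f"REVIEW({t})")

print(run_pipeline([plan, code, review], "sort a list"))
# → REVIEW(CODE(PLAN(sort a list)))
```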
Create custom pipelines via the Pipelines tab:
- Define steps with visual editor (up to 10 steps)
- Assign agents (coder, reviewer, planner, explainer)
- Generate prompts using LLM assistance
- Set keywords for automatic detection
- Configure weights for pipeline priority
Monitor your context usage in real-time:
```text
Context: qwen3-coder:30b (262K)
Input: ~15,000 / 253,952 tokens
[████░░░░░░░░░░░░░░░░] 6%
```
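A rough sketch of how such an indicator could be computed. The ~4 characters per token heuristic and the color thresholds are assumptions, not the project's actual implementation:

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

def render_context_bar(used: int, limit: int, width: int = 20) -> str:
    """Render a usage bar with a traffic-light icon (thresholds assumed)."""
    pct = used / limit
    filled = round(pct * width)
    icon = "🟢" if pct < 0.5 else "🟡" if pct < 0.8 else "🔴"
    bar = "█" * filled + "░" * (width - filled)
    return f"{icon} ~{used:,} / {limit:,} tokens [{bar}] {pct:.0%}"

print(render_context_bar(15_000, 253_952))
```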
Run benchmarks to evaluate your models and generate optimal routing configuration:
```bash
# Estimate benchmark time (no execution)
opti-oignon benchmark --estimate

# Run quick benchmark (3 models)
opti-oignon benchmark --quick --confirm

# Full benchmark with all models
opti-oignon benchmark --confirm

# Interactive mode with manual scoring
opti-oignon benchmark --interactive --confirm
```

Results are saved to `routing/benchmarks/`:

- `benchmark_YYYY-MM-DD_HH-MM.json` - Detailed results
- `benchmark_YYYY-MM-DD_HH-MM.md` - Human-readable report
- `benchmark_latest.json` - Latest results for comparison
Edit `routing/config.yaml` to customize:

```yaml
# Task routing configuration
task_routing:
  code_r:
    primary: "qwen3-coder:30b"
    fallback: ["devstral-small-2:latest"]
    fast: "qwen2.5-coder:14b"
    temperature: 0.3
    timeout: 120
```

Based on extensive benchmarking:
| Task | Model | Score | Speed |
|---|---|---|---|
| Code (R/Python) | qwen3-coder:30b | 9/10 | ~30s |
| Reasoning | deepseek-r1:32b | 8/10 | ~180s |
| Fast responses | nemotron-3-nano:30b | 8/10 | ~70s |
| Embeddings | mxbai-embed-large | - | - |
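A configuration entry like `code_r` above implies a fallback chain. A minimal sketch of consuming one (treating the `fast` model as a last resort is an assumption about the intended semantics):

```python
def pick_model(route: dict, installed: set[str]) -> str:
    """Return the first configured model that is actually installed,
    trying primary, then fallbacks, then the fast model."""
    candidates = [route["primary"], *route.get("fallback", []), route.get("fast")]
    for model in candidates:
        if model and model in installed:
            return model
    raise RuntimeError("none of the configured models is installed")

route = {
    "primary": "qwen3-coder:30b",
    "fallback": ["devstral-small-2:latest"],
    "fast": "qwen2.5-coder:14b",
}
print(pick_model(route, installed={"qwen2.5-coder:14b"}))  # → qwen2.5-coder:14b
```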
```text
opti-oignon/
├── README.md
├── LICENSE
├── CHANGELOG.md
├── CONTRIBUTING.md
├── setup.py
├── requirements.txt
│
├── opti_oignon/                 # Main package
│   ├── __init__.py
│   ├── __main__.py              # Entry point
│   ├── main.py                  # CLI
│   ├── ui.py                    # Gradio interface
│   ├── config.py                # Configuration loader
│   ├── analyzer.py              # Task detection
│   ├── router.py                # Model routing
│   ├── executor.py              # Query execution
│   ├── presets.py               # Quick presets
│   ├── history.py               # Conversation history
│   ├── context_manager.py       # Context monitoring (v1.1+)
│   ├── pipeline_manager.py      # Pipeline CRUD (v1.2+)
│   ├── dynamic_pipeline_ui.py   # Dynamic planning (v1.2+)
│   │
│   ├── config/                  # Configuration files
│   │   ├── models.yaml
│   │   ├── presets.yaml
│   │   └── user_profile.yaml
│   │
│   ├── data/                    # User data
│   │   └── pipelines_custom.yaml
│   │
│   ├── routing/                 # Intelligent routing
│   │   └── benchmark.py
│   │
│   ├── agents/                  # Multi-agent system
│   │   ├── orchestrator.py
│   │   ├── base.py
│   │   ├── dynamic_pipeline.py
│   │   └── specialists/
│   │
│   └── rag/                     # RAG system
│       ├── indexer.py
│       ├── retriever.py
│       ├── chunkers.py
│       └── augmenter.py
│
├── docs/                        # Documentation
│   ├── INSTALLATION.md
│   ├── BENCHMARK.md
│   ├── ARCHITECTURE.md
│   ├── CONFIGURATION.md
│   ├── CONTEXT_MANAGER.md
│   └── PIPELINE_MANAGER.md
│
├── examples/                    # Usage examples
│   ├── basic_usage.py
│   ├── rag_example.py
│   └── multi_agent_example.py
│
└── routing/                     # Routing configuration
    ├── config.yaml
    └── benchmarks/
```
The Gradio interface provides:
- Real-time task detection and model routing
- Context usage indicator
- Preset and pipeline selection
- RAG toggle and status
- Document upload support
- Visual pipeline editor
- Step-by-step creation (up to 10 steps)
- LLM-powered prompt generation
- Import/Export functionality
- Searchable conversation history
- Export to Markdown
- Multi-agent execution metadata
- Model configuration
- RAG settings
- Preset management
- Benchmark controls
```python
from opti_oignon import analyzer, router, executor

# Analyze a query
analysis = analyzer.analyze("Write a function to calculate Shannon index in R")
print(f"Task: {analysis.task_type}, Language: {analysis.language}")

# Get optimal model
routing = router.route(analysis)
print(f"Model: {routing.model}")

# Execute query
response = executor.execute("Your prompt here")
print(response)
```

```python
from opti_oignon import get_pipeline_manager, Pipeline, PipelineStep

pm = get_pipeline_manager()

# List all pipelines
for p in pm.list_all():
    print(f"{p.emoji} {p.id}: {p.name}")

# Create custom pipeline
new_pipeline = Pipeline(
    id="my_pipeline",
    name="My Custom Pipeline",
    steps=[
        PipelineStep(name="Analyze", agent="reviewer"),
        PipelineStep(name="Implement", agent="coder"),
    ],
    keywords=["custom", "workflow"],
)
pm.create(new_pipeline)
```

Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
See CHANGELOG.md for a detailed history of changes.
- v1.2.0 - Pipeline Manager, keepalive executor, model override per step
- v1.1.0 - Context Manager with dynamic limits
This project is licensed under the MIT License - see the LICENSE file for details.
Léon Brouillé - M2 IMABEE (Ecology)
Project Link: https://github.com/AntsAreRad/opti-oignon