## Summary
Add support for using local LLMs (via Ollama) as an alternative to Claude Haiku for executing development plans. This enables:
- Zero API costs after hardware investment
- Privacy — code never leaves the machine
- Offline execution — no internet required
## Motivation
Models like Qwen3-Coder-Next-80B now rival Claude on coding benchmarks and can run locally on Apple Silicon Macs with 64GB+ unified memory. For teams with suitable hardware, this eliminates per-token costs entirely.
## Proposed Implementation
1. Ollama-compatible executor agent
- Use Ollama's OpenAI-compatible API (`localhost:11434/v1/chat/completions`)
- New executor template that works with local models
- Configurable model selection (qwen3-coder-next, codellama, deepseek-coder, etc.)
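Since Ollama exposes an OpenAI-compatible endpoint, the executor's HTTP layer can stay provider-agnostic. A minimal sketch using only the standard library; the function names and the system prompt are placeholders, not the real DevPlan executor template:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> tuple:
    """Return (url, payload) for an OpenAI-compatible chat completion."""
    url = f"{base_url}/v1/chat/completions"
    payload = {"model": model, "messages": messages, "stream": False}
    return url, payload

def run_step(base_url: str, model: str, step: str) -> str:
    """POST one plan step to a local Ollama server and return the reply text."""
    url, payload = build_chat_request(base_url, model, [
        # Placeholder system prompt; see "Prompt tuning" below.
        {"role": "system", "content": "You are a coding executor agent."},
        {"role": "user", "content": step},
    ])
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request shape matches OpenAI's, swapping between a local model and a hosted one is just a change of `base_url` and `model`.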
2. Configuration options
```json
{
  "executor": {
    "provider": "ollama",
    "model": "qwen3-coder-next",
    "baseUrl": "http://localhost:11434",
    "contextWindow": 128000
  }
}
```
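Loading this config could look like the sketch below. Field names mirror the JSON example above; the defaults and validation rules are assumptions, not settled behavior:

```python
# Assumed defaults matching the example config above.
DEFAULTS = {"baseUrl": "http://localhost:11434", "contextWindow": 128000}

def load_executor_config(raw: dict) -> dict:
    """Merge the executor block over defaults and validate required fields."""
    cfg = {**DEFAULTS, **raw.get("executor", {})}
    if cfg.get("provider") != "ollama":
        raise ValueError(f"unsupported provider: {cfg.get('provider')!r}")
    if "model" not in cfg:
        raise ValueError("executor.model is required")
    return cfg
```

Keeping `baseUrl` configurable also covers Ollama running on a non-default port or on another machine on the LAN.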
3. Prompt tuning
- May need DevPlan-specific system prompts optimized for open models
- Test and document which models work best with DevPlan format
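One way to keep per-model tuning manageable is a lookup table of prompt overrides with a shared fallback. Everything here is hypothetical, including the prompt text, which would need the testing described above:

```python
# Hypothetical per-model system prompts; "default" is the fallback.
SYSTEM_PROMPTS = {
    "default": "Follow the DevPlan step exactly. Respond with code and file paths only.",
    "qwen3-coder-next": "You are a precise coding agent. Apply the DevPlan step verbatim.",
}

def system_prompt_for(model: str) -> str:
    """Return the model-specific system prompt, or the shared default."""
    return SYSTEM_PROMPTS.get(model, SYSTEM_PROMPTS["default"])
```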
## Hardware Requirements

| Model | Min RAM | Speed (M4 Pro) |
| --- | --- | --- |
| Qwen3-Coder-Next-80B (Q4) | 64GB | ~10-15 tok/s |
| DeepSeek-Coder-33B (Q4) | 24GB | ~25-30 tok/s |
| CodeLlama-34B (Q4) | 24GB | ~25-30 tok/s |
## Success Criteria

## Related