## Summary
Add support for using local LLMs (via Ollama) as an alternative to Claude Haiku for executing development plans. This enables:
- Zero API costs after hardware investment
- Privacy — code never leaves the machine
- Offline execution — no internet required
## Motivation
Models like Qwen3-Coder-Next-80B now rival Claude on coding benchmarks and can run locally on Apple Silicon Macs with 64GB+ unified memory. For teams with suitable hardware, this eliminates per-token costs entirely.
## Proposed Implementation
1. Ollama-compatible executor agent
- Use Ollama's OpenAI-compatible API (`localhost:11434/v1/chat/completions`)
- New executor template that works with local models
- Configurable model selection (qwen3-coder-next, codellama, deepseek-coder, etc.)
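Since Ollama exposes an OpenAI-compatible endpoint, the executor's HTTP layer can stay provider-agnostic. A minimal sketch using only the standard library; the function names and the system prompt are placeholders, not the real DevPlan executor template:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> tuple:
    """Return (url, payload) for an OpenAI-compatible chat completion."""
    url = f"{base_url}/v1/chat/completions"
    payload = {"model": model, "messages": messages, "stream": False}
    return url, payload

def run_step(base_url: str, model: str, step: str) -> str:
    """POST one plan step to a local Ollama server and return the reply text."""
    url, payload = build_chat_request(base_url, model, [
        # Placeholder system prompt; see "Prompt tuning" below.
        {"role": "system", "content": "You are a coding executor agent."},
        {"role": "user", "content": step},
    ])
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request shape matches OpenAI's, swapping between a local model and a hosted one is just a change of `base_url` and `model`.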
2. Configuration options
```json
{
  "executor": {
    "provider": "ollama",
    "model": "qwen3-coder-next",
    "baseUrl": "http://localhost:11434",
    "contextWindow": 128000
  }
}
```
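Loading this config could look like the sketch below. Field names mirror the JSON example above; the defaults and validation rules are assumptions, not settled behavior:

```python
# Assumed defaults matching the example config above.
DEFAULTS = {"baseUrl": "http://localhost:11434", "contextWindow": 128000}

def load_executor_config(raw: dict) -> dict:
    """Merge the executor block over defaults and validate required fields."""
    cfg = {**DEFAULTS, **raw.get("executor", {})}
    if cfg.get("provider") != "ollama":
        raise ValueError(f"unsupported provider: {cfg.get('provider')!r}")
    if "model" not in cfg:
        raise ValueError("executor.model is required")
    return cfg
```

Keeping `baseUrl` configurable also covers Ollama running on a non-default port or on another machine on the LAN.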
3. Prompt tuning
- May need DevPlan-specific system prompts optimized for open models
- Test and document which models work best with DevPlan format
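One way to keep per-model tuning manageable is a lookup table of prompt overrides with a shared fallback. Everything here is hypothetical, including the prompt text, which would need the testing described above:

```python
# Hypothetical per-model system prompts; "default" is the fallback.
SYSTEM_PROMPTS = {
    "default": "Follow the DevPlan step exactly. Respond with code and file paths only.",
    "qwen3-coder-next": "You are a precise coding agent. Apply the DevPlan step verbatim.",
}

def system_prompt_for(model: str) -> str:
    """Return the model-specific system prompt, or the shared default."""
    return SYSTEM_PROMPTS.get(model, SYSTEM_PROMPTS["default"])
```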
## Hardware Requirements

| Model | Min RAM | Speed (M4 Pro) |
| --- | --- | --- |
| Qwen3-Coder-Next-80B (Q4) | 64GB | ~10-15 tok/s |
| DeepSeek-Coder-33B (Q4) | 24GB | ~25-30 tok/s |
| CodeLlama-34B (Q4) | 24GB | ~25-30 tok/s |
## Success Criteria

## Related