## Summary
Enable hardware-aware task routing in the multi-agent swarm, intelligently distributing work across different model sizes and GPU configurations.
## Problem
The swarm currently treats all agents equally regardless of the underlying hardware. On high-performance local setups (dual RTX 4090s, mixed GPU configurations), there's no way to route lightweight tasks to faster/smaller models while reserving large models for complex reasoning.
## Proposal
Implement hardware-aware task scheduling that:
- **Model routing:** route syntax-level verification (lint, format, type check) to fast quantized models (e.g., Qwen 1.5B) while reserving 8B/32B models for deep architectural planning
- **GPU-aware scheduling:** detect available GPUs and their VRAM, and distribute agents accordingly
- **Latency-optimized dispatch:** measure per-model response latency and route time-sensitive tasks (interactive feedback) to faster models
- **Cost-aware batching:** group similar small tasks for batch processing on smaller models
## Implementation Ideas
- Extend `Config` with a `[models]` section supporting multiple model profiles with capability tags
- Add a `ModelRouter` that selects the best model for each task based on complexity estimation
- Use the existing `sysinfo` dependency for hardware detection
- Integrate with `nvml_wrapper` (already in deps) for GPU monitoring
- Task complexity heuristic: token count, file count, language complexity → model selection
Example Config
[models.fast]
endpoint = "http://localhost:8001/v1"
model = "Qwen/Qwen3.5-1.5B"
capabilities = ["syntax", "format", "lint"]
[models.deep]
endpoint = "http://localhost:8002/v1"
model = "Qwen/Qwen3.5-32B"
capabilities = ["architecture", "refactor", "security"]
## Relevant Code

- `src/tools/swarm_tool.rs` — swarm dispatch
- `src/orchestration/` — multi-agent orchestration
- `src/config/mod.rs` — model profiles already exist (`resolve_model()`)