diff --git a/CHANGELOG.md b/CHANGELOG.md index 0b638be..c866f5d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -42,6 +42,17 @@ versioning follows [Semantic Versioning](https://semver.org/). --verify` and returns a clear `GitNonZero` rather than silently overwriting. Branch naming stays the documented `loop//w` (see `worktree::branch_name`). +- **Per-task model selection in `hew loop`.** Heavy tasks can route to a + stronger model without changing the rest of the queue. Precedence + (highest first): description tag ``, label + `model:`, config `loop.model.by_priority.

`, + `loop.model.by_type.`, `loop.model.default`. The resolved model + is passed to the spawner per iter as `--model` / `-m` and logged in + `iter-NNN.json::model`. `hew loop summary` adds a "by model" + breakdown table (iters, tasks, input/cached/output/total) when at + least one iter recorded a model; hidden otherwise. See + `docs/LOOP.md` "Per-task model selection" for syntax + the + per-model prompt-cache caveat. ### Changed diff --git a/docs/LOOP.md b/docs/LOOP.md index 5094cbc..90ab3fc 100644 --- a/docs/LOOP.md +++ b/docs/LOOP.md @@ -136,6 +136,73 @@ iter 6: claude succeeds → no cooldown If iter 5's claude retry had errored, the loop would re-enter cooldown for another 3 iters on codex, then retry claude once at iter 9. +--- + +## Per-task model selection + +By default every iter runs against the runtime's default model. When +one task is genuinely harder than the rest of the queue — a tricky +refactor, a thorny algorithm, an architectural call — you can route +just that task to a stronger model without touching the rest. The +loop resolves a `--model` / `-m` override per iter from this +precedence chain (highest wins): + +1. Description tag: `` anywhere in the + task description. Cheapest to add; travels with the task body. +2. Label: `bd label add hew-X model:opus-4-7`. Useful when you don't + want to edit the description, or when many sibling tasks should + share an override. +3. Config: `loop.model.by_priority.

` and `loop.model.by_type.`, + both maps. By-priority wins over by-type when both match. +4. Config: `loop.model.default`. The project-wide floor. +5. None — runtime picks its default. + +### TOML config + +```toml +[loop.model] +default = "sonnet-4-6" + +[loop.model.by_priority] +P0 = "opus-4-7" +P1 = "opus-4-7" + +[loop.model.by_type] +bug = "opus-4-7" +``` + +Read/write via `hew config get loop.model.by_priority` (comma-separated +`KEY=VAL` pairs) or `hew config set loop.model.by_priority.P0 opus-4-7` +(dotted single-entry form; empty value clears). + +### Per-model spend in the summary + +When at least one iter records a `model`, `hew loop summary` adds a +"by model" breakdown: + +```text +by model +───────────────────────────────────────────────────────────────── +model iters tasks input cached output total +opus-4-7 3 3 12.4k 98.1k 4.2k 114.7k +sonnet-4-6 7 6 8.1k 71.2k 2.9k 82.2k +(default) 1 1 0.5k 3.4k 0.2k 4.1k +───────────────────────────────────────────────────────────────── +``` + +Iters without a resolved model collapse under `(default)`. The table +is hidden when no iter recorded one — no flag, no config to enable. + +### Caveat: per-model cache pools + +Anthropic's prompt cache is keyed per model. If half a run uses +`sonnet-4-6` and half uses `opus-4-7`, you pay the cache-creation +input cost twice — once per model — even if the prefix bytes are +identical. The compounding effect from "Memory-graph compounding" is +still real, but it's per-model. Reserve overrides for tasks that +genuinely benefit from the swap; sprinkling them across an otherwise +uniform queue costs cache hits. + ### Failure classification The loop categorizes each iter's outcome before deciding what to do: diff --git a/hew-core/src/config.rs b/hew-core/src/config.rs index d8758a9..fd0d6a0 100644 --- a/hew-core/src/config.rs +++ b/hew-core/src/config.rs @@ -4,6 +4,7 @@ //! to `~/.config/hew/config.toml`. All fields have sensible defaults so //! a missing file is not an error. +use std::collections::BTreeMap; use std::path::{Path, PathBuf}; use serde::{Deserialize, Serialize}; @@ -61,6 +62,26 @@ pub struct LoopConfig { /// Iters the loop stays on the fallback before retrying the /// primary. `None` → default 3 per `DECISION:loop-fallback-policy`. pub fallback_cooldown_iters: Option, + /// Per-task model selection knobs consumed by the dynamic-model + /// resolver (epic `hew-1tq`). All-None / empty by default. + pub model: LoopModelConfig, +} + +/// Persistent inputs for the dynamic per-task model resolver. Model +/// names are free-form strings passed verbatim to the spawner; the +/// runtime CLI rejects unknown ones at invocation time. +#[derive(Debug, Clone, Default, Serialize, Deserialize, schemars::JsonSchema)] +#[serde(default)] +pub struct LoopModelConfig { + /// Fallback model used when no `by_priority` / `by_type` rule + /// matches the task. `None` means "let the resolver fall through + /// to the runtime's own default". + pub default: Option, + /// Model override keyed by task priority label (`P0`..`P4`). + pub by_priority: BTreeMap, + /// Model override keyed by task type (`task`, `bug`, `chore`, + /// `feature`, `epic`, ...). + pub by_type: BTreeMap, } /// Effective default for `fallback_cooldown_iters` when neither the @@ -343,10 +364,49 @@ pub fn get(cfg: &Config, key: &str) -> Option { "loop.fallback_cooldown_iters" | "loop.fallback-cooldown-iters" => { cfg.loop_cfg.fallback_cooldown_iters.map(|n| n.to_string()) } + "loop.model.default" => cfg.loop_cfg.model.default.clone(), + "loop.model.by_priority" | "loop.model.by-priority" => { + Some(format_map(&cfg.loop_cfg.model.by_priority)) + } + "loop.model.by_type" | "loop.model.by-type" => { + Some(format_map(&cfg.loop_cfg.model.by_type)) + } + k if k.starts_with("loop.model.by_priority.") + || k.starts_with("loop.model.by-priority.") => + { + let sub = k.rsplit_once('.').map(|(_, s)| s).unwrap_or_default(); + cfg.loop_cfg.model.by_priority.get(sub).cloned() + } + k if k.starts_with("loop.model.by_type.") || k.starts_with("loop.model.by-type.") => { + let sub = k.rsplit_once('.').map(|(_, s)| s).unwrap_or_default(); + cfg.loop_cfg.model.by_type.get(sub).cloned() + } _ => None, } } +fn format_map(m: &BTreeMap) -> String { + m.iter().map(|(k, v)| format!("{k}={v}")).collect::>().join(",") +} + +fn parse_map(v: &str) -> Result> { + let mut out = BTreeMap::new(); + for entry in v.split(',').map(str::trim).filter(|s| !s.is_empty()) { + let (k, val) = entry.split_once('=').ok_or_else(|| HewError::MissingFlag { + flag: format!("value (expected comma-separated KEY=VALUE pairs, got `{v}`)"), + })?; + let k = k.trim(); + let val = val.trim(); + if k.is_empty() || val.is_empty() { + return Err(HewError::MissingFlag { + flag: format!("value (empty key or value in `{entry}`)"), + }); + } + out.insert(k.to_string(), val.to_string()); + } + Ok(out) +} + /// Set a key. Returns error for unknown keys or invalid values. pub fn set(cfg: &mut Config, key: &str, value: &str) -> Result<()> { let bool_val = |v: &str| -> Result { @@ -498,6 +558,40 @@ pub fn set(cfg: &mut Config, key: &str, value: &str) -> Result<()> { cfg.loop_cfg.fallback_cooldown_iters = Some(n); } } + "loop.model.default" => { + cfg.loop_cfg.model.default = + if value.is_empty() { None } else { Some(value.to_string()) }; + } + "loop.model.by_priority" | "loop.model.by-priority" => { + cfg.loop_cfg.model.by_priority = parse_map(value)?; + } + "loop.model.by_type" | "loop.model.by-type" => { + cfg.loop_cfg.model.by_type = parse_map(value)?; + } + k if k.starts_with("loop.model.by_priority.") + || k.starts_with("loop.model.by-priority.") => + { + let sub = k.rsplit_once('.').map(|(_, s)| s).unwrap_or_default(); + if sub.is_empty() { + return Err(HewError::MissingFlag { flag: format!("key (missing sub-key: {k})") }); + } + if value.is_empty() { + cfg.loop_cfg.model.by_priority.remove(sub); + } else { + cfg.loop_cfg.model.by_priority.insert(sub.to_string(), value.to_string()); + } + } + k if k.starts_with("loop.model.by_type.") || k.starts_with("loop.model.by-type.") => { + let sub = k.rsplit_once('.').map(|(_, s)| s).unwrap_or_default(); + if sub.is_empty() { + return Err(HewError::MissingFlag { flag: format!("key (missing sub-key: {k})") }); + } + if value.is_empty() { + cfg.loop_cfg.model.by_type.remove(sub); + } else { + cfg.loop_cfg.model.by_type.insert(sub.to_string(), value.to_string()); + } + } _ => { return Err(HewError::MissingFlag { flag: format!("key (unknown: {key})") }); } @@ -531,6 +625,9 @@ pub fn keys() -> &'static [&'static str] { "compact.exempt", "loop.fallback_runtime", "loop.fallback_cooldown_iters", + "loop.model.default", + "loop.model.by_priority", + "loop.model.by_type", ] } @@ -623,6 +720,9 @@ mod tests { "compact.exempt" => "STATUS:custom,SOMETHING:else", "loop.fallback_runtime" => "codex", "loop.fallback_cooldown_iters" => "5", + "loop.model.default" => "sonnet-4-6", + "loop.model.by_priority" => "P0=opus-4-7,P3=haiku-4-5", + "loop.model.by_type" => "bug=sonnet-4-6,chore=haiku-4-5", k if k.starts_with("optional-skills.") => "yes", _ => "true", }; @@ -951,6 +1051,131 @@ security = false assert_eq!(loaded.loop_cfg.fallback_cooldown_iters, Some(7)); } + // ──────── loop.model.* ──────── + + #[test] + fn loop_model_defaults_are_empty() { + let cfg = Config::default(); + assert!(cfg.loop_cfg.model.default.is_none()); + assert!(cfg.loop_cfg.model.by_priority.is_empty()); + assert!(cfg.loop_cfg.model.by_type.is_empty()); + assert_eq!(get(&cfg, "loop.model.default"), None); + assert_eq!(get(&cfg, "loop.model.by_priority"), Some(String::new())); + assert_eq!(get(&cfg, "loop.model.by_type"), Some(String::new())); + } + + #[test] + fn loop_model_default_is_free_form_string() { + let mut cfg = Config::default(); + // Per task `Craft:`, no validation against a model catalogue. + set(&mut cfg, "loop.model.default", "sonnet-4-6").unwrap(); + assert_eq!(cfg.loop_cfg.model.default.as_deref(), Some("sonnet-4-6")); + set(&mut cfg, "loop.model.default", "some-future-model-2030").unwrap(); + assert_eq!(cfg.loop_cfg.model.default.as_deref(), Some("some-future-model-2030")); + set(&mut cfg, "loop.model.default", "").unwrap(); + assert!(cfg.loop_cfg.model.default.is_none()); + } + + #[test] + fn loop_model_by_priority_dotted_keys() { + let mut cfg = Config::default(); + set(&mut cfg, "loop.model.by_priority.P0", "opus-4-7").unwrap(); + set(&mut cfg, "loop.model.by_priority.P3", "haiku-4-5").unwrap(); + assert_eq!(get(&cfg, "loop.model.by_priority.P0"), Some("opus-4-7".into())); + assert_eq!(get(&cfg, "loop.model.by_priority.P3"), Some("haiku-4-5".into())); + assert_eq!(get(&cfg, "loop.model.by_priority.P9"), None); + // Comma-list view stays deterministic (BTreeMap order). + assert_eq!(get(&cfg, "loop.model.by_priority"), Some("P0=opus-4-7,P3=haiku-4-5".into())); + // Empty value removes the entry. + set(&mut cfg, "loop.model.by_priority.P0", "").unwrap(); + assert!(!cfg.loop_cfg.model.by_priority.contains_key("P0")); + } + + #[test] + fn loop_model_by_type_dotted_keys() { + let mut cfg = Config::default(); + set(&mut cfg, "loop.model.by_type.bug", "sonnet-4-6").unwrap(); + set(&mut cfg, "loop.model.by_type.chore", "haiku-4-5").unwrap(); + assert_eq!(get(&cfg, "loop.model.by_type.bug"), Some("sonnet-4-6".into())); + assert_eq!(get(&cfg, "loop.model.by-type.chore"), Some("haiku-4-5".into())); + set(&mut cfg, "loop.model.by_type.bug", "").unwrap(); + assert!(!cfg.loop_cfg.model.by_type.contains_key("bug")); + } + + #[test] + fn loop_model_map_bulk_set_parses_comma_list() { + let mut cfg = Config::default(); + set(&mut cfg, "loop.model.by_priority", "P0=opus-4-7, P1=sonnet-4-6,P3=haiku-4-5").unwrap(); + assert_eq!(cfg.loop_cfg.model.by_priority.get("P0").unwrap(), "opus-4-7"); + assert_eq!(cfg.loop_cfg.model.by_priority.get("P1").unwrap(), "sonnet-4-6"); + assert_eq!(cfg.loop_cfg.model.by_priority.get("P3").unwrap(), "haiku-4-5"); + // Empty bulk-set clears the map. + set(&mut cfg, "loop.model.by_priority", "").unwrap(); + assert!(cfg.loop_cfg.model.by_priority.is_empty()); + // Malformed entry without `=` rejected. + assert!(set(&mut cfg, "loop.model.by_priority", "P0,P1").is_err()); + assert!(set(&mut cfg, "loop.model.by_priority", "=opus").is_err()); + assert!(set(&mut cfg, "loop.model.by_priority", "P0=").is_err()); + } + + #[test] + fn loop_model_partial_section_parses() { + // Only `default` present — by_priority / by_type fall back to empty. + let tmp = tempfile::tempdir().unwrap(); + let path = tmp.path().join("config.toml"); + std::fs::write( + &path, + r#" +[loop.model] +default = "sonnet-4-6" +"#, + ) + .unwrap(); + let loaded = load_from(&path).unwrap(); + assert_eq!(loaded.loop_cfg.model.default.as_deref(), Some("sonnet-4-6")); + assert!(loaded.loop_cfg.model.by_priority.is_empty()); + assert!(loaded.loop_cfg.model.by_type.is_empty()); + } + + #[test] + fn loop_model_missing_section_uses_defaults() { + // Pre-existing on-disk config with no [loop.model] section at all. + let tmp = tempfile::tempdir().unwrap(); + let path = tmp.path().join("config.toml"); + std::fs::write( + &path, + r#" +update_check = true + +[loop] +fallback_runtime = "codex" +"#, + ) + .unwrap(); + let loaded = load_from(&path).unwrap(); + assert!(loaded.loop_cfg.model.default.is_none()); + assert!(loaded.loop_cfg.model.by_priority.is_empty()); + assert!(loaded.loop_cfg.model.by_type.is_empty()); + } + + #[test] + fn loop_model_round_trips_through_disk() { + let tmp = tempfile::tempdir().unwrap(); + let path = tmp.path().join("config.toml"); + let mut cfg = Config::default(); + set(&mut cfg, "loop.model.default", "sonnet-4-6").unwrap(); + set(&mut cfg, "loop.model.by_priority.P0", "opus-4-7").unwrap(); + set(&mut cfg, "loop.model.by_priority.P3", "haiku-4-5").unwrap(); + set(&mut cfg, "loop.model.by_type.bug", "sonnet-4-6").unwrap(); + save_to(&path, &cfg).unwrap(); + + let loaded = load_from(&path).unwrap(); + assert_eq!(loaded.loop_cfg.model.default.as_deref(), Some("sonnet-4-6")); + assert_eq!(loaded.loop_cfg.model.by_priority.get("P0").unwrap(), "opus-4-7"); + assert_eq!(loaded.loop_cfg.model.by_priority.get("P3").unwrap(), "haiku-4-5"); + assert_eq!(loaded.loop_cfg.model.by_type.get("bug").unwrap(), "sonnet-4-6"); + } + #[test] fn compact_keys_survive_disk_roundtrip() { let tmp = tempfile::tempdir().unwrap(); diff --git a/hew-core/src/lib.rs b/hew-core/src/lib.rs index f64fc9e..1676b1d 100644 --- a/hew-core/src/lib.rs +++ b/hew-core/src/lib.rs @@ -25,6 +25,7 @@ pub mod git; pub mod guard; pub mod install; pub mod loop_log; +pub mod loop_model; pub mod loop_summary; pub mod memories; pub mod merge_back; diff --git a/hew-core/src/loop_log.rs b/hew-core/src/loop_log.rs index ff9204a..52f6250 100644 --- a/hew-core/src/loop_log.rs +++ b/hew-core/src/loop_log.rs @@ -98,6 +98,12 @@ pub struct IterLog { /// Always `false` when no fallback is configured. #[serde(default)] pub cooldown_engaged: bool, + /// Model the spawner was invoked with for this iter, as resolved by + /// `hew_core::loop_model::resolve_model` (description tag > label > + /// config). `None` ⇒ runtime default was used (display as `(default)` + /// in `hew loop summary`). Absent in pre-epic-D iter logs. + #[serde(default)] + pub model: Option, } impl IterLog { @@ -124,6 +130,7 @@ impl IterLog { symbols_touched, runtime_used: None, cooldown_engaged: false, + model: None, } } } @@ -602,6 +609,41 @@ mod tests { assert!(active_run_ids(&missing).unwrap().is_empty()); } + #[test] + fn iter_log_round_trips_model_field() { + let mut it = Iter::new(1, "2026-05-26T00:00:00Z"); + it.outcome = Some(IterOutcome::Closed); + let mut log = IterLog::from_iter(&it, None, Vec::new(), Vec::new()); + log.model = Some("opus-4.7".into()); + let json = serde_json::to_string(&log).unwrap(); + let parsed: IterLog = serde_json::from_str(&json).unwrap(); + assert_eq!(parsed.model.as_deref(), Some("opus-4.7")); + } + + #[test] + fn iter_log_from_iter_defaults_model_to_none() { + let it = Iter::new(1, "2026-05-26T00:00:00Z"); + let log = IterLog::from_iter(&it, None, Vec::new(), Vec::new()); + assert!(log.model.is_none()); + } + + #[test] + fn iter_log_parses_pre_model_fixture_with_model_none() { + // Backward-compat: legacy iter logs written before the `model` + // field existed must still parse cleanly (model defaults to None). + let path = Path::new(env!("CARGO_MANIFEST_DIR")) + .join("tests") + .join("fixtures") + .join("iter-log-pre-model.json"); + let body = std::fs::read_to_string(&path).expect("read pre-model fixture"); + let parsed: IterLog = serde_json::from_str(&body).expect("parse pre-model fixture"); + assert_eq!(parsed.number, 1); + assert_eq!(parsed.task_id.as_deref(), Some("hew-abc")); + assert_eq!(parsed.outcome.as_deref(), Some("closed")); + assert_eq!(parsed.runtime_used.as_deref(), Some("claude")); + assert!(parsed.model.is_none(), "missing model must deserialize to None"); + } + #[test] fn stop_reason_label_covers_all_variants() { for r in [ diff --git a/hew-core/src/loop_model.rs b/hew-core/src/loop_model.rs new file mode 100644 index 0000000..3b38583 --- /dev/null +++ b/hew-core/src/loop_model.rs @@ -0,0 +1,213 @@ +//! Pure per-task model resolver for the `hew loop` dynamic-model epic. +//! +//! Precedence (first match wins): +//! 1. Description tag ``. +//! 2. Label `model:X`. +//! 3. `cfg.by_priority` keyed by `P{n}`. +//! 4. `cfg.by_type` keyed by task issue type. +//! 5. `cfg.default`. +//! 6. `None` → spawner falls back to the runtime's own default. +//! +//! No I/O, no allocation if every rule misses. +//! +//! See `DECISION:loop-fallback-policy` for fallback runtime; this resolver +//! only chooses the *model name*, leaving runtime selection to the loop. + +use crate::config::LoopModelConfig; + +/// Input shape for the resolver. Callers adapt their own task struct +/// (e.g. `bd::ReadyTask`) into this borrow-only view. +#[derive(Debug, Clone, Copy)] +pub struct TaskRecord<'a> { + pub description: &'a str, + pub labels: &'a [String], + pub priority: u8, + pub issue_type: &'a str, +} + +const TAG_OPEN: &str = ""; +const LABEL_PREFIX: &str = "model:"; + +/// Resolve the model name for a task. See module docs for precedence. +pub fn resolve_model(task: &TaskRecord<'_>, cfg: &LoopModelConfig) -> Option { + if let Some(m) = extract_tag(task.description) { + return Some(m); + } + if let Some(m) = extract_label(task.labels) { + return Some(m); + } + let prio_key = format!("P{}", task.priority); + if let Some(m) = cfg.by_priority.get(&prio_key) { + return Some(m.clone()); + } + if !task.issue_type.is_empty() + && let Some(m) = cfg.by_type.get(task.issue_type) + { + return Some(m.clone()); + } + cfg.default.clone() +} + +fn extract_tag(description: &str) -> Option { + let start = description.find(TAG_OPEN)?; + let rest = &description[start + TAG_OPEN.len()..]; + let end = rest.find(TAG_CLOSE)?; + let value = rest[..end].trim(); + if value.is_empty() { None } else { Some(value.to_string()) } +} + +fn extract_label(labels: &[String]) -> Option { + for raw in labels { + let l = raw.trim(); + if let Some(rest) = l.strip_prefix(LABEL_PREFIX) { + let v = rest.trim(); + if !v.is_empty() { + return Some(v.to_string()); + } + } + } + None +} + +#[cfg(test)] +mod tests { + use super::*; + use std::collections::BTreeMap; + + fn cfg( + default: Option<&str>, + by_priority: &[(&str, &str)], + by_type: &[(&str, &str)], + ) -> LoopModelConfig { + let mut bp = BTreeMap::new(); + for (k, v) in by_priority { + bp.insert((*k).to_string(), (*v).to_string()); + } + let mut bt = BTreeMap::new(); + for (k, v) in by_type { + bt.insert((*k).to_string(), (*v).to_string()); + } + LoopModelConfig { default: default.map(str::to_string), by_priority: bp, by_type: bt } + } + + fn task<'a>( + description: &'a str, + labels: &'a [String], + priority: u8, + issue_type: &'a str, + ) -> TaskRecord<'a> { + TaskRecord { description, labels, priority, issue_type } + } + + #[test] + fn tag_wins_over_label_and_config() { + let labels = vec!["model:label-pick".to_string()]; + let t = task("body more", &labels, 0, "task"); + let c = cfg(Some("cfg-default"), &[("P0", "cfg-prio")], &[("task", "cfg-type")]); + assert_eq!(resolve_model(&t, &c), Some("tag-pick".to_string())); + } + + #[test] + fn label_wins_over_config() { + let labels = vec!["model:label-pick".to_string()]; + let t = task("no tag here", &labels, 0, "task"); + let c = cfg(Some("cfg-default"), &[("P0", "cfg-prio")], &[("task", "cfg-type")]); + assert_eq!(resolve_model(&t, &c), Some("label-pick".to_string())); + } + + #[test] + fn by_priority_beats_by_type() { + let labels: Vec = vec![]; + let t = task("", &labels, 0, "task"); + let c = cfg(Some("cfg-default"), &[("P0", "cfg-prio")], &[("task", "cfg-type")]); + assert_eq!(resolve_model(&t, &c), Some("cfg-prio".to_string())); + } + + #[test] + fn by_type_used_when_priority_misses() { + let labels: Vec = vec![]; + let t = task("", &labels, 2, "bug"); + let c = cfg(Some("cfg-default"), &[("P0", "cfg-prio")], &[("bug", "cfg-bug")]); + assert_eq!(resolve_model(&t, &c), Some("cfg-bug".to_string())); + } + + #[test] + fn default_used_when_no_rule_matches() { + let labels: Vec = vec![]; + let t = task("", &labels, 4, "chore"); + let c = cfg(Some("cfg-default"), &[("P0", "cfg-prio")], &[("bug", "cfg-bug")]); + assert_eq!(resolve_model(&t, &c), Some("cfg-default".to_string())); + } + + #[test] + fn all_empty_returns_none() { + let labels: Vec = vec![]; + let t = task("", &labels, 2, "task"); + let c = cfg(None, &[], &[]); + assert_eq!(resolve_model(&t, &c), None); + } + + #[test] + fn malformed_empty_tag_falls_through_to_label() { + let labels = vec!["model:label-pick".to_string()]; + let t = task("body more", &labels, 0, "task"); + let c = cfg(None, &[], &[]); + assert_eq!(resolve_model(&t, &c), Some("label-pick".to_string())); + } + + #[test] + fn malformed_unterminated_tag_falls_through() { + let labels: Vec = vec![]; + let t = task("body ", &labels, 0, "task"); + let c = cfg(None, &[], &[]); + assert_eq!(resolve_model(&t, &c), Some("opus-4-7".to_string())); + } +} diff --git a/hew-core/src/loop_summary.rs b/hew-core/src/loop_summary.rs index e61d648..9fdfc4a 100644 --- a/hew-core/src/loop_summary.rs +++ b/hew-core/src/loop_summary.rs @@ -36,6 +36,23 @@ pub struct Summary { /// All symbols the run touched across every iter, deduplicated. /// Empty when treesitter is off or no commits were made. pub symbols_touched: Vec, + /// Per-model token breakdown, in first-seen order. Empty when no + /// iter populated the `model` field (legacy / pre-Epic-D runs). + pub per_model: Vec, +} + +/// One row of the per-model breakdown table. `model` is the resolved +/// label as written to `IterLog.model`; iters with `model=None` are +/// grouped under `"(default)"`. +#[derive(Clone, Debug, PartialEq, Eq)] +pub struct ModelBreakdown { + pub model: String, + pub iter_count: u32, + pub tasks_closed: u32, + pub input: u64, + pub cached: u64, + pub output: u64, + pub total: u64, } /// Build a [`Summary`] from the run + its iter logs. `iter_logs` is @@ -81,6 +98,8 @@ pub fn summarize(run: &Run, iter_logs: &[IterLog]) -> Summary { } } + let per_model = aggregate_per_model(iter_logs); + let cache_stable_from = iter_logs.windows(2).enumerate().find_map(|(i, pair)| { match (&pair[0].prompt_prefix_hash, &pair[1].prompt_prefix_hash) { (Some(a), Some(b)) if a == b => Some((i + 2) as u32), @@ -100,7 +119,48 @@ pub fn summarize(run: &Run, iter_logs: &[IterLog]) -> Summary { per_iter_tokens, stop_reason: run.stop_reason, symbols_touched, + per_model, + } +} + +/// Group iters by their `model` label and sum tokens + iter/task counts. +/// Returns rows in first-seen order. When no iter has `model=Some(_)` +/// we return an empty Vec — the render side hides the section entirely. +fn aggregate_per_model(iter_logs: &[IterLog]) -> Vec { + let any_populated = iter_logs.iter().any(|l| l.model.is_some()); + if !any_populated { + return Vec::new(); } + let mut order: Vec = Vec::new(); + let mut rows: BTreeMap = BTreeMap::new(); + for log in iter_logs { + let key = log.model.clone().unwrap_or_else(|| "(default)".to_string()); + if !rows.contains_key(&key) { + order.push(key.clone()); + rows.insert( + key.clone(), + ModelBreakdown { + model: key.clone(), + iter_count: 0, + tasks_closed: 0, + input: 0, + cached: 0, + output: 0, + total: 0, + }, + ); + } + let row = rows.get_mut(&key).expect("inserted above"); + row.iter_count += 1; + if log.outcome.as_deref() == Some("closed") { + row.tasks_closed += 1; + } + row.input += log.cost.input; + row.cached += log.cost.cache_read + log.cost.cache_create; + row.output += log.cost.output; + row.total += log.cost.total(); + } + order.into_iter().map(|k| rows.remove(&k).expect("present")).collect() } /// Render the summary as a coloured terminal block. Pass `colorize=false` @@ -206,6 +266,47 @@ pub fn render(summary: &Summary, logs_path: &str, colorize: bool) -> String { let _ = writeln!(s, " {bold}symbols{reset}: {}{footer}", shown.join(", ")); } + // Per-model breakdown (hidden when no iter recorded a model). + if !summary.per_model.is_empty() { + let _ = writeln!(s, " {bold}by model{reset}:"); + let headers = ["model", "iters", "tasks", "input", "cached", "output", "total"]; + let mut widths: [usize; 7] = std::array::from_fn(|i| headers[i].chars().count()); + let rendered_rows: Vec<[String; 7]> = summary + .per_model + .iter() + .map(|m| { + [ + m.model.clone(), + m.iter_count.to_string(), + m.tasks_closed.to_string(), + fmt_int(m.input), + fmt_int(m.cached), + fmt_int(m.output), + fmt_int(m.total), + ] + }) + .collect(); + for row in &rendered_rows { + for (i, cell) in row.iter().enumerate() { + widths[i] = widths[i].max(cell.chars().count()); + } + } + let fmt_row = |cells: &[&str; 7]| -> String { + // First column left-aligned, numeric columns right-aligned. + let mut out = format!(" {:w$}", cells[i], w = widths[i])); + } + out + }; + let header_refs: [&str; 7] = std::array::from_fn(|i| headers[i]); + let _ = writeln!(s, "{dim}{}{reset}", fmt_row(&header_refs)); + for row in &rendered_rows { + let refs: [&str; 7] = std::array::from_fn(|i| row[i].as_str()); + let _ = writeln!(s, "{}", fmt_row(&refs)); + } + } + // Sparkline (skip when only one iter). if summary.per_iter_tokens.len() >= 2 { let spark = sparkline(&summary.per_iter_tokens); @@ -402,6 +503,7 @@ mod tests { symbols_touched, runtime_used: None, cooldown_engaged: false, + model: None, } } @@ -639,6 +741,103 @@ mod tests { assert!(txt.contains("src/y.rs:fn_b")); } + fn iter_log_with_model( + n: u32, + label: &str, + tokens: TokenSpend, + model: Option<&str>, + ) -> IterLog { + let mut log = iter_log(n, label, Some("h"), tokens); + log.model = model.map(str::to_string); + log + } + + #[test] + fn summarize_per_model_groups_and_sums_mixed_run() { + let t = |i, o, cr, cc| TokenSpend { input: i, output: o, cache_read: cr, cache_create: cc }; + let logs = vec![ + iter_log_with_model(1, "closed", t(100, 50, 0, 1000), Some("opus")), + iter_log_with_model(2, "no_close", t(80, 40, 200, 0), Some("opus")), + iter_log_with_model(3, "closed", t(200, 100, 5000, 0), Some("sonnet")), + iter_log_with_model(4, "closed", t(150, 75, 0, 500), Some("sonnet")), + iter_log_with_model(5, "closed", t(50, 25, 0, 0), Some("sonnet")), + ]; + let run = run_with(Vec::new()); + let sum = summarize(&run, &logs); + assert_eq!(sum.per_model.len(), 2); + // First-seen order: opus then sonnet. + let opus = &sum.per_model[0]; + assert_eq!(opus.model, "opus"); + assert_eq!(opus.iter_count, 2); + assert_eq!(opus.tasks_closed, 1); + assert_eq!(opus.input, 180); + assert_eq!(opus.cached, 1200); + assert_eq!(opus.output, 90); + assert_eq!(opus.total, 180 + 90 + 1200); + + let sonnet = &sum.per_model[1]; + assert_eq!(sonnet.model, "sonnet"); + assert_eq!(sonnet.iter_count, 3); + assert_eq!(sonnet.tasks_closed, 3); + assert_eq!(sonnet.input, 400); + assert_eq!(sonnet.cached, 5500); + assert_eq!(sonnet.output, 200); + assert_eq!(sonnet.total, 400 + 200 + 5500); + } + + #[test] + fn summarize_per_model_hides_section_when_no_model_recorded() { + let logs = vec![ + iter_log(1, "closed", Some("h1"), TokenSpend::default()), + iter_log(2, "closed", Some("h1"), TokenSpend::default()), + ]; + let run = run_with(Vec::new()); + let sum = summarize(&run, &logs); + assert!(sum.per_model.is_empty(), "per_model must be empty when no iter set model"); + let txt = render(&sum, "/x", false); + assert!(!txt.contains("by model"), "section must be hidden:\n{txt}"); + } + + #[test] + fn summarize_per_model_default_label_for_unlabeled_iters() { + // Mixed: one iter has model=None, one has Some("opus"). The None + // group should appear as "(default)". + let logs = vec![ + iter_log_with_model(1, "closed", TokenSpend::default(), None), + iter_log_with_model(2, "closed", TokenSpend::default(), Some("opus")), + ]; + let run = run_with(Vec::new()); + let sum = summarize(&run, &logs); + assert_eq!(sum.per_model.len(), 2); + assert_eq!(sum.per_model[0].model, "(default)"); + assert_eq!(sum.per_model[1].model, "opus"); + } + + #[test] + fn render_per_model_table_appears_with_columns() { + let t = |i, o, cr, cc| TokenSpend { input: i, output: o, cache_read: cr, cache_create: cc }; + let logs = vec![ + iter_log_with_model(1, "closed", t(100, 50, 0, 0), Some("opus")), + iter_log_with_model(2, "closed", t(200, 100, 1000, 0), Some("sonnet")), + ]; + let run = run_with(Vec::new()); + let sum = summarize(&run, &logs); + let txt = render(&sum, "/x", false); + assert!(txt.contains("by model"), "missing per-model header:\n{txt}"); + assert!(txt.contains("opus")); + assert!(txt.contains("sonnet")); + // Header row. + assert!(txt.contains("model")); + assert!(txt.contains("iters")); + assert!(txt.contains("tasks")); + assert!(txt.contains("input")); + assert!(txt.contains("cached")); + assert!(txt.contains("output")); + assert!(txt.contains("total")); + // Cached column for sonnet row = 1000 (cache_read+cache_create). + assert!(txt.contains("1,000"), "expected cached sum to render with separator:\n{txt}"); + } + #[test] fn render_strips_ansi_when_colorize_false() { let logs = vec![iter_log(1, "closed", Some("h1"), TokenSpend::default())]; diff --git a/hew-core/src/runner.rs b/hew-core/src/runner.rs index 1cd745b..3dc8280 100644 --- a/hew-core/src/runner.rs +++ b/hew-core/src/runner.rs @@ -8,6 +8,7 @@ use std::time::Duration; +use crate::config::LoopModelConfig; use crate::runtime::{RuntimeSpawner, SpawnFailureClass}; /// Per-run configuration. Set once at `hew loop` invocation, immutable @@ -30,6 +31,11 @@ pub struct RunConfig { /// running `decide::resolve` after the iter completes. Mutually /// exclusive with `interactive`. pub unattended: bool, + /// Per-task model selection knobs consumed by + /// [`crate::loop_model::resolve_model`] to pick a `--model` / + /// `-m` override for each iter. Empty by default (no overrides; + /// the spawner falls back to its own default). + pub loop_model: LoopModelConfig, } impl Default for RunConfig { @@ -42,6 +48,7 @@ impl Default for RunConfig { strict: true, interactive: false, unattended: false, + loop_model: LoopModelConfig::default(), } } } diff --git a/hew-core/tests/fixtures/iter-log-pre-model.json b/hew-core/tests/fixtures/iter-log-pre-model.json new file mode 100644 index 0000000..3c66234 --- /dev/null +++ b/hew-core/tests/fixtures/iter-log-pre-model.json @@ -0,0 +1,21 @@ +{ + "number": 1, + "task_id": "hew-abc", + "started_at": "2026-05-26T00:00:00Z", + "ended_at": "2026-05-26T00:01:00Z", + "outcome": "closed", + "prompt_prefix_hash": "abc123", + "cost": { + "input": 100, + "output": 50, + "cache_read": 0, + "cache_create": 0 + }, + "decisions": [], + "deferred": [], + "tool_calls": ["Read", "Edit"], + "stderr_tail": null, + "symbols_touched": [], + "runtime_used": "claude", + "cooldown_engaged": false +} diff --git a/hew/src/commands/loop_cmd.rs b/hew/src/commands/loop_cmd.rs index 58b5334..92b3290 100644 --- a/hew/src/commands/loop_cmd.rs +++ b/hew/src/commands/loop_cmd.rs @@ -23,10 +23,12 @@ use std::time::Duration; use clap::{Args as ClapArgs, Subcommand}; use hew_core::backpressure::{self, GateCheck, Verdict}; use hew_core::bd::{BdClient, RealBd}; +use hew_core::config::LoopModelConfig; use hew_core::loop_log::{ IterLog, LOOP_ROOT, Manifest, ManifestWorker, RunLog, iter_log_path, new_run_id, run_dir, run_log_path, stop_file_path, write_json_atomic, write_manifest, }; +use hew_core::loop_model::{TaskRecord, resolve_model}; use hew_core::prompt; use hew_core::runner::{CooldownState, Iter, IterOutcome, Run, RunConfig}; use hew_core::runtime::{ @@ -398,6 +400,7 @@ pub fn run_loop(ctx: &Ctx, args: Args) -> miette::Result<()> { let fallback_spawner: Option> = if args.dry_run { None } else { fallback.runtime.map(build_spawner_for) }; let gate = AutoGateRunner; + let loop_model = cfg.loop_cfg.model.clone(); run_loop_with( ctx, args, @@ -405,6 +408,7 @@ pub fn run_loop(ctx: &Ctx, args: Args) -> miette::Result<()> { spawner.as_deref(), fallback_spawner.as_deref(), fallback, + loop_model, &gate, &project_root, ) @@ -479,6 +483,7 @@ pub fn run_loop_with( spawner: Option<&dyn RuntimeSpawner>, fallback_spawner: Option<&dyn RuntimeSpawner>, fallback: FallbackConfig, + loop_model: LoopModelConfig, gate: &dyn GateRunner, project_root: &Path, ) -> miette::Result<()> { @@ -494,11 +499,22 @@ pub fn run_loop_with( spawner, fallback_spawner, fallback, + loop_model, gate, project_root, ); } - run_loop_parallel(ctx, args, bd, spawner, fallback_spawner, fallback, gate, project_root) + run_loop_parallel( + ctx, + args, + bd, + spawner, + fallback_spawner, + fallback, + loop_model, + gate, + project_root, + ) } /// Today's single-worker loop, factored out so [`run_loop_with`] can @@ -513,6 +529,7 @@ fn run_loop_serial( spawner: Option<&dyn RuntimeSpawner>, fallback_spawner: Option<&dyn RuntimeSpawner>, fallback: FallbackConfig, + loop_model: LoopModelConfig, gate: &dyn GateRunner, project_root: &Path, ) -> miette::Result<()> { @@ -563,6 +580,7 @@ fn run_loop_serial( spawner, fallback_spawner, fallback, + loop_model.clone(), gate, &worker, &skill, @@ -623,6 +641,7 @@ fn run_loop_parallel( spawner: Option<&dyn RuntimeSpawner>, fallback_spawner: Option<&dyn RuntimeSpawner>, fallback: FallbackConfig, + loop_model: LoopModelConfig, gate: &dyn GateRunner, project_root: &Path, ) -> miette::Result<()> { @@ -731,6 +750,7 @@ fn run_loop_parallel( spawner, fallback_spawner, fallback, + loop_model.clone(), gate, worker, &skill, @@ -841,6 +861,7 @@ pub fn run_worker_loop( spawner: Option<&dyn RuntimeSpawner>, fallback_spawner: Option<&dyn RuntimeSpawner>, fallback: FallbackConfig, + loop_model: LoopModelConfig, gate: &dyn GateRunner, worker: &Worker, skill: &skills::Skill, @@ -865,6 +886,7 @@ pub fn run_worker_loop( strict: args.strict, interactive: args.interactive, unattended: args.unattended, + loop_model, }; let collector = Collector::new(stop_path.to_path_buf()); @@ -974,11 +996,25 @@ pub fn run_worker_loop( let active_kind = if on_fallback { fallback.runtime.unwrap_or(primary_kind) } else { primary_kind }; + // Per-task model resolution (Epic D / hew-1tq). Honors + // description tag > label > config precedence — see + // `hew_core::loop_model::resolve_model`. Empty `LoopModelConfig` + // + un-annotated task ⇒ `None`, behavior identical to the + // pre-epic spawner default. + let model_override = resolve_model( + &TaskRecord { + description: &task.description, + labels: &[], + priority: task.priority, + issue_type: &task.issue_type, + }, + &cfg.loop_model, + ); + let spawn_opts = SpawnOpts { model_override, working_dir: None }; + let (mut outcome, tokens, mut stderr_tail, failure_class) = if let Some(s) = active_spawner { - // SpawnOpts::default() until Epic D wires per-task model - // resolution; opts is the future channel for that override. - match s.spawn(&assembled, allowed, &SpawnOpts::default()) { + match s.spawn(&assembled, allowed, &spawn_opts) { Ok(out) => { let oc = if out.success && out.closed_task.is_some() { IterOutcome::Closed @@ -1163,6 +1199,7 @@ pub fn run_worker_loop( let mut log = IterLog::from_iter(&iter, prefix_hash_hex, Vec::new(), symbols_touched); if active_spawner.is_some() { log.runtime_used = Some(active_kind.as_str().to_string()); + log.model = spawn_opts.model_override.clone(); } log.cooldown_engaged = cooldown.as_ref().map(|c| c.in_cooldown).unwrap_or(false); write_json_atomic(&iter_log_path(&worker.log_dir, worker.worker_n, iter_number), &log) diff --git a/hew/tests/loop_backpressure.rs b/hew/tests/loop_backpressure.rs index a286838..ddf8196 100644 --- a/hew/tests/loop_backpressure.rs +++ b/hew/tests/loop_backpressure.rs @@ -24,6 +24,7 @@ use hew_core::runtime::{RuntimeSpawner, SpawnFailureClass, SpawnOpts, SpawnOutco use hew::commands::loop_cmd::{ Args, GateRunner, StaticGateRunner, Worker, run_loop_with, run_worker_loop, }; +use hew_core::config::LoopModelConfig; use hew_core::runtime::FallbackConfig; /// Spawner that creates a second commit in `repo_dir` to simulate the @@ -168,6 +169,7 @@ fn gate_fail_reverts_iter_commits_and_files_status_memory() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -309,6 +311,7 @@ fn unattended_resolves_deferred_via_memory_lookup() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -392,6 +395,7 @@ fn without_unattended_deferred_is_left_alone() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -558,6 +562,7 @@ fn prompt_prefix_hash_is_stable_across_iters() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -617,6 +622,7 @@ fn out_of_band_closure_promotes_no_close_to_closed() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -674,6 +680,7 @@ fn gate_pass_keeps_iter_commit() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -820,6 +827,7 @@ fn cooldown_routes_to_fallback_for_n_iters_then_retries_primary() { Some(&primary), Some(&fallback_spawner), fallback_cfg, + LoopModelConfig::default(), &gate, &repo, ) @@ -943,6 +951,7 @@ fn gate_is_called_with_worker_worktree_dir() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &worker, &skill, @@ -996,6 +1005,7 @@ fn gate_falls_back_to_project_root_when_unspecified() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -1059,6 +1069,7 @@ fn run_worker_loop_uses_worker_worktree_for_git_calls() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &worker, &skill, @@ -1115,8 +1126,18 @@ fn jobs_1_uses_serial_fast_path() { args.jobs = 1; args.dry_run = true; - run_loop_with(&ctx(), args, &bd, None, None, FallbackConfig::default(), &gate, &repo) - .expect("serial loop runs"); + run_loop_with( + &ctx(), + args, + &bd, + None, + None, + FallbackConfig::default(), + LoopModelConfig::default(), + &gate, + &repo, + ) + .expect("serial loop runs"); let loop_root = repo.join(".hew/loop"); let entry = std::fs::read_dir(&loop_root) @@ -1156,8 +1177,18 @@ fn jobs_2_uses_dispatcher_path() { args.jobs = 2; args.dry_run = true; - run_loop_with(&ctx(), args, &bd, None, None, FallbackConfig::default(), &gate, &repo) - .expect("parallel loop runs (dry-run)"); + run_loop_with( + &ctx(), + args, + &bd, + None, + None, + FallbackConfig::default(), + LoopModelConfig::default(), + &gate, + &repo, + ) + .expect("parallel loop runs (dry-run)"); let loop_root = repo.join(".hew/loop"); let entry = std::fs::read_dir(&loop_root) diff --git a/hew/tests/loop_dynamic_model.rs b/hew/tests/loop_dynamic_model.rs new file mode 100644 index 0000000..4e6ab07 --- /dev/null +++ b/hew/tests/loop_dynamic_model.rs @@ -0,0 +1,153 @@ +//! Integration test for hew-6et: per-iter `resolve_model` output must +//! reach the spawner through `SpawnOpts::model_override`. Builds a +//! fixture ready task with a `` description, +//! drives one dry-run iter against a [`MockSpawner`], and asserts the +//! mock saw the model name verbatim. + +use std::collections::BTreeMap; +use std::ffi::OsStr; + +use hew_core::backpressure::GateCheck; +use hew_core::bd::{BdClient, BdOutput, BdVersion, ReadyTask, StatsSummary}; +use hew_core::config::LoopModelConfig; +use hew_core::ctx::{Ctx, OutputMode}; +use hew_core::error::Result as HewResult; +use hew_core::runtime::{FallbackConfig, MockSpawner, SpawnOutcome}; + +use hew::commands::loop_cmd::{Args, StaticGateRunner, run_loop_with}; + +#[derive(Debug)] +struct StaticBd { + ready: Vec, +} + +impl BdClient for StaticBd { + fn version(&self) -> HewResult { + Ok(BdVersion { raw: "test 1.0.0".into(), semver: "1.0.0".into() }) + } + fn ready(&self) -> HewResult> { + Ok(self.ready.clone()) + } + fn stats(&self) -> HewResult { + Ok(StatsSummary::default()) + } + fn prime_raw(&self) -> HewResult { + Ok(String::new()) + } + fn memories(&self) -> HewResult> { + Ok(BTreeMap::new()) + } + fn remember(&self, _: &str) -> HewResult<()> { + Ok(()) + } + fn run_raw(&self, _: &[&OsStr]) -> HewResult { + Ok(BdOutput { stdout: String::new(), stderr: String::new() }) + } +} + +fn args_one_dry_iter() -> Args { + Args { + max_iter: Some(1), + until_empty: false, + budget_tokens: None, + budget_wall: None, + strict: true, + interactive: false, + unattended: false, + runtime: "claude".into(), + stop_file: None, + // dry_run skips the auto-gate + git-head capture so the test + // doesn't need a temporary repo. The spawner itself is still + // invoked because it's passed in directly to run_loop_with. + dry_run: true, + skill: "hew-execute".into(), + fallback_runtime: None, + fallback_cooldown_iters: None, + jobs: 1, + } +} + +fn ctx() -> Ctx { + Ctx::new(true, OutputMode::Text, true, 0) +} + +#[test] +fn description_model_tag_threads_into_spawn_opts() { + let bd = StaticBd { + ready: vec![ReadyTask { + id: "hew-fake".into(), + title: "synthetic ready task".into(), + description: "body with tag".into(), + priority: 2, + status: "open".into(), + issue_type: "task".into(), + parent: None, + }], + }; + let spawner = MockSpawner::new(SpawnOutcome::default()); + let gate = + StaticGateRunner(GateCheck { tests_passed: true, lint_passed: true, ..Default::default() }); + // tempdir for project_root keeps the run-dir writes isolated. + let tmp = tempfile::tempdir().expect("tempdir"); + + run_loop_with( + &ctx(), + args_one_dry_iter(), + &bd, + Some(&spawner), + None, + FallbackConfig::default(), + LoopModelConfig::default(), + &gate, + tmp.path(), + ) + .expect("loop runs"); + + let last = spawner.last_opts.borrow(); + let opts = last.as_ref().expect("MockSpawner should have recorded SpawnOpts for the iter"); + assert_eq!( + opts.model_override.as_deref(), + Some("opus"), + "expected description tag `` to thread into SpawnOpts::model_override", + ); +} + +#[test] +fn no_annotation_no_config_leaves_model_override_none() { + let bd = StaticBd { + ready: vec![ReadyTask { + id: "hew-plain".into(), + title: "plain ready task".into(), + description: "no tag here".into(), + priority: 2, + status: "open".into(), + issue_type: "task".into(), + parent: None, + }], + }; + let spawner = MockSpawner::new(SpawnOutcome::default()); + let gate = + StaticGateRunner(GateCheck { tests_passed: true, lint_passed: true, ..Default::default() }); + let tmp = tempfile::tempdir().expect("tempdir"); + + run_loop_with( + &ctx(), + args_one_dry_iter(), + &bd, + Some(&spawner), + None, + FallbackConfig::default(), + LoopModelConfig::default(), + &gate, + tmp.path(), + ) + .expect("loop runs"); + + let last = spawner.last_opts.borrow(); + let opts = last.as_ref().expect("spawn opts recorded"); + assert!( + opts.model_override.is_none(), + "empty config + un-annotated task should leave model_override = None, got {:?}", + opts.model_override, + ); +} diff --git a/hew/tests/loop_parallel_e2e.rs b/hew/tests/loop_parallel_e2e.rs index 6c5dddc..d225fbe 100644 --- a/hew/tests/loop_parallel_e2e.rs +++ b/hew/tests/loop_parallel_e2e.rs @@ -23,6 +23,7 @@ use hew_core::runtime::{ use hew::commands::loop_cmd::{Args, StaticGateRunner, run_loop_with}; use hew_core::backpressure::GateCheck; +use hew_core::config::LoopModelConfig; /// Process-wide lock for HOME mutation. Tests in this binary may run on /// separate threads; serializing them keeps the env swap safe. @@ -366,6 +367,7 @@ fn e2e_parallel_jobs_2_with_mock_spawner() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) @@ -443,6 +445,7 @@ fn e2e_parallel_merge_conflict_files_bug_task() { Some(&spawner), None, FallbackConfig::default(), + LoopModelConfig::default(), &gate, &repo, ) diff --git a/skills/core/hew-decompose.md b/skills/core/hew-decompose.md index b67365e..c908419 100644 --- a/skills/core/hew-decompose.md +++ b/skills/core/hew-decompose.md @@ -342,6 +342,21 @@ a task type — see Step 5.) Priority inflation kills the signal. Reserve P0 for the critical path. If everything is P0, you have not decomposed enough. +### Optional — flag heavy tasks for a stronger model + +If one task in the batch is meaningfully harder than the rest +(thorny refactor, gnarly algorithm, architectural call inside the +slice), route just that task to a stronger model when `hew loop` +spawns it. Two cheap surfaces, both per-task: + +- Add `` anywhere in the description. +- Or `bd label add model:opus-4-7` after the task lands. + +For batch-wide policy (e.g. "every P0 runs on opus-4-7"), use +`hew config set loop.model.by_priority.P0 opus-4-7` once and skip +the per-task annotation. See `docs/LOOP.md` "Per-task model +selection" for the full precedence chain. + ## Step 7 — concrete example, end to end User asks for "auth on an existing FastAPI app." Plan is approved. diff --git a/skills/core/hew-plan.md b/skills/core/hew-plan.md index 077f8ec..981ee49 100644 --- a/skills/core/hew-plan.md +++ b/skills/core/hew-plan.md @@ -258,6 +258,16 @@ read these per-plan deviations *in addition to* the project's `CONVENTION:craft.*` memories. Deviations are scoped to the plan id; they do not bleed across features. +### Note for `hew loop` runs + +If you expect this plan to drive an autonomous `hew loop` and one or +two tasks are clearly heavier than the rest, mention it now — the +decomposer will mark those tasks with `` +or a `model:` label so the loop routes them to a stronger +model. Project-wide policy (e.g. "all P0s on opus-4-7") lives in +`loop.model.by_priority` / `loop.model.by_type` and is set once at +the config layer; see `docs/LOOP.md` "Per-task model selection." + ## What you don't do - **No tasks.** That is `hew-decompose`. Do not run `hew task new` here.