Deferred from v0.3 per ADR-0014 §4 and audit gap G2 in docs/internal/audit-2026-05-22-phase8-skill-review.md.
Scope
Build the eval suite that measures graph-extraction quality for each (route × model) pair, so the v0.3 "configurable model" gate can graduate from "configurable + caveat" to "configurable + measured". Future versions can then auto-flip the default route when a local model demonstrably passes the bar.
- Standard corpus (held-out documents covering technical + casual + multi-entity passages).
- Metrics: entity recall, relation precision, schema-violation rate, structured-output reliability.
- Per-route runner (upstream / primary / agent slot).
- Output report consumable by docs (docs/memory/graph.md) + dashboard.
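
As a rough illustration of the four metrics listed above, here is a minimal sketch in Python. The `DocResult` shape, the `score` function, and the exact metric definitions (micro-averaged over the corpus, schema violations counted only among outputs that parsed) are assumptions for discussion, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class DocResult:
    """One document's gold annotations and one (route, model) run's output (hypothetical shape)."""
    gold_entities: set
    gold_relations: set  # e.g. (subject, predicate, object) tuples
    pred_entities: set = field(default_factory=set)
    pred_relations: set = field(default_factory=set)
    parsed: bool = True        # did the model return parseable structured output?
    schema_valid: bool = True  # did the parsed output conform to the graph schema?


def score(results):
    """Micro-averaged metrics over a corpus for a single (route, model) pair."""
    gold_e = sum(len(r.gold_entities) for r in results)
    hit_e = sum(len(r.gold_entities & r.pred_entities) for r in results)
    pred_r = sum(len(r.pred_relations) for r in results)
    hit_r = sum(len(r.gold_relations & r.pred_relations) for r in results)
    parsed = [r for r in results if r.parsed]
    violations = sum(1 for r in parsed if not r.schema_valid)
    return {
        "entity_recall": hit_e / gold_e if gold_e else 0.0,
        "relation_precision": hit_r / pred_r if pred_r else 0.0,
        # Assumption: violations are measured only among outputs that parsed at all.
        "schema_violation_rate": violations / len(parsed) if parsed else 0.0,
        "structured_output_reliability": len(parsed) / len(results) if results else 0.0,
    }


corpus = [
    DocResult(
        gold_entities={"a", "b"},
        gold_relations={("a", "uses", "b")},
        pred_entities={"a"},
        pred_relations={("a", "uses", "b"), ("b", "uses", "a")},
    ),
    DocResult(
        gold_entities={"c", "d"},
        gold_relations={("c", "owns", "d")},
        parsed=False,
        schema_valid=False,
    ),
]
metrics = score(corpus)
```

Whether entity matching should be exact-string or normalized (aliases, casing) is left open here; that choice is likely to affect recall more than anything else in the runner.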
When this lands
- Once landed, ADR-0014 §4 caveat copy ("Graph quality varies by model. We don't currently measure it for you — your results may vary.") can be replaced with a per-model quality readout.
- Could enable an auto-default-on path for proven model families (additive; no schema migration).
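
The auto-default-on idea could be gated by a check along these lines. This is only a sketch: the threshold values in `BAR` are placeholders (this ticket does not define the bar), and `passes_bar` / `choose_default` are hypothetical names.

```python
# Placeholder thresholds; the real bar would be decided once the suite has data.
BAR = {
    "entity_recall": 0.80,
    "relation_precision": 0.80,
    "structured_output_reliability": 0.95,
    "schema_violation_rate": 0.05,  # upper bound, unlike the other three
}

FLOOR_KEYS = ("entity_recall", "relation_precision", "structured_output_reliability")


def passes_bar(metrics, bar=BAR):
    """True when every floor metric meets its minimum and violations stay under the cap."""
    floors_ok = all(metrics[k] >= bar[k] for k in FLOOR_KEYS)
    return floors_ok and metrics["schema_violation_rate"] <= bar["schema_violation_rate"]


def choose_default(current_route, candidate_route, candidate_metrics, bar=BAR):
    """Flip to the candidate route only if its measured metrics pass the bar."""
    return candidate_route if passes_bar(candidate_metrics, bar) else current_route


good = {
    "entity_recall": 0.9,
    "relation_precision": 0.85,
    "structured_output_reliability": 0.99,
    "schema_violation_rate": 0.01,
}
weak = dict(good, relation_precision=0.5)
```

Because the decision reads only from the eval report, this stays additive: no stored graph data or schema needs to change when the default flips.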
Tag
v0.4 (no v0.3 label).