No active non-live autopilot-safe tasks remain in this fallback queue.
`AP-1` (test: guard historical backlog pointers) is closed by `d3f8eb7`. `AP-2` (docs: refresh autopilot state snapshot) is closed by `cd6e7ba`. Use `docs/plans/2026-05-01-backlog.md` for context; the live GraceKelly/Mistral benchmark lane requires staged runtime and explicit opt-in only, and is not an active local backlog item.

2026-05-30 branch note: Colab remote benchmark setup is merged to `master` through PR #1 at `415d4c8`; current state is in `AGENT_STATE.md` and `docs/sessions/next-session-3-subagents.md`. Master CI and Pages deploy passed. No additional local backlog item is open.

2026-05-30 live opt-in note: commit `7b0d9ee` closed a runtime quality blocker by failing closed on incompatible Chroma embedding dimensions. A separate ignored eval collection passed a 3-case live Mistral regression; the default local `rag_docs_default` collection still needs a deliberate rebuild before it should be used for full RAG quality measurement.

2026-05-30 R3/R4 note: commit `71367a7` batches multi-document `grade_docs` into one structured LLM call with fallback to the old per-doc path. Master CI and Pages passed on that commit.

2026-05-30 R4 observability note: commit `c0b6d24` adds trace events for `verify_facts` extract-claims and per-claim LLM calls. Master CI and Pages passed on that commit.

2026-05-30 R7 note: commit `c964211` expands the checked-in RU curated seed set from 20 to 35 cases and adds a guard test. Local mock regression passed 35/35; master CI passed. A final CI guard also makes PR `regression-eval` track `evaluation/curated_cases.jsonl` changes.
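The R3/R4 batching change can be pictured with a minimal sketch. It assumes two hypothetical callables that are not the repository's API: `batch_grader`, which grades every document in one structured LLM call and returns one verdict per document, and `single_grader`, standing in for the old per-document path. The wrapper trusts the batch result only when it is well formed and otherwise falls back to per-document grading.

```python
from typing import Callable, List, Sequence

# Hypothetical signatures: not the repository's real grade_docs API.
BatchGrader = Callable[[str, Sequence[str]], List[bool]]
SingleGrader = Callable[[str, str], bool]


def grade_docs_batched(
    question: str,
    docs: Sequence[str],
    batch_grader: BatchGrader,
    single_grader: SingleGrader,
) -> List[bool]:
    """Grade all docs in one structured call, falling back to per-doc calls."""
    if not docs:
        return []
    try:
        verdicts = batch_grader(question, docs)
        # Only trust a structured response with exactly one verdict per document.
        if len(verdicts) == len(docs):
            return [bool(v) for v in verdicts]
    except Exception:
        pass  # A failed call or unparsable structured response lands here.
    # Fallback: the old path, one grading call per document.
    return [single_grader(question, doc) for doc in docs]
```

Treating a verdict-count mismatch like an exception keeps a partially parsed batch response from misaligning verdicts with documents.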

Historical safe-task snapshot. The tasks below are closed in current history; use `docs/plans/2026-05-01-backlog.md` as the active backlog source. The only remaining benchmark lane is live GraceKelly/Mistral work, which is explicit opt-in only, requires staged runtime, and is not an active local backlog item.

- Allowed files/directories: `scripts/`, `README.md`, `docs/`
- Acceptance criteria: a non-mutating local gate command documents and runs the same safe checks used by the runner.
- Required verification: run the new wrapper in dry-run or list mode, then run `git diff --check`.
- Forbidden scope: `.env`, `deploy/`, Docker, Helm, live services, dependency changes, production DB, external APIs.
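A gate of this kind could look like this Python sketch. The wrapper's language, its name, the `--list` flag, and the contents of `SAFE_CHECKS` are all assumptions; the two commands are borrowed from verification steps elsewhere in this file rather than from the runner itself.

```python
from __future__ import annotations

import argparse
import subprocess
import sys
from typing import Optional, Sequence

# Hypothetical check list; the runner's actual safe checks may differ.
SAFE_CHECKS: list[list[str]] = [
    ["git", "diff", "--check"],
    [sys.executable, "-m", "pytest", "-q", "-p", "no:schemathesis", "--basetemp=.tmp/pytest"],
]


def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Run local safety checks.")
    parser.add_argument("--list", action="store_true",
                        help="print the checks without running them")
    args = parser.parse_args(argv)

    failures = 0
    for cmd in SAFE_CHECKS:
        print(("would run: " if args.list else "running: ") + " ".join(cmd))
        if args.list:
            continue  # list mode never spawns a process
        # No shell is involved, so arguments are passed through verbatim.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            failures += 1
    return 1 if failures else 0
```

`main(["--list"])` prints the planned commands and returns 0 without spawning anything; `main([])` runs every check and returns 1 if any of them fails, so all failures are reported in one pass.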

- Allowed files/directories: `README.md`, `docs/`
- Acceptance criteria: Windows-specific pytest guidance is consolidated with the current `-p no:schemathesis` and `.tmp/pytest` basetemp recommendation.
- Required verification: `git diff --check`.
- Forbidden scope: source code, tests, CI, deploy configs, generated reports.

- Allowed files/directories: `tests/test_provider_settings.py`, `tests/test_mistral_provider.py`, `config/providers.yml`
- Acceptance criteria: tests cover placeholder or missing direct-provider API keys without making network calls.
- Required verification: `python -m pytest tests/test_provider_settings.py tests/test_mistral_provider.py -q -p no:schemathesis --basetemp=.tmp/pytest` and `ruff check tests/test_provider_settings.py tests/test_mistral_provider.py config/providers.yml`.
- Forbidden scope: `.env`, real API keys, live provider calls, production config, deploy files.
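Tests of this kind can stay offline by resolving the key from an injected mapping instead of the real environment. In this sketch, `resolve_api_key`, the `MISTRAL_API_KEY` variable name, and the placeholder list are hypothetical, not taken from `config/providers.yml` or the provider code.

```python
from typing import Mapping, Optional

# Hypothetical placeholder values; the project's real list may differ.
PLACEHOLDER_VALUES = {"", "changeme", "your-api-key", "<your-key>", "sk-xxx"}


def resolve_api_key(env: Mapping[str, str], var: str = "MISTRAL_API_KEY") -> Optional[str]:
    """Return a usable key, or None when it is missing or an obvious placeholder."""
    value = env.get(var, "").strip()
    # An unexpanded "${VAR}" template also counts as a placeholder.
    if value.lower() in PLACEHOLDER_VALUES or value.startswith("${"):
        return None
    return value


def test_missing_key_is_rejected() -> None:
    assert resolve_api_key({}) is None


def test_placeholder_key_is_rejected() -> None:
    assert resolve_api_key({"MISTRAL_API_KEY": "changeme"}) is None
    assert resolve_api_key({"MISTRAL_API_KEY": "${MISTRAL_API_KEY}"}) is None


def test_real_looking_key_is_accepted() -> None:
    assert resolve_api_key({"MISTRAL_API_KEY": " abc123 "}) == "abc123"
```

Because the tests pass a plain `dict`, they never read `.env` or a real key, and nothing reaches a provider endpoint.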

- Allowed files/directories: `scripts/autopilot.ps1`, `tests/`, `docs/`
- Acceptance criteria: protocol behavior for `PAUSE`, `BLOCKED`, and allowed paths is covered without invoking real `pi` or `codex`.
- Required verification: relevant new tests plus `powershell -ExecutionPolicy Bypass -File scripts/autopilot.ps1 -DryRun`.
- Forbidden scope: scheduler installation, production config, secrets, live external services.
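The protocol checks can be exercised against a fake agent instead of real `pi` or `codex` processes. This Python model only sketches how such tests might be structured: the real logic lives in `scripts/autopilot.ps1`, and the semantics assumed here (a `PAUSE` marker stops the run before any agent call, a `BLOCKED` task is skipped, and a change touching a path outside the task's allowed list is rejected) are illustrative, as is every name in the code.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, List, Sequence


@dataclass
class Task:
    task_id: str
    status: str              # assumed values, e.g. "OPEN" or "BLOCKED"
    allowed: Sequence[str]   # files, or directory prefixes ending in "/"


def path_allowed(path: str, allowed: Sequence[str]) -> bool:
    """True if path equals an allowed file or lies under an allowed directory."""
    p = PurePosixPath(path)
    if ".." in p.parts:
        return False  # never trust parent-directory segments
    for entry in allowed:
        if entry.endswith("/"):
            if PurePosixPath(entry.rstrip("/")) in p.parents:
                return True
        elif p == PurePosixPath(entry):
            return True
    return False


def run_once(paused: bool, tasks: Sequence[Task],
             agent: Callable[[Task], List[str]]) -> str:
    """One hypothetical autopilot step; `agent` is a fake in tests."""
    if paused:
        return "paused"  # PAUSE: stop before any agent is invoked
    for task in tasks:
        if task.status == "BLOCKED":
            continue  # BLOCKED tasks are skipped, not retried
        changed = agent(task)  # returns the paths the agent touched
        outside = [f for f in changed if not path_allowed(f, task.allowed)]
        if outside:
            return f"rejected:{task.task_id}"
        return f"done:{task.task_id}"  # assumed: one task per run
    return "idle"
```

Rejecting any path that contains `..` matters because `PurePosixPath` does not normalize parent segments, so `docs/../.env` would otherwise appear to sit under `docs/`.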