fix(ensemble-workflow): null-skip fail-open in shared harness + harness regression tests #16
Merged
…egression tests
Dogfooded the other three skills (lecture/academic/compose) plus the shared harness: the harness contract (field names) of all three skills is correct, with no mismatches; the only real bug is in the shared harness.
Fixes:
- null-skip fail-OPEN: when the user skips an agent mid-run, the Workflow makes agent() return null, but the three .then handlers (review/codex/da) only handled null findings and not the ok flag → a skipped reviewer was treated as ok:true (clean pass) → bypassed fail-closed → false PASS. All three .then handlers now treat r==null as ok:false. Confirmed by a repro and an integration test.
Added:
- test/ensemble-workflow.test.mjs: 8 node regression tests for the harness (unknown profile / empty lens / skip / throw / missing DA → HIGH integrity, codex → INFO, mergeDedup robust against malformed severity). These lock in the fail-open fix. Wired into run.sh + CI.
Docs:
- plugin CLAUDE.md: fixed drift (4 actual skills + dual backend + fail-closed).
Bumped 2.14.0 → 2.14.1 (plugin.json + marketplace.json).
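Summary
Dogfooded the other three skills (lecture / academic / compose) plus the shared harness with the same rigor as the three rounds applied to code-review (task 2). Conclusion: the harness contracts of all three skills are correct, and the only real bug is in the shared harness: a fail-OPEN false green light. Along the way, added regression tests for the harness and fixed drift in the plugin CLAUDE.md.
Dogfood results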
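- ensemble-lecture-review
- ensemble-academic-review: disableLenses:["number-verifier"], priors:{"reference-verifier","da"}, and codexMaxTime falling back to the profile default of 900; field names and values are all correct. disableLenses can never produce an empty lens set (methodology/writing are always present, plus the harness's empty-set backstop).
- ensemble-compose: uses the csv module (not a naive split); skips empty key/focus, symmetric with the harness customLenses filter; the includeLenses/customLenses fields match; large CSVs are capped by the harness slice.
- ensemble-workflow.js
Bug fixed (shared harness, fail-OPEN)
null-skip fail-open: when the user skips an agent mid-run, the Workflow runtime makes agent() return null (official semantics). However, the three .then handlers for review / codex / devil's-advocate only handled null findings, via (r && r.findings) || [], and did not handle the ok flag. .catch only intercepts throws and cannot catch a null return value.
Consequence: a skipped core lens / DA does not trigger the fail-closed HIGH integrity finding, so the verdict can be a false PASS. This is the same class of bug as the "git fatal → false green light" that the three code-review rounds kept guarding against, just through the JS null path.
Confirmed by a repro and an integration test: before the fix, skipping security → PASS (no integrity finding); after the fix → FINDINGS + security:HIGH.
New: harness regression tests (task 2 gets the same treatment as task 1)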
The harness previously had no tests.
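For reference, the null-skip behavior that went untested can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the helper names (toResultBuggy, toResultFixed, computeVerdict, runLens) and the shape of the integrity finding are assumptions made for the example; only the (r && r.findings) || [] expression, the r == null → ok:false rule, and the PASS / FINDINGS verdicts come from the description above.

```javascript
// Pre-fix shape (assumed): null findings are tolerated, but ok is always true.
function toResultBuggy(r) {
  return { ok: true, findings: (r && r.findings) || [] };
}

// Post-fix shape: r == null (the agent was skipped) is reported as ok:false.
function toResultFixed(r) {
  return { ok: r != null, findings: (r && r.findings) || [] };
}

// Hypothetical fail-closed verdict: a lens that is not ok adds a HIGH integrity finding.
function computeVerdict(resultsByLens) {
  const findings = [];
  for (const [lens, res] of Object.entries(resultsByLens)) {
    findings.push(...res.findings);
    if (!res.ok) findings.push({ lens, severity: "HIGH", kind: "integrity" });
  }
  return { verdict: findings.length === 0 ? "PASS" : "FINDINGS", findings };
}

// .catch only sees rejections; a skipped agent resolves (to null),
// so the null flows through .then and never reaches .catch.
function runLens(agentPromise, toResult) {
  return agentPromise.then(toResult).catch(() => ({ ok: false, findings: [] }));
}

async function demo() {
  const skipped = () => Promise.resolve(null); // stand-in for a user skip
  const before = computeVerdict({ security: await runLens(skipped(), toResultBuggy) });
  const after = computeVerdict({ security: await runLens(skipped(), toResultFixed) });
  return { before: before.verdict, after: after.verdict };
}

demo().then((r) => console.log(r));
```

In this sketch the skipped security lens yields PASS with the pre-fix mapping and FINDINGS with the post-fix mapping, which mirrors the before/after result reported for the repro.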
test/ensemble-workflow.test.mjs (8 tests, plain node) wraps the workflow script body in an importable async function, injects mock agent / parallel / phase / log / args, and actually runs the whole orchestration.
Wired into test/run.sh and .github/workflows/test.yml.
Docs
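plugin CLAUDE.md: fixed drift. It previously listed only the single, no longer existing /parallel-ai-agents:ensemble-review and the old "4 teammates + 1 Codex" architecture; it now describes the 4 actual skills, the dual backend (Workflow harness by default, legacy fallback), and the fail-closed behavior.
Verification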
Version
2.14.0 → 2.14.1 (plugin.json and marketplace.json kept in sync). CHANGELOG updated with [2.14.1].
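The test approach described earlier (wrapping the workflow script body in an async function and injecting mocks) can be sketched as follows. This is an illustration under stated assumptions, not the contents of test/ensemble-workflow.test.mjs: scriptBody is a hypothetical stand-in for the real workflow body, and the signatures assumed here for agent, parallel, phase, log, and args may differ from the actual Workflow runtime.

```javascript
// AsyncFunction is not a global; it can be obtained from any async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical stand-in for the workflow script body (the real one lives in ensemble-workflow.js).
const scriptBody = `
  phase("review");
  const results = await parallel(
    args.lenses.map((lens) =>
      agent(lens)
        .then((r) => ({ lens, ok: r != null, findings: (r && r.findings) || [] }))
        .catch(() => ({ lens, ok: false, findings: [] }))
    )
  );
  const integrity = results
    .filter((r) => !r.ok)
    .map((r) => ({ lens: r.lens, severity: "HIGH", kind: "integrity" }));
  const findings = results.flatMap((r) => r.findings).concat(integrity);
  log("findings: " + findings.length);
  return { verdict: findings.length ? "FINDINGS" : "PASS", findings };
`;

// Wrap the body so the runtime globals become ordinary, injectable parameters.
function loadWorkflow(body) {
  return new AsyncFunction("agent", "parallel", "phase", "log", "args", body);
}

// behaviors maps a lens name to what its mock agent does:
// null simulates a user skip, "throw" simulates a crash, anything else is returned as the result.
async function runWithMocks(behaviors) {
  const agent = async (name) => {
    const b = behaviors[name];
    if (b === "throw") throw new Error("agent crashed");
    return b;
  };
  const parallel = (promises) => Promise.all(promises); // assumed signature
  const phase = () => {};
  const log = () => {};
  const run = loadWorkflow(scriptBody);
  return run(agent, parallel, phase, log, { lenses: Object.keys(behaviors) });
}
```

A test can then call, for example, runWithMocks({ security: null }) and assert that the verdict is FINDINGS with a HIGH integrity finding for security. Passing the runtime functions as parameters keeps the script body unmodified while letting each test control skip and throw paths.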