Skip to content

feat(m3): SubscriptionCLIBackend (CLI experimental) + ClaudeCodeCLIAdapter#10

Open
suzuke wants to merge 3 commits into
feat/m3-reporter-d3from
feat/m3-cli-subscription
Open

feat(m3): SubscriptionCLIBackend (CLI experimental) + ClaudeCodeCLIAdapter#10
suzuke wants to merge 3 commits into
feat/m3-reporter-d3from
feat/m3-cli-subscription

Conversation

@suzuke
Copy link
Copy Markdown
Owner

@suzuke suzuke commented Apr 26, 2026

Summary

Stacked on #9 (M3 PR 15 reporter d3). Largest M3 PR — adds the experimental SubscriptionCLIBackend wrapping subscription CLIs (Claude Code / Codex / Gemini) as agent backends. Per spec §3.1+§3.2+§3.3+§INV-1+§INV-3.

Scope: Option B+stubsClaudeCodeCLIAdapter is the active implementation; CodexCLIAdapter / GeminiCLIAdapter are stubs that subclass the ABC and raise AdapterNotImplementedError("Gated to PR 16b/c"). This forces the base class to handle 3-adapter shape from day 1 instead of refactoring later.

Truth-in-labeling

The CLI is a "complete agent product" (spec §3.3) with its own agent loop. Crucible's CheatResistancePolicy ACL does NOT constrain it — the CLI sees the host filesystem, can hit the network, etc. This PR is honest about that:

  1. Two-flag opt-in — backend refuses to construct unless config sets BOTH:
    • experimental.allow_cli_subscription: true (acknowledges experimental status)
    • experimental.acknowledge_unsandboxed_cli: true (acknowledges ACL doesn't apply)
  2. Isolation tag — every AttemptNode is tagged isolation="cli_subscription_unsandboxed" (parallel to spec §11.2 Q5's isolation=local_unsafe).
  3. Capabilitiescapabilities() returns {"agent_loop_external", "host_fs_visible"} so callers can branch on degraded ACL.
  4. Documentation uses spec §INV-1 wording: "no bypass observed in N adversarial trials" — never "secure".

The scratch dir provides reproducibility (only declared editable+readonly files visible; modified-files diff capture), NOT security. Operators wanting actual isolation should use Docker mode (spec §INV-2).

Compliance gate

Per spec §3.2, each adapter is gated by a benign-parse compliance harness. The gate is enforced, not advisory:

  • verify_recent_pass_with_path() checks: same cli_binary_path AND same cli_version AND ≤30d old AND pass-rate ≥99% (spec §3.2 release threshold).
  • Backend __init__ calls _check_compliance() — refuses to construct without a recent passing report.
  • experimental.allow_stale_compliance: true overrides with a RED-LETTER WARNING log line.
  • AttemptNode metadata records the actual report file path (compliance_report_path), not the binary path — auditors can follow the trail to the JSONL evidence file.

Reviewer trail

Round Verdict Findings
1 (design) NEEDS_TWEAK 8 Qs + 5 spec-conformance items: scope (Option B+stubs), enforced gate, binary snapshot at construct, schema-version guard, terminate→grace→kill, tri-state safety, secret redaction, ACL truth-in-labeling. Plus: §3.2/§3.3 split, naming collision avoid, AgentResult.backend_metadata, snake_case cli_subscription, §INV-3 belt-and-braces.
2 (impl) REJECTED Bug #1: -p heuristic in _SECRET_FLAG_NAMES redacted the prompt (Claude Code uses -p for the prompt, not password). Every AttemptNode's cli_argv showed prompt as <redacted> — destroying postmortem observability. Bug #2: compliance_report_path metadata recorded CLI binary path, not the report file path. Plus polish: env_allowlist per spec §4.1, dead crucible compliance-check reference.
3 (fix) VERIFIED Bug #1: removed -p heuristic + switched claude_code_cli.py to --print long form (defense in depth) + inverted bad assertion + new end-to-end regression. Bug #2: verify_recent_pass_with_path() threads source path; metadata field now correct. Polish folded in: env_allowlist annotates secret-named entries; phantom CLI command reference removed.
(R3 nit) is_secret_env_name import was unused; now used to tag secret-named entries in env_allowlist audit trail.

Stats

Test plan

Reviewer Q1-Q8 + 5 spec items

  • Q1: CodexCLIAdapter / GeminiCLIAdapter stubs raise AdapterNotImplementedError
  • Q2: verify_recent_pass_with_path rejects stale / wrong-version / below-threshold reports
  • Q3: binary path + version snapshotted at construct, propagated to metadata
  • Q4: stream-json schema_version unknown → unknown_schema=True (no crash)
  • Q5: timeout → terminate → 5s grace → kill; stdout-cap terminates subprocess
  • Q6: SafetyFilterState.{DETECTED, NOT_DETECTED, UNKNOWN} with explicit no-coercion
  • Q7: argv redaction handles --password=foo and --password foo; env redaction strips secret-named
  • Q8: two-flag opt-in + scratch dir + isolation tag + degraded capabilities
  • §3.2/§3.3 split: TrialClassificationSafetyFilterState (disjoint enums)
  • Naming: new file claude_code_cli.py; existing SDK-based claude_code.py untouched
  • AgentResult.backend_metadata: dict[str, Any] field added
  • BACKEND_KIND = "cli_subscription" (snake_case per §4.1)
  • §INV-3: build_argv deliberately omits --shell/--bash/--eval/--code-act

Round-2 bug regressions (CRITICAL)

Round-2 polish

  • env_allowlist metadata records env var NAMES (no values); secret-named entries tagged "NAME:<secret-name>"
  • Compliance error message no longer references phantom crucible compliance-check CLI

Known limitations / non-blockers

  • BENIGN_TASK_SUITE has 3/20 tasks (spec §3.2 mandates 20). Reviewer accepted as PR 16a follow-up; the harness framework + persistence + freshness check are ready.
  • crucible compliance-check CLI command not implemented in this PR. Error message points operators at the compliance module directly. Click wrapper deferred to PR 16a.
  • CodexCLIAdapter / GeminiCLIAdapter are stubs raising AdapterNotImplementedError. PR 16b / 16c implementations.
  • Subprocess env passthrough: full host env passed to CLI subprocess (CLI tools need provider auth env vars). Operators in shared CI must understand this — recorded as env_allowlist metadata for audit.
  • docs/CLI-SUBSCRIPTION-BACKEND.md user-facing doc deferred to PR 18 (marketing wording audit) where §INV-1 wording rules apply.

Migration / usage

# .crucible/config.yaml
agent:
  type: cli-subscription
  cli_subscription:
    adapter: claude-code-cli
    cli_binary_path: null  # PATH lookup; or explicit absolute path
    timeout_seconds: 600
    stdout_cap_bytes: 10485760  # 10 MB
  experimental:
    allow_cli_subscription: true        # acknowledge EXPERIMENTAL
    acknowledge_unsandboxed_cli: true   # acknowledge ACL doesn't apply
    allow_stale_compliance: false       # require recent ≥99% gate report
# 1. Run compliance gate (consumes user's CC quota — defer to PR 16a CLI)
python -c "from crucible.agents.cli_subscription.compliance import ..."

# 2. Run experiments with the gated backend
crucible run --tag t1

🤖 Generated with Claude Code

suzuke and others added 3 commits April 26, 2026 09:37
Per spec §3.1+§3.2+§3.3+§INV-1: experimental backend wrapping
subscription CLIs (Claude Code / Codex / Gemini) as agent providers.
Gated behind ≥99% benign-parse compliance + two-flag opt-in.

Reviewer round 1 NEEDS_TWEAK: 8 Qs + 5 spec-conformance tightenings.
All addressed in this commit.

**Scope: option B+stubs** (reviewer Q1):
- ClaudeCodeCLIAdapter: actual implementation, parses
  `claude -p ... --output-format=stream-json` NDJSON output
- CodexCLIAdapter / GeminiCLIAdapter: stubs that subclass the ABC and
  raise `AdapterNotImplementedError` on `build_argv` / `parse_output`.
  Forces base class to handle 3-adapter shape from day 1; codex/gemini
  wired in PR 16b/c follow-ups.

**Q2 — gate enforced not advisory**:
- `compliance.py` defines `ComplianceReport`, `BenignTask`,
  `TrialClassification` (parse_success + spec §3.2's 4 failure labels:
  parse_failure / model_refusal / format_drift / cli_error).
- `verify_recent_pass()`: requires same `cli_binary_path` + same
  `cli_version` + report ≤30d old + pass_rate ≥ threshold.
- `SubscriptionCLIBackend.__init__` calls `_check_compliance()` and
  REFUSES to construct unless a passing report exists.
- `experimental.allow_stale_compliance: true` overrides with a
  RED-LETTER WARNING log line.

**Q3 — binary resolution at construct**:
- `SubscriptionCLIAdapter._resolve_binary()` runs once at
  construction; absolute path snapshotted.
- `_read_version()` runs `<binary> --version` once; cached.
- Both written to every AttemptNode for that adapter instance.

**Q4 — stream-json schema-version guard**:
- `claude_code_cli.py` checks each event's `schema_version` against
  `KNOWN_STREAM_JSON_SCHEMAS = {"1", "1.0", "v1"}`. Unknown sets
  `unknown_schema=True`; downstream classified as parse_failure.
  Adapter does NOT crash on schema drift.
- Non-JSON lines also mark `unknown_schema=True`.

**Q5 — subprocess management**:
- Per-call timeout default 600s. On timeout: `terminate()` →
  5s grace → `kill()`. Captures partial stdout for forensics.
- Stdout cap default 10MB. On overflow: subprocess is TERMINATED
  (not just truncated) so the CLI doesn't keep racking up
  subscription quota off-budget.
- Streaming read loop with deadline + size checks between chunks.

**Q6 — tri-state safety filter detection**:
- `safety.py`'s `SafetyFilterState` is `DETECTED | NOT_DETECTED | UNKNOWN`.
- Hierarchy: structured-event signal (primary, e.g. `tool_use_denied`,
  `stop_reason: refusal`) → phrase heuristics (per-adapter table) →
  `UNKNOWN` if neither. Reviewer pin: don't coerce unknowns to false.

**Q7 — argv/env redaction**:
- `redaction.py`'s `redact_argv()` handles both `--password=hunter2`
  and `--password hunter2` forms. Patterns: `--api-key`, `--apikey`,
  `--token`, `--password`, `--secret`, `-p`.
- `redact_env()` uses `(?i)(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL|AUTH|BEARER)`.
- Test fixtures explicitly include `--password=hunter2` and
  `OPENAI_API_KEY=sk-real` to assert neither lands in recorded `cli_argv`.

**Q8 — ACL reframe (the load-bearing one)**:
Three-layer truth-in-labeling:
1. **Scratch dir** (`scratch.py`): copies declared editable + readonly
   files into a temp dir, runs CLI there, copies modified editables
   back. NOT security — reproducibility + clean diff capture.
2. **Two-flag opt-in**: backend refuses to construct unless config
   sets BOTH `experimental.allow_cli_subscription` AND
   `experimental.acknowledge_unsandboxed_cli`. Friction is the
   security feature for §INV-1 marketing wording compliance.
3. **Isolation tag**: every AttemptNode tagged
   `isolation="cli_subscription_unsandboxed"` (parallel to spec §11.2
   Q5's `isolation=local_unsafe`). `capabilities()` returns
   `{"agent_loop_external", "host_fs_visible"}` so callers can branch
   on degraded ACL.

**Spec-conformance tightenings**:
1. §3.2 vs §3.3 split: `compliance.py` does §3.2 classification only;
   `safety.py` does §3.3 `provider_safety_filter_active` separately.
2. Naming collision avoided: new file is `claude_code_cli.py`
   (NOT `claude_code.py` which is the existing SDK-based backend).
3. AgentResult extended with `backend_metadata: dict[str, Any]` for
   adapter-to-AttemptNode propagation; doesn't couple adapter to ledger.
4. backend_kind is snake_case `cli_subscription` (per spec §4.1);
   config uses hyphen `cli-subscription` for the type discriminator.
5. §INV-3 belt-and-braces: `claude_code_cli.py` deliberately omits
   any flag that re-enables CodeAct/REPL/eval modes; comment block
   explicitly forbids future additions.

**Stats**:
- 11 files changed (~1,800 LOC added)
- 35 new tests in test_cli_subscription.py covering all 8 reviewer Qs
  + spec conformance + factory dispatch
- Full suite: 2687 passed / 4 skipped (was 2650 in PR 15), 0 regressions

**NOT in this PR (deferred per scope discipline)**:
- Wiring CC to compliance gate runner CLI command (post-merge)
- Codex/Gemini adapter implementations (PR 16b/c)
- `crucible compliance-check` subcommand UX (depends on harness usage)
- `docs/CLI-SUBSCRIPTION-BACKEND.md` user-facing doc (PR 18 marketing
  audit will cover this with proper §INV-1 wording)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer round 2 REJECTED PR 16 with two real bugs. Both fixed.

🔴 Bug #1 — `-p` heuristic in `_SECRET_FLAG_NAMES` redacted the prompt:
- Claude Code CLI uses `-p` for the PROMPT flag (not password). My
  heuristic listed `-p` as a `--password` short form, so every recorded
  `cli_argv` showed the prompt as `<redacted>` — destroying the single
  most important piece of postmortem observability. Worse: I had codified
  the buggy behavior in test `test_redact_argv_secret_short_p_form`.
- Removed `-p` from `_SECRET_FLAG_NAMES`; short-flag heuristics are too
  ambiguous across CLIs (used for port / project / profile / **print**
  in different tools). Better to under-redact than to silently destroy
  data. Operators who need to redact secrets in `-p` arguments must use
  the explicit long form.
- Switched `claude_code_cli.py` to use `--print` long form instead of
  `-p` — clearer + immune to any future short-flag redaction additions.
- Inverted the bad assertion: `test_redact_argv_does_not_redact_dash_p`
  now asserts the prompt IS preserved. New test
  `test_claude_code_cli_argv_preserves_prompt_through_redaction` runs
  the full `build_argv → redact_argv` chain with a known prompt and
  asserts it appears verbatim.

🟡 Bug #2 — `compliance_report_path` metadata recorded the CLI binary
path instead of the JSONL report file:
- Threading: `verify_recent_pass_with_path()` returns the source `Path`
  alongside the report. `SubscriptionCLIBackend._check_compliance()`
  stores both as `_compliance_report` + `_compliance_report_path`.
  Metadata field now points to the actual `.jsonl` evidence file so
  auditors can follow the trail.
- Existing `verify_recent_pass()` keeps backward compat; new callers
  use `verify_recent_pass_with_path()` for the audit trail.
- Regression test
  `test_compliance_report_path_metadata_is_actual_file_path` asserts
  `backend._compliance_report_path` is a `.jsonl` file, NOT the binary.

Reviewer round 2 polish (folded in):
- `env_allowlist` metadata field added (spec §4.1 mandate). Records
  the NAMES of env vars visible to the subprocess (no values, ever).
- Error message in `_check_compliance` no longer references a phantom
  `crucible compliance-check` CLI command. Points operators at the
  `compliance` module directly; CLI UX wrapper deferred to PR 16a.

Reviewer round 2 verified items unchanged:
- Q1 stubs (Codex/Gemini AdapterNotImplementedError)
- Q3 binary + version snapshot at construct
- Q4 schema-version guard
- Q5 timeout/cap kill semantics
- Q6 tri-state safety
- Q7 (other) redaction patterns
- Q8 two-flag opt-in + scratch dir + isolation tag

Tests: 37 (was 35; +2 regressions for Bugs 1+2). Full suite: 2732 passed
+ 1 pre-existing failure (`test_create_agent_unknown_raises`, regex/case
mismatch — verified pre-existing on PR 15 baseline) + 4 skipped. 0
regressions from PR 16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer round 3 was VERIFIED with one tiny nit: `is_secret_env_name`
was imported but never used in `cli_subscription_backend.py`. Reviewer
suggested either removing the import OR using it to annotate the
`env_allowlist` audit trail.

Picked the latter — more useful: secret-named env entries now get
tagged `"NAME:<secret-name>"` in the recorded `env_allowlist` so
auditors see WHICH entries were sensitive without revealing values.
Benign entries stay as `"NAME"`. Values are never recorded either way.

Tests still pass (37 in test_cli_subscription.py). Pure annotation
change; no behavior shift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant