docs(m3): wording audit per spec §INV-1 + new CLI-SUBSCRIPTION-BACKEND.md#12
Open
suzuke wants to merge 2 commits into
…D.md (M3 PR 18)
Per spec §10 + §INV-1: marketing wording reviewed against safety-claim
rules. Five spots softened to observation framing; new doc shipped.
Spots fixed (reviewer round 1 sweep + Q2 per-backend restructure):
- README.md:162 + README.zh-TW.md:151 — restructured into per-backend
bullet list. Three backends, three honest sentences:
  • claude-code (SDK): "ACL-bounded tool surface; no shell access"
  • smolagents: "no bypass observed across the adversarial test corpus
    in tests/security/" with re-verifiable command
  • cli-subscription: "runs unsandboxed; ACL does NOT constrain it"
- docs/FAQ.md:72-81 + docs/FAQ.zh-TW.md:72-83 — same per-backend
qualification on "Is it safe?" Q&A. "Untrusted workloads" wording
removed from Docker mitigation; replaced with explicit configuration
list (network=none / cap_drop=ALL / read_only_root=True per §INV-2).
- docs/CHANGELOG.md:22 — Docker Sandbox entry now states the
configuration enforced (verifiable from sandbox.py), not an
abstract containment claim.
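Describing Docker mode by configuration rather than by containment can be sketched as below. This is a minimal illustration, assuming a hypothetical `SandboxConfig` dataclass whose fields mirror the three settings quoted above; the real sandbox.py is not shown in this PR and may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxConfig:
    """Hypothetical stand-in for the settings that sandbox.py enforces."""
    network: str = "none"
    cap_drop: tuple[str, ...] = ("ALL",)
    read_only_root: bool = True


def describe(config: SandboxConfig) -> str:
    """Render the config as a factual statement, with no containment claim."""
    caps = ",".join(config.cap_drop)
    return (
        f"network={config.network} / cap_drop={caps} / "
        f"read_only_root={config.read_only_root}"
    )


def matches_documented_defaults(config: SandboxConfig) -> bool:
    """True only if the config equals the defaults quoted in the docs."""
    return config == SandboxConfig()
```

A docs check built this way would fail when the defaults change, which keeps the wording tied to something a reader can verify.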
New doc:
- docs/CLI-SUBSCRIPTION-BACKEND.md — full M3 PR 16 user-facing write-up
with §INV-1-compliant wording. Sections: what it is/isn't, compliance
gate (95%/99% thresholds, 30-day freshness), trial result classification
(tri-state safety filter), configuration, what's recorded per attempt,
per-adapter status, risk acknowledgement.
CHANGELOG additions: M3 PRs #4-#11 (M2 PR 10-14 + M3 PR 15-18) added
under "Unreleased — M3" section. Each entry uses §INV-1 wording from
the start (e.g., "no bypass observed across the adversarial test
corpus" not "secure", "configuration enforced" not "isolated").
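The wording rule lends itself to a mechanical first pass. A minimal sketch, assuming a hypothetical helper `find_absolute_claims`: the English terms come from the examples above and the Chinese terms from the reviewer notes in this message. Neither list is claimed to be the project's full §INV-1 rule set.

```python
import re

# Terms taken from this commit message; both lists are illustrative only.
BANNED_ZH = ("保證", "絕對", "完全", "不可能")
BANNED_EN = ("secure", "isolated")


def find_absolute_claims(text: str) -> list[tuple[int, str]]:
    """Return (line_number, term) pairs for every banned term found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for term in BANNED_ZH:
            if term in line:
                hits.append((lineno, term))
        for term in BANNED_EN:
            # Word boundaries so "insecure" or "securely" are not flagged.
            if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                hits.append((lineno, term))
    return hits
```

Hits still need human review: a sentence that quotes a banned term in order to reject it, as this message does, would be flagged too.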
Reviewer round 1 catches folded in:
- N number: reviewer ran `pytest tests/security/ --collect-only` and
found 2072 (not the 3400 I cited from stale memory). Wording cites
the path so readers can re-verify, NOT a bare number that drifts.
- Categorical separation: tests/security/ covers L1 ACL + L2 executor
escape only — NOT L3 Docker. CHANGELOG/FAQ describe Docker mode by
configuration (verifiable from sandbox.py), not containment claim.
- Chinese wording: deliberately avoided 「保證」「絕對」「完全」「不可能」.
Used 「未觀察到」「預設配置為」.
- Error-message consistency: CLI-SUBSCRIPTION-BACKEND.md uses the same
vocabulary as code-emitted strings ("diagnostic only", "experimental",
"two-flag opt-in") to avoid drift between docs and runtime messages.
Sentinel test (reviewer Q4 exception):
- tests/test_docs_exist.py — 1-test asserts CLI-SUBSCRIPTION-BACKEND.md
exists (referenced from README + FAQ + CHANGELOG + code error
messages). 3-LOC regression catch for silent rename/delete.
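The sentinel check can be sketched as a small helper. This assumes a hypothetical function `doc_is_present` and a hypothetical size threshold; the actual contents of tests/test_docs_exist.py are not reproduced here.

```python
from pathlib import Path

# Hypothetical threshold; the real test's size check is not shown in this PR.
MIN_BYTES = 200


def doc_is_present(path: Path, min_bytes: int = MIN_BYTES) -> bool:
    """True if path is a regular file with at least min_bytes of content."""
    return path.is_file() and path.stat().st_size >= min_bytes
```

Inside a pytest test run from the repository root, the check would reduce to `assert doc_is_present(Path("docs/CLI-SUBSCRIPTION-BACKEND.md"))`.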
Stats: 6 files changed, +132 / -7 LOC. New tests: 1. Full suite 2762
passed + 1 pre-existing failure (unchanged from prior PRs) + 4 skipped.
Closes M3 deliverables: PR 15 (interactive d3) / 16 (cli_subscription)
/ 17 (polish) / 18 (this).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reviewer round 2 verdict was VERIFIED with one borderline phrasing flagged as non-blocking. Tightening anyway: "actual isolation" reads as a near-absolute claim if quoted out of context. "Stronger filesystem isolation" is comparative and stays §INV-1-safe under skim-quote. One word change. Ship-blocking? No. Worth folding in? Yes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Stacked on #11. Final M3 PR. Pure-docs audit per spec §10 + §INV-1: review marketing wording against safety-claim rules; ship the M3 PR 16 user-facing doc.
What changed
- README.md, README.zh-TW.md, docs/FAQ.md, docs/FAQ.zh-TW.md, docs/CHANGELOG.md: softened §INV-1 violations. "The agent cannot run arbitrary commands" was a false generalization (true for SDK/smolagents, false for cli-subscription); restructured into per-backend three-bullet lists. Docker mode is described by configuration (network=none / cap_drop=ALL / read_only_root=True per §INV-2), not by an abstract containment claim. Chinese wording avoids the absolute modifiers 「保證/絕對/完全/不可能」.
- docs/CLI-SUBSCRIPTION-BACKEND.md: full M3 PR 16 user-facing write-up (was promised but unshipped through PR 16 fix-ups). 98 LOC. Sections: what it is/isn't (with an explicit four-bullet "What it does NOT give you" list), compliance gate (95%/99% thresholds, 30-day freshness), tri-state safety detection, configuration, what's recorded per attempt, per-adapter status, risk acknowledgement.
- docs/CHANGELOG.md: M3 PRs added under the "Unreleased — M3" section. Each entry is written in §INV-1-compliant wording from the start (no fresh violations baked into the same PR that fixes the old ones).
- tests/test_docs_exist.py: 1 test asserts docs/CLI-SUBSCRIPTION-BACKEND.md exists with sane size. Catches silent rename / delete regressions; reviewer Q4 exception.
Reviewer trail
- Round 1: N number (checked with pytest tests/security/ --collect-only); Q1 categorical (tests/security covers L1+L2 only, NOT L3 Docker; describe Docker by config, not containment); Q2 README structure (per-backend list, not buried qualifier); Q4 sentinel test exception; Q5 CHANGELOG entries OK with §INV-1 wording from day 1. Plus 4 missed catches: README.zh-TW.md mirror, Chinese wording rules, file CREATE not edit, error-message consistency.
- Round 2: VERIFIED (… env_allowlist format matches code). One borderline phrasing flagged as non-blocking ("actual isolation" → "stronger filesystem isolation"), folded in commit 0f2621c.
Stats
- 2 commits (425a276 + 0f2621c)
- Full suite: 2762 passed + 1 pre-existing failure (test_create_agent_unknown_raises: exists at PR 15 baseline; NOT a PR 18 regression) + 4 skipped. 0 regressions from PR 18.
Closes M3
This is the last PR in the M3 cut. Stack:
🤖 Generated with Claude Code