
docs: align prompt-injection thresholds in CLAUDE.md and ARCHITECTURE.md to security.ts (v1.6.4.0 catch-up) #1290

Closed
brycealan wants to merge 1 commit into garrytan:main from brycealan:fix-warn-threshold-doc-drift

Contributor

@brycealan brycealan commented May 1, 2026

Summary

CLAUDE.md and ARCHITECTURE.md were missed when WARN was bumped 0.60 → 0.75 in d75402b (v1.6.4.0, #1135). browse/src/security.ts:37 has WARN: 0.75 and BROWSER.md:743 was updated alongside that commit; CLAUDE.md:290 and ARCHITECTURE.md:159 still read 0.60.

This PR brings the two stale docs in line with the source-of-truth and the sister doc that was already updated. Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold table).

No code change. No behavior change. Pure doc-vs-code alignment.

What's actually true

browse/src/security.ts:35-51 is authoritative:

export const THRESHOLDS = {
  BLOCK: 0.85,
  WARN: 0.75,
  LOG_ONLY: 0.40,
  SOLO_CONTENT_BLOCK: 0.92,
} as const;

The v1.6.4.0 commit message stated the change directly:

"THRESHOLDS.WARN bumped 0.60 → 0.75 — borderline fires drop out of the 2-of-N ensemble pool."

Why this matters

  • CLAUDE.md is project-binding and read by both human contributors and AI agents grounded in it. An agent quoting CLAUDE.md will tell users the cross-confirm threshold is 0.60, which is 0.15 (20%) below the actual 0.75 on a security-critical number.
  • ARCHITECTURE.md is the canonical system-design doc; operators reading it to debug or tune the security stack get the wrong mental model of when ensemble cross-confirm fires.
  • The SOLO_CONTENT_BLOCK: 0.92 entry is the floor that prevents testsavant / deberta from solo-firing on phishing-flavored benign content. Operators who don't know it exists can't reason about why a high single-layer score didn't BLOCK.
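
To make the impact of the stale number concrete, here is a minimal sketch of how a 2-of-N cross-confirm rule with a solo floor could behave. The threshold values come from this PR; the `combine` function, the `Verdict` type, and the order of the checks are assumptions for illustration only, since the actual combination logic in security.ts is not shown here.

```typescript
// Values copied from the PR description. BLOCK (0.85) is omitted because
// this hypothetical sketch does not use it.
const THRESHOLDS = {
  WARN: 0.75,
  LOG_ONLY: 0.40,
  SOLO_CONTENT_BLOCK: 0.92,
} as const;

type Verdict = "BLOCK" | "WARN" | "LOG_ONLY" | "SAFE";

// Hypothetical combiner, NOT the logic in security.ts.
// `scores` holds one confidence value per classifier layer.
// `warn` is a parameter so the stale (0.60) and current (0.75) values can be compared.
function combine(scores: number[], warn: number = THRESHOLDS.WARN): Verdict {
  const confirming = scores.filter((s) => s >= warn).length;
  const top = scores.length > 0 ? Math.max(...scores) : 0;

  if (confirming >= 2) return "BLOCK"; // two layers agree at or above WARN
  if (top >= THRESHOLDS.SOLO_CONTENT_BLOCK) return "BLOCK"; // one layer above the solo floor
  if (top >= warn) return "WARN";
  if (top >= THRESHOLDS.LOG_ONLY) return "LOG_ONLY";
  return "SAFE";
}

console.log(combine([0.70, 0.65], 0.60)); // "BLOCK" under the stale documented value
console.log(combine([0.70, 0.65]));       // "LOG_ONLY" under the actual 0.75
console.log(combine([0.88, 0.10]));       // "WARN": high solo score, below the 0.92 floor
```

Under these assumptions, a reader who trusts the documented 0.60 would expect the first pair of scores to cross-confirm, while the shipped 0.75 leaves them out of the pool.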

Verification

$ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
browse/src/security.ts:37:  WARN: 0.75,
CLAUDE.md:290:- WARN: 0.75 — cross-confirm threshold ...        # was 0.60
ARCHITECTURE.md:159:...two ML classifiers at >= WARN (0.75)...  # was (0.60)
BROWSER.md:743:- WARN: 0.75 — cross-confirm threshold ...       # already correct

All four sources now agree.

Files changed

  • CLAUDE.md — fix WARN: 0.60 → 0.75 (line 290), add SOLO_CONTENT_BLOCK: 0.92 row with the FP-floor rationale.
  • ARCHITECTURE.md — fix inline WARN (0.60) → (0.75) (line 159).

Test plan

  • grep "0\.60\|0\.75" across CLAUDE.md, ARCHITECTURE.md, BROWSER.md, and security.ts shows all four files agree on 0.75.
  • No code paths touched; no test runs needed.
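
The grep check above could also be automated so that future threshold bumps do not drift again. The following is a rough sketch, not part of this PR: `findThreshold` and `findDrift` are hypothetical helpers, and the doc snippets are passed in as strings (in practice they might be loaded with Node's `fs.readFileSync`).

```typescript
// Hypothetical helper: return the first decimal number that closely follows a
// threshold name, tolerating forms like "WARN: 0.75" and "`WARN` (0.75)".
function findThreshold(text: string, name: string): number | null {
  const pattern = new RegExp(`\\b${name}\\b[^0-9\\n]{0,5}(\\d+\\.\\d+)`);
  const match = pattern.exec(text);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: compare every doc against the source-of-truth values
// and describe each missing or mismatched threshold.
function findDrift(
  source: Record<string, number>,
  docs: Record<string, string>,
): string[] {
  const problems: string[] = [];
  for (const [file, text] of Object.entries(docs)) {
    for (const [name, expected] of Object.entries(source)) {
      const found = findThreshold(text, name);
      if (found === null) {
        problems.push(`${file}: ${name} missing`);
      } else if (found !== expected) {
        problems.push(`${file}: ${name} is ${found}, expected ${expected}`);
      }
    }
  }
  return problems;
}

// Example with the pre-fix state of one doc (snippets abbreviated by hand).
const drift = findDrift(
  { WARN: 0.75 },
  {
    "CLAUDE.md": "- `WARN: 0.60` cross-confirm threshold",
    "BROWSER.md": "- `WARN: 0.75` cross-confirm threshold",
  },
);
console.log(drift); // [ "CLAUDE.md: WARN is 0.6, expected 0.75" ]
```

Reporting a missing name is a design choice in this sketch; a doc that intentionally omits a threshold would need to be excluded from that check.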

How this was found

Surfaced by a multi-artifact audit that fused CLAUDE.md, ARCHITECTURE.md, BROWSER.md, and the security source. The drift is invisible from any single file — each looks self-consistent — but emerges when you cross-check the three docs against the code.

🤖 Generated with Claude Code

…h-up)

CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped
0.60 → 0.75 in d75402b (v1.6.4.0, "cut Haiku classifier FP from 44% to
23%, gate now enforced", garrytan#1135). browse/src/security.ts:37 has WARN: 0.75
and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and
ARCHITECTURE.md still read 0.60.

Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in
security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold
table).

No code change. No behavior change. Pure doc-vs-code alignment.

Verification:
  $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
  browse/src/security.ts:37:  WARN: 0.75,
  CLAUDE.md:290: - `WARN: 0.75` ...
  ARCHITECTURE.md:159: ...>= `WARN` (0.75)...
  BROWSER.md:743: - `WARN: 0.75` ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@garrytan
Owner

Thanks @brycealan — your fix shipped in v1.30.0.0 (#1391) with credit in the CHANGELOG. Closing since it's already on main. Appreciate the contribution.

@garrytan garrytan closed this May 10, 2026