docs: align prompt-injection thresholds in CLAUDE.md and ARCHITECTURE.md to security.ts (v1.6.4.0 catch-up) by brycealan · Pull Request #1290 · garrytan/gstack

brycealan · 2026-05-01T20:35:17Z

Summary

CLAUDE.md and ARCHITECTURE.md were missed when WARN was bumped 0.60 → 0.75 in d75402b (v1.6.4.0, #1135). browse/src/security.ts:37 has WARN: 0.75 and BROWSER.md:743 was updated alongside that commit; CLAUDE.md:290 and ARCHITECTURE.md:159 still read 0.60.

This PR brings the two stale docs in line with the source-of-truth and the sister doc that was already updated. Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold table).

No code change. No behavior change. Pure doc-vs-code alignment.

What's actually true

browse/src/security.ts:35-51 is authoritative:

export const THRESHOLDS = {
  BLOCK: 0.85,
  WARN: 0.75,
  LOG_ONLY: 0.40,
  SOLO_CONTENT_BLOCK: 0.92,
} as const;
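WARN is the cross-confirm threshold, so its value decides which layer scores count toward a 2-of-N confirmation. The sketch below is a minimal illustration of that effect, not the logic in security.ts: `ensemblePool`, `crossConfirmed`, the `quorum` parameter, and the sample scores are all hypothetical. Only the two WARN values and the layer names come from this PR.

```typescript
// WARN values from this PR: 0.60 before v1.6.4.0, 0.75 after.
const OLD_WARN = 0.60;
const NEW_WARN = 0.75;

interface LayerScore {
  layer: string;
  score: number;
}

// Layers whose score is high enough to count toward cross-confirmation.
function ensemblePool(scores: LayerScore[], warn: number): LayerScore[] {
  return scores.filter((s) => s.score >= warn);
}

// Hypothetical 2-of-N rule: at least `quorum` layers must be in the pool.
function crossConfirmed(scores: LayerScore[], warn: number, quorum = 2): boolean {
  return ensemblePool(scores, warn).length >= quorum;
}

// Made-up scores: one clear hit and one borderline fire.
const sample: LayerScore[] = [
  { layer: "testsavant", score: 0.78 },
  { layer: "deberta", score: 0.66 },
];

console.log(crossConfirmed(sample, OLD_WARN)); // true: both layers clear 0.60
console.log(crossConfirmed(sample, NEW_WARN)); // false: 0.66 drops out of the pool
```

Under these assumptions, a doc that still says 0.60 predicts a confirmation that the code at 0.75 would not produce.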

The v1.6.4.0 commit message stated the change directly:

"THRESHOLDS.WARN bumped 0.60 → 0.75 — borderline fires drop out of the 2-of-N ensemble pool."

Why this matters

CLAUDE.md is project-binding and read by both human contributors and AI agents grounded in it. An agent quoting CLAUDE.md will tell users the cross-confirm threshold is 0.60, which understates the real value (0.75) by 20% on a security-critical number.
ARCHITECTURE.md is the canonical system-design doc; operators reading it to debug or tune the security stack get the wrong mental model of when ensemble cross-confirm fires.
The SOLO_CONTENT_BLOCK: 0.92 entry is the floor that prevents testsavant / deberta from solo-firing on phishing-flavored benign content. Operators who don't know it exists can't reason about why a high single-layer score didn't BLOCK.
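To make the last point concrete, here is a rough sketch of how a solo floor could explain a high score that does not block. It assumes a single content classifier may block on its own only at or above SOLO_CONTENT_BLOCK. That rule, the helper `soloCanBlock`, and the sample score are assumptions for illustration; the two constants are the values quoted from security.ts.

```typescript
// Values copied from the THRESHOLDS block quoted in this PR.
const BLOCK = 0.85;
const SOLO_CONTENT_BLOCK = 0.92;

// Hypothetical helper: may one content classifier block with no second
// layer agreeing? This is an assumed rule, not the code in security.ts.
function soloCanBlock(singleLayerScore: number): boolean {
  return singleLayerScore >= SOLO_CONTENT_BLOCK;
}

const soloScore = 0.88; // made-up single-layer score

console.log(soloScore >= BLOCK);      // true: clears the BLOCK value
console.log(soloCanBlock(soloScore)); // false: still under the 0.92 solo floor
console.log(soloCanBlock(0.95));      // true
```

An operator who only knows BLOCK: 0.85 would expect 0.88 to block; the missing table row is what resolves that surprise.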

Verification

$ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
browse/src/security.ts:37:  WARN: 0.75,
CLAUDE.md:290:- WARN: 0.75 — cross-confirm threshold ...        # was 0.60
ARCHITECTURE.md:159:...two ML classifiers at >= WARN (0.75)...  # was (0.60)
BROWSER.md:743:- WARN: 0.75 — cross-confirm threshold ...       # already correct

All four sources now agree.

Files changed

CLAUDE.md — fix WARN: 0.60 → 0.75 (line 290), add SOLO_CONTENT_BLOCK: 0.92 row with the FP-floor rationale.
ARCHITECTURE.md — fix inline WARN (0.60) → (0.75) (line 159).

Test plan

grep "0\.60\|0\.75" across CLAUDE.md, ARCHITECTURE.md, BROWSER.md, and security.ts shows all four files agree on 0.75.
No code paths touched; no test runs needed.

How this was found

Surfaced by a multi-artifact audit that fused CLAUDE.md, ARCHITECTURE.md, BROWSER.md, and the security source. The drift is invisible from any single file — each looks self-consistent — but emerges when you cross-check the three docs against the code.
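A cross-check of this kind can be approximated mechanically. The sketch below is not the audit tool that found the drift; it is a small assumed helper that pulls `NAME: value` and `NAME (value)` mentions out of text and compares a doc against the source. `extractThresholds`, `findDrift`, and the stand-in doc strings are hypothetical. The source text is the THRESHOLDS block quoted in this PR.

```typescript
type Thresholds = Record<string, number>;

// Collect threshold mentions written as `NAME: 0.75` or `NAME (0.75)`,
// with an optional closing backtick after the name.
function extractThresholds(text: string): Thresholds {
  const found: Thresholds = {};
  const pattern = /\b([A-Z][A-Z_]+)`?(?::\s*|\s*\()(\d+\.\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found[match[1]] = Number(match[2]);
  }
  return found;
}

interface Drift {
  mismatched: string[]; // documented with a different value than the source
  missing: string[];    // present in the source, absent from the doc
}

function findDrift(source: Thresholds, doc: Thresholds): Drift {
  const drift: Drift = { mismatched: [], missing: [] };
  for (const name of Object.keys(source)) {
    if (!(name in doc)) drift.missing.push(name);
    else if (doc[name] !== source[name]) drift.mismatched.push(name);
  }
  return drift;
}

const sourceText = [
  "export const THRESHOLDS = {",
  "  BLOCK: 0.85,",
  "  WARN: 0.75,",
  "  LOG_ONLY: 0.40,",
  "  SOLO_CONTENT_BLOCK: 0.92,",
  "} as const;",
].join("\n");

// Stand-in for the stale CLAUDE.md table (wording is illustrative).
const staleDoc = "- BLOCK: 0.85\n- WARN: 0.60\n- LOG_ONLY: 0.40";

const report = findDrift(extractThresholds(sourceText), extractThresholds(staleDoc));
console.log(report.mismatched); // WARN
console.log(report.missing);    // SOLO_CONTENT_BLOCK
```

For a doc that mentions only one threshold inline, such as ARCHITECTURE.md, the `missing` list is expected noise and only `mismatched` is meaningful.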

🤖 Generated with Claude Code

…h-up)

CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped 0.60 → 0.75 in d75402b (v1.6.4.0, "cut Haiku classifier FP from 44% to 23%, gate now enforced", garrytan#1135). browse/src/security.ts:37 has WARN: 0.75 and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and ARCHITECTURE.md still read 0.60.

Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold table).

No code change. No behavior change. Pure doc-vs-code alignment.

Verification:

$ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
browse/src/security.ts:37:  WARN: 0.75,
CLAUDE.md:290:  - `WARN: 0.75` ...
ARCHITECTURE.md:159: ...>= `WARN` (0.75)...
BROWSER.md:743:  - `WARN: 0.75` ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

garrytan · 2026-05-10T05:16:50Z

Thanks @brycealan — your fix shipped in v1.30.0.0 (#1391) with credit in the CHANGELOG. Closing since it's already on main. Appreciate the contribution.

garrytan mentioned this pull request May 9, 2026

v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke #1391 (Merged)

garrytan closed this May 10, 2026

brycealan wants to merge 1 commit into garrytan:main from brycealan:fix-warn-threshold-doc-drift
