docs: add explicit Threat Model section to README by waitdeadai · Pull Request #9 · waitdeadai/llm-dark-patterns

waitdeadai · 2026-05-16T15:04:13Z

Summary

Adds a six-item Threat Model section to the README between "Not a jailbreak" and "Parent harness".
Mirrors the companion paper's threat model (agent-closeout-bench paper/uai-2026/main.tex §Threat Model and Limitations).
Cross-references the companion benchmark as the surface where lexical brittleness is measured rather than hidden.

Why now

Reviewer-grade limitations were previously documented in PR descriptions and in the companion benchmark repo, but never surfaced at the top of this README. Operators considering this suite for safety-critical workflows should see them before relying on it.

Threat model items added

Lexical evasion
Hook misconfiguration
Runtime bypass
In-band manipulation
Evidence-marker limitations
Coverage and language scope (English-only, Claude Code Stop/SubagentStop)
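To make the first item concrete, here is a minimal sketch of why a purely lexical check is brittle. The pattern and both sample sentences are hypothetical illustrations, not taken from the suite's actual rules; the assumption is only that detection relies on surface wording.

```python
import re

# Hypothetical pattern for illustration only; NOT one of the suite's real rules.
PATTERN = re.compile(r"\bwould you like me to\b", re.IGNORECASE)


def flags(text: str) -> bool:
    """Return True when the hypothetical lexical pattern appears in the text."""
    return PATTERN.search(text) is not None


# The literal phrasing matches the pattern.
literal = "Would you like me to continue with the next step?"
# A paraphrase with the same intent contains none of the matched words.
paraphrase = "Shall I carry on with the next step?"

print(flags(literal))
print(flags(paraphrase))
```

A paraphrase that keeps the intent but changes the wording slips past the check, which is the kind of gap the companion benchmark is described as measuring.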

Test plan

README renders correctly on GitHub markdown preview.
No broken links to agent-closeout-bench.
Wording is conservative: no claim of prompt-injection immunity and no claim of universal coverage.
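The link item in the test plan could be partly automated. The sketch below only extracts inline markdown links whose URL mentions a given repository name, so they can be reviewed or fetched separately. The function name, the sample README text, and the `example.com` URL are all hypothetical, and no network request is made.

```python
import re

# Matches inline markdown links of the form [label](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def links_mentioning(markdown: str, needle: str) -> list[tuple[str, str]]:
    """Return (label, url) pairs for inline links whose URL contains `needle`."""
    return [(label, url) for label, url in LINK_RE.findall(markdown) if needle in url]


# Hypothetical README excerpt; the URLs are placeholders, not the real repository links.
sample = (
    "See the [companion benchmark](https://example.com/agent-closeout-bench) "
    "and the [license](https://example.com/LICENSE)."
)

for label, url in links_mentioning(sample, "agent-closeout-bench"):
    print(f"{label}: {url}")
```

Checking that each extracted URL actually resolves would still need an HTTP request or a manual click-through in the GitHub preview.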

🤖 Generated with Claude Code

Six enumerated failure modes mirroring the companion paper (agent-closeout-bench paper/uai-2026/main.tex §Threat Model and Limitations):

1. Lexical evasion
2. Hook misconfiguration
3. Runtime bypass
4. In-band manipulation
5. Evidence-marker limitations
6. Coverage and language scope

Reviewer-grade limitations were previously documented in PR descriptions and the companion benchmark; this surfaces them at the top of the README so operators considering the suite for safety-critical workflows see them before relying on it.

Cross-references the companion paper's benchmark (agent-closeout-bench) as the surface where lexical brittleness is measured rather than hidden.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

waitdeadai merged commit 3381845 into main May 16, 2026
2 checks passed

waitdeadai deleted the docs/threat-model branch May 16, 2026 16:38

waitdeadai merged 1 commit into main from docs/threat-model
