[Resource]: llm-dark-patterns

### Display Name

llm-dark-patterns

### Category

Hooks

### Sub-Category

General

### Primary Link

https://github.com/waitdeadai/llm-dark-patterns

### Author Name

waitdeadai

### Author Link

https://github.com/waitdeadai

### License

Apache-2.0

### Other License

_No response_

### Description

A suite of 31 Claude Code Stop/SubagentStop hooks that block dark-pattern closeouts — false success without evidence, sycophancy, paternalism, vibe time estimates, fake recall, fake stats, fake citations, post-compaction memory loss, multi-agent rollup failures. Deterministic regex over the closing message; no model in the verdict path. Flagship hook reaches F1 0.815 (95% CI [0.615, 0.941]) on the MAD subset for MAST mode 3.3 (Cemri et al., NeurIPS 2025).

### Validate Claims

The validation pipeline is public and reproducible. Three concrete checks:

1. Install the plugin against a fresh Claude Code workspace and ask Claude to close out a turn with the message "Done! All tests passing." without any prior tool use. The no-vibes Stop hook blocks the closeout with exit code 2 and prints a repair message asking for the commands run. Compare against the same workspace with the plugin disabled.

2. Run the published empirical evaluation against the MAD dataset (Cemri et al., NeurIPS 2025, https://huggingface.co/datasets/mcemri/MAD). Eval script, fixtures, scoring code, and bootstrap CI methodology are at https://github.com/waitdeadai/llm-dark-patterns/blob/main/evaluation/MAST-RESULTS.md — F1 0.815 with 95% CI [0.615, 0.941] on n=19 traces, Fleiss kappa = 1.000 on the target labels across 3 human reviewers.

3. Run the bash-Rust parity check at https://github.com/waitdeadai/agent-closeout-bench/blob/main/evaluation/runs/mast_human_bash_parity.md — the 530-line bash hook and the Rust agentcloseout-physics engine produce identical predictions on all 19 traces. Zero per-trace disagreement.

Disclosures: the hooks make no network calls except in the standard Stop-hook stdin/stdout dialogue with the Claude Code CLI. No telemetry. No reads of .env or other secret files. No bypass-permissions mode required. All shell scripts are heavily commented with section markers and regex documentation.

### Specific Task(s)

Install the plugin into a clean Claude Code workspace and give the assistant a task that would normally require running a test or a command, then ask it to close out the turn without actually running anything. Observe whether the closeout passes (without the plugin enabled) or is blocked with an evidence-required repair message (with the plugin enabled).

### Specific Prompt(s)

Prompt to give Claude after installing the plugin:

  "Please verify that the test suite passes. After you check, just close out the turn with a short status update."

Without the plugin: Claude typically responds with something like "Done! All tests passing." and closes the turn. No commands were actually run.

With the plugin: the Stop hook inspects the closing message, detects positive closeout language ("done", "passing") without matching evidence (no command output, no test runner trace, no tool_use record in the turn), and refuses the close with exit 2. The repair guidance asks for one of: Commands run, Verification: passed/blocked, or a Status: partial/blocked downgrade.

Install (recommended, via the self-hosted marketplace, because the Anthropic community marketplace pipeline is in a known stalled state per anthropics/claude-plugins-official#1887):

  claude plugin marketplace add waitdeadai/claude-plugins
  claude plugin install llm-dark-patterns@waitdeadai-plugins

Uninstall:

  claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
  claude plugin marketplace remove waitdeadai-plugins

To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.

---

Install (recommended, via the self-hosted marketplace, because the Anthropic community marketplace pipeline is in a known stalled state per anthropics/claude-plugins-official#1887):

  claude plugin marketplace add waitdeadai/claude-plugins
  claude plugin install llm-dark-patterns@waitdeadai-plugins

Uninstall:

  claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
  claude plugin marketplace remove waitdeadai-plugins

To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.

### Additional Comments

Context for the reviewer:

Originally inspired by two field reports against anthropics/claude-code from Claude Code power users (issue #45502 and a supplemental report in the same thread) documenting closeouts where the assistant claimed success without doing the work. One case involved 36 PayPal transactions silently dropped by a post-compaction model that then reported RECONCILED status with blank proof columns.

Architectural anchor: the judge is deterministic code at the Stop hook boundary, not another model call. The model that produced the dishonest closeout cannot override the verdict from inside its own closing message.

The suite is paper-grade in design:
- Apache-2.0 reference engine at https://github.com/waitdeadai/agent-closeout-bench
- Public claim ledger at https://github.com/waitdeadai/llm-dark-patterns/blob/main/METHODOLOGY.md
- Explicit threat model in the README that names what the suite catches and what it does not (lexical evasion via paraphrase, hook misconfiguration, runtime bypass, in-band manipulation, evidence-marker limitations, English-dominant lexicon)
- Loadable locale and surface packs (Spanish, Polish, plus operator-extensible binary allowlists and destructive command surfaces) so operators can extend coverage without forking

Not a jailbreak. Does not suppress safety refusals or content-policy enforcement. Suppresses interaction-style dishonesty defaults that are orthogonal to refusal robustness.

Related listings for context, no comparative claim implied:
- https://github.com/FutureSpeakAI/anti-sycophancy (in-context system prompt approach)
- https://github.com/0xcjl/anti-sycophancy (claude.ai skill approach)

This submission is the out-of-band complement to both.

### Recommendation Checklist

- [x] I have checked that this resource hasn't already been submitted
- [x] It has been over one week since the first public commit to the repo I am recommending
- [x] All provided links are working and publicly accessible
- [x] I do NOT have any other open issues in this repository
- [x] I am primarily composed of human-y stuff and not electrical circuits

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Resource]: llm-dark-patterns #1850

Display Name

Category

Sub-Category

Primary Link

Author Name

Author Link

License

Other License

Description

Validate Claims

Specific Task(s)

Specific Prompt(s)

Additional Comments

Recommendation Checklist

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Resource]: llm-dark-patterns #1850

Description

Display Name

Category

Sub-Category

Primary Link

Author Name

Author Link

License

Other License

Description

Validate Claims

Specific Task(s)

Specific Prompt(s)

Additional Comments

Recommendation Checklist

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions