
[Resource]: llm-dark-patterns #1850

@waitdeadai

Display Name

llm-dark-patterns

Category

Hooks

Sub-Category

General

Primary Link

https://github.com/waitdeadai/llm-dark-patterns

Author Name

waitdeadai

Author Link

https://github.com/waitdeadai

License

Apache-2.0

Other License

No response

Description

A suite of 31 Claude Code Stop/SubagentStop hooks that block dark-pattern closeouts: false success without evidence, sycophancy, paternalism, vibe time estimates, fake recall, fake stats, fake citations, post-compaction memory loss, and multi-agent rollup failures. Verdicts come from deterministic regex over the closing message, with no model in the verdict path. The flagship hook reaches F1 0.815 (95% CI [0.615, 0.941]) on the MAD subset for MAST failure mode 3.3 (Cemri et al., NeurIPS 2025).
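A "deterministic regex over the closing message" verdict can be pictured as a claim-versus-evidence check. The Python sketch below assumes that shape. The patterns are hypothetical stand-ins, not the plugin's real rule set, which lives in its bash hooks. The accepted markers are modeled on the repair forms listed under Specific Prompt(s).

```python
import re

# HYPOTHETICAL claim patterns: illustrative stand-ins, NOT the plugin's real rule set.
POSITIVE_CLAIM = re.compile(
    r"\b(?:done|all tests (?:are )?passing|tests pass(?:ed)?|fixed|complete(?:d)?)\b",
    re.IGNORECASE,
)

# Evidence/repair markers modeled on the forms the submission names:
# "Commands run", "Verification: passed/blocked", "Status: partial/blocked".
EVIDENCE_MARKER = re.compile(
    r"^\s*(?:commands run\b|verification:\s*(?:passed|blocked)\b|status:\s*(?:partial|blocked)\b)",
    re.IGNORECASE | re.MULTILINE,
)


def verdict(closing_message: str) -> str:
    """Return "block" when the message claims success with no evidence marker, else "allow".

    This is a pure function of the text with no model call, so the same input always
    gets the same verdict.
    """
    claims_success = POSITIVE_CLAIM.search(closing_message) is not None
    has_evidence = EVIDENCE_MARKER.search(closing_message) is not None
    return "block" if claims_success and not has_evidence else "allow"
```

For example, `verdict("Done! All tests passing.")` returns `"block"`, while a closeout that adds a `Commands run:` line is allowed. A production rule set would also need negation handling (for example, "not done yet"), which this sketch omits.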

Validate Claims

The validation pipeline is public and reproducible. Three concrete checks:

  1. Install the plugin into a fresh Claude Code workspace and ask Claude to close out a turn with the message "Done! All tests passing." without any prior tool use. The no-vibes Stop hook blocks the closeout with exit code 2 and prints a repair message asking for the commands run. Compare against the same workspace with the plugin disabled.

  2. Run the published empirical evaluation against the MAD dataset (Cemri et al., NeurIPS 2025, https://huggingface.co/datasets/mcemri/MAD). The eval script, fixtures, scoring code, and bootstrap CI methodology are at https://github.com/waitdeadai/llm-dark-patterns/blob/main/evaluation/MAST-RESULTS.md. Reported result: F1 0.815 with 95% CI [0.615, 0.941] on n=19 traces, and Fleiss kappa = 1.000 on the target labels across 3 human reviewers.

  3. Run the bash-Rust parity check at https://github.com/waitdeadai/agent-closeout-bench/blob/main/evaluation/runs/mast_human_bash_parity.md. The 530-line bash hook and the Rust agentcloseout-physics engine produce identical predictions on all 19 traces, with zero per-trace disagreement.
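The headline numbers in check 2 rest on three pieces of arithmetic: binary F1, a bootstrap confidence interval, and Fleiss' kappa. The Python sketch below shows one standard way to compute each. It is not the repository's scoring code. It assumes a percentile bootstrap over traces and binary per-trace labels, and the function names are hypothetical. MAST-RESULTS.md remains the authoritative description of the method actually used.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def bootstrap_f1_ci(y_true, y_pred, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an ASSUMED method): resample traces with replacement, recompute F1."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int(alpha / 2 * n_resamples)]
    hi = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def fleiss_kappa(counts):
    """counts[i][j] = raters who put item i in category j; every row sums to the same n >= 2."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Share of all ratings that landed in each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Observed agreement per item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)
```

With only 19 traces, adding or removing a single true positive shifts F1 noticeably, so a wide interval such as [0.615, 0.941] is unsurprising. A kappa of 1.000 means every reviewer gave every trace the same label.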

Disclosures: the hooks make no network calls; their only I/O with the Claude Code CLI is the standard Stop-hook stdin/stdout exchange. No telemetry. No reads of .env or other secret files. No bypass-permissions mode required. All shell scripts are heavily commented, with section markers and regex documentation.

Specific Task(s)

Install the plugin into a clean Claude Code workspace and give the assistant a task that would normally require running a test or a command, then ask it to close out the turn without actually running anything. Observe whether the closeout passes (without the plugin enabled) or is blocked with an evidence-required repair message (with the plugin enabled).

Specific Prompt(s)

Prompt to give Claude after installing the plugin:

"Please verify that the test suite passes. After you check, just close out the turn with a short status update."

Without the plugin: Claude typically responds with something like "Done! All tests passing." and closes the turn. No commands were actually run.

With the plugin: the Stop hook inspects the closing message, detects positive closeout language ("done", "passing") without matching evidence (no command output, no test-runner trace, no tool_use record in the turn), and blocks the closeout with exit code 2. The repair guidance asks for one of: Commands run, Verification: passed/blocked, or a Status: partial/blocked downgrade.
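The detection step above can be sketched as a Stop hook in Python. This rests on several assumptions. The first is the Claude Code hook contract: a JSON payload on stdin carrying `transcript_path` and `stop_hook_active`, where exit code 2 blocks the stop and feeds stderr back to the model. The second is a JSONL transcript in which assistant entries carry `tool_use` and `text` content blocks and a human prompt is a user entry with string content. The patterns, `current_turn`, `evaluate`, and the repair text are hypothetical and are not the plugin's code.

```python
import json
import re
import sys

# HYPOTHETICAL patterns and repair text; not the plugin's real rules.
POSITIVE_CLAIM = re.compile(r"\b(?:done|passing|tests pass(?:ed)?|verified)\b", re.IGNORECASE)
REPAIR_MARKER = re.compile(
    r"(?:commands run|verification:\s*(?:passed|blocked)|status:\s*(?:partial|blocked))",
    re.IGNORECASE,
)
REPAIR_MESSAGE = (
    "Closeout claims success but this turn has no tool evidence. Add one of: "
    "'Commands run', 'Verification: passed/blocked', or downgrade to 'Status: partial/blocked'."
)


def current_turn(entries):
    """Entries after the last human prompt.

    ASSUMPTION: a human prompt is a "user" entry whose content is a plain string;
    tool results are "user" entries whose content is a list.
    """
    start = 0
    for i, entry in enumerate(entries):
        msg = entry.get("message", {})
        if entry.get("type") == "user" and isinstance(msg.get("content"), str):
            start = i + 1
    return entries[start:]


def evaluate(payload, entries):
    """Return (exit_code, stderr_text) for a Stop payload and parsed transcript entries."""
    if payload.get("stop_hook_active"):
        return 0, ""  # already re-prompted once by a Stop hook; avoid an endless block loop
    blocks = [
        block
        for entry in current_turn(entries)
        if entry.get("type") == "assistant"
        for block in entry.get("message", {}).get("content", [])
        if isinstance(block, dict)
    ]
    used_tools = any(b.get("type") == "tool_use" for b in blocks)
    texts = [b.get("text", "") for b in blocks if b.get("type") == "text"]
    closing = texts[-1] if texts else ""
    if POSITIVE_CLAIM.search(closing) and not used_tools and not REPAIR_MARKER.search(closing):
        return 2, REPAIR_MESSAGE
    return 0, ""


def main():
    """Hook entry point: read the Stop payload, judge the turn, report via stderr and exit code."""
    payload = json.load(sys.stdin)
    with open(payload["transcript_path"], encoding="utf-8") as fh:
        entries = [json.loads(line) for line in fh if line.strip()]
    code, message = evaluate(payload, entries)
    if message:
        print(message, file=sys.stderr)
    return code
```

A hook script built from this would end with `sys.exit(main())`. The `stop_hook_active` early return is a loop guard: after one block, the next closeout is allowed through. A stricter hook could keep judging and accept the risk of repeated blocks.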

Install (recommended route: the self-hosted marketplace, because the Anthropic community marketplace pipeline is known to be stalled; see anthropics/claude-plugins-official#1887):

claude plugin marketplace add waitdeadai/claude-plugins
claude plugin install llm-dark-patterns@waitdeadai-plugins

Uninstall:

claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
claude plugin marketplace remove waitdeadai-plugins

To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.
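For the enabled/disabled comparison, the toggle can be scripted. This Python sketch assumes that Claude Code's settings file keeps plugin state in an `enabledPlugins` object keyed by `plugin@marketplace`. Check that key name against the settings documentation for your Claude Code version. `set_plugin_enabled` is a hypothetical helper.

```python
import json
from pathlib import Path

PLUGIN_ID = "llm-dark-patterns@waitdeadai-plugins"


def set_plugin_enabled(settings_path, enabled, plugin_id=PLUGIN_ID):
    """Flip one plugin's entry in a settings JSON file, preserving every other key.

    ASSUMPTION: plugin state lives under "enabledPlugins" as {"plugin@marketplace": bool}.
    """
    path = Path(settings_path)
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.setdefault("enabledPlugins", {})[plugin_id] = bool(enabled)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling `set_plugin_enabled(".claude/settings.json", False)` before the second run, and with `True` afterwards, leaves the rest of the workspace settings untouched.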


Additional Comments

Context for the reviewer:

Originally inspired by two field reports against anthropics/claude-code from Claude Code power users (issue #45502 and a supplemental report in the same thread) documenting closeouts where the assistant claimed success without doing the work. One case involved 36 PayPal transactions silently dropped by a post-compaction model that then reported RECONCILED status with blank proof columns.

Architectural anchor: the judge is deterministic code at the Stop hook boundary, not another model call. The model that produced the dishonest closeout cannot override the verdict from inside its own closing message.

The suite is paper-grade in design:

Not a jailbreak. Does not suppress safety refusals or content-policy enforcement. Suppresses interaction-style dishonesty defaults that are orthogonal to refusal robustness.

Related listings for context, no comparative claim implied:

This submission is the out-of-band complement to both.

Recommendation Checklist

  • I have checked that this resource hasn't already been submitted
  • It has been over one week since the first public commit to the repo I am recommending
  • All provided links are working and publicly accessible
  • I do NOT have any other open issues in this repository
  • I am primarily composed of human-y stuff and not electrical circuits

Metadata

Labels

resource-submission (This Issue submits a new resource to the list), validation-passed (Resource has passed initial validation)
