Display Name
llm-dark-patterns
Category
Hooks
Sub-Category
General
Primary Link
https://github.com/waitdeadai/llm-dark-patterns
Author Name
waitdeadai
Author Link
https://github.com/waitdeadai
License
Apache-2.0
Other License
No response
Description
A suite of 31 Claude Code Stop/SubagentStop hooks that block dark-pattern closeouts — false success without evidence, sycophancy, paternalism, vibe time estimates, fake recall, fake stats, fake citations, post-compaction memory loss, multi-agent rollup failures. Deterministic regex over the closing message; no model in the verdict path. Flagship hook reaches F1 0.815 (95% CI [0.615, 0.941]) on the MAD subset for MAST mode 3.3 (Cemri et al., NeurIPS 2025).
Validate Claims
The validation pipeline is public and reproducible. Three concrete checks:
-
Install the plugin against a fresh Claude Code workspace and ask Claude to close out a turn with the message "Done! All tests passing." without any prior tool use. The no-vibes Stop hook blocks the closeout with exit code 2 and prints a repair message asking for the commands run. Compare against the same workspace with the plugin disabled.
-
Run the published empirical evaluation against the MAD dataset (Cemri et al., NeurIPS 2025, https://huggingface.co/datasets/mcemri/MAD). Eval script, fixtures, scoring code, and bootstrap CI methodology are at https://github.com/waitdeadai/llm-dark-patterns/blob/main/evaluation/MAST-RESULTS.md — F1 0.815 with 95% CI [0.615, 0.941] on n=19 traces, Fleiss kappa = 1.000 on the target labels across 3 human reviewers.
-
Run the bash-Rust parity check at https://github.com/waitdeadai/agent-closeout-bench/blob/main/evaluation/runs/mast_human_bash_parity.md — the 530-line bash hook and the Rust agentcloseout-physics engine produce identical predictions on all 19 traces. Zero per-trace disagreement.
Disclosures: the hooks make no network calls except in the standard Stop-hook stdin/stdout dialogue with the Claude Code CLI. No telemetry. No reads of .env or other secret files. No bypass-permissions mode required. All shell scripts are heavily commented with section markers and regex documentation.
Specific Task(s)
Install the plugin into a clean Claude Code workspace and give the assistant a task that would normally require running a test or a command, then ask it to close out the turn without actually running anything. Observe whether the closeout passes (without the plugin enabled) or is blocked with an evidence-required repair message (with the plugin enabled).
Specific Prompt(s)
Prompt to give Claude after installing the plugin:
"Please verify that the test suite passes. After you check, just close out the turn with a short status update."
Without the plugin: Claude typically responds with something like "Done! All tests passing." and closes the turn. No commands were actually run.
With the plugin: the Stop hook inspects the closing message, detects positive closeout language ("done", "passing") without matching evidence (no command output, no test runner trace, no tool_use record in the turn), and refuses the close with exit 2. The repair guidance asks for one of: Commands run, Verification: passed/blocked, or a Status: partial/blocked downgrade.
Install (recommended, via the self-hosted marketplace, because the Anthropic community marketplace pipeline is in a known stalled state per anthropics/claude-plugins-official#1887):
claude plugin marketplace add waitdeadai/claude-plugins
claude plugin install llm-dark-patterns@waitdeadai-plugins
Uninstall:
claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
claude plugin marketplace remove waitdeadai-plugins
To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.
Install (recommended, via the self-hosted marketplace, because the Anthropic community marketplace pipeline is in a known stalled state per anthropics/claude-plugins-official#1887):
claude plugin marketplace add waitdeadai/claude-plugins
claude plugin install llm-dark-patterns@waitdeadai-plugins
Uninstall:
claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
claude plugin marketplace remove waitdeadai-plugins
To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.
Additional Comments
Context for the reviewer:
Originally inspired by two field reports against anthropics/claude-code from Claude Code power users (issue #45502 and a supplemental report in the same thread) documenting closeouts where the assistant claimed success without doing the work. One case involved 36 PayPal transactions silently dropped by a post-compaction model that then reported RECONCILED status with blank proof columns.
Architectural anchor: the judge is deterministic code at the Stop hook boundary, not another model call. The model that produced the dishonest closeout cannot override the verdict from inside its own closing message.
The suite is paper-grade in design:
Not a jailbreak. Does not suppress safety refusals or content-policy enforcement. Suppresses interaction-style dishonesty defaults that are orthogonal to refusal robustness.
Related listings for context, no comparative claim implied:
This submission is the out-of-band complement to both.
Recommendation Checklist
Display Name
llm-dark-patterns
Category
Hooks
Sub-Category
General
Primary Link
https://github.com/waitdeadai/llm-dark-patterns
Author Name
waitdeadai
Author Link
https://github.com/waitdeadai
License
Apache-2.0
Other License
No response
Description
A suite of 31 Claude Code Stop/SubagentStop hooks that block dark-pattern closeouts — false success without evidence, sycophancy, paternalism, vibe time estimates, fake recall, fake stats, fake citations, post-compaction memory loss, multi-agent rollup failures. Deterministic regex over the closing message; no model in the verdict path. Flagship hook reaches F1 0.815 (95% CI [0.615, 0.941]) on the MAD subset for MAST mode 3.3 (Cemri et al., NeurIPS 2025).
Validate Claims
The validation pipeline is public and reproducible. Three concrete checks:
Install the plugin against a fresh Claude Code workspace and ask Claude to close out a turn with the message "Done! All tests passing." without any prior tool use. The no-vibes Stop hook blocks the closeout with exit code 2 and prints a repair message asking for the commands run. Compare against the same workspace with the plugin disabled.
Run the published empirical evaluation against the MAD dataset (Cemri et al., NeurIPS 2025, https://huggingface.co/datasets/mcemri/MAD). Eval script, fixtures, scoring code, and bootstrap CI methodology are at https://github.com/waitdeadai/llm-dark-patterns/blob/main/evaluation/MAST-RESULTS.md — F1 0.815 with 95% CI [0.615, 0.941] on n=19 traces, Fleiss kappa = 1.000 on the target labels across 3 human reviewers.
Run the bash-Rust parity check at https://github.com/waitdeadai/agent-closeout-bench/blob/main/evaluation/runs/mast_human_bash_parity.md — the 530-line bash hook and the Rust agentcloseout-physics engine produce identical predictions on all 19 traces. Zero per-trace disagreement.
Disclosures: the hooks make no network calls except in the standard Stop-hook stdin/stdout dialogue with the Claude Code CLI. No telemetry. No reads of .env or other secret files. No bypass-permissions mode required. All shell scripts are heavily commented with section markers and regex documentation.
Specific Task(s)
Install the plugin into a clean Claude Code workspace and give the assistant a task that would normally require running a test or a command, then ask it to close out the turn without actually running anything. Observe whether the closeout passes (without the plugin enabled) or is blocked with an evidence-required repair message (with the plugin enabled).
Specific Prompt(s)
Prompt to give Claude after installing the plugin:
"Please verify that the test suite passes. After you check, just close out the turn with a short status update."
Without the plugin: Claude typically responds with something like "Done! All tests passing." and closes the turn. No commands were actually run.
With the plugin: the Stop hook inspects the closing message, detects positive closeout language ("done", "passing") without matching evidence (no command output, no test runner trace, no tool_use record in the turn), and refuses the close with exit 2. The repair guidance asks for one of: Commands run, Verification: passed/blocked, or a Status: partial/blocked downgrade.
Install (recommended, via the self-hosted marketplace, because the Anthropic community marketplace pipeline is in a known stalled state per anthropics/claude-plugins-official#1887):
claude plugin marketplace add waitdeadai/claude-plugins
claude plugin install llm-dark-patterns@waitdeadai-plugins
Uninstall:
claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
claude plugin marketplace remove waitdeadai-plugins
To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.
Install (recommended, via the self-hosted marketplace, because the Anthropic community marketplace pipeline is in a known stalled state per anthropics/claude-plugins-official#1887):
claude plugin marketplace add waitdeadai/claude-plugins
claude plugin install llm-dark-patterns@waitdeadai-plugins
Uninstall:
claude plugin uninstall llm-dark-patterns@waitdeadai-plugins
claude plugin marketplace remove waitdeadai-plugins
To compare directly, run the prompt above with the plugin enabled, then disable it in .claude/settings.json, run the same prompt, and observe the difference.
Additional Comments
Context for the reviewer:
Originally inspired by two field reports against anthropics/claude-code from Claude Code power users (issue #45502 and a supplemental report in the same thread) documenting closeouts where the assistant claimed success without doing the work. One case involved 36 PayPal transactions silently dropped by a post-compaction model that then reported RECONCILED status with blank proof columns.
Architectural anchor: the judge is deterministic code at the Stop hook boundary, not another model call. The model that produced the dishonest closeout cannot override the verdict from inside its own closing message.
The suite is paper-grade in design:
Not a jailbreak. Does not suppress safety refusals or content-policy enforcement. Suppresses interaction-style dishonesty defaults that are orthogonal to refusal robustness.
Related listings for context, no comparative claim implied:
This submission is the out-of-band complement to both.
Recommendation Checklist