[claude-hackernews] Reply draft: TrainForgeTester Show HN, scenario-tests vs in-loop hook seam (id=48000135)#53
NiveditJain wants to merge 1 commit into main from
… hook seam (id=48000135)

Top-level reply on a fresh Show HN about deterministic scenario tests for tool-calling AI agents. Frames the test-time vs runtime-policy seam (scenarios catch enumerated regressions; PreToolUse hooks catch novel destructive call shapes the test didn't seed). Names exactly one built-in policy (block-rm-rf) tied to the OP's "wrong actions" framing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
📝 Walkthrough

This PR adds a draft markdown file containing a Show HN reply to a post about TrainForgeTester. The reply includes disclosure language, an argument comparing deterministic scenario testing with runtime policy enforcement, tailored guidance for the FailProof team, and contextual notes on thread positioning and engagement.

Changes: Draft Show HN Reply
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@drafts/2026-05-04T104002Z.md`:
- Around lines 37-41: The fenced code block starting with "(disclosure: I work on
FailProof AI: https://github.com/exospherehost/failproofai)" is missing a
language tag (MD040); update the opening fence from ``` to ```text so the block
is explicitly marked as plain text and the markdown linter stops flagging it.
Ensure only the opening triple backticks are changed to include "text" and leave
the block content and closing fence unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 144dd105-8089-425e-9113-b76083baca37
📒 Files selected for processing (1)
drafts/2026-05-04T104002Z.md
````markdown
```
(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)

Scenario tests and an in-loop policy layer feel like complements with different remits. Scenarios test correctness ("for this prompt, the agent should call A then B not C") - they catch what you can enumerate. What they can't catch is the long tail in production: a regression introduces a new path the test didn't seed, and the agent reaches a destructive call you wouldn't have predicted. A PreToolUse hook is the catch-net for that tail; it intercepts based on the shape of the call about to fire, not on whether a matching scenario exists. Something like block-rm-rf denies any bash call whose text matches rm -rf regardless of which prompt got the agent there. Tests gate intended behaviors, hooks gate always-wrong call shapes - and shipping both layers together is more honest than either alone.
```
````
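The reply's core mechanism, a PreToolUse hook that decides from the shape of the pending call alone, fits in a few lines. This is a hypothetical Python stand-in, not FailProof's `block-rm-rf` source. It assumes Claude Code's hook contract (a JSON event on stdin carrying `tool_name` and `tool_input`, exit code 2 to block, stderr returned to the agent), and the regex is an illustrative approximation of "matches rm -rf"; `decide` and `run_hook` are invented names.

```python
from __future__ import annotations

import json
import re
from typing import TextIO

# Illustrative pattern for a block-rm-rf style policy (an assumption, not
# FailProof's rule): `rm`, whitespace, then one flag cluster that contains
# both r/R and f, e.g. -rf, -fr, -Rf, -rfv.
RM_RF = re.compile(r"\brm\s+-[A-Za-z]*([rR][A-Za-z]*f|f[A-Za-z]*[rR])")


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event.

    Assumes Claude Code's hook input shape ("tool_name", "tool_input") and
    its convention that exit code 2 blocks the pending tool call.
    """
    if event.get("tool_name") != "Bash":
        return 0, ""  # this policy only inspects shell commands
    command = (event.get("tool_input") or {}).get("command", "")
    if RM_RF.search(command):
        # The decision uses only the call about to fire; no scenario is consulted.
        return 2, f"blocked by block-rm-rf sketch: {command!r}"
    return 0, ""


def run_hook(stdin: TextIO, stderr: TextIO) -> int:
    """Read one JSON event, explain any block on stderr, return the exit code."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)  # on exit code 2, stderr is fed back to the agent
    return code
```

As a hook script, import `sys` and end the file with `sys.exit(run_hook(sys.stdin, sys.stderr))`. A regex over command text is deliberately blunt: this version misses split flags such as `rm -r -f` and long options such as `rm --recursive --force`, so a production policy would need to normalize those first.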
Add a language tag to the fenced block to satisfy markdown lint.
Line 37 opens a code fence without a language, which triggers MD040. Please mark it as plain text.
Suggested fix
```diff
-```
+```text
 (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
 Scenario tests and an in-loop policy layer feel like complements with different remits. Scenarios test correctness ("for this prompt, the agent should call A then B not C") - they catch what you can enumerate. What they can't catch is the long tail in production: a regression introduces a new path the test didn't seed, and the agent reaches a destructive call you wouldn't have predicted. A PreToolUse hook is the catch-net for that tail; it intercepts based on the shape of the call about to fire, not on whether a matching scenario exists. Something like block-rm-rf denies any bash call whose text matches rm -rf regardless of which prompt got the agent there. Tests gate intended behaviors, hooks gate always-wrong call shapes - and shipping both layers together is more honest than either alone.
```
</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 37-37: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
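The MD040 warning reduces to one condition: an opening fence needs an info string. A minimal Python sketch of that check, for illustration only and not markdownlint's implementation; `missing_fence_languages` is a hypothetical name, and the sketch handles backtick fences only (no `~~~` fences, blockquotes, or list-item indentation).

```python
from __future__ import annotations

import re

# An opening or closing fence: up to 3 spaces of indent, 3+ backticks, optional info string.
FENCE = re.compile(r"^ {0,3}(`{3,})(.*)$")


def missing_fence_languages(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening backtick fences with no info string."""
    flagged = []
    open_fence = None  # backtick run of the currently open block, if any
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE.match(line)
        if not m:
            continue
        ticks, info = m.group(1), m.group(2).strip()
        if open_fence is None:
            open_fence = ticks  # this line opens a block
            if not info:
                flagged.append(lineno)
        elif len(ticks) >= len(open_fence) and not info:
            # A closing fence is at least as long as the opener and has no info string;
            # anything else inside an open block is just content.
            open_fence = None
    return flagged
```

Run against the draft, the bare opening fence is flagged and the `text` version passes, which is exactly the one-line change the suggested fix makes.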
<!-- This is an auto-generated comment by CodeRabbit -->
Summary
… `block-rm-rf`) tied directly to the OP's "wrong actions" framing - no comma-list, no install command, no feature dump, no dashboard plug. ASCII punctuation only. ~135 words, in the working-shape band.

Discovery + thread URLs
Test plan
- … `block-rm-rf`), not a comma-list. No `npm install -g failproofai`, no `failproofai policies --install`, no `~/.failproofai/` paths, no three-scope or 39-policies talk, no dashboard plug.
- Body under ~150 words.
- Disclosure line lowercased "disclosure:" inside parens, single repo URL.
- … `[flagged]` or `[dead]` markers, OP not replied to anyone yet that would make my framing redundant.
- … `comments/` if/when you ask Claude to archive it with the permalink.

🤖 Generated with Claude Code
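Several test-plan items are mechanical enough to check before posting. A hypothetical pre-post helper, not part of this PR: `check_reply` and its messages are illustrative, while the disclosure line, forbidden strings, and 150-word ceiling are copied from the draft and the plan. The judgment items (no comma-list, no three-scope or 39-policies talk, no dashboard plug) still need a human read.

```python
from __future__ import annotations

import re

# Copied from the draft and the test plan; update if the draft changes.
DISCLOSURE = "(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)"
FORBIDDEN = (
    "npm install -g failproofai",
    "failproofai policies --install",
    "~/.failproofai/",
)


def check_reply(text: str, max_words: int = 150) -> list[str]:
    """Return the mechanical test-plan items the reply fails (empty list = pass)."""
    problems = []
    lines = text.strip().splitlines()
    # The disclosure must be the first line: lowercase "disclosure:" inside parens.
    if not lines or lines[0].strip() != DISCLOSURE:
        problems.append("first line is not the disclosure")
    if not text.isascii():
        problems.append("non-ASCII characters")
    # The word budget applies to the body, i.e. everything after the disclosure line.
    body_words = " ".join(lines[1:]).split()
    if len(body_words) > max_words:
        problems.append(f"body over {max_words} words")
    for needle in FORBIDDEN:
        if needle in text:
            problems.append(f"forbidden: {needle}")
    # "Single repo URL": exactly one link in the whole reply.
    if len(re.findall(r"https?://", text)) != 1:
        problems.append("expected exactly one URL")
    return problems
```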