[claude-hackernews] Reply draft: $38k Bedrock runaway, LLM-call vs tool-call layer (id=47933355)#43
Conversation
…l layer (id=47933355) OP asked for guardrails after a $37.9k uncached-input-tokens runaway through a coding-agent stack (Droid -> LiteLLM -> Bedrock -> Opus 4.6). Reply names the layer that actually owns the cap (LiteLLM max_input_tokens / IAM rate), and offers FailProof only for the narrow per-workflow kill-switch slice (custom PreToolUse counter) the OP explicitly listed - without pretending the hook layer can see token spend.
📝 WalkthroughWalkthroughA single Markdown draft file is added to document an HN reply explaining where AWS Bedrock token-cost caps should be enforced—at the LLM-call layer (e.g., LiteLLM limits) rather than at tool-invocation seams—and positioning FailProof's PreToolUse mechanism as a per-workflow kill-switch layer. ChangesNew HN Reply Draft
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@drafts/2026-05-04T003422Z.md`:
- Around line 18-22: The fenced code block that starts with ``` on the quoted
discussion block is missing a language tag and triggers markdownlint MD040;
update that opening fence to include a language hint (e.g., change ``` to
```text or ```md) so the block is explicitly tagged and the linter no longer
reports MD040.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4a1090f4-1701-4eb2-b48a-707a4b20e070
📒 Files selected for processing (1)
drafts/2026-05-04T003422Z.md
| ``` | ||
| (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai) | ||
|
|
||
| The expensive line item was uncached input tokens compounding inside the LLM call, so the cap has to live at the layer that sees those tokens: LiteLLM with `max_input_tokens` per route, or an IAM Bedrock rate cap. Budget alarms run after the fact. Claude Code's hook layer (where FailProof sits) only sees the tool-call seam: Bash, Read, Write, MCP, etc. It cannot reason about token spend on a single Bedrock call. Where the hook layer does help is the per-workflow kill switch you listed: a custom PreToolUse policy that counts tool invocations against a per-session ceiling and denies past it. That bounds how many turns a runaway can attempt; it does not bound a single megaturn that ships 5GB of context. | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced block to satisfy markdownlint.
Line 18 opens a fenced block without a language, which triggers MD040. Add text (or md) to keep lint clean.
Suggested fix
-```
+```text
(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
The expensive line item was uncached input tokens compounding inside the LLM call, so the cap has to live at the layer that sees those tokens: LiteLLM with `max_input_tokens` per route, or an IAM Bedrock rate cap. Budget alarms run after the fact. Claude Code's hook layer (where FailProof sits) only sees the tool-call seam: Bash, Read, Write, MCP, etc. It cannot reason about token spend on a single Bedrock call. Where the hook layer does help is the per-workflow kill switch you listed: a custom PreToolUse policy that counts tool invocations against a per-session ceiling and denies past it. That bounds how many turns a runaway can attempt; it does not bound a single megaturn that ships 5GB of context.</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 18-18: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @drafts/2026-05-04T003422Z.md around lines 18 - 22, The fenced code block
that starts with on the quoted discussion block is missing a language tag and triggers markdownlint MD040; update that opening fence to include a language hint (e.g., change to text or md) so the block is explicitly tagged
and the linter no longer reports MD040.
</details>
<!-- fingerprinting:phantom:triton:hawk:3d885039-18ee-4ee4-b0de-1ba68a5fb715 -->
<!-- d98c2f50 -->
<!-- This is an auto-generated comment by CodeRabbit -->
Discovery path
Browser-driven sweep across
/ask,/show(pages 1-2),/news,/newest, plus a series ofhn.algolia.comsearches with jittered 4-9s delays between page loads (agent sandbox,agent guardrails,tool call policy,claude code permissions,claude code hooks,claude code dies,claude code broke,agent autonomous production,agent committed code,agentic coding production,mcp gateway,claude force push,agent rm -rf,skip-permissions,cursor rules,anthropic agent,AGENTS.md claude,PreToolUse,claude skills plugin,claude code secrets,agent environment,claude code deleted,agent vibe coding ruined). All recent agent-policy / hook-manager / sandbox / gateway Show HN's in the past week were already covered by open / merged PRs in this repo (#11, #13, #14, #17, #20, #22, #23, #24, #25, #26, #27, #28, #29, #30, #31, #32, #33, #34, #35, #36, #37, #38, #39, #40, #41, #42). The fresh / uncovered candidates split into two buckets: cross-domain Show HN's that don't pass the FailProof thread-fit gate (AI CAD Harness 47977694, Pollen 47961935, DAC 47949066, Dirac 47920787), and pure vent / meta threads also blocked by the gate ("Agentic Coding Is a Trap" 48002442, Codex-vs-Claude-Code 47945185, Claude Code postmortem reflection 47957402, Loom 47936461). The remaining viable target was a self-post byZephyr0xexplicitly asking for guardrail recommendations after a runaway-cost incident.Thread
Why this thread (and the gate it sits on)
OP's pain is fundamentally at the LLM-call layer, not the tool-call layer that FailProof addresses. The strict reading of the thread-fit gate in
INSTRUCTIONS.md("the parent's pain is at the model layer ... FailProof does not solve model regressions; saying so reads as opportunism") would skip this thread.The reply is shaped to honor that gate by:
max_input_tokensper route, or an IAM Bedrock rate cap. (These are exactly the mechanisms OP listed - the comment validates the OP's instinct rather than redirecting them at FailProof.)If that framing reads as still-too-pitch-y on review, the right call is to abandon the draft - the reviewing user will judge.
Reply payload (final)
Anti-pitch checklist
npm install -g failproofai, nofailproofai policies --install✓Bash, Read, Write, MCP, is Claude Code tool names) ✓~/.failproofai// dashboard / Agent Monitor / version-number talk ✓drafts/andcomments/foritem?id=47933355and all open PR bodies viagh pr view --json body; no prior coverage. The LLM-call vs tool-call layer framing does not paraphrase any prior FailProof reply in this repo ✓Posting workflow
This branch is the draft only - per
CLAUDE.md"Comments via PR (never direct post)", Claude does not submit to HN. After review, the user posts manually to https://news.ycombinator.com/item?id=47933355 (textarea is the bottom of the page; copy the fenced block from the draft file verbatim) and merges this PR. If the user wants the comment-permalink logged intocomments/, they ask explicitly afterwards - the draft does not preemptively write there.Summary by CodeRabbit