[claude-hackernews] Reply draft: $38k Bedrock runaway, LLM-call vs tool-call layer (id=47933355)#43

Open
NiveditJain wants to merge 1 commit into main from hn-bedrock-runaway-llm-vs-hook-layer-47933355

Conversation


@NiveditJain NiveditJain commented May 4, 2026

Discovery path

Browser-driven sweep across /ask, /show (pages 1-2), /news, /newest, plus a series of hn.algolia.com searches with jittered 4-9s delays between page loads (query terms: agent sandbox, agent guardrails, tool call policy, claude code permissions, claude code hooks, claude code dies, claude code broke, agent autonomous production, agent committed code, agentic coding production, mcp gateway, claude force push, agent rm -rf, skip-permissions, cursor rules, anthropic agent, AGENTS.md claude, PreToolUse, claude skills plugin, claude code secrets, agent environment, claude code deleted, agent vibe coding ruined).

All recent agent-policy / hook-manager / sandbox / gateway Show HNs from the past week were already covered by open or merged PRs in this repo (#11, #13, #14, #17, #20, #22, #23, #24, #25, #26, #27, #28, #29, #30, #31, #32, #33, #34, #35, #36, #37, #38, #39, #40, #41, #42). The fresh, uncovered candidates split into two buckets:

  • cross-domain Show HNs that don't pass the FailProof thread-fit gate (AI CAD Harness 47977694, Pollen 47961935, DAC 47949066, Dirac 47920787), and
  • pure vent / meta threads also blocked by the gate ("Agentic Coding Is a Trap" 48002442, Codex-vs-Claude-Code 47945185, Claude Code postmortem reflection 47957402, Loom 47936461).

The remaining viable target was a self-post by Zephyr0x explicitly asking for guardrail recommendations after a runaway-cost incident.

Thread

  • Story: https://news.ycombinator.com/item?id=47933355 - "$38k AWS Bedrock bill caused by a simple prompt caching miss"
  • 5 days old, 8 points, 0 comments, reply form open.
  • OP describes a coding-agent workflow (Droid -> LiteLLM -> Bedrock -> Opus 4.6) that ran with prompt caching mostly missing; $37.9k of $37.9k gross was uncached input tokens compounding across many turns. OP closes with a direct ask: "Has anyone here built reliable guardrails for this? IAM deny rules? API gateways? token-budget proxies? per-workflow kill switches?"
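For reference on the "IAM deny rules" part of OP's ask: a minimal sketch of a hard deny on the Bedrock invoke actions. The region and model pattern are placeholders; note that an IAM deny is an off-switch (a break-glass kill switch), not a rate or budget cap.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyExpensiveModelInvoke",
      "Effect": "Deny",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-*"
    }
  ]
}
```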

Why this thread (and the gate it sits on)

OP's pain is fundamentally at the LLM-call layer, not the tool-call layer that FailProof addresses. The strict reading of the thread-fit gate in INSTRUCTIONS.md ("the parent's pain is at the model layer ... FailProof does not solve model regressions; saying so reads as opportunism") would skip this thread.

The reply is shaped to honor that gate by:

  1. Leading with the right answer for OP's actual problem: a LiteLLM proxy with max_input_tokens per route, or an IAM Bedrock rate cap. (These are exactly the mechanisms OP listed - the comment validates the OP's instinct rather than redirecting them at FailProof.)
  2. Explicitly stating FailProof's hook layer cannot see token spend on a single Bedrock call. No hand-waving the layer mismatch.
  3. Mentioning FailProof only for the narrow per-workflow kill-switch slice OP listed in the same sentence as the others - and being clear that this bounds turn count, not token spend per turn, so it does not save you from a single megaturn.
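To make the first point concrete, a hypothetical LiteLLM proxy config for the per-route input-token cap might look like the sketch below. The route name, model id, and ceiling are placeholders, and the field names (`model_info.max_input_tokens`, `enable_pre_call_checks`) are recalled from LiteLLM's proxy/router docs rather than verified - check them against the current documentation before use.

```yaml
# Hypothetical sketch - verify field names against LiteLLM's proxy docs
model_list:
  - model_name: coding-agent-opus            # placeholder route name
    litellm_params:
      model: bedrock/anthropic.claude-opus   # placeholder model id
    model_info:
      max_input_tokens: 50000                # per-request input ceiling for this route

router_settings:
  enable_pre_call_checks: true               # enforce model_info limits before the call
```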

If that framing still reads as too pitchy on review, the right call is to abandon the draft; the reviewing user will judge.

Reply payload (final)

(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)

The expensive line item was uncached input tokens compounding inside the LLM call, so the cap has to live at the layer that sees those tokens: LiteLLM with max_input_tokens per route, or an IAM Bedrock rate cap. Budget alarms run after the fact. Claude Code's hook layer (where FailProof sits) only sees the tool-call seam: Bash, Read, Write, MCP, etc. It cannot reason about token spend on a single Bedrock call. Where the hook layer does help is the per-workflow kill switch you listed: a custom PreToolUse policy that counts tool invocations against a per-session ceiling and denies past it. That bounds how many turns a runaway can attempt; it does not bound a single megaturn that ships 5GB of context.
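For the reviewer's reference (not part of the HN comment above): the per-session tool-invocation ceiling the payload describes could be sketched roughly as below. This assumes Claude Code's documented hook contract - a JSON payload (including `session_id`) piped to the hook on stdin, exit code 0 to allow, exit code 2 to block with stderr fed back to the model. The counter directory, ceiling value, and function names are arbitrary illustrative choices.

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook that denies tool calls past a per-session ceiling.

Assumes Claude Code's hook contract (verify against the hooks docs): the hook
receives a JSON payload with a session_id on stdin, and exit code 2 blocks the
tool call while exit code 0 allows it.
"""
import json  # a real hook would json.load(sys.stdin); see __main__ below
import sys
from pathlib import Path

CEILING = 200                            # arbitrary per-session tool-call budget
COUNTS = Path("/tmp/toolcall-counts")    # illustrative counter location


def bump(session_id: str) -> int:
    """Increment and return this session's tool-call count (file-backed)."""
    COUNTS.mkdir(parents=True, exist_ok=True)
    f = COUNTS / f"{session_id}.count"
    count = int(f.read_text()) + 1 if f.exists() else 1
    f.write_text(str(count))
    return count


def handle(payload: dict, ceiling: int = CEILING) -> int:
    """Return the exit code a hook would use: 0 = allow, 2 = deny."""
    count = bump(payload.get("session_id", "unknown"))
    if count > ceiling:
        print(f"tool-call ceiling reached ({count} > {ceiling})", file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    # In a real hook Claude Code pipes the payload in:
    #     sys.exit(handle(json.load(sys.stdin)))
    # Simulated payload here so the sketch runs standalone:
    print(handle({"session_id": "demo", "tool_name": "Bash"}))
```

As the payload itself says, this bounds turn count only; a single call that ships a huge context sails straight past it.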

Anti-pitch checklist

  • 128 words (target ~150) ✓
  • One disclosure line at top, single repo URL ✓
  • No install commands, no npm install -g failproofai, no failproofai policies --install ✓
  • No FailProof built-in policy names listed (the only comma-list, Bash, Read, Write, MCP, is Claude Code tool names) ✓
  • No three-scope merge / 39-policies / Commons Clause / ~/.failproofai/ / dashboard / Agent Monitor / version-number talk ✓
  • No marketing connectives ("we built X for", "the gap we built X for", "...and ~30 more") ✓
  • ASCII punctuation only: hyphens for compound nouns (per-workflow, per-session, tool-call, kill-switch), colons for the layer-statement structure, semicolon for the bounds-X-not-Y contrast. No em-dash, en-dash, fancy ellipsis, curly quote, or unicode arrow. Verified against the file. ✓
  • Single body paragraph, all on-topic, would stand on its own with the FailProof mention removed ✓
  • Cross-thread duplicate guard: scanned local drafts/ and comments/ for item?id=47933355 and all open PR bodies via gh pr view --json body; no prior coverage. The LLM-call vs tool-call layer framing does not paraphrase any prior FailProof reply in this repo ✓

Posting workflow

This branch is the draft only - per CLAUDE.md "Comments via PR (never direct post)", Claude does not submit to HN. After review, the user posts manually to https://news.ycombinator.com/item?id=47933355 (textarea is the bottom of the page; copy the fenced block from the draft file verbatim) and merges this PR. If the user wants the comment-permalink logged into comments/, they ask explicitly afterwards - the draft does not preemptively write there.

Summary by CodeRabbit

  • Documentation
    • Added documentation addressing AWS Bedrock cost management scenarios and associated cost control mechanisms.

…l layer (id=47933355)

OP asked for guardrails after a $37.9k uncached-input-tokens runaway through
a coding-agent stack (Droid -> LiteLLM -> Bedrock -> Opus 4.6). Reply names
the layer that actually owns the cap (LiteLLM max_input_tokens / IAM rate),
and offers FailProof only for the narrow per-workflow kill-switch slice
(custom PreToolUse counter) the OP explicitly listed - without pretending
the hook layer can see token spend.

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

A single Markdown draft file is added to document an HN reply explaining where AWS Bedrock token-cost caps should be enforced—at the LLM-call layer (e.g., LiteLLM limits) rather than at tool-invocation seams—and positioning FailProof's PreToolUse mechanism as a per-workflow kill-switch layer.

Changes

**New HN Reply Draft**

| Layer / File(s) | Summary |
| --- | --- |
| Draft Content: `drafts/2026-05-04T003422Z.md` | Adds complete Markdown draft containing disclosure, explanation of LLM-call vs. tool-call layer guardrail mismatch for token caps, positioning FailProof's PreToolUse counting/deny mechanism, and notes on thread context and caching implications. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes


Poem

A rabbit types with care so keen,
Of caps that guard the cloud machine,
Not tool-call cracks, but LLM's own gate,
FailProof stands guard—no runaway fate! 🐰✨

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: a Hacker News reply draft addressing an AWS Bedrock cost escalation issue, specifically discussing the distinction between LLM-call and tool-call layer guardrails. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@drafts/2026-05-04T003422Z.md`:
- Around line 18-22: The fenced code block that starts with ``` on the quoted
discussion block is missing a language tag and triggers markdownlint MD040;
update that opening fence to include a language hint (e.g., change ``` to
```text or ```md) so the block is explicitly tagged and the linter no longer
reports MD040.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4a1090f4-1701-4eb2-b48a-707a4b20e070

📥 Commits

Reviewing files that changed from the base of the PR and between ebbce06 and 55f6bf4.

📒 Files selected for processing (1)
  • drafts/2026-05-04T003422Z.md

Comment on lines +18 to +22
```
(disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)

The expensive line item was uncached input tokens compounding inside the LLM call, so the cap has to live at the layer that sees those tokens: LiteLLM with `max_input_tokens` per route, or an IAM Bedrock rate cap. Budget alarms run after the fact. Claude Code's hook layer (where FailProof sits) only sees the tool-call seam: Bash, Read, Write, MCP, etc. It cannot reason about token spend on a single Bedrock call. Where the hook layer does help is the per-workflow kill switch you listed: a custom PreToolUse policy that counts tool invocations against a per-session ceiling and denies past it. That bounds how many turns a runaway can attempt; it does not bound a single megaturn that ships 5GB of context.
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced block to satisfy markdownlint.

Line 18 opens a fenced block without a language, which triggers MD040. Add text (or md) to keep lint clean.

Suggested fix

````diff
-```
+```text
 (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)

 The expensive line item was uncached input tokens compounding inside the LLM call, so the cap has to live at the layer that sees those tokens: LiteLLM with `max_input_tokens` per route, or an IAM Bedrock rate cap. Budget alarms run after the fact. Claude Code's hook layer (where FailProof sits) only sees the tool-call seam: Bash, Read, Write, MCP, etc. It cannot reason about token spend on a single Bedrock call. Where the hook layer does help is the per-workflow kill switch you listed: a custom PreToolUse policy that counts tool invocations against a per-session ceiling and denies past it. That bounds how many turns a runaway can attempt; it does not bound a single megaturn that ships 5GB of context.
````

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 18-18: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>


