[claude-hackernews] Reply draft: Capsule Bash Show HN, inline-eval escape hatch (id=48009460)#58

Open
NiveditJain wants to merge 1 commit into main from hn-capsule-bash-inline-eval-48009460

@NiveditJain NiveditJain commented May 4, 2026

Summary

Draft of a reply on the Show HN of Capsule Bash (sandboxed bash for agents, by mavdol04). The OP explicitly invites design feedback.

  • Discovery path: /newest sweep, then filtered for agent/sandbox/hook keywords. The thread surfaced ~20 minutes after submission, with 1 point and 2 comments at draft time. Algolia past-week searches for "claude code agent", "agent guardrails", "agent deleted", and "agent sandbox" all came up either empty or already covered by earlier reply-draft PRs: #17 (harness-outside-sandbox, PreToolUse firewall layer, id=47990675), #50 (Lilith-zero Show HN, transport vs hook layer, id=47875939), #11 (AgentPort vs runtime-hook layer, id=47950752), etc.
  • Thread fit (INSTRUCTIONS.md "Tone for discussing it on HN" gate): Show HN of an adjacent product (sandbox), OP soliciting design discussion. Reply leads with a substantive critique of the design (python3 -c / node -e as core commands re-open the freedom the sandbox tries to clamp down on, because Python/Node fs ops happen inside their own runtimes, not the bash interpreter), then offers the harness's PreToolUse hook as the complementary seam with one custom-policy snippet.
  • Anti-pitch checks: disclosure line at top, single substantive on-topic paragraph, one snippet (no built-in policy name listed alongside), no install command, no comma-list of policies, no version numbers, no dashboard plug, no two-link pattern. Body is ~145 words. ASCII-only punctuation throughout.
  • Duplicate scan: grep item?id=48009460 clean across drafts/ + comments/; cross-PR scan via gh pr diff clean.
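
The mechanical parts of the anti-pitch and duplicate checks above (ASCII-only punctuation, the rough word count, and the `item?id=` scan) could be scripted. The following is a minimal sketch, not code from this repository: the function names are hypothetical, and the duplicate check is a plain substring match, which is what the `grep` described above amounts to.

```typescript
// Hypothetical helpers for the pre-post checks; names are illustrative only.

/** True when the body contains only printable ASCII plus tab/newline/CR. */
function isAsciiOnly(body: string): boolean {
  return /^[\x09\x0A\x0D\x20-\x7E]*$/.test(body);
}

/** Whitespace-delimited word count, the rough measure behind "~145 words". */
function countWords(body: string): number {
  const trimmed = body.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

/** True when `text` already references the given HN item id (substring match). */
function referencesThread(text: string, itemId: string): boolean {
  return text.includes(`item?id=${itemId}`);
}
```

A caller would read each file under drafts/ and comments/ and pass its contents to `referencesThread`; file access is left out here so the sketch stays free of runtime-specific APIs.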

Files

  • drafts/2026-05-04T151151Z.md (the draft + insight + notes; **Status:** draft (pending manual post))

Thread URL

https://news.ycombinator.com/item?id=48009460

Test plan

  • Read the draft body in drafts/2026-05-04T151151Z.md
  • If posting, copy the fenced My reply block verbatim into the HN composer on the thread (top-level, replying to the story)
  • If flagged or rate-limited, leave the file in place and note the outcome here so the next session can adjust
  • After posting, optionally ask Claude to log the comment permalink under comments/<ts>.md and append the permalink to the HN: line in the draft
  • Merge PR after posting (merge = "I posted it")
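
Copying the reply block verbatim can also be done programmatically. The sketch below assumes the draft file marks the reply with a line containing "My reply" followed by a fenced block; that layout is inferred from the step above rather than confirmed, and `extractReplyBlock` is a hypothetical helper.

```typescript
// Hypothetical helper: the "My reply" marker and the draft layout are assumptions.
const FENCE = "`".repeat(3); // built at runtime so this snippet nests cleanly in Markdown

function extractReplyBlock(markdown: string, marker: string = "My reply"): string | null {
  const lines = markdown.split(/\r?\n/);
  const markerIndex = lines.findIndex((line) => line.includes(marker));
  if (markerIndex === -1) return null;

  let openIndex = -1;
  for (let i = markerIndex + 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (!line.startsWith(FENCE)) continue;
    if (openIndex === -1) {
      openIndex = i; // opening fence, possibly with a language tag
    } else {
      return lines.slice(openIndex + 1, i).join("\n"); // text between the fences
    }
  }
  return null; // marker found, but no complete fenced block after it
}
```

Because the function only checks that a fence line starts with three backticks, it keeps working whether or not the opening fence carries a language tag such as `text`.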

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Added a new draft covering sandbox security architecture and implementation strategies with policy examples for community discussion.

coderabbitai Bot commented May 4, 2026

Walkthrough

A new draft Markdown reply to a Show HN thread about Capsule Bash is added, discussing sandbox architecture limitations and proposing syscall-level or pre-parsing mitigations for inline -c/-e interpreter bypass, with a FailProof AI policy example and operational guidance.

Changes

Capsule Bash Security Draft Reply

| Layer / File(s) | Summary |
| --- | --- |
| Draft Content (drafts/2026-05-04T151151Z.md) | Adds a 50-line Markdown draft comprising metadata, thread summary, main reply with policy example, FailProof team guidance, and operational findings from the composition and browsing workflow. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 A draft in the burrow, so clever and neat,
Of sandboxes broken by -c and -e!
With hooks and with parsing, we'll seal up the seams,
While the FailProof team polishes policy dreams. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding a draft reply to a Hacker News Show HN post about Capsule Bash, with focus on the inline-eval escape hatch issue. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
drafts/2026-05-04T151151Z.md (2)

19-20: ⚡ Quick win

Qualify the python3 -c / node -e claim to avoid overstatement.

Right now it reads as if the reintroduction of filesystem runtime capabilities will occur unconditionally once those commands are “core commands”. If Capsule Bash could be configured to exclude Python/Node, or to block -c/-e in its own tool layer, this draft risks being perceived as an absolute bypass claim rather than an architectural observation.

Suggestion: tweak sentences to something like “If python3/node are available as inline evaluators inside the sandboxed executor…” so the critique stays accurate across implementations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@drafts/2026-05-04T151151Z.md` around lines 19 - 20, Update the paragraph that
asserts "python3 -c" and "node -e" re-open the sandbox to soften the absolute
claim: change wording to qualify that this happens only if those inline
evaluators are available or not blocked by the sandbox/tool layer (e.g., use
phrasing like “If python3/node are available as inline evaluators inside the
sandboxed executor…”), mention the alternative mitigations (blocking -c/-e at
the Capsule Bash tool layer or intercepting at syscall layer), and retain the
note about the PreToolUse hook as the complementary seam; ensure the revised
sentence references the same symbols ("python3 -c", "node -e", "Capsule Bash",
"PreToolUse hook") so reviewers can locate the change.

16-34: ⚡ Quick win

Add a language tag to the fenced reply block (MD040).

The fenced block that wraps the HN reply has no language specified. Add a language tag like text to satisfy markdownlint MD040 without changing the reply contents you intend to post.

Suggested change

````diff
-```
+```text
 (disclosure: I work on FailProof AI: https://github.com/exospherehost/failproofai)
 
 The "too much freedom" framing is fair, but `python3 -c` and `node -e` as core commands re-open most of it. Once the agent runs `python3 -c "import shutil; shutil.rmtree(p)"`, the fs ops happen inside the Python runtime, not your bash interpreter, so the sandbox's structured-feedback layer never sees them. Same for `node -e "require('fs').unlinkSync(...)"`. Either intercept at the syscall layer, or pre-parse the inline source and refuse blocked imports. Adjacent angle: agents also call Edit/Write/MCP tools that bypass Bash entirely, so a Bash-only sandbox leaves that surface uncovered. The complementary seam is the harness's PreToolUse hook, which sees every tool call before any executor runs:
 
   import { customPolicies, allow, deny } from "failproofai";
 
   customPolicies.add({
     name: "block-inline-eval",
     match: { events: ["PreToolUse"] },
     fn: ({ toolName, toolInput }) => {
       if (toolName !== "Bash") return allow();
       if (/\b(python3?|node)\s+-(c|e)\b/i.test(toolInput?.command ?? "")) {
         return deny("inline -c/-e blocked; drop a script file instead");
       }
       return allow();
     },
   });
````

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @drafts/2026-05-04T151151Z.md around lines 16 - 34, The fenced code block
wrapping the HN reply is missing a language tag which triggers markdownlint
MD040; update the opening fence from a bare triple-backtick fence to one that
includes a language tag (e.g., "text") so the block becomes fenced with a
language specifier while leaving the block's contents exactly as-is; ensure you
modify the same fenced reply block shown in the diff (the triple-backtick fenced
block containing the FailProof AI disclosure and code snippet) and do not alter
any of the reply text.


</details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @drafts/2026-05-04T151151Z.md:

  • Around line 19-20: Update the paragraph that asserts "python3 -c" and "node
    -e" re-open the sandbox to soften the absolute claim: change wording to qualify
    that this happens only if those inline evaluators are available or not blocked
    by the sandbox/tool layer (e.g., use phrasing like “If python3/node are
    available as inline evaluators inside the sandboxed executor…”), mention the
    alternative mitigations (blocking -c/-e at the Capsule Bash tool layer or
    intercepting at syscall layer), and retain the note about the PreToolUse hook as
    the complementary seam; ensure the revised sentence references the same symbols
    ("python3 -c", "node -e", "Capsule Bash", "PreToolUse hook") so reviewers can
    locate the change.
  • Around line 16-34: The fenced code block wrapping the HN reply is missing a
    language tag which triggers markdownlint MD040; update the opening fence from
    a bare triple-backtick fence to one that includes a language tag (e.g.,
    "text") so the block becomes fenced with a language specifier while leaving
    the block's contents exactly as-is; ensure you modify the same fenced reply
    block shown in the diff (the triple-backtick fenced block containing the
    FailProof AI disclosure and code snippet) and do not alter any of the reply
    text.

</details>
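
The first nitpick asks for the inline-eval claim to be qualified, and the same care applies to the `block-inline-eval` pattern in the draft. The sketch below probes that regex against a few command strings. Only the regex is taken from the draft; `looksLikeInlineEval` and the sample commands are illustrative assumptions, and no FailProof AI API is involved.

```typescript
// Pattern copied verbatim from the draft's block-inline-eval policy.
const INLINE_EVAL = /\b(python3?|node)\s+-(c|e)\b/i;

/** Hypothetical wrapper: true when the command string matches the draft's pattern. */
function looksLikeInlineEval(command: string): boolean {
  return INLINE_EVAL.test(command);
}

// Sample commands (assumed for illustration) with the result the pattern gives.
const probes: Array<[string, boolean]> = [
  ['python3 -c "import shutil; shutil.rmtree(p)"', true],  // caught
  ["node -e \"require('fs').unlinkSync(p)\"", true],       // caught
  ['node --eval "1"', false],                              // long-form flag slips through
  ['python3 -u -c "print(1)"', false],                     // another flag before -c slips through
  ['python3.12 -c "print(1)"', false],                     // versioned binary name slips through
  ["node -c app.js", true],                                // false positive: -c is node's syntax check
  ["python3 script.py", false],                            // plain script run is allowed
];

for (const [command, expected] of probes) {
  const verdict = looksLikeInlineEval(command) === expected ? "as expected" : "UNEXPECTED";
  console.log(`${verdict}: ${command}`);
}
```

The misses support the reply's own framing: a command-string check at the PreToolUse layer complements the sandbox, and it does not replace syscall-level interception or parsing of the inline source.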

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `eb40469f-ada2-4ca4-91db-9acc64851329`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between ebbce06017d58f009ca7dfa9dbb3e0dcf1bba4df and 8e08b1f86ff410c3e046c543323c0df7fa1e19fd.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `drafts/2026-05-04T151151Z.md`

</details>

</details>
