Graduated guard with secure-exec sandbox tier and learning feedback loop #704

@AlexMikhalev

Summary

Extend the CommandGuard from binary allow/deny to a three-tier decision system (allow/sandbox/deny), integrate rivet-dev/secure-exec as the sandbox execution layer, and wire the PostToolUse learning capture into an adaptive pattern evolution loop.

Motivation

Today the PreToolUse guard has no middle ground: a command is either blocked entirely (destructive pattern match) or runs with full host access. Commands that are suspicious but not outright destructive (e.g., curl | sh, writes to ~/.ssh/, chmod 777, eval of dynamic strings) pass through unchecked.

The learning capture system records failures but does not feed back into the guard patterns. Guard patterns are static KG entries that only change via manual editing.

Proposed Design

Three convergence layers

Layer 1 -- Graduated Guard Decisions

  • Add GuardDecision::Sandbox to guard_patterns.rs alongside existing Allow and Block
  • Add a third pattern set: suspicious_patterns (between safe and destructive)
  • Create ~/.config/terraphim/kg/guard-suspicious/ directory with initial entries
  • The PreToolUse hook reads the three-valued decision and routes accordingly
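
A minimal sketch of the three-valued decision, not the actual contents of guard_patterns.rs. The `GuardPatterns` struct, its field names, the helper functions, and the example patterns are all assumptions made for illustration, and a plain substring scan stands in for the Aho-Corasick matcher. The default-to-sandbox fallback for unknown commands follows the Architecture section.

```rust
/// Three-valued guard outcome; `Sandbox` is the proposed addition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuardDecision {
    Allow,
    Sandbox,
    Block,
}

/// Hypothetical holder for the three pattern sets (names are assumptions).
pub struct GuardPatterns {
    pub destructive: Vec<String>,
    pub suspicious: Vec<String>,
    pub safe: Vec<String>,
}

/// Convert string literals into owned pattern entries.
fn owned(items: &[&str]) -> Vec<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Plain substring scan standing in for the Aho-Corasick automaton.
fn matches_any(patterns: &[String], command: &str) -> bool {
    patterns.iter().any(|p| command.contains(p.as_str()))
}

impl GuardPatterns {
    /// Illustrative entries only; real patterns would come from the KG directories.
    pub fn example() -> Self {
        GuardPatterns {
            destructive: owned(&["rm -rf /", "mkfs"]),
            suspicious: owned(&["| sh", "chmod 777", "~/.ssh/"]),
            safe: owned(&["git status", "cargo test"]),
        }
    }

    /// Most severe match wins; unmatched commands default to the sandbox.
    pub fn decide(&self, command: &str) -> GuardDecision {
        if matches_any(&self.destructive, command) {
            GuardDecision::Block
        } else if matches_any(&self.suspicious, command) {
            GuardDecision::Sandbox
        } else if matches_any(&self.safe, command) {
            GuardDecision::Allow
        } else {
            GuardDecision::Sandbox
        }
    }
}
```

Checking the sets in order of severity means a command such as `git status && chmod 777 build` lands in the sandbox even though part of it matches a safe pattern.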

Layer 2 -- Secure Exec as Sandbox Tier

  • When guard returns sandbox, run the command inside a V8 isolate via secure-exec
  • Permission profiles scoped by command type: read-only project dir, network deny-by-default, no process spawning, 64 MB memory cap, 10s CPU time limit
  • secure-exec's reported 17 ms cold start makes this viable for interactive development
  • Thin wrapper script at ~/.claude/hooks/sandbox-exec.ts
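
The limits listed above could be modelled as a plain data structure before being translated into whatever configuration secure-exec expects. The sketch below is an assumption-laden illustration: `SandboxProfile`, `NetworkPolicy`, and every method name are hypothetical and do not reflect the secure-exec API; only the numeric limits come from this issue.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Hypothetical network policy; deny-by-default as proposed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NetworkPolicy {
    DenyAll,
    AllowHosts(Vec<String>),
}

/// Hypothetical permission profile (not a secure-exec type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SandboxProfile {
    pub read_only_roots: Vec<PathBuf>,
    pub network: NetworkPolicy,
    pub allow_process_spawn: bool,
    pub memory_limit_bytes: u64,
    pub cpu_time_limit: Duration,
}

impl SandboxProfile {
    /// Baseline from the issue: read-only project dir, no network,
    /// no process spawning, 64 MB memory cap, 10 s CPU time limit.
    pub fn baseline(project_dir: &Path) -> Self {
        SandboxProfile {
            read_only_roots: vec![project_dir.to_path_buf()],
            network: NetworkPolicy::DenyAll,
            allow_process_spawn: false,
            memory_limit_bytes: 64 * 1024 * 1024,
            cpu_time_limit: Duration::from_secs(10),
        }
    }

    /// Example of scoping by command type: permit a single extra host.
    pub fn with_host(mut self, host: &str) -> Self {
        let mut hosts = match self.network {
            NetworkPolicy::DenyAll => Vec::new(),
            NetworkPolicy::AllowHosts(existing) => existing,
        };
        hosts.push(host.to_string());
        self.network = NetworkPolicy::AllowHosts(hosts);
        self
    }

    /// Component-wise prefix check; callers should canonicalise `path`
    /// first, because `..` segments are not resolved here.
    pub fn may_read(&self, path: &Path) -> bool {
        self.read_only_roots.iter().any(|root| path.starts_with(root))
    }

    /// True only if the host has been explicitly allowed.
    pub fn may_connect(&self, host: &str) -> bool {
        match &self.network {
            NetworkPolicy::DenyAll => false,
            NetworkPolicy::AllowHosts(hosts) => hosts.iter().any(|h| h == host),
        }
    }
}
```

Keeping the profile as data separate from the wrapper script would let the Phase 0 spike check each limit independently against what the isolate actually enforces.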

Layer 3 -- Learning-Driven Pattern Evolution

  • Extend learn hook to capture sandbox outcomes (permission violations, timeouts, clean passes)
  • Promotion/demotion rules: three or more clean sandbox runs suggest promotion to allow; a permission violation demotes the pattern to deny
  • Auto-generate KG entries into guard-staging/ directory
  • Human review gate via terraphim-agent guard review before patterns become active
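
The rules above can be sketched as a pure function over a pattern's sandbox history. All type and function names here are hypothetical. The treatment of timeouts (neither counted as clean nor treated as a violation) is an assumption, since the issue does not specify it, and the on-disk format of guard-staging/ entries is left out for the same reason.

```rust
/// Outcomes the learn hook would capture for a sandboxed run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxOutcome {
    CleanPass,
    PermissionViolation,
    Timeout,
}

/// What the evolution loop proposes for a pattern.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Suggestion {
    Allow,
    Deny,
    KeepSandboxed,
}

/// Threshold taken from the issue text ("3+ clean sandbox runs").
pub const CLEAN_RUNS_FOR_ALLOW: usize = 3;

/// Any permission violation wins; otherwise count clean passes.
/// Assumption: timeouts neither count as clean nor trigger a deny.
pub fn suggest(history: &[SandboxOutcome]) -> Suggestion {
    if history.contains(&SandboxOutcome::PermissionViolation) {
        return Suggestion::Deny;
    }
    let clean = history
        .iter()
        .filter(|o| **o == SandboxOutcome::CleanPass)
        .count();
    if clean >= CLEAN_RUNS_FOR_ALLOW {
        Suggestion::Allow
    } else {
        Suggestion::KeepSandboxed
    }
}

/// Hypothetical staged entry; inactive until a human approves it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StagedEntry {
    pub pattern: String,
    pub suggestion: Suggestion,
    pub approved: bool,
}

impl StagedEntry {
    /// Stage an entry only when the history suggests a tier change.
    pub fn new(pattern: &str, history: &[SandboxOutcome]) -> Option<Self> {
        match suggest(history) {
            Suggestion::KeepSandboxed => None,
            suggestion => Some(StagedEntry {
                pattern: pattern.to_string(),
                suggestion,
                approved: false,
            }),
        }
    }

    /// The review gate: a staged suggestion has no effect until approved.
    pub fn is_active(&self) -> bool {
        self.approved
    }
}
```

Because `StagedEntry` starts with `approved: false`, a suggestion produced by the loop changes nothing until the review step flips that flag.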

Architecture

Command arrives
    |
    v
Guard (Aho-Corasick pattern match)
    |
    +-- destructive match ---------> DENY (never runs)
    +-- suspicious match ----------> SANDBOX (V8 isolate via secure-exec)
    +-- KG-recognised safe match --> ALLOW (full host access)
    +-- unknown --------------------> SANDBOX (default-to-sandbox)
    |
    v
Replace (KG synonym substitution, runs regardless of tier)
    |
    v
Execute (tier-appropriate: raw shell or sandboxed)
    |
    v
Learn (capture outcome, feed back into guard patterns)
    |
    +-- repeated sandbox success --> promote to ALLOW
    +-- sandbox violation/failure -> demote to DENY
    +-- new pattern discovered ----> add to KG

Implementation Phases

  • Phase 0 (1 day): Evaluation spike -- verify that secure-exec's child_process bridge enforces filesystem restrictions inside the V8 isolate. This is the critical gate for Layer 2.
  • Phase 1 (2 days): Add GuardDecision::Sandbox to guard_patterns.rs, create suspicious pattern KG entries, update pre_tool_use.sh
  • Phase 2 (3 days): Create sandbox-exec.ts wrapper, define permission profiles, wire into PreToolUse hook
  • Phase 3 (3 days): Extend learning capture for sandbox outcomes, build promotion/demotion engine, KG auto-generation with staging directory
  • Phase 4 (1 day): Integrate guard pattern review into daily sweep

Key Risk

The open question is whether secure-exec's bridged child_process module enforces isolate-level filesystem restrictions on spawned shell commands. If it does not, Layer 2 applies only to AI-generated JS/TS code execution, not to arbitrary shell commands. Layers 1 and 3 remain valuable regardless.

References

  • Design document: plans/terraphim-secure-exec-convergence.md in cto-executive-system
  • Secure Exec: https://github.com/rivet-dev/secure-exec (Apache-2.0, 17ms cold start, 3.4 MB memory)
  • Current guard implementation: crates/terraphim_agent/src/guard_patterns.rs
  • Current hook scripts: ~/.claude/hooks/pre_tool_use.sh, post_tool_use.sh
  • Related: secure-exec knowledge entry at knowledge/external/context-engineering/secure-exec-sandboxless-code-execution.md

Labels: enhancement
