Description
Summary
Extend the CommandGuard from binary allow/deny to a three-tier decision system (allow/sandbox/deny), integrate rivet-dev/secure-exec as the sandbox execution layer, and wire the PostToolUse learning capture into an adaptive pattern evolution loop.
Motivation
Today the PreToolUse guard has no middle ground: a command is either blocked entirely (destructive pattern match) or runs with full host access. Commands that are suspicious but not outright destructive (e.g., curl | sh, writes to ~/.ssh/, chmod 777, eval of dynamic strings) pass through unchecked.
The learning capture system records failures but does not feed back into the guard patterns. Guard patterns are static KG entries that only change via manual editing.
Proposed Design
Three convergence layers
Layer 1 -- Graduated Guard Decisions
- Add `GuardDecision::Sandbox` to `guard_patterns.rs` alongside existing `Allow` and `Block`
- Add a third pattern set: `suspicious_patterns` (between safe and destructive)
- Create `~/.config/terraphim/kg/guard-suspicious/` directory with initial entries
- The PreToolUse hook reads the three-valued decision and routes accordingly
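A minimal sketch of how the three-valued decision could look, assuming the precedence destructive > suspicious > safe and the default-to-sandbox rule for unknown commands described under Architecture. `GuardPatterns`, `decide`, `sample_patterns`, and the sample pattern strings are hypothetical names chosen for illustration, and plain substring search stands in for the Aho-Corasick matcher so the example needs no external crates.

```rust
/// Three-valued guard outcome extending the existing allow/block pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuardDecision {
    Allow,
    Sandbox,
    Block,
}

/// Hypothetical container for the three pattern sets (not the real struct).
pub struct GuardPatterns {
    pub destructive: Vec<String>,
    pub suspicious: Vec<String>,
    pub safe: Vec<String>,
}

/// Returns true if any pattern occurs as a substring of the command.
/// Assumption: substring search replaces the Aho-Corasick automaton here.
fn matches_any(patterns: &[String], command: &str) -> bool {
    patterns.iter().any(|p| command.contains(p.as_str()))
}

impl GuardPatterns {
    /// Checks the most restrictive set first so that a command matching
    /// both a safe and a suspicious pattern is still sandboxed.
    pub fn decide(&self, command: &str) -> GuardDecision {
        if matches_any(&self.destructive, command) {
            GuardDecision::Block
        } else if matches_any(&self.suspicious, command) {
            GuardDecision::Sandbox
        } else if matches_any(&self.safe, command) {
            GuardDecision::Allow
        } else {
            // Unknown commands default to the sandbox tier.
            GuardDecision::Sandbox
        }
    }
}

/// Illustrative pattern sets only; the real entries would live in the KG.
pub fn sample_patterns() -> GuardPatterns {
    let owned = |items: &[&str]| items.iter().map(|s| s.to_string()).collect::<Vec<_>>();
    GuardPatterns {
        destructive: owned(&["rm -rf /"]),
        suspicious: owned(&["| sh", "chmod 777", "~/.ssh/"]),
        safe: owned(&["git status"]),
    }
}
```

With these sample sets, `sample_patterns().decide("chmod 777 deploy.sh")` yields `GuardDecision::Sandbox`, and a command that matches no set is sandboxed as well rather than allowed.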
Layer 2 -- Secure Exec as Sandbox Tier
- When guard returns `sandbox`, run the command inside a V8 isolate via secure-exec
- Permission profiles scoped by command type: read-only project dir, network deny-by-default, no process spawning, 64 MB memory cap, 10s CPU time limit
- 17ms cold start makes this viable for interactive development
- Thin wrapper script at `~/.claude/hooks/sandbox-exec.ts`
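The baseline permission profile listed above could be represented roughly as follows. This is a sketch of the data only: `SandboxProfile`, `restrictive`, and `exceeds_limits` are hypothetical names, and nothing here reflects secure-exec's actual configuration API, which the wrapper script would have to map onto.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical description of what a sandboxed command may do.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SandboxProfile {
    /// Directory the command may read; no write access is granted.
    pub read_only_root: PathBuf,
    /// Network access is denied unless a profile opts in.
    pub allow_network: bool,
    /// Whether the command may spawn child processes.
    pub allow_process_spawn: bool,
    /// Memory cap in bytes.
    pub memory_limit_bytes: u64,
    /// CPU time budget.
    pub cpu_time_limit: Duration,
}

impl SandboxProfile {
    /// Baseline from the proposal: read-only project dir, no network,
    /// no process spawning, 64 MB memory cap, 10 s CPU time limit.
    pub fn restrictive(project_dir: impl Into<PathBuf>) -> Self {
        SandboxProfile {
            read_only_root: project_dir.into(),
            allow_network: false,
            allow_process_spawn: false,
            memory_limit_bytes: 64 * 1024 * 1024,
            cpu_time_limit: Duration::from_secs(10),
        }
    }

    /// True if observed resource usage went past either limit.
    pub fn exceeds_limits(&self, memory_bytes: u64, cpu_time: Duration) -> bool {
        memory_bytes > self.memory_limit_bytes || cpu_time > self.cpu_time_limit
    }
}
```

Profiles scoped by command type would presumably start from this baseline and relax individual fields, for example enabling network access for a command class that has been reviewed.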
Layer 3 -- Learning-Driven Pattern Evolution
- Extend `learn hook` to capture sandbox outcomes (permission violations, timeouts, clean passes)
- Promotion/demotion rules: 3+ clean sandbox runs suggest promotion to allow; a permission violation promotes to deny
- Auto-generate KG entries into `guard-staging/` directory
- Human review gate via `terraphim-agent guard review` before patterns become active
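The promotion/demotion rules can be sketched as a pure function over a pattern's recorded sandbox outcomes. All names here (`SandboxOutcome`, `Suggestion`, `suggest`, `CLEAN_RUNS_FOR_PROMOTION`) are hypothetical. The proposal does not say how timeouts affect promotion, so this sketch assumes a timeout neither promotes nor demotes and simply blocks promotion to allow.

```rust
/// Outcome of one sandboxed run, as captured by the learning hook.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxOutcome {
    CleanPass,
    PermissionViolation,
    Timeout,
}

/// What the engine proposes for a pattern; a human still reviews it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Suggestion {
    PromoteToAllow,
    PromoteToDeny,
    KeepInSandbox,
}

/// Threshold from the proposal: 3+ clean sandbox runs.
pub const CLEAN_RUNS_FOR_PROMOTION: usize = 3;

/// Derives a suggestion from the full outcome history of one pattern.
pub fn suggest(history: &[SandboxOutcome]) -> Suggestion {
    // Any permission violation outweighs every clean run.
    if history.contains(&SandboxOutcome::PermissionViolation) {
        return Suggestion::PromoteToDeny;
    }
    // Assumption: a timeout is inconclusive, so the pattern stays sandboxed.
    if history.contains(&SandboxOutcome::Timeout) {
        return Suggestion::KeepInSandbox;
    }
    let clean_runs = history
        .iter()
        .filter(|o| **o == SandboxOutcome::CleanPass)
        .count();
    if clean_runs >= CLEAN_RUNS_FOR_PROMOTION {
        Suggestion::PromoteToAllow
    } else {
        Suggestion::KeepInSandbox
    }
}
```

The result is only a suggestion: under the design above it would be written as a KG entry into `guard-staging/` and take effect only after `terraphim-agent guard review`.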
Architecture
Command arrives
    |
    v
Guard (Aho-Corasick pattern match)
    |
    +-- destructive match ---------> DENY (never runs)
    +-- suspicious match ----------> SANDBOX (V8 isolate via secure-exec)
    +-- KG-recognised safe match --> ALLOW (full host access)
    +-- unknown -------------------> SANDBOX (default-to-sandbox)
    |
    v
Replace (KG synonym substitution, runs regardless of tier)
    |
    v
Execute (tier-appropriate: raw shell or sandboxed)
    |
    v
Learn (capture outcome, feed back into guard patterns)
    |
    +-- repeated sandbox success --> promote to ALLOW
    +-- sandbox violation/failure -> promote to DENY
    +-- new pattern discovered ----> add to KG
Implementation Phases
- Phase 0 (1 day): Evaluation spike -- verify secure-exec `child_process` bridge enforces filesystem restrictions inside V8 isolate. This is the critical gate for Layer 2.
- Phase 1 (2 days): Add `GuardDecision::Sandbox` to `guard_patterns.rs`, create suspicious pattern KG entries, update `pre_tool_use.sh`
- Phase 2 (3 days): Create `sandbox-exec.ts` wrapper, define permission profiles, wire into PreToolUse hook
- Phase 3 (3 days): Extend learning capture for sandbox outcomes, build promotion/demotion engine, KG auto-generation with staging directory
- Phase 4 (1 day): Integrate guard pattern review into daily sweep
Key Risk
The key risk is whether secure-exec's bridged `child_process` module enforces isolate-level filesystem restrictions on spawned shell commands. If it does not, Layer 2 applies only to AI-generated JS/TS code execution, not to arbitrary shell commands. Layers 1 and 3 remain valuable regardless.
References
- Design document: `plans/terraphim-secure-exec-convergence.md` in cto-executive-system
- Secure Exec: https://github.com/rivet-dev/secure-exec (Apache-2.0, 17ms cold start, 3.4 MB memory)
- Current guard implementation: `crates/terraphim_agent/src/guard_patterns.rs`
- Current hook scripts: `~/.claude/hooks/pre_tool_use.sh`, `post_tool_use.sh`
- Related: secure-exec knowledge entry at `knowledge/external/context-engineering/secure-exec-sandboxless-code-execution.md`