document_type	architecture
version	1.0
prd_version	1.0
status	draft

Architecture Document: DarkShell

Context Engineering Principle — Extended ToC Pattern: Each section provides a concise summary with references to full detail. The Component Map (Section 3b) is machine-readable YAML for automated agent consumption. Structural decisions frontloaded; implementation detail in KICKSTART.md source analysis sections.

1. System Overview

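DarkShell is an enhancement layer on top of NVIDIA OpenShell. It does not replace any OpenShell component or change existing behavior. Instead, it adds new code paths alongside existing ones within the same crate structure. The one change inside the sandbox runtime is a narrow, feature-flagged, read-only observability hook in proxy.rs (see ADR-011).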

┌─────────────────────────────────────────────────────────────────┐
│  HOST                                                           │
│                                                                 │
│  ┌─────────────────────────────────────────────────────┐       │
│  │  darkshell CLI (openshell-cli crate)                │       │
│  │  ┌───────────┬───────────┬───────────┬────────────┐ │       │
│  │  │ Upstream  │ DarkShell │ DarkShell │ DarkShell  │ │       │
│  │  │ commands  │ file xfer │ mcp mgmt  │ blueprints │ │       │
│  │  │ (unchanged│ (rsync,   │ (add/list │ (create    │ │       │
│  │  │  tar,ssh) │  progress)│  /remove) │  --from)   │ │       │
│  │  └───────────┴───────────┴───────────┴────────────┘ │       │
│  └───────────────────┬─────────────────────────────────┘       │
│                      │ gRPC                                     │
│  ┌───────────────────▼─────────────────────────────────┐       │
│  │  Gateway (openshell-server crate) — UNCHANGED       │       │
│  │  Sandbox lifecycle, provider storage, policy mgmt   │       │
│  └───────────────────┬─────────────────────────────────┘       │
│                      │ k3s pod                                  │
│  ┌─────────────────────────────────────────────────────┐       │
│  │  MCP Bridge Daemon (NEW — darkshell-mcp crate)      │       │
│  │  stdio-to-HTTP proxy, credential injection,         │       │
│  │  auto port-forward into sandbox                     │       │
│  └──────────┬──────────────────────────────────────────┘       │
│             │ port forward                                      │
│─────────────┼───────────────────────────────────────────────────│
│  SANDBOX    │ (kernel boundary: Landlock + seccomp + netns)     │
│             ▼                                                   │
│  ┌─────────────────────────────────────────────────────┐       │
│  │  Sandbox Runtime (openshell-sandbox crate) — UNCHANGED      │
│  │  proxy.rs, opa.rs, landlock.rs, seccomp.rs, netns.rs        │
│  └─────────────────────────────────────────────────────┘       │
│                                                                 │
│  ┌─────────────────────────────────────────────────────┐       │
│  │  Observability Collector (NEW — darkshell-observe)  │       │
│  │  eBPF probes, log aggregation, OTel export          │       │
│  │  (runs on HOST, reads sandbox via eBPF/log tailing) │       │
│  └─────────────────────────────────────────────────────┘       │
└─────────────────────────────────────────────────────────────────┘

Key principle: The security code in the sandbox runtime (openshell-sandbox) is never modified. All DarkShell code lives in the CLI crate, new crates, or host-side daemons. The sole exception is the read-only inference logging hook in proxy.rs (ADR-011).

2. Architecture Patterns

Modular Monolith — DarkShell extends the existing OpenShell workspace of crates. New capabilities are added as modules within openshell-cli or as new crates (darkshell-mcp, darkshell-observe, darkshell-blueprint).
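Request-Response — Each CLI command is a single request-response exchange with the gateway or sandbox. The MCP bridge and the observability collector hold long-lived connections.
Hybrid sync/async — CLI operations run on the tokio async runtime: a command returns only when its work completes, while its I/O is non-blocking internally. The MCP bridge daemon is a long-running async process. eBPF collection is async with channel-based event delivery.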

Layering (strictly acyclic, matching upstream):

openshell-cli ──→ darkshell-blueprint ──→ openshell-core
              ──→ darkshell-mcp ─────────→ openshell-core
              ──→ darkshell-observe ─────→ openshell-core
              ──→ openshell-core ──→ openshell-sandbox ──→ openshell-server
darkshell-blueprint ──→ darkshell-mcp

openshell-cli depends on all three new crates. New crates depend on openshell-core for shared types but NEVER on openshell-sandbox or openshell-server (those are upstream and unchanged). darkshell-blueprint also depends on darkshell-mcp (to orchestrate MCP bridge setup during blueprint creation).
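
The layering rules above can be checked mechanically. The following is a minimal sketch, assuming the direct dependency edges are exactly those stated in this section; the function names (`darkshell_edges`, `is_acyclic`, `respects_upstream_boundary`) are hypothetical and not part of the codebase.

```rust
use std::collections::{HashMap, HashSet};

/// Direct crate dependencies as stated in the layering section.
/// Upstream-internal edges are intentionally left out of this sketch.
fn darkshell_edges() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        (
            "openshell-cli",
            vec!["darkshell-blueprint", "darkshell-mcp", "darkshell-observe", "openshell-core"],
        ),
        ("darkshell-blueprint", vec!["darkshell-mcp", "openshell-core"]),
        ("darkshell-mcp", vec!["openshell-core"]),
        ("darkshell-observe", vec!["openshell-core"]),
        ("openshell-core", vec![]),
    ])
}

/// Depth-first search; returns true if a cycle is reachable from `node`.
fn has_cycle_from<'a>(
    node: &'a str,
    graph: &HashMap<&'a str, Vec<&'a str>>,
    visiting: &mut HashSet<&'a str>,
    done: &mut HashSet<&'a str>,
) -> bool {
    if done.contains(node) {
        return false;
    }
    if !visiting.insert(node) {
        return true; // node is already on the current DFS path
    }
    for dep in graph.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
        if has_cycle_from(*dep, graph, visiting, done) {
            return true;
        }
    }
    visiting.remove(node);
    done.insert(node);
    false
}

/// True when no crate can reach itself through its dependencies.
fn is_acyclic<'a>(graph: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    let mut visiting = HashSet::new();
    let mut done = HashSet::new();
    graph
        .keys()
        .all(|n| !has_cycle_from(*n, graph, &mut visiting, &mut done))
}

/// True when no darkshell-* crate depends directly on the sandbox or server crate.
fn respects_upstream_boundary(graph: &HashMap<&str, Vec<&str>>) -> bool {
    graph
        .iter()
        .filter(|(name, _)| name.starts_with("darkshell-"))
        .all(|(_, deps)| {
            deps.iter()
                .all(|d| *d != "openshell-sandbox" && *d != "openshell-server")
        })
}
```

A check like this could run in CI against the edge list so that an accidental dependency from a new crate on openshell-sandbox is caught before merge.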

3. System Components

ID	Component	Responsibility	Technology	Dependencies
COMP-001	CLI Enhancement Layer	New commands (exec, mcp, blueprint) and enhanced upload/download in openshell-cli	Rust, clap, tokio, indicatif	openshell-core, COMP-002, COMP-003, COMP-004, COMP-006
COMP-002	Rsync Transfer Module	Delta upload via rsync-over-SSH alongside existing tar transfer	Rust, rsync (external binary), SSH ProxyCommand	openshell-core (SSH config)
COMP-003	MCP Bridge Daemon	Host-side stdio-to-HTTP proxy for MCP servers with credential isolation	Rust, tokio, hyper, JSON-RPC	openshell-core (providers, forward)
COMP-004	Blueprint Engine	Parse blueprint YAML, orchestrate sandbox creation with full configuration	Rust, serde_yaml	openshell-core, COMP-003
COMP-005	Observability Collector	eBPF probes for file/process tracing, log aggregation, OTel export	Rust, aya (eBPF), opentelemetry, tracing	openshell-core (gateway API to discover sandbox PID/cgroup)
COMP-006	Progress Reporter	Wrap tar/rsync streams with progress bars showing bytes, rate, ETA	Rust, indicatif	openshell-core (transfer streams)
COMP-007	Policy Tools	Validate policy YAML, test policy queries, network diagnostics	Rust, regorus (OPA)	openshell-core (policy types)
COMP-008	Lifecycle Manager	Snapshots, health checks, resource limits, image save with sanitization	Rust, tar, k8s API	openshell-core (uses gateway gRPC API via client stubs in openshell-cli)
COMP-UPSTREAM-001	Gateway (unchanged)	Sandbox lifecycle, provider storage, policy management	Rust, tonic (gRPC), k3s	—
COMP-UPSTREAM-002	Sandbox Runtime (unchanged)	Proxy, OPA, Landlock, seccomp, netns	Rust, regorus, landlock, seccompiler	—
COMP-UPSTREAM-003	SSH Transport (unchanged)	ProxyCommand tunnel, tar upload/download	Rust, openssh	—
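
To illustrate how the Progress Reporter (COMP-006) might wrap a tar or rsync stream, here is a minimal std-only sketch. It assumes progress is driven by a byte-count callback; the actual component renders through indicatif, and the names `ProgressReader` and `eta_secs` are hypothetical.

```rust
use std::io::{self, Read};

/// Wraps any reader and reports the cumulative byte count through a callback.
struct ProgressReader<R, F> {
    inner: R,
    bytes_read: u64,
    on_progress: F,
}

impl<R: Read, F: FnMut(u64)> ProgressReader<R, F> {
    fn new(inner: R, on_progress: F) -> Self {
        Self { inner, bytes_read: 0, on_progress }
    }
}

impl<R: Read, F: FnMut(u64)> Read for ProgressReader<R, F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.bytes_read += n as u64;
        // Report after every read so a progress bar can redraw.
        (self.on_progress)(self.bytes_read);
        Ok(n)
    }
}

/// Estimated seconds remaining, or None when the rate is unknown or the
/// transfer has already exceeded the expected total.
fn eta_secs(total: u64, done: u64, bytes_per_sec: f64) -> Option<f64> {
    if bytes_per_sec <= 0.0 || done > total {
        return None;
    }
    Some((total - done) as f64 / bytes_per_sec)
}
```

Because the wrapper implements `Read`, it can sit between the local archive stream and the SSH pipe without the transfer code knowing about progress reporting.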

3b. Component Map (Machine-Readable)

components:
  - id: COMP-001
    name: "CLI Enhancement Layer"
    layer: presentation
    purity: effectful-shell
    criticality: CRITICAL
    dependencies: [openshell-core, COMP-002, COMP-003, COMP-004, COMP-006]
    interfaces_provided: [IF-001, IF-002, IF-003, IF-004, IF-005]
    interfaces_consumed: [IF-010, IF-011]
    crate: openshell-cli
    files:
      - src/run.rs          # New command handlers alongside existing
      - src/ssh.rs           # New transfer functions alongside existing
      - src/main.rs          # New clap subcommands
      - src/mcp.rs           # NEW — MCP CLI commands
      - src/blueprint.rs     # NEW — Blueprint parsing and orchestration
      - src/progress.rs      # NEW — Progress bar wrapping
    requirements: [FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007]

  - id: COMP-002
    name: "Rsync Transfer Module"
    layer: infrastructure
    purity: effectful-shell
    criticality: CRITICAL
    dependencies: [openshell-core]
    interfaces_provided: [IF-006]
    interfaces_consumed: [IF-010]
    crate: openshell-cli
    files:
      - src/ssh.rs           # sandbox_sync_up_rsync() alongside sandbox_sync_up()
    requirements: [FR-001]

  - id: COMP-003
    name: "MCP Bridge Daemon"
    layer: infrastructure
    purity: effectful-shell
    criticality: CRITICAL
    dependencies: [openshell-core]
    interfaces_provided: [IF-007, IF-008]
    interfaces_consumed: [IF-010, IF-011]
    crate: darkshell-mcp
    files:
      - src/lib.rs           # Public API re-exports
      - src/bridge.rs        # stdio-to-HTTP proxy daemon
      - src/registry.rs      # MCP server registration and lifecycle
      - src/credential.rs    # Credential injection from provider system
      - src/policy.rs        # Auto-generate network policy entries for MCP endpoints
    requirements: [FR-008, FR-009, FR-010, FR-011, FR-013, FR-020, FR-038]

  - id: COMP-004
    name: "Blueprint Engine"
    layer: business-logic
    purity: mixed
    criticality: CRITICAL
    dependencies: [openshell-core, COMP-003]
    interfaces_provided: [IF-009]
    interfaces_consumed: [IF-010, IF-011, IF-007]
    crate: darkshell-blueprint
    files:
      - src/lib.rs           # Public API
      - src/schema.rs        # Blueprint YAML schema + validation
      - src/orchestrator.rs  # Create sandbox from blueprint (image + policy + providers + MCP + forwards)
    requirements: [FR-015, FR-016]

  - id: COMP-005
    name: "Observability Collector"
    layer: infrastructure
    purity: effectful-shell
    criticality: MEDIUM
    dependencies: [openshell-core]  # Needs gateway API to discover sandbox PID/cgroup
    interfaces_provided: [IF-012, IF-013]
    interfaces_consumed: [IF-010]   # Queries gateway for sandbox container PID namespace
    crate: darkshell-observe
    platform_requirements:
      - "Linux kernel 5.8+ for eBPF features"
      - "CAP_BPF or root for eBPF probes"
      - "Graceful degradation to log-only on macOS/WSL/older kernels"
    files:
      - src/lib.rs           # Public API
      - src/watch.rs         # Live event stream (sandbox watch)
      - src/file_audit.rs    # eBPF/fanotify file access logging
      - src/process_trace.rs # eBPF process tree tracing
      - src/otel.rs          # OpenTelemetry metrics and trace export
      - src/baseline.rs      # Behavioral baseline computation + alerting
      - src/inference_log.rs # Inference request/response logging (receives events from proxy.rs hook)
    requirements: [FR-017, FR-018, FR-019, FR-021, FR-022, FR-023]
    # Note: FR-020 (MCP tool call logging) is in COMP-003 (bridge layer)
    # Note: FR-022 uses narrow hook in proxy.rs (ADR-011) — only sandbox crate modification

  - id: COMP-006
    name: "Progress Reporter"
    layer: presentation
    purity: pure-core
    criticality: HIGH
    dependencies: []
    interfaces_provided: [IF-014]
    interfaces_consumed: []
    crate: openshell-cli
    files:
      - src/progress.rs      # ProgressBar wrapping for tar/rsync streams
    requirements: [FR-003, FR-006]

  - id: COMP-007
    name: "Policy Tools"
    layer: business-logic
    purity: mixed
    criticality: LOW
    dependencies: [openshell-core]
    interfaces_provided: [IF-015]
    interfaces_consumed: [IF-010]
    crate: openshell-cli
    files:
      - src/policy_tools.rs  # Validate, test, net-test commands
    requirements: [FR-030, FR-031, FR-032]

  - id: COMP-008
    name: "Lifecycle Manager"
    layer: business-logic
    purity: effectful-shell
    criticality: LOW
    dependencies: [openshell-core]
    interfaces_provided: [IF-016]
    interfaces_consumed: [IF-010, IF-011]
    crate: openshell-cli
    files:
      - src/lifecycle.rs     # Snapshot, restore, health, image save
    requirements: [FR-024, FR-025, FR-026, FR-027, FR-028, FR-029]

4. Interfaces

Interface ID	From	To	Protocol	SLA
IF-001	CLI (COMP-001)	User/DarkClaw	CLI (stdin/stdout/stderr + exit code)	Immediate response for all commands
IF-002	CLI exec (COMP-001)	Sandbox	SSH (`ssh -T` via ProxyCommand)	< 100ms overhead
IF-003	CLI upload (COMP-001)	Sandbox	tar-over-SSH or rsync-over-SSH	Progress reported in real-time
IF-004	CLI download (COMP-001)	Sandbox	tar-over-SSH with optional filtering	Progress reported in real-time
IF-005	CLI mcp (COMP-001)	MCP Bridge (COMP-003)	IPC (start/stop daemon, query status)	< 1s for lifecycle operations
IF-006	Rsync Module (COMP-002)	Sandbox	rsync over SSH ProxyCommand	Same transport as upstream SSH
IF-007	MCP Bridge (COMP-003)	MCP Servers	stdio (JSON-RPC)	Auto-restart within 5s on crash
IF-008	MCP Bridge (COMP-003)	Sandbox Agent	HTTP (port-forwarded into sandbox)	< 10ms added latency
IF-009	Blueprint Engine (COMP-004)	Gateway + MCP Bridge	gRPC (gateway) + IPC (bridge)	< 60s total creation time
IF-010	Various	Gateway (COMP-UPSTREAM-001)	gRPC (proto/openshell.proto)	Gateway must be running
IF-011	Various	Provider System (COMP-UPSTREAM-001)	gRPC (provider CRUD + credential retrieval)	Credentials available at sandbox start
IF-012	Observability (COMP-005)	External platforms	OTLP, Splunk HEC, Datadog API	Best-effort delivery, local buffering
IF-013	Observability (COMP-005)	User/DarkClaw	JSON lines stream (watch command)	< 1s event latency
IF-014	Progress (COMP-006)	User terminal	indicatif ProgressBar (stderr)	Real-time update at >=1Hz
IF-015	Policy Tools (COMP-007)	User	CLI output (JSON or human-readable)	Immediate for validate; exec-dependent for test
IF-016	Lifecycle (COMP-008)	Gateway + Sandbox	gRPC + SSH	Snapshot time proportional to writable FS size

5. Data Models

Entity	Storage	Primary Key	Access Patterns
Blueprint	YAML file on host filesystem	`name` field in YAML	Read at sandbox creation time; version-controlled in git
MCP Server Registration	In-memory registry in bridge daemon + PID files on host	`(sandbox_name, server_name)`	CRUD via CLI; auto-cleanup on sandbox delete
MCP Bridge State	PID file: `~/.config/darkshell/mcp/<sandbox>-<server>.pid`	`(sandbox, server)`	Bridge daemon reads on startup; CLI reads for status
Observability Events	Transient stream (not persisted by DarkShell)	timestamp + event_type + sandbox_id	Streamed to watch command or OTel exporter
Snapshot	Tar archive on host: `~/.config/darkshell/snapshots/<sandbox>/<name>.tar`	`(sandbox, snapshot_name)`	Write on snapshot; read on restore; list on query
Behavioral Baseline	Rolling statistics in memory (COMP-005)	`sandbox_id`	Updated on every event; queried for anomaly detection
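
The Behavioral Baseline row describes rolling statistics that are updated on every event and queried for anomaly detection. One way to do this in constant memory is Welford's online algorithm with a z-score threshold. This is a sketch under that assumption; the source does not specify the statistic or the threshold, and `RollingStats` is a hypothetical name.

```rust
/// Running mean and variance via Welford's online algorithm.
#[derive(Debug, Default)]
struct RollingStats {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl RollingStats {
    /// Fold one observation (for example, events per second) into the baseline.
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Sample standard deviation; None until at least two observations exist.
    fn std_dev(&self) -> Option<f64> {
        if self.count < 2 {
            return None;
        }
        Some((self.m2 / (self.count - 1) as f64).sqrt())
    }

    /// True when `x` lies more than `z_threshold` standard deviations from the mean.
    fn is_anomalous(&self, x: f64, z_threshold: f64) -> bool {
        match self.std_dev() {
            Some(sd) if sd > 0.0 => ((x - self.mean) / sd).abs() > z_threshold,
            _ => false,
        }
    }
}
```

The struct holds three numbers per sandbox, which fits the "in memory, keyed by sandbox_id" storage described in the table.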

Blueprint Schema

# darkshell-blueprint.yaml
apiVersion: darkshell/v1
kind: Blueprint
metadata:
  name: string                    # Required. Blueprint identifier.
  description: string             # Optional. Human-readable description.
spec:
  image: string                   # Required. Container image reference.
  policy: string                  # Optional. Path to policy YAML file.
  providers:                      # Optional. List of provider names to attach.
    - string
  mcp_servers:                    # Optional. MCP servers to connect.
    - name: string                # Required. Server identifier.
      transport: bridge | in-sandbox | streamable-http
      command: string             # Required for bridge/in-sandbox. Server launch command.
      env:                        # Optional. Environment variable names for credentials.
        - string
      url: string                 # Required for streamable-http. Server endpoint URL.
  forwards:                       # Optional. Port forwards.
    - "[bind:]port"
  resources:                      # Optional. Resource limits.
    cpu: string                   # e.g., "2"
    memory: string                # e.g., "4Gi"
  upload:                         # Optional. Files to upload on creation.
    - "local:remote"

MCP Bridge Registration

# ~/.config/darkshell/mcp/<sandbox>-<server>.yaml
sandbox: string
server_name: string
transport: bridge | in-sandbox | streamable-http
command: string
env_keys: [string]
bridge_pid: int
forwarded_port: int
policy_entry_added: bool
status: running | stopped | error

6. Integration Contracts

System	Protocol	Authentication	Error Handling
OpenShell Gateway	gRPC (proto/openshell.proto)	mTLS or bearer token (unchanged)	Gateway unavailable → retry with exponential backoff, surface error with `darkshell doctor` remediation
SSH Transport	SSH over ProxyCommand tunnel	Gateway-mediated (no direct SSH keys)	Connection failure → check gateway status, report which hop failed (DNS? gateway? sandbox?)
rsync (P1)	rsync over SSH ProxyCommand	Same SSH auth as upstream	rsync binary absent → warn, fall back to tar with clear message
MCP Servers (stdio)	stdin/stdout JSON-RPC	Credentials injected by bridge from providers	Server crash → auto-restart with exponential backoff (1s, 2s, 4s; max 3 retries; first restart within 5s), log each restart
MCP Servers (Streamable HTTP)	HTTPS through sandbox proxy	OAuth/API key via provider → OPA policy evaluation	Connection denied → report policy rule that blocked, suggest `policy set` fix
Container Registry	OCI/Docker protocol	Registry credentials	Pull failure → report registry, image, tag, auth status, suggest `docker login`
OpenTelemetry (P26)	OTLP gRPC or HTTP	API key (platform-specific)	Export failure → buffer locally (bounded queue), retry, alert on persistent failure
Git (P17 GitOps)	HTTPS or SSH	GITHUB_TOKEN	Invalid policy YAML → reject, keep last-known-good, alert operator
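
The restart policy for stdio MCP servers (1s, 2s, 4s backoff, at most 3 retries) can be expressed as a small pure function plus a supervision loop. The sketch below is illustrative only: it models a crash as a failed `start` call instead of handling SIGCHLD, and it injects the sleep function so the policy can be exercised without waiting. All names are hypothetical.

```rust
use std::time::Duration;

const MAX_RETRIES: u32 = 3;

/// Delay before the given retry attempt (1-based): 1s, 2s, 4s.
/// Returns None once the retry budget is exhausted.
fn restart_delay(attempt: u32) -> Option<Duration> {
    if attempt == 0 || attempt > MAX_RETRIES {
        return None;
    }
    Some(Duration::from_secs(1u64 << (attempt - 1)))
}

/// Calls `start` until it succeeds or retries run out.
/// On success, returns how many restarts were needed.
fn supervise<S, W>(mut start: S, mut sleep: W) -> Result<u32, String>
where
    S: FnMut() -> Result<(), String>,
    W: FnMut(Duration),
{
    let mut attempt = 0;
    loop {
        match start() {
            Ok(()) => return Ok(attempt),
            Err(e) => {
                attempt += 1;
                match restart_delay(attempt) {
                    Some(delay) => {
                        // Every restart is logged, as the contract requires.
                        eprintln!("restart {} after {:?}: {}", attempt, delay, e);
                        sleep(delay);
                    }
                    None => return Err(format!("gave up after {} retries: {}", MAX_RETRIES, e)),
                }
            }
        }
    }
}
```

In production the `sleep` argument would be a real timer (for example a tokio sleep in the async daemon); keeping it a parameter makes the backoff schedule trivially testable.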

7. Non-Functional Architecture

NFR	Target	Architecture Decision	Validation
NFR-001: Delta upload < 2s	rsync for delta, tar for full	rsync-over-SSH uses same ProxyCommand. Fall back to tar if rsync absent.	Benchmark suite across 100MB-5GB projects
NFR-002: Exec < 100ms (steady-state)	SSH ControlMaster multiplexing	First exec ~200ms (full handshake); subsequent < 20ms via reused connection. ControlPersist=600s. See ADR-009.	100-run benchmark: measure first vs. subsequent exec latency
NFR-003: MCP bridge < 10ms	In-process HTTP proxy	Bridge runs as tokio async task; JSON-RPC parsed in-memory; no serialization to disk.	Latency comparison: direct MCP vs. through bridge
NFR-006: Zero security weakening	All code outside sandbox boundary except one read-only hook	DarkShell code lives in CLI crate and host-side daemons. Sandbox runtime security code (landlock.rs, seccomp.rs, netns.rs, opa.rs) is NEVER modified. proxy.rs has one narrow observability hook (ADR-011) behind a feature flag — read-only, no behavioral change.	Audit: `git diff` for sandbox crate shows ONLY the inference hook. Hook is behind feature flag and compiles to no-op when disabled.
NFR-007: Credential isolation	Bridge injects, agent can't read	Bridge subprocess gets env vars from provider API. Port-forwarded HTTP endpoint carries no credentials — it's just a proxy. Agent sees HTTP responses, not raw keys.	Test: exec into sandbox, attempt to read bridge env vars
NFR-009: 100% backward compat	No modified upstream semantics	All enhancements are additive: new files, new functions, new clap subcommands. Existing command handlers untouched except to add optional flags.	Run upstream `cargo test` against darkshell binary
NFR-010: < 1hr merge time	Minimal diff surface with upstream	Keep internal crate names matching upstream. New code in separate files. Avoid modifying existing functions.	Track merge time per upstream release
NFR-011: Bridge auto-recovery	Supervised subprocess	Bridge daemon monitors MCP server subprocess. On SIGCHLD/pipe-close, restart with backoff (1s, 2s, 4s, max 3 retries).	Kill MCP server process, verify restart within 5s
NFR-014: Actionable errors	Domain-specific error types	Use `thiserror` for DarkShell-specific error enum. Every variant includes what, why, and fix suggestion.	Review every error path for context + remediation
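
As an illustration of NFR-014, the sketch below shows an error enum where every variant carries what happened, why, and a suggested fix. It uses only the standard library to stay self-contained; with `thiserror` the `Display` implementation would be derived from attributes instead. The variant names and message wording are assumptions.

```rust
use std::fmt;

#[derive(Debug)]
enum DarkShellError {
    RsyncMissing { sandbox: String },
    GatewayUnreachable { endpoint: String },
}

impl DarkShellError {
    /// Returns (what happened, why, suggested fix) for the variant.
    fn parts(&self) -> (String, &'static str, &'static str) {
        match self {
            Self::RsyncMissing { sandbox } => (
                format!("rsync upload to sandbox '{}' is unavailable", sandbox),
                "the rsync binary was not found in the sandbox image",
                "install rsync in the image, or omit --rsync to use tar",
            ),
            Self::GatewayUnreachable { endpoint } => (
                format!("cannot reach gateway at {}", endpoint),
                "the gRPC connection failed after retries",
                "run `darkshell doctor` to check gateway status",
            ),
        }
    }
}

impl fmt::Display for DarkShellError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (what, why, fix) = self.parts();
        write!(f, "{}: {}. Fix: {}", what, why, fix)
    }
}

impl std::error::Error for DarkShellError {}
```

Forcing every variant through `parts` means a new error cannot be added without also writing its cause and remediation, which is what the review step in the Validation column is meant to confirm.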

8. Architecture Decision Records

ADR-001: Enhancements Live in CLI Crate and New Crates, Not Sandbox Runtime

Status: Accepted (with one exception — see ADR-011)
Context: DarkShell must preserve OpenShell's security model and maintain upstream merge compatibility. The sandbox runtime (openshell-sandbox) contains the kernel-enforced security code (Landlock, seccomp, netns, proxy, OPA).
Decision: All DarkShell enhancements are implemented either in the CLI crate (openshell-cli) or in new crates (darkshell-mcp, darkshell-observe, darkshell-blueprint). The openshell-sandbox and openshell-server crates are not modified, except for a single, narrow observability hook in proxy.rs for inference request/response logging (see ADR-011).
Consequences:
- Upstream merges for sandbox/server crates are trivial (minimal conflict surface)
- Security audit scope is reduced (only need to verify new code doesn't bypass boundaries)
- Some features (file access audit) require host-side eBPF instead of sandbox-side instrumentation
- MCP bridge runs on host, not in sandbox, which is actually more secure (credentials stay out)
- The proxy.rs hook is the only upstream merge friction point in the sandbox crate

ADR-002: MCP Bridge Runs on Host, Not in Sandbox

Status: Accepted
Context: stdio MCP servers need credentials (API keys) and often need network access to external APIs. Running them inside the sandbox would require either weakening Landlock (to write to system dirs) or weakening network policy (to allow arbitrary endpoints).
Decision: MCP bridge daemon runs on the host. It spawns MCP server subprocesses with host credentials, exposes them as HTTP endpoints, and port-forwards those endpoints into the sandbox. The agent in the sandbox connects to localhost:<port>.
Consequences:
- Credentials never enter the sandbox — strongest isolation
- Network policy only needs to allow the forwarded localhost port
- MCP server crashes don't affect sandbox stability
- Adds host-side process management complexity
- Filesystem-only MCP servers (e.g., Tally) can optionally run in-sandbox (P22)

ADR-003: Rsync with Tar Fallback for Delta Uploads

Status: Accepted
Context: OpenShell uses tar-over-SSH for all uploads. This is simple but transfers the entire workspace every time. rsync would transfer only changed files.
Decision: Add --rsync flag to upload. Detect rsync availability in sandbox. If unavailable, fall back to tar with a warning. Same SSH ProxyCommand transport.
Consequences:
- 15x+ speedup for typical 1-file changes on large workspaces
- Requires rsync in sandbox base image (or installed at image build time)
- Fallback ensures upload always works, even on minimal images
- No new network paths — same SSH tunnel as tar
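
The fallback decision in ADR-003 reduces to a small pure function. The following sketch assumes the rsync availability probe has already run and yields a boolean; `choose_transfer` and `rsync_args` are hypothetical names, and the SSH command string passed to `-e` stands in for the real ProxyCommand invocation, which is not shown in this document.

```rust
#[derive(Debug, PartialEq)]
enum TransferMethod {
    Rsync,
    Tar,
}

/// Picks the transfer method and, when falling back, a warning for the user.
fn choose_transfer(rsync_requested: bool, rsync_available: bool) -> (TransferMethod, Option<String>) {
    match (rsync_requested, rsync_available) {
        (true, true) => (TransferMethod::Rsync, None),
        (true, false) => (
            TransferMethod::Tar,
            Some("rsync not found in sandbox; falling back to tar (full transfer)".to_string()),
        ),
        // Without --rsync the upstream tar behavior is unchanged.
        (false, _) => (TransferMethod::Tar, None),
    }
}

/// Builds rsync arguments: archive mode, compression, and a custom remote shell.
fn rsync_args(ssh_command: &str, local: &str, remote: &str) -> Vec<String> {
    vec![
        "-a".into(),
        "-z".into(),
        "-e".into(),
        ssh_command.into(),
        local.into(),
        remote.into(),
    ]
}
```

Keeping the default branch silent preserves the backward-compatibility rule: only users who ask for `--rsync` ever see the fallback warning.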

ADR-004: Blueprint as Single Source of Truth for Sandbox Configuration

Status: Accepted
Context: Setting up a sandbox requires 5+ commands: create, upload, provider attach, policy set, forward start, MCP bridge start. This is error-prone and not version-controllable.
Decision: Introduce blueprint YAML that declares the complete sandbox configuration. darkshell sandbox create --from blueprint.yaml orchestrates all setup in a single command.
Consequences:
- Sandbox configuration is declarative, version-controlled, auditable
- Blueprints can be shared across teams and stored in git
- Validation happens before creation (fail fast with actionable errors)
- More complex CLI implementation (must orchestrate multiple subsystems)
- Blueprint schema must be forward-compatible for future enhancements
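
One way to structure the orchestrator is to turn a blueprint into an ordered plan before executing anything, which makes the sequence inspectable and testable. The sketch below assumes the step order matches the manual command sequence listed in the Context above, with validation first; the `Step` and `BlueprintSpec` types are hypothetical simplifications.

```rust
#[derive(Debug, PartialEq)]
enum Step {
    Validate,
    CreateSandbox { image: String },
    Upload(String),
    AttachProvider(String),
    SetPolicy(String),
    StartForward(String),
    StartMcpBridge(String),
}

#[derive(Default)]
struct BlueprintSpec {
    image: String,
    policy: Option<String>,
    providers: Vec<String>,
    mcp_servers: Vec<String>,
    forwards: Vec<String>,
    upload: Vec<String>,
}

/// Expands a blueprint into the ordered steps a single
/// `sandbox create --from` invocation would perform.
fn plan(spec: &BlueprintSpec) -> Vec<Step> {
    let mut steps = vec![
        Step::Validate, // fail fast before any resource is created
        Step::CreateSandbox { image: spec.image.clone() },
    ];
    steps.extend(spec.upload.iter().cloned().map(Step::Upload));
    steps.extend(spec.providers.iter().cloned().map(Step::AttachProvider));
    steps.extend(spec.policy.iter().cloned().map(Step::SetPolicy));
    steps.extend(spec.forwards.iter().cloned().map(Step::StartForward));
    steps.extend(spec.mcp_servers.iter().cloned().map(Step::StartMcpBridge));
    steps
}
```

A plan value like this could also back a dry-run mode, since printing the steps requires no gateway connection.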

ADR-005: Observability via Host-Side eBPF, Not Sandbox Instrumentation

Status: Accepted
Context: Full observability requires seeing file access, process spawning, and syscall patterns inside the sandbox. Two approaches: instrument the sandbox runtime or observe from the host via eBPF.
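Decision: Use host-side eBPF probes scoped to the sandbox's PID/network namespace. File and process observability requires no change to the sandbox runtime code; inference content logging is handled separately (see ADR-011).
Consequences:
- No changes to upstream sandbox code for file and process tracing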
- eBPF requires CAP_BPF on the host (usually available to root/Docker)
- Observation is read-only — cannot affect sandbox behavior
- Performance overhead is minimal (eBPF is kernel-optimized)
- Requires Linux kernel 5.8+ for full eBPF features (matches OpenShell's Linux requirement)

ADR-006: Three New Crates, Not One Mega-Crate

Status: Accepted
Context: DarkShell adds significant new functionality. Should it be one crate or multiple?
Decision: Three new crates:
- darkshell-mcp — MCP bridge daemon and server management
- darkshell-observe — Observability collector, eBPF, OTel export
- darkshell-blueprint — Blueprint schema parsing and orchestration
Consequences:
- Clear separation of concerns
- Each crate can be compiled and tested independently
- darkshell-observe can be optional (feature-flagged) for minimal builds
- Dependency graph remains acyclic
- More crates to manage during upstream merges (but they don't touch upstream crates)

ADR-007: Sandbox Image Save Requires Mandatory Credential Stripping

Status: Accepted
Context: Saving a running sandbox as a new image (P33) could capture credentials in environment variables, temp files, or agent-modified files.
Decision: darkshell sandbox image save is gated by:
1. Mandatory --confirm flag (no accidental saves)
2. Automated stripping of all environment variables
3. Removal of known credential paths (/tmp, provider injection points)
4. Warning listing all removed items
Consequences:
- Prevents accidental credential leakage in saved images
- Some legitimate env vars are also stripped (operator must re-inject)
- Stripping is best-effort — unknown credential locations may be missed
- Operator approval creates friction (intentional)
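
The gating and stripping rules can be sketched as a single function that refuses to run without confirmation and returns a report of what it removed. This is an illustration under simplifying assumptions: the image contents are modeled as an env map and a list of file paths, and the credential path list is supplied by the caller. None of these names come from the codebase.

```rust
use std::collections::BTreeMap;
use std::path::Path;

#[derive(Debug)]
struct SanitizeReport {
    env_removed: Vec<String>,
    paths_removed: Vec<String>,
}

/// Strips all environment variables and every file under a known credential
/// path. Refuses to do anything unless `confirm` is set.
fn sanitize(
    env: &mut BTreeMap<String, String>,
    files: &mut Vec<String>,
    credential_paths: &[&str],
    confirm: bool,
) -> Result<SanitizeReport, String> {
    if !confirm {
        return Err("image save requires --confirm".to_string());
    }
    // Rule 2: every env var is removed, legitimate or not.
    let env_removed: Vec<String> = std::mem::take(env).into_keys().collect();
    // Rule 3: drop files under credential paths (component-wise prefix match,
    // so "/tmp" does not match "/tmpfiles").
    let mut paths_removed = Vec::new();
    files.retain(|f| {
        let is_credential = credential_paths.iter().any(|p| Path::new(f).starts_with(p));
        if is_credential {
            paths_removed.push(f.clone());
        }
        !is_credential
    });
    // Rule 4: the report feeds the warning shown to the operator.
    Ok(SanitizeReport { env_removed, paths_removed })
}
```

The report only lists variable names, never values, so the warning itself cannot leak a secret.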

ADR-008: No Modification to Existing Upstream Command Semantics

Status: Accepted
Context: DarkClaw needs to detect whether darkshell or openshell is installed and use enhanced features when available. Existing commands must work identically to prevent breaking upstream-compatible workflows.
Decision: All enhancements are new subcommands (sandbox exec, mcp add, sandbox watch) or new optional flags (--rsync, --dry-run, --include). Existing command handlers are not modified. Default behavior is unchanged.
Consequences:
- darkshell sandbox upload <name> <local> behaves identically to openshell sandbox upload
- darkshell sandbox upload <name> <local> --rsync activates delta transfer
- DarkClaw can feature-detect by checking darkshell --version or trying enhanced commands
- Some enhancements (progress bar) are added to existing commands as non-breaking visual additions

ADR-009: SSH ControlMaster for Exec Performance

Status: Accepted
Context: NFR-002 targets < 100ms exec overhead. Each ssh -T invocation performs a full SSH handshake (~200-500ms), so the target cannot be met without connection reuse.
Decision: Use SSH ControlMaster/ControlPersist to maintain a persistent SSH connection per sandbox. The first exec to a sandbox pays the full handshake cost (~200ms). Subsequent exec commands reuse the multiplexed connection (< 20ms overhead). The control socket (ControlPath) is stored at ~/.config/darkshell/ssh/ctrl-%r@%h:%p. ControlPersist is set to 600s (10 minutes idle timeout).
Consequences:
- First exec is ~200ms; subsequent are < 20ms (meets NFR-002 for steady-state)
- Persistent SSH connections consume a file descriptor per sandbox
- The control socket must be cleaned up when the sandbox is deleted (added to FR-038)
- DarkClaw benefits most (hundreds of exec calls reuse one connection)
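
For reference, the multiplexing settings map onto standard OpenSSH options (`ControlMaster`, `ControlPath`, `ControlPersist`). The sketch below builds the argument list for an exec call; the function name, the proxy command string, and the target format are placeholders, since the real ProxyCommand is gateway-specific and not shown here.

```rust
/// Builds `ssh` arguments that reuse one multiplexed connection per sandbox.
/// `control_dir` is the directory holding control sockets.
fn exec_ssh_args(control_dir: &str, proxy_command: &str, target: &str, remote_cmd: &str) -> Vec<String> {
    vec![
        "-T".into(), // no pseudo-terminal, as in the exec interface (IF-002)
        "-o".into(),
        "ControlMaster=auto".into(), // become master if no socket exists yet
        "-o".into(),
        format!("ControlPath={}/ctrl-%r@%h:%p", control_dir),
        "-o".into(),
        "ControlPersist=600".into(), // keep the master alive 10 minutes when idle
        "-o".into(),
        format!("ProxyCommand={}", proxy_command),
        target.into(),
        remote_cmd.into(),
    ]
}
```

With `ControlMaster=auto`, the first invocation creates the socket and later invocations attach to it, so callers do not need separate code paths for the first and subsequent exec.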

ADR-011: Narrow Observability Hook in proxy.rs for Inference Logging

Status: Accepted
Context: Full inference observability requires seeing prompt content, model responses, and token counts. The privacy router in proxy.rs terminates TLS and inspects HTTP at L7 — it's the only place where inference content is visible in cleartext. eBPF on the host sees encrypted bytes on the wire, not prompts. Gateway-level logging only captures routing metadata, not content. Without this hook, we cannot detect prompt injection, data exfiltration through inference, or audit agent reasoning chains.
Decision: Add a single, narrowly scoped observability hook in openshell-sandbox/src/proxy.rs at the inference routing point. The hook:
1. Is a single function call: inference_observer.on_request(&req, &resp) (or equivalent channel send)
2. Is behind a compile-time feature flag (darkshell-inference-log)
3. Does NOT modify any request/response content or routing behavior
4. Does NOT affect policy evaluation, TLS termination, or SSRF protection
5. Emits a structured event (prompt, response, model, provider, latency, token count) to a channel that darkshell-observe consumes
6. Is clearly demarcated with // BEGIN DARKSHELL HOOK / // END DARKSHELL HOOK markers for upstream merge management
7. When the feature flag is disabled, compiles to a no-op (zero runtime cost)
Consequences:
- This is the ONLY modification to openshell-sandbox — all other sandbox code remains upstream-identical
- Upstream merges for proxy.rs require manual attention at the hook point (~2 lines)
- Feature flag ensures upstream builds are unaffected
- Full inference content visibility enables prompt injection detection and data exfiltration auditing
- Configurable redaction in darkshell-observe/inference_log.rs prevents sensitive prompt data from appearing in logs (strip PII, hash fields, truncate)
- If upstream adds their own inference logging hook, we can migrate to it and remove ours
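
The shape of the hook described above might look like the following sketch. It assumes a standard-library channel and a closure that builds the event lazily, so that a build without the `darkshell-inference-log` feature neither constructs nor sends anything. The struct fields follow item 5; the function names and the truncation-style redaction are assumptions.

```rust
use std::sync::mpsc::Sender;

#[derive(Debug, Clone, PartialEq)]
struct InferenceEvent {
    model: String,
    provider: String,
    prompt: String,
    response: String,
    latency_ms: u64,
    token_count: u32,
}

// BEGIN DARKSHELL HOOK
/// Feature enabled: build the event and hand it to the observer channel.
/// Send errors are ignored so a closed receiver can never affect routing.
#[cfg(feature = "darkshell-inference-log")]
fn on_request<F: FnOnce() -> InferenceEvent>(tx: &Sender<InferenceEvent>, make_event: F) {
    let _ = tx.send(make_event());
}

/// Feature disabled: the closure is never called, so no event is built.
#[cfg(not(feature = "darkshell-inference-log"))]
#[inline(always)]
fn on_request<F: FnOnce() -> InferenceEvent>(_tx: &Sender<InferenceEvent>, _make_event: F) {}
// END DARKSHELL HOOK

/// One possible redaction step on the consumer side: cap prompt length.
fn redact_prompt(prompt: &str, max_chars: usize) -> String {
    if prompt.chars().count() <= max_chars {
        return prompt.to_string();
    }
    let kept: String = prompt.chars().take(max_chars).collect();
    format!("{}[truncated]", kept)
}
```

Passing a closure instead of a finished event keeps the call site to one line while ensuring the prompt and response are not cloned when logging is compiled out.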

ADR-010: MCP Bridge Traffic Is Outside Sandbox Proxy Scope

Status: Accepted
Context: Port-forwarded MCP bridge traffic enters the sandbox via localhost, bypassing the HTTP CONNECT proxy and OPA policy evaluation. This is inherent to how SSH -L port forwarding works within network namespaces.
Decision: Accept that MCP bridge traffic is not evaluated by the sandbox proxy. Compensating controls:
1. Bridge-layer tool policy (FR-011) — deny-by-default at bridge, not proxy
2. MCP tool call logging (FR-020) — full audit trail at bridge layer
3. Credential isolation (FR-013) — agent never sees raw credentials
4. Bridge daemon is DarkShell-managed, not agent-managed — agent cannot modify bridge
Consequences:
- MCP tool calls are audited and policy-evaluated, but at bridge layer, not kernel layer
- A compromised agent could send arbitrary HTTP to the forwarded port, but only reach the specific MCP server behind that port (not arbitrary endpoints)
- FR-011 must be implemented as Should priority (promoted from Nice)
- Document this explicitly in security documentation
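
The first two compensating controls (deny-by-default tool policy and an audit trail at the bridge) can be illustrated with a small allowlist check. This is a sketch only: FR-011 and FR-020 are not specified in this document, so the `ToolPolicy` type, the allowlist model, and the audit record format are all assumptions.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Bridge-layer tool policy: anything not explicitly allowed is denied.
struct ToolPolicy {
    allowed: HashSet<String>,
}

impl ToolPolicy {
    fn new(allowed: &[&str]) -> Self {
        Self { allowed: allowed.iter().map(|s| s.to_string()).collect() }
    }

    fn evaluate(&self, tool: &str) -> Decision {
        if self.allowed.contains(tool) {
            Decision::Allow
        } else {
            Decision::Deny
        }
    }
}

/// One audit record per tool call, written whether or not the call is allowed.
fn audit_record(server: &str, tool: &str, decision: &Decision) -> String {
    format!("server={} tool={} decision={:?}", server, tool, decision)
}
```

Because the policy lives in the bridge process on the host, a compromised agent can request a denied tool but cannot edit the allowlist that rejects it.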

9. Deployment Topology

Local Development

Developer Workstation
├── darkshell CLI binary
├── MCP bridge daemon (auto-started by CLI)
├── Observability collector (optional, started by `sandbox watch`)
└── Docker
    └── OpenShell Gateway (k3s)
        └── Sandbox Pod(s)
            ├── openshell-sandbox supervisor (unchanged)
            └── Agent process (Claude Code, Codex, etc.)

DarkClaw Factory

Factory Host
├── DarkClaw orchestration engine
├── darkshell CLI (invoked by DarkClaw)
├── MCP bridge daemon (managed by DarkClaw via CLI)
├── Observability collector (streams to DarkClaw dashboard)
└── Docker
    └── OpenShell Gateway (k3s)
        └── Sandbox Pod(s)
            ├── openshell-sandbox supervisor (unchanged)
            └── Factory agent (runs VSDD pipeline phases)

Crate Layout

DarkShell/
├── crates/
│   ├── openshell-cli/              # UPSTREAM + DarkShell enhancements
│   │   └── src/
│   │       ├── main.rs             # Upstream + new clap subcommands
│   │       ├── run.rs              # Upstream + new command handlers
│   │       ├── ssh.rs              # Upstream + rsync, exec functions
│   │       ├── mcp.rs              # NEW — MCP CLI commands
│   │       ├── blueprint.rs        # NEW — Blueprint create orchestration
│   │       ├── progress.rs         # NEW — Progress bar wrapping
│   │       ├── policy_tools.rs     # NEW — Policy validate/test commands
│   │       └── lifecycle.rs        # NEW — Snapshot, health, image save
│   │
│   ├── openshell-core/             # UPSTREAM — unchanged
│   ├── openshell-sandbox/          # UPSTREAM — unchanged except proxy.rs hook (ADR-011)
│   ├── openshell-server/           # UPSTREAM — unchanged
│   ├── openshell-router/           # UPSTREAM — unchanged
│   │
│   ├── darkshell-mcp/              # NEW — MCP bridge daemon
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── bridge.rs           # stdio-to-HTTP proxy
│   │       ├── registry.rs         # Server registration + lifecycle
│   │       ├── credential.rs       # Credential injection from providers
│   │       └── policy.rs           # Auto-generate network policy entries
│   │
│   ├── darkshell-observe/          # NEW — Observability collector
│   │   └── src/
│   │       ├── lib.rs
│   │       ├── watch.rs            # Live event stream
│   │       ├── file_audit.rs       # eBPF file access logging
│   │       ├── process_trace.rs    # eBPF process tree tracing
│   │       ├── otel.rs             # OpenTelemetry export
│   │       ├── baseline.rs         # Behavioral baseline + alerting
│   │       └── inference_log.rs    # Receives events from proxy.rs hook via channel
│   │
│   └── darkshell-blueprint/        # NEW — Blueprint engine
│       └── src/
│           ├── lib.rs
│           ├── schema.rs           # Blueprint YAML schema + validation
│           └── orchestrator.rs     # Sandbox creation orchestration
│
├── proto/                          # UPSTREAM — unchanged
├── docs/
│   ├── product-brief.md
│   ├── prd.md
│   └── architecture.md
├── KICKSTART.md
├── CLAUDE.md
└── SOUL.md

Build Configuration

# Root Cargo.toml — workspace members
[workspace]
members = [
    "crates/openshell-cli",
    "crates/openshell-core",
    "crates/openshell-sandbox",
    "crates/openshell-server",
    "crates/openshell-router",
    "crates/darkshell-mcp",
    "crates/darkshell-observe",
    "crates/darkshell-blueprint",
]

# crates/openshell-cli/Cargo.toml — feature flags for optional DarkShell components
[features]
default = ["full"]
mcp = ["dep:darkshell-mcp"]
observe = ["dep:darkshell-observe"]
blueprint = ["dep:darkshell-blueprint"]
full = ["mcp", "observe", "blueprint"]

Upstream Merge Strategy

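1. git fetch upstream (get the latest NVIDIA/OpenShell changes)
2. git merge upstream/main into develop
3. Conflicts are expected only in files that carry DarkShell additions: openshell-cli/src/main.rs (new clap commands), openshell-cli/src/run.rs (new command handlers), openshell-cli/src/ssh.rs (new transfer functions), and the marked hook in openshell-sandbox/src/proxy.rs (ADR-011)
4. The remaining upstream code (openshell-server, openshell-core, and the rest of openshell-sandbox) merges cleanly because we never modify it
5. New crates (darkshell-*) have no upstream counterpart, so they cannot conflict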
