- You are Claude Code. Actions that would be time consuming for a human — writing tests, building features, refactoring code — are fast and comparatively cheap for you.
- Conversation history gets compacted once the context window reaches its limit. Important details from earlier in the conversation — including plans, discoveries, and decisions — may be lost. Proactively write important information to files so it persists beyond context compression.
- Confirm before implementing: After writing a plan but before starting implementation, always present the plan to the user and ask if they have any questions or concerns. Do not begin coding until the user confirms.
- Verify before deleting: Before deleting any files or folders, always verify they are not referenced elsewhere in the codebase using grep or other search tools. Never assume a file is unused.
- Verify assumptions: Before acting on any assumption about the codebase (API signatures, available methods, file locations, type constraints, etc.), read the relevant source. Use grep, glob, or file reads to confirm, and write and run tests where that is the quickest check. Do not assume — check.
- Verify with builds and tests: After making changes, build the affected project and run existing tests to confirm nothing is broken. When the correct behaviour of a piece of logic is non-obvious, write a test to verify it — including temporary/throwaway tests if that is the fastest way to confirm an assumption. Remove temporary tests once they have served their purpose.
- Do not add comments that merely describe the changes made (e.g., "Modified this to fix bug X").
- Comments should be reserved for explaining the code and functionality themselves (the "how" and "why" of the logic), adhering to standard clean code practices.
- Use clear, descriptive names for all variables.
- Avoid obscure abbreviations (e.g., use `isCollection` instead of `isColl`).
- No hard-coded values: Use named constants or derive values from existing configuration rather than embedding magic numbers or durations directly in logic. If a value depends on another configurable parameter, compute it from that parameter rather than hard-coding a value that assumes a specific default.
- Write plans to a file before implementing: For non-trivial tasks, write the plan to a markdown file in the repo before starting implementation. Delete the plan file when the task is done.
- Stop and reassess after repeated failures: If consecutive fix attempts fail to resolve an issue, stop and reconsider the approach rather than continuing to apply further fixes.
- Commits should be focused and well-delimited: Each commit should represent one coherent,
self-contained piece of work (e.g. a bug fix, a single new feature, a refactor, a docs update).
Do not bundle unrelated changes into a single commit; review the diff to separate them. When a file
contains changes that belong in separate commits, use `git add -p` to stage specific hunks rather
than editing the file, committing, and re-applying changes. Do not add any Claude attribution or
co-author lines to commit messages.
- Keep CLAUDE.md up to date: After completing a TDD step or any significant implementation milestone, update the Implementation Status and Workspace Structure sections of this file to reflect the current state — including new files, new test counts, newly completed steps, and any changed conventions. This ensures future sessions start with accurate context rather than stale information.
- Ask before committing: After completing a unit of work, ask the user whether they would like to commit the changes rather than waiting for them to request it.
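
As an illustration of the "No hard-coded values" guideline above, here is a minimal Rust sketch. The names `WatcherTiming`, `SUPPRESSION_WINDOWS`, and `suppression_ttl` are hypothetical and chosen only for this example; they are not taken from the codebase. The point is that the dependent duration is computed from the configurable one rather than written as a literal that silently assumes a default.

```rust
use std::time::Duration;

/// Illustrative constant (an assumption for this sketch): how many debounce
/// windows a suppression entry should outlive.
const SUPPRESSION_WINDOWS: u32 = 4;

/// Hypothetical configuration holder; only `debounce` is user-configurable.
struct WatcherTiming {
    debounce: Duration,
}

impl WatcherTiming {
    /// Derived from `debounce`, so a changed debounce setting automatically
    /// changes the TTL instead of leaving a stale hard-coded value behind.
    fn suppression_ttl(&self) -> Duration {
        self.debounce * SUPPRESSION_WINDOWS
    }
}
```

With a 50 ms debounce this yields a 200 ms TTL; doubling the debounce doubles the TTL without touching any other code.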
Code Agent is a sandboxed coding agent that runs inside a Linux VM (QEMU), with host-side filesystem interception for N-step undo capability. The project will be released as open source under MIT OR Apache-2.0 dual license.
Design documents:
- `project-plan.md` — full architecture (WriteInterceptor trait, undo log, STDIO API, MCP server, VM)
- `testing-plan.md` — 6-layer test pyramid (L1–L6), test matrix UI-01..UI-24, spec decisions
- `tauri-app-plan.md` — desktop app design (tabs, config schema, VM lifecycle, Claude integration)
- `desktop-impl-plan.md` — desktop app implementation plan (Phases 1-4, current state, next steps)
Rust workspace at repo root: resolver = "3", edition = "2024", rust-version = "1.85".
desktop/src-tauri is excluded from the workspace (Cargo.toml exclude list).
Cargo.toml # workspace root
.cargo/config.toml # [alias] xtask = "run --manifest-path xtask/Cargo.toml --"
xtask/ # Development task runner (NOT a workspace member)
Cargo.toml # standalone crate, depends on clap
src/main.rs # CLI dispatch: build-guest subcommand
guest/ # Guest VM image build files
Dockerfile # multi-stage: compile shim + p9proxy (musl), assemble initramfs
init.sh # /init script for guest VM boot (virtiofs or p9proxy mount,
# sandbox user creation, start shim)
crates/
common/ # codeagent-common — shared types and errors
src/lib.rs # StepId, StepManager trait, StepType, StepInfo, BarrierId,
# BarrierInfo, SafeguardId, SafeguardKind, SafeguardConfig,
# SafeguardEvent, SafeguardDecision, ExternalModificationPolicy,
# SymlinkPolicy, RollbackResult, ResourceLimitsConfig,
# CodeAgentError (incl. RollbackBlocked, SafeguardDenied,
# StepUnprotected, UndoDisabled), Result<T>
control/ # codeagent-control — control channel protocol + handler
src/
lib.rs # module declarations + re-exports
error.rs # ControlChannelError enum
protocol.rs # HostMessage (Exec, Cancel, RollbackNotify),
# VmMessage (StepStarted, Output, StepCompleted),
# OutputStream
parser.rs # JSONL parsing with 1MB size limit
state_machine.rs # ControlChannelState, ControlEvent, PendingCommand,
# ActiveCommand — validates message sequences
handler.rs # QuiescenceConfig, HandlerEvent,
# ControlChannelHandler (quiescence + ambient steps)
in_flight.rs # InFlightTracker (Arc<AtomicUsize> + Notify)
tests/
control_channel.rs # CC-01..CC-07 + edge cases
control_channel_integration.rs # CC-08..CC-12 + edge cases (MockStepManager, paused time)
interceptor/ # codeagent-interceptor — undo log core
src/
lib.rs # module declarations
write_interceptor.rs # WriteInterceptor trait (13 methods)
safeguard.rs # SafeguardHandler trait, SafeguardTracker (per-step counters)
preimage.rs # path_hash, PreimageMetadata, capture/restore preimages
manifest.rs # StepManifest, ManifestEntry (JSON on disk)
rollback.rs # rollback_step (two-pass: delete→recreate→restore)
resource_limits.rs # calculate_step_size, calculate_total_log_size
gitignore.rs # build_gitignore() — opt-in .gitignore-aware preimage skipping
history.rs # read_undo_history() — standalone disk reader (no UndoInterceptor
# instance needed), FileDetail, StepDetail, UndoHistoryData
undo_interceptor.rs # UndoConfig, UndoInterceptor (impl StepManager + WriteInterceptor),
# RecoveryInfo, recover(), per-step barrier storage (BarrierEntry),
# notify_external_modification(), barriers(), rollback(count, force),
# rollback_current_step(), safeguard checks in pre_*, evict_if_needed(),
# discard(), is_undo_disabled(), version check
tests/
common/mod.rs # shared test helpers: OperationApplier, compare_opts
undo_interceptor.rs # integration tests UI-01..UI-08
wal_crash_recovery.rs # crash recovery tests CR-01..CR-07 + step reconstruction
undo_barriers.rs # undo barrier tests EB-01..EB-06, EB-08
safeguards.rs # safeguard tests SG-01..SG-06 + edge cases
resource_limits.rs # resource limit tests UI-16..UI-19, UL-01..UL-08
gitignore.rs # gitignore filter tests GI-01..GI-08
symlink_policy.rs # symlink policy tests SY-01..SY-08
proptest_model.rs # model-based property tests (proptest): undo_model, undo_model_multi_step_rollback
mcp/ # codeagent-mcp — MCP server (JSON-RPC 2.0 over local socket)
src/
lib.rs # module declarations + re-exports
error.rs # McpError enum (9 variants), JsonRpcError struct,
# JSON-RPC 2.0 error codes (standard + application-specific)
protocol.rs # JsonRpcRequest, JsonRpcResponse, JsonRpcNotification,
# ToolDefinition, ToolCallResult, ToolCallParams,
# tool arg structs (ExecuteCommandArgs, ReadFileArgs,
# WriteFileArgs, EditFileArgs, GlobArgs, GrepArgs,
# UndoArgs, etc.)
parser.rs # parse_jsonrpc() with 1MB size limit, extract_id(),
# extract_missing_field()
path_validation.rs # validate_path() — logical .. resolution + containment
router.rs # McpHandler trait (9 methods), tool_definitions(),
# McpRouter (initialize/tools_list/tools_call dispatch,
# path validation for fs tools)
server.rs # McpServer async loop (tokio::select! for requests +
# notifications, generic over AsyncRead/AsyncWrite)
tests/
mcp_server.rs # MC-01..MC-08 contract tests (30 tests)
stdio/ # codeagent-stdio — STDIO API (JSON Lines over stdin/stdout)
src/
lib.rs # module declarations + re-exports
error.rs # StdioError enum (9 variants) + ErrorDetail
version.rs # PROTOCOL_VERSION, MIN/MAX_SUPPORTED_VERSION
protocol.rs # RequestEnvelope, Request (15 variants), payload structs,
# ResponseEnvelope, ErrorDetail, Event (9 variants),
# EventEnvelope, LogEntry
parser.rs # parse_request() with 1MB size limit, envelope-based
# two-step parsing, missing field detection
path_validation.rs # validate_path() — logical .. resolution + containment
router.rs # RequestHandler trait, Router (path validation + dispatch)
server.rs # StdioServer async loop (stdin → router → stdout/stderr)
tests/
stdio_api.rs # SA-01..SA-12 contract tests (37 tests)
test-support/ # codeagent-test-support — test utilities
src/
lib.rs # re-exports
snapshot.rs # TreeSnapshot, EntrySnapshot, assert_tree_eq
workspace.rs # TempWorkspace (isolated temp dir pairs)
fixtures.rs # small_tree, rename_tree, symlink_tree, deep_tree
benches/
snapshot_capture.rs # L6 criterion benchmark: TreeSnapshot::capture (100/1000/10000 files)
interceptor/ # (benches listed below under interceptor/)
benches/
preimage_capture.rs # L6 criterion benchmarks: capture_preimage + zstd_compress (4KB/1MB/100MB)
rollback_restore.rs # L6 criterion benchmark: rollback_step (4KB/1MB/100MB)
manifest_io.rs # L6 criterion benchmarks: manifest write/read/serialize/deserialize (10/100/1000 paths)
p9/ # codeagent-p9 — 9P2000.L server (Windows filesystem backend)
Cargo.toml # depends on tokio, thiserror, codeagent-interceptor, codeagent-control
src/
lib.rs # module declarations + re-exports
error.rs # P9Error enum (Io, Protocol, Fid, NotFound, etc.),
# error_to_errno (io::ErrorKind → Linux errno)
wire.rs # WireReader/WireWriter — LE binary primitives (u8/u16/u32/u64/
# string/qid/data), MAX_MESSAGE_SIZE (16MB)
messages.rs # 23 T/R message struct pairs, type constants (TVERSION=100..),
# Rlerror, NOTAG, ToWire/decode for all message types
qid.rs # Qid struct (qtype, version, path), QidType constants
fid.rs # FidTable (HashMap<u32, FidState>), FidState (path, qid,
# open_handle, open_flags, dir_offset), insert/get/remove/
# resolve_child/update_path
server.rs # P9Server: async dispatch loop over AsyncRead/AsyncWrite,
# with_interceptor(), with_in_flight(), InFlightGuard,
# validate_name() for reserved/case-collision checks,
# dispatches 22 message types to operation handlers
operations/
mod.rs # re-exports
session.rs # handle_version, handle_auth, handle_attach, handle_clunk,
# handle_remove, handle_flush, qid_from_path (FNV-1a hash)
walk.rs # handle_walk (path traversal + containment), is_contained(),
# logical_contains(), resolve_logical()
file.rs # handle_lopen (skips open for dirs on Windows), handle_lcreate,
# handle_read, handle_write, handle_fsync,
# open_with_flags (Linux O_* → Rust OpenOptions)
dir.rs # handle_readdir (qid+offset+type+name packing), handle_mkdir,
# handle_unlinkat, handle_renameat, handle_statfs
attr.rs # handle_getattr, handle_setattr, FileAttributes struct
link.rs # handle_symlink (platform-specific), handle_readlink,
# handle_link
platform/
mod.rs # get_file_attributes(), is_reserved_name(), check_case_collision()
# (no-ops on non-Windows)
unix.rs # Real POSIX attrs via MetadataExt
windows.rs # POSIX attr synthesis (dirs=0o40755, files=0o100644/0o100755),
# RESERVED_NAMES, is_reserved_name(), check_case_collision(),
# is_executable_extension()
tests/
wire_protocol.rs # P9-01..P9-05 wire format round-trip tests (61 tests)
server_operations.rs # SO/WK/RO/WR/LK/RB integration tests (47 tests)
windows_normalization.rs # WN-01..WN-07 Windows normalization tests (8 tests, most cfg(windows))
p9proxy/ # codeagent-p9proxy — guest-side 9P proxy for virtio-serial
Cargo.toml # [[bin]] name = "p9proxy", depends on libc (Unix only)
src/
main.rs # bridges virtio-serial port to Unix socketpair for kernel
# trans=fd 9P transport; forks before mount (child proxies
# data, parent calls mount then exits); bidirectional copy
# between socket and port via two threads
sandbox/ # codeagent-sandbox — host-side agent binary ("sandbox")
Cargo.toml # [[bin]] name = "sandbox", depends on which, toml, dirs
src/
main.rs # entry point: parse CLI → branch on --protocol (stdio|mcp)
lib.rs # module declarations + re-exports
cli.rs # CliArgs (clap derive): --working-dir, --undo-dir, --vm-mode,
# --protocol, --log-level, --qemu-binary, --kernel-path,
# --initrd-path, --rootfs-path, --memory-mb, --cpus,
# --virtiofsd-binary, --config-file
config.rs # SandboxTomlConfig (command_classifier + file_watcher sections),
# FileWatcherConfig (enabled, debounce_ms, recent_write_ttl_ms,
# exclude_patterns), load_config(), default_config_dir/file_path
command_classifier.rs # CommandClassifierConfig (serde, 9 Vec<String> fields),
# CommandClassifier (HashSet-based O(1) lookup), classify(),
# sanitize(); configurable allowlists for read-only/write/
# destructive commands + git/cargo/npm subcommand lists
error.rs # AgentError enum (11 variants: SessionNotActive,
# SessionAlreadyActive, InvalidWorkingDir, QemuUnavailable,
# QemuSpawnFailed, ControlChannelFailed, VirtioFsFailed,
# FileWatcherFailed, NotImplemented, Undo, Io)
session.rs # SessionState enum (Idle | Active), Session struct with
# optional VM fields (qemu_process, fs_backends,
# in_flight_tracker, control_writer, task handles, socket_dir),
# fs_watcher_handle, recent_writes
orchestrator.rs # Orchestrator: implements RequestHandler (15 methods) +
# McpHandler (9 methods), session lifecycle, undo delegation,
# direct host fs access, safeguard confirm/configure,
# launch_vm() for QEMU + virtiofsd + control channel setup,
# agent_execute sends commands through control channel when VM
# available, fs_status reports backend/VM info; holds
# CommandClassifier for configurable command classification
safeguard_bridge.rs # SafeguardBridge: sync SafeguardHandler → async channel bridge
event_bridge.rs # HandlerEvent → STDIO Event translation + run_event_bridge()
control_bridge.rs # spawn_control_writer (mpsc → JSON Lines socket writer),
# spawn_control_reader (socket reader → ControlChannelHandler),
# serialize_host_message
recent_writes.rs # RecentBackendWrites (event-time-based suppression:
# per-path TTL + blanket counter + suppress_ended_at),
# should_suppress(path, event_time), WriteTrackingInterceptor
# (WriteInterceptor decorator for watcher suppression)
fs_watcher.rs # FsWatcherConfig, spawn_fs_watcher() — notify crate v8,
# TimestampedEvent (Instant-stamped at OS delivery),
# debounced event processing, event-time suppression,
# exclude patterns, undo dir filtering, barrier creation
fs_backend.rs # FilesystemBackend trait, NullBackend stub,
# VirtioFsBackend [cfg(not(windows))] — spawns external
# virtiofsd process (no interception),
# InterceptedBackend [cfg(unix)] — wraps virtiofs-backend
# crate's InterceptedVirtioFsBackend as FilesystemBackend,
# P9Backend [cfg(windows)] — spawns P9Server on tokio task
# with WriteInterceptor + InFlightTracker
qemu.rs # QemuConfig (full command-line builder with platform-specific
# machine/accel/fs args; Windows uses virtio-serial chardev +
# virtconsole for 9P transport), QemuProcess (spawn with socket
# readiness polling, stop, pid)
tests/
orchestrator.rs # AO-01..AO-15 + MCP-01..MCP-13 integration tests (40 tests)
undo_history.rs # UH-01..UH-14 undo history integration tests (14 tests)
fs_watcher.rs # FW-01..FW-12 filesystem watcher integration tests (12 tests)
shim/ # codeagent-shim — VM-side command executor binary
Cargo.toml # [[bin]] name = "shim", depends on codeagent-control
src/
main.rs # entry point: open /dev/virtio-ports/control, call run()
lib.rs # Shim struct (HashMap<u64, CommandHandle>), run<R,W>()
# main loop, message dispatch, cancel_all, reap_completed
error.rs # ShimError enum (Io, Json, ChannelClosed, CommandNotFound,
# MalformedMessage)
executor.rs # spawn_command (sh -c, piped output, process groups on Unix,
# drops to uid/gid 1000 via setuid/setgid in pre_exec),
# cancel_command (SIGTERM/SIGKILL on Unix, child.kill on
# Windows), stream_output (buffered interval-based flushing),
# CommandHandle
output_buffer.rs # OutputBufferConfig (max_buffer_size=4096, flush_interval=50ms)
tests/
shim_integration.rs # SH-01..SH-08 integration tests (8 tests, SH-06 ignored on
# Windows) using tokio::io::duplex()
virtiofs-backend/ # codeagent-virtiofs-backend — intercepted virtiofs filesystem backend
Cargo.toml # depends on virtiofsd (Unix only via cfg(unix)), codeagent-interceptor,
# codeagent-control
src/
lib.rs # module declarations (inode_map always; error, intercepted_fs, daemon
# behind #[cfg(unix)])
inode_map.rs # InodePathMap: inode→host path mapping (RwLock<HashMap<u64, PathBuf>>),
# FUSE_ROOT_ID, insert/get/resolve/remove/rename/rename_subtree
error.rs # VirtioFsBackendError enum (Io, Interceptor, Daemon) [Unix only]
intercepted_fs.rs # InterceptedFs: wraps PassthroughFs, implements FileSystem trait (44
# methods), WriteInterceptor pre/post hooks on 16 mutating methods,
# InFlightGuard drop guard, inode_map tracking [Unix only]
daemon.rs # InterceptedVirtioFsBackend: in-process vhost-user daemon, start/stop/
# is_running, spawns daemon on background thread [Unix only]
tests/
filesystem_backend.rs # FB-01..FB-16 L3 integration tests (16 tests, all #[ignore],
# Linux only) — POSIX syscalls → WriteInterceptor method verification
vmm-sys-util-fork/ # vmm-sys-util — fork with macOS support (excluded from
# workspace; compiled via [patch.crates-io] on Unix)
Cargo.toml # platform-gated deps, libc + bitflags always
src/ # macOS impls for FallocateMode, TempDir/TempFile,
# terminal, timer_fd (kqueue), signal, errno;
# epoll/syslog/ioctl gated on Linux
virtiofsd-fork/ # virtiofsd — fork of virtiofsd 1.13.3 with macOS compat layer
# (excluded from workspace members; compiled only when depended on
# via [patch.crates-io])
Cargo.toml # platform-gated deps: vhost/vm-memory under cfg(unix),
# capng/seccomp under cfg(linux)
src/
lib.rs # compat + cross-platform modules; sandbox/seccomp/idmap/limits
# gated behind #[cfg(target_os = "linux")]
compat/ # Platform compatibility layer (7 sub-modules)
mod.rs # re-exports all compat sub-modules
fd_ops.rs # O_PATH_OR_RDONLY, O_DIRECT, fd_to_path, open_proc_self_fd,
# reopen_fd, open_path_fd (Linux: /proc/self/fd, macOS: fcntl)
rename_ops.rs # RENAME_* constants, safe_renameat2 (Linux: SYS_renameat2,
# macOS: renameatx_np with flag translation)
stat_ops.rs # StatExt, statx (Linux: SYS_statx, macOS: fstatat)
credentials.rs # seteffuid/gid, setsupgroup, ScopedCaps (Linux: capng,
# macOS: seteuid/no-op caps)
io_ops.rs # writev_at/readv_at (Linux: pwritev2, macOS: pwritev)
os_facts.rs # OsFacts (Linux: probe openat2, macOS: always false)
types.rs # stat64, off64_t, ino64_t type aliases; lseek64, fstatat64,
# fallocate64 cross-platform wrappers
oslib.rs # delegates to compat for IO/credentials/OsFacts; gates
# mount/umount/filehandle on Linux
passthrough/ # PassthroughFs with O_PATH→O_PATH_OR_RDONLY,
# copy_file_range/syncfs platform-gated
read_dir.rs # platform split: Linux getdents64, macOS getdirentries
server.rs # FUSE server using compat RENAME_* constants
util.rs # linux_only module for pidfd_open/sfork/capabilities
e2e-tests/ # codeagent-e2e-tests — L4 QEMU E2E test infrastructure
src/
lib.rs # module declarations + re-exports
constants.rs # SANDBOX_BIN env var, DEFAULT_BINARY_NAME ("sandbox"), timeouts
messages.rs # STDIO API message builders (session_start, agent_execute, etc.)
jsonl_client.rs # JsonlClient (spawn agent, demux stdout responses/events)
mcp_client.rs # McpClient (Unix domain socket, JSON-RPC 2.0) [cfg(unix)]
tests/
session_lifecycle.rs # SL-01..SL-08 session lifecycle E2E tests (8 tests, all #[ignore])
undo_roundtrip.rs # UR-01..UR-05 undo round-trip E2E tests (5 tests, all #[ignore])
pjdfstest_subset.rs # PJ-01..PJ-05 POSIX filesystem semantics (5 tests, all #[ignore])
safeguard_flow.rs # SF-01..SF-03 safeguard E2E flow (3 tests, all #[ignore])
fuzz/ # L5 fuzz targets (excluded from workspace; cargo-fuzz)
Cargo.toml # libfuzzer-sys + deps on control/stdio/mcp/interceptor
fuzz_targets/
control_jsonl.rs # parse_vm_message + parse_host_message
stdio_json.rs # parse_request
mcp_jsonrpc.rs # parse_jsonrpc
undo_manifest.rs # serde_json::from_str::<StepManifest>
path_normalize.rs # validate_path (MCP + STDIO)
corpus/ # seed inputs per target (48 files total)
control_jsonl/ # 10 seeds
stdio_json/ # 12 seeds
mcp_jsonrpc/ # 9 seeds
undo_manifest/ # 7 seeds
path_normalize/ # 10 seeds
desktop/ # Tauri v2 desktop app (NOT a workspace member)
package.json # React + Tauri frontend deps (react 19, zustand, lucide-react,
# @tauri-apps/plugin-updater)
vite.config.ts # Vite + Tailwind CSS 4 + React plugins
tsconfig.json # TypeScript config (strict, ESNext)
index.html # HTML entry point (favicon.svg)
public/
favicon.svg # App icon (blue hexagon with grid motif)
src/ # React frontend
main.tsx # React entry point
App.tsx # root component (renders Layout + ToastContainer)
index.css # Tailwind + CSS variable dark theme
vite-env.d.ts # Vite type declarations
lib/
types.ts # TypeScript types mirroring Rust SandboxConfig, VmStatus,
# ClaudeConfigInfo, McpServerEntry, CommandClassifierSection,
# UndoStepDetail, BarrierDetail, UndoHistoryData,
# defaultConfig(), defaultCommandClassifier()
hooks/
useSandboxConfig.ts # Zustand store: config load/save with 500ms debounced auto-save,
# toast on save errors
useVmStatus.ts # Zustand store: VM start/stop/poll (2s interval)
useToastStore.ts # Zustand store: global toast notifications (success/warning/
# error/info variants, auto-dismiss)
useUndoHistory.ts # Zustand store: undo history fetch/rollback, 5s polling
components/
Layout.tsx # Sidebar nav (4 tabs: Settings, Monitor, Plug, History) + content
Toast.tsx # Global toast container (multi-variant, stacked, fixed bottom-right)
tabs/
SandboxConfig.tsx # Tab 1: collapsible sections (Working Dir, Resource Limits,
# Safeguards, Advanced, Command Classification) with DirPicker
# (validates via backend), NumberInput, Toggle, Select,
# CommandListEditor (tag-style add/remove for command lists)
VmManager.tsx # Tab 2: status panel, Start/Stop/Restart, memory/CPU sliders,
# file pickers for QEMU/kernel/initrd, auto-start + persist toggles
ClaudeIntegration.tsx # Tab 3: Claude Desktop + Code panels side-by-side, MCP server
# detection/toggle/preview/copy, CLI command generation
UndoHistory.tsx # Tab 4: undo step timeline (newest first), step detail expansion,
# barrier indicators, rollback with confirmation dialog + force option
src-tauri/ # Tauri Rust backend (standalone Cargo.toml)
Cargo.toml # deps: tauri 2, tauri-plugin-{dialog,shell,process,updater},
# serde, serde_json, toml, dirs, which,
# codeagent-interceptor (path), codeagent-common (path)
build.rs # tauri_build::build()
tauri.conf.json # window 1024x768, identifier com.codeagent.desktop, bundle config,
# updater plugin config, NSIS/macOS/deb/AppImage installer settings
capabilities/
default.json # core:default, dialog:default, shell:default, process:default,
# updater:default
icons/ # placeholder icons (32x32, 128x128, 128x128@2x, .ico, .icns)
src/
main.rs # entry point (calls lib::run)
lib.rs # plugin registration (incl. updater) + invoke_handler with all cmds
config.rs # SandboxConfig (9 sections: sandbox, vm, undo, safeguards,
# symlinks, external_modifications, gitignore, claude_code,
# command_classifier) — serde Serialize/Deserialize + Default
paths.rs # config_dir(), config_file_path(), pid_file_path() — platform paths
commands/
mod.rs # re-exports: claude, config, system, undo, vm
config.rs # read_config, write_config, get_config_path (TOML)
system.rs # get_platform, resolve_binary, validate_directory
vm.rs # VmState (separate Mutex for process/stdin/stdout), VmStatus,
# start_vm (spawn + extract I/O handles), stop_vm (kill + cleanup),
# get_vm_status (try_wait), send_mcp_request (JSON-RPC passthrough)
undo.rs # read_undo_history (delegates to codeagent_interceptor::history),
# clear_undo_history
claude.rs # ClaudeConfigInfo, McpServerEntry, detect/write/remove for
# Claude Desktop + Code configs, generate_claude_code_cli_command
- Cross-platform path handling: All internal path strings (preimage metadata, manifest keys,
touched-paths sets, path hashes) use forward slashes. Convert with
`.replace('\\', "/")`. The `preimage::path_hash()` function normalizes before hashing.
- Platform-conditional compilation: `#[cfg(unix)]` for real mode bits, symlinks, and the virtiofs backend (virtiofs-backend + InterceptedBackend in sandbox). `#[cfg(windows)]` for synthetic mode (0o755/0o644), `symlink_file`, P9Backend, reserved name checks, and case collision detection. `#[cfg(target_os = "linux")]` reserved for xattrs, seccomp, and Linux-specific virtiofsd modules (sandbox, idmap, limits).
- First-touch semantics: `UndoInterceptor` captures a preimage only on the first mutating touch of a path within a step. The `touched_paths: HashSet<String>` guards against duplicates.
- Rollback is pop: Rolling back removes steps from history (not reversible). Two-pass algorithm: (1) delete created paths deepest-first, recreate dirs shallowest-first, restore files; (2) restore directory metadata deepest-first so child ops don't clobber parent mtime.
- On-disk layout:

      {undo_dir}/version                  # "1"
      {undo_dir}/wal/in_progress/         # active step (promoted to steps/ on close)
      {undo_dir}/steps/{id}/              # completed steps
        manifest.json
        barriers.json                     # optional, per-step barrier entries
        preimages/{hash}.dat              # zstd level 3 compressed file contents
        preimages/{hash}.meta.json        # PreimageMetadata (path, type, mode, mtime, etc.)

- Undo barriers: Barriers are stored per-step in
`steps/{id}/barriers.json`. A barrier with `after_step_id = S` blocks rollback of step S (because the external modification happened after S and rolling back S would destroy it). `rollback(count, force)` checks barriers; `force: true` crosses and removes them. Barrier IDs are synthesized as `step_id * 1000 + entry_index`.
- Safeguards: Configurable thresholds (delete count, overwrite-large-file, rename-over-existing)
checked in `pre_*` methods. On trigger, calls `SafeguardHandler::on_safeguard_triggered()` which blocks until Allow/Deny. On Deny, `rollback_current_step()` undoes all operations in the current step and cancels it. Once a safeguard kind is allowed for a step, it does not re-trigger.
- Resource limits:
`ResourceLimitsConfig` controls max log size, max step count, and max single-step preimage data size. On `close_step`, FIFO eviction removes oldest steps to stay within budget. Steps exceeding `max_single_step_size_bytes` are marked `unprotected` — they cannot be rolled back but do not block rollback of subsequent steps. Version mismatch (`version` file ≠ `CURRENT_VERSION`) disables undo; `discard()` re-enables it.
- Test pattern: snapshot → open step → apply operations via OperationApplier → close step →
rollback → `assert_tree_eq(before, after, opts)` with large mtime tolerance.
- Gitignore filtering: Opt-in via
`UndoConfig { gitignore: true, .. }`. When enabled, the `ignore` crate loads `.gitignore` files and `.git/info/exclude` once at construction time. Paths matching ignore rules are silently skipped in `ensure_preimage`, `record_creation`, and `capture_tree_preimages` — no preimage, no manifest entry.
- Symlink policy: Three-state
`SymlinkPolicy` enum (`Ignore`, `ReadOnly`, `ReadWrite`), default `Ignore`. Configured via `UndoConfig { symlink_policy: ..., .. }`. `Ignore` skips symlinks in `ensure_preimage`, `record_creation`, `capture_tree_preimages`, `post_symlink`, and `pre_link`. `ReadOnly` allows preimage capture (read-side) but skips symlink restore on rollback (write-side). `ReadWrite` enables full symlink support. Write is conditional on read — the enum prevents the invalid `read=false, write=true` combination.
- Shared directory access modes: Each working directory in
`session.start` has an `access` field: `read_write` (default) or `read_only`. Enforced at both mount level (virtiofsd/9P flags) and interceptor level (write rejection). `read_only` directories have no undo tracking — no `WriteInterceptor` instance, no preimage capture. See project-plan §4.10.
- Two-channel architecture: The system has two separate communication channels between
host and VM. The filesystem channel (virtiofsd on Linux/macOS, 9P on Windows) carries
actual POSIX syscalls transparently — the VM kernel mounts a filesystem backed by the host,
and the host-side filesystem backend calls `WriteInterceptor` methods to capture preimages. The control channel (virtio-serial, JSON Lines) carries only command orchestration — "exec this shell command", step boundary signals, terminal output. The control channel never sees filesystem operations. The agent correlates the two: all filesystem writes between `step_started(N)` and `step_completed(N)` belong to undo step N.
- Control channel protocol: JSON Lines over virtio-serial. Host→VM messages:
`exec`, `cancel`, `rollback_notify`. VM→host messages: `step_started`, `output`, `step_completed`. Messages are serde-tagged (`#[serde(tag = "type")]`). Max message size: 1 MB (rejected before parsing). The `ControlChannelState` validates sequences and emits `ControlEvent`s; protocol violations produce `ProtocolError` events without breaking the channel.
- Control channel handler: `ControlChannelHandler<S: StepManager>` integrates the protocol state machine with step lifecycle. After `step_completed`, a quiescence window (configurable, default 100ms idle / 2s max) waits for in-flight FS ops to drain before closing the step. Writes outside any command step open ambient steps (negative IDs, auto-close after 5s inactivity). The handler is async (tokio) and uses `tokio::spawn` for quiescence/ambient timeout tasks.
- STDIO API protocol: JSON Lines over stdin/stdout. Envelope-based two-step parsing:
first parse `RequestEnvelope` (type + request_id + payload), then dispatch on type to parse typed payload. Responses: `{"type":"response","request_id":"...","status":"ok"|"error",...}`. Events: `{"type":"event.*","payload":{...}}`. Error codes are string-based (e.g., `"unknown_operation"`, `"missing_field"`, `"path_outside_root"`). Protocol version is declared in `session.start` payload (optional `protocol_version` field; absent = v1). Path containment for `fs.read`/`fs.list` uses logical `..` resolution without filesystem access — rejects traversal and absolute paths outside root.
- MCP server protocol: JSON-RPC 2.0 over a local socket (Unix domain socket on
Linux/macOS, named pipe on Windows). MCP lifecycle: `initialize` → `initialized` → `tools/list` → `tools/call`. 9 tools: `execute_command`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`, `undo`, `get_undo_history`, `get_session_status`. `write_file` and `edit_file` create synthetic "API steps" for undo. Path containment validated for `read_file`, `write_file`, `edit_file`, `glob` (optional path), `grep` (optional path). Error codes use JSON-RPC 2.0 standard codes (-327xx) plus application-specific codes (-320xx). MCP and STDIO share the same undo log and safeguard system; safeguard events from MCP operations are forwarded as notifications.
- virtiofsd-fork compat module: The
`crates/virtiofsd-fork/src/compat/` module provides a centralized platform abstraction layer for porting virtiofsd from Linux to macOS. Key patterns: `O_PATH_OR_RDONLY` (Linux: `O_PATH`, macOS: `O_RDONLY`), `O_DIRECT` (0 on macOS), 64-bit type aliases (`stat64`, `off64_t`, `ino64_t`), `RENAME_*` flag constants, and cross-platform wrappers for `lseek64`/`fstatat64`/`fallocate64`. Platform-specific syscalls (`copy_file_range`, `syncfs`, `getdents64`) have macOS equivalents or `ENOSYS` fallbacks. The fork is excluded from workspace members (Unix-only) but available via `[patch.crates-io]` — only compiled when a Unix dependency chain requires it.
- Command classification: The
CommandClassifierclassifies shell commands into ReadOnly, Write, or Destructive categories. Allowlists (read-only, write, destructive commands + git/ cargo/npm subcommand lists) are configurable viaCommandClassifierConfigloaded fromcodeagent.toml. Sanitization checks (fork bombs,sudo, device access, shell expansion attacks) are hardcoded and not user-configurable — these are security-critical. - Dependencies (all permissively licensed): blake3, filetime, ignore, notify (v8, CC0, cross-platform filesystem watching), serde (+derive), serde_json, tempfile, thiserror, tokio (rt, macros, sync, time, io-util), xattr (Linux only), zstd, chrono (+serde), which (binary resolution for QEMU/virtiofsd), toml (config loading), dirs (platform config directory). Unix-only (virtiofs backend): virtiofsd (Apache-2.0/BSD-3, via local fork), vhost-user-backend, vhost, vm-memory, log; vmm-sys-util (via local fork with macOS support). Dev-only: proptest (model-based testing), criterion (performance benchmarks, with html_reports).
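The logical `..` resolution used for path containment in the STDIO and MCP protocols could look roughly like the sketch below. It is an illustration only: the `PathError` type and this `validate_path` signature are assumptions, not the project's actual API, and the sketch assumes Unix-style paths.

```rust
use std::ffi::OsStr;
use std::path::{Component, Path, PathBuf};

/// Hypothetical error type; the project's real error enums differ.
#[derive(Debug, PartialEq)]
pub enum PathError {
    OutsideRoot,
}

/// Resolves `requested` against `root` purely lexically (no filesystem access)
/// and rejects any result that would land outside `root`.
pub fn validate_path(root: &Path, requested: &str) -> Result<PathBuf, PathError> {
    let requested_path = Path::new(requested);
    // Absolute paths are accepted only when they already start with the root.
    let relative = if requested_path.is_absolute() {
        requested_path
            .strip_prefix(root)
            .map_err(|_| PathError::OutsideRoot)?
    } else {
        requested_path
    };

    let mut resolved_components: Vec<&OsStr> = Vec::new();
    for component in relative.components() {
        match component {
            Component::Normal(name) => resolved_components.push(name),
            Component::CurDir => {}
            // Popping past the root means the path escapes containment.
            Component::ParentDir => {
                if resolved_components.pop().is_none() {
                    return Err(PathError::OutsideRoot);
                }
            }
            Component::RootDir | Component::Prefix(_) => return Err(PathError::OutsideRoot),
        }
    }

    let mut resolved = root.to_path_buf();
    resolved.extend(resolved_components);
    Ok(resolved)
}
```

Because the check never touches the filesystem, it cannot detect symlinks that point outside the root; that would need to be handled separately.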
The project follows a TDD sequence defined in testing-plan.md §11. All 18 TDD steps are complete:
- TDD Step 1 (Test Oracle Infrastructure): complete
  - `codeagent-common`: StepId, StepType, StepInfo, CodeAgentError, Result (4 unit tests)
  - `codeagent-test-support`: TreeSnapshot with blake3 hashing, assert_tree_eq with configurable mtime tolerance and exclude patterns, TempWorkspace, fixture builders (18 unit tests)
- TDD Step 2 (UndoInterceptor Core): complete
  - WriteInterceptor trait (13 methods matching project-plan §4.3.3)
  - StepTracker, preimage capture (zstd + JSON metadata), StepManifest, two-pass rollback
  - UndoInterceptor wiring it all together (19 unit tests)
  - Integration tests UI-01..UI-08 covering: write 3x, create+write, nested dirs, delete file, delete tree, rename (dest absent), rename (dest exists), rename dir with children (8 tests)
- TDD Step 3 (WAL + Crash Recovery): complete
  - `RecoveryInfo` struct reports paths restored/deleted and manifest validity
  - `UndoInterceptor::recover()`: always-rollback-incomplete policy per project-plan §4.8; detects `wal/in_progress/`, handles empty WAL, valid manifest, and missing/corrupt manifest (falls back to reconstructing manifest from `preimages/*.meta.json` files)
  - `rebuild_manifest_from_preimages()`: scans preimage metadata when manifest is unavailable
  - `UndoInterceptor::new()` now reconstructs completed steps from on-disk `steps/` directory
  - `StepTracker::add_completed_step()`: supports disk-based state reconstruction
  - Shared test helpers extracted to `tests/common/mod.rs` (OperationApplier, compare_opts)
  - Integration tests CR-01..CR-07 + step reconstruction test (8 tests)
- TDD Step 4 (Undo Barriers): complete
  - `BarrierId`, `BarrierInfo`, `ExternalModificationPolicy`, `RollbackResult` types in common crate
  - `RollbackBlocked` error variant for barrier-blocked rollback
  - `BarrierTracker` module with in-memory state + JSON persistence (`barriers.json`)
  - `UndoInterceptor::notify_external_modification()`: creates barriers under `Barrier` policy
  - `UndoInterceptor::rollback(count, force)`: checks barriers, blocks or force-crosses
  - `UndoInterceptor::barriers()`: query all barriers
  - `UndoConfig::policy`: configurable external modification policy
  - Integration tests EB-01..EB-06, EB-08 covering: barrier creation, rollback blocking, force rollback, barrier querying, internal writes no-barrier, multiple barriers, warn policy (7 tests)
- TDD Step 5 (Safeguards, Interceptor Level): complete
  - `SafeguardId`, `SafeguardKind`, `SafeguardConfig`, `SafeguardEvent`, `SafeguardDecision` types in common crate
  - `SafeguardDenied` error variant in `CodeAgentError`
  - `SafeguardHandler` trait: synchronous blocking callback for safeguard decisions
  - `SafeguardTracker`: per-step counters (delete count, overwrite, rename-over), threshold checks, allowed-kind tracking to prevent re-triggering after Allow
  - `UndoConfig::safeguard_config` + `UndoConfig::safeguard_handler`: configurable safeguard
  - `UndoInterceptor::rollback_current_step()`: mid-step rollback on deny (writes manifest, rolls back WAL, cancels step)
  - Safeguard checks in `pre_unlink` (delete threshold), `pre_write`/`pre_open_trunc` (overwrite large file), `pre_rename` (rename-over-existing)
  - Integration tests SG-01..SG-06 + 5 edge cases (11 tests)
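The per-step counting and allowed-kind tracking described for `SafeguardTracker` can be sketched as follows. This is a simplified model covering only the delete threshold; the `CheckOutcome` enum, the method names, and the "trigger once the count reaches the threshold" rule are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical subset of safeguard kinds, named after the checks listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SafeguardKind {
    DeleteThreshold,
    OverwriteLargeFile,
    RenameOverExisting,
}

/// Hypothetical result of a safeguard check.
#[derive(Debug, PartialEq, Eq)]
pub enum CheckOutcome {
    Proceed,
    Triggered(SafeguardKind),
}

/// Per-step safeguard state: a delete counter plus the kinds already allowed.
pub struct SafeguardTracker {
    delete_threshold: usize,
    delete_count: usize,
    allowed_kinds: HashSet<SafeguardKind>,
}

impl SafeguardTracker {
    pub fn new(delete_threshold: usize) -> Self {
        Self {
            delete_threshold,
            delete_count: 0,
            allowed_kinds: HashSet::new(),
        }
    }

    /// Called before each unlink; triggers once the count reaches the threshold,
    /// unless the user already allowed this kind during the current step.
    pub fn record_delete(&mut self) -> CheckOutcome {
        self.delete_count += 1;
        let kind = SafeguardKind::DeleteThreshold;
        if self.delete_count >= self.delete_threshold && !self.allowed_kinds.contains(&kind) {
            CheckOutcome::Triggered(kind)
        } else {
            CheckOutcome::Proceed
        }
    }

    /// Records an Allow decision so the same kind does not trigger again in this step.
    pub fn allow(&mut self, kind: SafeguardKind) {
        self.allowed_kinds.insert(kind);
    }
}
```

Remembering allowed kinds per step is what keeps a single confirmed bulk delete from prompting again on every subsequent unlink.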
- TDD Step 6 (Metadata Capture): complete
  - `xattr` crate added as Linux-only dependency for reading/writing extended attributes
  - `read_xattrs()` implemented in `preimage.rs` and `snapshot.rs` (Linux: real xattr reads; other platforms: empty map)
  - `restore_metadata()` in `rollback.rs` now restores xattrs on Linux (removes stale, sets stored)
  - OperationApplier extended with: `open_trunc`, `setattr_truncate`, `chmod` (Unix), `fallocate`, `copy_file_range`, `set_xattr` (Linux), `remove_xattr` (Linux)
  - Integration tests UI-09..UI-15 covering: truncate-open, truncate-setattr, chmod (Unix), xattr-set (Linux), xattr-remove (Linux), fallocate, copy-file-range (7 tests; 4 on Windows)
- TDD Step 7 (Resource Limits + Pruning): complete
  - `ResourceLimitsConfig` in common crate: `max_log_size_bytes`, `max_step_count`, `max_single_step_size_bytes` (all `Option`, default `None`)
  - `StepUnprotected` and `UndoDisabled` error variants in `CodeAgentError`
  - `StepManifest::unprotected` field marks steps that exceeded single-step size limit
  - Atomic preimage writes (temp-file-then-rename) for `.meta.json` and `.dat` files
  - `capture_preimage` returns `(PreimageMetadata, u64)`, which includes compressed data size
  - `resource_limits` module: `calculate_step_size`, `calculate_total_log_size` (size utilities)
  - `UndoInterceptor::evict_if_needed()`: FIFO eviction by step count and log size
  - `UndoConfig::resource_limits`: configurable limits
  - `close_step` returns `Result<Vec<StepId>>`, the list of evicted step IDs
  - Unprotected step tracking: skips preimage capture after threshold exceeded, blocks rollback
  - Version mismatch detection on construction, `is_undo_disabled()`, `discard()` to re-enable
  - Integration tests UI-16..UI-19, UL-01..UL-08 (12 tests)
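FIFO eviction by step count and log size, as described for `evict_if_needed()`, might be modeled like this. The `StepRecord` and `ResourceLimits` types here are hypothetical in-memory stand-ins; the real implementation works against on-disk step directories.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory view of a completed step: its ID and on-disk size.
#[derive(Debug, Clone, PartialEq)]
pub struct StepRecord {
    pub step_id: i64,
    pub size_bytes: u64,
}

/// Limits mirroring two of the optional fields described for `ResourceLimitsConfig`.
#[derive(Debug, Default)]
pub struct ResourceLimits {
    pub max_step_count: Option<usize>,
    pub max_log_size_bytes: Option<u64>,
}

/// Evicts the oldest steps until both limits hold; returns the evicted IDs.
pub fn evict_if_needed(steps: &mut VecDeque<StepRecord>, limits: &ResourceLimits) -> Vec<i64> {
    let mut evicted_step_ids = Vec::new();
    loop {
        let total_size: u64 = steps.iter().map(|step| step.size_bytes).sum();
        let over_count = limits.max_step_count.map_or(false, |max| steps.len() > max);
        let over_size = limits.max_log_size_bytes.map_or(false, |max| total_size > max);
        if !(over_count || over_size) {
            break;
        }
        // The front of the queue is the oldest step, so eviction is FIFO.
        match steps.pop_front() {
            Some(oldest) => evicted_step_ids.push(oldest.step_id),
            None => break,
        }
    }
    evicted_step_ids
}
```

With both limits left as `None` (the documented default), the function never evicts anything.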
- TDD Step 8 (Control Channel Parsing + State Machine): complete
  - `codeagent-control` crate: control channel protocol types, JSONL parsing, state machine
  - `ControlChannelError`: MalformedJson, UnknownMessageType, OversizedMessage, UnexpectedStepCompleted, DuplicateStepStarted, OutputForUnknownCommand, UnexpectedStepStarted, CancelUnknownCommand
  - `HostMessage` enum (Exec, Cancel, RollbackNotify): serde-tagged, per project-plan §4.2
  - `VmMessage` enum (StepStarted, Output, StepCompleted): serde-tagged
  - `parse_vm_message`/`parse_host_message`: JSONL parsing with 1MB size limit (rejects before deserialization), distinguishes malformed JSON from unknown message types
  - `ControlChannelState`: tracks pending (exec sent) and active (step_started received) commands, validates sequences, emits `ControlEvent`s for the caller
  - `cancel_command`: handles cancellation of pending or active commands
  - Protocol error resilience: violations produce `ControlEvent::ProtocolError`, channel continues
  - Integration tests CC-01..CC-07 + edge cases (18 tests), unit tests (28 tests)
- TDD Step 9 (Control Channel Integration, Fake Shim): complete
  - `StepManager` trait in `handler.rs`: abstracts step lifecycle for testability
  - `InFlightTracker` in `in_flight.rs`: `Arc<AtomicUsize>` + `tokio::sync::Notify` for tracking in-flight filesystem operations and quiescence drain detection
  - `QuiescenceConfig`: configurable idle timeout (100ms), max timeout (2s), ambient inactivity timeout (5s)
  - `HandlerEvent` enum: StepStarted, Output, StepCompleted, AmbientStepOpened, AmbientStepClosed, ProtocolError
  - `ControlChannelHandler<S: StepManager>`: integrates `ControlChannelState` with step lifecycle, quiescence window (spawned tokio task), ambient step management
  - Quiescence algorithm: after `step_completed`, wait for in-flight drain + idle_timeout, bounded by max_timeout; prevents hangs when operations never complete
  - Ambient steps: negative IDs (-1, -2, ...), auto-close after inactivity timeout, reset on each write, closed by new exec commands
  - `MockStepManager` in test file: records open/close calls for assertion
  - Integration tests CC-08..CC-12 + 5 edge cases using `#[tokio::test(start_paused = true)]` for deterministic time (10 tests)
  - InFlightTracker unit tests (7 tests)
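The counting half of `InFlightTracker` can be illustrated with a drop guard over an `Arc<AtomicUsize>`. This sketch uses only the standard library and leaves out the `tokio::sync::Notify` wakeup that the real tracker uses for drain detection; the method names are assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared counter of filesystem operations that have started but not finished.
#[derive(Clone, Default)]
pub struct InFlightTracker {
    in_flight_count: Arc<AtomicUsize>,
}

/// Decrements the counter when dropped, even on early return or error paths.
pub struct InFlightGuard {
    in_flight_count: Arc<AtomicUsize>,
}

impl InFlightTracker {
    /// Marks one operation as started; hold the guard for the operation's duration.
    pub fn begin_operation(&self) -> InFlightGuard {
        self.in_flight_count.fetch_add(1, Ordering::SeqCst);
        InFlightGuard {
            in_flight_count: Arc::clone(&self.in_flight_count),
        }
    }

    pub fn in_flight(&self) -> usize {
        self.in_flight_count.load(Ordering::SeqCst)
    }

    /// True when no operations are outstanding, which is the drain condition for quiescence.
    pub fn is_drained(&self) -> bool {
        self.in_flight() == 0
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.in_flight_count.fetch_sub(1, Ordering::SeqCst);
    }
}
```

A quiescence task would poll or await `is_drained()` and then wait out the idle timeout, giving up once the max timeout elapses.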
- TDD Step 10 (STDIO API Contract Tests): complete
  - `codeagent-stdio` crate: STDIO API protocol types, JSONL parsing, error taxonomy, path validation, request routing, async server loop
  - `StdioError`: 9 variants: MalformedJson, UnknownOperation, MissingField, InvalidField, OversizedMessage, UnsupportedProtocolVersion, PathOutsideRoot, MissingRequestId, Io
  - `ErrorDetail`: structured error response body (code, message, optional field)
  - `Request` enum: 15 variants: session.{start,stop,reset,status}, undo.{rollback,history,configure,discard}, agent.{execute,prompt}, fs.{list,read,status}, safeguard.{configure,confirm}
  - `Event` enum: 9 variants: StepCompleted, AgentOutput, TerminalOutput, Warning, Error, SafeguardTriggered, ExternalModification, Recovery, UndoVersionMismatch
  - Envelope-based two-step parsing: `RequestEnvelope` → type dispatch → typed payload parse
  - `validate_path()`: logical `..` resolution + containment check (no filesystem access)
  - `RequestHandler` trait: 15 async methods; `Router` validates paths and protocol version
  - `StdioServer`: async loop with `tokio::select!` for request/event multiplexing
  - `LogEntry`: structured JSON Lines log format for stderr (timestamp, level, component)
  - Integration tests SA-01..SA-12 with `ServerHarness` (in-process server via `tokio::io::duplex`)
  - Unit tests: 31 (protocol, parser, path_validation). Contract tests: 37 (SA-01..SA-12 + edge cases)
- TDD Step 11 (MCP Server Contract Tests): complete
  - `codeagent-mcp` crate: MCP server with JSON-RPC 2.0 protocol, 9 tools, path validation, async server loop, notification forwarding
  - `McpError`: 9 variants: ParseError, InvalidRequest, MethodNotFound, InvalidParams, MissingField, PathOutsideRoot, InternalError, OversizedMessage, Io
  - `JsonRpcError`: structured JSON-RPC 2.0 error object (code, message, data)
  - `JsonRpcRequest`, `JsonRpcResponse`, `JsonRpcNotification`: wire types
  - `ToolDefinition`, `ToolCallResult`, `ToolCallParams`: MCP tool protocol types
  - Tool argument structs: `ExecuteCommandArgs`, `ReadFileArgs`, `WriteFileArgs`, `EditFileArgs`, `GlobArgs`, `GrepArgs`, `UndoArgs`, `GetUndoHistoryArgs`
  - `parse_jsonrpc()`: JSONL parsing with 1MB size limit, version validation
  - `validate_path()`: logical `..` resolution + containment (same algorithm as STDIO)
  - `McpHandler` trait: 9 methods for the 9 MCP tools
  - `McpRouter`: dispatches `initialize`, `tools/list`, `tools/call`, validates paths for `read_file`/`write_file`/`edit_file`/`glob`/`grep`
  - `McpServer`: async loop with `tokio::select!` for request/notification multiplexing
  - `McpTestHarness`: in-process test server via `tokio::io::duplex`
  - `UndoMcpHandler` (test-only): wraps real `UndoInterceptor` for MC-03/MC-04
  - Unit tests: 30 (error, protocol, parser, path_validation). Contract tests: 30 (MC-01..MC-08 + edge cases)
- TDD Step 12 (Fuzz Targets, Initial): complete
  - `fuzz/` directory with `cargo-fuzz` infrastructure (excluded from workspace)
  - 5 fuzz targets covering all existing parsers (INV-7 parser robustness):
    - `control_jsonl`: `parse_vm_message` + `parse_host_message`
    - `stdio_json`: `parse_request`
    - `mcp_jsonrpc`: `parse_jsonrpc`
    - `undo_manifest`: `serde_json::from_str::<StepManifest>`
    - `path_normalize`: `validate_path` (MCP + STDIO)
  - 48 seed corpus files across 5 targets (derived from unit test inputs)
  - `p9_wire` target skipped (9P server is Phase 3, not yet built)
- TDD Step 17 (Model-Based Property Tests): complete
  - `proptest` crate added as workspace dev-dependency (v1, MIT/Apache-2.0)
  - `crates/interceptor/tests/proptest_model.rs`: model-based property tests using proptest
  - `Op` enum with 10 variants: WriteFile, CreateFile, DeleteFile, DeleteTree, Mkdir, Rename, OpenTrunc, SetattrTruncate, Fallocate, CopyFileRange
  - Pre-filter approach: operations generated freely, invalid ones skipped at runtime
  - Weighted `prop_oneof!` strategy (writes/creates weighted higher to maintain state)
  - `undo_model` (50 cases): per-step optional rollback with snapshot comparison
  - `undo_model_multi_step_rollback` (30 cases): apply all steps, rollback all, verify initial state
  - Helper functions: `collect_files`, `collect_dirs`, `apply_op` (runtime index resolution)
- TDD Step 18 (Performance Baselines, Criterion Benchmarks): complete
  - `criterion` crate added as workspace dev-dependency (v0.5, MIT/Apache-2.0)
  - `crates/interceptor/benches/preimage_capture.rs`: preimage capture throughput + isolated zstd compression (4KB, 1MB, 100MB) with `Throughput::Bytes` reporting
  - `crates/interceptor/benches/rollback_restore.rs`: rollback restore throughput (4KB, 1MB, 100MB) with `iter_batched` for per-iteration re-dirtying
  - `crates/interceptor/benches/manifest_io.rs`: manifest write/read (filesystem I/O) + serialize/deserialize (in-memory) for 10, 100, 1000 paths
  - `crates/test-support/benches/snapshot_capture.rs`: TreeSnapshot::capture for 100, 1000, 10000 files with two-level directory structure
  - Deterministic pseudo-random data (LCG) for realistic zstd compression ratios
  - HTML reports generated in `target/criterion/`
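A deterministic LCG byte generator of the kind the benchmarks rely on could look like the sketch below. The constants are Knuth's MMIX parameters, chosen here as an assumption because the notes do not state which constants the benchmarks use; the function name is likewise hypothetical.

```rust
/// Multiplier and increment from Knuth's MMIX LCG (an assumption; the project's
/// actual constants are not stated in these notes).
const LCG_MULTIPLIER: u64 = 6364136223846793005;
const LCG_INCREMENT: u64 = 1442695040888963407;
/// Shift that extracts the top byte of the 64-bit state.
const HIGH_BYTE_SHIFT: u32 = 56;

/// Produces `length` pseudo-random bytes that are identical for identical seeds,
/// so benchmark inputs stay reproducible across runs.
pub fn generate_deterministic_bytes(seed: u64, length: usize) -> Vec<u8> {
    let mut state = seed;
    let mut bytes = Vec::with_capacity(length);
    for _ in 0..length {
        state = state
            .wrapping_mul(LCG_MULTIPLIER)
            .wrapping_add(LCG_INCREMENT);
        // The high bits of an LCG have better statistical quality than the low bits.
        bytes.push((state >> HIGH_BYTE_SHIFT) as u8);
    }
    bytes
}
```

Data like this compresses far less than all-zero buffers, which is why it gives more representative zstd throughput numbers.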
- TDD Step 13 (QEMU E2E: Session Lifecycle): complete (tests written, all `#[ignore]`)
  - `codeagent-e2e-tests` crate: E2E test infrastructure + STDIO API test client
  - `JsonlClient`: spawns agent binary, background stdout demux (responses keyed by request_id via oneshot channels, events buffered per type with Notify)
  - `McpClient`: Unix domain socket JSON-RPC 2.0 client (`#[cfg(unix)]`)
  - Message builders: `session_start`, `session_stop`, `session_reset`, `session_status`, `agent_execute`, `undo_rollback`, `undo_rollback_force`, `undo_history`, `safeguard_configure`, `safeguard_confirm` (atomic request_id counter)
  - `E2eError`: 7 variants: BinaryNotFound, Io, ResponseTimeout, EventTimeout, ProcessExited, JsonSerialize, StdinClosed
  - Integration tests SL-01..SL-08: invalid working dir, multiple dirs, persistent stop, ephemeral stop, session reset, QEMU launch failure, control channel disconnect, resource cleanup (8 tests)
- TDD Step 14 (QEMU E2E: Undo Round-Trip): complete (tests written, all `#[ignore]`)
  - Integration tests UR-01..UR-05: single file write rollback, multi-file mutation rollback, multi-step partial rollback, delete tree rollback, rename rollback (5 tests)
  - `setup_session()` and `execute_and_wait()` test helpers
  - E2E-specific `compare_opts()` with 2s mtime tolerance for VM filesystem granularity
- TDD Step 15 (QEMU E2E: pjdfstest Subset): complete (tests written, all `#[ignore]`)
  - Integration tests PJ-01..PJ-05: create+unlink, rename, chmod, symlink, mkdir+rmdir (5 tests)
  - `exec_collect_output()` helper for running shell commands and asserting output
- TDD Step 16 (QEMU E2E: Safeguard Flow): complete (tests written, all `#[ignore]`)
  - Integration tests SF-01..SF-03: delete threshold deny+rollback, large file overwrite allow, rename-over-existing configure+confirm round-trip (3 tests)
  - `setup_session_with_safeguards()` helper with configurable delete threshold
- Host-Side Sandbox Binary + QEMU Integration: complete
  - `codeagent-sandbox` crate: the CLI binary that wires all crates into a running system
  - Binary name: `sandbox` (E2E tests reference `SANDBOX_BIN`/`sandbox`)
  - `CliArgs` via clap derive: `--working-dir`, `--undo-dir`, `--vm-mode`, `--protocol` (stdio|mcp, default stdio), `--log-level`, `--qemu-binary`, `--kernel-path`, `--initrd-path`, `--rootfs-path`, `--memory-mb`, `--cpus`, `--virtiofsd-binary`
  - `--protocol mcp` mode: auto-starts session from CLI args, runs `McpRouter` + `McpServer` on stdin/stdout for Claude Code Desktop integration
  - `AgentError` enum: 10 variants (SessionNotActive, SessionAlreadyActive, InvalidWorkingDir, QemuUnavailable, QemuSpawnFailed, ControlChannelFailed, VirtioFsFailed, NotImplemented, Undo, Io)
  - `SessionState` enum: `Idle` | `Active(Box<Session>)` with `Arc<std::sync::Mutex<_>>`
  - `Session` struct includes optional VM fields: `qemu_process`, `fs_backends`, `in_flight_tracker`, `control_writer`, `event_bridge_handle`, `control_reader_handle`, `control_writer_handle`, `socket_dir`, `next_command_id`
  - `Orchestrator` implements both `RequestHandler` (15 STDIO methods) and `McpHandler` (10 MCP methods) via interior mutability
  - Session lifecycle: start (validates dirs, creates UndoInterceptor per dir, crash recovery, version mismatch detection, optional VM launch) → stop (abort tasks, stop QEMU, stop backends, cleanup) → reset (stop + re-start with stored payload)
  - VM launch path: `launch_vm()` starts virtiofsd backends → spawns QEMU → connects to control socket → creates `ControlChannelHandler` → spawns event bridge + control reader/writer. Falls back to non-VM mode on failure with Warning event.
  - `agent_execute`: sends exec commands through control channel when VM is running, returns `QemuUnavailable` when in host-only mode
  - `fs_status`: returns real backend/VM info when VM is running ("virtiofsd"/"running") or "none"/"unavailable" in host-only mode
  - `QemuConfig`: full command-line builder with platform-specific settings (Linux: q35/KVM, macOS: virt/HVF, Windows: q35/WHPX), filesystem sharing (virtiofs on Linux/macOS, 9P on Windows), control channel chardev
  - `QemuProcess`: spawn with control socket readiness polling (100ms intervals, 30s timeout), stop (kill + wait), pid tracking
  - `VirtioFsBackend` [cfg(not(windows))]: spawns upstream virtiofsd with `--shared-dir`/`--socket-path`/`--cache=never`, binary resolution via common paths + PATH
  - `control_bridge`: `spawn_control_writer` (mpsc → JSON Lines), `spawn_control_reader` (JSON Lines → `ControlChannelHandler`), `serialize_host_message`
  - Undo operations delegate to `UndoInterceptor`: rollback, history, configure, discard
  - Filesystem operations: direct host filesystem access (no VM needed)
  - `SafeguardBridge`: bridges sync `SafeguardHandler` to async via mpsc + oneshot channels
  - `event_bridge`: translates `HandlerEvent` → STDIO `Event`
  - `UndoInterceptor` implements `StepManager` directly (no adapter needed)
  - MCP `write_file`/`edit_file`: opens synthetic API step on interceptor, writes file, closes step
  - MCP `glob`: pattern matching via `glob` crate, results sorted by mtime (newest first)
  - MCP `grep`: regex search via `regex` + `walkdir` crates, 3 output modes (files_with_matches, content, count)
  - `main.rs`: `--protocol stdio` (default) runs StdioServer; `--protocol mcp` runs McpServer on stdin/stdout
  - MCP Bash tool: command sanitization (fork bombs, sudo, device access) + configurable classification (read-only/write/destructive) via `CommandClassifier` with `HashSet` lookup
  - `config.rs`: `SandboxTomlConfig` with `load_config()`, which loads from `--config-file` arg or platform default (`{config_dir}/CodeAgent/codeagent.toml`) and falls back to built-in defaults
  - CLI unit tests (5 tests) + QC-01..QC-10 QEMU config tests (10 tests) + command classifier config tests (10 tests) + TOML config loading tests (5 tests) + integration tests AO-01..AO-15 + MCP-01..MCP-13 (40 tests) + UH-01..UH-14 undo history tests (14 tests) + FW-01..FW-12 filesystem watcher tests (12 tests) = 166 total tests
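The `HashSet`-based classification mentioned for the MCP Bash tool can be pictured with the sketch below. It is deliberately minimal: the constructor, the first-token matching, and the choice to treat unknown commands as Write are all assumptions, and the real `CommandClassifier` also handles git/cargo/npm subcommand lists and sanitization.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum CommandClass {
    ReadOnly,
    Write,
    Destructive,
}

/// Hypothetical, simplified classifier keyed on the program name only.
pub struct CommandClassifier {
    read_only_commands: HashSet<String>,
    destructive_commands: HashSet<String>,
}

impl CommandClassifier {
    pub fn new(read_only_commands: &[&str], destructive_commands: &[&str]) -> Self {
        Self {
            read_only_commands: read_only_commands.iter().map(|name| name.to_string()).collect(),
            destructive_commands: destructive_commands.iter().map(|name| name.to_string()).collect(),
        }
    }

    /// Classifies by the first whitespace-separated token.
    /// Assumption: commands on neither list are treated as Write.
    pub fn classify(&self, command_line: &str) -> CommandClass {
        let program = command_line.split_whitespace().next().unwrap_or("");
        if self.destructive_commands.contains(program) {
            CommandClass::Destructive
        } else if self.read_only_commands.contains(program) {
            CommandClass::ReadOnly
        } else {
            CommandClass::Write
        }
    }
}
```

Checking the destructive list first means a command that appears on both lists is classified conservatively.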
- VM-Side Shim: complete
  - `codeagent-shim` crate: lightweight binary that runs inside the guest VM
  - Binary name: `shim` (compiled into guest initrd)
  - Opens `/dev/virtio-ports/control` (or argv[1]), reads `HostMessage` JSON Lines, dispatches commands, writes `VmMessage` JSON Lines
  - Generic `run<R: AsyncRead, W: AsyncWrite>()` for testability with `tokio::io::duplex()`
  - `Shim` struct: tracks running commands in `HashMap<u64, CommandHandle>`, message dispatch, `reap_completed()`, `cancel_all()` on shutdown
  - `executor`: `spawn_command()` runs `sh -c <command>` with piped stdout/stderr, process groups on Unix (setpgid), drops to uid/gid 1000 via setuid/setgid in `pre_exec`, sends `StepStarted` → `Output` → `StepCompleted`
  - `cancel_command()`: `SIGTERM` → 5s wait → `SIGKILL` on Unix (process group); `child.kill()` on non-Unix; aborts output reader tasks on cancel
  - `stream_output()`: buffered interval-based flushing (`OutputBufferConfig`: max_buffer_size=4096, flush_interval=50ms)
  - `ShimError` enum: Io, Json, ChannelClosed, CommandNotFound, MalformedMessage
  - Integration tests SH-01..SH-08 (8 tests, SH-06 cancel ignored on Windows)
- macOS virtiofs portability (Phase 2): complete
  - Vendored `vmm-sys-util` fork (`crates/vmm-sys-util-fork/`): added macOS implementations for `FallocateMode`, `TempDir`/`TempFile`, terminal utilities, timer fd (kqueue-based), signal handling, and errno utilities. Gates Linux-only modules (epoll, syslog, ioctl) behind `#[cfg(target_os = "linux")]`.
  - Vendored `virtiofsd` fork (`crates/virtiofsd-fork/`): forked virtiofsd 1.13.3 with comprehensive macOS compatibility layer in `src/compat/` (7 sub-modules). Key adaptations:
    - `fd_ops.rs`: `O_PATH` → `O_PATH_OR_RDONLY`, `/proc/self/fd` → `fcntl(F_GETPATH)`, `O_DIRECT` → 0 on macOS
    - `rename_ops.rs`: `SYS_renameat2` → `renameatx_np` with flag translation
    - `stat_ops.rs`: `SYS_statx` → `fstatat` with field mapping
    - `credentials.rs`: `capng` → `seteuid`/`setegid` (macOS has no capabilities)
    - `io_ops.rs`: `pwritev2`/`preadv2` → `pwritev`/`preadv`
    - `types.rs`: 64-bit type aliases (`stat64`/`off64_t`/`ino64_t`) + wrapper functions
    - `read_dir.rs`: `SYS_getdents64` → `getdirentries` (complete platform split)
    - `passthrough/mod.rs`: `copy_file_range` → ENOSYS, `syncfs` → `F_FULLFSYNC`, `fchownat(AT_EMPTY_PATH)` → `fchown`
  - Widened cfg guards: `#[cfg(target_os = "linux")]` → `#[cfg(unix)]` in virtiofs-backend crate (Cargo.toml deps + module gates), sandbox crate (Cargo.toml dep + fs_backend.rs + orchestrator.rs), enabling `InterceptedBackend` on macOS with full WriteInterceptor hooks
  - Both forks excluded from workspace members (Unix-only) but available via `[patch.crates-io]`
- 9P2000.L Server (Windows Phase 3): complete
  - `codeagent-p9` crate: full 9P2000.L protocol implementation from scratch
  - Wire format: `WireReader`/`WireWriter` for LE binary primitives, 23 T/R message struct pairs with encode/decode, max 16MB message size enforcement
  - `FidTable`: `HashMap<u32, FidState>` mapping client handles to server state (host path, qid, open file handle, open flags, dir offset)
  - `P9Server`: transport-agnostic async dispatch loop over `AsyncRead + AsyncWrite`, builder pattern (`with_interceptor()`, `with_in_flight()`), dispatches 22 message types
  - Operations: session (version/auth/attach/clunk/remove/flush), walk (path traversal + containment via logical `..` resolution), file (lopen/lcreate/read/write/fsync), dir (readdir/mkdir/unlinkat/renameat/statfs), attr (getattr/setattr), link (symlink/readlink/link), mknod (returns EPERM)
  - WriteInterceptor integration: pre/post hooks on all mutating operations (lcreate, write, fsync, mkdir, unlinkat, renameat, setattr, symlink, link, remove), `InFlightGuard` drop guard for operation tracking
  - Platform abstraction: real POSIX attrs on Unix (`MetadataExt`), synthesized attrs on Windows (dirs=0o40755, regular files=0o100644, executables=0o100755)
  - Windows normalization: reserved name rejection (CON, NUL, PRN, AUX, COM0-9, LPT0-9), case collision detection for create/mkdir/rename operations
  - Error mapping: `io::ErrorKind` → Linux errno with platform-specific raw OS error fallbacks
  - Sandbox integration: `P9Backend` [cfg(windows)] in `fs_backend.rs` spawns P9Server on tokio task, QEMU connects via virtio-serial chardev + virtconsole (TCP on Windows)
  - 166 tests: 61 wire protocol + 50 unit (fid table, qid, error) + 47 server operations (SO/WK/RO/WR/LK/RB) + 8 Windows normalization (WN-01..WN-07)
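Reserved name rejection for the names listed above (CON, NUL, PRN, AUX, COM0-9, LPT0-9) might be implemented along these lines. The function name is hypothetical, and treating names with an extension such as `NUL.txt` as reserved is an assumption based on general Windows behavior rather than something these notes state.

```rust
/// Device names reserved by Windows, per the list above.
const RESERVED_BASE_NAMES: [&str; 4] = ["CON", "NUL", "PRN", "AUX"];
/// Prefixes that are reserved when followed by exactly one digit (COM0-9, LPT0-9).
const RESERVED_NUMBERED_PREFIXES: [&str; 2] = ["COM", "LPT"];

/// Returns true when `file_name` would collide with a Windows device name.
pub fn is_reserved_windows_name(file_name: &str) -> bool {
    // Assumption: only the stem before the first dot matters, so "NUL.txt" is rejected too.
    let stem = file_name.split('.').next().unwrap_or("").trim_end_matches(' ');
    let upper_stem = stem.to_ascii_uppercase();

    if RESERVED_BASE_NAMES
        .iter()
        .any(|reserved| upper_stem.as_str() == *reserved)
    {
        return true;
    }

    RESERVED_NUMBERED_PREFIXES.iter().any(|prefix| {
        upper_stem.strip_prefix(*prefix).map_or(false, |suffix| {
            suffix.len() == 1 && suffix.chars().all(|character| character.is_ascii_digit())
        })
    })
}
```

A server would run a check like this on the final path component of create, mkdir, and rename targets before touching the host filesystem.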
All 18 TDD steps are complete. The host-side sandbox binary, VM-side shim, and QEMU integration code are built. Status of the previously remaining work items:
- Host-side agent binary: complete (see above)
- VM-side shim: complete (see above)
- QEMU integration: complete (see above)
- virtiofsd intercepted backend (Unix, Phases 1+2): complete. The `codeagent-virtiofs-backend` crate wraps the virtiofsd fork as a library dependency. `InterceptedFs` implements virtiofsd's `FileSystem` trait (44 methods), intercepting 16 mutating methods with `WriteInterceptor` pre/post hooks and `InFlightTracker` for quiescence detection. `InterceptedVirtioFsBackend` runs the daemon in-process on a background thread. Integrated into the Orchestrator via `InterceptedBackend` adapter in `fs_backend.rs`. Works on both Linux and macOS via the virtiofsd fork's compat layer.
- 9P server (Windows, Phase 3): complete. The `codeagent-p9` crate implements the 9P2000.L protocol from scratch (not based on crosvm). Transport-agnostic async server over `AsyncRead + AsyncWrite`. Supports all 22 message types. `WriteInterceptor` pre/post hooks on all mutating operations with `InFlightTracker` drop guard. Windows-specific: reserved name rejection (CON, NUL, etc.), case collision detection, POSIX attribute synthesis. Integrated into the Orchestrator via `P9Backend` adapter in `fs_backend.rs`. QEMU connects via virtio-serial chardev + virtconsole (TCP on Windows). 166 tests (61 wire + 47 server operations + 8 Windows normalization + 50 unit).
- Guest image build (`cargo xtask build-guest`): complete. Docker multi-stage build (Rust Alpine builder + Alpine assembler) produces vmlinuz (Alpine linux-virt kernel) + initrd.img (busybox-static + shim + p9proxy binaries + virtio kernel modules + init script). The init script auto-detects virtiofs (Unix hosts) or 9P-over-virtio-serial via p9proxy (Windows hosts), creates an unprivileged sandbox user (uid 1000), and mounts working directories before starting the shim. Xtask crate at `xtask/` (standalone, not a workspace member) provides the CLI. Requires Docker with BuildKit.
- macOS virtiofs support (Phase 2): complete. Vendored vmm-sys-util and virtiofsd forks with macOS compat layers. `InterceptedBackend` now works on all Unix platforms (Linux + macOS). The virtiofsd fork's `src/compat/` module handles all Linux→macOS API translations (`O_PATH`, `/proc/self/fd`, `statx`, `renameat2`, `getdents64`, etc.).

L3 filesystem backend tests: FB-01..FB-16 are written in
crates/virtiofs-backend/tests/filesystem_backend.rs (Linux only, all #[ignore]).
They verify that POSIX syscalls arriving via FUSE trigger the correct WriteInterceptor
method calls. Will become runnable when QEMU/KVM infrastructure is available in CI.
- External Modification Detection (Filesystem Watcher): complete
  - `notify` crate v8 for cross-platform filesystem watching
  - `RecentBackendWrites`: `Mutex<HashMap<PathBuf, Instant>>` with configurable TTL (default 5s) for tracking sandbox-originated writes. Normalized path comparison (forward slashes, case-insensitive on Windows).
  - `WriteTrackingInterceptor`: decorator over `Arc<dyn WriteInterceptor>` that records mutated paths on all 13 `WriteInterceptor` methods. Injected between `UndoInterceptor` and filesystem backends (`InterceptedBackend`/`P9Backend`) at the sandbox level.
  - `FsWatcherConfig`: debounce duration (default 2s), exclude patterns, enabled flag.
  - `spawn_fs_watcher()`: creates `notify::RecommendedWatcher`, bridges sync callbacks to tokio via `spawn_blocking`, debounced event accumulation, filters against `RecentBackendWrites` + undo dir prefixes + configurable exclude patterns, groups by working directory, calls `interceptor.notify_external_modification()`, emits `Event::ExternalModification`.
  - `FileWatcherConfig` in `SandboxTomlConfig` for TOML configuration (enabled, debounce_ms, recent_write_ttl_ms, exclude_patterns).
  - Watcher failure is non-fatal: emits `Event::Warning` and continues without watching.
  - MCP `write_file`/`edit_file` record paths to `RecentBackendWrites` after writing.
  - Default exclude patterns: `.git/objects`, `.git/logs`, `.git/refs`, `node_modules`.
  - 12 integration tests (FW-01..FW-12) covering: external detection, backend suppression, disabled watcher, exclude patterns, undo dir filtering, no-steps guard, TTL expiry, multi-dir grouping, config deserialization, defaults, WriteTrackingInterceptor recording.
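The TTL-based suppression that `RecentBackendWrites` provides could be sketched as below. The method names are assumptions, timestamps are passed in explicitly to keep the example deterministic, and the path normalization described above (forward slashes, case-insensitive comparison on Windows) is left out.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Remembers paths the sandbox itself wrote, so the watcher can ignore the resulting events.
pub struct RecentBackendWrites {
    time_to_live: Duration,
    recorded_writes: Mutex<HashMap<PathBuf, Instant>>,
}

impl RecentBackendWrites {
    pub fn new(time_to_live: Duration) -> Self {
        Self {
            time_to_live,
            recorded_writes: Mutex::new(HashMap::new()),
        }
    }

    /// Records that the sandbox wrote `path` at `written_at`.
    pub fn record(&self, path: &Path, written_at: Instant) {
        let mut recorded_writes = self.recorded_writes.lock().unwrap();
        recorded_writes.insert(path.to_path_buf(), written_at);
    }

    /// True when `path` was written by the sandbox within the TTL,
    /// meaning a watcher event for it should not count as an external modification.
    pub fn is_recent(&self, path: &Path, now: Instant) -> bool {
        let mut recorded_writes = self.recorded_writes.lock().unwrap();
        // Drop expired entries so the map does not grow without bound.
        recorded_writes.retain(|_, written_at| now.duration_since(*written_at) < self.time_to_live);
        recorded_writes.contains_key(path)
    }
}
```

Events for paths that fail the `is_recent` check would then go on to the exclude-pattern and undo-dir filters before a barrier is created.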
- Desktop App (Phases 1-4): complete
  - Tauri v2 + React 19 + TypeScript + Tailwind CSS 4 + Zustand stack in `desktop/`
  - Standalone `desktop/src-tauri/Cargo.toml` (excluded from workspace, not a member)
  - Rust backend (8 files in `src-tauri/src/`):
    - `SandboxConfig` struct (9 sections) with TOML serialization, platform-specific paths
    - Config TOML read/write/get_path commands
    - VM lifecycle: `start_vm` (spawns sandbox binary as child process, PID file), `stop_vm` (kill + cleanup), `get_vm_status` (try_wait polling), `send_mcp_request` (JSON-RPC passthrough to sandbox stdin/stdout)
    - `VmState` with separate `Mutex` fields for process, stdin, and stdout handles
    - Claude Desktop/Code config detection, merge-write, removal, CLI command generation
    - System: `get_platform`, `resolve_binary`, `validate_directory`
    - Undo: `read_undo_history` (reads step manifests + barriers from disk)
  - React frontend (14 files in `src/`):
    - Tab 1 (Sandbox Config): collapsible sections with directory pickers (with validation), number inputs (with range display), toggles, dropdowns. 500ms debounced auto-save.
    - Tab 2 (VM Manager): status panel with indicator dot, Start/Stop/Restart controls, memory/CPU sliders, file pickers, auto-start and persist-VM toggles. 2s polling.
    - Tab 3 (Claude Integration): Desktop + Code panels side-by-side. MCP server detection/toggle/preview/copy, CLI command generation.
    - Tab 4 (Undo History): timeline view of undo steps (newest first), step detail expansion, barrier indicators, rollback with confirmation dialog (force option). 5s polling when VM is running.
    - Global toast notification system (success/warning/error/info variants, auto-dismiss, used across all tabs). Zustand store + ToastContainer component.
  - Phase 4 additions:
    - `tauri-plugin-updater` with placeholder endpoint in `tauri.conf.json`
    - Installer configuration: NSIS (Windows), macOS min version, deb/AppImage (Linux)
    - SVG favicon, updater capability permission

The testing plan (§9) identifies five external test suites to adapt. None are implemented yet; all are blocked on components that don't exist. Adapt each when its target component is built.
| Suite | Source | Target Component | Phase | What It Tests |
|---|---|---|---|---|
| pjdfstest | github.com/pjd/pjdfstest (~600 tests) | virtiofsd/9P mounted filesystem | MVP | POSIX filesystem edge cases via raw syscalls (unlink open files, cross-dir renames, permission semantics, atomic rename-over). Cross-compile the real pjdfstest binary into the guest image and run a curated subset against /mnt/working (skip tests for quotas, ACLs, chown, and other features not relevant to our backends). PJ-01..PJ-05 are shell-command smoke tests that can remain as lightweight fallbacks, but the real POSIX coverage comes from the pjdfstest binary. Additionally, use pjdfstest's write operations as an undo log stress test: snapshot → run pjdfstest suite → rollback → assert_tree_eq. This verifies the undo log correctly captures every filesystem operation pjdfstest exercises, complementing the fine-grained UR-01..UR-05 tests. |
| virtiofsd unit tests | gitlab.com/virtio-fs/virtiofsd | virtiofsd fork | MVP | Path resolution safety, FUSE message parsing, upstream correctness preservation. Run in fork CI as a gate. |
| crosvm p9 fixtures | chromium.googlesource.com/crosvm | 9P server | Phase 3 | 9P wire protocol round-trip (known-byte test vectors for serialize/deserialize identity). |
| Mutagen test vectors | github.com/mutagen-io/mutagen | 9P server (Windows) | Phase 3 | Reserved-name handling (CON, NUL, LPT1), case-collision detection, chmod persistence on Windows. |
| xfstests | github.com/kdave/xfstests | Mounted filesystem (guest) | Optional/Nightly | Extended POSIX stress testing (large repos, concurrent access, filesystem corner cases). |
cargo check --workspace # type-check
cargo test --workspace # run all tests (~687 on Windows; 24 E2E+shim+FB ignored)
cargo clippy --workspace --tests # lint (must be warning-free)
cargo test -p codeagent-interceptor --test undo_interceptor # UI integration tests only
cargo test -p codeagent-interceptor --test wal_crash_recovery # CR integration tests only
cargo test -p codeagent-interceptor --test undo_barriers # EB barrier tests only
cargo test -p codeagent-interceptor --test safeguards # SG safeguard tests only
cargo test -p codeagent-interceptor --test resource_limits # UL/UI resource limit tests only
cargo test -p codeagent-interceptor --test gitignore # GI gitignore filter tests only
cargo test -p codeagent-interceptor --test symlink_policy # SY symlink policy tests only
cargo test -p codeagent-control --test control_channel # CC unit tests only
cargo test -p codeagent-control --test control_channel_integration # CC integration tests only
cargo test -p codeagent-stdio --test stdio_api # SA contract tests only
cargo test -p codeagent-mcp --test mcp_server # MC contract tests only
cargo test -p codeagent-interceptor --test proptest_model # model-based property tests only
cargo test -p codeagent-p9 # p9 server tests (166 tests; WN tests on Windows only)
cargo test -p codeagent-p9 --test wire_protocol # P9 wire format tests only (61 tests)
cargo test -p codeagent-p9 --test server_operations # P9 server operation tests only (47 tests)
cargo test -p codeagent-p9 --test windows_normalization # P9 Windows normalization tests only (8 tests)
cargo test -p codeagent-sandbox # sandbox orchestrator + CLI + QC + classifier + config + watcher + undo history tests (166 tests)
cargo test -p codeagent-sandbox --test orchestrator # AO/MCP integration tests only (40 tests)
cargo test -p codeagent-sandbox --test undo_history # UH undo history integration tests only (14 tests)
cargo test -p codeagent-sandbox --test fs_watcher # FW filesystem watcher tests only (12 tests)
cargo test -p codeagent-shim # shim tests (8 tests, 1 ignored on Windows)
cargo test -p codeagent-shim --test shim_integration # SH integration tests only
cargo test -p codeagent-virtiofs-backend # virtiofs-backend tests (16 unit + 16 ignored L3 on Linux)
cargo test -p codeagent-virtiofs-backend --test filesystem_backend -- --ignored # FB L3 integration tests (Linux, requires FUSE)
# E2E tests (require QEMU/KVM; all #[ignore] by default)
cargo test -p codeagent-e2e-tests # compile-check only (all tests ignored)
cargo test -p codeagent-e2e-tests -- --ignored # run all E2E tests (requires KVM + agent binary)
cargo test -p codeagent-e2e-tests --test session_lifecycle -- --ignored # SL session lifecycle only
cargo test -p codeagent-e2e-tests --test undo_roundtrip -- --ignored # UR undo round-trip only
cargo test -p codeagent-e2e-tests --test pjdfstest_subset -- --ignored # PJ POSIX tests only
cargo test -p codeagent-e2e-tests --test safeguard_flow -- --ignored # SF safeguard flow only
# Performance benchmarks (criterion, L6)
cargo bench -p codeagent-interceptor # interceptor benchmarks (preimage, rollback, manifest)
cargo bench -p codeagent-test-support # snapshot capture benchmarks
cargo bench --workspace # all benchmarks
cargo bench --bench preimage_capture -p codeagent-interceptor # preimage + zstd only
cargo bench --bench rollback_restore -p codeagent-interceptor # rollback only
cargo bench --bench manifest_io -p codeagent-interceptor # manifest I/O only
cargo bench --bench snapshot_capture -p codeagent-test-support # snapshot only
# Fuzz targets (require nightly + cargo-fuzz; Linux only for libFuzzer)
cd fuzz && cargo fuzz list # list all 5 fuzz targets
cd fuzz && cargo fuzz run control_jsonl -- -max_total_time=30 # fuzz smoke (30s)
cd fuzz && cargo fuzz run stdio_json -- -max_total_time=30
cd fuzz && cargo fuzz run mcp_jsonrpc -- -max_total_time=30
cd fuzz && cargo fuzz run undo_manifest -- -max_total_time=30
cd fuzz && cargo fuzz run path_normalize -- -max_total_time=30
# Guest image build (requires Docker with BuildKit)
cargo xtask build-guest # build guest image for host architecture
cargo xtask build-guest --arch aarch64 # cross-build for aarch64
cargo xtask build-guest --no-cache # rebuild without Docker cache
# Desktop app (Tauri v2, separate from workspace)
cd desktop && npm install # install frontend deps
cd desktop/src-tauri && cargo check # type-check Rust backend
cd desktop && npx tsc --noEmit # type-check TypeScript frontend
cd desktop && npm run tauri dev # run desktop app in dev mode
cd desktop && npm run tauri build # build production installer