SPACE Simulations Guide

This document describes SPACE's simulation capabilities for testing data management features without physical hardware. Simulations enable end-to-end validation of compression, deduplication, encryption, and Phase 4 protocol views in development and CI environments.

Overview

SPACE simulations provide:

  • Hardware Independence: Test full data pipelines without NVMe, NVRAM, or fabric hardware
  • Modularity: Enable/disable specific simulations (e.g., skip NVMe-oF to reduce overhead)
  • Isolation: Simulations run in separate crates and containers, preventing production contamination
  • Realism: Leverage SPDK for NVMe-oF protocol emulation where available

Design Principles

  1. Separate Crates: Each simulation is a standalone crate (sim-nvram, sim-nvmeof, sim-other)
  2. Optional: Production builds exclude simulations entirely via workspace configuration
  3. Runtime Selection: The "sim" container loads only the modules requested via the SIM_MODULES environment variable
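The runtime selection described above can be sketched in Rust. This is only an illustration of parsing a comma-separated SIM_MODULES value; the `SimModule` enum and `parse_sim_modules` function are hypothetical names, and the real selection logic lives in the container entrypoint script.

```rust
use std::collections::BTreeSet;

/// Simulation modules named in this guide. The enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SimModule {
    Nvram,
    Nvmeof,
}

/// Parse a comma-separated SIM_MODULES value such as "nvram,nvmeof".
/// Whitespace and empty entries are ignored; unknown names are reported
/// as errors instead of being silently dropped.
pub fn parse_sim_modules(raw: &str) -> Result<BTreeSet<SimModule>, String> {
    let mut modules = BTreeSet::new();
    for name in raw.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let module = match name.to_ascii_lowercase().as_str() {
            "nvram" => SimModule::Nvram,
            "nvmeof" => SimModule::Nvmeof,
            other => return Err(format!("unknown simulation module: {other}")),
        };
        modules.insert(module);
    }
    Ok(modules)
}
```

Rejecting unknown names is a design choice of this sketch: a typo in SIM_MODULES fails loudly rather than starting fewer simulations than expected.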

Simulation Modules

NVRAM Simulation

Crate: crates/sim-nvram
Purpose: Lightweight append-only log emulation for testing the write pipeline
Dependencies: nvram-sim (core implementation)

Features

  • File-Backed: Persists segments to disk for multi-run tests
  • Transaction Support: Atomic multi-segment writes
  • Metadata Tracking: Stores compression, dedup, encryption metadata per segment
  • Fault Injection (future): Simulate I/O errors for resilience testing
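To make the file-backed, append-only idea concrete, here is a minimal standard-library sketch. It is not the sim-nvram implementation: `AppendLog` and its methods are hypothetical, segment ids are plain `u64` values, and the index is kept in memory only, so it does not recover segments from earlier runs.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical stand-in for a file-backed, append-only segment log.
pub struct AppendLog {
    file: File,
    /// segment id -> (byte offset, length) within the backing file
    index: HashMap<u64, (u64, u64)>,
}

impl AppendLog {
    /// Open (or create) the backing file in append mode.
    pub fn open(path: impl AsRef<Path>) -> io::Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .append(true)
            .create(true)
            .open(path)?;
        Ok(Self { file, index: HashMap::new() })
    }

    /// Append a segment at the end of the file and record where it landed.
    pub fn append(&mut self, seg_id: u64, data: &[u8]) -> io::Result<()> {
        let offset = self.file.seek(SeekFrom::End(0))?;
        self.file.write_all(data)?;
        self.file.flush()?;
        self.index.insert(seg_id, (offset, data.len() as u64));
        Ok(())
    }

    /// Read a segment back using the recorded offset and length.
    pub fn read(&mut self, seg_id: u64) -> io::Result<Vec<u8>> {
        let &(offset, len) = self
            .index
            .get(&seg_id)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "segment not found"))?;
        let mut buf = vec![0u8; len as usize];
        self.file.seek(SeekFrom::Start(offset))?;
        self.file.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```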

API Example

use sim_nvram::start_nvram_sim;
use common::SegmentId;

let log = start_nvram_sim("test.log")?;
let seg_id = SegmentId(1);

// Write
log.append(seg_id, b"test data")?;

// Read
let data = log.read(seg_id)?;
assert_eq!(data, b"test data");

Configuration

use sim_nvram::{start_nvram_sim_with_config, NvramSimConfig};

let config = NvramSimConfig {
    backing_path: "/tmp/sim_nvram.log".to_string(),
    enable_fault_injection: true,  // Enable error injection
    simulated_latency_us: 100,     // Simulate 100μs latency
};

let log = start_nvram_sim_with_config(config)?;
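One plausible reading of `simulated_latency_us` is a fixed delay applied before each operation. The sketch below shows that interpretation only; `LatencyInjector` is a hypothetical helper, not part of the sim-nvram API.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical wrapper that delays every operation by a fixed latency,
/// mirroring the `simulated_latency_us` setting shown above.
pub struct LatencyInjector {
    latency: Duration,
}

impl LatencyInjector {
    pub fn from_micros(simulated_latency_us: u64) -> Self {
        Self { latency: Duration::from_micros(simulated_latency_us) }
    }

    /// Sleep for the configured latency, then run the wrapped operation
    /// and return its result unchanged.
    pub fn run<T>(&self, op: impl FnOnce() -> T) -> T {
        if !self.latency.is_zero() {
            thread::sleep(self.latency);
        }
        op()
    }
}
```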

Performance

  • Overhead: ~5% vs. real NVRAM (file I/O)
  • Throughput: Suitable for up to 10K segments in tests
  • Scalability: Single-threaded; use multiple logs for parallel tests

NVMe-oF Simulation

Crate: crates/sim-nvmeof
Purpose: Native Rust NVMe-over-TCP target for discovery/connect testing
Dependencies: anyhow, byteorder, tracing; optional spdk-rs + libc behind the spdk feature

Features

  • Native NVMe/TCP stack: Implements ICReq/ICResp, Fabrics Connect, discovery log (0x70), identify, and basic read/write
  • CI-friendly default: Works in Docker/CI without hugepages or privileged containers
  • File-backed: Automatically creates a 100MB backing file if missing
  • nvme-cli validated: Compatible with nvme discover, nvme connect, and nvme read/write
  • SPDK opt-in: --features spdk builds the SPDK path; a runtime preflight check (hugepages + memlock + root) is enforced, and the sim falls back to native TCP if the check fails
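For readers unfamiliar with the ICReq/ICResp handshake mentioned above, the following sketch parses an ICReq PDU and builds a matching ICResp. The field offsets follow the NVMe/TCP transport specification as best understood here and should be checked against the spec; the function and type names are hypothetical and do not come from sim-nvmeof.

```rust
/// NVMe/TCP PDU type codes for connection initialization.
pub const PDU_TYPE_ICREQ: u8 = 0x00;
pub const PDU_TYPE_ICRESP: u8 = 0x01;
/// ICReq and ICResp are both fixed-size 128-byte PDUs.
pub const IC_PDU_LEN: usize = 128;

/// ICReq fields a target typically inspects (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
pub struct IcReq {
    pub pfv: u16,    // PDU format version
    pub hpda: u8,    // host PDU data alignment
    pub digest: u8,  // bit 0: header digest, bit 1: data digest
    pub maxr2t: u32, // maximum outstanding R2Ts (0's based)
}

/// Validate the 8-byte common header (type, flags, hlen, pdo, plen)
/// and extract the ICReq-specific fields that follow it.
pub fn parse_icreq(buf: &[u8]) -> Result<IcReq, String> {
    if buf.len() < IC_PDU_LEN {
        return Err(format!("short ICReq: {} bytes", buf.len()));
    }
    if buf[0] != PDU_TYPE_ICREQ {
        return Err(format!("unexpected PDU type {:#04x}", buf[0]));
    }
    let hlen = buf[2] as usize;
    let plen = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]) as usize;
    if hlen != IC_PDU_LEN || plen != IC_PDU_LEN {
        return Err(format!("bad ICReq lengths: hlen={hlen}, plen={plen}"));
    }
    Ok(IcReq {
        pfv: u16::from_le_bytes([buf[8], buf[9]]),
        hpda: buf[10],
        digest: buf[11],
        maxr2t: u32::from_le_bytes([buf[12], buf[13], buf[14], buf[15]]),
    })
}

/// Build an ICResp that echoes the format version and disables digests.
pub fn build_icresp(req: &IcReq, max_h2c_data: u32) -> [u8; IC_PDU_LEN] {
    let mut pdu = [0u8; IC_PDU_LEN];
    pdu[0] = PDU_TYPE_ICRESP;
    pdu[2] = IC_PDU_LEN as u8; // hlen
    pdu[4..8].copy_from_slice(&(IC_PDU_LEN as u32).to_le_bytes()); // plen
    pdu[8..10].copy_from_slice(&req.pfv.to_le_bytes());
    pdu[10] = 0; // CPDA: no extra data alignment
    pdu[11] = 0; // DGST: header and data digests disabled in this sketch
    pdu[12..16].copy_from_slice(&max_h2c_data.to_le_bytes()); // MAXH2CDATA
    pdu
}
```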

Requirements

  • nvme CLI installed (nvme-cli package)
  • Root privileges for nvme connect and I/O device creation

SPDK mode (optional):

  • Build with cargo build -p sim-nvmeof --features spdk (Linux only)
  • Hugepages: at least 512MB free (e.g., 256x2MB) visible in /proc/meminfo
  • memlock ulimit: unlimited or >=512MB
  • Root / CAP_SYS_RESOURCE for hugepage mapping
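The hugepage requirement above can be checked from the text of /proc/meminfo. The sketch below is an assumption about how such a preflight might look, not the actual sim-nvmeof check; it takes the file contents as a string so it can be exercised without a Linux host, and all function names are hypothetical.

```rust
/// Minimum free hugepage memory listed in the requirements above (512 MB).
pub const MIN_HUGEPAGE_BYTES: u64 = 512 * 1024 * 1024;

/// Return the first numeric field of the /proc/meminfo line with this key.
fn meminfo_value(meminfo: &str, key: &str) -> Option<u64> {
    meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix(key)?.strip_prefix(':')?;
        rest.split_whitespace().next()?.parse::<u64>().ok()
    })
}

/// Free hugepage memory in bytes: HugePages_Free * Hugepagesize (reported in kB).
pub fn free_hugepage_bytes(meminfo: &str) -> Option<u64> {
    let free_pages = meminfo_value(meminfo, "HugePages_Free")?;
    let page_kib = meminfo_value(meminfo, "Hugepagesize")?;
    Some(free_pages * page_kib * 1024)
}

/// True when enough free hugepage memory is visible; missing fields count as failure.
pub fn hugepages_sufficient(meminfo: &str) -> bool {
    free_hugepage_bytes(meminfo).map_or(false, |bytes| bytes >= MIN_HUGEPAGE_BYTES)
}
```

On a real host the input would come from `std::fs::read_to_string("/proc/meminfo")`; a read error can be treated the same as insufficient hugepages.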

Usage

Standalone Binary:

NODE_ID=node1 \
BACKING_PATH=/data/backing.img \
LISTEN_ADDR=0.0.0.0 \
LISTEN_PORT=4420 \
SUBSYSTEM_NQN=nqn.2024-01.io.space:sim \
  sim-nvmeof
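A binary configured this way usually reads its settings once at startup. The sketch below shows one way to do that with validation; the `TargetConfig` struct and `config_from_env` function are hypothetical, and the fallback values simply reuse the example above rather than documenting the binary's actual defaults.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical settings struct for the variables shown above.
#[derive(Debug, PartialEq, Eq)]
pub struct TargetConfig {
    pub node_id: String,
    pub backing_path: String,
    pub listen: SocketAddr,
    pub subsystem_nqn: String,
}

/// Build a config from an environment lookup. `lookup` abstracts the
/// environment so the function is testable; pass
/// `|k| std::env::var(k).ok()` to read the real process environment.
/// The fallback values are assumptions taken from the example above.
pub fn config_from_env(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<TargetConfig, String> {
    let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());

    let addr: IpAddr = get("LISTEN_ADDR", "0.0.0.0")
        .parse()
        .map_err(|e| format!("invalid LISTEN_ADDR: {e}"))?;
    let port: u16 = get("LISTEN_PORT", "4420")
        .parse()
        .map_err(|e| format!("invalid LISTEN_PORT: {e}"))?;

    Ok(TargetConfig {
        node_id: get("NODE_ID", "node1"),
        backing_path: get("BACKING_PATH", "/data/backing.img"),
        listen: SocketAddr::new(addr, port),
        subsystem_nqn: get("SUBSYSTEM_NQN", "nqn.2024-01.io.space:sim"),
    })
}
```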

Helper Scripts (Linux with nvme-cli):

# 1) Discovery-only check (starts sim, runs nvme discover)
./scripts/nvmeof_discover.sh

# 2) Full connect + 4KiB read/write verification (requires root)
sudo ./scripts/nvmeof_connect_io.sh

In Docker (see Containerization):

services:
  sim:
    image: space-sim:latest
    environment:
      SIM_MODULES: nvmeof
      NODE_ID: sim-node1
    ports:
      - "4420:4420"

Client Connection (requires nvme-cli):

# Discover target
nvme discover -t tcp -a 127.0.0.1 -s 4420

# Connect
sudo nvme connect -t tcp -n nqn.2024-01.io.space:sim -a 127.0.0.1 -s 4420

Troubleshooting

  • nvme CLI missing: Install nvme-cli (apt install nvme-cli, dnf install nvme-cli, etc.)
  • Permission denied: nvme connect requires root; rerun with sudo
  • Port already in use: Change LISTEN_PORT or stop conflicting service
  • SPDK not used: Check that the binary was built with --features spdk and the host has hugepages + memlock; otherwise the sim logs a warning and continues with the native TCP path

Other Simulations

Crate: crates/sim-other
Purpose: Placeholder for future simulation modules

Planned Extensions

  1. GPU Offload (--features gpu-offload): Mock CUDA/OpenCL for CapsuleFlow testing
  2. ZNS SSD: Simulate zoned namespaces for append-only workloads
  3. Network Conditions: Inject latency/packet loss for mesh testing
  4. DPU/SmartNIC: Simulate hardware accelerators

Adding a New Simulation

  1. Add feature to sim-other/Cargo.toml:

    [features]
    gpu-offload = []
  2. Implement in sim-other/src/gpu.rs:

    #[cfg(feature = "gpu-offload")]
    pub fn start_gpu_offload_sim() -> Result<()> {
        // Mock GPU compression/dedup
        Ok(())
    }
  3. Update scripts/sim-entrypoint.sh to recognize SIM_MODULES=gpu

Architecture

Layered Design

┌─────────────────────────────────────────────────────────────┐
│  Application Layer (spacectl, protocol views)               │
├─────────────────────────────────────────────────────────────┤
│  Pipeline Layer (compression, dedup, encryption)            │
├─────────────────────────────────────────────────────────────┤
│  NVRAM Layer                                                 │
│  ├─ Production: nvram-sim (core)                           │
│  └─ Dev/Test: sim-nvram (wrapper with hooks)               │
├─────────────────────────────────────────────────────────────┤
│  Protocol Layer (Phase 4)                                    │
│  ├─ Production: Real NVMe-oF via SPDK                      │
│  └─ Dev/Test: sim-nvmeof (SPDK target or TCP fallback)     │
└─────────────────────────────────────────────────────────────┘

Container Architecture

docker-compose.yml
├─ spacectl: CLI + S3 server
├─ io-engine-1, io-engine-2: Data pipeline nodes
├─ metadata-mesh: Capsule registry
└─ sim: Orchestrates simulations
   ├─ /sim/nvram: NVRAM log files
   ├─ /sim/nvmeof: NVMe-oF backing image + binary
   └─ entrypoint.sh: Parses SIM_MODULES, starts selected sims

Usage

Quick Start

  1. Build and Start Environment:

    ./scripts/setup_home_lab_sim.sh

    This script:

    • Builds Docker images (space-core, space-sim)
    • Configures hugepages (on Linux only)
    • Starts Docker Compose with default simulations (NVRAM)
  2. Run Tests:

    ./scripts/test_e2e_sim.sh
  3. Stop:

    docker compose down

Selective Module Loading

Enable only NVRAM (lightweight, <1GB RAM):

export SIM_MODULES=nvram
docker compose up -d

Enable NVRAM + NVMe-oF (requires ~4GB RAM, Linux):

export SIM_MODULES=nvram,nvmeof
docker compose up -d

Disable all simulations (production-like):

docker compose up -d spacectl metadata-mesh io-engine-1
# Omit "sim" service

Integration with Pipeline

In Tests

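// crates/capsule-registry/tests/my_test.rs
use anyhow::Result; // assumption: the test crate uses anyhow for error handling
use sim_nvram::start_nvram_sim;

#[test]
fn test_with_sim() -> Result<()> {
    // Start a file-backed simulated NVRAM log for this test
    let log = start_nvram_sim("test.log")?;
    // Use log in pipeline tests
    Ok(())
}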

Pipeline Integration Hook

The WritePipeline automatically detects simulation mode via the SPACE_SIM_MODE environment variable:

use capsule_registry::WritePipeline;
use common::CapsuleRegistry;
use nvram_sim::NvramLog;

// Method 1: Automatic simulation override
std::env::set_var("SPACE_SIM_MODE", "nvram");
let pipeline = WritePipeline::new(registry, nvram);
// Pipeline automatically uses sim-nvram at /sim/nvram/sim.log

// Method 2: Direct simulation usage (for tests)
use sim_nvram::start_nvram_sim;

let sim_log = start_nvram_sim("test.log")?;
let pipeline = WritePipeline::new(registry, sim_log);

Implementation Details (from pipeline.rs:232-251):

impl WritePipeline {
    pub fn new(registry: CapsuleRegistry, nvram: NvramLog) -> Self {
        // Pipeline Integration Hook: Check for simulation mode override
        let nvram = if env::var("SPACE_SIM_MODE").ok().as_deref() == Some("nvram") {
            info!("SPACE_SIM_MODE=nvram detected, initializing simulation NVRAM");
            match start_nvram_sim("/sim/nvram/sim.log") {
                Ok(sim_log) => {
                    info!("Simulation NVRAM initialized at /sim/nvram/sim.log");
                    sim_log
                }
                Err(err) => {
                    warn!(error = %err, "Failed to initialize simulation NVRAM, using provided NVRAM");
                    nvram
                }
            }
        } else {
            nvram
        };
        // ... rest of initialization
    }
}

Usage in Docker:

# Set environment variable in docker-compose.yml
services:
  io-engine-1:
    environment:
      SPACE_SIM_MODE: nvram  # Enable simulation mode
    volumes:
      - sim-data:/sim        # Mount sim directory

Benefits:

  • Zero code changes: Tests automatically use simulation when SPACE_SIM_MODE=nvram is set
  • Graceful fallback: Falls back to provided NVRAM if simulation init fails
  • Production safety: Simulation code only active when explicitly enabled
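The override-with-fallback behavior listed above can be isolated into a small generic function, which makes it easy to test without a real NVRAM backend. This is a sketch modeled on the excerpt from pipeline.rs; `select_nvram` is a hypothetical helper, not a function in capsule-registry.

```rust
use std::fmt::Display;

/// Choose the NVRAM backend. When `sim_mode` is Some("nvram"), try to
/// initialize the simulation and fall back to `provided` if that fails;
/// for any other value the provided backend is used unchanged.
pub fn select_nvram<B, E: Display>(
    sim_mode: Option<&str>,
    provided: B,
    init_sim: impl FnOnce() -> Result<B, E>,
) -> B {
    if sim_mode == Some("nvram") {
        match init_sim() {
            Ok(sim) => sim,
            Err(err) => {
                // Graceful fallback: report the failure and keep the provided backend.
                eprintln!("failed to initialize simulation NVRAM, using provided NVRAM: {err}");
                provided
            }
        }
    } else {
        provided
    }
}
```

In real code the first argument would come from `std::env::var("SPACE_SIM_MODE").ok().as_deref()`, matching the check in the excerpt.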

Testing

Unit Tests

# Test sim-nvram
cargo test -p sim-nvram --lib

# Test sim-nvmeof (may fail on non-Linux)
cargo test -p sim-nvmeof --lib

Integration Tests

# Pipeline with NVRAM sim
cargo test -p capsule-registry --test pipeline_sim_integration

# All integration tests
cargo test --workspace --tests

E2E Tests

# Native (no Docker)
./scripts/test_e2e_sim.sh --native

# Docker environment
./scripts/test_e2e_sim.sh

# Verbose logging
./scripts/test_e2e_sim.sh --verbose

Troubleshooting

Common Issues

Issue: cargo test -p sim-nvmeof fails with "SPDK not available"
Solution: Expected on non-Linux hosts. SPDK requires hugepages. Use the TCP fallback for basic tests.

Issue: Docker container "sim" exits immediately
Solution: Check the logs with docker compose logs sim. Ensure SIM_MODULES is set (default: nvram).

Issue: Tests fail with "Segment not found"
Solution: Clean up stale test files: rm -f test_*.log test_*.log.segments

Issue: NVMe-oF target not discoverable
Solution:

  1. Check hugepages: cat /proc/meminfo | grep Huge
  2. Ensure container has --privileged or --cap-add=SYS_ADMIN
  3. Check logs: docker compose logs sim | grep NVMe
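Stale files such as those behind the "Segment not found" issue above can often be avoided by giving each test its own log path and removing it automatically. The helper below is a hypothetical sketch, not part of the SPACE test utilities; the `.segments` sidecar name is taken from the cleanup command above.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Counter that keeps paths unique within one test process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical test helper: a unique log path that is removed on drop,
/// together with its `.segments` sidecar if one exists.
pub struct TempLogPath {
    path: PathBuf,
}

impl TempLogPath {
    pub fn new(prefix: &str) -> Self {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("{prefix}_{}_{id}.log", std::process::id());
        Self { path: std::env::temp_dir().join(name) }
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempLogPath {
    fn drop(&mut self) {
        // Removal errors are ignored: the files may never have been created.
        let _ = fs::remove_file(&self.path);
        let mut sidecar = self.path.clone().into_os_string();
        sidecar.push(".segments");
        let _ = fs::remove_file(sidecar);
    }
}
```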

Debug Logging

Enable debug logs:

export RUST_LOG=debug
cargo test -p sim-nvram -- --nocapture

Or in Docker:

docker compose up -d
docker compose logs -f sim

Performance Profiling

Benchmark NVRAM sim overhead:

cargo bench -p capsule-registry -- pipeline
# Compare "real NVRAM" vs "sim-nvram" throughput

Future Extensions

Planned Features

  1. Fault Injection API: Inject I/O errors, latency spikes, partial writes

    config.fault_injection = FaultConfig {
        error_rate: 0.01,  // 1% error rate
        error_types: vec![ErrorType::Timeout, ErrorType::Corruption],
    };
  2. Distributed Simulation: Multi-node NVRAM sync for mesh testing

  3. GPU Offload Sim: Mock CUDA kernels for CapsuleFlow

  4. Telemetry: Export Prometheus metrics from sims

  5. Record/Replay: Capture real workloads, replay in sim
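As an illustration of how the planned error-rate fault injection might behave, here is a deterministic sketch. It is not the planned API: `FaultInjector` is hypothetical, only a timeout-style error is modeled, and a small xorshift generator stands in for a real random source so that runs are reproducible from a seed.

```rust
use std::io;

/// Hypothetical deterministic fault injector driven by an error rate.
pub struct FaultInjector {
    state: u64,
    error_rate: f64,
}

impl FaultInjector {
    /// `error_rate` is clamped to [0.0, 1.0]; a zero seed is replaced
    /// because xorshift cannot leave the all-zero state.
    pub fn new(seed: u64, error_rate: f64) -> Self {
        Self { state: seed.max(1), error_rate: error_rate.clamp(0.0, 1.0) }
    }

    /// Advance the xorshift64 generator and map the result to [0, 1).
    fn next_unit(&mut self) -> f64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Decide whether the next operation should fail.
    pub fn should_fail(&mut self) -> bool {
        self.next_unit() < self.error_rate
    }

    /// Run `op`, or return an injected timeout error instead of running it.
    pub fn run<T>(&mut self, op: impl FnOnce() -> io::Result<T>) -> io::Result<T> {
        if self.should_fail() {
            return Err(io::Error::new(io::ErrorKind::TimedOut, "injected fault"));
        }
        op()
    }
}
```

Seeding the generator keeps failing test runs repeatable, which matters more for resilience tests than statistical quality of the random stream.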

Contributing

To add a new simulation module:

  1. Create crate: crates/sim-<name>
  2. Add to workspace: Cargo.toml
  3. Implement start_<name>_sim() -> Result<()> API
  4. Update scripts/sim-entrypoint.sh to parse SIM_MODULES=<name>
  5. Add tests: cargo test -p sim-<name>
  6. Document in this file
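The steps above imply a mapping from module names to start functions. The sketch below shows that dispatch in Rust for illustration only: the real dispatch happens in scripts/sim-entrypoint.sh, `SimRegistry` is a hypothetical type, and `Result<(), String>` stands in for whatever `Result` alias the crates actually use.

```rust
use std::collections::BTreeMap;

/// Signature assumed for a module's start function in this sketch.
pub type StartFn = fn() -> Result<(), String>;

/// Hypothetical registry mapping SIM_MODULES names to start functions.
pub struct SimRegistry {
    starters: BTreeMap<&'static str, StartFn>,
}

impl SimRegistry {
    pub fn new() -> Self {
        Self { starters: BTreeMap::new() }
    }

    /// Register a module under the name used in SIM_MODULES.
    pub fn register(&mut self, name: &'static str, start: StartFn) {
        self.starters.insert(name, start);
    }

    /// Start each requested module in order, stopping at the first
    /// unknown name or failed start. Returns the names that started.
    pub fn start_all(&self, requested: &[&str]) -> Result<Vec<String>, String> {
        let mut started = Vec::new();
        for name in requested {
            let start = *self
                .starters
                .get(*name)
                .ok_or_else(|| format!("unknown simulation module: {name}"))?;
            start()?;
            started.push(name.to_string());
        }
        Ok(started)
    }
}
```

A new module would then be wired in with a call such as `registry.register("demo", start_demo_sim)`, where `start_demo_sim` is the function from step 3.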

See Also