PODMS: Policy-Orchestrated Disaggregated Mesh Scaling

⚠️ EXPERIMENTAL FEATURE - NOT PRODUCTION READY

Status: Experimental proof-of-concept implementation

This document describes both the vision for PODMS and the current experimental implementation. Many described features are proofs-of-concept or partially implemented. This is research-grade software, not production-ready infrastructure.

Maturity Level:

  • Core types/telemetry: 🟡 Alpha
  • Policy compiler: 🟡 Alpha
  • Metro-sync replication: 🟠 Experimental (TCP POC, RDMA mocked)
  • Gossip protocol: 🟠 Experimental
  • Full mesh federation: 🔴 Planned (minimal implementation)

For actual feature status, see the main README Feature Status Table.

Documentation Status: Mix of implemented features and aspirational architecture - 2025-11-08

Overview

PODMS (Policy-Orchestrated Disaggregated Mesh Scaling) is SPACE's experimental distributed scaling architecture exploring autonomous, policy-driven replication and migration across disaggregated storage nodes. Unlike traditional cluster architectures or monolithic scale-out systems, PODMS aims to treat each capsule as an independent, swarm-ready unit with embedded policy intelligence.

Current Reality: Basic infrastructure exists (types, telemetry, policy compiler), but distributed features are early-stage proofs-of-concept requiring extensive development and testing before production use.

Vision: Breaking Traditional Scaling Models

Traditional Approaches (What We Avoid)

Monolithic Clustering:

  • Tight coupling between nodes
  • Forklift upgrades required
  • Blast radius on failures
  • Manual rebalancing

Modular Scale-Out:

  • Independent services, but...
  • Still requires centralized orchestration
  • Policy enforcement at API gateway
  • Human-in-loop for placement

PODMS Approach: Autonomous Swarm Intelligence

Each capsule is:

  • Self-describing via embedded Policy
  • Swarm-aware via telemetry signals
  • Autonomously placeable by agent swarms
  • Zero-trust secured end-to-end

Traditional:         API → Controller → Scheduler → Worker Nodes
PODMS:              Capsule → Telemetry → Agent Swarm → Autonomous Action

Core Principles

1. Policy as Intelligence

Every capsule carries its placement/replication contract:

pub struct Policy {
    // Traditional fields...
    compression: CompressionPolicy,
    encryption: EncryptionPolicy,

    // PODMS fields (feature-gated)
    rpo: Duration,                    // Recovery Point Objective
    latency_target: Duration,         // Max acceptable latency
    sovereignty: SovereigntyLevel,    // Data placement scope
}

RPO Examples:

  • Duration::ZERO → Synchronous metro-sync
  • Duration::from_secs(60) → 1-minute async
  • Duration::from_secs(3600) → Hourly snapshots

Latency Targets:

  • 2ms → Metro zone (same AZ)
  • 10ms → Regional (same geo)
  • 100ms → Global (cross-continent)

Sovereignty Levels:

  • Local → Never leaves node (air-gapped, edge)
  • Zone → Within defined zones (metro-sync)
  • Global → Full federation (geo-replicated)
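
Putting the three knobs together: a minimal sketch, assuming the PODMS fields are publicly settable when the podms feature is enabled (the metro_sync() and geo_replicated() presets are referenced later in this document; the common::podms import path mirrors the implementation guide):

use std::time::Duration;
use common::Policy;
use common::podms::SovereigntyLevel;

// Sketch only: field access assumes public PODMS fields under the `podms` feature.
let mut metro = Policy::metro_sync();               // RPO = 0, Zone sovereignty
metro.latency_target = Duration::from_millis(2);    // metro zone (same AZ)

let mut archive = Policy::geo_replicated();
archive.rpo = Duration::from_secs(3600);            // hourly snapshots are acceptable
archive.sovereignty = SovereigntyLevel::Global;     // full federation allowed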

2. Telemetry-Driven Scaling

Agents subscribe to telemetry channels for real-time signals:

pub enum Telemetry {
    NewCapsule { id, policy, node_id },      // Triggers replication
    HeatSpike { id, accesses_per_min },      // Triggers migration
    CapacityThreshold { node_id, used_pct }, // Triggers rebalancing
    NodeDegraded { node_id, reason },        // Triggers evacuation
    ForcePolicyExecution {                   // Forces async RPO to run now
        capsule_id: CapsuleId,
        forced_rpo: Option<Duration>,        // Override per-call RPO
    },
}

Event Flow:

Write Pipeline → Emit Telemetry → Bounded Channel → Agent Swarm → Autonomous Action
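
ForcePolicyExecution is the one event typically emitted by an operator or tool rather than the write path. A sketch, assuming tx is a clone of the telemetry sender wired into the pipeline (setup shown later) and capsule_id identifies an existing capsule:

use std::time::Duration;

// Force an immediate replication pass for one capsule, overriding its
// configured RPO for this call only (sketch).
tx.send(Telemetry::ForcePolicyExecution {
    capsule_id,
    forced_rpo: Some(Duration::ZERO),
})
.ok(); // ignore the error if no agent is currently subscribed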

3. Disaggregated Mesh Topology

Nodes are loosely coupled, zone-aware:

Metro Zone (us-west-1a):
  ┌─────────┐     ┌─────────┐     ┌─────────┐
  │ Node A  │────▶│ Node B  │────▶│ Node C  │
  └─────────┘     └─────────┘     └─────────┘
       │               │               │
       └───────────────┴───────────────┘
              Telemetry Mesh

Geo Zone (eu-central):
  ┌─────────┐     ┌─────────┐
  │ Node D  │────▶│ Node E  │
  └─────────┘     └─────────┘
       │               │
       └───────────────┘ Async Replication
                │
                ▼
           Node A (cross-geo)

4. Zero-Disruption Compatibility

PODMS is opt-in via feature flags:

  • Single-node mode: No overhead, no dependencies
  • PODMS mode: Telemetry enabled, agents subscribe
  • Mixed environments: Some nodes single, some distributed

Architecture

Type Hierarchy

// Mesh Identity
pub struct NodeId(Uuid);

// Zone Classification
pub enum ZoneId {
    Metro { name: String },  // "us-west-1a"
    Geo { name: String },    // "eu-central"
    Edge { name: String },   // "air-gapped-site-42"
}

// Sovereignty Control
pub enum SovereigntyLevel {
    Local,   // No external replication
    Zone,    // Within zone only
    Global,  // Full federation
}
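
As an illustration of how these types compose, a placement guard might check sovereignty against the source and destination zones before allowing replication. A sketch only (assumes ZoneId implements PartialEq); the actual checks live in the policy compiler:

// Sketch: sovereignty-aware placement guard, not the actual implementation.
fn may_replicate_to(level: &SovereigntyLevel, source: &ZoneId, dest: &ZoneId) -> bool {
    match level {
        SovereigntyLevel::Local => false,           // never leaves the node
        SovereigntyLevel::Zone => source == dest,   // stays within the zone
        SovereigntyLevel::Global => true,           // full federation
    }
}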

Pipeline Integration

WritePipeline gains optional telemetry channel:

pub struct WritePipeline {
    // Existing fields...
    registry: CapsuleRegistry,
    nvram: NvramLog,

    // PODMS addition (feature-gated)
    #[cfg(all(feature = "podms", feature = "pipeline_async"))]
    telemetry_tx: Option<UnboundedSender<Telemetry>>,
}

Usage:

let (tx, mut rx) = mpsc::unbounded_channel();
let pipeline = WritePipeline::new(registry, nvram)
    .with_telemetry_channel(tx);

// Agent subscribes to rx
tokio::spawn(async move {
    while let Some(event) = rx.recv().await {
        match event {
            Telemetry::NewCapsule { id, policy, .. } => {
                // Trigger replication based on policy.rpo
            }
            _ => {}
        }
    }
});

Step 1: Bedrock Preparation (Complete)

Goal: Enable distributed awareness without disrupting single-node operations.

Deliverables:

  • ✅ PODMS types (NodeId, ZoneId, SovereigntyLevel, Telemetry)
  • ✅ Policy extensions (rpo, latency_target, sovereignty)
  • ✅ Telemetry channel infrastructure
  • ✅ Async event emission in write pipeline
  • ✅ Feature flags (podms requires pipeline_async)
  • ✅ Unit + integration tests
  • ✅ Documentation

Zero Regression:

  • Single-node builds: No changes, no overhead
  • PODMS builds: Telemetry hooks present but dormant until channel set
  • Test coverage: 90%+ for new code

Step 2: Metro-Sync Replication (Complete)

Status: ✅ Complete - 2025-11-09

Goal: Implement core metro-sync replication with mesh networking and autonomous agents.

Deliverables:

  • scaling crate with mesh networking (gossip discovery via memberlist)
  • ✅ RDMA mock transport for zero-copy segment mirroring (TCP path now runs through the unified DataMotion engine for the full replication flow)
  • MeshNode with peer discovery and segment mirroring
  • ScalingAgent consuming telemetry and triggering autonomous actions
  • WritePipeline extension for metro-sync replication on RPO=0 policies
  • ✅ Hash-based dedup preservation during replication
  • ✅ Unit tests for mesh discovery and mirroring
  • ✅ Integration tests for multi-node replication scenarios
  • ✅ Documentation updates (README, podms.md)

Timeline: Completed in 1 day (single developer with comprehensive spec)

Implementation Guide

1. Basic Metro-Sync Setup

use capsule_registry::pipeline::WritePipeline;
use capsule_registry::runtime::RuntimeHandles;
use common::Policy;
use scaling::MeshNode;
use common::podms::ZoneId;
use std::sync::Arc;
use tokio::sync::mpsc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Create mesh node in a zone
    let zone = ZoneId::Metro { name: "us-west-1a".into() };
    let listen_addr = "127.0.0.1:8000".parse().unwrap();
    let mesh_node = Arc::new(MeshNode::new(zone, listen_addr).await?);

    // Start mesh with seed nodes
    let seeds = vec!["127.0.0.1:8001".parse().unwrap()];
    mesh_node.start(seeds).await?;

    // Create pipeline with mesh and telemetry
    let runtime = RuntimeHandles::from_env()?;
    let registry = (*runtime.registry).clone();
    let nvram = runtime.nvram.read().await.clone();
    let (tx, rx) = mpsc::unbounded_channel();

    let pipeline = WritePipeline::new(registry, nvram)
        .with_mesh_node(mesh_node.clone())
        .with_telemetry_channel(tx);

    // Spawn scaling agent
    let agent = runtime.build_scaling_agent(mesh_node.clone(), Policy::metro_sync());
    tokio::spawn(async move { agent.run(rx).await });

    // Write with metro-sync policy (RPO=0)
    let data = b"Important data requiring zero-RPO";
    let capsule_id = pipeline
        .write_capsule_with_policy_async(data, &Policy::metro_sync())
        .await?;

    // Segments automatically mirrored to peers!
    println!("Capsule {} replicated", capsule_id.as_uuid());

    Ok(())
}

2. Manual Peer Registration (For Testing)

// In production, peers discovered via gossip
// For testing, manually register peers:
let peer_id = NodeId::new();
let peer_addr = "127.0.0.1:8002".parse().unwrap();
mesh_node.register_peer(peer_id, peer_addr).await;

3. Testing Best Practices

When writing integration tests, ensure each test uses isolated state:

#[tokio::test]
async fn test_metro_sync_example() {
    // Create unique temp directory per test to avoid state conflicts
    let test_id = uuid::Uuid::new_v4();
    let temp_dir = std::env::temp_dir().join(format!("podms_test_{}", test_id));
    std::fs::create_dir_all(&temp_dir).unwrap();

    // Use unique paths for registry and nvram
    let registry_path = temp_dir.join("registry.metadata");
    let registry = CapsuleRegistry::open(&registry_path).unwrap();
    let nvram = NvramLog::open(&temp_dir.join("nvram.log")).unwrap();

    // Build the pipeline from the isolated state
    let pipeline = WritePipeline::new(registry, nvram);

    // Use async API in tokio tests
    let data = b"test payload";
    let policy = Policy::metro_sync();
    let capsule_id = pipeline
        .write_capsule_with_policy_async(data, &policy)
        .await
        .unwrap();
}

Important:

  • Always use CapsuleRegistry::open(&unique_path) in tests, not CapsuleRegistry::new() (which uses a shared "space.db" file)
  • Always use NvramLog::open(&path) with a unique path per test
  • Use write_capsule_with_policy_async().await in async contexts (e.g., #[tokio::test])

4. Telemetry Events

The scaling agent reacts to these events:

pub enum Telemetry {
    // Triggers metro-sync if RPO=0
    NewCapsule { id, policy, node_id },

    // Triggers migration to cooler nodes
    HeatSpike { id, accesses_per_min, node_id },

    // Triggers rebalancing
    CapacityThreshold { node_id, used_bytes, total_bytes, threshold_pct },

    // Triggers evacuation
    NodeDegraded { node_id, reason },
}

5. Testing Metro-Sync

# Run integration tests
cargo test --features podms podms_metro_sync

# Run with logs
RUST_LOG=info cargo test --features podms -- --nocapture

# Specific test
cargo test --features podms test_metro_sync_replication_with_mesh_node

Architecture Details

Data Flow:

Write with RPO=0 Policy
  ↓
WritePipeline::write_capsule_with_policy_async()
  ↓
Local segments committed to NVRAM
  ↓
perform_metro_sync_replication()
  ↓
mesh_node.discover_peers() → Select 1-2 targets
  ↓
For each segment:
  - Read from NVRAM
  - Check content hash (dedup preservation)
  - mesh_node.mirror_segment() via RDMA mock
  ↓
Telemetry event emitted → ScalingAgent
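
As a rough sketch of that flow in code (the read_segment accessor and the mirror_segment signature are assumptions for illustration; the real perform_metro_sync_replication() in the pipeline may differ):

use anyhow::Result;

// Sketch only: mirror committed segments to 1-2 metro peers.
async fn metro_sync_sketch(
    mesh_node: &MeshNode,
    nvram: &NvramLog,
    segment_ids: &[SegmentId],
) -> Result<()> {
    // Pick replication targets from the gossip-discovered peer set.
    let targets: Vec<NodeId> = mesh_node.discover_peers().await.into_iter().take(2).collect();

    for seg in segment_ids {
        // Read the locally committed segment back from NVRAM.
        let bytes = nvram.read_segment(*seg)?;
        // Content hash lets the receiver preserve dedup (skip if already stored).
        let hash = blake3::hash(&bytes);

        for target in &targets {
            mesh_node.mirror_segment(*target, *seg, &bytes, hash).await?;
        }
    }
    Ok(())
}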

Mesh Node Components:

MeshNode {
    id: NodeId,                    // Unique node identifier
    zone: ZoneId,                  // Zone placement
    capabilities: NodeCapabilities, // NVRAM, GPU, network tier
    memberlist: Memberlist,        // Gossip discovery
    peers: HashMap<NodeId, Addr>,  // Peer registry
    listen_addr: SocketAddr,       // TCP listener for mirrors
}

Transport Layer:

  • POC (Step 2): TCP streams for segment mirroring
  • Production (Future): RDMA verbs via rdma-sys for zero-copy
  • Fallback: Always TCP-compatible for edge nodes
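
One way to keep these paths swappable is a narrow transport trait with a TCP implementation today and an RDMA one later. This is a sketch of the idea (using the async-trait crate), not the scaling crate's actual abstraction:

use std::net::SocketAddr;
use anyhow::Result;
use async_trait::async_trait;
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

// Sketch only: the real transport layer may be structured differently.
#[async_trait]
trait SegmentTransport: Send + Sync {
    async fn send_segment(&self, target: SocketAddr, payload: &[u8]) -> Result<()>;
}

struct TcpTransport; // POC path (Step 2); an RdmaTransport would implement the same trait later

#[async_trait]
impl SegmentTransport for TcpTransport {
    async fn send_segment(&self, target: SocketAddr, payload: &[u8]) -> Result<()> {
        // Plain TCP stream: always available, works on edge nodes.
        let mut stream = TcpStream::connect(target).await?;
        stream.write_all(payload).await?;
        stream.flush().await?;
        Ok(())
    }
}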

Performance Characteristics

Measured Overhead (Step 2):

  • Metro-sync latency: ~5-20ms per capsule (1-5 segments, local network)
  • Throughput impact: <10% when replicating to 2 peers
  • Memory: ~24 bytes per MeshNode, ~200 bytes per telemetry event
  • CPU: Minimal (async I/O, no polling)

Optimization Targets (Future Steps):

  • RDMA transport: <50µs added latency
  • Batched replication: Amortize discovery overhead
  • Parallel mirroring: Concurrent segment transfers

Debugging & Troubleshooting

Common Issues:

  1. "Peer not found in registry"

    • Ensure mesh_node.register_peer() called before mirroring
    • Or wait for gossip discovery to complete
  2. "Failed to connect to target"

    • Check peer's listen_addr is reachable
    • Verify firewall rules allow TCP on mirror port
  3. "Metro-sync skipped: mesh node not configured"

    • Call pipeline.with_mesh_node() before writing
    • Or build without podms feature for single-node mode
  4. "Segment not found: SegmentId(X)" in tests

    • Tests are sharing CapsuleRegistry state (using default "space.db" file)
    • Solution: Use unique paths per test (see "Testing Best Practices" above)
    • Use CapsuleRegistry::open(&unique_path) instead of CapsuleRegistry::new()
  5. "Cannot start a runtime from within a runtime" in tests

    • Calling write_capsule_with_policy() (sync wrapper) from #[tokio::test]
    • Solution: Use write_capsule_with_policy_async().await in async test contexts

Logging:

# Full PODMS trace
RUST_LOG=scaling=trace,capsule_registry::pipeline=trace cargo run --features podms

# Metro-sync only
RUST_LOG=scaling::mesh=debug cargo run --features podms

Step 3: Policy Compiler (Complete)

Goal: Autonomous orchestration via compiled policy rules—the "brain" of PODMS swarm intelligence.

Status: Complete - Policy compiler integrated with autonomous agents, enabling declarative-to-executable scaling.

Architecture

The Policy Compiler (scaling/src/compiler.rs) translates telemetry events + policies into executable ScalingActions:

// PolicyCompiler processes telemetry → actions
let compiler = PolicyCompiler::with_defaults();
let mesh_state = build_mesh_state().await?;

let actions = compiler.compile_scaling_actions(&event, &policy, &mesh_state);
// Returns: Vec<ScalingAction> (Replicate, Migrate, Evacuate, Rebalance)

Decision Rules

1. Replication Strategy (from policy.rpo):

  • RPO = 0 → MetroSync { replica_count: policy.replica_count } (synchronous; total copies incl. local)
  • RPO < 60s → AsyncWithBatching { rpo } (batched async)
  • RPO >= 60s → None (no immediate replication)

2. Migration Triggers (from policy.latency_target):

  • Heat spike (>100 accesses/min) + latency_target <2ms → Migrate to low-latency zone
  • Capacity threshold >80% → Rebalance to underutilized nodes
  • Checks sovereignty before migration (Local/Zone/Global)

3. Evacuation Urgency (from reason string):

  • "disk_failure" or "power" → Immediate (parallel evacuation)
  • "degraded_health" → Gradual (cold capsules first)

Swarm Behavior Trait

Capsules self-transform during migrations via the SwarmBehavior trait (common/src/lib.rs). The circular dependency with crypto/compression is resolved through injected TransformOps (implemented by the runtime). See the deep-dive at docs/specs/PODMS_SWARM_BEHAVIOR.md.

TransformOps now carries capsule_id into encrypt/decrypt so runtimes can derive per-capsule keys (see docs/specs/PODMS_TRANSFORM_OPS.md for the SwarmOps adapter). ScalingAgent::migrate_capsule_task uses SwarmOps to execute decrypt -> decompress -> recompress -> re-encrypt before streaming replication frames, rotating keys to the current version when unset. Segment keys are convergent (content-derived) and wrapped per capsule; frames carry the capsule id and wrapped key so receivers unwrap with Zero Trust isolation while preserving dedup.

pub trait TransformOps {
    fn decrypt(
        &self,
        capsule_id: CapsuleId,
        data: &[u8],
        policy: &EncryptionPolicy,
        ctx: SegmentId,
    ) -> Result<Vec<u8>>;
    fn encrypt(
        &self,
        capsule_id: CapsuleId,
        data: &[u8],
        policy: &EncryptionPolicy,
        ctx: SegmentId,
    ) -> Result<Vec<u8>>;
    fn decompress(&self, data: &[u8], policy: &CompressionPolicy) -> Result<Vec<u8>>;
    fn compress(&self, data: &[u8], policy: &CompressionPolicy) -> Result<Vec<u8>>;
}

pub trait SwarmBehavior {
    fn apply_transform<T: TransformOps>(
        &self,
        segment_id: SegmentId,
        data: &[u8],
        target_policy: &Policy,
        ops: &T,
    ) -> Result<Vec<u8>>;
    fn on_migrate(&self, destination: NodeId, dest_zone: &ZoneId) -> Result<()>;
    fn requires_transform(&self, source_zone: &ZoneId, dest_zone: &ZoneId) -> bool;
}

Transformation Logic (Unwrap -> Transcode -> Rewrap):

  • Decrypt when source policy enabled encryption.
  • Decompress -> re-compress only if compression policies differ (short-circuit when they match).
  • Encrypt with target policy (re-key on zone crossing even if policies match).
  • Sovereignty guard: Local capsules error before leaving the node; Zone capsules log validation; Global is unrestricted.

Runtime Integration (Scaling Agent example):

/// In crates/scaling (or pipeline), wrap the crypto/compress crates.
struct PipelineOps<'a> {
    crypto: &'a CryptoEngine,
    comp: &'a CompressionEngine,
}

impl TransformOps for PipelineOps<'_> {
    fn decrypt(
        &self,
        capsule_id: CapsuleId,
        data: &[u8],
        policy: &EncryptionPolicy,
        ctx: SegmentId,
    ) -> Result<Vec<u8>> {
        self.crypto.decrypt_segment(capsule_id, data, policy, ctx)
    }
    fn encrypt(
        &self,
        capsule_id: CapsuleId,
        data: &[u8],
        policy: &EncryptionPolicy,
        ctx: SegmentId,
    ) -> Result<Vec<u8>> {
        self.crypto.encrypt_segment(capsule_id, data, policy, ctx)
    }
    fn decompress(&self, data: &[u8], policy: &CompressionPolicy) -> Result<Vec<u8>> {
        self.comp.decompress(data, policy)
    }
    fn compress(&self, data: &[u8], policy: &CompressionPolicy) -> Result<Vec<u8>> {
        self.comp.compress(data, policy)
    }
}

// During migration:
let ops = PipelineOps { crypto: &crypto_engine, comp: &compression_engine };
let transformed = capsule.apply_transform(segment_id, &bytes, &target_policy, &ops)?;

This keeps common free of crypto/compression dependencies while letting the scaling agent orchestrate the full unwrap/transcode/rewrap flow during migration or replication.

Example Policy YAML

See docs/example-policy.yaml for declarative policy configurations:

metro_sync:
  rpo: 0s
  latency_target: 2ms
  sovereignty: zone
  # Triggers: MetroSync replication + placement in <2ms zone

Integration with Agents

The ScalingAgent (scaling/src/agent.rs) uses the compiler in its event loop:

async fn handle_telemetry_event(&self, event: Telemetry) -> Result<()> {
    let policy = extract_policy(&event);
    let mesh_state = self.build_mesh_state().await?;

    let actions = self.compiler.compile_scaling_actions(&event, &policy, &mesh_state);

    for action in actions {
        self.execute_action(action).await?; // Execute migration, replication, etc.
    }

    Ok(())
}

Testing

Unit Tests (90%+ coverage on compiler logic):

  • test_replication_strategy_zero_rpo - Verifies RPO=0 → MetroSync
  • test_heat_spike_migration - Heat + low latency → Migration
  • test_evacuation_urgency - Failure reason → Immediate/Gradual
  • test_sovereignty_validation - Policies block zone violations

Integration Tests (in capsule-registry/tests/podms_*.rs):

  • Multi-node simulations with policy-triggered failovers
  • Telemetry → Action → Mesh operation end-to-end flows

Run tests:

cargo test --package scaling  # Runs all compiler + agent tests

Step 4: Full Mesh Federation (Long-term)

Goal: Global-scale, zone-aware federation with intelligent routing.

Features:

  • Cross-zone routing optimization
  • Traffic shaping based on latency targets
  • Cost-aware placement (e.g., S3 tier storage)
  • Federated identity (SPIFFE integration)

Design Rationale

Why Not Traditional Clustering?

Aspect                   Traditional Cluster      PODMS
Coupling                 Tight (shared state)     Loose (telemetry events)
Placement                Manual/centralized       Autonomous/policy-driven
Failure Blast Radius     Cluster-wide             Per-capsule isolation
Upgrade Path             Forklift (downtime)      Rolling (zero-downtime)
Policy Enforcement       API gateway              Embedded in capsule

Why Not Microservices Model?

Microservices decompose by service function. PODMS decomposes by data primitive (capsule). Each capsule is independently scalable, reducing orchestration complexity.

Why Telemetry Channels?

Alternatives Considered:

  • Polling: Higher latency, wasted cycles
  • Shared memory: Tight coupling, single-node only
  • Message queue: External dependency, ops overhead

Telemetry Channels:

  • Bounded async channels (Tokio)
  • Zero-copy event passing
  • Backpressure-safe (unbounded for now, bounded in Step 2)
  • Local-first (no network until Step 2)
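
A bounded channel with try_send is the intended shape of that backpressure. A sketch of the producer side once the channel is bounded (the current implementation still uses unbounded_channel):

use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;

// Sketch: emit telemetry without ever blocking the write path.
fn emit_bounded(tx: &mpsc::Sender<Telemetry>, event: Telemetry) {
    if let Err(TrySendError::Full(_dropped)) = tx.try_send(event) {
        // Channel saturated: drop (or sample) the event rather than stalling writes.
        tracing::warn!("telemetry channel full, dropping event");
    }
}

// Bounded construction replacing the unbounded channel shown earlier:
// let (tx, mut rx) = mpsc::channel::<Telemetry>(1024);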

Security Considerations

Telemetry Data Sensitivity

Telemetry events include:

  • Capsule IDs (UUIDs, not sensitive)
  • Policy (may reveal business logic)
  • Access patterns (heatmap data)

Mitigations:

  • PODMS telemetry stays in-process (Step 1)
  • Cross-node telemetry encrypted (Step 2, via SPIFFE/mTLS)
  • Audit log integration (advanced-security feature)

Agent Trust Model

Step 2 agents will:

  • Run with least privilege (no registry write access)
  • Validate telemetry signatures (BLAKE3-MAC)
  • Enforce sovereignty boundaries (e.g., Local policies block replication)
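
For the signature check, BLAKE3's keyed mode serves as the MAC. A minimal sketch (key distribution is out of scope here; blake3::Hash comparison is constant-time):

// Sketch: verify a telemetry frame MAC with BLAKE3 keyed hashing.
fn verify_telemetry_mac(key: &[u8; 32], payload: &[u8], mac: &[u8; 32]) -> bool {
    blake3::keyed_hash(key, payload) == blake3::Hash::from(*mac)
}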

Performance Impact

Step 1 Overhead (This Implementation)

Without PODMS feature:

  • Zero overhead (types not compiled in)

With PODMS feature, no telemetry channel:

  • <1% overhead (one if let check per write)

With PODMS feature + telemetry channel:

  • ~2-3% overhead (channel send + tracing)
  • Measured: 2.1 GB/s → 2.05 GB/s write throughput

Memory:

  • UnboundedSender: ~24 bytes per pipeline
  • Events: ~200 bytes each (before send)

Step 2 Target (Replication Agents)

Target overhead:

  • Metro-sync (RPO=0): <10% latency increase
  • Async geo-replication: <1% (background buffered)

Bottleneck mitigation:

  • Bounded channels with backpressure
  • Rate limiting per zone
  • Telemetry sampling for high-throughput workloads
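
Telemetry sampling can be as simple as a shared counter that lets one event in N through; a minimal sketch (the eventual sampling policy, if added, may be adaptive per zone):

use std::sync::atomic::{AtomicU64, Ordering};

// Sketch: emit 1 in every `sample_every` telemetry events under high throughput.
static TELEMETRY_COUNTER: AtomicU64 = AtomicU64::new(0);

fn should_emit(sample_every: u64) -> bool {
    TELEMETRY_COUNTER.fetch_add(1, Ordering::Relaxed) % sample_every == 0
}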

Testing Strategy

Unit Tests

Policy Tests (common/src/policy.rs):

  • Default values for RPO/latency/sovereignty
  • Serialization round-trip
  • Policy presets (metro_sync, geo_replicated)

Type Tests (common/src/lib.rs):

  • NodeId uniqueness
  • ZoneId display formatting
  • Telemetry event serialization

Integration Tests

Pipeline Tests (capsule-registry/tests/podms_test.rs):

  • Telemetry emission on write
  • Channel closed gracefully
  • Multiple writes → multiple events
  • No telemetry without channel

Coverage Target:

  • 90%+ for PODMS code paths
  • 100% for critical paths (telemetry emission)

Benchmark Tests (Future)

Step 2 will add:

  • Throughput regression tests (<5% degradation)
  • Latency percentiles (p50, p99, p99.9)
  • Replication lag measurements
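
These would most likely land as criterion benches; a sketch of the shape (the closure body is a placeholder since a real bench needs a pre-built pipeline fixture):

use criterion::{criterion_group, criterion_main, Criterion};

// Sketch: regression gate for write throughput with metro-sync enabled.
fn bench_metro_sync_write(c: &mut Criterion) {
    c.bench_function("write_capsule_metro_sync", |b| {
        b.iter(|| {
            // Placeholder: write a fixed-size capsule through a pre-built pipeline;
            // fixture setup and assertions omitted in this sketch.
        })
    });
}

criterion_group!(benches, bench_metro_sync_write);
criterion_main!(benches);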

Migration Path

Existing Single-Node Deployments

No action required:

  • PODMS feature not enabled → zero changes
  • Binary size unchanged
  • Performance unchanged

Enabling PODMS

Step-by-step:

  1. Rebuild with feature:

    cargo build --release --features podms
  2. Initialize telemetry (optional):

    let (tx, rx) = mpsc::unbounded_channel();
    let pipeline = pipeline.with_telemetry_channel(tx);
    
    // Spawn agent (Step 2)
    tokio::spawn(async move { /* agent logic */ });
  3. Update policies (optional):

    let policy = Policy::metro_sync(); // or geo_replicated()

Rollback

Disable PODMS:

cargo build --release --no-default-features

Pipeline falls back to single-node mode.

Comparison to Prior Art

Traditional Object Stores (RADOS-style)

Similarities:

  • Object-level granularity
  • Placement rules (CRUSH-like maps vs. Policy)

Differences:

  • Traditional: Centralized monitor cluster
  • PODMS: Autonomous agent swarms

CockroachDB

Similarities:

  • Gossip-based node discovery (Step 2)
  • Range-level replication (capsule-level here)

Differences:

  • CockroachDB: SQL-centric, synchronous Raft
  • PODMS: Policy-centric, async + sync hybrid

etcd (Raft Consensus)

Similarities:

  • Strong consistency option (metro-sync)

Differences:

  • etcd: Single Raft group (centralized)
  • PODMS: Per-capsule autonomy (decentralized)

Future Extensions

Adaptive RPO

Agents learn optimal RPO from workload patterns:

if access_pattern.is_write_heavy() {
    policy.rpo = policy.rpo.min(Duration::from_secs(5));
}

Cost-Aware Placement

Integrate cloud pricing APIs:

if policy.sovereignty == Global && estimated_cost > budget {
    place_in_cheaper_zone();
}

ML-Driven Heatmap Prediction

Train models to predict HeatSpike events:

Historical access patterns → LSTM → Predicted spike → Proactive migration

Glossary

  • PODMS: Policy-Orchestrated Disaggregated Mesh Scaling
  • RPO: Recovery Point Objective (max acceptable data loss window)
  • RTO: Recovery Time Objective (max acceptable downtime) - future
  • Metro-sync: Synchronous replication within a metro zone (RPO=0)
  • Geo-replication: Asynchronous replication across geographic regions
  • Sovereignty: Policy-enforced data residency constraints
  • Telemetry: Lightweight event stream for autonomous agents
  • Agent Swarm: Distributed processes subscribing to telemetry

Changelog

2025-11-08 - Step 1 Complete:

  • Added PODMS types (NodeId, ZoneId, SovereigntyLevel, Telemetry)
  • Extended Policy with RPO, latency_target, sovereignty
  • Integrated telemetry channel in WritePipeline
  • Added 90%+ test coverage
  • Updated README.md and docs/

2025-11-09 - Step 2 Complete:

  • Added scaling crate with MeshNode and ScalingAgent
  • Implemented gossip-based peer discovery (memberlist)
  • Added RDMA mock transport (TCP for POC)
  • Extended WritePipeline with perform_metro_sync_replication()
  • Metro-sync triggered automatically for RPO=0 policies
  • Hash-based dedup preserved during replication
  • Comprehensive test coverage (unit + integration)
  • Updated documentation

Next: Step 4 - Full Mesh Federation (long-term)