Goal
Convert the existing PoC to a working MVP of Aruna V3: a P2P distributed scientific data management system supporting single-realm federation with S3-compatible storage, CRDT-based metadata, and role-based authorization.
Core capabilities:
Full federation where any node can serve as entry point and provide comprehensive answers by coordinating with peers
Data replication and redundancy across a configurable number (N) of nodes, ensuring availability when individual nodes go offline
Content-addressed immutable data storage with S3-compatible interface (PUT/GET/LIST/multipart)
Collaborative metadata editing using Automerge CRDTs with RO-Crate JSON-LD format
Full-text and structured search across distributed metadata with bitmap-based authorization filtering
Path-based role permissions with wildcard support and deny-overrides-allow semantics
OIDC-based authentication with realm-signed tokens
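The path-based permission model listed above can be illustrated with a minimal sketch. It assumes that `*` matches exactly one path segment, that a trailing `**` matches zero or more remaining segments, and that access is denied when no rule matches. The `Rule`, `Effect`, `matches`, and `is_allowed` names are hypothetical and not taken from the Aruna codebase.

```rust
/// Effect of a single rule (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Effect {
    Allow,
    Deny,
}

/// A hypothetical rule: a path pattern plus an effect.
#[derive(Debug, Clone)]
pub struct Rule {
    pub pattern: String,
    pub effect: Effect,
}

/// Returns true if `path` matches `pattern`.
/// Assumption: `*` matches exactly one segment; a trailing `**`
/// matches zero or more remaining segments.
pub fn matches(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').filter(|s| !s.is_empty()).collect();
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();

    for (i, p) in pat.iter().enumerate() {
        if *p == "**" && i == pat.len() - 1 {
            return true; // trailing `**`: everything below the prefix matches
        }
        match segs.get(i) {
            Some(seg) if *p == "*" || p == seg => continue,
            _ => return false,
        }
    }
    // Without a trailing `**`, pattern and path must have the same depth.
    pat.len() == segs.len()
}

/// Deny-overrides-allow: any matching deny rule wins over all allows.
/// Assumption: access is denied when no rule matches at all.
pub fn is_allowed(rules: &[Rule], path: &str) -> bool {
    let mut allowed = false;
    for rule in rules.iter().filter(|r| matches(&r.pattern, path)) {
        match rule.effect {
            Effect::Deny => return false,
            Effect::Allow => allowed = true,
        }
    }
    allowed
}
```

With an allow rule on `/projects/**` and a deny rule on `/projects/*/secrets`, a request for `/projects/alpha/data/run1` would be allowed, while `/projects/alpha/secrets` would be rejected because the deny rule also matches.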
Federation behavior:
A user connecting to Node A can discover and access data stored on Node B (subject to authorization)
Search queries return results from across the realm, deduplicated and authorization-filtered
Metadata updates propagate to all nodes holding replicas via Automerge sync
Data objects replicate according to configured replication factor
Node discovery via DHT allows dynamic cluster membership
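The search behavior described above (results from across the realm, deduplicated and authorization-filtered) might be sketched as follows. This is only an illustration under stated assumptions: document IDs are taken to be realm-wide `u32` values, a `BTreeSet<u32>` stands in for the roaring bitmap of permitted documents, and duplicates are resolved by keeping the highest-scoring hit. `Hit` and `merge_hits` are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// A hypothetical search hit returned by one node.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub doc_id: u32, // assumed to be unique across the realm
    pub score: f32,
    pub node: String,
}

/// Merges per-node result lists: drops hits the caller may not see,
/// keeps the best-scoring copy of each document, and sorts by score.
/// `allowed` stands in for a roaring bitmap of permitted document IDs.
pub fn merge_hits(per_node: Vec<Vec<Hit>>, allowed: &BTreeSet<u32>) -> Vec<Hit> {
    let mut best: HashMap<u32, Hit> = HashMap::new();
    for hit in per_node.into_iter().flatten() {
        if !allowed.contains(&hit.doc_id) {
            continue; // authorization filter
        }
        // Replace the stored hit only if this copy scores strictly higher.
        let replace = best
            .get(&hit.doc_id)
            .map_or(true, |existing| hit.score > existing.score);
        if replace {
            best.insert(hit.doc_id, hit);
        }
    }
    let mut merged: Vec<Hit> = best.into_values().collect();
    // Highest score first; ties broken by document ID for stable output.
    merged.sort_by(|a, b| b.score.total_cmp(&a.score).then(a.doc_id.cmp(&b.doc_id)));
    merged
}
```

Filtering before deduplication keeps unauthorized documents out of the intermediate map entirely, which is one possible design choice; the real implementation may apply the bitmap inside the index query instead.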
Definition of Done:
Fuzz test corpus established for protocol and parsing code
Test Concept
Unit testing (per crate):
Each crate maintains unit tests covering its internal logic. Network and storage dependencies are abstracted behind traits allowing mock injection. Target coverage >80% for core logic paths.
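As a rough illustration of trait-based mock injection, the sketch below hides storage behind a trait and tests a small piece of logic against an in-memory mock. `BlobStore`, `MockStore`, and `store_if_absent` are hypothetical names chosen for this example, not Aruna's actual interfaces.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; the real trait will differ.
pub trait BlobStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory mock that also counts writes so tests can assert on them.
#[derive(Default)]
pub struct MockStore {
    blobs: HashMap<String, Vec<u8>>,
    pub puts: usize,
}

impl BlobStore for MockStore {
    fn put(&mut self, key: &str, bytes: Vec<u8>) {
        self.puts += 1;
        self.blobs.insert(key.to_string(), bytes);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.blobs.get(key).cloned()
    }
}

/// Logic under test: write only if the key is not present yet.
/// Returns true if a write happened.
pub fn store_if_absent<S: BlobStore>(store: &mut S, key: &str, bytes: &[u8]) -> bool {
    if store.get(key).is_some() {
        return false;
    }
    store.put(key, bytes.to_vec());
    true
}
```

Because `store_if_absent` is generic over `BlobStore`, a unit test can pass `MockStore::default()` and check `puts` afterwards, while production code would pass a real backend.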
Fuzz testing (per crate where applicable):
aruna-metadata: Fuzz RO-Crate JSON-LD parsing with malformed documents. Fuzz Tantivy query parsing with arbitrary search strings.
Fuzz tests use cargo-fuzz or the arbitrary crate. The goal is to discover unhappy paths and crash conditions before they occur in production. The fuzz corpus is maintained in the repository.
Integration testing (aruna crate):
Multi-node integration tests run multiple Aruna instances within a single test binary, each bound to different ports. Tests verify cluster formation, cross-node data access, search aggregation, permission propagation, and behavior during simulated node failures.
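One way to run several instances in a single test binary without port collisions is to bind each one to port 0 and let the OS assign a free port. The sketch below shows only that idea, using plain TCP from the standard library as a stand-in; the real nodes communicate over Iroh, and `spawn_node` and `ask` are hypothetical helpers.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Starts a stand-in "node" that answers one connection with its name.
/// Returns the address it is listening on and the thread handle.
pub fn spawn_node(name: &'static str) -> (SocketAddr, thread::JoinHandle<()>) {
    // Port 0 asks the OS for any free port, so parallel tests do not collide.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("local_addr");
    let handle = thread::spawn(move || {
        if let Ok((mut stream, _)) = listener.accept() {
            let _ = stream.write_all(name.as_bytes());
            // The stream is dropped here, which signals end-of-reply.
        }
    });
    (addr, handle)
}

/// Connects to a node and reads its full reply.
pub fn ask(addr: SocketAddr) -> String {
    let mut stream = TcpStream::connect(addr).expect("connect");
    let mut reply = String::new();
    stream.read_to_string(&mut reply).expect("read");
    reply
}
```

A test would call `spawn_node` once per simulated node, keep the returned addresses, and join the thread handles at the end so that failures inside a node surface in the test result.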
End-to-end testing:
Scripted scenarios exercising the full user journey: authentication, group creation, data upload with replication verification, concurrent metadata editing, search with authorization filtering, permission revocation, and continued access during node failure.
Risks / Blockers
Technical risks:
Iroh ecosystem evolving; API changes may require adaptation. Mitigation: pin versions, abstract behind traits.
Automerge performance on large documents untested at scale. Mitigation: benchmark early, consider document splitting if needed.
Roaring bitmap memory usage with many roles and large indexes. Mitigation: monitor during testing, evaluate sparse optimizations if needed.
Design questions to resolve:
Message routing from network layer to domain crates: channel-based dispatch with typed message enums appears most testable, but needs validation.
Replication factor configuration: per-realm default with per-resource override, exact configuration format TBD.
DHT query caching duration: affects consistency vs network overhead tradeoff.
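For the message-routing question, a channel-based dispatcher with typed message enums could look roughly like the sketch below. It uses `std::sync::mpsc` to stay self-contained; an async runtime's channels would be the more likely choice in practice. All type names (`NetMsg`, `DataMsg`, `MetadataMsg`, `Dispatcher`) and message variants are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Messages for the data crate (hypothetical variants).
#[derive(Debug, PartialEq)]
pub enum DataMsg {
    Replicate { object: String },
}

/// Messages for the metadata crate (hypothetical variants).
#[derive(Debug, PartialEq)]
pub enum MetadataMsg {
    SyncDoc { doc: String },
}

/// Envelope decoded by the network layer.
#[derive(Debug, PartialEq)]
pub enum NetMsg {
    Data(DataMsg),
    Metadata(MetadataMsg),
}

/// Owns one sender per domain crate and routes by enum variant.
pub struct Dispatcher {
    data_tx: Sender<DataMsg>,
    metadata_tx: Sender<MetadataMsg>,
}

impl Dispatcher {
    /// Creates the dispatcher plus the receiving ends for each domain.
    pub fn new() -> (Self, Receiver<DataMsg>, Receiver<MetadataMsg>) {
        let (data_tx, data_rx) = channel();
        let (metadata_tx, metadata_rx) = channel();
        (Dispatcher { data_tx, metadata_tx }, data_rx, metadata_rx)
    }

    /// Routes one message; returns false if the receiver has been dropped.
    pub fn dispatch(&self, msg: NetMsg) -> bool {
        match msg {
            NetMsg::Data(m) => self.data_tx.send(m).is_ok(),
            NetMsg::Metadata(m) => self.metadata_tx.send(m).is_ok(),
        }
    }
}
```

The testability argument is that each domain crate only sees its own enum and a receiver, so a unit test can feed it messages directly without any network layer present.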
Dependencies:
OIDC provider availability for authentication testing (can use mock provider for CI).
iroh-dht-experiment stability (experimental status acknowledged).
Scope boundaries:
Multi-realm federation explicitly out of scope; single realm only.
Policy engine (DR-007) deferred to post-MVP.
Compute orchestration, TES, DRS (DR-008) deferred to post-MVP.
Full S3 API compatibility not targeted; focus on core operations.
Workplan
aruna-network - P2P networking foundation (tracked in #221, "[feat] aruna-network: Iroh P2P network implementation")
aruna-auth - Authentication and authorization (path wildcards: * for a single segment, trailing ** for recursive)
aruna-orga - Organizational structure management
aruna-data - Content-addressed data storage
aruna-metadata - Scientific metadata management
aruna - Application composition
Integration and testing
Definition of Done