As object storage systems continue to grow in scale, the cost and energy consumption of online storage media (SSD / HDD) increasingly become system bottlenecks. For data that is accessed extremely infrequently but must be retained for long periods (cold data), using tape as a cold storage medium—combined with automated hot/cold tiering and retrieval mechanisms—is a proven, industry‑standard, and cost‑optimal solution.
The goal of this project is to build a system that is:
S3‑compatible, tape‑backed, and capable of automatic hot/cold tiering and on‑demand restore (thawing)
The system targets government, enterprise, research, finance, and media workloads that require low cost, long‑term retention, and high reliability.
- Maintain full S3 protocol transparency for upper‑layer applications, requiring no business logic changes
- Support automatic migration of object data to tape and other cold storage media
- Provide controllable, observable, and extensible restore and rehydration capabilities
- Support enterprise‑grade archive workflows involving offline tapes and human intervention
- Achieve a practical engineering balance among performance, capacity, and reliability
- Support standard S3 semantics for object upload, download, and metadata operations
- Support triggering cold data retrieval via extended APIs or S3 Restore semantics
- Do not break existing S3 clients, SDKs, or applications
Support policy‑based migration of object data from general‑purpose storage to cold storage media:
- Lifecycle rules (time‑based)
- Access frequency (hot/cold identification)
- Object tags or bucket‑level policies
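For illustration, a policy evaluator that combines these criteria could be modeled as in the following Rust sketch; the structure, field names, and thresholds are assumptions for this document, not a defined configuration format.

```rust
use std::collections::HashMap;

/// Criteria that can mark an object as eligible for archival to tape.
struct ArchivePolicy {
    min_age_days: Option<u32>,               // lifecycle rule: archive after N days
    max_accesses_per_month: Option<u32>,     // cold identification by access frequency
    required_tag: Option<(String, String)>,  // e.g. ("tier", "archive") on the object
}

struct ObjectStats {
    age_days: u32,
    accesses_last_month: u32,
    tags: HashMap<String, String>,
}

/// An object is archived when it satisfies every configured criterion.
fn eligible_for_archive(policy: &ArchivePolicy, stats: &ObjectStats) -> bool {
    let old_enough = policy.min_age_days.map_or(true, |d| stats.age_days >= d);
    let cold_enough = policy
        .max_accesses_per_month
        .map_or(true, |n| stats.accesses_last_month <= n);
    let tagged = policy.required_tag.as_ref().map_or(true, |(k, v)| {
        stats.tags.get(k).map_or(false, |val| val == v)
    });
    old_enough && cold_enough && tagged
}
```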
After archival:
- Object metadata remains visible and accessible
- Data and metadata are decoupled
- Metadata permanently resides on online storage
- Provide restore APIs for users to explicitly request data retrieval
- Automatically rehydrate data back to hot/warm storage after restore
- Implement a complete restore state machine: `pending` → `in-progress` → `completed` → `expired` (see the sketch after this list)
- Support TTL-based expiration of restored data
- Provide a cache layer for restored data to avoid repeated tape reads
- Serve GET requests directly from cache for recently accessed objects
- Cache policies support:
  - Capacity limits
  - Access frequency
  - TTL-based eviction
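As an illustration of the restore state machine and TTL expiration requirements above, the following Rust sketch models the four states; the type and field names (`RestoreStatus`, `expire_at`) are illustrative assumptions, not part of the specification.

```rust
use std::time::{Duration, SystemTime};

/// Restore lifecycle states for an archived object.
#[derive(Debug, Clone, PartialEq)]
enum RestoreStatus {
    Pending,                              // request accepted, waiting for a drive/tape
    InProgress,                           // tape read in flight
    Completed { expire_at: SystemTime },  // data in cache until the TTL elapses
    Expired,                              // cache copy evicted; object is cold again
}

impl RestoreStatus {
    /// Advance the state machine when a tape read finishes,
    /// granting the restored copy a time-to-live in cache.
    fn complete(ttl: Duration) -> Self {
        RestoreStatus::Completed { expire_at: SystemTime::now() + ttl }
    }

    /// Check whether a completed restore has outlived its TTL.
    fn is_expired(&self) -> bool {
        match self {
            RestoreStatus::Completed { expire_at } => SystemTime::now() >= *expire_at,
            RestoreStatus::Expired => true,
            _ => false,
        }
    }
}
```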
- Cold data archival write throughput ≥ 300 MB/s
- System throughput scales near‑linearly via horizontal expansion
- Overall bandwidth increases by adding nodes and tape drives
- Restore throughput ≥ 1 restore task per 5 minutes
- Restore capacity scales near‑linearly with resource growth
- Implement restore queues and scheduling to minimize tape swaps
- Support offline tape storage scenarios
- When a restore request hits an offline tape:
  - Automatically send notifications (messages / tickets / alerts)
  - Prompt operators to scan or confirm tape identifiers
- Support batch restore merging:
  - Merge multiple requests targeting the same tape or archive bundle
  - Execute unified notification, loading, and reading workflows
Support multiple cold data reliability strategies:
- Dual‑tape replication (multiple copies)
- Tape‑level or file‑level erasure coding (EC)
Reliability and storage efficiency are configurable trade‑offs.
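To make the configurable trade-off concrete, a per-bucket or per-archive reliability policy might be modeled as in the following Rust sketch; the names and parameters are assumptions for illustration only. For example, dual-tape replication carries a 2.0× raw-capacity overhead, while a 10+2 erasure-coded layout stores the same data at 1.2× at the cost of more complex recall scheduling.

```rust
/// Cold-data reliability strategy, chosen per bucket or per archive bundle.
#[derive(Debug, Clone)]
enum ReliabilityPolicy {
    /// Write every archive bundle to N independent tapes (N >= 2).
    Replication { copies: u8 },
    /// Split a bundle into `data` fragments plus `parity` fragments
    /// spread across different tapes (tape-level erasure coding).
    ErasureCoding { data: u8, parity: u8 },
}

impl ReliabilityPolicy {
    /// Raw-capacity overhead relative to the logical data size.
    fn storage_overhead(&self) -> f64 {
        match self {
            ReliabilityPolicy::Replication { copies } => *copies as f64,
            ReliabilityPolicy::ErasureCoding { data, parity } => {
                (*data as f64 + *parity as f64) / *data as f64
            }
        }
    }
}
```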
- Support automatic update and deletion of cold data
- Allow multiple updates or deletions during the data lifecycle
- Support at least one full cold‑data maintenance cycle per year
- Provide verifiable and traceable consistency between metadata and tape data
- Support mainstream tape libraries and tape devices
- At minimum, support LTO‑9 and LTO‑10
- Remain vendor‑neutral with respect to tape manufacturers
The system adopts a layered, decoupled architecture:
- S3 Access Layer: External object access interface
- Metadata & Policy Management Layer: Hot/cold state, lifecycle, and indexing
- Tiering & Scheduling Layer: Core archival and restore orchestration
- Cache Layer: Online cache for restored data
- Tape Management Layer: Unified abstraction for tapes, drives, and libraries
- Notification & Collaboration Layer: Offline tape and human interaction
- Handles PUT / GET / HEAD requests
- Intercepts access to cold objects and evaluates state
- Can be implemented via MinIO or a custom S3 proxy
- Maintains object hot/cold state and archive locations
- Manages tape indexing and restore state machines
- Metadata is always stored on online storage (KV / RDB)
- Scans objects eligible for archival
- Performs batch aggregation and sequential tape writes
- Coordinates tape drives for write operations
- Manages restore request queues
- Merges restore requests targeting the same tape or archive bundle
- Controls concurrency and minimizes tape swaps
- Stores restored data
- Serves GET requests with low latency
- Supports multi‑tier caching (SSD / HDD)
- Abstracts tape libraries, drives, and media
- Tracks offline tape status
- Integrates with notification systems for human intervention
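One possible shape for this abstraction, sketched in Rust; the trait and method names are hypothetical, and a concrete implementation would sit on top of LTFS paths or a vendor SDK as listed in the technology table below.

```rust
/// Hypothetical abstraction over a tape library and its drives.
/// Concrete implementations would wrap LTFS or a vendor SDK.
trait TapeLibrary {
    /// Current state of a cartridge: ONLINE / OFFLINE / UNKNOWN.
    fn tape_state(&self, tape_id: &str) -> TapeState;
    /// Ask the library to load a cartridge into a free drive.
    fn load(&self, tape_id: &str) -> Result<DriveHandle, TapeError>;
    /// Sequentially read one archive bundle from a loaded tape.
    fn read_bundle(&self, drive: &DriveHandle, archive_id: &str) -> Result<Vec<u8>, TapeError>;
    /// Sequentially append one archive bundle to a loaded tape.
    fn write_bundle(&self, drive: &DriveHandle, archive_id: &str, data: &[u8]) -> Result<(), TapeError>;
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum TapeState { Online, Offline, Unknown }

struct DriveHandle { drive_id: String }
struct TapeError { message: String }
```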
- Object is written via S3
- Lifecycle policy is triggered
- Scheduler aggregates objects and writes them to tape
- Metadata hot/cold state is updated
- Online storage space is reclaimed
- User issues a restore request
- System checks tape online/offline status
- Online: automatic scheduling and rehydration
- Offline: notification and human intervention
- Data is restored and written into cache
- Object becomes readable again
- High Availability: Metadata and schedulers must support HA
- Observability: Full monitoring of archive, restore, and tape states
- Compliance: Optional WORM and long‑term retention policies
- Extensibility: Adding tapes or drives must not disrupt existing systems
This section translates the design into an implementable architecture using object storage, S3, custom schedulers, and tape systems.
| Layer | Technology | Description |
|---|---|---|
| S3 Access | MinIO / RustFS | S3‑compatible API, PUT/GET/HEAD/Restore semantics |
| Metadata | KV / RDB (etcd / PostgreSQL) | Hot/cold state, tape index, task state |
| Scheduling | Custom Scheduler (Rust) | Core archive and recall logic |
| Cache | Local SSD / HDD + LRU | Restored data cache |
| Tape Access | LTFS / Vendor SDK / SCSI | LTO‑9 / LTO‑10 integration |
| Notification | Webhook / MQ / Ticketing | Human collaboration |
- After archival, only metadata and placeholders remain online
- Regular GET on cold objects returns an S3-like `InvalidObjectState` error
- RestoreObject or extended APIs trigger recall scheduling
This can be implemented via:
- MinIO Lifecycle / ILM extensions, or
- Direct integration into RustFS object access paths
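Whichever integration point is chosen, the access-layer check reduces to a small routing decision like the following Rust sketch; the types and function are illustrative assumptions, not MinIO or RustFS APIs.

```rust
/// Simplified per-object state as seen by the S3 access layer.
struct ObjectMeta {
    storage_class: StorageClass,
    restore_status: Option<RestoreState>,
}

enum StorageClass { Hot, Warm, Cold }
enum RestoreState { Pending, InProgress, Completed, Expired }

enum GetDecision {
    ServeFromOnline,          // HOT/WARM data still on online storage
    ServeFromCache,           // restored copy available in the cache layer
    RejectInvalidObjectState, // map to an S3 InvalidObjectState error response
}

/// Decide how to answer a GET based on hot/cold state and restore progress.
fn route_get(meta: &ObjectMeta) -> GetDecision {
    match (&meta.storage_class, &meta.restore_status) {
        (StorageClass::Hot, _) | (StorageClass::Warm, _) => GetDecision::ServeFromOnline,
        (StorageClass::Cold, Some(RestoreState::Completed)) => GetDecision::ServeFromCache,
        (StorageClass::Cold, _) => GetDecision::RejectInvalidObjectState,
    }
}
```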
- bucket / object / version
- `storage_class`: HOT / WARM / COLD
- `archive_id` (archive bundle ID)
- `tape_id` / `tape_set`
- checksum / size
- `restore_status` / `restore_expire_at`
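A minimal sketch of this metadata record in Rust, assuming exactly the fields listed above; the struct and enum names are illustrative, and the actual schema would live in etcd or PostgreSQL per the technology table.

```rust
/// One entry per object version in the metadata store (etcd / PostgreSQL).
struct ObjectRecord {
    bucket: String,
    object: String,
    version: String,
    storage_class: StorageClass,        // HOT / WARM / COLD
    archive_id: Option<String>,         // archive bundle ID, set once archived
    tape_set: Vec<String>,              // one or more tape IDs (replication / EC)
    checksum: String,
    size: u64,
    restore_status: Option<RestoreStatus>,
    restore_expire_at: Option<std::time::SystemTime>,
}

enum StorageClass { Hot, Warm, Cold }
enum RestoreStatus { Pending, InProgress, Completed, Expired }
```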
- Aggregation unit for sequential tape writes
- Minimum scheduling unit for tape I/O
- Used to:
  - Maximize throughput
  - Merge restore requests
Write Path
- Object written to MinIO / RustFS
- Lifecycle policy marks it as `COLD_PENDING`
- Archive Scheduler aggregates objects (see the batching sketch below)
- Sequential tape write (≥ 300 MB/s)
- Metadata updated and online data released
Key Points
- Tape writes must be sequential
- Support dual-write or EC strategies
- Scheduler must be aware of tape, drive, and library states
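A minimal sketch of the aggregation step in Rust, assuming a configurable target bundle size; the function and field names are illustrative, not the actual scheduler API.

```rust
/// Candidate object marked COLD_PENDING by the lifecycle policy.
struct PendingObject {
    key: String,
    size: u64,
}

/// Group pending objects into archive bundles close to a target size
/// so each bundle can be streamed to tape as one long sequential write.
fn build_bundles(mut pending: Vec<PendingObject>, target_bundle_size: u64) -> Vec<Vec<PendingObject>> {
    // Sort largest-first so bundles fill up with fewer fragments.
    pending.sort_by(|a, b| b.size.cmp(&a.size));

    let mut bundles: Vec<Vec<PendingObject>> = Vec::new();
    let mut current: Vec<PendingObject> = Vec::new();
    let mut current_size = 0u64;

    for obj in pending {
        if current_size + obj.size > target_bundle_size && !current.is_empty() {
            bundles.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += obj.size;
        current.push(obj);
    }
    if !current.is_empty() {
        bundles.push(current);
    }
    bundles
}
```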
Restore Flow
- User issues Restore request
- Recall task enters queue
- Scheduler merges requests by `archive_id` (see the merging sketch below)
- Tape status check:
  - Online: schedule read
  - Offline: wait and notify
- Sequential tape read
- Data written to cache and state updated
Concurrency and Scaling
- One tape drive = one sequential read pipeline
- Adding drives increases restore throughput linearly
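To illustrate the merging step above, here is a small Rust sketch that groups queued recall tasks by `archive_id` so each tape and archive bundle is loaded and read at most once per pass; the types are assumptions for illustration.

```rust
use std::collections::HashMap;

/// One queued recall task created by a user Restore request.
struct RecallRequest {
    object_key: String,
    archive_id: String, // archive bundle the object lives in
}

/// A merged unit of tape work: one bundle read serving many requests.
struct RecallBatch {
    archive_id: String,
    object_keys: Vec<String>,
}

/// Merge individual requests so that each archive bundle (and therefore
/// each tape load) is scheduled at most once per pass.
fn merge_by_archive(queue: Vec<RecallRequest>) -> Vec<RecallBatch> {
    let mut by_archive: HashMap<String, Vec<String>> = HashMap::new();
    for req in queue {
        by_archive.entry(req.archive_id).or_default().push(req.object_key);
    }
    by_archive
        .into_iter()
        .map(|(archive_id, object_keys)| RecallBatch { archive_id, object_keys })
        .collect()
}
```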
- Restored data enters cache before hot storage
- Supports LRU / LFU and TTL
- Cache hits directly satisfy GET requests
- Cache expiration triggers re‑restore if needed
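Below is a minimal in-memory sketch of the TTL side of such a cache in Rust; a production version would stage data on local SSD/HDD and add LRU/LFU eviction under a capacity limit. The names are illustrative assumptions.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Restored-object cache entry with an expiry deadline.
struct CacheEntry {
    path_on_disk: String, // where the restored bytes were staged (SSD/HDD)
    expires_at: Instant,
}

/// Very small TTL cache for restored objects, keyed by object key.
struct RestoreCache {
    entries: HashMap<String, CacheEntry>,
    ttl: Duration,
}

impl RestoreCache {
    fn new(ttl: Duration) -> Self {
        RestoreCache { entries: HashMap::new(), ttl }
    }

    /// Record a freshly restored object.
    fn insert(&mut self, key: String, path_on_disk: String) {
        let entry = CacheEntry { path_on_disk, expires_at: Instant::now() + self.ttl };
        self.entries.insert(key, entry);
    }

    /// Serve a GET from cache if the entry is still within its TTL;
    /// expired entries are dropped, which forces a re-restore.
    fn lookup(&mut self, key: &str) -> Option<&str> {
        let expired = match self.entries.get(key) {
            Some(entry) => entry.expires_at <= Instant::now(),
            None => return None,
        };
        if expired {
            self.entries.remove(key);
            return None;
        }
        self.entries.get(key).map(|e| e.path_on_disk.as_str())
    }
}
```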
- Tape states: ONLINE / OFFLINE / UNKNOWN
- Restore hitting OFFLINE tape triggers notification
- Notification includes tape_id / location / archive_id
- Human confirmation brings tape back ONLINE
- Multiple requests share a single load/read cycle
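One plausible shape for the notification payload, sketched as a Rust struct that would be serialized to whichever channel (webhook / MQ / ticketing) is configured; the field names are illustrative assumptions. Once an operator confirms the tape, for example by scanning its barcode, the tape transitions back to ONLINE and the merged batch is scheduled along the normal online path.

```rust
/// Notification emitted when a restore request targets an OFFLINE tape.
/// Serialized and pushed to a webhook, message queue, or ticketing system.
struct OfflineTapeNotification {
    tape_id: String,        // cartridge barcode / identifier to locate
    location: String,       // shelf or vault location recorded at export time
    archive_id: String,     // bundle that the pending restores are waiting on
    pending_requests: u32,  // how many merged restore requests are blocked
    instructions: String,   // e.g. "scan the tape barcode to confirm reinsertion"
}
```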
- Dual‑tape replication across libraries
- Tape‑level EC (scheduler‑assisted)
- Metadata records full replica topology
- Periodic tape readability verification
| Module | Reason |
|---|---|
| Archive / Recall Scheduler | Highly domain‑specific policies |
| Metadata Model | Strong consistency and complex state machines |
| Request Merging & Human Workflow | Not exposed by cloud vendor solutions |
- S3: MinIO / RustFS
- Metadata: etcd / PostgreSQL
- Notification: Existing ticketing / alert systems
- Tape drivers: Vendor SDKs
This system is positioned as:
A custom‑built HSM + tape archival scheduling system for object storage
S3 provides interface compatibility, tape delivers extreme cost efficiency, and the core technical value lies in scheduling, metadata, and workflow orchestration.
The project is not a single storage product, but a complete enterprise‑grade solution integrating S3 access, HSM scheduling, tape management, and human collaboration—designed for workloads with strict requirements on cost, reliability, and long‑term data retention.