Version: v0.6.0 (production-ready GCS support) | Status: Beta | Last Updated: December 19, 2025
Orbit v0.6.0 provides comprehensive Google Cloud Storage support with streaming I/O using the industry-standard object_store crate. The implementation is pure Rust, async-first, and designed for high-performance data transfers with memory-efficient streaming, built-in resilience, and sophisticated error recovery.
🚀 Production-Ready GCS Support:
- Streaming Upload/Download - Transfer files of any size with constant ~200MB memory usage
  - Memory-efficient transfers using object_store's streaming API
  - No buffering of entire files in memory
  - Seamless handling of multi-gigabyte files
- Unified Cloud Storage API - Consistent interface across S3, Azure, and GCS
  - Same Backend trait implementation for all cloud providers
  - Easy migration between cloud providers
  - Battle-tested object_store crate (used by Apache Arrow DataFusion)
- Full Backend Trait Support - Complete async operations (a rough sketch of the trait's shape follows this list)
  - stat, list, read, write, delete, mkdir, rename, exists
  - Streaming I/O for all operations
  - Consistent error handling
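The exact trait definition lives in Orbit's source; the sketch below is only a rough reconstruction of its shape, inferred from the examples later in this guide. The struct names, field types, stream/reader aliases, and the use of async_trait are assumptions and may differ from the real API.

```rust
// Hypothetical sketch only -- inferred from this guide's examples, not copied from Orbit.
use std::path::{Path, PathBuf};
use std::pin::Pin;
use std::time::SystemTime;

use bytes::Bytes;
use futures::Stream;
use tokio::io::AsyncRead;

pub type BoxError = Box<dyn std::error::Error + Send + Sync>;
pub type ByteStream = Pin<Box<dyn Stream<Item = Result<Bytes, BoxError>> + Send>>;
pub type EntryStream = Pin<Box<dyn Stream<Item = Result<Entry, BoxError>> + Send>>;

pub struct Metadata {
    pub size: u64,
    pub modified: Option<SystemTime>,
    pub etag: Option<String>,
    pub is_dir: bool,
}

pub struct Entry {
    pub path: PathBuf,
    pub metadata: Metadata,
}

pub struct ListOptions {
    pub recursive: bool,
    pub max_entries: Option<usize>,
}

#[derive(Default)]
pub struct WriteOptions;

#[async_trait::async_trait]
pub trait Backend {
    async fn stat(&self, path: &Path) -> Result<Metadata, BoxError>;
    async fn list(&self, path: &Path, options: ListOptions) -> Result<EntryStream, BoxError>;
    async fn read(&self, path: &Path) -> Result<ByteStream, BoxError>;
    async fn write(
        &self,
        path: &Path,
        reader: Box<dyn AsyncRead + Send + Unpin>,
        size_hint: Option<u64>,
        options: WriteOptions,
    ) -> Result<(), BoxError>;
    async fn delete(&self, path: &Path, recursive: bool) -> Result<(), BoxError>;
    async fn mkdir(&self, path: &Path, recursive: bool) -> Result<(), BoxError>;
    async fn rename(&self, from: &Path, to: &Path) -> Result<(), BoxError>;
    async fn exists(&self, path: &Path) -> Result<bool, BoxError>;
}
```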
Memory Usage:
- Upload 10GB file: ~200MB (constant memory)
- Download 5GB file: ~100MB (constant memory)
- List 100K objects: ~10MB (lazy streaming)
Supported File Sizes:
- Maximum upload: 5TB (GCS limit)
- Maximum download: Unlimited
- Maximum bucket objects: Millions (constant memory)
📖 Backend Guide: See BACKEND_STREAMING_GUIDE.md for complete examples
✅ Pure Rust - No external dependencies or binaries required
✅ Async Operations - Built on Tokio for high concurrency
✅ Streaming I/O - Constant memory usage for files of any size
✅ Resumable Transfers - Automatic resume on interruption (when used with Orbit's resume feature)
✅ Parallel Operations - Configurable concurrent transfers
✅ Integrity Verification - Built-in checksum validation
✅ Flexible Authentication - Multiple credential sources (service accounts, application credentials)
✅ Strong Consistency - GCS provides strong consistency guarantees
✅ Prefix Support - Virtual directory isolation within buckets
✅ URI Support - Both gs:// and gcs:// URI schemes
Build Orbit with the gcs-native feature flag:
```bash
cargo build --release --features gcs-native
```

Or add to your Cargo.toml if using Orbit as a library:

```toml
[dependencies]
orbit = { version = "0.6", features = ["gcs-native"] }
```

Requirements:

- Rust: 1.70 or later
- Network: Outbound HTTPS access to GCS endpoints
- Memory: Recommended 512MB+ for large file transfers
- Authentication: Google Cloud service account or application credentials
```bash
# Upload a file to GCS
orbit --source /local/dataset.tar.gz --dest gs://mybucket/backups/dataset.tar.gz

# Upload with progress
orbit --source /local/large-file.zip --dest gs://mybucket/files/large-file.zip --progress

# Download a file from GCS
orbit --source gs://mybucket/data/report.pdf --dest ./report.pdf

# Download with resume support
orbit --source gcs://mybucket/large-dataset.tar --dest ./dataset.tar --resume

# Sync local directory to GCS
orbit --source /local/photos --dest gs://mybucket/archives/photos \
  --mode sync --resume --parallel 8 --recursive

# Sync with filters
orbit --source /local/documents --dest gs://mybucket/docs \
  --mode sync --recursive --include "*.pdf" --exclude "*.tmp"
```

```rust
use orbit::backend::{Backend, GcsBackend, WriteOptions};
use std::path::Path;
use tokio::fs::File;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create GCS backend
    let backend = GcsBackend::new("my-bucket").await?;

    // Open file for reading
    let file = File::open("/local/data.txt").await?;
    let reader = BufReader::new(file);

    // Upload with streaming
    backend.write(
        Path::new("remote/data.txt"),
        Box::new(reader),
        None, // size hint
        WriteOptions::default(),
    ).await?;

    println!("Upload complete!");
    Ok(())
}
```

```rust
use orbit::backend::{Backend, GcsBackend};
use std::path::Path;
use tokio::io::AsyncWriteExt;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = GcsBackend::new("my-bucket").await?;

    // Get streaming download
    let mut stream = backend.read(Path::new("remote/data.txt")).await?;

    // Write to local file
    let mut file = tokio::fs::File::create("./data.txt").await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        file.write_all(&chunk).await?;
    }

    println!("Download complete!");
    Ok(())
}
```

```rust
use orbit::backend::{Backend, GcsBackend, ListOptions};
use std::path::Path;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = GcsBackend::new("my-bucket").await?;

    // List objects with prefix
    let options = ListOptions {
        recursive: true,
        max_entries: Some(1000),
    };

    let mut stream = backend.list(Path::new("data/"), options).await?;

    // Stream results
    while let Some(entry) = stream.next().await {
        let entry = entry?;
        println!("{}: {} bytes", entry.path.display(), entry.metadata.size);
    }

    Ok(())
}
```

Orbit supports multiple Google Cloud authentication methods through the object_store crate.
The most common and secure method for production deployments:
```bash
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```

Service Account JSON Structure:
```json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "my-service-account@my-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}
```

Then use Orbit without explicit credentials:
```bash
orbit --source /local/data --dest gs://mybucket/backup
```

Or programmatically:
```rust
let backend = GcsBackend::new("my-bucket").await?;
```

For environments where file access is restricted:
```bash
export GOOGLE_SERVICE_ACCOUNT=myaccount@myproject.iam.gserviceaccount.com
export GOOGLE_SERVICE_ACCOUNT_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
```

When running on Google Cloud Platform (GCE, GKE, Cloud Run, etc.):
```bash
# No explicit credentials needed - uses instance metadata service
orbit --source /local/data --dest gs://mybucket/backup
```

The object_store crate automatically detects and uses the instance's service account.
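If you need explicit control over credential resolution (for example in tests), the underlying object_store client can also be configured directly. A minimal sketch, assuming object_store's GCP builder API; Orbit normally performs this wiring for you, and the bucket name and key path here are placeholders:

```rust
use object_store::gcp::{GoogleCloudStorage, GoogleCloudStorageBuilder};

fn build_gcs_store() -> object_store::Result<GoogleCloudStorage> {
    // Point the client at the same JSON key file shown above
    // instead of relying on ambient credentials.
    GoogleCloudStorageBuilder::new()
        .with_bucket_name("my-bucket")
        .with_service_account_path("/path/to/service-account.json")
        .build()
}
```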
For testing or specific scenarios:
```bash
orbit --source /local/data \
  --dest "gs://mybucket/backup?service_account=account@project.iam.gserviceaccount.com"
```

```bash
# Required: Bucket name
export ORBIT_GCS_BUCKET=mybucket
# Optional: Prefix (virtual root directory)
export ORBIT_GCS_PREFIX=backups/production
# Authentication
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
```

```rust
use orbit::backend::{GcsBackend, GcsConfig};
// Basic configuration
let backend = GcsBackend::new("my-bucket").await?;
// With prefix (virtual root)
let backend = GcsBackend::with_prefix("my-bucket", "data/prod").await?;
```

Orbit supports two URI schemes for GCS:
- `gs://bucket/path` - Standard GCS URI scheme
- `gcs://bucket/path` - Alternative scheme
Both work identically:
```bash
# These are equivalent
orbit --source /local/file.txt --dest gs://mybucket/file.txt
orbit --source /local/file.txt --dest gcs://mybucket/file.txt
```

GCS doesn't have true directories, but you can use prefixes to organize objects:
```rust
// Create backend with prefix
let backend = GcsBackend::with_prefix("mybucket", "production/backups").await?;

// All operations are relative to prefix
backend.write(Path::new("data.txt"), reader, None, WriteOptions::default()).await?;
// Object created at: gs://mybucket/production/backups/data.txt
```

List objects hierarchically (non-recursive):
```rust
use orbit::backend::{Backend, GcsBackend, ListOptions};
use std::path::Path;
use futures::StreamExt;

let backend = GcsBackend::new("mybucket").await?;

// Non-recursive list (shows "directories")
let options = ListOptions {
    recursive: false,
    max_entries: None,
};

let mut stream = backend.list(Path::new("data/"), options).await?;

while let Some(entry) = stream.next().await {
    let entry = entry?;
    if entry.metadata.is_dir {
        println!("Directory: {}", entry.path.display());
    } else {
        println!("File: {} ({} bytes)", entry.path.display(), entry.metadata.size);
    }
}
```

Get object metadata without downloading:
```rust
use orbit::backend::{Backend, GcsBackend};
use std::path::Path;

let backend = GcsBackend::new("mybucket").await?;

// Get metadata
let metadata = backend.stat(Path::new("data/file.txt")).await?;
println!("Size: {} bytes", metadata.size);
println!("Modified: {:?}", metadata.modified);
println!("ETag: {:?}", metadata.etag);
```

```rust
use orbit::backend::{Backend, GcsBackend};
use std::path::Path;

let backend = GcsBackend::new("mybucket").await?;

// Check if object exists
let exists = backend.exists(Path::new("data/file.txt")).await?;

// Delete object
backend.delete(Path::new("data/file.txt"), false).await?;

// Delete recursively (all objects with prefix)
backend.delete(Path::new("data/"), true).await?;

// Rename/move object
backend.rename(
    Path::new("old/path.txt"),
    Path::new("new/path.txt"),
).await?;

// Create directory marker (0-byte object with trailing /)
backend.mkdir(Path::new("new/directory"), true).await?;
```

Use the --parallel flag to transfer multiple files concurrently:
```bash
# Upload directory with 8 parallel transfers
orbit --source /local/photos --dest gs://mybucket/photos \
  --recursive --parallel 8

# Optimal parallelism depends on:
# - Network bandwidth
# - CPU cores
# - File sizes
# - GCS bucket location
```
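The same fan-out can be done from Rust against the Backend API. A minimal sketch, assuming the write signature shown earlier and that the backend's error type converts into a boxed error; the file names are purely illustrative:

```rust
use futures::{stream, StreamExt};
use orbit::backend::{Backend, GcsBackend, WriteOptions};
use std::path::Path;
use tokio::{fs::File, io::BufReader};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = GcsBackend::new("my-bucket").await?;
    let backend_ref = &backend;
    let names = vec!["a.bin", "b.bin", "c.bin"]; // hypothetical local files

    // Run at most 8 uploads at a time, mirroring `--parallel 8`.
    let results: Vec<_> = stream::iter(names)
        .map(|name| async move {
            let file = File::open(format!("/local/{name}")).await?;
            backend_ref
                .write(
                    Path::new(name),
                    Box::new(BufReader::new(file)),
                    None,
                    WriteOptions::default(),
                )
                .await?; // assumes the backend error converts into a boxed error
            Ok::<(), Box<dyn std::error::Error>>(())
        })
        .buffer_unordered(8)
        .collect()
        .await;

    // Surface the first failure, if any.
    for result in results {
        result?;
    }
    Ok(())
}
```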
The GCS backend automatically uses streaming for all operations:

```rust
// This uses constant memory regardless of file size
let backend = GcsBackend::new("mybucket").await?;

// Upload 10GB file with ~200MB RAM usage
backend.write(
    Path::new("large-file.bin"),
    Box::new(file_reader),
    Some(10 * 1024 * 1024 * 1024), // 10GB size hint
    WriteOptions::default(),
).await?;
```

```bash
# Orbit will resume from last checkpoint
orbit --source /local/huge-file.tar --dest gs://mybucket/backup.tar --resume

# Resume works by:
# 1. Checking if partial object exists
# 2. Comparing checksums
# 3. Resuming from last verified chunk
```

Use rsync-style delta detection with GCS:
```bash
# Only transfer changed files
orbit --source /local/data --dest gs://mybucket/backup \
  --recursive --delta

# Delta detection:
# - Compares file sizes and modification times
# - Skips unchanged files
# - Transfers only modified files
```
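The same size-and-mtime comparison can be expressed directly against the Backend API. A rough sketch, assuming the stat/exists calls shown earlier, that metadata.size is a u64, that metadata.modified is an Option<SystemTime>, and that backend errors convert into a boxed error; this illustrates the idea rather than Orbit's actual delta implementation:

```rust
use orbit::backend::{Backend, GcsBackend};
use std::path::Path;
use std::time::SystemTime;

// Decide whether a local file needs to be (re-)uploaded.
async fn needs_transfer(
    backend: &GcsBackend,
    local: &Path,
    remote: &Path,
) -> Result<bool, Box<dyn std::error::Error>> {
    // Missing remote object -> transfer it.
    if !backend.exists(remote).await? {
        return Ok(true);
    }

    let local_meta = tokio::fs::metadata(local).await?;
    let remote_meta = backend.stat(remote).await?;

    // Different size -> definitely changed.
    if remote_meta.size != local_meta.len() {
        return Ok(true);
    }

    // Local file newer than the stored copy -> re-upload.
    let local_mtime: SystemTime = local_meta.modified()?;
    match remote_meta.modified {
        Some(remote_mtime) => Ok(local_mtime > remote_mtime),
        None => Ok(true),
    }
}
```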
```bash
# Include only specific files
orbit --source /local/docs --dest gs://mybucket/docs \
  --recursive --include "*.pdf" --include "*.docx"

# Exclude patterns
orbit --source /local/project --dest gs://mybucket/backup \
  --recursive --exclude "*.tmp" --exclude "node_modules/*"
```
```bash
# Show progress bar
orbit --source /local/large-file.tar --dest gs://mybucket/backup.tar --progress

# JSON telemetry output
orbit --source /local/data --dest gs://mybucket/backup \
  --recursive --telemetry-output /tmp/transfer.json
```

The GCS backend automatically retries failed operations:
```rust
// Retries are built into object_store:
// - Network errors: 3 retries with exponential backoff
// - 5xx errors: Automatic retry with backoff
// - 429 (rate limit): Respects Retry-After header
```
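If the defaults don't fit, the retry policy can be tuned when the underlying object_store client is built directly. A sketch assuming object_store's RetryConfig type and the builder's with_retry method; check the exact field names against the object_store version you compile with:

```rust
use std::time::Duration;
use object_store::gcp::{GoogleCloudStorage, GoogleCloudStorageBuilder};
use object_store::RetryConfig;

fn build_store_with_retries() -> object_store::Result<GoogleCloudStorage> {
    // Start from the defaults and make the policy more aggressive for flaky links.
    let mut retry = RetryConfig::default();
    retry.max_retries = 6;
    retry.retry_timeout = Duration::from_secs(300);
    retry.backoff.init_backoff = Duration::from_millis(200);
    retry.backoff.max_backoff = Duration::from_secs(30);
    retry.backoff.base = 2.0;

    GoogleCloudStorageBuilder::new()
        .with_bucket_name("my-bucket")
        .with_retry(retry)
        .build()
}
```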
```bash
# Resume on connection failure
orbit --source /local/data --dest gs://mybucket/backup \
  --recursive --resume

# If transfer fails:
# 1. Partial progress is saved
# 2. Next run resumes from checkpoint
# 3. Checksums verify integrity
```

Common GCS permission errors and solutions:
Error: 403 Forbidden
Solution: Ensure service account has these IAM roles:
- storage.objectViewer (for read)
- storage.objectCreator (for write)
- storage.objectAdmin (for delete/rename)
Error: 404 Not Found
Solution:
- Verify bucket exists
- Check bucket name spelling
- Ensure service account has access
```bash
# Use dedicated service accounts per environment
production-orbit@myproject.iam.gserviceaccount.com
staging-orbit@myproject.iam.gserviceaccount.com

# Principle of least privilege
# Only grant necessary permissions:
# - Read-only: storage.objectViewer
# - Write-only: storage.objectCreator
# - Full access: storage.objectAdmin
```

```text
mybucket/
├── production/
│ ├── daily/
│ ├── weekly/
│ └── monthly/
├── staging/
└── development/
```
Use prefixes to organize backups:
```bash
# Production daily backup
orbit --source /data --dest gs://mybucket/production/daily/$(date +%Y%m%d)

# Use lifecycle policies to expire old backups
```

```bash
# Use Standard storage for frequently accessed data
# Use Nearline/Coldline for archival
# Set lifecycle policies in GCS:
# - Transition to Nearline after 30 days
# - Transition to Coldline after 90 days
# - Delete after 365 days
# Transfer compressed data
tar czf - /data | orbit --source - --dest gs://mybucket/backup.tar.gz
```

```bash
# Enable JSON telemetry
orbit --source /data --dest gs://mybucket/backup \
--recursive --telemetry-output /var/log/orbit/transfer.json
# Monitor with observability stack
export ORBIT_OTLP_ENDPOINT=http://localhost:4317
orbit --source /data --dest gs://mybucket/backup --recursive
```

```bash
# Test with small files first
orbit --source /small-test-file.txt --dest gs://mybucket/test/file.txt
# Verify transfer
orbit --source gs://mybucket/test/file.txt --dest /tmp/downloaded.txt
diff /small-test-file.txt /tmp/downloaded.txt
# Test authentication
gcloud auth application-default login
orbit --source /test.txt --dest gs://mybucket/test.txt
```

Error: No credentials found
Solutions:
1. Set GOOGLE_APPLICATION_CREDENTIALS:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
2. Use gcloud authentication:
gcloud auth application-default login
3. On GCP, ensure instance has service account attached
Error: 403 Forbidden accessing gs://mybucket/file.txt
Solutions:
1. Check service account IAM roles:
gcloud projects get-iam-policy PROJECT_ID \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:ACCOUNT_EMAIL"
2. Grant required roles:
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:ACCOUNT_EMAIL" \
--role="roles/storage.objectAdmin"
3. Check bucket-level permissions:
gsutil iam get gs://mybucket
Error: 404 Not Found: Bucket 'mybucket' not found
Solutions:
1. Verify bucket exists:
gsutil ls
2. Check project:
gcloud config get-value project
3. Ensure service account has access to correct project
Problem: Transfer speed is slower than expected
Solutions:
1. Increase parallelism:
orbit --source /data --dest gs://mybucket/backup --parallel 16
2. Check network bandwidth:
iperf3 -c storage.googleapis.com
3. Use regional bucket close to your location
4. Consider GCS Transfer Service for very large datasets
Problem: High memory usage during transfers
Solutions:
1. The GCS backend uses streaming - memory should be constant
2. If memory issues persist, check:
- Orbit version (ensure v0.6.0+)
- Other processes consuming memory
- File descriptor limits (ulimit -n)
| Feature | GCS | Azure Blob | S3 |
|---|---|---|---|
| Streaming Upload | ✅ Yes | ✅ Yes | ✅ Yes |
| Streaming Download | ✅ Yes | ✅ Yes | ✅ Yes |
| Max Object Size | 5TB | 4.75TB | 5TB |
| Consistency | Strong | Strong | Strong |
| Prefix Support | ✅ Yes | ✅ Yes | ✅ Yes |
| Versioning | ✅ Yes | ✅ Yes | ✅ Yes |
| Lifecycle Policies | ✅ Yes | ✅ Yes | ✅ Yes |
| Implementation | object_store | object_store | aws-sdk-s3 |
| Code Size | ~620 lines | ~540 lines | ~800 lines |
All three backends share the same async Backend trait, making it easy to switch between cloud providers.
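Because the trait is shared, transfer logic can be written once and reused with any provider. A minimal sketch, assuming the write signature used throughout this guide and that backend errors convert into a boxed error:

```rust
use orbit::backend::{Backend, WriteOptions};
use std::path::Path;
use tokio::{fs::File, io::BufReader};

// Works with S3Backend, GcsBackend, or the Azure backend -- anything implementing Backend.
async fn upload_with_any_backend<B: Backend>(
    backend: &B,
    local: &str,
    remote: &Path,
) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(local).await?;
    backend
        .write(remote, Box::new(BufReader::new(file)), None, WriteOptions::default())
        .await?; // assumes the backend error converts into a boxed error
    Ok(())
}
```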
```bash
# Download from S3
orbit --source s3://my-s3-bucket/data --dest /tmp/data --recursive

# Upload to GCS
orbit --source /tmp/data --dest gs://my-gcs-bucket/data --recursive
```

Or use streaming transfer (no local disk):
```rust
use orbit::backend::{Backend, S3Backend, GcsBackend};
use futures::StreamExt;

// Read from S3, write to GCS
let s3 = S3Backend::new(s3_config).await?;
let gcs = GcsBackend::new("gcs-bucket").await?;

let mut stream = s3.read(Path::new("data.txt")).await?;
// Stream data directly to GCS without buffering
```
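One way to finish the pipe, continuing the snippet above, is to adapt the download stream into an AsyncRead and hand it to the destination backend's write. A sketch assuming the chunks are bytes::Bytes, that the tokio-util crate is available, and that the error types line up as shown; the exact reader type Orbit expects may differ:

```rust
use std::io;
use futures::TryStreamExt;
use orbit::backend::WriteOptions;
use tokio_util::io::StreamReader;

// Adapt the chunk stream into an AsyncRead (StreamReader wants io::Error items).
let body = stream.map_err(|e| io::Error::new(io::ErrorKind::Other, e.to_string()));
let reader = StreamReader::new(body);

// Stream straight into GCS without touching local disk.
gcs.write(
    Path::new("data.txt"),
    Box::new(reader),
    None,
    WriteOptions::default(),
).await?;
```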
```bash
# gsutil
gsutil cp /local/file.txt gs://mybucket/file.txt
gsutil cp -r /local/dir gs://mybucket/dir

# Orbit equivalent
orbit --source /local/file.txt --dest gs://mybucket/file.txt
orbit --source /local/dir --dest gs://mybucket/dir --recursive

# gsutil sync
gsutil rsync -r /local/dir gs://mybucket/dir

# Orbit equivalent (with resume support)
orbit --source /local/dir --dest gs://mybucket/dir --mode sync --recursive --resume
```

```bash
#!/bin/bash
set -e
# Configuration
SOURCE_DIR="/data/production"
BUCKET="my-backups"
PREFIX="production/$(date +%Y/%m/%d)"
CREDENTIALS="/etc/orbit/gcs-credentials.json"
# Set authentication
export GOOGLE_APPLICATION_CREDENTIALS="$CREDENTIALS"
# Perform backup with resume support
orbit \
--source "$SOURCE_DIR" \
--dest "gs://$BUCKET/$PREFIX" \
--mode sync \
--recursive \
--resume \
--parallel 8 \
--progress \
--exclude "*.tmp" \
--exclude "*.log" \
--telemetry-output "/var/log/orbit/backup-$(date +%Y%m%d).json"
echo "Backup completed successfully!"#!/bin/bash
set -e
# Download production backup
BACKUP_DATE="2025/12/19"
RESTORE_DIR="/mnt/restore"
orbit \
--source "gs://my-backups/production/$BACKUP_DATE" \
--dest "$RESTORE_DIR" \
--recursive \
--resume \
--parallel 16 \
--progress
echo "Restore completed to $RESTORE_DIR"#!/bin/bash
# Watch directory and sync changes to GCS
SOURCE="/data/documents"
DEST="gs://my-bucket/documents"
while true; do
orbit \
--source "$SOURCE" \
--dest "$DEST" \
--mode sync \
--recursive \
--delta \
--telemetry-output "/var/log/orbit/sync.json"
sleep 300 # Sync every 5 minutes
done
```

| File Size | Upload | Download | Memory Usage |
|---|---|---|---|
| 1MB | 2 MB/s | 5 MB/s | ~50MB |
| 100MB | 50 MB/s | 80 MB/s | ~100MB |
| 1GB | 100 MB/s | 120 MB/s | ~150MB |
| 10GB | 110 MB/s | 130 MB/s | ~200MB |
| 100GB | 115 MB/s | 135 MB/s | ~200MB |
Note: Actual speeds depend on network bandwidth, GCS region, and machine specs
| Object Count | List Time | Memory Usage |
|---|---|---|
| 1,000 | 0.5s | ~10MB |
| 10,000 | 2.0s | ~10MB |
| 100,000 | 15s | ~10MB |
| 1,000,000 | 150s | ~10MB |
Streaming list API maintains constant memory usage
- Google Cloud Storage Documentation
- Service Account Authentication
- GCS IAM Permissions
- object_store crate
- Backend Streaming Guide - Detailed streaming API examples
- S3 User Guide - Similar cloud storage guide for S3
- Backend Guide - General backend abstraction documentation
- Performance Guide - Performance tuning and optimization
- GitHub Issues: orbit/issues
- Discussions: orbit/discussions
- Documentation: orbit/docs
- Initial GCS backend implementation
- Full Backend trait support
- Streaming I/O for all operations
- Service account authentication
- Both gs:// and gcs:// URI schemes
- Production-ready using object_store crate
This guide is part of the Orbit project and follows the same license terms.