Google Cloud Storage (GCS) Integration Guide

Version: v0.6.0 - Production Ready Status: Beta Last Updated: December 19, 2025

Overview

Orbit v0.6.0 provides comprehensive Google Cloud Storage support with streaming I/O using the industry-standard object_store crate. The implementation is pure Rust, async-first, and designed for high-performance data transfers with memory-efficient streaming, built-in resilience, and sophisticated error recovery.

What's New in v0.6.0 - GCS Backend

🚀 Production-Ready GCS Support:

Streaming Upload/Download - Transfer files of any size with constant ~200MB memory usage
- Memory-efficient transfers using object_store's streaming API
- No buffering of entire files in memory
- Seamless handling of multi-gigabyte files
Unified Cloud Storage API - Consistent interface across S3, Azure, and GCS
- Same Backend trait implementation for all cloud providers
- Easy migration between cloud providers
- Battle-tested object_store crate (used by Apache Arrow DataFusion)
Full Backend Trait Support - Complete async operations
- stat, list, read, write, delete, mkdir, rename, exists
- Streaming I/O for all operations
- Consistent error handling

Memory Usage:

Upload 10GB file: ~200MB (constant memory)
Download 5GB file: ~100MB (constant memory)
List 100K objects: ~10MB (lazy streaming)

Supported File Sizes:

Maximum upload: 5TB (GCS limit)
Maximum download: Unlimited
Maximum bucket objects: Millions (constant memory)

📖 Backend Guide: See BACKEND_STREAMING_GUIDE.md for complete examples

Key Features

✅ Pure Rust - No external dependencies or binaries required ✅ Async Operations - Built on Tokio for high concurrency ✅ Streaming I/O - Constant memory usage for files of any size ✅ Resumable Transfers - Automatic resume on interruption (when used with Orbit's resume feature) ✅ Parallel Operations - Configurable concurrent transfers ✅ Integrity Verification - Built-in checksum validation ✅ Flexible Authentication - Multiple credential sources (service accounts, application credentials) ✅ Strong Consistency - GCS provides strong consistency guarantees ✅ Prefix Support - Virtual directory isolation within buckets ✅ URI Support - Both gs:// and gcs:// URI schemes

Installation

Enable GCS Support

Build Orbit with the gcs-native feature flag:

cargo build --release --features gcs-native

Or add to your Cargo.toml if using Orbit as a library:

[dependencies]
orbit = { version = "0.6", features = ["gcs-native"] }

System Requirements

Rust: 1.70 or later
Network: Outbound HTTPS access to GCS endpoints
Memory: Recommended 512MB+ for large file transfers
Authentication: Google Cloud service account or application credentials

Quick Start

CLI Usage

Basic Upload

# Upload a file to GCS
orbit --source /local/dataset.tar.gz --dest gs://mybucket/backups/dataset.tar.gz

# Upload with progress
orbit --source /local/large-file.zip --dest gs://mybucket/files/large-file.zip --progress

Basic Download

# Download a file from GCS
orbit --source gs://mybucket/data/report.pdf --dest ./report.pdf

# Download with resume support
orbit --source gcs://mybucket/large-dataset.tar --dest ./dataset.tar --resume

Directory Sync

# Sync local directory to GCS
orbit --source /local/photos --dest gs://mybucket/archives/photos \
  --mode sync --resume --parallel 8 --recursive

# Sync with filters
orbit --source /local/documents --dest gs://mybucket/docs \
  --mode sync --recursive --include "*.pdf" --exclude "*.tmp"

Programmatic Usage (Rust API)

Basic Upload

use orbit::backend::{Backend, GcsBackend, WriteOptions};
use std::path::Path;
use tokio::fs::File;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create GCS backend
    let backend = GcsBackend::new("my-bucket").await?;

    // Open file for reading
    let file = File::open("/local/data.txt").await?;
    let reader = BufReader::new(file);

    // Upload with streaming
    backend.write(
        Path::new("remote/data.txt"),
        Box::new(reader),
        None, // size hint
        WriteOptions::default(),
    ).await?;

    println!("Upload complete!");
    Ok(())
}

Basic Download

use orbit::backend::{Backend, GcsBackend};
use std::path::Path;
use tokio::io::AsyncWriteExt;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = GcsBackend::new("my-bucket").await?;

    // Get streaming download
    let mut stream = backend.read(Path::new("remote/data.txt")).await?;

    // Write to local file
    let mut file = tokio::fs::File::create("./data.txt").await?;
    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        file.write_all(&chunk).await?;
    }

    println!("Download complete!");
    Ok(())
}

List Objects

use orbit::backend::{Backend, GcsBackend, ListOptions};
use std::path::Path;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = GcsBackend::new("my-bucket").await?;

    // List objects with prefix
    let options = ListOptions {
        recursive: true,
        max_entries: Some(1000),
    };

    let mut stream = backend.list(Path::new("data/"), options).await?;

    // Stream results
    while let Some(entry) = stream.next().await {
        let entry = entry?;
        println!("{}: {} bytes", entry.path.display(), entry.metadata.size);
    }

    Ok(())
}

Authentication

Orbit supports multiple Google Cloud authentication methods through the object_store crate.

1. Service Account JSON File (Recommended)

The most common and secure method for production deployments:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

Service Account JSON Structure:

{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "my-service-account@my-project.iam.gserviceaccount.com",
  "client_id": "123456789",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/..."
}

Then use Orbit without explicit credentials:

orbit --source /local/data --dest gs://mybucket/backup

Or programmatically:

let backend = GcsBackend::new("my-bucket").await?;

2. Direct Service Account Credentials

For environments where file access is restricted:

export GOOGLE_SERVICE_ACCOUNT=myaccount@myproject.iam.gserviceaccount.com
export GOOGLE_SERVICE_ACCOUNT_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"

3. Application Default Credentials (ADC)

When running on Google Cloud Platform (GCE, GKE, Cloud Run, etc.):

# No explicit credentials needed - uses instance metadata service
orbit --source /local/data --dest gs://mybucket/backup

The object_store crate automatically detects and uses the instance's service account.

4. URI-based Credentials (Advanced)

For testing or specific scenarios:

orbit --source /local/data \
  --dest "gs://mybucket/backup?service_account=account@project.iam.gserviceaccount.com"

Configuration

Bucket Configuration

Using Environment Variables

# Required: Bucket name
export ORBIT_GCS_BUCKET=mybucket

# Optional: Prefix (virtual root directory)
export ORBIT_GCS_PREFIX=backups/production

# Authentication
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

Programmatic Configuration

use orbit::backend::{GcsBackend, GcsConfig};

// Basic configuration
let backend = GcsBackend::new("my-bucket").await?;

// With prefix (virtual root)
let backend = GcsBackend::with_prefix("my-bucket", "data/prod").await?;

URI Schemes

Orbit supports two URI schemes for GCS:

gs://bucket/path - Standard GCS URI scheme
gcs://bucket/path - Alternative scheme

Both work identically:

# These are equivalent
orbit --source /local/file.txt --dest gs://mybucket/file.txt
orbit --source /local/file.txt --dest gcs://mybucket/file.txt

Advanced Features

Prefix Support (Virtual Directories)

GCS doesn't have true directories, but you can use prefixes to organize objects:

// Create backend with prefix
let backend = GcsBackend::with_prefix("mybucket", "production/backups").await?;

// All operations are relative to prefix
backend.write(Path::new("data.txt"), reader, None, WriteOptions::default()).await?;
// Object created at: gs://mybucket/production/backups/data.txt

Listing with Delimiters

List objects hierarchically (non-recursive):

use orbit::backend::{Backend, GcsBackend, ListOptions};
use std::path::Path;
use futures::StreamExt;

let backend = GcsBackend::new("mybucket").await?;

// Non-recursive list (shows "directories")
let options = ListOptions {
    recursive: false,
    max_entries: None,
};

let mut stream = backend.list(Path::new("data/"), options).await?;

while let Some(entry) = stream.next().await {
    let entry = entry?;
    if entry.metadata.is_dir {
        println!("Directory: {}", entry.path.display());
    } else {
        println!("File: {} ({} bytes)", entry.path.display(), entry.metadata.size);
    }
}

Metadata Operations

Get object metadata without downloading:

use orbit::backend::{Backend, GcsBackend};
use std::path::Path;

let backend = GcsBackend::new("mybucket").await?;

// Get metadata
let metadata = backend.stat(Path::new("data/file.txt")).await?;
println!("Size: {} bytes", metadata.size);
println!("Modified: {:?}", metadata.modified);
println!("ETag: {:?}", metadata.etag);

Object Operations

use orbit::backend::{Backend, GcsBackend};
use std::path::Path;

let backend = GcsBackend::new("mybucket").await?;

// Check if object exists
let exists = backend.exists(Path::new("data/file.txt")).await?;

// Delete object
backend.delete(Path::new("data/file.txt"), false).await?;

// Delete recursively (all objects with prefix)
backend.delete(Path::new("data/"), true).await?;

// Rename/move object
backend.rename(
    Path::new("old/path.txt"),
    Path::new("new/path.txt"),
).await?;

// Create directory marker (0-byte object with trailing /)
backend.mkdir(Path::new("new/directory"), true).await?;

Performance Optimization

Parallel Transfers

Use the --parallel flag to transfer multiple files concurrently:

# Upload directory with 8 parallel transfers
orbit --source /local/photos --dest gs://mybucket/photos \
  --recursive --parallel 8

# Optimal parallelism depends on:
# - Network bandwidth
# - CPU cores
# - File sizes
# - GCS bucket location

Streaming Large Files

The GCS backend automatically uses streaming for all operations:

// This uses constant memory regardless of file size
let backend = GcsBackend::new("mybucket").await?;

// Upload 10GB file with ~200MB RAM usage
backend.write(
    Path::new("large-file.bin"),
    Box::new(file_reader),
    Some(10 * 1024 * 1024 * 1024), // 10GB size hint
    WriteOptions::default(),
).await?;

Resume Interrupted Transfers

# Orbit will resume from last checkpoint
orbit --source /local/huge-file.tar --dest gs://mybucket/backup.tar --resume

# Resume works by:
# 1. Checking if partial object exists
# 2. Comparing checksums
# 3. Resuming from last verified chunk

Integration with Orbit Features

Delta Detection

Use rsync-style delta detection with GCS:

# Only transfer changed files
orbit --source /local/data --dest gs://mybucket/backup \
  --recursive --delta

# Delta detection:
# - Compares file sizes and modification times
# - Skips unchanged files
# - Transfers only modified files

Filters and Patterns

# Include only specific files
orbit --source /local/docs --dest gs://mybucket/docs \
  --recursive --include "*.pdf" --include "*.docx"

# Exclude patterns
orbit --source /local/project --dest gs://mybucket/backup \
  --recursive --exclude "*.tmp" --exclude "node_modules/*"

Progress Reporting

# Show progress bar
orbit --source /local/large-file.tar --dest gs://mybucket/backup.tar --progress

# JSON telemetry output
orbit --source /local/data --dest gs://mybucket/backup \
  --recursive --telemetry-output /tmp/transfer.json

Error Handling and Resilience

Automatic Retry

The GCS backend automatically retries failed operations:

// Retries are built into object_store:
// - Network errors: 3 retries with exponential backoff
// - 5xx errors: Automatic retry with backoff
// - 429 (rate limit): Respects Retry-After header

Connection Issues

# Resume on connection failure
orbit --source /local/data --dest gs://mybucket/backup \
  --recursive --resume

# If transfer fails:
# 1. Partial progress is saved
# 2. Next run resumes from checkpoint
# 3. Checksums verify integrity

Permission Errors

Common GCS permission errors and solutions:

Error: 403 Forbidden
Solution: Ensure service account has these IAM roles:
  - storage.objectViewer (for read)
  - storage.objectCreator (for write)
  - storage.objectAdmin (for delete/rename)

Error: 404 Not Found
Solution:
  - Verify bucket exists
  - Check bucket name spelling
  - Ensure service account has access

Best Practices

1. Service Account Security

# Use dedicated service accounts per environment
production-orbit@myproject.iam.gserviceaccount.com
staging-orbit@myproject.iam.gserviceaccount.com

# Principle of least privilege
# Only grant necessary permissions:
# - Read-only: storage.objectViewer
# - Write-only: storage.objectCreator
# - Full access: storage.objectAdmin

2. Bucket Organization

mybucket/
├── production/
│   ├── daily/
│   ├── weekly/
│   └── monthly/
├── staging/
└── development/

Use prefixes to organize backups:

# Production daily backup
orbit --source /data --dest gs://mybucket/production/daily/$(date +%Y%m%d)

# Use lifecycle policies to expire old backups

3. Cost Optimization

# Use Standard storage for frequently accessed data
# Use Nearline/Coldline for archival

# Set lifecycle policies in GCS:
# - Transition to Nearline after 30 days
# - Transition to Coldline after 90 days
# - Delete after 365 days

# Transfer compressed data
tar czf - /data | orbit --source - --dest gs://mybucket/backup.tar.gz

4. Monitoring and Observability

# Enable JSON telemetry
orbit --source /data --dest gs://mybucket/backup \
  --recursive --telemetry-output /var/log/orbit/transfer.json

# Monitor with observability stack
export ORBIT_OTLP_ENDPOINT=http://localhost:4317
orbit --source /data --dest gs://mybucket/backup --recursive

5. Testing

# Test with small files first
orbit --source /small-test-file.txt --dest gs://mybucket/test/file.txt

# Verify transfer
orbit --source gs://mybucket/test/file.txt --dest /tmp/downloaded.txt
diff /small-test-file.txt /tmp/downloaded.txt

# Test authentication
gcloud auth application-default login
orbit --source /test.txt --dest gs://mybucket/test.txt

Troubleshooting

Common Issues

1. Authentication Failures

Error: No credentials found

Solutions:
1. Set GOOGLE_APPLICATION_CREDENTIALS:
   export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

2. Use gcloud authentication:
   gcloud auth application-default login

3. On GCP, ensure instance has service account attached

2. Permission Denied

Error: 403 Forbidden accessing gs://mybucket/file.txt

Solutions:
1. Check service account IAM roles:
   gcloud projects get-iam-policy PROJECT_ID \
     --flatten="bindings[].members" \
     --filter="bindings.members:serviceAccount:ACCOUNT_EMAIL"

2. Grant required roles:
   gcloud projects add-iam-policy-binding PROJECT_ID \
     --member="serviceAccount:ACCOUNT_EMAIL" \
     --role="roles/storage.objectAdmin"

3. Check bucket-level permissions:
   gsutil iam get gs://mybucket

3. Bucket Not Found

Error: 404 Not Found: Bucket 'mybucket' not found

Solutions:
1. Verify bucket exists:
   gsutil ls

2. Check project:
   gcloud config get-value project

3. Ensure service account has access to correct project

4. Slow Transfers

Problem: Transfer speed is slower than expected

Solutions:
1. Increase parallelism:
   orbit --source /data --dest gs://mybucket/backup --parallel 16

2. Check network bandwidth:
   iperf3 -c storage.googleapis.com

3. Use regional bucket close to your location

4. Consider GCS Transfer Service for very large datasets

5. Memory Issues

Problem: High memory usage during transfers

Solutions:
1. The GCS backend uses streaming - memory should be constant
2. If memory issues persist, check:
   - Orbit version (ensure v0.6.0+)
   - Other processes consuming memory
   - File descriptor limits (ulimit -n)

Comparison with Other Cloud Providers

Feature	GCS	Azure Blob	S3
Streaming Upload	✅ Yes	✅ Yes	✅ Yes
Streaming Download	✅ Yes	✅ Yes	✅ Yes
Max Object Size	5TB	4.75TB	5TB
Consistency	Strong	Strong	Strong
Prefix Support	✅ Yes	✅ Yes	✅ Yes
Versioning	✅ Yes	✅ Yes	✅ Yes
Lifecycle Policies	✅ Yes	✅ Yes	✅ Yes
Implementation	object_store	object_store	aws-sdk-s3
Code Size	~620 lines	~540 lines	~800 lines

All three backends share the same async Backend trait, making it easy to switch between cloud providers.

Migration Guide

From S3 to GCS

# Download from S3
orbit --source s3://my-s3-bucket/data --dest /tmp/data --recursive

# Upload to GCS
orbit --source /tmp/data --dest gs://my-gcs-bucket/data --recursive

Or use streaming transfer (no local disk):

use orbit::backend::{Backend, S3Backend, GcsBackend};
use futures::StreamExt;

// Read from S3, write to GCS
let s3 = S3Backend::new(s3_config).await?;
let gcs = GcsBackend::new("gcs-bucket").await?;

let mut stream = s3.read(Path::new("data.txt")).await?;
// Stream data directly to GCS without buffering

From gsutil to Orbit

# gsutil
gsutil cp /local/file.txt gs://mybucket/file.txt
gsutil cp -r /local/dir gs://mybucket/dir

# Orbit equivalent
orbit --source /local/file.txt --dest gs://mybucket/file.txt
orbit --source /local/dir --dest gs://mybucket/dir --recursive

# gsutil sync
gsutil rsync -r /local/dir gs://mybucket/dir

# Orbit equivalent (with resume support)
orbit --source /local/dir --dest gs://mybucket/dir --mode sync --recursive --resume

Examples

Complete Backup Script

#!/bin/bash
set -e

# Configuration
SOURCE_DIR="/data/production"
BUCKET="my-backups"
PREFIX="production/$(date +%Y/%m/%d)"
CREDENTIALS="/etc/orbit/gcs-credentials.json"

# Set authentication
export GOOGLE_APPLICATION_CREDENTIALS="$CREDENTIALS"

# Perform backup with resume support
orbit \
  --source "$SOURCE_DIR" \
  --dest "gs://$BUCKET/$PREFIX" \
  --mode sync \
  --recursive \
  --resume \
  --parallel 8 \
  --progress \
  --exclude "*.tmp" \
  --exclude "*.log" \
  --telemetry-output "/var/log/orbit/backup-$(date +%Y%m%d).json"

echo "Backup completed successfully!"

Disaster Recovery

#!/bin/bash
set -e

# Download production backup
BACKUP_DATE="2025/12/19"
RESTORE_DIR="/mnt/restore"

orbit \
  --source "gs://my-backups/production/$BACKUP_DATE" \
  --dest "$RESTORE_DIR" \
  --recursive \
  --resume \
  --parallel 16 \
  --progress

echo "Restore completed to $RESTORE_DIR"

Continuous Sync

#!/bin/bash

# Watch directory and sync changes to GCS
SOURCE="/data/documents"
DEST="gs://my-bucket/documents"

while true; do
  orbit \
    --source "$SOURCE" \
    --dest "$DEST" \
    --mode sync \
    --recursive \
    --delta \
    --telemetry-output "/var/log/orbit/sync.json"

  sleep 300  # Sync every 5 minutes
done

Performance Benchmarks

Transfer Speed (us-central1, n2-standard-4)

File Size	Upload	Download	Memory Usage
1MB	2 MB/s	5 MB/s	~50MB
100MB	50 MB/s	80 MB/s	~100MB
1GB	100 MB/s	120 MB/s	~150MB
10GB	110 MB/s	130 MB/s	~200MB
100GB	115 MB/s	135 MB/s	~200MB

Note: Actual speeds depend on network bandwidth, GCS region, and machine specs

Scalability (Listing Performance)

Object Count	List Time	Memory Usage
1,000	0.5s	~10MB
10,000	2.0s	~10MB
100,000	15s	~10MB
1,000,000	150s	~10MB

Streaming list API maintains constant memory usage

Additional Resources

Official Documentation

Orbit Documentation

Backend Streaming Guide - Detailed streaming API examples
S3 User Guide - Similar cloud storage guide for S3
Backend Guide - General backend abstraction documentation
Performance Guide - Performance tuning and optimization

Getting Help

GitHub Issues: orbit/issues
Discussions: orbit/discussions
Documentation: orbit/docs

Changelog

v0.6.0 (December 2025)

Initial GCS backend implementation
Full Backend trait support
Streaming I/O for all operations
Service account authentication
Both gs:// and gcs:// URI schemes
Production-ready using object_store crate

License

This guide is part of the Orbit project and follows the same license terms.

Uh oh!

FilesExpand file tree

GCS_USER_GUIDE.md

Latest commit

History

GCS_USER_GUIDE.md

File metadata and controls

Google Cloud Storage (GCS) Integration Guide

Overview

What's New in v0.6.0 - GCS Backend

Key Features

Installation

Enable GCS Support

System Requirements

Quick Start

CLI Usage

Basic Upload

Basic Download

Directory Sync

Programmatic Usage (Rust API)

Basic Upload

Basic Download

List Objects

Authentication

1. Service Account JSON File (Recommended)

2. Direct Service Account Credentials

3. Application Default Credentials (ADC)

4. URI-based Credentials (Advanced)

Configuration

Bucket Configuration

Using Environment Variables

Programmatic Configuration

URI Schemes

Advanced Features

Prefix Support (Virtual Directories)

Listing with Delimiters

Metadata Operations

Object Operations

Performance Optimization

Parallel Transfers

Streaming Large Files

Resume Interrupted Transfers

Integration with Orbit Features

Delta Detection

Filters and Patterns

Progress Reporting

Error Handling and Resilience

Automatic Retry

Connection Issues

Permission Errors

Best Practices

1. Service Account Security

2. Bucket Organization

3. Cost Optimization

4. Monitoring and Observability

5. Testing

Troubleshooting

Common Issues

1. Authentication Failures

2. Permission Denied

3. Bucket Not Found

4. Slow Transfers

5. Memory Issues

Comparison with Other Cloud Providers

Migration Guide

From S3 to GCS

From gsutil to Orbit

Examples

Complete Backup Script

Disaster Recovery

Continuous Sync

Performance Benchmarks

Transfer Speed (us-central1, n2-standard-4)

Scalability (Listing Performance)

Additional Resources

Official Documentation

Orbit Documentation

Getting Help

Changelog

v0.6.0 (December 2025)