This repository contains a single AWS CloudFormation template that deploys a distributed, scalable architecture for migrating data from any S3-compatible storage provider to Amazon S3 using rclone. It is designed to handle petabyte-scale migrations from providers such as IBM Cloud Object Storage, Google Cloud Storage, and Azure Blob Storage.
Single-instance migration approaches often suffer from lack of visibility, frequent failures, source-side throttling, and performance saturation. This solution addresses those challenges with a fan-out architecture that distributes work across multiple parallel workers, provides granular progress tracking, and automatically retries failed transfers.
This sample was published alongside an AWS Storage Blog post.
Disclaimer: This is sample code intended for educational and demonstration purposes. Review and test thoroughly before using in a production environment.
Figure 1: Distributed cross-cloud migration architecture showing the three layers — Discovery (ECS Fargate lister), Queueing (Amazon SQS), and Execution (EC2 Auto Scaling workers with rclone).
The CloudFormation template deploys the following components:
| Component | Resources | Purpose |
|---|---|---|
| Networking | VPC, 3 public subnets (3 AZs), Internet Gateway, route tables, security group | Outbound internet access for cross-cloud transfers |
| Message queue | SQS main queue + dead-letter queue (both SSE-encrypted) | Fan-out work distribution between lister and workers |
| Credentials | AWS Secrets Manager (3 secrets) | Secure storage for source endpoint credentials |
| Logging | Amazon CloudWatch Log Groups (encrypted with AWS KMS, 7-day retention) | Centralized logging for lister and worker components |
| IAM | 3 least-privilege roles (ECS execution, ECS task, EC2 worker) | Scoped permissions for each component |
| Lister | ECS Fargate task (python:3.13-slim) | Enumerates source objects, batches 20 keys per SQS message |
| Workers | Amazon EC2 Auto Scaling group (r5n.xlarge, 0–5 instances, 6 rclone processes each) | Copies objects from source to Amazon S3 using rclone |
- You run an `aws ecs run-task` command (provided as a stack output) specifying source and destination buckets.
- The Lister Fargate task connects to the source endpoint using credentials from AWS Secrets Manager, lists all objects, and sends batches of 20 keys as messages to the SQS queue.
- EC2 workers poll the queue and run `rclone copyto` for each object key. On success, the message is deleted. On failure, the message visibility is reset for immediate retry.
- After 2 failed attempts, messages move to the dead-letter queue for investigation.
- The Amazon EC2 Auto Scaling group adjusts worker count based on queue depth using a target tracking scaling policy.
- Workers protect themselves from scale-in termination while actively processing a message.
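The batching step in the list above can be illustrated with a minimal Python sketch. It only shows the idea of grouping listed keys into messages of 20; the JSON field names (`source_bucket`, `dest_bucket`, `keys`) and the helper names are assumptions made for illustration, not the message format the template actually uses.

```python
import json
from itertools import islice
from typing import Iterable, Iterator

BATCH_SIZE = 20  # keys per SQS message, as described above


def batch_keys(keys: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` keys."""
    it = iter(keys)
    while batch := list(islice(it, size)):
        yield batch


def build_message_body(source_bucket: str, dest_bucket: str, keys: list[str]) -> str:
    """Serialize one batch as JSON (hypothetical field names)."""
    return json.dumps(
        {"source_bucket": source_bucket, "dest_bucket": dest_bucket, "keys": keys}
    )


# 45 listed keys become 3 messages: 20 + 20 + 5 keys.
keys = [f"data/object-{i:04d}.bin" for i in range(45)]
bodies = [
    build_message_body("my-source-bucket", "my-dest-bucket", batch)
    for batch in batch_keys(keys)
]
print(len(bodies))
```

Batching keeps the number of SQS messages (and API calls) roughly 20 times lower than one message per object, while each message stays small enough to retry cheaply.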
This architecture was tested with a 2.7 PB dataset migrated from IBM Cloud Object Storage, achieving 20–80 Gbps aggregate throughput. The migration completed in approximately 2 weeks at roughly $2,000 in compute costs. Results may vary based on file sizes, network conditions, and source provider throttling limits.
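As a rough sanity check on these figures, the sketch below converts dataset size and duration into average throughput. It assumes decimal petabytes (10^15 bytes) and exactly 14 days, both simplifications of the "approximately 2 weeks" stated above.

```python
def average_gbps(total_bytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second."""
    return total_bytes * 8 / seconds / 1e9


def days_at(total_bytes: float, gbps: float) -> float:
    """Days needed to move `total_bytes` at a sustained rate of `gbps`."""
    return total_bytes * 8 / (gbps * 1e9) / 86400


PB = 10**15                   # assumption: decimal petabytes
dataset_bytes = 2.7 * PB
two_weeks_s = 14 * 24 * 3600  # assumption: exactly 14 days

avg = average_gbps(dataset_bytes, two_weeks_s)
print(f"average over 14 days: {avg:.1f} Gbps")                               # about 17.9 Gbps
print(f"days at sustained 20 Gbps: {days_at(dataset_bytes, 20):.1f}")   # 12.5
print(f"days at sustained 80 Gbps: {days_at(dataset_bytes, 80):.1f}")   # 3.1
```

The wall-clock average includes any periods when workers were idle or throttled, so it can sit below the throughput observed while transfers were active.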
- An AWS account with permissions to create CloudFormation stacks, IAM roles, VPCs, ECS clusters, EC2 instances, SQS queues, and Secrets Manager secrets
- AWS CLI v2 installed and configured
- HMAC credentials (access key and secret key) for your S3-compatible source storage provider
- A destination Amazon S3 bucket configured with required security controls (must already exist — see SECURITY.md for bucket security prerequisites including Block Public Access, encryption at rest, and TLS enforcement)
```bash
aws cloudformation deploy \
  --template-file cross-cloud-s3-migration.yaml \
  --stack-name cross-cloud-migration \
  --capabilities CAPABILITY_IAM \
  --region <your-region>
```

Replace the placeholder values with your actual source storage credentials:

```bash
aws secretsmanager put-secret-value \
  --secret-id /migration/source_endpoint \
  --secret-string "https://s3.us-south.cloud-object-storage.appdomain.cloud" \
  --region <your-region>

aws secretsmanager put-secret-value \
  --secret-id /migration/source_access_key \
  --secret-string "<your-access-key>" \
  --region <your-region>

aws secretsmanager put-secret-value \
  --secret-id /migration/source_secret_key \
  --secret-string "<your-secret-key>" \
  --region <your-region>
```

The stack outputs a ready-to-use CLI command for starting migration jobs:

```bash
aws cloudformation describe-stacks \
  --stack-name cross-cloud-migration \
  --query 'Stacks[0].Outputs[?OutputKey==`RunTaskCommand`].OutputValue' \
  --output text \
  --region <your-region>
```

Replace YOUR-SOURCE-BUCKET and YOUR-DEST-BUCKET in the output command with your actual bucket names and run it. You can optionally set a PREFIX value to migrate only objects matching a specific key prefix.

```bash
aws ecs run-task \
  --cluster <cluster-from-output> \
  --task-definition <task-def-from-output> \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet-from-output>],securityGroups=[<sg-from-output>],assignPublicIp=ENABLED}" \
  --overrides '{"containerOverrides": [{"name": "lister", "environment": [{"name": "SOURCE_BUCKET", "value": "my-source-bucket"}, {"name": "DEST_BUCKET", "value": "my-dest-bucket"}, {"name": "QUEUE_URL", "value": "<queue-url-from-output>"}, {"name": "PREFIX", "value": ""}]}]}' \
  --region <your-region>
```

| What to monitor | Where to find it |
|---|---|
| Lister progress | CloudWatch Logs → /migration/lister |
| Worker activity | CloudWatch Logs → /migration/workers |
| Queue depth | SQS console → ApproximateNumberOfMessagesVisible |
| Failed messages | SQS console → dead-letter queue |
| Worker scaling | EC2 Auto Scaling console → activity history |
The lister logs each S3 listing page, every SQS message sent, and a final summary with total keys and messages. Workers log each object copy operation with progress indicators.
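Queue depth can also be read programmatically. The sketch below is one possible approach using the SQS `get_queue_attributes` call through a boto3-style client. Note that the SQS API attribute is named `ApproximateNumberOfMessages`, while `ApproximateNumberOfMessagesVisible` is the corresponding CloudWatch metric name. The function name and the "keys remaining" estimate are illustrative assumptions; the estimate is an upper bound because the final batch may hold fewer than 20 keys.

```python
from typing import Any

KEYS_PER_MESSAGE = 20  # batch size used by the lister


def queue_backlog(sqs_client: Any, queue_url: str) -> dict[str, int]:
    """Return waiting and in-flight message counts plus a rough key estimate."""
    resp = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # waiting to be received
            "ApproximateNumberOfMessagesNotVisible",  # currently held by workers
        ],
    )
    attrs = resp["Attributes"]  # SQS returns attribute values as strings
    waiting = int(attrs["ApproximateNumberOfMessages"])
    in_flight = int(attrs["ApproximateNumberOfMessagesNotVisible"])
    return {
        "waiting": waiting,
        "in_flight": in_flight,
        "max_keys_remaining": (waiting + in_flight) * KEYS_PER_MESSAGE,
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(queue_backlog(boto3.client("sqs"), "<queue-url-from-output>"))
```

Both counts are approximate by design, so treat the result as a progress indicator rather than an exact figure.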
The primary cost drivers for this solution are:
- EC2 instances: r5n.xlarge instances ($0.298/hr in us-east-1). The Amazon EC2 Auto Scaling group scales from 0 to 5 instances based on queue depth, so costs scale with workload.
- Data transfer in: AWS does not charge for inbound data transfer from the internet. Data flowing from the source provider to EC2 workers via the Internet Gateway incurs no AWS charges.
- Data transfer to S3: Transfers from EC2 to S3 within the same Region are free.
- ECS Fargate: The lister task runs briefly (minutes) and costs are minimal.
- SQS, Secrets Manager, CloudWatch: Costs are negligible for typical migration workloads.
This solution deliberately avoids NAT Gateways. NAT Gateways charge $0.045/GB for data processing, which would add significant cost for large-scale migrations. Instead, workers use public subnets with direct Internet Gateway access (no per-GB charge).
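To put the rates quoted above in perspective, here is a small worked calculation. It assumes decimal units (1 PB = 1,000,000 GB), uses the 2.7 PB test dataset as the example size, and covers only the NAT data processing charge and EC2 on-demand rate; it is an illustration, not a billing estimate.

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # NAT Gateway data processing rate quoted above
R5N_XLARGE_USD_PER_HOUR = 0.298    # us-east-1 on-demand rate quoted above
MAX_INSTANCES = 5                  # Auto Scaling group maximum


def nat_processing_cost(total_gb: float) -> float:
    """Data processing charge if all traffic went through a NAT Gateway."""
    return total_gb * NAT_PROCESSING_USD_PER_GB


def max_hourly_worker_cost(instances: int = MAX_INSTANCES) -> float:
    """EC2 cost per hour with the worker fleet fully scaled out."""
    return instances * R5N_XLARGE_USD_PER_HOUR


dataset_gb = 2.7e6  # assumption: 2.7 PB expressed in decimal GB
print(f"NAT processing avoided: ${nat_processing_cost(dataset_gb):,.0f}")  # $121,500
print(f"Workers at full scale:  ${max_hourly_worker_cost():.2f}/hour")     # $1.49/hour
```

At this scale, the per-GB NAT charge would far exceed the compute cost, which is the reasoning behind using public subnets.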
Delete the CloudFormation stack to remove all resources:

```bash
aws cloudformation delete-stack \
  --stack-name cross-cloud-migration \
  --region <your-region>
```

This removes all resources created by the template. Your source and destination buckets are not affected.
- IAM least privilege: Each component (ECS task, EC2 worker) has a dedicated IAM role scoped to the minimum required permissions.
- Encryption at rest: SQS queues and CloudWatch Log Groups are encrypted with customer-managed AWS KMS keys (automatic annual key material rotation enabled). Source credentials are encrypted in AWS Secrets Manager.
- Encryption in transit: rclone uses HTTPS by default for all S3-compatible endpoints.
- No inbound traffic: The security group allows all outbound traffic but no inbound traffic.
- Secrets management: Source credentials are stored in AWS Secrets Manager, which encrypts them at rest by default.
- IMDSv2: Enforced on EC2 instances via Launch Template MetadataOptions (HttpTokens: required).
- Input validation: Worker validates SQS message structure and uses regex allowlists for bucket names and object keys.
- Network isolation: The VPC is purpose-built for migration workloads with no shared resources.
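The input validation item above can be sketched as follows. This is not the worker's actual code: the message field names and both regular expressions are assumptions chosen for illustration. The bucket pattern follows the general S3 naming rules (3 to 63 characters, lowercase letters, digits, dots, and hyphens), and the key allowlist is deliberately conservative.

```python
import json
import re

# Hypothetical allowlists; the template's real patterns may differ.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
KEY_RE = re.compile(r"[A-Za-z0-9!_.*'()/ -]{1,1024}")


def parse_message(body: str) -> dict:
    """Parse and validate one SQS message body, raising ValueError if invalid."""
    try:
        msg = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("message body is not valid JSON") from exc
    if not isinstance(msg, dict):
        raise ValueError("message body must be a JSON object")

    for field in ("source_bucket", "dest_bucket"):
        value = msg.get(field)
        if not isinstance(value, str) or not BUCKET_RE.fullmatch(value):
            raise ValueError(f"invalid or missing {field}")

    keys = msg.get("keys")
    if not isinstance(keys, list) or not keys:
        raise ValueError("keys must be a non-empty list")
    for key in keys:
        if not isinstance(key, str) or not KEY_RE.fullmatch(key):
            raise ValueError(f"object key rejected by allowlist: {key!r}")
    return msg


good = '{"source_bucket": "my-source-bucket", "dest_bucket": "my-dest-bucket", "keys": ["data/a.bin"]}'
print(parse_message(good)["keys"])
```

Validating before any key reaches a command line matters because the keys come from an external listing; an allowlist rejects unexpected characters instead of trying to escape them. A stricter or looser key pattern may be appropriate depending on the object names in the source bucket.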
For detailed security documentation, see:
- SECURITY.md — Shared responsibility model, data classification, risk assessment, AWS KMS key management, and access logging guidance
- DESIGN.md — Architecture design decisions and security trade-offs
See CONTRIBUTING for information on how to contribute to this project.
This library is licensed under the MIT-0 License. See the LICENSE file.
