A Nextflow pipeline designed to efficiently and reliably transfer massive datasets (e.g., 35TB+) between AWS S3 buckets across different AWS accounts.
By leveraging Nextflow, this pipeline parallelizes AWS CLI `s3 sync` operations, automatically handles retries for transient network/API failures, and allows you to cleanly resume interrupted transfers without starting from scratch.
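Conceptually, each source/destination pair becomes one sync task along the lines of the sketch below. This is an illustration, not the pipeline's actual process block: the bucket names are placeholders, and the explicit loop stands in for the retry handling Nextflow performs for you.

```bash
# Conceptual equivalent of a single transfer task (placeholder bucket names).
# The real pipeline schedules many of these in parallel and relies on
# Nextflow to retry failed tasks; the loop below only illustrates retries.
SRC="s3://source-account-bucket-1/"
DST="s3://destination-account-bucket-1/"
for attempt in 1 2 3; do
  aws s3 sync "$SRC" "$DST" --acl bucket-owner-full-control && break
  echo "sync attempt $attempt failed; retrying" >&2
  sleep $((attempt * 30))
done
```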
Before running the pipeline, make sure you have:

- Nextflow (version 22.0 or later).
- AWS CLI: Ensure the AWS Command Line Interface is installed (`aws --version`).
- AWS Credentials: You must be authenticated locally (e.g., via `aws configure` or SSO) with an IAM entity that has appropriate permissions.
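A quick sanity check before launching a large transfer (the bucket name below is a placeholder):

```bash
# Confirm the required tools are installed and on PATH
nextflow -version
aws --version

# Confirm which IAM identity your current credentials resolve to
aws sts get-caller-identity

# Spot-check read access to a source bucket (placeholder name)
aws s3 ls s3://source-account-bucket-1/
```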
To successfully copy data from Account A (Source) to Account B (Destination), your active AWS credentials must have:
- Read access (`s3:GetObject`, `s3:ListBucket`) to the source buckets.
- Write access (`s3:PutObject`) to the destination buckets.
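For reference, a minimal identity policy covering one transfer pair might look like the sketch below. The bucket names and policy file name are placeholders, and `s3:PutObjectAcl` is included to support the ACL flag discussed next.

```bash
# Write a minimal policy document; attach it to the IAM identity that will
# run the transfer. All names below are placeholders, not pipeline defaults.
cat > transfer-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::source-account-bucket-1",
        "arn:aws:s3:::source-account-bucket-1/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:PutObjectAcl"],
      "Resource": ["arn:aws:s3:::destination-account-bucket-1/*"]
    }
  ]
}
EOF
```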
Important Note on Object Ownership: This pipeline automatically applies the `--acl bucket-owner-full-control` flag. Without this flag, files transferred to Account B would still be "owned" by Account A, making them unreadable to Account B. Ensure the destination bucket has ACLs enabled or bucket policies allowing `s3:PutObjectAcl`.
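If you are unsure how the destination bucket is configured, you can inspect its Object Ownership setting (bucket name is a placeholder; the call returns an error if ownership controls were never set on the bucket):

```bash
# "BucketOwnerEnforced" means ACLs are disabled and ownership is automatic;
# "BucketOwnerPreferred" / "ObjectWriter" mean the --acl flag matters.
aws s3api get-bucket-ownership-controls --bucket destination-account-bucket-1
```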
Define your transfers in a file named `buckets.csv` in the root directory. Format the file as `source,destination` with no spaces and no header row.
```
s3://source-account-bucket-1/,s3://destination-account-bucket-1/
s3://source-bucket/batch-1/,s3://dest-bucket/batch-1/
s3://source-bucket/batch-2/,s3://dest-bucket/batch-2/
```

Submit the pipeline as an AWS Batch job:

```bash
aws batch submit-job \
    --job-name nf-transfer \
    --job-queue priority-maf-pipelines \
    --job-definition nextflow-production \
    --container-overrides command="FischbachLab/nf-transfer, \
"--buckets_list", "s3://genomics-workflow-core/Results/transfer/buckets.csv" "
```