FischbachLab/nf-transfer

AWS S3 Cross-Account Transfer Pipeline

A Nextflow pipeline designed to efficiently and reliably transfer massive datasets (e.g., 35TB+) between AWS S3 buckets across different AWS accounts.

By leveraging Nextflow, this pipeline parallelizes AWS CLI s3 sync operations, automatically handles retries for transient network/API failures, and allows you to cleanly resume interrupted transfers without starting from scratch.
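
The retry behaviour described above can be pictured with a small shell sketch. This is not the pipeline's actual code: retry_with_backoff is a hypothetical helper, and the attempt count and delay are illustrative assumptions. Re-running is safe because aws s3 sync only copies objects that are missing or changed at the destination.

```shell
#!/usr/bin/env bash
# Sketch only: retry a command with exponential backoff.
# retry_with_backoff is a hypothetical helper, not part of nf-transfer.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # double the wait before the next try
  done
}

# Example (bucket names are placeholders):
# retry_with_backoff 3 5 aws s3 sync s3://source-bucket/batch-1/ s3://dest-bucket/batch-1/
```

In the pipeline itself, Nextflow takes care of retries and resumption; the sketch only illustrates the idea for a single sync command.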

📋 Prerequisites

  1. Nextflow: version 22.0 or later.
  2. AWS CLI: Ensure the AWS Command Line Interface is installed (aws --version).
  3. AWS Credentials: You must be authenticated locally (e.g., aws configure or via SSO) with an IAM entity that has appropriate permissions.
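
A quick way to confirm the prerequisites above is a small check like the following. It is a minimal sketch: check_tool is a hypothetical helper, not something shipped with this repository.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a required command is on PATH.
# check_tool is a hypothetical helper, not part of nf-transfer.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example:
# check_tool nextflow && check_tool aws
# aws sts get-caller-identity   # shows which IAM identity the active credentials resolve to
```

Running aws sts get-caller-identity before a large transfer helps confirm that the active credentials belong to the account you expect.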

⚠️ CRITICAL: Cross-Account Permissions

To successfully copy data from Account A (Source) to Account B (Destination), your active AWS credentials must have:

  • Read access (s3:GetObject, s3:ListBucket) to the source buckets.
  • Write access (s3:PutObject) to the destination buckets.

Important Note on Object Ownership: This pipeline automatically applies the --acl bucket-owner-full-control flag. Without this flag, objects written to Account B's bucket can remain "owned" by Account A, making them unreadable by Account B. Ensure the destination bucket has ACLs enabled, or that its bucket policy allows s3:PutObjectAcl for the writing credentials.
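
To show where the flag sits in a sync command, here is a minimal sketch for a single source/destination pair. sync_pair and the DRY_RUN variable are hypothetical names introduced for illustration; the pipeline's real invocation may pass additional options.

```shell
#!/usr/bin/env bash
# Sketch only: sync one source/destination pair with the ownership ACL applied.
# sync_pair and DRY_RUN are hypothetical, not part of nf-transfer.
# Usage: sync_pair SOURCE_URI DESTINATION_URI
sync_pair() {
  # Rebuild the positional parameters as the full command line.
  set -- aws s3 sync "$1" "$2" --acl bucket-owner-full-control
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command for review instead of running it
  else
    "$@"
  fi
}

# Example (bucket names are placeholders):
# DRY_RUN=1 sync_pair s3://source-bucket/batch-1/ s3://dest-bucket/batch-1/
```

Printing the command first is a cheap way to review the source, destination, and ACL flag before any data moves.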

⚙️ Configuration

1. The Input File (buckets.csv)

Define your transfers in a file named buckets.csv in the root directory. Each line has the form source,destination, with no spaces and no header row.

s3://source-account-bucket-1/,s3://destination-account-bucket-1/
s3://source-bucket/batch-1/,s3://dest-bucket/batch-1/
s3://source-bucket/batch-2/,s3://dest-bucket/batch-2/
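
Because a stray space or header row is easy to miss, the file can be checked before submitting a job. The following is a sketch under stated assumptions: validate_buckets_csv is a hypothetical helper, blank lines are skipped, and the pattern only checks the overall shape of each line, not whether the buckets exist.

```shell
#!/usr/bin/env bash
# Sketch only: check that every non-empty line looks like s3://...,s3://...
# validate_buckets_csv is a hypothetical helper, not part of nf-transfer.
# Usage: validate_buckets_csv PATH_TO_CSV
validate_buckets_csv() {
  local file="$1"
  local status=0
  local line_no=0
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line_no=$((line_no + 1))
    if [ -z "$line" ]; then
      continue   # assumption: blank lines are ignored
    fi
    # Exactly two s3:// URIs, one comma, no whitespace anywhere.
    if ! printf '%s\n' "$line" | grep -Eq '^s3://[^,[:space:]]+,s3://[^,[:space:]]+$'; then
      echo "line ${line_no}: invalid entry: ${line}" >&2
      status=1
    fi
  done < "$file"
  return "$status"
}

# Example:
# validate_buckets_csv buckets.csv && echo "buckets.csv looks well-formed"
```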

2. The Batch Job Example

aws batch submit-job \
   --job-name nf-transfer \
   --job-queue priority-maf-pipelines \
   --job-definition nextflow-production \
   --container-overrides command="FischbachLab/nf-transfer","--buckets_list","s3://genomics-workflow-core/Results/transfer/buckets.csv"
