NRPCOPY

This is a collection of scripts for copying files from a T2 cluster (e.g. uaf-4.t2.ucsd.edu) to a PVC on the NRP Nautilus cluster.

The script runs on UAF/T2. It uses krsync, a thin wrapper that tunnels rsync over kubectl exec, to stream files directly into a long-lived pod in your namespace that has your PVC mounted. Files are split into batches, and the batches are copied in parallel background processes. This was designed for the axol1tl namespace, UAF, and the traindatavol PVC, but it can be generalized. I hope you find it useful! :)
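To make the krsync idea concrete, here is a minimal sketch of how one transfer command could be assembled. It assumes the commonly used krsync convention where rsync's user@host:path syntax is reinterpreted as pod@namespace:path; the krsync in this repo may differ, and build_krsync_command is a hypothetical helper, not a function from kube_copy.py. Nothing is executed, the command is only printed.

```python
import shlex


def build_krsync_command(sources, pod, namespace, dest_dir, krsync="./krsync"):
    """Return the argv list for one krsync call (hypothetical helper)."""
    # Assumed convention: rsync parses "user@host:path", and the wrapper
    # treats user as the pod name and host as the namespace.
    target = f"{pod}@{namespace}:{dest_dir.rstrip('/')}/"
    # -a (archive) and -v (verbose) are standard rsync flags.
    return [krsync, "-av", *sources, target]


cmd = build_krsync_command(
    ["/indir/name/a.root", "/indir/name/b.root"],
    pod="copy-pod",
    namespace="axol1tl",
    dest_dir="/data/ntuples",
)
print(shlex.join(cmd))
```

Building the command as a list (rather than one string) avoids quoting problems when file names contain spaces.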

Setup

All of these need to be set up on UAF before things will work

NRP, kubectl and kubelogin setup

This comes from the NRP getting started guide.

ssh into your T2 cluster (of course you need access first)

#can do uaf-1,2,3 or 4
ssh username@uaf-2.t2.ucsd.edu

log into nrp.ai in your browser. You need to be a member of the namespace the files will be copied to.

install kubectl

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
chmod +x kubectl
mkdir -p ~/.local/bin
mv ./kubectl ~/.local/bin/kubectl

get the path variables set properly and verify it works:

export PATH="$HOME/.local/bin:$PATH"
source ~/.bashrc
which kubectl
kubectl version --client

install krew + kubelogin (this takes a while)

(
  set -x; cd "$(mktemp -d)" &&
  OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
  ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/arm.*$/arm/')" &&
  KREW="krew-${OS}_${ARCH}" &&
  curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
  tar zxvf "${KREW}.tar.gz" &&
  ./"${KREW}" install krew
)

set path, install and verify

export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
source ~/.bashrc
kubectl krew install oidc-login
kubectl oidc-login --version

get nrp config file and copy it to the cluster

on the cluster:

mkdir -p ~/.kube

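on your local machine: download the config file from https://nrp.ai/config, then copy it to the cluster (adjust the downloaded file name and the username to match yours)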

scp ~/Downloads/config-2 mequinna@uaf-4.t2.ucsd.edu:~/.kube/config

log into nrp from the t2 cluster (you can use the namespace of your choice; the example here is axol1tl)

kubectl get nodes
kubectl get pods -n axol1tl

#if you want a default namespace
kubectl config set contexts.nautilus.namespace axol1tl

add to bashrc

do this so you don't have to set the paths again in every new session

nano ~/.bashrc
#add these to the bottom:
export PATH="$HOME/.local/bin:$PATH"
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
#save and exit nano, then reload in your shell (do not add this line to ~/.bashrc itself):
source ~/.bashrc

Script setup

after doing the above, ssh into the t2 cluster and clone the git repo

bash
git clone https://github.com/quinnanm/nrpcopy.git
cd nrpcopy

you should have a few scripts of note: kube_copy.py, krsync and ymls/copy-pod.yml

set up the pod for copying

kubectl apply -f ymls/copy-pod.yml
#check it's running:
kubectl get pods -n axol1tl

copy-pod needs to be "Running" for any of this to work.

give krsync permissions

this is needed for krsync to work. if you get permission errors, this is a likely culprit

chmod +x krsync
ls -la krsync
# should show -rwxr-xr-x

prepare for liftoff

you need an input directory on the T2 of files you want to copy, a pvc on NRP to copy to, an output directory on that pvc, a namespace, and your running copy-pod

Basic usage

python kube_copy.py \
  --input-dirs /indir/name \
  --output-path /outdir/name \
  --namespace nameofnamespace \
  --pvc pvc-name

This will find all .root files under the input directory, split them into batches of 100, run up to 4 batches in parallel, block until everything is done, and print a summary.
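The batching step can be sketched in a few lines. This is only an illustration of splitting a file list into fixed-size batches; make_batches is a hypothetical name, and the real script may order or group files differently.

```python
def make_batches(files, files_per_job=100):
    """Split a list of files into consecutive batches of at most files_per_job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


# 250 made-up file names give two full batches and one partial batch.
files = [f"file_{i}.root" for i in range(250)]
batches = make_batches(files)
print([len(b) for b in batches])  # [100, 100, 50]
```

With the defaults described above, the first four batches would start right away and later batches would start as earlier ones finish.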

Always do a dry run first to verify the file list and destination paths before copying anything:

python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/QCD \
  --output-path /data/ntuples \
  --namespace axol1tl \
  --pvc mequinna-pvc \
  --dry-run

By default the script copies all .root files. To copy a different file type:

python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/MyData \
  --output-path /data/ADsamples/MyData \
  --namespace axol1tl \
  --pvc traindatavol \
  --filetype '*.h5'
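
For intuition, pattern-based discovery can be sketched as a recursive walk with shell-style matching. This assumes the pattern is matched against file names only; find_files is a hypothetical helper and the demo files are created in a temporary directory.

```python
import fnmatch
import os
import tempfile


def find_files(input_dir, pattern="*.root"):
    """Recursively collect files under input_dir whose name matches pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(input_dir):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Small demo tree: two .root files (one nested) and one .h5 file.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.root", "sub/b.root", "sub/c.h5"):
        open(os.path.join(top, rel), "w").close()
    root_files = [os.path.relpath(p, top) for p in find_files(top)]
    h5_files = [os.path.relpath(p, top) for p in find_files(top, "*.h5")]

print(root_files)
print(h5_files)
```

Remember to quote the pattern on the command line (as in the example above) so the shell does not expand it before the script sees it.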

Here is what I did in my working example. --flat means the nested subdirectories of the input dirs are not recreated; all files land directly in the target output dir.

#try it
python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/VBFHto2B_25 \
  --output-path /data/ADsamples/VBFHto2B_25 \
  --namespace axol1tl \
  --pvc traindatavol \
  --copy-pod copy-pod \
  --files-per-job 50 \
  --max-parallel 4 \
  --flat \
  --dry-run

#submit for real
python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/VBFHto2B_25 \
  --output-path /data/ADsamples/VBFHto2B_25 \
  --namespace axol1tl \
  --pvc traindatavol \
  --copy-pod copy-pod \
  --files-per-job 50 \
  --max-parallel 4 \
  --flat

#resubmit failed jobs
python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/VBFHto2B_25 \
  --output-path /data/ADsamples/VBFHto2B_25 \
  --namespace axol1tl \
  --pvc traindatavol \
  --copy-pod copy-pod \
  --files-per-job 50 \
  --max-parallel 4 \
  --flat \
  --skip-existing

#or to avoid printouts/holding the command line hostage
python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/VBFHto2B_25 \
  --output-path /data/ADsamples/VBFHto2B_25 \
  --namespace axol1tl \
  --pvc traindatavol \
  --copy-pod copy-pod \
  --files-per-job 50 \
  --max-parallel 4 \
  --flat \
  --no-wait

log files are written to copy_logs/.

then check the status on nrp:

kubectl exec -it copy-pod -n axol1tl -- bash
ls /data/ADsamples/VBFHto2B_25/
ls /data/ADsamples/VBFHto2B_25/ | wc -l
exit

don't forget to delete the pod when done!

kubectl delete pod copy-pod -n axol1tl

Common recipes

Copy multiple sample directories with prefixes, flat output:

python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/QCD \
               /ceph/cms/store/user/mequinna/ntuples/TTbar \
               /ceph/cms/store/user/mequinna/ntuples/WJets \
  --prefix QCD TTbar WJets \
  --flat \
  --output-path /data/ntuples \
  --namespace axol1tl \
  --pvc mequinna-pvc

With --prefix, each file is renamed PREFIX_originalname.root. With --flat, all files land in one directory regardless of the subdirectory structure on UAF. Without --flat, the subdirectory structure is preserved under --output-path.
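
The naming rules above can be illustrated with a small sketch. It assumes that, without --flat, the path relative to the input directory is what gets recreated under --output-path; destination_path is a hypothetical helper and the real script may handle edge cases differently.

```python
import os


def destination_path(src, input_dir, output_path, prefix=None, flat=False):
    """Compute where one source file would land on the PVC (illustrative only)."""
    name = os.path.basename(src)
    if prefix:
        name = f"{prefix}_{name}"
    if flat:
        # Flat: ignore the source subdirectories entirely.
        return os.path.join(output_path, name)
    # Nested: keep the subdirectory path relative to the input directory.
    rel_dir = os.path.dirname(os.path.relpath(src, input_dir))
    return os.path.join(output_path, rel_dir, name)


src = "/ceph/ntuples/QCD/run1/part0/file.root"  # made-up example path
flat_dest = destination_path(src, "/ceph/ntuples/QCD", "/data/ntuples", "QCD", flat=True)
nested_dest = destination_path(src, "/ceph/ntuples/QCD", "/data/ntuples", "QCD")
print(flat_dest)
print(nested_dest)
```

One practical consequence: with --flat and no --prefix, two files with the same name in different subdirectories would map to the same destination, so a dry run is worth the time.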

First-time run — auto-create the pod:

python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/QCD \
  --output-path /data/ntuples \
  --namespace axol1tl \
  --pvc mequinna-pvc \
  --create-pod

Fire and forget — return immediately, check later:

python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/QCD \
  --output-path /data/ntuples \
  --namespace axol1tl \
  --pvc mequinna-pvc \
  --no-wait

The script launches batches in the background and exits. The background processes survive SSH disconnection. The script prints the exact --summarize command to run when you come back to check results.
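
One common way to get background processes that outlive the launching terminal is to start them in their own session with output redirected to a log file. The sketch below shows that technique on POSIX systems; it is an assumption about how such a launch could be done, not a description of what kube_copy.py does internally, and launch_detached is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile


def launch_detached(argv, log_path):
    """Start argv in a new session, appending stdout and stderr to log_path."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # no terminal input needed
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,     # POSIX: child calls setsid()
        )
    # The parent's file handle is closed here; the child keeps its own copy.
    return proc


# Demo with a trivial stand-in for a batch script.
with tempfile.TemporaryDirectory() as log_dir:
    log_path = os.path.join(log_dir, "batch_demo.log")
    proc = launch_detached([sys.executable, "-c", "print('BATCH_DONE')"], log_path)
    proc.wait()  # only for the demo; a fire-and-forget caller would not wait
    with open(log_path) as fh:
        log_text = fh.read()

print(log_text.strip())
```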

Resume after interruption — skip already-copied files:

python kube_copy.py \
  --input-dirs /ceph/cms/store/user/mequinna/ntuples/QCD \
  --output-path /data/ntuples \
  --namespace axol1tl \
  --pvc mequinna-pvc \
  --skip-existing
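
Conceptually, skipping existing files is a set difference between the planned destinations and what is already on the PVC. The sketch below assumes the remote listing has already been collected (for example with find run inside the pod) and compares by path only; filter_existing is a hypothetical helper.

```python
def filter_existing(planned, existing):
    """Drop (src, dest) pairs whose dest is already present on the PVC."""
    present = set(existing)
    return [(src, dest) for src, dest in planned if dest not in present]


planned = [
    ("/in/a.root", "/data/ntuples/a.root"),
    ("/in/b.root", "/data/ntuples/b.root"),
    ("/in/c.root", "/data/ntuples/c.root"),
]
already_there = ["/data/ntuples/a.root", "/data/ntuples/c.root"]
todo = filter_existing(planned, already_there)
print(todo)  # [('/in/b.root', '/data/ntuples/b.root')]
```

A path-only check like this cannot tell a complete file from a partial one, which is why the size check in the summary still matters.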

Checking results

While batches are running (tail all batch logs):

tail -f copy_logs/batch_0520-142301_*.log

Check which batches are still going:

grep -l "BATCH_DONE" copy_logs/batch_0520-142301_*.log   # finished
ps aux | grep '[b]atch_'                                   # still running ([b] keeps grep from listing itself)

Get the full summary (works mid-run too — shows which batches aren't done yet):

python kube_copy.py \
  --summarize copy_logs/batch_0520-142301_*.log \
  --output-path /data/ntuples \
  --namespace axol1tl \
  --pvc mequinna-pvc \
  --copy-pod copy-pod

This prints counts of succeeded / failed / size-mismatched files, lists any problem files, and if there are failures prints a ready-to-run resubmit command.
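
A rough sketch of that kind of log parsing is shown below. The OK:, FAILED:, SIZEMISMATCH: and BATCH_DONE markers come from this README; the assumption that each marker is followed by a file path, and the summarize_log helper itself, are illustrative only.

```python
from collections import Counter

STATUS_PREFIXES = ("OK:", "FAILED:", "SIZEMISMATCH:")


def summarize_log(text):
    """Count per-file statuses in one batch log and collect problem files."""
    counts = Counter()
    problems = []
    done = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("BATCH_DONE"):
            done = True
            continue
        for prefix in STATUS_PREFIXES:
            if line.startswith(prefix):
                status = prefix.rstrip(":")
                counts[status] += 1
                if status != "OK":
                    # Assumed layout: the rest of the line names the file.
                    problems.append((status, line[len(prefix):].strip()))
                break
    return {"done": done, "counts": dict(counts), "problems": problems}


sample = """\
OK: /data/ntuples/QCD_a.root
SIZEMISMATCH: /data/ntuples/QCD_b.root
FAILED: /data/ntuples/QCD_c.root
OK: /data/ntuples/QCD_d.root
BATCH_DONE
"""
result = summarize_log(sample)
print(result)
```

A log without a BATCH_DONE line would come back with done set to False, which is how a mid-run summary can tell unfinished batches apart.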

Resubmitting failures — just copy-paste the resubmit command from the summary output. It pre-fills --skip-existing so already-copied files are not re-copied.

All options

Flag	Default	Description
`--input-dirs`	required	One or more source directories on UAF. Recursively finds all files matching `--filetype` (default `*.root`).
`--output-path`	required	Destination path inside the PVC, e.g. `/data/ntuples`.
`--namespace`	`axol1tl`	Kubernetes namespace.
`--pvc`	required	PVC name, e.g. `mequinna-pvc`.
`--copy-pod`	`copy-pod`	Name of the long-lived pod with the PVC mounted.
`--create-pod`	off	Create the copy pod if it doesn't exist.
`--prefix`	none	One prefix string per input dir. `--prefix QCD TTbar` renames files to `QCD_file.root`, `TTbar_file.root`. Count must match `--input-dirs`.
`--filetype`	`*.root`	File pattern to match, e.g. `--filetype '*.h5'` or `--filetype '*'` for all files.
`--flat`	off	Put all output files in one flat directory. Without this, subdirectory structure from the source is preserved.
`--files-per-job`	`100`	Number of files per batch.
`--max-parallel`	`4`	Maximum number of batches running simultaneously.
`--skip-existing`	off	Check the pod before copying and skip files already present.
`--no-wait`	off	Launch batches and return immediately. Use `--summarize` to check results later.
`--summarize`	off	Parse log files and print summary. No copying. Pass log glob: `--summarize copy_logs/batch_*.log`. Also needs `--output-path`, `--namespace`, `--pvc`, `--copy-pod` for the resubmit command.
`--krsync`	`./krsync`	Path to krsync wrapper. Created automatically if missing.
`--log-dir`	`./copy_logs`	Directory for per-batch shell scripts and log files.
`--log-file`	`copy_summary.json`	JSON file summarising all file statuses at the end of a blocking run.
`--dry-run`	off	Print everything that would happen without copying anything.

How it works

The script discovers all files matching --filetype (.root by default) recursively under each --input-dirs path.
Files are split into batches of --files-per-job. For each batch a shell script is written to --log-dir.
Up to --max-parallel batch scripts run simultaneously as background processes. Each script rsyncs its files via krsync (which tunnels rsync over kubectl exec) into the copy pod, then checks file sizes to verify each transfer.
Each file in the batch log gets a status line: OK:, FAILED:, or SIZEMISMATCH:. The batch ends with BATCH_DONE.
In blocking mode the script watches all batches and prints a live progress line. In --no-wait mode it exits immediately after launch.
At the end (or when you run --summarize) the logs are parsed and results reported.
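
The "up to --max-parallel at once" behaviour in the steps above can be sketched as a simple polling loop over child processes. This is one possible way to throttle background jobs, not necessarily how kube_copy.py does it; run_throttled is a hypothetical helper and the demo commands are trivial stand-ins for batch scripts.

```python
import subprocess
import sys
import time


def run_throttled(commands, max_parallel=4, poll_seconds=0.05):
    """Run commands with at most max_parallel alive at once; return exit codes."""
    pending = list(enumerate(commands))
    running = {}  # command index -> Popen handle
    codes = [None] * len(commands)
    while pending or running:
        # Start new jobs while there is a free slot.
        while pending and len(running) < max_parallel:
            idx, argv = pending.pop(0)
            running[idx] = subprocess.Popen(argv)
        # Collect any jobs that have finished.
        for idx, proc in list(running.items()):
            rc = proc.poll()
            if rc is not None:
                codes[idx] = rc
                del running[idx]
        if running:
            time.sleep(poll_seconds)
    return codes


# Five stand-in jobs: even-numbered ones exit 0, odd-numbered ones exit 1.
commands = [[sys.executable, "-c", f"import sys; sys.exit({i % 2})"] for i in range(5)]
codes = run_throttled(commands, max_parallel=2)
print(codes)  # [0, 1, 0, 1, 0]
```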

Troubleshooting

Pod not found / not Running

kubectl get pods -n axol1tl
kubectl describe pod copy-pod -n axol1tl

If the pod is stuck in Pending, check PVC status: kubectl get pvc -n axol1tl.

kubectl auth expired

NRP uses OIDC tokens that expire. Re-authenticate with:

kubectl get pods -n axol1tl   # triggers browser login

rsync fails immediately

Make sure the krsync file is executable: chmod +x ./krsync. Also confirm the copy pod is Running before starting.

Size mismatch on a file

The file was partially transferred. Run --summarize to get the resubmit command. It will list the affected files and retry them with --skip-existing so everything else is left alone.
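
For reference, the size check behind these statuses boils down to comparing the local file size with the size reported inside the pod. The sketch below shows that decision only; how the remote size is obtained (for example with stat -c %s run in the pod) and the classify_transfer helper are assumptions for illustration.

```python
def classify_transfer(local_size, remote_size):
    """Map a pair of file sizes (bytes) to one of the README's status labels."""
    if remote_size is None:
        # Assumed meaning: the file never arrived on the PVC.
        return "FAILED"
    if remote_size != local_size:
        return "SIZEMISMATCH"
    return "OK"


print(classify_transfer(1024, 1024))  # OK
print(classify_transfer(1024, 512))   # SIZEMISMATCH
print(classify_transfer(1024, None))  # FAILED
```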

Check what's on the PVC

kubectl exec -n axol1tl copy-pod -- find /data -name "*.root" | wc -l
kubectl exec -n axol1tl copy-pod -- du -sh /data
