101 changes: 99 additions & 2 deletions docs/How-to/habrok_cluster_guide.md
@@ -13,8 +13,8 @@ ssh-copy-id -i ~/.ssh/id_rsa.pub YOUR_USERNAME@login1.hb.hpc.rug.nl

Once you have added your SSH key to Habrok, modify the entry below and insert it into your `~/.ssh/config` file:
```
-Host habrok1
-HostName interactive1.hb.hpc.rug.nl
+Host habrok
+HostName login1.hb.hpc.rug.nl
User YOUR_USERNAME
IdentityFile ~/.ssh/id_rsa
ServerAliveInterval 120
@@ -80,3 +80,100 @@ You can also submit single PROTEUS runs to the nodes. For example:
```console
sbatch --mem-per-cpu=3G --time=1440 --wrap "proteus start -oc input/all_options.toml"
```

## Transferring data from Habrok to Kapteyn

Habrok and Kapteyn are on different networks. Habrok cannot reach Kapteyn (the firewall blocks outgoing SSH), and although Kapteyn can reach Habrok, Habrok requires two-factor authentication (2FA) for every connection, which makes automated transfers from Kapteyn difficult.

As a result, you cannot simply run `rsync` or `scp` in either direction between the two clusters. The workaround is to relay data through a machine that can reach both, such as your laptop:

```
Habrok  -->  your laptop  -->  Kapteyn (norma2)
        pull              push
```

### Prerequisites

You need SSH access to both clusters configured on your laptop. See the [Habrok SSH setup](#access-the-habrok-cluster) above and the [Kapteyn cluster guide](kapteyn_cluster_guide.md) for SSH config instructions, including the ProxyJump setup needed to reach `norma2`.

Test that both connections work before proceeding:

```console
ssh habrok # will ask for your TOTP code
ssh norma2 # key-based, no 2FA
```

### Step 1: Pull data from Habrok to your laptop

On Habrok, PROTEUS output typically lives in `/scratch/<habrok_user>/proteus_output/`. Check what is there:

```console
ssh habrok 'ls -lh /scratch/<habrok_user>/proteus_output/'
```

Pull it to a temporary folder on your laptop:

```console
mkdir -p /tmp/habrok_transfer
rsync -avz habrok:/scratch/<habrok_user>/proteus_output/my_run/ /tmp/habrok_transfer/my_run/
```

Replace `<habrok_user>` with your Habrok username (e.g., `p000000`) and `my_run` with your simulation directory name.

If you only need the CSV and plots (not the raw per-timestep data), add `--exclude=data/` to save time and disk space:

```console
rsync -avz --exclude=data/ habrok:/scratch/<habrok_user>/proteus_output/my_run/ /tmp/habrok_transfer/my_run/
```

### Step 2: Push data from your laptop to Kapteyn

Push the staged data to the Kapteyn dataserver:

```console
ssh norma2 'mkdir -p /dataserver/users/formingworlds/<kapteyn_user>/proteus_output/my_run'
rsync -avz /tmp/habrok_transfer/my_run/ norma2:/dataserver/users/formingworlds/<kapteyn_user>/proteus_output/my_run/
```

Replace `<kapteyn_user>` with your Kapteyn username.

### Step 3: Clean up

Remove the temporary staging data from your laptop:

```console
rm -rf /tmp/habrok_transfer/my_run
```
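
The three steps above can be chained in one small helper. The following is a minimal sketch, not part of PROTEUS: the function names `run` and `relay_run` and the `DRY_RUN` variable are hypothetical, while the host aliases and paths are the ones used in the steps above. Cleanup only happens if both transfers succeed, so an interrupted run leaves the staged data in place for `rsync` to resume.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
# (Hypothetical helper; DRY_RUN is an assumed variable name.)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# relay_run HABROK_USER KAPTEYN_USER RUN_NAME [slim]
# Pulls a run from Habrok, pushes it to Kapteyn, then removes the staged copy.
# Pass "slim" as the fourth argument to skip the data/ directory.
# Note: plain POSIX sh has no local variables, so src, stage, dest and exclude
# are set in the calling shell.
relay_run() {
    if [ "$#" -lt 3 ]; then
        echo "usage: relay_run HABROK_USER KAPTEYN_USER RUN_NAME [slim]" >&2
        return 2
    fi
    src="habrok:/scratch/$1/proteus_output/$3/"
    stage="/tmp/habrok_transfer/$3"
    dest="/dataserver/users/formingworlds/$2/proteus_output/$3"
    exclude=""
    if [ "${4:-full}" = "slim" ]; then
        exclude="--exclude=data/"
    fi

    # Each step runs only if the previous one succeeded.
    run mkdir -p "$stage" &&
        run rsync -avz ${exclude:+"$exclude"} "$src" "$stage/" &&
        run ssh norma2 "mkdir -p '$dest'" &&
        run rsync -avz "$stage/" "norma2:$dest/" &&
        run rm -rf "$stage"
}
```

For example, `DRY_RUN=1 relay_run p000000 <kapteyn_user> my_run slim` prints the five commands without running them; drop `DRY_RUN=1` to execute them. Habrok will still ask for your TOTP code when the pull starts.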

### Alternative: direct pipe (no staging on your laptop)

Instead of staging the data on your laptop, you can pipe it straight through in a single command using SSH and `tar`.

First, make sure the target directory exists on Kapteyn:

```console
ssh norma2 'mkdir -p /dataserver/users/formingworlds/<kapteyn_user>/proteus_output'
```

Then pipe the data through:

```console
ssh habrok 'tar -cf - -C /scratch/<habrok_user>/proteus_output my_run' \
| ssh norma2 'tar -xf - -C /dataserver/users/formingworlds/<kapteyn_user>/proteus_output'
```

This streams data from Habrok through your laptop to Kapteyn without writing anything to your local disk. The downside is that if the connection drops, you have to start over (unlike `rsync`, which can resume), so this approach is best for smaller transfers.

To exclude the `data/` directory (slim transfer):

```console
ssh habrok 'tar -cf - --exclude=data -C /scratch/<habrok_user>/proteus_output my_run' \
| ssh norma2 'tar -xf - -C /dataserver/users/formingworlds/<kapteyn_user>/proteus_output'
```
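
Because the `tar` pipe cannot resume or re-check files, it can be worth comparing checksums on both sides afterwards. The sketch below is one possible way to do that, assuming `sha256sum` is available on both clusters; `MANIFEST_CMD`, `manifest`, and `remote_manifest` are hypothetical names introduced here for illustration.

```shell
# Command that lists "checksum  ./relative/path" for every regular file
# under the current directory, sorted by path so two listings are comparable.
MANIFEST_CMD='find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2'

# manifest DIR: checksum listing of a local directory.
manifest() {
    (cd "$1" && sh -c "$MANIFEST_CMD")
}

# remote_manifest HOST DIR: the same listing, produced on a remote host.
remote_manifest() {
    ssh "$1" "cd '$2' && $MANIFEST_CMD"
}

# Example usage from your laptop (not executed here):
#   remote_manifest habrok /scratch/<habrok_user>/proteus_output/my_run > /tmp/habrok.sha256
#   remote_manifest norma2 /dataserver/users/formingworlds/<kapteyn_user>/proteus_output/my_run > /tmp/kapteyn.sha256
#   diff /tmp/habrok.sha256 /tmp/kapteyn.sha256 && echo "transfer verified"
```

After a slim transfer the Habrok listing still contains the `data/` files, so the two listings will differ unless those lines are filtered out of the Habrok manifest first.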

### Tips

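- **rsync is incremental.** If the transfer gets interrupted (laptop goes to sleep, WiFi drops), re-run the same `rsync` command. It skips files that already arrived intact and transfers only new or changed ones. Adding `--partial` keeps partially transferred files, so a large file can resume instead of restarting.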
- **Check sizes first.** Before pulling, check how large the data is: `ssh habrok 'du -sh /scratch/<habrok_user>/proteus_output/my_run/'`. Large runs can be tens of GB.
- **The `data/` directory is often not needed.** It contains raw NetCDF/JSON output at every timestep. The `runtime_helpfile.csv` file and the `plots/` directory are usually sufficient for analysis.
- **Kapteyn storage quotas.** The formingworlds dataserver also has limited space. Check your usage with `ssh norma2 'du -sh /dataserver/users/formingworlds/<kapteyn_user>/'` before transferring large datasets.
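
The size check can also be compared against the free space on your laptop before staging a large run. This is a rough sketch under the assumption that `df -Pk` reports the available space in its fourth column (the POSIX output format); `free_kb` and `fits` are hypothetical helper names.

```shell
# free_kb DIR: free space, in KiB, on the filesystem that holds DIR.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# fits NEEDED_KB DIR: succeed only if DIR has more than NEEDED_KB KiB free.
fits() {
    [ "$(free_kb "$2")" -gt "$1" ]
}

# Example usage from your laptop (not executed here):
#   need_kb=$(ssh habrok 'du -sk /scratch/<habrok_user>/proteus_output/my_run' | cut -f1)
#   fits "$need_kb" /tmp || echo "not enough room in /tmp; use the slim transfer or another folder"
```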