Add Ansible Script to Allow Nodes to Keep Their Own External Data Up-to-date #134

cailafinn · 2025-10-13T17:23:35Z

Adds a new role to the agent playbook that downloads the external data and then sets up a crontab job to keep the external data store up to date. The role is set to never run by default in the playbook, so it has to be requested explicitly. This stops nodes from having to wait until a build runs on them to download the data, improving parallelization.
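As a rough illustration of the crontab step, the sketch below adds a job line only if it is not already present, so it is safe to run repeatedly. The schedule, script path, and log path are placeholders, not the values used by the role in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical values: the real schedule, script location, and log location
# are not stated in this PR.
CRON_LINE='0 * * * * /opt/mirror/update-external-data.sh >> /var/log/update_log.txt 2>&1'

# Read an existing crontab on stdin and print it with the given line appended,
# unless an identical line is already present.
ensure_cron_line() {
    local line="$1"
    local current
    current="$(cat)"
    if printf '%s\n' "$current" | grep -qxF -- "$line"; then
        # Already installed: print the crontab unchanged.
        printf '%s\n' "$current"
    elif [ -z "$current" ]; then
        # Empty crontab: the new line is the whole crontab.
        printf '%s\n' "$line"
    else
        printf '%s\n%s\n' "$current" "$line"
    fi
}

# Usage (as root), left commented out because it rewrites the crontab:
# crontab -l 2>/dev/null | ensure_cron_line "$CRON_LINE" | crontab -
```

Filtering the crontab through a function keeps the add-if-missing logic testable without touching the real crontab.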

The changes also adjust how the data is stored on the host machine, moving it from a Docker volume (which is inaccessible from the host machine) to a mount in the root directory. The contents can then be viewed and manipulated without entering a Docker container.
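For anyone unfamiliar with the distinction, here is a minimal sketch of how the two forms differ on the docker command line. The agent name and the container-side path are assumptions for illustration; only the /{agent_name}_external_data host path pattern comes from this PR.

```shell
#!/usr/bin/env bash
# Named volume: the source is a bare name and Docker manages where it lives.
volume_arg() {
    printf '%s:%s' "$1" "$2"
}

# Bind mount: the source is an absolute path on the host, so the contents
# can be inspected directly from the host.
bind_mount_arg() {
    case "$1" in
        /*) printf '%s:%s' "$1" "$2" ;;
        *)  echo "host path must be absolute: $1" >&2; return 1 ;;
    esac
}

AGENT_NAME="example_agent"               # hypothetical agent name
HOST_DIR="/${AGENT_NAME}_external_data"  # host path pattern from this PR
CONTAINER_DIR="/external-data"           # assumed container-side path

# Usage, commented out because it starts a container:
# docker run -v "$(bind_mount_arg "$HOST_DIR" "$CONTAINER_DIR")" <image>
```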

This change was implemented primarily to get around a bug in a packaging script where new data could not be downloaded.
Having the data present on the machine ahead of time was an easier solution.

To Test

Take a Linux node offline.
SSH into the node and remove some data from the /{agent_name}_external_data/MD5 directory.
Wait for the next scheduled run of the crontab job.
Check that the crontab job downloaded the missing files by checking update_log.txt.
Remove the crontab job (sudo crontab -e -u root).
Run the Ansible script:
ansible-playbook -i inventory.txt jenkins-agent-production.yml -t "mirror, agent" -u {fedID}
SSH back into the machine and check that the crontab job is present and the data has been downloaded.
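The final check above could be scripted along these lines. This is only a sketch: it assumes the crontab entry mentions the script by name and treats any regular file under the MD5 directory as evidence that data is present.

```shell
#!/usr/bin/env bash
# Succeeds if the crontab listing on stdin has a non-comment line that
# mentions the given script name.
cron_job_present() {
    grep -v '^[[:space:]]*#' | grep -qF -- "$1"
}

# Succeeds if the directory exists and holds at least one regular file.
data_present() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}

# Usage on the node, with {agent_name} filled in by hand:
# sudo crontab -l -u root | cron_job_present update-external-data.sh \
#     && data_present "/{agent_name}_external_data/MD5" \
#     && echo "mirror looks healthy"
```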

idigs · 2025-10-16T19:58:34Z

Linux/external-data-mirror/ansible/roles/mirror-data/tasks/update-external-data.sh

 if [ -z "${RSYNC_PROCESS_IDS}" ]; then
        echo "running rsync..."
-        rsync -az --perms -o -g  $SERVER_IP:/srv/$FTP_SRV_DIR/ftp/external-data/MD5/ /external-data/MD5/
+        rsync -azvW --perms -o -g  $SERVER_IP:/srv/$FTP_SRV_DIR/ftp/external-data/MD5/ /external-data/MD5/


--whole-file, -W This option disables rsync's delta-transfer algorithm, which causes all transferred files to be sent whole. The transfer may be faster if this option is used when the bandwidth between the source and destination machines is higher than the bandwidth to disk (especially when the "disk" is actually a networked filesystem). This is the default when both the source and destination are specified as local paths, but only if no batch-writing option is in effect.

Including the -W flag seemed to reduce some of the flakiness that was happening when trying to rsync data through the load balancer.
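For context on the surrounding guard, the sketch below shows one way to start a transfer only when no rsync process is already running. How RSYNC_PROCESS_IDS is populated is not visible in this hunk, so the pgrep call in the usage line is an assumption.

```shell
#!/usr/bin/env bash
# Run the given command only when the PID list passed as the first argument
# is empty, which is the same test as the [ -z ... ] guard in the diff.
run_if_idle() {
    local pids="$1"
    shift
    if [ -z "$pids" ]; then
        "$@"
    else
        echo "rsync already running (PIDs: $pids), skipping" >&2
        return 0
    fi
}

# Usage, commented out because it starts a real transfer. The pgrep lookup
# is assumed, not taken from the script:
# run_if_idle "$(pgrep -x rsync || true)" \
#     rsync -azvW --perms -o -g "$SERVER_IP:/srv/$FTP_SRV_DIR/ftp/external-data/MD5/" /external-data/MD5/
```

Skipping rather than failing keeps a long first download from being treated as an error by later cron runs.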

cailafinn added 3 commits October 10, 2025 13:40

Mirror data from the ISIS server

d53f6a2

Fix SSH connection issues

c4a44c7

Use mounts rather than volumes for ext data

ed112be

idigs reviewed Oct 16, 2025


sf1919 added this to ISIS core workstream v6.15.0 Nov 19, 2025

MialLewis moved this to Waiting for Review in ISIS core workstream v6.15.0 Nov 19, 2025


cailafinn Nov 26, 2025
