Bunya provides several spaces where users can keep their data and software.
Important
These spaces are subject to the Bunya User Data Spaces Operational Procedure and the UQ Research Data Management Policy and UQ Research Data Management Procedure.
Individual, not shareable, spaces
home directory
scratch user directory
Space during jobs
TMPDIR
Spaces to host data and software
opendata
licenseddata
sw
Group spaces
Group projects
Storage
UQRDM Q storage records
The spaces below are individual spaces. This means that, by default, they are accessible only by the user to whom they belong. These spaces should NOT be shared with any other users. If a shared space is required, please see the section on shared spaces further below.
- Every user has a home directory. This is where a user lands when they login.
- The home directory of a new user is created when the user logs in for the very first time via ssh. Logging in via onBunya will not create a user's home directory.
- Quotas are 50GB and 1 million files.
- Quotas on a user's home directory will not be increased.
- The home directory should be used to install software, such as conda, python and R environments, scripts, and other software installed by make and make install, etc. A sketch of setting up an environment is shown after this list.
- The home directory should not be used for data and output from calculations. If the home directory goes over one of the quotas (space or files), a user can effectively lock themselves out of their account unless they clean up (get back under quota) quickly.
- The home directory is not backed up.
- Deleted files from the home directory cannot be recovered.
- The home directory must not be shared with other users.
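For example, a minimal sketch of setting up a Python virtual environment in the home directory (the module name and paths are assumptions; adapt them to your setup):
module load python                      # assumed module name; check module avail python
python3 -m venv $HOME/envs/myproject    # the environment lives in /home
source $HOME/envs/myproject/bin/activate
pip install numpy                       # installed packages count towards the 1 million file quota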
- Every user has a directory in /scratch/user.
- Quotas are 300GB and 1 million files.
- The 300GB quota can be exceeded up to 5TB, but only for up to 2 weeks. Usage has to fall below 300GB within the 2 week period or the user can no longer write to /scratch/user.
- Quotas in a user's /scratch/user directory may be increased.
- The scratch user directory should be used to keep input and output of calculations.
- The scratch user directory should not be used to install software.
- The scratch user directory is not backed up.
- Deleted files from the scratch user directory cannot be recovered.
- Files not accessed for 90 days are auto deleted and cannot be recovered.
- The scratch user directory must not be shared with other users.
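To see which files in the scratch user directory are approaching the 90 day limit, one option (a sketch, assuming the purge is based on the recorded access time) is to use find:
# List files not accessed for more than 80 days
find /scratch/user/$USER -type f -atime +80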
Users can use the command rquota on Bunya to check their current quotas and usage. It reports quotas and usage for /home, /scratch/user, and any /scratch/project spaces (more information below) they have access to.
Users can use the command
/usr/lpp/mmfs/bin/mmlsquota -u $USER --block-size=auto scratch:user
to check on their quotas and grace period in /scratch/user.
- $TMPDIR is created automatically for each Slurm job and is then automatically deleted once the Slurm job finishes. It is the ideal place for temporary files of jobs.
- $TMPDIR currently provides up to 10TB (soft limit) and 11TB (hard limit) of temporary space for up to 10,485,760 (soft limit) and 11,534,336 (hard limit) files. These limits apply to the aggregate across all running jobs for each user.
- $TMPDIR is pre-set and unique for each running job. Do not create your own $TMPDIR or overwrite it with something else in your scripts.
- $TMPDIR does not count towards user quotas in /home or /scratch/user or project quotas in /scratch/project.
- $TMPDIR is not /scratch/user.
- $TMPDIR is not /tmp.
- $TMPDIR is recommended if calculations produce a very large amount of (often very small) files.
- You can check your current utilisation of $TMPDIR with the following command:
/usr/lpp/mmfs/bin/mmlsquota --block-size=auto -u $USER scratch:temp
- Software that allows a flag or option to set a temporary or scratch directory should always use $TMPDIR and NOT /tmp.
- To use $TMPDIR for software that does not allow setting a temporary/scratch directory, change to $TMPDIR (cd $TMPDIR), then copy all required input files to $TMPDIR, or use the full path to point to input files in /scratch or /home or /QRISdata (for /QRISdata restrictions apply, see below). After the calculation, copy all output needed to /scratch or /QRISdata (see below for restrictions on /QRISdata) and make sure to tar and/or zip output if required. This needs to be done in your Slurm submit script:
cd $TMPDIR
cp input-files .
srun ...
cp output-files /scratch/.../. (or /QRISdata/.../.)
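Putting this together, a minimal sketch of a Slurm submit script that stages through $TMPDIR (the job name, resources, program name, and paths are placeholders, not site defaults; add your usual account and partition settings):
#!/bin/bash
#SBATCH --job-name=tmpdir-example
#SBATCH --ntasks=1
#SBATCH --mem=8G
#SBATCH --time=01:00:00

cd $TMPDIR                                   # work in the per-job temporary space
cp /scratch/user/$USER/myjob/input.dat .     # stage input once (placeholder path)
srun ./my_program input.dat                  # placeholder executable
tar -czf output.tar.gz output-files/         # bundle output before the once-off copy
cp output.tar.gz /scratch/user/$USER/myjob/  # or /QRISdata/QNNNN/ (restrictions apply, see below)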
The opendata space is a read-only space to house data sets and models. Current data includes GTDB, BLAST, Kraken, CheckM2, SingleM, nuScenes, nuScenes-C, huggingface, ollama, gguf, and many more.
Users are required to check if the data they need is already housed in /scratch/opendata or could be housed there before requesting an increase to their /scratch/user space or scratch project group space.
If data is not yet available in /scratch/opendata users can submit a request for data to be installed there.
Information needed to consider an install request:
- Link to webpage and download page
- Which data to install if multiple versions are available
- Link to license and usage terms and/or agreements
- Details of size and number of files
The licenseddata space is a read-only space to house licensed data sets and models.
Users are required to check if the data they need is already housed in /scratch/licenseddata or could be housed there before requesting an increase to their /scratch/user space or scratch project group space.
If data is not yet available in /scratch/licenseddata users can submit a request for data to be installed there.
Information needed to consider an install request:
- Link to webpage and download page
- Which data to install if multiple versions are available
- Link to license and usage terms and/or agreements
- If needed, the procedure to check that users have agreed to license and usage terms and/or agreements, and for adding user access
- Details of size and number of files
Scratch projects are shared group spaces in /scratch that provide more space, shared among the members of a group.
- Apply for a scratch project by sending a request to rcc-support@uq.edu.au and you will then be sent the link to the application form.
- Only the group leader or grant holder can apply for a scratch project.
- Scratch projects should not be used to install software that needs to be shared. Groups should apply for space in /sw.
- Scratch projects can be used to share data sets and output that multiple users need access to during their calculations.
- Scratch projects should not be used to provide additional scratch space for single users. For single users, the quotas in /scratch/user can be increased.
- Scratch projects should not contain directories for each of the different users.
- The scratch project directory is not backed up.
- Deleted files from the scratch project directory cannot be recovered.
- Files not accessed for 90 days are auto deleted and cannot be recovered.
- Groups should only apply for a single scratch project. Applications for multiple projects for the same group are discouraged.
Scratch projects require an access group. This can be an RDM storage record access group (QNNNN) or a specific access group created by RCC for the scratch project. Users, not RCC, manage the access groups either via the RDM portal for QNNNN groups, or via the QRIScloud Portal.
For RDM storage record access groups (QNNNN), the RDM storage record owner adds users as collaborators to the RDM storage record via the RDM portal. For QRIScloud access groups, the group owner or administrator needs to go to the QRIScloud Portal and click on Account to log in. Then they need to go to Services Dashboard, look under Groups for the respective Scratch Project group, and click on the link. That page outlines how to add and remove users from the access group for the project.
New users added either way will have to wait for this to take effect, as permissions set in the RDM portal need to be propagated to Bunya. In both cases, users should start a new login to Bunya to have the new groups available in their environment. Users can check which groups they belong to by typing the command groups on Bunya.
Users in the access group who also have access to Bunya will have access to the scratch project on Bunya. They have read-write access to the scratch project directory. However, directories and files created by other scratch project members are, by default, only readable (and executable) but not writable by others: files created by others cannot be deleted, and directories created by others cannot be written to.
In the example below:
- user-2 can change into directory-1, but they will not be able to write to directory-1 or delete files in directory-1
- user-1 can change into directory-2, can write to directory-2, and can delete files in directory-2
- Only user-1 can delete or change file-1
- user-1 and user-2 can delete or change file-2
- Only user-1 can delete or change executable-1
- user-1 and user-2 can delete or change executable-2
To make directory-1 writable to users other than user-1, user-1 needs to run
chmod -R g+w directory-1
where -R means that this is run recursively for all subdirectories and files, and g+w adds write permissions for the group.
drwxr-sr-x. 2 user-1 Project_Access_Group 4K Feb 1 19:08 directory-1
drwxrwsr-x. 2 user-2 Project_Access_Group 4K Oct 9 2023 directory-2
-rw-r--r--. 1 user-1 Project_Access_Group 6M Mar 25 13:54 file-1
-rw-rw-r--. 1 user-2 Project_Access_Group 6M Mar 25 13:54 file-2
-rwxr-xr-x. 1 user-1 Project_Access_Group 6M Mar 8 14:01 executable-1
-rwxrwxr-x. 1 user-2 Project_Access_Group 6M Mar 8 14:01 executable-2
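To avoid having to run chmod afterwards, one option (a sketch, not site policy; the project path is a placeholder) is to set a group-friendly umask before creating files in the scratch project:
umask 002                                        # new files: rw-rw-r--, new directories: rwxrwxr-x
mkdir /scratch/project/my_project/shared-dir     # inherits the setgid bit (s) from the parent
touch /scratch/project/my_project/shared-dir/data.txt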
Users or groups who want to share software should check if it is already installed as a software module on Bunya and, if not, contact rcc-support@uq.edu.au to see if it could be installed. See software modules on Bunya and the Conda on Bunya Guide.
Groups that need to share a range of software and/or share custom-built software can apply for space in /sw. Usually only one custodian with install (write) access will be permitted, with the rest of the group accessing the software read-only, similar to all Bunya software.
Only software is permitted to be installed in /sw; any data and/or additional files are required to be housed in /scratch/opendata and/or /scratch/licenseddata.
Software installed in /sw must not point to any location in /QRISdata.
The UQ RDM User Guides provide a lot of information about RDM research records and RDM storage records and how to use and administer these.
UQ RDM storage records are automatically available on Bunya in /QRISdata/QNNNN if, when applying for the storage record, the user selected that the data should be available on HPC (this cannot be changed afterwards). QNNNN is the storage record number and can be found as part of the short identifier for the RDM storage record.
- /QRISdata/QNNNN are shared spaces with default quotas of 1TB and 1 million files. This can be increased by applying for more storage via the RDM portal.
- RHD students who have dual credentials (staff and student) may need to add the other credential as a collaborator. For example, if your RDM was created with your student account, then you will need to add your staff account as a collaborator if you need access to it from the staff credentials. You manage collaborators on your RDM via the RDM portal.
- Use ls /QRISdata/QNNNN/ (the / at the end is important) or cd /QRISdata/QNNNN to see the RDM storage record. Due to the automount mechanism, the RDM Q storage allocation folder /QRISdata/QNNNN needs to be mounted to become visible in the filesystem.
The following are important behaviours to observe when interacting with the /QRISdata storage from Bunya:
- Users are not allowed to submit jobs (using sbatch or salloc) from a directory in /QRISdata. Jobs that are running from /QRISdata impact the service and other users and may be deleted without warning. In your running jobs, the submit directory is available as the value of the environment variable $SLURM_SUBMIT_DIR.
- The /QRISdata storage is designed as archival space (see the How /QRISdata works section below). Write once, read rarely. Bigger files are better than lots of small files.
- You should not do multiple reads from a directory in /QRISdata. A once-only read of a few input files on /QRISdata at the start of your computations is OK. You should be using /scratch for inputs to calculations when they are repeatedly being accessed.
- You should not perform multiple writes into a directory in /QRISdata.
- Standard output should not be written to /QRISdata. Use /scratch instead.
- A once-off write at the end of your job or interactive session is permitted. However, the general data workflow should use /scratch for intensive input and output of calculations.
- Software should not be installed in /QRISdata. This is because accessing software often means repeatedly accessing /QRISdata, which is not permitted.
- Do not unpack archives or tar files directly in /QRISdata. Unpack archives into a directory in /scratch.
- Do not move directories with many files to /QRISdata; tar or archive these first, as lots of (small) files can cause problems (not just for you but also for others). A sketch is shown after this list.
- Do not perform file-by-file copies to RDM (using tools like rsync). This is especially true when very large numbers of files and folders are involved.
- You should monitor your RDM quota consumption periodically.
If you are not aware of your current quotas on an RDM, log in to the QRIScloud services portal and click on the Q collection you need.
The commands shown below (after the How /QRISdata works overview) will tell you how many GB or TB and how many inodes you have in your RDM Q storage allocation.
Always run them from a compute node (interactive job, or onBunya session).
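As noted in the list above, archive directories with many files before writing them to /QRISdata. A minimal sketch (all paths are placeholders):
# Create a single archive on /scratch first, then do one once-off write to /QRISdata
cd /scratch/user/$USER
tar -czf results.tar.gz results/
cp results.tar.gz /QRISdata/QNNNN/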
The /QRISdata filesystem provides access from Bunya to UQ RDM collections (as well as a smaller number of collections that predate UQ RDM).
The storage technology behind /QRISdata consists of multiple layers of storage, and software that manages the copies of your data within those multiple layers. There are also active links to other caches at St Lucia campus that allow you to drag and drop your file onto the St Lucia R:\ drive and have it appear automatically at the remote computer centre that houses Bunya and the RDM Q collections.
| Layer | Purpose | Response Time |
|---|---|---|
| GPFS Cache | Used for intersite transfers and is mounted onto Bunya HPC | Immediate once mounted onto Bunya |
| Zero Watt Storage (ZWS) | Disk drives that operate like tapes. Only powered on when required. | <1 minute to power on and begin reading |
| Robotic Tape Silo | Deep archive copies | Can take several minutes to commence reading |
The hierarchical storage management (HSM) software will move files downwards when they are not in active use in the top layer. If a file is required, but is not in the top layer, then it will be recalled from ZWS, or tape and copied into place on the GPFS Cache layer.
#Total size of the data in your RDM collection
du -sh --apparent-size /QRISdata/QNNNN
#How much is in the GPFS cache layer
du -sh /QRISdata/QNNNN
#How many files and folders for the entire collection (this is much quicker than du -s --inodes)
df -i /QRISdata/QNNNN
- If you need to ascertain where the data is, you can use these variants of the du command:
#You can explore subdirectories by adding to the path for the du command.
du -sh /QRISdata/QNNNN/something/*
du -sh --apparent-size /QRISdata/QNNNN/something/*
du -s --inodes /QRISdata/QNNNN/something/*
On a compute node via an onBunya session or interactive batch job
Use ls -salh FILEPATH
The output contains the size-occupied-on-disk in the first column and the actual-size in column 6
#This one is in the GPFS cache layer (size on disk matches actual size)
[uquser@bun104 Q0837]$ ls -salh Training.tar
367M -rw-r--r--+ 1 uquser Q0837RW 367M Oct 30 2023 Training.tar
#This one is in the GPFS cache layer too but the size on disk is actually bigger because files occupy at least one block (512)
[uquser@bun104 Q0837]$ ls -salh Readme.md
512 -rw-rw----+ 1 Q0837 Q0837RW 1 Sep 13 2023 Readme.md
#This one is not in GPFS cache (zero on disk but the actual filesize is 1.7MB)
[uquser@bun104 Q0837]$ ls -salh .LDAUtype1.tgz
0 -rw-rw----+ 1 Q0837 Q0837RW 1.7M Dec 16 2019 .LDAUtype1.tgz
#This one is also not in GPFS cache.
#Sometimes the size-occupied-on-disk value will be 512 (bytes) instead of 0.
[uquser@bun104 Q0837]$ ls -salh *.dat
512 -rw-rw----+ 1 Q0837 Q0837RW 2.5G Dec 25 2019 HappyHolidays.dat
On a compute node via an onBunya session or interactive batch job
Use the recall_medici command. It should be in your standard PATH, but in case it isn't, you can use the full path:
/usr/local/bin/recall_medici FILEPATH
Replace FILEPATH with the name of the file(s) you wish to retrieve. Wildcards are also supported, so you can retrieve all the files in a folder.
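For example, to recall all .dat files from a (hypothetical) results folder:
/usr/local/bin/recall_medici /QRISdata/QNNNN/results/*.dat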
The recall_medici command is also available on data.qriscloud.org.au if you don't have access to Bunya.
Between 20 and 40 minutes past the hour at 08:00, 12:00, 16:00, and 22:00, access to the RDMs from Bunya can be delayed. This happens while the service access controls are being updated. Access can appear non-responsive for a few minutes. It is best to wait 5 minutes and try again.
As stated in the first section of the /QRISdata section of this guide, you won't see your folder unless it has been mounted onto /QRISdata.
- Use ls /QRISdata/QNNNN/ (the / at the end is important) or cd /QRISdata/QNNNN to see the RDM storage record. Due to the automount mechanism, the RDM storage record needs to be accessed before it becomes visible.
Please refer to our UQRDM Overview Guide and links therein to UQRDM portal user information.