Useful scripts and examples for The University of Chicago's Research Computing Center (https://rcc.uchicago.edu). All SLURM examples target the Midway3 cluster using the caslake partition.
- `.bashrc` — A starter `.bashrc` that adds safety aliases to prevent accidental `rm *` disasters.
- `src/machine_info.py` — Reports the number of physical cores, logical cores, and threads per core using `psutil` and `os.cpu_count()`. Useful for understanding the hardware topology of a node.
- `src/monkey_shakespear.py` — Generates random Shakespearean-style sentences by combining subjects, verbs, objects, and adverbs. Each sentence is prefixed with the hostname, making it easy to see which node produced it. Used by several of the SLURM demos below.
- `data/` — A collection of FASTA files and a sample database used by the examples.
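The core-counting logic in `machine_info.py` can be sketched with the standard library alone (the real script also uses `psutil` for physical-core counts; this sketch is an assumption about its structure, not the actual file):

```python
import os

def cpu_summary():
    """Summarize the CPU topology visible to this process."""
    logical = os.cpu_count()  # logical cores (hardware threads)
    # Cores this process may actually use (Linux-only; a SLURM cgroup
    # can make this smaller than os.cpu_count()).
    try:
        usable = len(os.sched_getaffinity(0))
    except AttributeError:  # non-Linux platforms
        usable = logical
    return {"logical": logical, "usable": usable}

if __name__ == "__main__":
    info = cpu_summary()
    print(f"logical cores: {info['logical']}, usable cores: {info['usable']}")
```

On a compute node the "usable" count reflects your allocation, which is usually what matters for sizing worker pools.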
The examples are numbered in order of increasing complexity. Each builds on concepts from the previous one.
The simplest possible batch job: one node, one task, one core. Demonstrates basic `#SBATCH` directives, SLURM environment variables, and output file naming with `%j`. No `srun` is needed because the script runs directly on the single allocated node.
Scripts:
- `single_node.sbatch` — Prints hostname, sleeps for 10 seconds, and reports start/end times.
- `single_node_monkey.sbatch` — Same structure, but runs `monkey_shakespear.py` instead of sleeping.
- `debug.sbatch` — An example BLASTp job showing how to print SLURM variables for debugging.
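The same debugging idea works from Python: read the SLURM environment variables with a safe default so the script also runs on a login node. The variable names are standard SLURM; the helper below is a sketch, not part of the repo:

```python
import os

SLURM_VARS = ["SLURM_JOB_ID", "SLURM_JOB_NODELIST",
              "SLURM_NTASKS", "SLURM_CPUS_PER_TASK"]

def slurm_env(default="unset"):
    """Collect common SLURM variables, falling back outside a job."""
    return {name: os.environ.get(name, default) for name in SLURM_VARS}

for name, value in slurm_env().items():
    print(f"{name}={value}")
```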
Demonstrates a common multi-node mistake. The script requests 6 nodes and lists them with `scontrol show hostnames`, but the actual command runs without `srun` — so only the head node executes it. The `srun` line is commented out to make the bug visible.
Key lesson: Without `srun`, commands in a multi-node job run only on the head node. The other nodes sit idle while you're still charged for them. Uncomment the `srun` line to see the correct behavior.
The corrected version of the hello world pattern. Uses `srun` to run `monkey_shakespear.py` on 24 nodes simultaneously, with all output appended to a single shared file (`monkey_shakespear_book.txt`). Each line identifies which node wrote it via the hostname prefix.
Introduces SLURM array jobs with `--array=0-10`. Each of the 11 tasks receives a unique `$SLURM_ARRAY_TASK_ID` and runs independently — no `srun` required, since each array element is already its own SLURM job. The script demonstrates how to use `printf` to zero-pad the task ID (e.g., `00`, `01`, ..., `10`) for consistent filenames.
Output files use the `%j.%a` pattern, where `%j` is the job ID and `%a` is the array task ID.
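The bash `printf` zero-padding has a direct Python equivalent, handy when the filename is built inside a Python task script. The filename pattern below is illustrative, not taken from the repo:

```python
import os

def chunk_filename(prefix="tempdb"):
    """Build a zero-padded filename from the array task ID."""
    # Default to task 0 so the script also runs outside an array job.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    return f"{prefix}_{task_id:02d}.out"  # e.g. task 7 -> tempdb_07.out
```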
A complete dynamic pipeline driven by a Python script (`custom_array.py`) that:

- Reads a database file and splits it into N equal-sized chunks.
- Writes each chunk to a temporary file (`tempdb_0.db`, `tempdb_1.db`, ...).
- Generates a custom `.sbatch` file at runtime with `--array=0-N`.
- Submits it with `subprocess.run(['sbatch', ...])` and captures the job ID.
- Generates a cleanup `.sbatch` file with `--dependency=afterok:<job_id>`.
- Submits the cleanup job, which moves all output and temporary files into a workspace directory.
This is a pattern for production workflows where the number of tasks depends on the input data.
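The heart of that pattern can be sketched as two pure functions: one to split records into near-equal chunks and one to render the array `.sbatch` text. Function names and the sbatch template are assumptions, not the actual `custom_array.py`, which also handles submission and cleanup:

```python
def split_records(records, n_chunks):
    """Split a list of records into n_chunks near-equal chunks."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks absorb one leftover record each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks

def render_array_sbatch(n_chunks, partition="caslake", account="mpcs56430"):
    """Render an sbatch file whose array size matches the chunk count."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --account={account}",
        f"#SBATCH --array=0-{n_chunks - 1}",
        # Hypothetical per-chunk command; each task picks its own chunk file.
        "python process_chunk.py tempdb_${SLURM_ARRAY_TASK_ID}.db",
    ])
```

Keeping these steps as functions makes the pipeline testable without actually calling `sbatch`.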
Shows how to chain jobs so that each one waits for the previous one to finish. Includes both a bash and a Python implementation.
Bash (`dependency.bash`): Uses `awk '{print $4}'` to extract the job ID from `sbatch` output, then passes it via `--dependency=afterok` and `--dependency=afterany`.
Python (`dependency.py`): Wraps submission in a function with error handling. If any submission fails, all previously submitted jobs are cancelled with `scancel`. Each job appends a line to `workflow.out`, creating a log of the execution chain.
Dependency types:

| Flag | Behavior |
|---|---|
| `afterok` | Run only if the dependency succeeded (exit code 0) |
| `afterany` | Run regardless of success or failure |
| `afternotok` | Run only if the dependency failed |
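The job-ID parsing and rollback logic in `dependency.py` might look like this sketch. The `run` parameter is an assumption added so the submission command can be faked for testing; the real script calls `sbatch` directly:

```python
import subprocess

def parse_job_id(sbatch_output):
    """Extract the job ID from 'Submitted batch job <id>'."""
    return sbatch_output.strip().split()[-1]

def submit_chain(scripts, run=None):
    """Submit scripts chained with afterok; cancel everything on failure."""
    if run is None:
        run = lambda cmd: subprocess.run(cmd, capture_output=True,
                                         text=True, check=True).stdout
    job_ids = []
    for script in scripts:
        cmd = ["sbatch"]
        if job_ids:  # chain on the most recently submitted job
            cmd.append(f"--dependency=afterok:{job_ids[-1]}")
        cmd.append(script)
        try:
            job_ids.append(parse_job_id(run(cmd)))
        except Exception:
            # Roll back: cancel every job submitted so far.
            for job_id in job_ids:
                subprocess.run(["scancel", job_id])
            raise
    return job_ids
```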
A hybrid approach that combines SLURM multi-node allocation with Python's `multiprocessing.Pool`. The SLURM directives request 4 nodes with 4 CPUs per task, and the Python script reads SLURM environment variables to configure the pool size accordingly.
Each worker receives a database chunk identifier, simulates processing with a random sleep, and writes output to a file tagged with the job ID, node count, and chunk number. This pattern maximizes utilization by using SLURM for inter-node distribution and `multiprocessing` for intra-node parallelism.
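Sizing the pool from the allocation can be sketched as follows. The worker is a stand-in for the real chunk processing, and the serial fallback for a login node is an assumption, not taken from the repo script:

```python
import multiprocessing
import os

def process_chunk(chunk_id):
    """Stand-in worker: tag the chunk with the handling process ID."""
    return chunk_id, os.getpid()

def run_pool(n_chunks):
    """Run workers in a pool sized from the SLURM allocation."""
    n_procs = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    if n_procs == 1:  # login node / serial fallback
        return [process_chunk(i) for i in range(n_chunks)]
    with multiprocessing.Pool(processes=n_procs) as pool:
        return pool.map(process_chunk, range(n_chunks))

if __name__ == "__main__":
    for chunk_id, pid in run_pool(4):
        print(f"chunk {chunk_id} handled by pid {pid}")
```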
A benchmarking demonstration using BLASTp for sequence alignment. Note: This has not been updated for Midway3.
All scripts assume the `mpcs56430` account on the `caslake` partition. To run any example:
```
ssh <your_cnetid>@midway3.rcc.uchicago.edu
cd RCC-Utilities/<example_directory>
sbatch <script_name>.sbatch
```

For the Python-driven workflows (`5_fragment_array_job`, `6_dependency_workflows`), run the Python script directly from the login node:

```
python custom_array.py
python dependency.py
```