SLURMify is a tool for creating and validating SLURM job scripts using a Python-based configuration approach. This documentation explains how to use the library to define SLURM jobs with proper validation against system constraints.
SLURMify allows you to create SLURM job scripts by defining Python objects that represent your job requirements. The tool handles validation and generates proper SLURM batch scripts.
SLURMify uses the following classes from config_info.py to define job configurations:
- Resources - Define computational resource requirements
- Environment - Define environment setup commands
- System - Define system configuration (contains Resources)
- Logs - Define log file paths
- Module/Modules - Define software modules to load
- Job - Define a complete SLURM job
- Jobs - Collection of Job objects
To create a SLURM job, you need to create a Python file with your job configuration:
from utils.config_info import Job, System, Jobs, Resources
# Create a Jobs collection
Jobs = Jobs()
# Add a simple job
Jobs.add_job(
Job(
name="MinimalExample",
system=System(
name="lxp",
resources=Resources(
account="lxp", # Your project account
),
),
exec_command="srun echo 'Hello World'", # Command to execute
)
)from utils.config_info import Job, Environment, System, Jobs, Resources, Logs, Modules, Module
# Create environment setup
python_env = Environment(
name="PyEnv",
commands=[
"pip install -r requirements.txt",
"python setup.py install"
]
)
# Define resources
resources = Resources(
account="lxp", # Your project account
cores=128, # CPUs per task
gpu=4, # GPUs per task
mode="default", # QoS mode
nodes=1, # Number of nodes
time="2:0:0", # Maximum runtime
partitions="gpu" # Partition to use
)
# Define system
my_system = System(
name="MyProgram",
resources=resources
)
# Define modules to load
modules = Modules(
list_of_modules=[
Module(name="env/release/2023.1"),
Module(name="Apptainer/1.3.1-GCCcore-12.3.0")
]
)
# Define log locations
logs = Logs(
default="job-%j.out",
error="job-%j.err"
)
# Create job
my_job = Job(
name="MyJob",
environments=[python_env],
system=my_system,
logs=logs,
modules=modules,
exec_command="srun python my_script.py"
)
# Create Jobs collection and add job
Jobs = Jobs()
Jobs.add_job(my_job)The Resources class accepts the following parameters:
| Parameter | Description | Default |
|---|---|---|
| account | Project account ID | Required |
| partitions | SLURM partition (cpu, gpu, fpga, largemem) | "cpu" |
| cores | CPUs per task | 1 |
| gpu | GPUs per task | None |
| mode | QoS mode | "default" |
| nodes | Number of nodes | None (auto-calculated) |
| time | Maximum job runtime (HH:MM:SS) | "00:15:00" |
| ntasks | Number of tasks | 1 |
SLURMify supports various QoS modes with different constraints:
| QoS Mode | Max Time | Max Nodes | Description |
|---|---|---|---|
| dev | 06:00 | 1 | Interactive development |
| test | 00:30 | 5% | Testing and debugging |
| short | 06:00 | 5% | Small jobs for backfilling |
| default | 48:00 | 25% | Standard production jobs |
| long | 144:00 | 5% | Long-running jobs |
| large | 24:00 | 70% | Very large scale executions |
After creating your configuration file (e.g., my_config.py), you can generate SLURM scripts:
-
Update the path in
main.pyto point to your config file. -
Run the main script:
python main.py
The generated scripts will be placed in the out directory.
For more advanced usage examples, check the sample configurations:
minimalTest.py- Minimal configurationpySimpleConfig.py- Simple but complete configurationcomplexConfig.py- Complex example with multiple components
SLURMify automatically validates your job configuration against system constraints:
- Checks if resources are within system limits
- Validates QoS mode and time constraints
- Ensures proper GPU allocation for GPU partitions
- Optimizes node allocation based on requested CPUs
When validation issues are detected, SLURMify will either:
- Make automatic corrections for minor issues
- Report errors for major issues that need your attention
For a complete example of a multi-node GPU job, see complexConfig.py which demonstrates:
- Setting up a vLLM environment
- Configuring a head node with workers
- Defining environment variables and commands
- Setting up Ray for distributed processing
SLURMIfy does not implement all parameters of SLURM at least not if you need to validate them. There is a way to allow any SLURM parameter to be passed to the final script. Environments are passed straight to the script without any validation. This allows you to pass any SLURM parameter to the script. This is not recommended as it bypasses the validation process and can lead to errors in your job submission. This only works for SLURMIfy configuration files not In the webUI since the webUI uses RestAPI to submit jobs and does not allow any SLURM parameters to be passed.
SLURMify provides a flexible command-line interface with various options:
| Parameter | Description |
|---|---|
-f, --file |
Path to the Python configuration file |
-I, --init |
Generate a new configuration file at specified path |
-o, --output |
Output file path for the generated SLURM script |
-t, --test |
Run in testing mode |
-v, --verbose |
Enable verbose output |
--validation-only |
Only validate configuration without generating script |
| Parameter | Description |
|---|---|
--web |
Enable web interface mode |
--api |
Enable API mode |
| Parameter | Description |
|---|---|
--module-search |
Search for a module in the module list |
--create-module-list |
Create a module list from the HPC system |
The generate subcommand creates SLURM scripts directly from command-line parameters:
slurmify generate --name myjob --account p200000 --command "srun python script.py"| Parameter | Description |
|---|---|
--name |
Name of the job |
--account |
Project account ID (e.g., p200000) |
--command |
Command to execute (e.g., 'srun python script.py') |
| Parameter | Description | Default |
|---|---|---|
--partition |
SLURM partition to use | "cpu" |
--cores |
CPUs per task | 1 |
--nodes |
Number of nodes | 1 |
--ntasks |
Number of tasks | 1 |
--gpu |
GPUs per task | None |
--qos |
Quality of service mode | "default" |
--time |
Maximum runtime (HH:MM:SS) | "00:15:00" |
| Parameter | Description | Default |
|---|---|---|
--logs-default |
Path for stdout logs | "slurmify-%j.out" |
--logs-error |
Path for stderr logs | "slurmify-%j.err" |
--env |
Environment setup commands (can use multiple times) | None |
--module |
Modules to load (can use multiple times) | None |
--output |
Output file path | ./out/<job_name>.sh |
Generate a SLURM script from a configuration file:
python main.py -f configs/my_config.pyGenerate a new configuration template:
python main.py --init new_config.pyGenerate a SLURM script directly with parameters:
python main.py generate --name test-job --account p200000 --partition gpu --gpu 4 --time 1:00:00 --command "srun python train.py" --module "env/release/2023.1" --module "CUDA/12.0.1"Validate a configuration file without generating a script:
python main.py -f configs/my_config.py --validation-onlySLURMify includes a set of test configurations to ensure the library works as expected. You can run the tests using:
python main.py -tSLURMify provides a web interface for job submission. You can access it by running:
./run.shThis will start the API required for the AI Agent and the web interface. You can then access the web UI at http://localhost:8501.
The script is mostly to run as a service not for local usage. It will start the web interface on port 8501 by default. You can also run it with the --web option to enable the web interface mode only.
poython main.py --webThe Api can be run in the same way:
python main.py --api