xqute schedules, submits, monitors, and manages batch jobs across local, HPC, cloud, and container backends — all through a single async Python API. It's built for bioinformatics pipelines, ML hyperparameter sweeps, batch data processing, and any workload that needs to fan out across heterogeneous compute.
## Features

- Blazingly fast — built on `asyncio` with `uvloop`; thousands of jobs, minimal overhead
- Six scheduler backends — local, SGE, Slurm, SSH, Google Cloud Batch, Docker/Podman/Apptainer
- Plugin system — 14 lifecycle hooks let you add logging, notifications, or custom logic without touching core code
- Error strategies — automatic retry with configurable limits, or halt-the-world on first failure
- File-based status tracking — jobs self-report via status files; survives network failures and scheduler quirks
- Daemon mode — `keep_feeding` lets you add jobs dynamically at any point
- Cloud storage — workdirs on GCS (`gs://`), Azure (`az://`), or S3 (`s3://`)
- Path translation — seamless `SpecPath` / `MountedPath` duality for cross-machine execution
- Timeouts — per-job timeout enforcement via coreutils `timeout`
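For example, the error strategy is chosen when constructing the `Xqute` instance. A minimal sketch, assuming the `error_strategy` and `num_retries` constructor parameters:

```python
# Assumed parameter names for illustration: retry each failed job
# up to 3 times; "halt" instead stops everything on first failure.
xqute = Xqute(
    forks=4,
    error_strategy="retry",
    num_retries=3,
)
```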
## Installation

```shell
pip install xqute
```

With optional extras:

```shell
pip install 'xqute[gs]'       # Google Cloud Storage support
pip install 'xqute[cloudsh]'  # Cloud shell support
```

## Quick start

```python
import asyncio
from xqute import Xqute

async def main():
    xqute = Xqute(forks=3)
    for _ in range(10):
        await xqute.feed(["sleep", "1"])
    await xqute.run_until_complete()

asyncio.run(main())
```

## Daemon mode

```python
xqute = Xqute(forks=3)

# Start — returns immediately
await xqute.run_until_complete(keep_feeding=True)

# Feed jobs dynamically
for i in range(100):
    await xqute.feed(["python", "train.py", str(i)])
    await asyncio.sleep(0.1)

# Signal done and wait for everything to finish
await xqute.stop_feeding()
```

## Schedulers

xqute ships with six schedulers. Swap the `scheduler` argument to switch.
### Slurm

```python
xqute = Xqute(
    scheduler="slurm",
    forks=100,
    scheduler_opts={
        "partition": "gpu",
        "time": "24:00:00",
        "mem": "8G",
        "gres": "gpu:1",
    },
)
```

### SGE

```python
xqute = Xqute(
    scheduler="sge",
    forks=100,
    scheduler_opts={
        "q": "1-day",
        "l": ["h_vmem=4G", "gpu=1"],
    },
)
```

### SSH

```python
xqute = Xqute(
    scheduler="ssh",
    forks=100,
    scheduler_opts={
        "servers": {
            "node1": {"user": "alice", "host": "node1.example.com", "keyfile": "/home/alice/.ssh/id_rsa"},
            "node2": {"user": "alice", "host": "node2.example.com", "keyfile": "/home/alice/.ssh/id_rsa"},
        }
    },
)
```

Note: SSH servers must share the same filesystem and use key-based auth.

### Google Cloud Batch

```python
xqute = Xqute(
    scheduler="gbatch",
    forks=100,
    scheduler_opts={
        "project": "my-gcp-project",
        "location": "us-central1",
        "taskGroups": [{
            "taskSpec": {
                "runnables": [{
                    "container": {"imageUri": "ubuntu", "entrypoint": "bash", "commands": ["-c", "..."]}
                }]
            },
            "taskCount": 500,
            "parallelism": 100,
        }],
    },
)
```

### Container (Docker/Podman/Apptainer)

```python
xqute = Xqute(
    scheduler="container",
    forks=10,
    scheduler_opts={
        "image": "docker://python:3.12",
        "entrypoint": "/bin/bash",
        "bin": "docker",
        "volumes": ["/data:/data"],
        "envs": {"TF_CPP_MIN_LOG_LEVEL": "2"},
    },
)
```

## Plugins

14 lifecycle hooks via simplug. Example — send Slack notifications on failures:
```python
from xqute import simplug as pm

@pm.impl
async def on_job_failed(scheduler, job):
    import requests
    requests.post(WEBHOOK, json={"text": f"Job {job.index} failed"})
```

See the Plugins page for the full list of hooks and more examples.
## Documentation

Full documentation is at pwwang.github.io/xqute:
- Quick Start — get running in minutes
- User Guide — initialization, error handling, monitoring
- Schedulers — all six backends with config reference
- Plugins — lifecycle hooks and plugin authoring
- Advanced — custom schedulers, Dask/Airflow integration, perf tuning
- API Reference — auto-generated from source
## Custom schedulers

Implement three async methods to add your own backend:

```python
from xqute import Scheduler

class MyScheduler(Scheduler):
    name = "mycluster"

    async def submit_job(self, job):
        """Submit the job and return a unique job ID."""

    async def kill_job(self, job):
        """Kill the job given its JID."""

    async def job_is_running(self, job):
        """Return True if the job is still running."""
```

Then pass it directly: `Xqute(scheduler=MyScheduler, ...)`.
## How it works

Jobs are wrapped in a bash template with an EXIT trap that writes status files (`job.status`, `job.rc`, `job.stdout`, `job.stderr`) into a per-job metadir. The polling loop reads these files — no scheduler API calls for status. This design makes xqute resilient to network hiccups and scheduler oddities.
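The wrapper idea can be sketched in a few lines of shell. This is a simplified illustration, not xqute's actual template — the filenames match the convention above, but the structure here is minimal:

```shell
#!/usr/bin/env bash
# Sketch of the wrapper pattern: run the command in a subshell whose
# EXIT trap records the outcome in a per-job metadir.
metadir=./job.0
mkdir -p "$metadir"

(
  cleanup() {
    rc=$?
    echo "$rc" > "$metadir/job.rc"
    if [ "$rc" -eq 0 ]; then
      echo FINISHED > "$metadir/job.status"
    else
      echo FAILED > "$metadir/job.status"
    fi
  }
  trap cleanup EXIT

  echo RUNNING > "$metadir/job.status"
  # the wrapped command; stdout/stderr are captured to files
  sh -c 'echo hello' > "$metadir/job.stdout" 2> "$metadir/job.stderr"
)

cat "$metadir/job.status"   # FINISHED
```

Because the trap fires on every exit path — success, failure, or kill signal — a poller only ever has to read files, which is what makes the design robust.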
```
INIT → QUEUED → SUBMITTED → RUNNING → FINISHED
                    ↓           ↓
                 KILLING  →  FAILED
```
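The state diagram can be read as a small transition table. The sketch below is illustrative only — it mirrors the diagram, not xqute's internal API:

```python
# Hypothetical transition table mirroring the state diagram above.
TRANSITIONS = {
    "INIT": {"QUEUED"},
    "QUEUED": {"SUBMITTED"},
    "SUBMITTED": {"RUNNING", "KILLING"},
    "RUNNING": {"FINISHED", "KILLING"},
    "KILLING": {"FAILED"},
    "FINISHED": set(),
    "FAILED": set(),
}

def can_transition(src: str, dst: str) -> bool:
    """Return True if the diagram allows moving from src to dst."""
    return dst in TRANSITIONS.get(src, set())
```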
## Contributing

Issues and PRs welcome on GitHub. See AGENTS.md for dev setup and conventions.

## License

MIT — see LICENSE.