parallelworks/hpc_status
HPC Status Monitor

A single pane of glass for your HPC fleet. At a glance: what's up, what's slow, where your jobs will wait, and how much allocation you have left.

What it's for

You have work to run on HPC systems, but the systems are scattered across sites, schedulers, and login nodes. Before submitting a job you want to know:

  • Is the system even up right now?
  • Which queue will get me running fastest?
  • Am I close to burning through my allocation?
  • Is $SCRATCH about to purge my files?

The Status Monitor answers those questions in one place, refreshed continuously, so you don't have to ssh around and run five different commands to decide where to submit.

What you'll see

Fleet status. Every HPC system you have access to, with a status (UP / DEGRADED / MAINTENANCE / DOWN), its login node, its scheduler, and when it was last checked. Click a system for the full details page.

Queue health. For each system, live queue depth, node availability, and core demand — so you can pick the queue that isn't backed up.

Quota usage. Your allocations in core-hours, how fast you're burning them, and warnings before you hit the limit. Broken down by subproject where relevant.

Storage. Capacity and usage for $HOME, $WORK, and $SCRATCH on every system, with purge-window reminders for scratch.

Insights. Automatic recommendations — "this queue is draining, try that one", "you're at 92% of your allocation", "scratch is filling up".
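The allocation warning is simple threshold arithmetic. A minimal sketch of the idea — the 90% threshold and the USED/LIMIT figures here are illustrative, not the monitor's actual values, which come from the scheduler's accounting:

```shell
# Hypothetical core-hour figures; the real monitor reads these from the
# scheduler's allocation accounting, not from hard-coded variables.
USED=46000    # core-hours consumed so far
LIMIT=50000   # core-hours allocated
PCT=$(( USED * 100 / LIMIT ))   # integer percent of allocation used

if [ "$PCT" -ge 90 ]; then
  echo "WARNING: at ${PCT}% of allocation"
fi
```

With these numbers the check fires at 92%, matching the kind of message shown above.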

Quick start

./scripts/run.sh

Then open http://localhost:8080.

Watching your own clusters

If you're using Parallel Works, authenticate and the dashboard will pick up every cluster you have access to:

pip install pw-client
pw auth             # paste your ACTIVATE API key
./scripts/run.sh

It works with PBS/Slurm HPC clusters, GPU servers (via nvidia-smi), and plain compute nodes.

Help while you're using it

Every page has a Help button in the top-right with a quick reference for the HPC terms you'll see (core-hours, walltime, draining queues, the difference between $HOME / $WORK / $SCRATCH, and so on). Each metric also has a tooltip that explains what it means.
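For example, core-hours — one of the terms the help reference covers — are just cores multiplied by hours of walltime. A quick sketch with made-up job numbers:

```shell
# Hypothetical job: 128 cores held for 6 hours of walltime.
CORES=128
HOURS=6
CHARGE=$(( CORES * HOURS ))   # core-hours this job is charged
echo "job will be charged ${CHARGE} core-hours"
```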

Deployments

The monitor supports branded deployments. The HPCMP build, for example, uses the HPCMP purple palette and logo mark — launch it with:

CONFIG_FILE=configs/config.hpcmp.yaml ./scripts/run.sh
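The branding presumably lives in that YAML file. A hypothetical sketch of what a config like configs/config.hpcmp.yaml might hold — these key names are guesses for illustration, not the monitor's actual schema:

```yaml
# Illustrative only: keys are assumptions, not the real config schema.
branding:
  name: HPCMP Status Site
  palette: purple
  logo: assets/hpcmp-logo.svg
```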

For operators and developers

License

See LICENSE.
