Supports per-GPU compute limits (number of processes, utilization rate, memory usage) on a per-(UNIX-)user/worker basis, load balancing, multiple nodes (machines), and more.
Tested with tensorflow-gpu tasks.
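As a rough illustration of the per-GPU limits mentioned above, the admission decision can be sketched in plain Python. This is a hypothetical sketch, not this package's API; the GPU statistics are mocked rather than read from NVML, and all names (`GPUStat`, `Limits`, `can_schedule`) are invented for illustration:

```python
# Hypothetical sketch of per-GPU admission limits (not this package's API).
# GPU stats are mocked; a real implementation would query NVML (e.g. pynvml).
from dataclasses import dataclass

@dataclass
class GPUStat:
    num_procs: int      # processes currently running on the GPU
    utilization: float  # compute utilization, 0.0-1.0
    mem_used_mb: int    # memory in use (MB)

@dataclass
class Limits:
    max_procs: int
    max_utilization: float
    max_mem_mb: int

def can_schedule(stat: GPUStat, limits: Limits) -> bool:
    """Admit a new job on this GPU only if every limit is respected."""
    return (
        stat.num_procs < limits.max_procs
        and stat.utilization < limits.max_utilization
        and stat.mem_used_mb < limits.max_mem_mb
    )

limits = Limits(max_procs=2, max_utilization=0.9, max_mem_mb=10000)
print(can_schedule(GPUStat(1, 0.5, 4000), limits))  # under every limit -> True
print(can_schedule(GPUStat(2, 0.5, 4000), limits))  # process cap reached -> False
```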
Installation (a virtual Python environment such as venv or conda is recommended)
cd /path/to/install
git clone https://github.com/jigangkim/nvidia-gpu-scheduler.git
cd /path/to/install/nvidia-gpu-scheduler
pip install . # standard installation
pip install -e . # editable (develop mode) installation
Usage (dummy example: json)
cd /path/to/install/nvidia-gpu-scheduler
# Run job server
python example.py --identity scheduler --config_ext .json
# Run worker
python example.py --identity worker --config_ext .json
Usage (dummy example: gin)
cd /path/to/install/nvidia-gpu-scheduler
# Run job server
python example.py --identity scheduler --config_ext .gin
# Run worker
python example.py --identity worker --config_ext .gin
Usage (OpenAI baselines example)
cd /path/to/install/nvidia-gpu-scheduler
# Run job server
python example_openaibaselines.py --identity scheduler
# Run worker
python example_openaibaselines.py --identity worker
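Each example above starts a job server (scheduler) and one or more workers. As a rough illustration of that split, and not this package's internals, the pattern can be modeled with a multiprocessing queue: the scheduler side enqueues job configs and workers pull and run them. All names here are invented for illustration:

```python
# Hypothetical sketch of the scheduler/worker pattern (not this package's
# internals): a scheduler enqueues job configs, workers pull and "run" them.
import multiprocessing as mp

def worker(job_queue, results):
    """Pull job configs until a None sentinel arrives."""
    while True:
        job = job_queue.get()
        if job is None:  # sentinel: no more jobs for this worker
            break
        results.put(f"ran {job}")

def run_jobs(configs, num_workers=2):
    job_queue = mp.Queue()
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(job_queue, results))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for cfg in configs:       # scheduler side: enqueue job configs
        job_queue.put(cfg)
    for _ in procs:           # one sentinel per worker
        job_queue.put(None)
    for p in procs:
        p.join()
    return sorted(results.get() for _ in configs)

if __name__ == "__main__":
    print(run_jobs(["job1.json", "job2.json", "job3.json"]))
```

In the actual package the scheduler and workers are separate processes, possibly on separate machines, which is why the examples launch them as two invocations of the same script with different `--identity` values.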