opensmi is split into two layers:
- Backend (Python CLI)
  - Poll nodes via SSH
  - Parse `nvidia-smi` output
  - Store allocations (JSON today; can migrate to SQLite)
  - Compute violations
  - Optional notifications (Slack)
- Frontend (TUI, Bun + OpenTUI)
  - Renders dashboard/detail views
  - Calls the Python CLI for actions (`alloc set/clear`, `kill`)
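
The "compute violations" step in the backend can be illustrated with a minimal sketch. It assumes a violation means a compute process running on a GPU that is allocated to a different user; the source does not define the rule, and the `GpuProcess` record and `find_violations` function are hypothetical names, not opensmi's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuProcess:
    """One compute process observed on a GPU (hypothetical record shape)."""
    node: str
    gpu_index: int
    pid: int
    user: str


def find_violations(allocations, processes):
    """Return processes running on a GPU allocated to a different user.

    `allocations` maps (node, gpu_index) -> allocated user. GPUs without an
    entry are treated as unallocated and never produce a violation here
    (an assumption of this sketch).
    """
    violations = []
    for proc in processes:
        owner = allocations.get((proc.node, proc.gpu_index))
        if owner is not None and owner != proc.user:
            violations.append(proc)
    return violations


allocations = {("node1", 0): "alice"}
observed = [
    GpuProcess("node1", 0, 4242, "alice"),  # matches the allocation
    GpuProcess("node1", 0, 4243, "bob"),    # GPU is allocated to alice
    GpuProcess("node1", 1, 4244, "bob"),    # GPU 1 is unallocated: ignored
]
for v in find_violations(allocations, observed):
    print(f"{v.user} (pid {v.pid}) is using {v.node}:gpu{v.gpu_index}")
```

Keeping the check a pure function over already-collected data means it can be tested without SSH access or a GPU.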
The backend runs a small bash script over SSH:
- reads GPU inventory
- reads active compute processes
- maps PID → Linux user via `/proc/<pid>`
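
The three polling steps above can be sketched in Python as follows. The `nvidia-smi` query flags are real, but the exact field lists opensmi requests are an assumption, and `parse_csv` and `pid_owner` are hypothetical helper names. The actual backend does this work in a bash script on the remote node; this sketch only shows the parsing and the PID-to-user lookup.

```python
import csv
import io
import os
import pwd

# Commands a poller could run on each node. The flags exist in nvidia-smi;
# the choice of fields is assumed for illustration.
GPU_QUERY = ["nvidia-smi", "--query-gpu=index,uuid,memory.used,memory.total",
             "--format=csv,noheader,nounits"]
APP_QUERY = ["nvidia-smi", "--query-compute-apps=pid,gpu_uuid,used_memory",
             "--format=csv,noheader,nounits"]


def parse_csv(text, fields):
    """Parse headerless nvidia-smi CSV output into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, (c.strip() for c in row))) for row in reader if row]


def pid_owner(pid, proc_root="/proc"):
    """Return the user owning <proc_root>/<pid>, or None if the process is gone."""
    try:
        uid = os.stat(os.path.join(proc_root, str(pid))).st_uid
    except FileNotFoundError:
        return None
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # UID has no passwd entry; fall back to the number


# Made-up sample output in the shape the two queries produce.
sample_gpus = "0, GPU-aaaa, 1024, 81920\n1, GPU-bbbb, 0, 81920\n"
sample_apps = "4242, GPU-aaaa, 1000\n"

gpus = parse_csv(sample_gpus, ["index", "uuid", "memory_used", "memory_total"])
apps = parse_csv(sample_apps, ["pid", "gpu_uuid", "used_memory"])
index_by_uuid = {g["uuid"]: int(g["index"]) for g in gpus}
for app in apps:
    print(app["pid"], "runs on GPU", index_by_uuid[app["gpu_uuid"]])
```

Joining processes to GPUs by UUID rather than by index avoids relying on the two queries enumerating devices in the same order.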
Default state dir: ~/.opensmi/
- `opensmi.json` (cluster topology)
- `allocations.json` (GPU assignments)
State dir is configurable via:
- `--state-dir`
- `OPENSMI_STATE_DIR`
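
A minimal sketch of how the state dir could be resolved from these two settings. The precedence used here (flag, then environment variable, then the default) is an assumption that the source does not confirm, and `resolve_state_dir` is a hypothetical name.

```python
import os
from pathlib import Path

DEFAULT_STATE_DIR = Path("~/.opensmi")


def resolve_state_dir(cli_value=None, environ=None):
    """Pick the state dir: --state-dir, then OPENSMI_STATE_DIR, then the default.

    The precedence order is assumed for this sketch.
    """
    environ = os.environ if environ is None else environ
    raw = cli_value or environ.get("OPENSMI_STATE_DIR") or DEFAULT_STATE_DIR
    return Path(raw).expanduser()


state_dir = resolve_state_dir(environ={"OPENSMI_STATE_DIR": "/tmp/opensmi-state"})
print(state_dir / "opensmi.json")      # cluster topology
print(state_dir / "allocations.json")  # GPU assignments
```

Passing `environ` explicitly keeps the function easy to test without modifying the real process environment.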
- `kill` is best-effort.
- Killing other users' processes typically requires passwordless sudo or running as root.
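
The best-effort behavior can be sketched as below. This is a local illustration under assumptions, not opensmi's implementation: the real command acts on a remote node over SSH, and `best_effort_kill` with its three return values is a hypothetical interface. The `-n` option makes `sudo` fail instead of prompting, so the fallback only succeeds when passwordless sudo is configured.

```python
import os
import signal
import subprocess


def best_effort_kill(pid, sig=signal.SIGTERM):
    """Try to signal `pid`; return "signaled", "gone", or "denied"."""
    try:
        os.kill(pid, sig)
        return "signaled"
    except ProcessLookupError:
        return "gone"  # the process already exited
    except PermissionError:
        pass  # owned by another user; try sudo below

    # `sudo -n` never prompts: it exits non-zero if a password is required.
    try:
        result = subprocess.run(
            ["sudo", "-n", "kill", f"-{int(sig)}", str(pid)],
            capture_output=True,
        )
    except FileNotFoundError:
        return "denied"  # sudo is not installed
    return "signaled" if result.returncode == 0 else "denied"
```

A caller would report "denied" to the user rather than retrying, which matches the best-effort wording above.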