Open
Description
When jobs are resubmitted to Slurm using gridtk resubmit, there is a delay before the updated job information appears in sacct. Since gridtk list relies on sacct internally, it continues to display the previous state of the job for some time after resubmission, while direct Slurm commands like squeue show the current accurate state.
Example demonstrating the issue:
$ gridtk resubmit
Resubmitted job 1
$ gridtk list
job-id    slurm-id    nodes    state          job-name    output                    dependencies    command
--------  ----------  -------  -------------  ----------  ------------------------  --------------  ----------------------------------------------------------------------------------------
1         2894248     node01   CANCELLED (0)  test-ff     logs/test-ff.2894248.out                  gridtk submit --time 0-8 --mem 32G test.sh
$ squeue --me
JOBID    PARTITION  NAME     USER  ST  TIME     NODES  NODELIST(REASON)
2894248  cpu        test-ff  amir  R   INVALID  1      mode01
$ gridtk list
job-id    slurm-id    nodes    state          job-name    output                    dependencies    command
--------  ----------  -------  -------------  ----------  ------------------------  --------------  ----------------------------------------------------------------------------------------
1         2894248     node01   CANCELLED (0)  test-ff     logs/test-ff.2894248.out                  gridtk submit --time 0-8 --mem 32G test.sh
As shown above, squeue correctly displays the job as running (status "R"), but gridtk list still shows it as "CANCELLED (0)" due to the delay in sacct updates.
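One possible workaround, sketched here only as an illustration and not as gridtk's actual implementation, is to ask squeue for the live state first and fall back to sacct only when the job is no longer in the queue. The function names `run_command`, `first_line`, and `current_state` below are hypothetical; the Slurm options used (`--noheader`, `--format`, `--jobs`, `--parsable2`, `--allocations`) are standard squeue/sacct flags.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes an argument list and returns the command's stdout.
# Making it injectable lets the lookup logic be exercised without Slurm.
Runner = Callable[[Sequence[str]], str]


def run_command(args: Sequence[str]) -> str:
    """Run a command and return its stdout, or "" if it fails or is missing."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


def first_line(text: str) -> Optional[str]:
    """Return the first non-blank line of text, stripped, or None."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None


def current_state(slurm_id: int, run: Runner = run_command) -> Optional[str]:
    """Hypothetical helper: prefer squeue's live state over sacct's record."""
    job = str(slurm_id)
    # squeue reflects the scheduler's live view, so a freshly requeued job
    # shows up here before the accounting database catches up.
    live = first_line(run(["squeue", "--noheader", "--format=%T", "--jobs", job]))
    if live is not None:
        return live
    # The job is no longer queued or running: use the accounting record.
    return first_line(
        run(["sacct", "--noheader", "--parsable2", "--allocations",
             "--format=State", "--jobs", job])
    )
```

Calling `current_state(2894248)` on a cluster would then report the squeue state while the job is queued or running, and the sacct state once it has left the queue. The trade-off is one extra squeue call per lookup.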