
The dynamic partition does not work with the task/cgroup TaskPlugin #388

@thibauthourlier


Hi,
I tried to use the dynamic partition. I could add the different node types, but the jobs were dying quickly. The error was:

  slurmstepd: error: common_file_write_uints: write value '39981' to '/sys/fs/cgroup/cpuset/slurm/uid_20006/job_6457/step_batch/cgroup.procs' failed: No space left on device
  slurmstepd: error: unable to add pids to '/sys/fs/cgroup/cpuset/slurm/uid_20006/job_6457/step_batch'
  slurmstepd: error: task_g_set_affinity: File exists
  slurmstepd: error: _exec_wait_child_wait_for_parent: failed: Interrupted system call
  slurmstepd: error: job_manager: exiting abnormally: Slurmd could not execve job
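
The path in the first error is a cgroup v1 cpuset hierarchy. On cgroup v1, the kernel refuses to attach a process to a cpuset whose `cpuset.cpus` or `cpuset.mems` is empty and returns `ENOSPC`, which is reported as "No space left on device". Whether that is what happened on these dynamic nodes is an assumption; the log alone does not confirm it. A minimal bash sketch for checking an affected node, assuming the `/sys/fs/cgroup/cpuset/slurm` root seen in the log (the function name is hypothetical):

```bash
#!/usr/bin/env bash
# Report cgroup v1 cpuset directories whose cpuset.cpus or cpuset.mems is empty.
# The kernel refuses to attach a process to such a cpuset with ENOSPC
# ("No space left on device"). The default root is the Slurm path from the
# log; it is an assumption for any other node or Slurm setup.
find_empty_cpusets() {
  local root=${1:-/sys/fs/cgroup/cpuset/slurm}
  local dir cpus mems
  while IFS= read -r dir; do
    # Only look at directories that are actually cpuset cgroups.
    [[ -f $dir/cpuset.cpus && -f $dir/cpuset.mems ]] || continue
    cpus=$(<"$dir/cpuset.cpus")
    mems=$(<"$dir/cpuset.mems")
    if [[ -z $cpus || -z $mems ]]; then
      printf '%s cpus=[%s] mems=[%s]\n' "$dir" "$cpus" "$mems"
    fi
  done < <(find "$root" -type d 2>/dev/null)
}
```

Running `find_empty_cpusets` on a dynamic node (ideally while a job is starting, since per-job directories are short-lived) lists any cpuset that has no CPUs or no memory nodes; no output means every cpuset under the root has both.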

When I changed the Slurm configuration to use only task/affinity as the TaskPlugin, the jobs ran normally. The hpc and htc partitions do not have this problem.
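
When switching between the two configurations, it can help to confirm which task plugins the running Slurm daemons have actually loaded. A small bash sketch that parses `scontrol show config` output, assuming its usual `Name = value` layout (the helper names are hypothetical):

```bash
#!/usr/bin/env bash
# Print the TaskPlugin value from `scontrol show config` output on stdin,
# e.g. "TaskPlugin   = task/affinity,task/cgroup" -> "task/affinity,task/cgroup".
# The trailing [ \t] in the pattern keeps TaskPluginParam from matching.
task_plugin_value() {
  awk -F'=' '/^TaskPlugin[ \t]/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Exit with status 0 if task/cgroup is one of the configured task plugins.
uses_task_cgroup() {
  task_plugin_value | tr ',' '\n' | grep -qx 'task/cgroup'
}
```

Usage: `scontrol show config | uses_task_cgroup && echo "task/cgroup is enabled"`.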

This is how I was creating the node sets:

  scontrol create nodename=ukdri-cluster2-dyn4-[1-10] Feature=dyn,Standard_F4s_V2 cpus=4 State=CLOUD RealMemory=7782
  scontrol create nodename=ukdri-cluster2-dyn4-[1-3] Feature=dyn,StandardF48s_V2 cpus=48 State=CLOUD RealMemory=93388
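
The bracketed suffixes are Slurm hostlist expressions, so each command registers a whole set of node names. As written, both commands use the prefix `ukdri-cluster2-dyn4-`, so the 48-CPU set `[1-3]` names the same first three nodes as the 4-CPU set `[1-10]`; the issue does not say whether a different prefix was intended. A minimal bash sketch for listing the names two such sets share, handling only the single-range form used here (helper names are hypothetical; on a Slurm host, `scontrol show hostnames` expands the full hostlist syntax):

```bash
#!/usr/bin/env bash
# Expand a single-range hostlist such as "ukdri-cluster2-dyn4-[1-10]".
# Simplification: only one "[lo-hi]" range is handled and zero padding
# is not preserved.
expand_hostlist() {
  local spec=$1 re='^(.*)\[([0-9]+)-([0-9]+)\]$' i
  if [[ $spec =~ $re ]]; then
    local prefix=${BASH_REMATCH[1]} lo=${BASH_REMATCH[2]} hi=${BASH_REMATCH[3]}
    # 10# forces base 10 so values like "08" are not read as octal.
    for ((i = 10#$lo; i <= 10#$hi; i++)); do
      printf '%s%d\n' "$prefix" "$i"
    done
  else
    printf '%s\n' "$spec"   # plain node name, no range
  fi
}

# Print the node names that two hostlists have in common (sorted).
overlapping_nodes() {
  comm -12 <(expand_hostlist "$1" | sort) <(expand_hostlist "$2" | sort)
}
```

For the two commands above, `overlapping_nodes 'ukdri-cluster2-dyn4-[1-10]' 'ukdri-cluster2-dyn4-[1-3]'` prints `ukdri-cluster2-dyn4-1` through `ukdri-cluster2-dyn4-3`.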

Thanks,
Thibaut
