smp: respect cgroup CPU limits for appliance SMP #36
ssahani wants to merge 1 commit into libguestfs:master
There are multiple issues here, but the first one is that |
mlutils/unix_utils.mli
Outdated
    Note this never fails.  In case we cannot get the number of
    cores it returns 1. *)

val cgroup_v2_cpus : ?root:string -> unit -> int option
Right. I'm thinking of a new Cgroup module in unix_utils.ml/mli.

Yes, a new Cgroup or CGroup module would make more sense.
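For illustration, here is a rough sketch of what such a module could look like. It is not the code from the PR: the module layout, the private `parse` helper, and the clamp to a minimum of 1 are assumptions. Only the `?root:string -> unit -> int option` shape and the `"%Ld %Ld"` parse come from the diff. The optional `root` argument lets a test point the function at a fake cgroup directory.

```ocaml
(* Sketch only: hypothetical layout for a Cgroup module. *)
module Cgroup : sig
  val v2_cpus : ?root:string -> unit -> int option
end = struct
  (* Parse "<quota> <period>".  A line starting with "max" fails the
     integer scan and is treated as "no limit" (None). *)
  let parse line =
    try
      Scanf.sscanf line "%Ld %Ld" (fun quota period ->
        if quota > 0L && period > 0L then
          (* Assumption: clamp to 1 so the result is never 0. *)
          Some (max 1 (Int64.to_int (Int64.div quota period)))
        else None)
    with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

  let v2_cpus ?(root = "/sys/fs/cgroup") () =
    match open_in (Filename.concat root "cpu.max") with
    | exception Sys_error _ -> None   (* file missing or unreadable *)
    | ic ->
      Fun.protect ~finally:(fun () -> close_in_noerr ic) (fun () ->
        match input_line ic with
        | exception End_of_file -> None
        | line -> parse line)
end
```

With this shape, a unit test can write a `cpu.max` file into a temporary directory and call `Cgroup.v2_cpus ~root:dir ()` without touching the real `/sys/fs/cgroup`.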
mlutils/unix_utils.ml
Outdated
let quota, period =
  Scanf.sscanf line "%Ld %Ld" (fun q p -> (q, p)) in
if period > 0L then
  Some (Int64.to_int (Int64.div quota period))
This feels like it could return 0 causing problems elsewhere.
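To make the concern concrete: with a fractional limit such as `50000 100000` (half a CPU), truncating integer division yields 0. One possible fix, sketched below, is to round up and clamp to at least 1. The helper name `cpus_of_quota` is hypothetical, and whether rounding up or down is the right policy is a judgment call for the PR author.

```ocaml
(* Hypothetical helper, not the code under review: turn a cpu.max
   quota/period pair into a CPU count that is never 0. *)
let cpus_of_quota (quota : int64) (period : int64) : int option =
  if quota <= 0L || period <= 0L then None
  else
    (* Ceiling division: (quota + period - 1) / period.
       Assumes the values are far from Int64.max_int. *)
    let n = Int64.div (Int64.add quota (Int64.sub period 1L)) period in
    Some (max 1 (Int64.to_int n))

let () =
  (* Truncating division, as in the diff: a half-CPU limit gives 0. *)
  assert (Int64.to_int (Int64.div 50000L 100000L) = 0);
  (* Rounding up instead. *)
  assert (cpus_of_quota 50000L 100000L = Some 1);
  assert (cpus_of_quota 150000L 100000L = Some 2);
  assert (cpus_of_quota 200000L 100000L = Some 2)
```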
|
I'm not very familiar with how this cgroup works, maybe @crobinso will be in a better position to review this. However if |
Yes, when cpu.max starts with "max" (unlimited), v2_cpus returns None. The caller, nr_cpus_available, then falls through to cgroup v1, and if that also returns None, it falls back to Sysconf.nr_processors_online () (i.e. _SC_NPROCESSORS_ONLN). So unlimited cgroups correctly resolve to the actual online processor count.
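A minimal sketch of that fallback, assuming the contents of cpu.max have already been read into a string. The names `parse_cpu_max` and the `~online` argument are hypothetical stand-ins (the latter for `Sysconf.nr_processors_online`), and the cgroup v1 step is left out for brevity.

```ocaml
(* Sketch only; names are illustrative, not the PR's API. *)
let parse_cpu_max (line : string) : int option =
  match String.split_on_char ' ' (String.trim line) with
  | [ "max"; _ ] -> None                      (* unlimited quota *)
  | [ q; p ] ->
    (match Int64.of_string_opt q, Int64.of_string_opt p with
     | Some quota, Some period when quota > 0L && period > 0L ->
       (* Assumption: clamp so a fractional limit never yields 0. *)
       Some (max 1 (Int64.to_int (Int64.div quota period)))
     | _ -> None)
  | _ -> None                                 (* unexpected format *)

(* [cpu_max] is None when the file could not be read at all. *)
let nr_cpus_available ~(online : unit -> int) (cpu_max : string option) : int =
  match Option.bind cpu_max parse_cpu_max with
  | Some n -> n          (* cgroup limit applies *)
  | None -> online ()    (* no limit: use the online CPU count *)
```

For example, `nr_cpus_available ~online:(fun () -> 8) (Some "max 100000")` falls through to the online count, while `(Some "400000 100000")` gives the cgroup-derived value.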
c158621 to 32ed696
|
IMO cgroup v1 support is not necessary. cgroup v2 has been the default since Fedora 31 and RHEL 9, and it's a boot-time kernel config that trickles down to containers. I think it's even fully removed in RHEL 10 and deprecated at the systemd level, so even testing that path nowadays will be a pain. An additional improvement would be replacing the online CPU check with sched_getaffinity. This would make running under
Add a new Cgroup module in unix_utils with v2_cpus, which parses /sys/fs/cgroup/cpu.max to detect cgroup v2 CPU limits and returns the effective CPU count (quota/period). The kernel uses long/u64 for the quota and period values, so we parse them as Int64.

Add nr_cpus_available, which checks cgroup v2 CPU limits first and falls back to nr_processors_online when no limits are set.

Cgroup v1 is not supported, as cgroup v2 has been the default since Fedora 31 and RHEL 9, and v1 is removed in RHEL 10.

Resolves: https://redhat.atlassian.net/browse/RHEL-152766
When running inside a container with CPU limits, virt-v2v ignored cgroup constraints and used the host's total CPU count for SMP. Add cgroup v2 (cpu.max) and v1 (cfs_quota_us/cfs_period_us) detection to Sysconf, falling back to nr_processors_online.
Resolves: https://redhat.atlassian.net/browse/RHEL-152766