Skip to content

smp: respect cgroup CPU limits for appliance SMP#36

Open
ssahani wants to merge 1 commit intolibguestfs:masterfrom
ssahani:RHEL-152766
Open

smp: respect cgroup CPU limits for appliance SMP#36
ssahani wants to merge 1 commit intolibguestfs:masterfrom
ssahani:RHEL-152766

Conversation

@ssahani
Copy link
Copy Markdown

@ssahani ssahani commented Mar 27, 2026

When running inside a container with CPU limits, virt-v2v ignored cgroup constraints and used the host's total CPU count for SMP. Add cgroup v2 (cpu.max) and v1 (cfs_quota_us/cfs_period_us) detection to Sysconf, falling back to nr_processors_online.

Resolves: https://redhat.atlassian.net/browse/RHEL-152766

@rwmjones
Copy link
Copy Markdown
Member

There are multiple issues here, but the first one is that Unix_utils.Sysconf is a module with bindings related to the sysconf(3) function. Since you're not adding another binding for sysconf(3) why are new functions being added here?

Note this never fails. In case we cannot get the number of
cores it returns 1. *)

val cgroup_v2_cpus : ?root:string -> unit -> int option
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the documentation?

@ssahani
Copy link
Copy Markdown
Author

ssahani commented Mar 27, 2026

There are multiple issues here, but the first one is that Unix_utils.Sysconf is a module with bindings related to the sysconf(3) function. Since you're not adding another binding for sysconf(3) why are new functions being added here?

Right Thinking of new Cgroup module in unix_utils.ml/mli

@rwmjones
Copy link
Copy Markdown
Member

Yes, a new Cgroup or CGroup module would make more sense.

let quota, period =
Scanf.sscanf line "%Ld %Ld" (fun q p -> (q, p)) in
if period > 0L then
Some (Int64.to_int (Int64.div quota period))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it could return 0 causing problems elsewhere.

@rwmjones
Copy link
Copy Markdown
Member

I'm not very familiar with how this cgroup works, maybe @crobinso will be in a better position to review this.

However if cpu.max starts with max .. does that mean it's unlimited so shouldn't we directly got to _SC_NPROCESSORS_ONLN?

@ssahani
Copy link
Copy Markdown
Author

ssahani commented Mar 27, 2026

I'm not very familiar with how this cgroup works, maybe @crobinso will be in a better position to review this.

However if cpu.max starts with max .. does that mean it's unlimited so shouldn't we directly got to _SC_NPROCESSORS_ONLN?

Yes, when cpu.max starts with "max" (unlimited), v2_cpus returns None. The caller nr_cpus_available then falls through to try cgroup v1, and if that also returns none, it falls back to Sysconf.nr_processors_online() (i.e. _SC_NPROCESSORS_ONLN). So unlimited cgroups correctly resolve to the actual online processor count.

@ssahani ssahani force-pushed the RHEL-152766 branch 2 times, most recently from c158621 to 32ed696 Compare March 27, 2026 12:35
@crobinso
Copy link
Copy Markdown
Collaborator

IMO cgroupv1 support is not necessary. cgroupv2 has been the default since fedora 31 and rhel9, and it's a boot time kernel config that trickles down to containers. I think it's even fully removed in rhel10 and deprecated at systemd level. so even testing that path nowadays will be a pain

an additional improvement would be replacing the online cpu check with sched_getaffinity. this would make running under podman --cpuset-cpus and taskset ... guestfish do the right thing as well (could be a separate PR though)

Add a new Cgroup module in unix_utils with v2_cpus that parses
/sys/fs/cgroup/cpu.max to detect cgroup v2 CPU limits and return
the effective CPU count (quota/period).  The kernel uses long/u64
for quota and period values, so we parse them as Int64.

Add nr_cpus_available that checks cgroup v2 CPU limits first,
falling back to nr_processors_online when no limits are set.

Cgroup v1 is not supported as cgroupv2 has been the default since
Fedora 31 and RHEL 9, and is removed in RHEL 10.

Resolves: https://redhat.atlassian.net/browse/RHEL-152766
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants