Background
We experienced significant performance degradation for workloads running on sysbox after upgrading the Kubernetes node OS from Ubuntu 20.04 (cgroup v1) to Ubuntu 22.04 (cgroup v2).
Potential Root Cause
- When creating containers, sysbox writes AllowedCPUs into the transient systemd scope unit and its drop-in via the D-Bus APIs:
  - /run/systemd/transient/crio-<unit>.scope - created via StartTransientUnit
  - /run/systemd/transient/crio-<unit>.scope.d/50-AllowedCPUs.conf - set via SetUnitProperties
- When building these D-Bus requests, sysbox packs the CPU range into a byte stream using big-endian ordering.
- However, based on systemd's D-Bus CPU mask handling (see cpu-set-util.c), the D-Bus CPU mask representation appears to expect a little-endian byte stream. Using big-endian ordering therefore reverses the byte order and can cause the interpreted CPU range to drift. For example (a small Go sketch reproducing both encodings follows the table below):
# big-endian encoding (current behavior)
CPU Range: 2-5 Bytes: 3c Actual CPUs: [2 3 4 5]
CPU Range: 6-9 Bytes: 03 c0 Actual CPUs: [0 1 14 15]
CPU Range: 10-13 Bytes: 3c 00 Actual CPUs: [2 3 4 5]
CPU Range: 14-17 Bytes: 03 c0 00 Actual CPUs: [0 1 14 15]
# little-endian encoding (systemd expectation)
CPU Range: 2-5 Bytes: 3c Actual CPUs: [2 3 4 5]
CPU Range: 6-9 Bytes: c0 03 Actual CPUs: [6 7 8 9]
CPU Range: 10-13 Bytes: 00 3c Actual CPUs: [10 11 12 13]
CPU Range: 14-17 Bytes: 00 c0 03 Actual CPUs: [14 15 16 17]
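The drift above can be reproduced with a few lines of Go. This is a minimal sketch, not sysbox's actual packing code: it builds the bitmask for an inclusive CPU range and emits the bytes in either order.

```go
package main

import "fmt"

// packCPUs builds a CPU bitmask for the inclusive range [lo, hi]: bit i of the
// mask corresponds to CPU i, with byte 0 covering CPUs 0-7, byte 1 covering
// CPUs 8-15, and so on. bigEndian=true reverses the byte order, mimicking the
// current behavior described above.
func packCPUs(lo, hi int, bigEndian bool) []byte {
	mask := make([]byte, hi/8+1)
	for cpu := lo; cpu <= hi; cpu++ {
		mask[cpu/8] |= 1 << (cpu % 8)
	}
	if bigEndian {
		for i, j := 0, len(mask)-1; i < j; i, j = i+1, j-1 {
			mask[i], mask[j] = mask[j], mask[i]
		}
	}
	return mask
}

func main() {
	for _, r := range [][2]int{{2, 5}, {6, 9}, {10, 13}, {14, 17}} {
		fmt.Printf("CPU Range: %d-%d  big-endian: % x  little-endian: % x\n",
			r[0], r[1], packCPUs(r[0], r[1], true), packCPUs(r[0], r[1], false))
	}
}
```

The big-endian column matches the byte streams described above as the current behavior; the little-endian column is what systemd decodes back into the intended range.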
- Only with cgroup v2 is EffectiveCPUs constrained by AllowedCPUs (ref) after systemd reloads, which causes significant CPU idling and forces workloads to contend for an overlapping CPU range, resulting in performance degradation.
Direct Proof
Using the byte streams from the sysbox unit tests (cpuset_test.go) as input and the systemd D-Bus function to retrieve the actual CPU set, we can observe the drift (see the C program). Only when reversing the byte stream do we retrieve the correct CPU set.
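As a companion to the C program, here is a minimal Go sketch of the same check. It assumes systemd's interpretation of the mask (bit b of byte i selects CPU i*8+b); the example bytes are those produced for CPUs 10-13 by the current big-endian packing.

```go
package main

import "fmt"

// decodeCPUs interprets a CPU-mask byte stream the way systemd's
// cpu-set-util.c does: bit b of byte i selects CPU i*8+b.
func decodeCPUs(mask []byte) []int {
	var cpus []int
	for i, b := range mask {
		for bit := 0; bit < 8; bit++ {
			if b&(1<<bit) != 0 {
				cpus = append(cpus, i*8+bit)
			}
		}
	}
	return cpus
}

// reverse returns a copy of b with the byte order flipped.
func reverse(b []byte) []byte {
	out := make([]byte, len(b))
	for i := range b {
		out[len(b)-1-i] = b[i]
	}
	return out
}

func main() {
	// Byte stream the current big-endian packing produces for CPUs 10-13.
	sent := []byte{0x3c, 0x00}
	fmt.Println("as sent:  ", decodeCPUs(sent))          // [2 3 4 5]     - drifted
	fmt.Println("reversed: ", decodeCPUs(reverse(sent))) // [10 11 12 13] - correct
}
```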
Reproduce
- Enable the static CPU manager policy on the kubelet
--cpu-cfs-quota=false --cpu-manager-policy=static
- Create 3 runners with 4 static CPUs each, running on sysbox
template:
  metadata:
    annotations:
      io.kubernetes.cri-o.userns-mode: auto:size=65536
  spec:
    runtimeClassName: sysbox-runc
    containers:
      - name: runner-x
        resources:
          limits:
            cpu: "4"
            memory: 256Mi
          requests:
            cpu: "4"
            memory: 256Mi
- Get the initial state: the effective CPUs correctly match the kubelet setup
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-1,34-63","entries":{"xx":{"runner-1":"2-5","runner-2":"6-9","runner-3":"10-13"}},"checksum":2693667676}
systemctl show crio-<unit>.scope -p EffectiveCPUs,AllowedCPUs
EffectiveCPUs=10-13
AllowedCPUs=2-5
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/crio-<unit>.scope/cpuset.cpus
10-13
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/crio-<unit>.scope/cpuset.cpus.effective
10-13
# Set by dbus StartTransientUnit
cat /run/systemd/transient/crio-<unit>.scope | grep AllowedCPUs
AllowedCPUs=2-5
# Set by dbus SetUnitProperties
cat /run/systemd/transient/crio-<unit>.scope.d/50-AllowedCPUs.conf | grep AllowedCPUs
AllowedCPUs=2-5
- Trigger a systemd reload
systemctl daemon-reload
- The effective CPUs then drift into the wrong range
systemctl show crio-<unit>.scope -p EffectiveCPUs,AllowedCPUs
EffectiveCPUs=2-5
AllowedCPUs=2-5
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/crio-<unit>.scope/cpuset.cpus
2-5
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/crio-<unit>.scope/cpuset.cpus.effective
2-5
Potential Fix
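Based on the analysis above, one direction would be to pack (or re-pack) the AllowedCPUs byte stream in little-endian order before handing it to systemd over D-Bus. Below is a minimal sketch of such a packer; the function name is hypothetical and the actual sysbox helper exercised by cpuset_test.go may look different.

```go
package main

import (
	"fmt"
	"strings"
)

// rangeToLittleEndianBits is a hypothetical stand-in for sysbox's packing
// helper. It converts a cpuset string such as "2-5,8" into the byte stream
// systemd expects for AllowedCPUs: byte 0 covers CPUs 0-7, byte 1 covers
// CPUs 8-15, and so on -- no byte reversal.
func rangeToLittleEndianBits(cpus string) ([]byte, error) {
	var mask []byte
	for _, part := range strings.Split(cpus, ",") {
		lo, hi := 0, 0
		if strings.Contains(part, "-") {
			if _, err := fmt.Sscanf(part, "%d-%d", &lo, &hi); err != nil {
				return nil, err
			}
		} else {
			if _, err := fmt.Sscanf(part, "%d", &lo); err != nil {
				return nil, err
			}
			hi = lo
		}
		for cpu := lo; cpu <= hi; cpu++ {
			// Grow the mask as needed, then set the bit for this CPU.
			for len(mask) <= cpu/8 {
				mask = append(mask, 0)
			}
			mask[cpu/8] |= 1 << (cpu % 8)
		}
	}
	return mask, nil
}

func main() {
	mask, _ := rangeToLittleEndianBits("10-13")
	fmt.Printf("% x\n", mask) // 00 3c -> systemd decodes this back to CPUs 10-13
}
```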