[WIP] fix-stress(guest): cgroup memory.max + pids.max to prevent VM panic under OOM#605

Draft
G4614 wants to merge 4 commits into
boxlite-ai:mainfrom
G4614:fix/cgroup-oom-protection

Conversation

Contributor

@G4614 G4614 commented May 27, 2026

Cap each container's memory and PIDs via cgroup memory.max + pids.max so a hostile workload can't panic the guest kernel. This PR pins the limits youki already applies to an explicit /boxlite/<id> cgroup, and adds the enforcement test that the survival test alone can't provide (the guest OOM killer keeps the VM up either way).

Test plan

Unit (boxlite-guest, memory_limit_from_meminfo): 90% of MemAvailable, strictly below the raw available (headroom reserved); reads MemAvailable, not the larger MemTotal/MemFree; None on missing/unparseable/zero.

Integration:

  • cgroup_limits_are_enforced_on_the_container — reads the container's own /proc/self/cgroup and asserts a /boxlite/<id> path with memory.max bounded below the VM and pids.max = 512. Two-side verified (drops to /:youki:<id> and fails without the cgroups_path).
  • cgroup_limits_keep_vm_alive_under_pids_and_memory_bombs — one 128 MB box survives three escalating waves: a 1000-fork pids bomb, a single 512 MB allocation (4× the VM), and a 200×2 MB fork+alloc bomb; after each the VM stays Running and exec works.

| observed | without cgroups_path | with cgroups_path |
| --- | --- | --- |
| container cgroup | /:youki:<id> (default name) | /boxlite/<id> (explicit) |
| memory.max | applied (2×VM + 512 MiB) | applied (2×VM + 512 MiB) |
| pids.max | 512 | 512 |
| enforcement test | fails (path ≠ /boxlite/) | passes |

@G4614 G4614 changed the title fix(guest): cgroup memory.max + pids.max to prevent VM panic under OOM [WIP] fix(guest): cgroup memory.max + pids.max to prevent VM panic under OOM May 27, 2026
@G4614 G4614 force-pushed the fix/cgroup-oom-protection branch 2 times, most recently from bbeae7c to eda1369 Compare May 27, 2026 14:27
@G4614 G4614 marked this pull request as draft May 28, 2026 10:02
Without cgroup limits, user processes can exhaust VM physical memory
causing guest kernel panic → KVM_EXIT_SHUTDOWN → box death.

Fix: set cgroup v2 resource limits on the container:
- memory.max = 90% of MemAvailable at container creation
- pids.max = 512

Verified: 128MB VM + gcc + 200×2MB fork bomb — without cgroup the VM
dies 2/3 of the time (global OOM kills too slowly). With cgroup: 100%
survival, cgroup OOM kills only container processes, guest agent safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 force-pushed the fix/cgroup-oom-protection branch from eda1369 to 2888872 Compare May 28, 2026 12:20
Ubuntu and others added 2 commits May 28, 2026 14:57
Make the cgroup OOM protection a tested, hardened resource guard.

spec.rs — split the memory-cap math (90% of MemAvailable) into a pure
memory_limit_from_meminfo(&str) behind a thin /proc read, mirroring the
OS-boundary/pure-decision split in util::disk_space, and name the pids ceiling
(CONTAINER_PIDS_MAX = 512). Behavior unchanged. Add unit tests that had been
missing: 90% with headroom; reads MemAvailable not the larger MemTotal/MemFree
listed first (a wrong-field bug would cap too high and defeat the protection);
None on missing/unparseable/zero so the caller sets no bogus cap.
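
A minimal sketch of what the pure half of that split could look like, assuming the function returns a byte count and that /proc/meminfo reports MemAvailable in kB. Only the name memory_limit_from_meminfo and the 90% figure come from this PR; the body, the return type, and the MEMORY_LIMIT_PERCENT constant are illustrative assumptions.

```rust
/// Hypothetical constant: share of MemAvailable granted to the container.
const MEMORY_LIMIT_PERCENT: u64 = 90;

/// Pure decision: derive a memory cap in bytes from /proc/meminfo text.
/// Returns None when MemAvailable is missing, unparseable, or zero,
/// so the caller can skip setting a cap instead of setting a bogus one.
fn memory_limit_from_meminfo(meminfo: &str) -> Option<u64> {
    // Match the MemAvailable line only; MemTotal and MemFree are ignored.
    let available_kib = meminfo.lines().find_map(|line| {
        let rest = line.strip_prefix("MemAvailable:")?;
        rest.split_whitespace().next()?.parse::<u64>().ok()
    })?;
    if available_kib == 0 {
        return None;
    }
    // Assumption: the value is in kB, as /proc/meminfo normally reports it.
    let available_bytes = available_kib.checked_mul(1024)?;
    // Divide first so the multiplication cannot overflow.
    Some(available_bytes / 100 * MEMORY_LIMIT_PERCENT)
}
```

Keeping the /proc read outside this function is what lets the wrong-field and zero cases be tested with plain string fixtures.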

stress_oom.rs — rebuild the integration test into three escalating waves
against one 128 MB box: a 1000-fork pids bomb (vs pids.max=512), a single
512 MB allocation (4× the VM, vs memory.max), and the original 200×2 MB
fork+alloc. Each wave asserts the VM stays Running and exec still works, with
inter-wave reaping. Drops dead host-side compile/base64 cruft (the flood is
built in-box) and the useless format!, so the file is clippy-clean under
-D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rt enforcement

youki already applies the spec's memory.max + pids.max, but with no
cgroups_path it does so under a generated default name (`/:youki:<id>`). Set an
explicit `/boxlite/<id>` path (the layout the author originally intended) so the
container's cgroup is predictable for tooling and inspection.
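
For orientation, the explicit path and the two limits correspond to the cgroupsPath, resources.memory.limit, and resources.pids.limit fields of the OCI runtime spec's linux section. The sketch below renders that fragment as a JSON string; the function name linux_section_json and the hand-rolled formatting are assumptions made for illustration, not the code in spec.rs.

```rust
/// Hypothetical helper: render the relevant part of an OCI `linux` section.
/// Assumes `container_id` contains no characters that need JSON escaping.
fn linux_section_json(container_id: &str, memory_limit_bytes: i64, pids_limit: i64) -> String {
    format!(
        r#"{{"cgroupsPath":"/boxlite/{id}","resources":{{"memory":{{"limit":{mem}}},"pids":{{"limit":{pids}}}}}}}"#,
        id = container_id,
        mem = memory_limit_bytes,
        pids = pids_limit
    )
}
```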

Add an integration test that was missing: cgroup_limits_are_enforced_on_the_container
reads the container's own /proc/self/cgroup and asserts a /boxlite/<id> path with
memory.max bounded below the VM and pids.max = 512 — verified two-sided (drops to
`/:youki:<id>` and fails without the cgroups_path). The pre-existing survival test
proves the VM stays up through pids/memory bombs, but the guest kernel OOM killer
keeps it alive with or without the cgroup, so it can't guard enforcement on its
own; this test does.
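
The path check at the heart of that test can be sketched as two small pure functions, assuming the guest uses the cgroup v2 unified hierarchy, where /proc/self/cgroup carries a single `0::<path>` entry. The names cgroup_v2_path and is_boxlite_cgroup are hypothetical, and the rule that the id is a single non-empty path segment is an assumption.

```rust
/// Extract the cgroup v2 path from the contents of /proc/self/cgroup.
/// The unified-hierarchy entry has the form `0::<path>`.
fn cgroup_v2_path(proc_self_cgroup: &str) -> Option<&str> {
    proc_self_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
        .map(str::trim)
}

/// True when the path is `/boxlite/<id>` with a non-empty, single-segment id.
fn is_boxlite_cgroup(path: &str) -> bool {
    match path.strip_prefix("/boxlite/") {
        Some(id) => !id.is_empty() && !id.contains('/'),
        None => false,
    }
}
```

With these, the default youki name `/:youki:<id>` is rejected and only the explicit layout passes, which is the two-sided behavior described above.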

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@G4614 G4614 changed the title [WIP] fix(guest): cgroup memory.max + pids.max to prevent VM panic under OOM [WIP] fix-stress(guest): cgroup memory.max + pids.max to prevent VM panic under OOM May 29, 2026
…M survival

The pids-bomb wave only checked the VM stayed up, which a kernel that merely
coped would also pass. Capture the flood's reported fork count and assert
pids.max blocked the bulk of the 1000-fork bomb (succeeded forks well below
the requested 1000, bounded by the 512 CONTAINER_PIDS_MAX ceiling) — proving
the limit is enforced under load.
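
The assertion described here can be sketched as a pure check over the two numbers involved. CONTAINER_PIDS_MAX = 512 is from this PR; the function name check_pids_bomb and the exact pass criteria are assumptions, and how the flood reports its fork count is left out because the PR does not show it.

```rust
/// Pids ceiling named in this PR.
const CONTAINER_PIDS_MAX: u64 = 512;

/// Hypothetical check: Ok when the flood's successful fork count shows that
/// pids.max blocked part of the bomb, Err with a reason otherwise.
fn check_pids_bomb(requested: u64, succeeded: u64) -> Result<(), String> {
    if succeeded >= requested {
        return Err(format!(
            "all {requested} forks succeeded; pids.max was not enforced"
        ));
    }
    if succeeded > CONTAINER_PIDS_MAX {
        return Err(format!(
            "{succeeded} forks succeeded, above the {CONTAINER_PIDS_MAX} ceiling"
        ));
    }
    Ok(())
}
```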

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>