[WIP] fix-stress(guest): cgroup memory.max + pids.max to prevent VM panic under OOM #605
Draft
G4614 wants to merge 4 commits into
bbeae7c to eda1369
Without cgroup limits, user processes can exhaust the VM's physical memory, causing a guest kernel panic → KVM_EXIT_SHUTDOWN → box death.

Fix: set cgroup v2 resource limits on the container:
- memory.max = 90% of MemAvailable at container creation
- pids.max = 512

Verified: 128MB VM + gcc + 200×2MB fork bomb. Without the cgroup the VM dies 2/3 of the time (global OOM kills too slowly). With the cgroup: 100% survival, the cgroup OOM killer kills only container processes, and the guest agent stays safe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eda1369 to 2888872
Make the cgroup OOM protection a tested, hardened resource guard.

spec.rs: split the memory-cap math (90% of MemAvailable) into a pure memory_limit_from_meminfo(&str) behind a thin /proc read, mirroring the OS-boundary/pure-decision split in util::disk_space, and name the pids ceiling (CONTAINER_PIDS_MAX = 512). Behavior unchanged. Add the unit tests that had been missing:
- 90% with headroom;
- reads MemAvailable, not the larger MemTotal/MemFree listed first (a wrong-field bug would cap too high and defeat the protection);
- None on missing/unparseable/zero, so the caller sets no bogus cap.

stress_oom.rs: rebuild the integration test into three escalating waves against one 128 MB box:
- a 1000-fork pids bomb (vs pids.max=512);
- a single 512 MB allocation (4× the VM, vs memory.max);
- the original 200×2 MB fork+alloc.
Each wave asserts the VM stays Running and exec still works, with inter-wave reaping. Drops dead host-side compile/base64 cruft (the flood is built in-box) and the useless format!, so the file is clippy-clean under -D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
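For readers who want to see the shape of the split described in the commit above, here is a minimal sketch. It is not the code from spec.rs: the return type (`Option<u64>`, in bytes), the kB-to-bytes conversion, and the helper name `read_memory_limit` are assumptions made for illustration. Only the function name `memory_limit_from_meminfo`, the 90% factor, and the None-on-missing/unparseable/zero behavior come from the commit message.

```rust
use std::fs;

/// Pure decision (sketch): derive a memory cap in bytes from the text of
/// /proc/meminfo. Returns None when MemAvailable is missing, unparseable,
/// or zero, so the caller sets no cap at all instead of a bogus one.
pub fn memory_limit_from_meminfo(meminfo: &str) -> Option<u64> {
    let kib = meminfo.lines().find_map(|line| {
        // Match the exact field name so MemTotal/MemFree are never picked up.
        let rest = line.strip_prefix("MemAvailable:")?;
        // The remainder looks like "   100000 kB"; take the numeric token.
        rest.split_whitespace().next()?.parse::<u64>().ok()
    })?;
    if kib == 0 {
        return None;
    }
    // /proc/meminfo reports KiB; convert to bytes and keep 90%.
    kib.checked_mul(1024).map(|bytes| bytes / 10 * 9)
}

/// Thin OS boundary (hypothetical helper name): read the file, delegate the math.
pub fn read_memory_limit() -> Option<u64> {
    let text = fs::read_to_string("/proc/meminfo").ok()?;
    memory_limit_from_meminfo(&text)
}
```

Keeping the arithmetic in a function that takes `&str` is what makes the wrong-field and zero-value cases testable without a real /proc.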
…rt enforcement

youki already applies the spec's memory.max + pids.max, but with no cgroups_path it does so under a generated default name (`/:youki:<id>`). Set an explicit `/boxlite/<id>` path (the layout the author originally intended) so the container's cgroup is predictable for tooling and inspection.

Add an integration test that was missing: cgroup_limits_are_enforced_on_the_container reads the container's own /proc/self/cgroup and asserts a /boxlite/<id> path with memory.max bounded below the VM and pids.max = 512. Verified two-sided: without the cgroups_path the cgroup drops to `/:youki:<id>` and the test fails.

The pre-existing survival test proves the VM stays up through pids/memory bombs, but the guest kernel OOM killer keeps it alive with or without the cgroup, so it can't guard enforcement on its own; this test does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
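The path check in that test can be sketched roughly as follows. This is an illustration under assumptions, not the test's actual code: the function names `cgroup_v2_path` and `is_boxlite_cgroup` are hypothetical, and the sketch relies only on the documented cgroup v2 format of /proc/self/cgroup, where the unified hierarchy appears as a single `0::<path>` line.

```rust
/// Sketch: extract the cgroup v2 path from the contents of /proc/self/cgroup.
/// On the unified hierarchy the relevant line has the form "0::<path>".
pub fn cgroup_v2_path(proc_self_cgroup: &str) -> Option<&str> {
    proc_self_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
}

/// Sketch (hypothetical helper): true only for an explicit "/boxlite/<id>" path
/// with a non-empty, single-segment id.
pub fn is_boxlite_cgroup(path: &str) -> bool {
    path.strip_prefix("/boxlite/")
        .map_or(false, |id| !id.is_empty() && !id.contains('/'))
}
```

With the default naming the same check rejects `/:youki:<id>`, which is the two-sided behavior the commit describes.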
…M survival

The pids-bomb wave only checked the VM stayed up, which a kernel that merely coped would also pass. Capture the flood's reported fork count and assert pids.max blocked the bulk of the 1000-fork bomb (succeeded forks well below the requested 1000, bounded by the 512 CONTAINER_PIDS_MAX ceiling), proving the limit is enforced under load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
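A rough sketch of the kind of assertion this commit describes is below. The flood's output format (`forked=<n>`) and both function names are assumptions for illustration; only CONTAINER_PIDS_MAX = 512 and the 1000 requested forks come from the PR.

```rust
/// Named ceiling from the PR; the value 512 is taken from the commit messages.
pub const CONTAINER_PIDS_MAX: u64 = 512;

/// Sketch: pull the fork count out of the flood's output.
/// The "forked=<n>" line format is hypothetical.
pub fn parse_fork_count(output: &str) -> Option<u64> {
    output
        .lines()
        .find_map(|line| line.trim().strip_prefix("forked=")?.parse::<u64>().ok())
}

/// Sketch: the limit held if fewer forks succeeded than were requested
/// and the successes fit under the pids ceiling.
pub fn pids_limit_held(succeeded: u64, requested: u64) -> bool {
    succeeded < requested && succeeded <= CONTAINER_PIDS_MAX
}
```

A survival-only check would pass even if all 1000 forks succeeded; comparing the count against the ceiling is what ties the result to pids.max.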
Cap each container's memory/PIDs via cgroup `memory.max` + `pids.max` so a hostile workload can't panic the guest kernel. This pins the limits youki already applies to an explicit `/boxlite/<id>` cgroup, and adds the enforcement test the survival test alone can't provide (the guest OOM killer keeps the VM up either way).

Test plan

Unit (`boxlite-guest`, `memory_limit_from_meminfo`):
- 90% of `MemAvailable`, strictly below the raw available (headroom reserved);
- reads `MemAvailable`, not the larger `MemTotal`/`MemFree`;
- `None` on missing/unparseable/zero.

Integration:
- `cgroup_limits_are_enforced_on_the_container`: reads the container's own `/proc/self/cgroup` and asserts a `/boxlite/<id>` path with `memory.max` bounded below the VM and `pids.max` = 512. Two-side verified (drops to `/:youki:<id>` and fails without the cgroups_path).
- `cgroup_limits_keep_vm_alive_under_pids_and_memory_bombs`: one 128 MB box survives three escalating waves: a 1000-fork pids bomb, a single 512 MB allocation (4× the VM), and a 200×2 MB fork+alloc bomb; after each the VM stays `Running` and exec works.

Without `cgroups_path` vs. with `cgroups_path`: the container's cgroup is `/:youki:<id>` (default name) vs. `/boxlite/<id>` (explicit); `memory.max` and `pids.max` are applied in both cases.