docs: root cause analysis for AL2023 EC2 c8i VM boot failure#425
Open
DorianZheng wants to merge 2 commits intomainfrom
Open
docs: root cause analysis for AL2023 EC2 c8i VM boot failure#425DorianZheng wants to merge 2 commits intomainfrom
DorianZheng wants to merge 2 commits intomainfrom
Conversation
0bc3063 to
23509fe
Compare
Guest kernel (libkrunfw 6.12.62) triggers i8042 CMD_RESET_CPU during early boot on nested KVM with host kernel 6.1 (Amazon Linux 2023). The reset causes immediate _exit(0) with no console output. Root cause: the guest kernel detects an incompatible CPU/hardware configuration under kernel 6.1's nested KVM emulation and performs a hardware reset via the i8042 controller. Ubuntu 24.04 (kernel 6.17) works because it provides better nested VMX emulation. This is an unreported configuration upstream — libkrun has not been tested on nested KVM Linux hosts.
23509fe to
eebcc59
Compare
…ID verification Add four security improvements to the OCI image pull pipeline, closing gaps identified by comparing with Docker (containerd) and Podman (containers/image): - Size validation: LayerInfo now carries expected size from manifest descriptors; StagedDownload.commit() rejects blobs with mismatched size before hash check (prevents disk exhaustion from oversized blobs) - Foreign layer URL rejection: layers_from_image() rejects layers with non-distributable media types or foreign URLs (CVE-2020-15157 mitigation) - HashingWriter: new AsyncWrite wrapper computes SHA256 inline during download, eliminating the post-download re-read and halving I/O while maintaining independent verification from oci-client - DiffID verification: verify_diff_id() decompresses and hashes layer tarballs to verify uncompressed content matches rootfs.diff_ids from the image config, called during layer_extracted()
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Documents the root cause of BoxLite VMs failing to start on Amazon Linux 2023 (kernel 6.1) on EC2 c8i instances with nested KVM.
Root Cause
The guest kernel (Linux 6.12.62 in libkrunfw) triggers an i8042 CMD_RESET_CPU during early boot, causing immediate
_exit(0)with no console output. The guest kernel detects an incompatible CPU configuration under kernel 6.1's nested KVM and falls back to hardware reset.Shutdown sequence
reset_evtEventFd → VMM calls_exit(0)Why Ubuntu 24.04 works
Ubuntu 24.04 (kernel 6.17) provides better nested VMX emulation that satisfies the guest kernel's requirements. KVM capabilities (ept, vpid, etc.) are identical between both kernels — the difference is in the VMX implementation details.
Upstream status
Investigation method
Added
eprintlnandstd::fs::writeinstrumentation to libkrun's i8042 device handler (devices/src/legacy/i8042.rs) and vCPU run loop (vmm/src/linux/vstate.rs). Confirmed reset via/tmp/krun-i8042-reset.logfile written by instrumented binary.