vmm: avoid pause deadlock on CPU hotplug MMIO#150
Closed
phip1611 wants to merge 1 commit into
Closed
Conversation
# TL;DR In 979fb1c we replaced the old deadlock with another deadlock. This commit finally resolves (hopefully) all dead locks on that code path by not holding `CpuManager::vcpu_states` while waiting for vCPU pause acknowledgements. A vCPU can receive the pause kick while servicing the ACPI CPU hotplug MMIO device, and that MMIO path also needs `vcpu_states`. Holding the mutex across the wait phase deadlocks pause against that MMIO access. # Problem `signal_vcpus()` used to lock `CpuManager::vcpu_states` for the whole function, signal every vCPU, and then wait for each vCPU to acknowledge the kick. That lock scope is too wide. A vCPU is allowed to observe the kick in userspace rather than returning directly from `KVM_RUN`. During boot, `vcpu0` can be in an MMIO access on the ACPI CPU hotplug device when pause arrives. `AcpiCpuHotplugController::read()` and `write()` both lock `vcpu_states` to inspect or update the selected vCPU state. The deadlock looks like this: VMM thread vCPU thread ---------- ---------- lock(vcpu_states) signal_vcpus() wait for ack ---------------------> receives pause kick enters ACPI CPU hotplug MMIO lock(vcpu_states) [blocks] wait for ack <--------------------- cannot set vcpu_run_interrupted The VMM thread waits for `vcpu_run_interrupted` to flip, but the vCPU cannot reach the pause acknowledgement path because it is sleeping on the same mutex. The debug logs matched that cycle exactly: signal delivery kept working, `vcpu0` stayed in one unmatched `run()` invocation, the stuck thread sampled in `futex_do_wait`, and the backtrace pointed at `AcpiCpuHotplugController::read()`. # Reproducer This was reproducible by continuously issuing `pause()` / `resume()` from while a Linux guest was still booting. That boot-tim window reliably exercises the ACPI CPU hotplug MMIO access that participates in the deadlock. Once the guest had finished booting, the problem became much harder to trigger (as there is no MMIO operation without explicit CPU plugging). # Solution Keep the existing two-phase behavior so all vCPUs are still signalled before the wait phase, but narrow the lifetime of the `vcpu_states` mutex. Reacquire it only long enough to access one `VcpuState` at a time in each phase. That preserves the original pause semantics and the fast signal-all / wait-all structure, while removing the lock inversion with the ACPI CPU hotplug MMIO path. This also remains safe if a vCPU is hot-removed while pause is in progress. Hot-remove does not shrink `vcpu_states`; it stops the thread and clears the `VcpuState` handle in place. `signal_vcpus()` can therefore snapshot the vector length up front, and if a vCPU disappears between the signal and wait phases, `wait_until_signal_acknowledged()` will observe `handle.is_none()` and return successfully. The interruption handshake itself lives in atomics inside each `VcpuState`. The outer mutex is only needed to reach the state objects, not to keep the acknowledgement protocol correct. Dropping the mutex between iterations therefore does not weaken the pause protocol, but it does allow MMIO handlers and other `vcpu_states` users to make forward progress while the VMM waits for the kick to be observed. On-behalf-of: SAP philipp.schuster@sap.com Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
f0aa7d1 to
3751233
Compare
|
I don't think this really fixes the problem, I think you just shrunk the time-window for the deadlock. |
Member
Author
yup, we are holding the lock for the whole retry loop.. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport of cloud-hypervisor#8092. We also have the same problem in our fork, where I originally designed a solution that I then upstreamed...
Hints for Reviewers
Steps to Undraft